Tertiary Education – A Comparison over Countries

15 01 2012

Not mine this time (the one I wrote in 1997 is still being referenced, but is out of date because the German degree system has changed) but the OECD’s from October 2011, based on 2009 data, which I have just discovered. In September 2011 the Washington Post published a startling graphic accompanying an article on the report; an essay today by Nicholas Kristof of the NYT linked to it. (Kristof is a member of my college. In his journalistic wanderings around some of the poorest, most disadvantaged parts of the world, he sometimes seems to me like a modern Wilfred Thesiger, a former member.)

I should note, first, in reference to the Washington Post article that the US term “college” refers to all higher-education which leads to a qualification called a degree. This includes “community colleges”, tax-supported institutions which provide the equivalent of the first two years of a four-year university education and which grant degrees called “associate degrees” to successful students, as well as universities, which may be four-year or six-year institutions, as for example the California State University system is, granting Bachelor’s and Master’s degrees, or “research universities” such as the University of California which also grant Ph.D. degrees.

I recall British Prime Minister Blair saying in 1997 (do I?) that the Labour government intended to push degree achievement rates up to 35% of the population, up from the 15-18% or so which it was when I graduated in 1973. I didn’t realise until I looked at the WP graphic, based on the 2009 data, that this had been achieved. I herald it as a major national accomplishment.

(I get the figure of 15-18% as follows. This 2000 report by David Greenaway and Michelle Haynes says that about 400,000 young people were in tertiary education then. If one takes the average lifetime, a little under 80 years, considers that 3 years is a twenty-fifth of that, and that the population of Britain is about 60 million, one would expect 2.4 million people of university-visiting age. 400,000 is thus one in six, about 17%. I should perhaps mention that Laura Spence, who was rejected by “Oxford” but given a scholarship at Harvard, had in fact applied to my college. Not the greatest marketing moment in history).
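The back-of-envelope estimate above can be checked in a few lines. This is only a sketch: it assumes, as the text does, that ages are spread evenly over a lifetime of about 80 years, and the function name and parameters are mine, not taken from the Greenaway and Haynes report.

```python
def tertiary_share(students, population, cohort_fraction):
    """Share of the university-age cohort that is in tertiary education.

    cohort_fraction is the assumed fraction of the whole population
    that is of university age (an illustrative simplification).
    """
    cohort = population * cohort_fraction
    return students / cohort


students = 400_000        # figure quoted from the 2000 report
population = 60_000_000   # rough population of Britain

# The text rounds 3 years out of a little under 80 to one twenty-fifth.
rounded = tertiary_share(students, population, 1 / 25)
# Without the rounding: 3 years out of 80.
unrounded = tertiary_share(students, population, 3 / 80)

print(f"{rounded:.1%} {unrounded:.1%}")
```

Either way of rounding lands inside the 15-18% range quoted earlier.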

Similarly, I had, until today, oft quoted the rate of young people in the US entering higher education as a sign of what I thought was desirable, and used the figure of 55% of school leavers. I doubt if this has changed significantly. But I am disturbed to find out that apparently only 41%, about three-quarters of those who enter, complete some sort of degree. Considering that includes associate degrees, which are only two-year courses of study, that does not bode well for the US, if you believe as I do that the more people learn skills in a short time which they otherwise would not have, the greater the productivity of their society, in the richness of hobbies and other pursuits in life and not just in stuff measured in standard economic measures.

I am intrigued by the Box on p18 of the OECD report entitled “Germany rethinks its assumptions about education and social equity”. Yes, indeed! People here were quite convinced about the “quality” of the education system, despite the obvious inequities and inadequacies apparent to those of us with wider experience, until the PISA reports on comparative achievement in secondary education started appearing from 2000 on, which showed German school achievement in a poor light compared with Germany’s economic peers. Then it couldn’t be ignored any more, and it wasn’t.

PISA was to do with secondary education. I am still somewhat disturbed by the relatively poor showing of Germany in tertiary education, at 26%. Some comments on that, some of which I have made before.

We currently have huge building projects going on around our Bielefeld University campus, which is itself huge (put “Bielefeld University, Universitätsstrasse 25, Bielefeld, Germany” into Google Maps). The main university building, in which almost everything goes on, is some third of a kilometer long, as you can see. Two new campuses are being constructed, one adjacent to the old building on a parking lot just to the north of the main building, between the two branches of what is labelled “Universitätsstrasse”, some two hundred meters long and the better part of a hundred meters wide, and one “over the road”, almost a kilometer away, in (Google Maps again) “Lange Lage, Bielefeld, Germany”, which is also large, and will house the University of Applied Sciences (what the Brits used to call a “Polytechnic” and Germans a “Fachhochschule”), a teaching university which does not grant research degrees, and which is now largely scattered in old and often unsuitable buildings around town. This all amounts to a huge public-works project (which Google Street View does not yet show). And, if the above figure is to be believed, this will only be usable by a quarter of the young adults in the city and surrounding areas.

Do we have a town-and-gown problem? Less so than we did, I think, but more so than we might. The university does some outreach, including a science fair each year called Geniale (some pictures of GENIALE 2011 – the German for “pictures” is “Bilder”), spread over selected spots in the Old Town. But why aren’t most of the young people in this area passing through some part of this enormous spreading campus to take part in something? After all, they and their parents pay the taxes that create all these large buildings and pay their occupants. Future auto mechanics and hairdressers could surely benefit personally from participating in a course on 1960′s popular music, couldn’t they? Germany has no equivalents to Brian Patten, Roger McGough, Adrian Henri or Carol Ann Duffy, but we have plenty of slam poetry (link only in German, unfortunately), indeed a local slam poet who has turned into a valued writer and raconteur, Mischa-Sarim Verollet (also only German). Here is the announcement for the next one in April 2012.

Such educational offerings are available through the Volkshochschule Bielefeld, the Community Further Education Center, but this is largely less formal – courses are not assessed, the qualifications of course-offerers fulfil no standards (either experiential or formal), one doesn’t obtain a transcript of courses completed, and, importantly, it does not constitute the kind of accomplishment which a prospective employer expects to see on an applicant’s résumé. I am thinking that all these things should happen. I am also thinking about the impoverished financing of the Volkshochschule compared with the heroic building works around the university campus.

I cannot see that expensive tertiary education can thrive unless it includes way more than the elite. We are well past the days when people said “well, that’s for them rich and clever kids” and turned their backs. Nowadays, people say “I pay taxes too; why can’t I come in here?” and I think that question is very well founded. Especially when the expenditure is so massively visible, as it is in Bielefeld.

German university education has changed, though, massively in the last decade. The previous system has been more or less junked, and every university now offers Bachelor’s and Master’s degrees, instead of the old Vordiplom/Diplom, which were not recognised outside Germany for what they were (a Vordiplom was like a US associate degree, and a Diplom like a Master’s, but with nothing in between). It is astonishing how everyone just threw the old tradition away in the early 2000′s and went with what, for most here, was a completely foreign system with which they had little or no experience. I did find out why from a colleague in Sociology, though. They had over a 90% drop-out rate in their Diplom course. And this in one of the most well-reputed Sociology faculties in the country that invented it.

I think student contact with the rest of Europe was also slowly bringing a new perspective. German university students were finding themselves relatively immobile compared with their peers in other European countries, because the organisation of their degrees did not easily translate. For example, in the late 1990′s, students studying for degrees in my faculty returning from studying abroad for a year in the ERASMUS program still had to take an oral degree examination in the studies they had completed abroad to have it count for our degree, even though they had already been assessed by the foreign institution for that work and the EU ERASMUS agreement requires that we honor that assessment. To those who came to me, I asked for the transcript, or equivalent document showing successful completion, asked them to tell me about what interested them in the work, and passed them. In other words, the exam was purely formal, and the result identical to what they had already achieved. That is the best way I could see to fulfil the EU requirement, which our internal faculty procedures at that time still contradicted.

Besides that, successful graduates (the Sociologists’ 10%; our proportion in Informatics was much, much higher!) were leaving tertiary education with a degree equivalent to a Master’s at the age of 26-28 (and some even older), whereas their British and US peers were obtaining such qualifications at the ages of 22-24. People on the ERASMUS exchange were noticing they were somewhat older than their local peers, and those starting Ph.D. programs in other countries noticed it even more.

Now, we have Bachelor’s and Master’s degrees, credit points for each course, and credit points are transferable between all European tertiary-educational establishments.

I cannot necessarily say that the quality of education has improved, however. With the more extensive evaluation requirements (per course, now), much of this is being farmed out to tutors and other helpers, and the quality of that education and assessment does not seem to be monitored as I feel it should be. I monitor the courses in my group, which are all based on lab work, or seminars which consist largely of student contributions with commentary from the lecturer, and my group has considerable continuity in our student tutors, who were picked for (or, better said, who picked themselves by) their enthusiasm and capabilities. But some of our larger courses appear to have problems (one of my bright people, who has coauthored an important chapter in our system safety text, is on his third attempt at one of the required practical courses, for what appear to me to be spurious reasons).

The throughput has, however, improved. One reason in the past was the introduction of modest fees, some few hundred euros per semester. Suddenly, all our 6-year and 7-year students (of which we had plenty) wanted to finish – and most did. And the fee money was directly given to the Faculty, in which a largely student committee, which did include the Dean, decided what to do with the money to finance improved teaching. More tutors for some courses. Lab equipment – my lab was built with this money. The faculty also hired a highly motivated and very successful lecturer whose courses are loved by students and who does lots of lab work, indeed he uses the lab which we built.

The other reason is that students in our Bachelor’s and Master’s programs are spending much of the day in courses, and most of the rest of their time doing the homework. Their time is filled with study-related work. This is very different from ten years ago. But I think it is a benefit, more on a par with what their peers do in other countries with a higher percentage of college graduates.



Michael

10 01 2012

Michael. Everyone knew him as Michael.

I was a freshman at Oxford in mathematics, interested in logic. I had been reading Chomsky in my first quarter because I had been told Chomsky had mathematised language. My tutor in algebra, Ian Macdonald (same jacket as in the picture!), an algebraic geometer, suggested I could look at a logic textbook he recommended (which I read with some difficulty over the Christmas break). Derek Goldrei, a graduate student tutoring in logic at my college Magdalen, suggested I listen to Michael’s lectures in set theory.

Michael didn’t lecture. Michael thought out loud. He distributed notes telling his listeners what he was going to be thinking about during that appointment. I learnt, by watching and listening, how to think. About set theory. About inference rules. About non-classical logic (Michael was drawn to intuitionist thinking about mathematics, because he thought it was right to base your assertions on the concrete evidence you had).

I had been attending freshman mathematics lectures, which went “Theorem” “Proof” “Let x be…” and had despaired of ever being the kind of person who thought like that. Then I attended Michael’s thinking-out-loud sessions and understood what really went on in people’s minds; how the symbols were shorthand for notating thoughts. And, in my second year, I could do it! Just like Michael! Actually, not just like Michael. Not anywhere near “just like Michael”. For, as John Mackie is reported to have said in The Times’s obituary, Michael was a genius. Michael was ineffable.

Michael was different. A mass of wavy white hair, he would array himself longitudinally on a bench in the lecture hall and clean his cigarette holder while leaning on an elbow, with his head just above the seat backs, and crack jokes about his friends and colleagues while waiting for the lecture to begin. At which point the jokes would reduce in number as he concentrated on what was being said. If there is anything any undergraduate wished to be in the course of study he had in large part created, Maths and Philosophy, it was to be “just like Michael”.

Simply put, Michael taught me how to think, in logic; by extrapolation, in mathematics. About the deep philosophical questions concerning truth, mathematics, the use of language. Differently put, I learned how to think by watching and listening to him.

When I graduated in 1973, I attended a ceremony in the Sheldonian Theatre, in Latin, much foreshortened from the original, during which my degree was conferred. A ceremony designed over centuries to give its recipients the indelible impression: you have done it! I had done it! I felt it and they’d said it in Latin! After the ceremony, I went straight across the road in my academic dress to purchase a copy of Michael’s new book, on Frege’s philosophy of language. Michael had shown how to think about these matters in pellucid English prose.

I went right afterwards to the other side of the Northern Hemisphere, to Berkeley in California. Michael had helped me get there, for he had written me a recommendation for graduate school. I have no idea what he said, but it can’t have been all disastrous. (I can imagine: “Ladkin is mortal and does OK for one. But I’m afraid I don’t really know much about mortals.”)

I was required at the end of my first year by Bill Craig, my advisor in Berkeley, he of Craig’s Interpolation Theorem, to take the qualifying exam in philosophy. I protested and threw tantrums and all that, but you know you can’t really rebel. Bill said “you will do it” so I did it. I read Michael’s book, and its seemingly impenetrable prose. And I read it again. And understood more. And again. And more. And again. It wasn’t that Michael’s prose was impenetrable. Michael wrote exactly what he was thinking and his thinking was non-trivial and exact. It took me a while to absorb his train of thought. His prose was, indeed, pellucid. When I had done so, I went into the exam room (actually the philosophy library) for six hours and wrote exactly what I thought about the matters about which I had learned from reading Michael’s book so carefully. Non-trivially and exactly. I think Ernie Adams graded the exam. I passed. Turns out I was the first student in the history of Tarski’s program to pass the philosophy exam in my first year. Thank you, Michael!

(You have to understand – I was rotten at written exams. I got so nervous I couldn’t even read the questions straight. It’s a miracle I ever got into and out of Oxford, at which assessment is based on a student’s brilliance at written exams.)

I saw Michael in Berkeley once. He gave an evening lecture which I attended. I did get to exchange a brief word, amongst all the others earnest to talk with him.

I saw him again in 2009, at the 40th anniversary reunion of Maths and Philosophy graduates in Oxford, of the course which he had done so much to establish, and to which I owe my subsequent career. Derek Goldrei was the First Graduate (he switched in his final year; graduating in 1969 when the course was established). I in 1973. I was one of only two or three from that era at the reunion and felt quite The Establishment. Michael was there, and Dana Scott. Michael was old and frail. Gave an endearing and well-constructed speech. When I approached him after the dinner, he didn’t remember who I was, but then so many had passed through the gate since I had. I simply thanked him. He accepted graciously.

Michael is gone, on 27 December 2011. For me, he was Philosophy. When he was with us, Philosophy was alive. Now he is gone, Philosophy is gone. Maybe not, but it sure feels like it. It turns out I seem to have assumed he was immortal. Apparently not. It is -let me say- hard for me to adjust.

Here is The Guardian’s take. The Times has a fine obituary, forwarded to me by Chris Miller, but it lies behind a paywall, just as now Michael does, though with a currency which I only wish I had. As an atheist without this currency, I can only say: God be with you, as you wished.

Some Coincidences.

Racism. Two of the killers of Stephen Lawrence were convicted in early January 2012. Here is a poem about it by Poet Laureate Carol Ann Duffy. Michael and his wife Ann devoted themselves to race relations in 1960′s and early 1970′s Britain, efforts well documented in the obituaries. He only returned to philosophical work after he felt the efforts to turn Britain away from racist habits had failed. But they haven’t failed, Michael, and neither had you.

Brains. Apparently some people claim now that our brains start to go downhill at age 45. It is not clear this is news: The Guardian had something about it 12 years ago. Michael published his first book at 48, and there followed many more, all of them worth reading very carefully indeed.

Note Added 11.01.2012

It’s not just philosophy. Thinking it over, there are three fundamental developments in technical elementary logic which I have kept coming back to throughout my career. Things which are simple, clear, brilliant, which increase one’s understanding almost instantly, and continually prove to be useful. One is Dana Scott’s Consequence Relations, a formulation of logics which, to me, turns out to be the most efficient way to perform formal deductions, the raw material of logic. I keep meaning to translate into LaTeX the mimeographed notes which Dana handed out almost 40 years ago now. Another is Saul Kripke’s possible-worlds semantics for normal modal logics, and his similar epistemic-worlds semantics for logics of belief and evidence, such as inference in intuitionistic mathematics, and the inferences of Pen Maddy’s “Second Philosopher”. I learnt these partly from Michael. The third is Michael’s and John Lemmon’s formal correspondence between the modal logics from S4 to S5 and the propositional logics between intuitionist and classical.

Second Note Added 11.01.2012

Timothy Williamson, Michael’s successor in the Wykeham chair of Logic (David Wiggins came between Michael and Tim), pointed me to a series of tributes in the New York Times Opinionator blog last week.



The Accident to Qantas Flight 72, VH-QPA, in October 2008

21 12 2011

The Airbus A330-303 VH-QPA experienced uncommanded nose-down pitch commands while in cruise at FL370. Lots of unsecured people were thrown to the ceiling, and some were injured severely. The aircraft declared an emergency and landed as soon as practicable, at Learmonth, where the injured were treated and several hospitalised. It has been known for a while that the accident was caused by data anomalies from an air data inertial reference unit (ADIRU) which were not filtered out by the primary flight control computers (Flight Control Primary Computers, FCPC, also known as PRIM). However, it has been a mystery – and remains so – how the anomalous data values were generated. It has happened three times: twice with the unit on VH-QPA, and once on another unit on another aircraft, also Qantas, also in Western Australia, within a couple of months of this incident.

The fix is apparently to modify the BITE test of the ADIRU specifically to look for such anomalies, and to modify the data-filtering algorithms of the Flight Control Primary Computers (FCPC, also known as PRIM) of the A330.

The Final Report is now available on the ATSB WWW site.

There was a note from Andrew Heasley in Risks 26-67 with a title saying the accident was “Blamed on Software”, pointing to a newspaper article. I find this claim misleading. The problem which arose had nothing to do with anything for which any software engineer would have been responsible.

The fixes were implemented in both SW and HW, but fixes to non-SW problems are very often implemented in SW.

The PRIMs ran a data-assurance algorithm for data received from three different ADIRUs, which are electronic boxes built by a different manufacturer. This data assurance algorithm had a specific vulnerability to spiky angle-of-attack (AoA) data presented in a particular time-sequential manner, which was exploited during the occurrence. The algorithm, which uses AoA data from three ADIRUs, filters out multiple data spikes from a unit which occur within a specific time frame. Spikes on the culprit ADIRU occurred with similar values just over the boundary of this time frame, and were thus taken as veridical by the PRIMs. The resolution algorithms for the AoA data (with that from the other ADIRU units) in the PRIMs let these values through, and the PRIMs reacted accordingly by commanding sudden nose-down pitch.
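To make the timing vulnerability concrete, here is a deliberately simplified toy model. It is not the FCPC algorithm, whose details are not given here; it only illustrates one hypothetical way in which a filter that holds a memorised value for a fixed period can let through a second spike arriving right at the end of that period. The function name, the 0.1-second tick, the threshold and the rule that the first sample after the hold is accepted unchecked are all my assumptions.

```python
from statistics import median


def used_aoa(samples, threshold=1.0, hold_ticks=12):
    """Toy AoA consistency filter with a timing gap (illustration only).

    samples: iterable of (aoa1, aoa2, aoa3) tuples, one per tick.
    Returns the AoA value the toy controller would use at each tick.
    Assumes hold_ticks >= 2.
    """
    memorised = None       # last value judged good
    hold_left = 0          # ticks remaining in the current hold period
    just_released = False  # True on the first tick after a hold ends
    used = []
    for aoa1, aoa2, aoa3 in samples:
        average = (aoa1 + aoa2) / 2.0
        if hold_left > 0:
            # During the hold, incoming data are ignored entirely.
            hold_left -= 1
            just_released = hold_left == 0
            used.append(memorised)
            continue
        mid = median((aoa1, aoa2, aoa3))
        deviates = abs(aoa1 - mid) > threshold or abs(aoa2 - mid) > threshold
        if deviates and not just_released and memorised is not None:
            # Spike detected: fall back to the memorised value and hold.
            hold_left = hold_ticks - 1
            used.append(memorised)
        else:
            # Assumed flaw: right after a hold the sample is not re-checked.
            used.append(average)
            memorised = average
        just_released = False
    return used


def flight(spike_ticks, n=40, normal=2.0, spike=50.0):
    """Build n ticks of data with spikes on channel 1 at the given ticks."""
    return [(spike if t in spike_ticks else normal, normal, normal)
            for t in range(n)]


close_spikes = used_aoa(flight({10, 15}))   # second spike inside the hold
spaced_spikes = used_aoa(flight({10, 22}))  # second spike 12 ticks (1.2 s) later

print(max(close_spikes))   # 2.0: both spikes filtered
print(spaced_spikes[22])   # 26.0: the second spike reaches the output
```

With ticks of 0.1 s, the second spike in the last case arrives 1.2 s after the first, the spacing named in the ATSB summary quoted at the end of this note.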

Responsibility for the design of such algorithms lies clearly with those who are experts on the engineering of electronic data generation and transmission equipment, not with any software engineers.

To give a similar example with which I have been recently involved, it turns out that signals of certain frequencies in AC electric circuits can bypass the Type A and Type B circuit protection equipment (circuit breakers) that are required in most electric circuits (household and industrial) in Germany. A committee on which I sit has recently considered attaching equipment which is, as far as we know, theoretically capable of generating such frequencies to such circuits. A similar situation, how to handle anomalous signals, but no SW in sight. Pure electrical engineering.

Concerning my earlier note here on Certification Requirements for Commercial Airplanes, I find it interesting and commendable that the Bureau considered likelihoods of events in their summary (quoted below). However, I don’t believe they formulated it in quite the words I would have liked to have read.

They give reason to classify the event as “hazardous”, and with a fleet operating experience of 28 million flight hours this occurrence fits within the expected value (a technical term) of the operating time within which the effects of a hazardous event may occur (defined to be less than or equal to one occurrence within ten million operating hours), according to the acceptable means to determine compliance with certification criteria (now known as AMC 25). Notice it is not the event itself of which they assess the occurrence – that has occurred three times – but the deleterious effects upon safety of the event, which have only occurred once.

They speak of “certification requirements”. Strictly speaking, this is incorrect. The certification requirements are expressed in CS 25 and do not involve probabilities. The severity classification terms “catastrophic”, “hazardous” etc. and their associated acceptable/unacceptable frequencies occur in risk-matrix-type form in the Acceptable Means of Compliance document which accompanies the certification requirements (AMC 25), not the requirements themselves. (I note that these documents were called something slightly different at A330 certification time, 1993).

The certification requirements themselves are quite clear: the airplane shall behave in such-and-such a manner. If a wing falls off, or a flight control computer sends it into a loop, it is obviously not behaving in that manner; thus violating certification requirements. However, it is accepted that one cannot provide proof that such untoward things will never ever happen (will the sun rise tomorrow? Will your steering wheel come off in your hands? Will your control sidestick come out of its holder in your hand?), so a less strenuous regime based on arguing likelihoods is defined as an “Acceptable Means of Compliance” with the regulations for purpose of certification.

This is not hair-splitting. It has consequences, in particular in this case, for how anomalies are dealt with, as follows.

If the requirement were that, say, “hazardous effects shall only occur on average once in between 10^7 and 10^9 operating hours”, which is what the AMC says you have to show to demonstrate compliance acceptably, then it would have been open to the manufacturer to do nothing in reaction to the QF72 event: the hazardous effects occurred only within the expected time value of their occurrence. If you think about it, it would also be open to a manufacturer to do nothing until the second occurrence of any hazardous or indeed catastrophic effects, even if the problem occurred first within the early experience of flying the aircraft! This is simply a consequence of the meaning of the probabilistic concepts used.

Whereas, as things now stand, separating requirements, which are absolute, from acceptable compliance (which may be based on occurrence frequency), any in-flight anomalous behavior must be fixed or the airworthiness certificate will be withdrawn. This is because such behavior violates the written requirements, that the aircraft shall not behave that way. To repeat, the conditions on behavior are absolute, not likelihood-based.

And that is how one wants things: The requirements are absolute, but it is accepted that in science and engineering you are often only convinced to some degree, so it is regarded as acceptable to argue your conviction up to a certain degree, and not to have to prove it, which would likely be impossible. But if something does go wrong, you want it fixed right away.

One can argue that any given set of occurrences is compatible with any probability requirement whatever, and thus that probabilistic requirements are inappropriate to determine airworthiness in any case. However, I don’t think such an argument works. Say these three events had occurred within 3 million operating hours, each with damage. One could estimate the likelihood that a piece of equipment fulfilling the condition of an expected value of at most once in 10 million operating hours would exhibit three events within 3 million operating hours. One would conclude that it is unlikely, say with small probability P. It follows that the situation that the aircraft fulfills the acceptable-compliance criterion has the same probability P. The small probability P that the aircraft acceptably complied with certification requirements would provide good reason for withdrawing the airworthiness certificate.
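As a rough illustration of how small P might be, one can model the events as a Poisson process running at exactly the limiting rate of one event per 10 million operating hours. The Poisson model and the function name are my assumptions for this sketch; nothing in the AMC prescribes this particular calculation.

```python
import math


def prob_at_least(k, rate_per_hour, hours):
    """P(at least k events) for a Poisson process with the given rate."""
    lam = rate_per_hour * hours  # expected number of events
    # Probability of seeing fewer than k events: sum of the first k terms.
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                for i in range(k))
    return 1.0 - below


# Three or more events in 3 million hours at one per 10 million hours.
p = prob_at_least(3, 1e-7, 3e6)
print(f"{p:.4f}")
```

Under these assumptions P comes out at roughly 0.0036, that is, a little under four chances in a thousand.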

Concerning the data anomaly itself stemming from the ADIRU, its cause remains a mystery. The report says:


Some of the potential triggering events examined by the investigation included a software ‘bug’, software corruption, a hardware fault, physical environment factors (such as temperature or vibration), and electromagnetic interference (EMI) from other aircraft systems, other on-board sources, or external sources (such as a naval communication station located near Learmonth). Each of these possibilities was found to be unlikely based on multiple sources of evidence. The other potential triggering event was a single event effect (SEE) resulting from a high-energy atmospheric particle striking one of the integrated circuits within the CPU module. There was insufficient evidence available to determine if an SEE was involved, but the investigation identified SEE as an ongoing risk for airborne equipment.

The report says that the manufacturer is developing a modification to the BITE to detect such failure modes:


Without knowing the exact failure mechanism, there was limited potential for the ADIRU manufacturer to redesign units to prevent the failure mode. However, it will develop a modification to the BITE to improve the probability of detecting the failure mode if it occurs on another unit.

Here is the executive summary. It is well and concisely written. I include the three paragraphs about seat belts and the investigative process for completeness.

Executive Summary

At 0132 Universal Time Coordinated (0932 local time) on 7 October 2008, an Airbus A330-303 aircraft, registered VH-QPA and operated as Qantas flight 72, departed Singapore on a scheduled passenger transport service to Perth, Western Australia. At 0440:26, while the aircraft was in cruise at 37,000 ft, ADIRU 1 started providing intermittent, incorrect values (spikes) on all flight parameters to other aircraft systems. Soon after, the autopilot disconnected and the crew started receiving numerous warning and caution messages (most of them spurious). The other two ADIRUs performed normally during the flight.

At 0442:27, the aircraft suddenly pitched nose down. The FCPCs commanded the pitch-down in response to AOA data spikes from ADIRU 1. Although the pitch-down command lasted less than 2 seconds, the resulting forces were sufficient for almost all the unrestrained occupants to be thrown to the aircraft’s ceiling. At least 110 of the 303 passengers and nine of the 12 crew members were injured; 12 of the occupants were seriously injured and another 39 received hospital medical treatment. The FCPCs commanded a second, less severe pitch-down at 0445:08.

The flight crew’s responses to the emergency were timely and appropriate. Due to the serious injuries and their assessment that there was potential for further pitch-downs, the crew diverted the flight to Learmonth, Western Australia and declared a MAYDAY to air traffic control. The aircraft landed as soon as operationally practicable at 0532, and medical assistance was provided to the injured occupants soon after.

FCPC design limitation

AOA is a critically important flight parameter, and full-authority flight control systems such as those equipping A330/A340 aircraft require accurate AOA data to function properly. The aircraft was fitted with three ADIRUs to provide redundancy and enable fault tolerance, and the FCPCs used the three independent AOA values to check their consistency. In the usual case, when all three AOA values were valid and consistent, the average value of AOA 1 and AOA 2 was used by the FCPCs for their computations. If either AOA 1 or AOA 2 significantly deviated from the other two values, the FCPCs used a memorised value for 1.2 seconds. The FCPC algorithm was very effective, but it could not correctly manage a scenario where there were multiple spikes in either AOA 1 or AOA 2 that were 1.2 seconds apart.

Although there were many injuries on the 7 October 2008 flight, it is very unlikely that the FCPC design limitation could have been associated with a more adverse outcome. Accordingly, the occurrence fitted the classification of a ‘hazardous’ effect rather than a ‘catastrophic’ effect as described by the relevant certification requirements. As the occurrence was the only known case of the design limitation affecting an aircraft’s flightpath in over 28 million flight hours on A330/A340 aircraft, the limitation was within the acceptable probability range defined in the certification requirements for a hazardous effect.

As with other safety-critical systems, the development of the A330/A340 flight control system during 1991 and 1992 had many elements to minimise the risk of a design error. These included peer reviews, a system safety assessment (SSA), and testing and simulations to verify and validate the system requirements. None of these activities identified the design limitation in the FCPC’s AOA algorithm.

The ADIRU failure mode had not been previously encountered, or identified by the ADIRU manufacturer in its safety analysis activities. Overall, the design, verification and validation processes used by the aircraft manufacturer did not fully consider the potential effects of frequent spikes in data from an ADIRU.

ADIRU data-spike failure mode

The data-spike failure mode on the LTN-101 model ADIRU involved intermittent spikes (incorrect values) on air data parameters such as airspeed and AOA being sent to other systems as valid data without a relevant fault message being displayed to the crew. The inertial reference parameters (such as pitch attitude) contained more systematic errors as well as data spikes, and the ADIRU generated a fault message and flagged the output data as invalid. Once the failure mode started, the ADIRU’s abnormal behaviour continued until the unit was shut down. After its power was cycled (turned OFF and ON), the unit performed normally.

There were three known occurrences of the data-spike failure mode. In addition to the 7 October 2008 occurrence, there was an occurrence on 12 September 2006 involving the same ADIRU (serial number 4167) and the same aircraft. The other occurrence on 27 December 2008 involved another of the same operator’s A330 aircraft (VH-QPG) but a different ADIRU (serial number 4122). However, no factors related to the operator’s aircraft configuration, operating practices or maintenance practices were found to be associated with the failure mode.

Many of the data spikes were generated when the ADIRU’s central processor unit (CPU) module intermittently combined the data value from one parameter with the label for another parameter. The exact mechanism that produced this problem could not be determined. However, the failure mode was probably initiated by a single, rare type of trigger event combined with a marginal susceptibility to that type of event within the CPU module’s hardware. The key components of the two affected units were very similar, and overall it was considered likely that only a small number of units exhibited a similar susceptibility.

Some of the potential triggering events examined by the investigation included a software ‘bug’, software corruption, a hardware fault, physical environment factors (such as temperature or vibration), and electromagnetic interference (EMI) from other aircraft systems, other on-board sources, or external sources (such as a naval communication station located near Learmonth). Each of these possibilities was found to be unlikely based on multiple sources of evidence. The other potential triggering event was a single event effect (SEE) resulting from a high-energy atmospheric particle striking one of the integrated circuits within the CPU module. There was insufficient evidence available to determine if an SEE was involved, but the investigation identified SEE as an ongoing risk for airborne equipment.

The LTN-101 had built-in test equipment (BITE) to detect almost all potential problems that could occur with the ADIRU, including potential failure modes identified by the aircraft manufacturer. However, none of the BITE tests were designed to detect the type of problem that occurred with the air data parameters.

The failure mode has only been observed three times in over 128 million hours of unit operation, and the unit met the aircraft manufacturer’s specifications for reliability and undetected failure rates. Without knowing the exact failure mechanism, there was limited potential for the ADIRU manufacturer to redesign units to prevent the failure mode. However, it will develop a modification to the BITE to improve the probability of detecting the failure mode if it occurs on another unit.

Use of seat belts

At least 60 of the aircraft’s passengers were seated without their seat belts fastened at the time of the first pitch-down. Consistent with previous in-flight upset accidents, the injury rate, and injury severity, was substantially greater for those who were not seated or seated without their seat belts fastened.

Passengers are routinely reminded every flight to keep their seat belts fastened during flight whenever they are seated, but it appears some passengers routinely do not follow this advice. This investigation provided some insights into the types of passengers who may be more likely not to wear seat belts, but it also identified that there has been very little research conducted into this topic by the aviation industry.

Investigation process

The Australian Transport Safety Bureau investigation covered a range of complex issues, including some that had rarely been considered in depth by previous aviation investigations. To do this, the investigation required the expertise and cooperation of several external organisations, including the French Bureau d’Enquêtes et d’Analyses pour la sécurité de l’aviation civile, US National Transportation Safety Board, the aircraft and FCPC manufacturer (Airbus), the ADIRU manufacturer (Northrop Grumman Corporation), and the operator.



Dealing With Nuclear Waste

2 12 2011

The Independent reports today on a written statement by UK Energy Minister Hendry to Parliament on what the Government is deciding to do with its radioactive waste from nuclear power generation.

The British government has decided on a project to convert plutonium waste into MOX fuel, maybe for “a new generation of nuclear power plants”.

The decision, which ends decades of uncertainty on how to deal with a growing stockpile of more than 112 tonnes of plutonium waste, was presented as a written Parliamentary statement by the energy minister, Charles Hendry.

Indeed for half a century Britain, like many other countries with nuclear power plants, has not known what to do with nuclear power’s most toxic waste product.

Nuclear power relies on highly radioactive “fuel”, formed usually in the shape of rods, which engage in a chain reaction in the core of a nuclear reactor and produce heat. The chain reaction converts substances eventually into other substances which are no longer suitable for purpose; the fuel is “spent” and must be replaced. But the “spent fuel” remains highly radioactive. It is very toxic, must be carefully shielded from the environment and people, and this must go on with current spent fuel for (the most optimistic minimum estimate) 10,000 years (the level at which radioactivity has reduced to that of the originally-mined uranium and the original basis for US standards).

What do you do with it? Where do you put it?

It is not clear that anyone has come close to solving this problem. Nuclear power has been around for half a century, this waste has been accumulating, and the nation with the most plants, the US, has no solution. There are and have been many proposals, but so far none has turned out to be workable. Most of the spent fuel is still stored on-site in pools filled with water (water is pretty good at stopping the neutrons which are the main product of radioactivity in nuclear fuel rods; you only need a few meters of it to trap all but a few which get lost in the background). No one thinks that is a solution for more than a few decades, let alone a minimum of 10,000 years. There is a movement to store as much as possible in so-called “dry casks”: sealed physical containment vessels which are self-cooling after the spent fuel has been sitting around for some number of years. But you still have to put the casks somewhere where they will be safe for a minimum of 10,000 years. Yucca Mountain in Nevada was for many years the preferred prospective location. One wonders, however, about the stability of any structure in a seismically active area of recent volcanism. Eight volcanoes have erupted within 50km of the site in the last million years (op. cit.), but maybe it’s OK for 10,000 years? That is the main point: nobody really knows. No one with a decent set of choices could reasonably choose a place in a seismically and volcanically active area. That says, correctly in my view, that there is no decent set of choices. That is the way it has been for half a century.

It is a problem in Germany also. Germany processes spent fuel in France (and soon in GB) and transports the processed product in dry casks (called “Castor”) by rail back into Germany. The transport has been regularly plagued by protests which block the rail lines, and a transport typically takes days to weeks. Protesters used to aim for Germany’s withdrawal from nuclear power. Now that the German Government has committed to that, what is the latest protest (ongoing at time of writing) about? The protesters are apparently not content with the “temporary” storage site at Gorleben in Lower Saxony (it is in an underground salt deposit, which they claim with some reason is geologically unstable over the long term) and apparently want it to be stored at a reactor site at Philippsburg, near Karlsruhe. That is unlikely to be long term (in the sense of 10,000 years) either, since most authorities judge that any long-term site must be underground, in geologically stable ground. The storage issue has not been solved in Germany, either.

What about Britain? The Independent speaks of

……..decades of uncertainty on how to deal with a growing stockpile of more than 112 tonnes of plutonium waste, was presented as a written Parliamentary statement by the energy minister, Charles Hendry.
Plutonium waste has been a headache for successive governments because it is a highly dangerous radioactive material that can be converted into weapons-grade material, making it a security risk. It’s also expensive to store.

So Britain doesn’t have a long-term solution either. Who does? (Maybe France or Japan?) What to do with the waste is a major unsolved issue with nuclear power.

According to the Independent, the “uncertainty” has gone. It’s going to be converted into “mixed oxide” (MOX) fuel. Fuel? Yes, for reactors which have not yet been built. So you solve the waste problem by building new reactors – which, um, then don’t create waste? Of course they do. You are thus using the present waste in a process which will ultimately generate even more waste, as well of course as some electricity. So, problem solved? Obviously not.

Suppose one just wants to store MOX fuel, not use it. Is it, say, less toxic than spent fuel? No. Can it be stored more easily? Not as far as I know. Can it be used somehow? Yes, in those new nuclear power plants; we’ve just been down that route.

Does this solve the nuclear-waste-product problem in any reasonable way? No. Since the UK government is full of clever people who can think at least this far, it could be that there is another explanation for this decision.

One thought. Somebody will be paid £3bn for doing it, if it happens. Money goes somewhere, and I imagine the prospective recipients might be rather keen on their share. The new waste generated by the new reactors that use the MOX fuel that came from the old waste is, well, a problem for someone who comes along later. Science will solve everything, won’t it?

But it’s not going to be clear sailing. The Independent continues:

But although Mr Hendry made it clear that the Government sees the “Mox option” as a priority, it is not certain that a new £3bn plant to convert the plutonium into Mox fuel will ever be built.

Mindful of the financial and technological disaster of the current Mox fuel plant at Sellafield in Cumbria, which has cost £1.34bn and produced a tiny fraction of the fuel it was scheduled to make, Mr Hendry said that a clear case has still to be made for a second Mox plant at Sellafield.

Oh. So the first, smaller attempt to do this kind of thing failed?

Well, let me qualify that. £1.34bn went somewhere, somebody got it for doing something, so that all went OK. But it apparently didn’t go into the ostensible goal of processing X amount of plutonium into MOX.

And on the basis of that experience apparently the best option is to try again, more and bigger?

I am sure the mistakes made in building the first reprocessing plant will all have been cataloged. I am also sure that attempts will be made assiduously to avoid them when building the second, bigger plant. I have also studied troubled large projects, indeed giving evidence before a UK Parliamentary committee on one. Many big projects fail to deliver on the goals at the time of commencement. Indeed, it’s a first for me to see someone suggest a larger second project on the back of a failed, smaller first one. Surely it should be received wisdom by now that any serious, careful estimate of the cost of such a second, bigger plant be accompanied with an equally serious, careful estimate of the likelihood of success or failure?

Given that this plan for apparently “dealing with” nuclear waste leaves all the questions open about how one ultimately deals with the waste, could something else be going on? What could it be?

First, contractors earn money for building the plant, whether it works or not, so they would be happy. Second, a current government can be seen to be “doing something” about the problem, no matter how superficial. Third, by processing and reusing fuel, the issue of what finally to do about the nuclear waste is put off into the future. (That strategy has clearly worked for governments in the past!)

Let us, though, be clear what the situation is. There is a real scientific and social problem of what on earth one can do with the highly toxic waste products of fission reactors. One cannot expect the current UK government, indeed any government at all, to implement a true solution when none is known yet to exist.

So maybe the Independent is being inappropriately forthright when it claims that uncertainty is at an end. Here is what Energy Minister Hendry actually wrote, as reported by the Independent:

“Only when the Government is confident that its preferred option could be implemented safely and securely, that is affordable, deliverable, and offers value for money, will it be in a position to proceed with a new Mox plant,” Mr Hendry said. In its response to a public consultation on Britain’s plutonium problem, the Government has not rejected other options. One is to convert the 112 tonnes of plutonium dioxide powder stored at Sellafield into glass or concrete blocks that could be buried permanently in a deep waste repository. Another is to use the plutonium directly as fuel for fast reactors, if these can be developed commercially in the coming decade.

“While converting the plutonium into Mox is the most credible and technologically mature option, the Government remains open to any alternative proposals for plutonium management that offer better value to the taxpayer, and will seek to gather more details on all options,” Mr Hendry said.

That seems less than certain to me. According to this, the UK government has set priorities on the “viable” options. It has not actually decided to do anything.

So am I (and the Independent) making a lot of fuss about not very much? Here’s a thought. We all agree that something does indeed need to be done about nuclear waste. Suppose somebody “does something”, what is it going to be? Well, it’s going to be starting to implement this “plan”, since, as the priority option, it is obviously the thing to pick if anything is to be done.

But options remain open. In case a detractor says “why on earth are you doing this? It makes no sense”, the Energy Minister can reply “only when we are confident, etc, etc, the Government remains open to any alternative proposals, etc.”

And when a sufficient amount of money has been spent, someone can say “oh look, we’ve got half a MOX plant! Well, better get on and finish it, then! Don’t like to waste money…”

Maybe it’s just the time of year. I haven’t hung my Christmas lights either. Or maybe the UK government has been reading its seasonal literature and the nuclear contractors hired a lobbyist name of Bob Cratchit.



Assurance of Cyber-Physical Systems

17 11 2011

I attended Seminar 11441 on Science and Engineering of Cyber-Physical Systems at the Leibniz Centre for Informatics at Schloss Dagstuhl in the Saarland on 1-4 November, 2011. It was organised by Holger Giese, Bernhard Rumpe, Bernhard Schätz and Janos Sztipanovits. There is huge interest in cyber-physical systems in the US at the moment, backed by plenty of research resources, and in Germany also, although on a lesser scale, somewhat more industrially-oriented and mostly concentrated in the South, it appears.

I attached myself to the subgroup concerned with the assurance and certification of such systems.

We all seemed to have a whale of a time figuring out what a cyber-physical system (CPS) is. Tom Maibaum and others wondered how they might differ from embedded systems. People said, well, it is important that there are lots of subsystems interacting more loosely than with a hierarchically-developed complex embedded system. So John Fitzgerald wondered whether they were mostly systems of systems. (Actually, the “so” is causally misplaced. John, being an “F”, had his one-minute say before Tom, being an “M”). Social systems of mostly artificial agents, of which many examples were given, seemed to fit the “cyber-physical” conception, so CPS includes at least those. Platooning road and rail vehicles, swarms of robotic aircraft or ground robots, coordinated flying or other motion, coordinated searching tasks, and so on. There are enough examples to point and say “that’s what we mean!”.

I also learnt, once again (strange how short one’s memory can be!) to avoid uttering the phrase “emergent behavior”, at the risk of inciting a riot, or at least the closest one can come to a riot at a Dagstuhl seminar.

So what about assurance of such systems? Sadly, as I was on my way back, having had a beautiful bike ride back over the Hunsrück to Trier and caught the train, there occurred a horrendous road accident in Britain on the M5. You can read commentary about it on the York safety-critical systems mailing list. Go to The 2011 collection, sort by date, read the contributions on Sunday 6 November through Tuesday 8 November including “M5 Road Accident” in the title, or go to Paul Cleary’s initiating query and follow the thread(s) through (there are two slightly different titles, but the thread-following links persist through). I also had some private correspondence with Gérard Le Lann, who now works on road-vehicle platooning algorithms and associated questions.

As a result of the Dagstuhl discussions, and the e-mail discussions of the accident, I was able more concretely to formulate what I think is a new assurance problem which arises with (this conception of) cyber-physical systems. It is a little too long for a blog post, so I wrote it in a note called The Assurance of Cyber-Physical Systems: Auffahr Accidents and Rational Cognitive Model Checking and put it on the RVS WWW site Publications page.



The Definition of Risk – Yet Again

16 11 2011

In a message to the York Safety-Critical Systems Mailing List, Tracy White recounted a discussion with someone from the field of “Risk Management” who was taking a course he was giving on system safety. There is apparently a series of international standards, designated ISO 31000, on “Risk Management” (so says Wikipedia). Tracy says

The term ‘risk’ in 31000 is described as the ‘effect of uncertainty on objectives’ where one of the ‘effects’ can be ‘a deviation from the expected’ (4360 describes it more succinctly as: ‘a chance of something happening’). These ‘risk’ definitions differ markedly from…..

…the standard definition which has been around for 300 years and 10 months: Abraham de Moivre, De Mensura Sortis, or On the Measurement of Chance, Phil. Trans. Roy. Soc No. 329, January, February, March 1711, reprinted with a commentary by O. Hald in International Statistical Review 52(3):229-262, 1984, which may be retrieved from JSTOR. The definition given there is, in modern terms, that risk is the expected value of loss. “Expected value” is a technical term from probability. I give the word-for-word de Moivre definition below.

This definition is also that used for “risk” in finance. See Peter L. Bernstein, Against the Gods: The Remarkable Story of Risk, John Wiley & Sons, 1996/1998. Which book, as the publisher proudly proclaims on the cover, was a “Business Week, New York Times Business, and USA Today Bestseller” and includes praise from reviews by Galbraith, Heilbroner, the NYT, the WSJ and The Economist on its cover. (Indeed, Bernstein is where I got my original lead to de Moivre).

The meaning of the term in system safety is always close to that of de Moivre, but usually avoids the explicit arithmetic of finance, expected value of loss, by saying “combination of” likelihood and severity. There are good reasons for being somewhat vague, namely that in many cases in system safety the numbers are not there to enable a calculation of expected value. Especially, for example, in a completely new type of system. (An example I am currently working on is the recharging systems for electric road vehicles. There aren’t many around, so in particular there are no reliable numbers on frequencies of untoward things happening.) In response to this common situation, engineers have developed “qualitative” and “semi-quantitative” methods for assessing risk.

One of the issues then becomes what you take the word to mean in technical contexts. Any definition which is not equivalent to the expected value of loss defines a different concept from that, but the same word, “risk”, is used. For good reason: most definitions are conceptually related and the main issue is to get “close” while not having all the numbers.

So what do you do when some branch of human activity, indeed apparently some standard, takes the same word, “risk”, and uses it to mean something different? (I don’t actually know what “effect of uncertainty on objectives” is supposed to mean. I don’t see how “objectives” can be affected by uncertainty. I can see how your chances of attaining them are.)

Well, maybe you cite de Moivre, the finance industry, and system safety use, and say to your correspondent “you mean something different. I think that is unhelpful; and indeed our notion has historical precedence, so for the purposes of this conversation let’s use a different word for your new notion.” Or he or she could say the same to you. In any case, you agree to use two different words.

And for good measure, you write a blog post about it, as here.

This is not a new issue. Here’s a story from six and a half years ago. In the May/June 2005 issue of IEEE Software, Richard Fairley proposed a definition of risk for the Software Engineering Glossary of the IEEE (which is supposed to be canonical, although it turns out that Prof. Fairley doesn’t think so):

(Richard Fairley, proposed IEEE Software Engineering Glossary): The probability of incurring a loss or enduring a negative impact.

So a risk is to be a probability, which means all risks have values between 0 and 1. Tell that to Lehman Brothers. Well, I guess you can’t any more. Try Bear Stearns and Morgan Stanley. But we’re talking software, not money.

In common use, someone talking to his teenager speaking of “the risk of your not catching the bus in time” is likely talking about the chances of that event. Someone talking of “the risk that Lehman Brothers will go under” is likely also meaning the chances. But someone talking of “the risk of Lehman Brothers going under” is likely also thinking of the repercussions as well as just the chances. So much meaning can a relative pronoun versus a copula+gerund carry! As with any other term you wish to be a technical term, you need to decide which meaning (of, here, two) you are going to use. And stick with it. What should be clear is that software engineers working in safety-critical systems need to speak both of likelihoods or chances, and about expected levels of loss. It seems obvious to use “chance” or “likelihood” or “probability” for the former, and some other word for the latter. Since it has been called “risk” for 300 years, why not carry on doing so? And so it is. But some people choose differently. If one is then going to use “risk” to mean “likelihood”, what word does one choose to mean the combination of likelihood and severity? There is not an obvious candidate. But you do need a word for it.

I wrote to the author, Prof. Fairley, Richard Thayer, the person overall responsible for the SW Glossary, and Merlin Dorfman, I believe the IEEE editor responsible for the section, pointing out de Moivre’s definition, the definition from Nancy Leveson’s book Safeware (Addison-Wesley, 1995), and that from the standard for functional safety of E/E/PE systems, IEC 61508, which all cohere modulo the caveats above.

Here is de Moivre:

The Risk of losing any sum is the reverse of Expectation, and the true measure of it is, the product of the Sum adventured multiplied by the Probability of the Loss

Here is Nancy Leveson:

the hazard level combined with (1) the likelihood of the hazard leading to an accident… and (2) hazard exposure or duration…

[The notion of hazard level is] the combination of severity and likelihood of occurrence.

Here is IEC 61508:

combination of the probability of the occurrence of harm and the severity of that harm
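
De Moivre’s definition above is a simple product, and a few lines of code show why it is not the same thing as a probability. This is a minimal sketch: the function name and the sums of money are invented for illustration. Two ventures with the same probability of loss carry very different de Moivre risk, whereas a probability-only definition would rate them as equal.

```python
def de_moivre_risk(sum_adventured: float, p_loss: float) -> float:
    """Risk as the sum adventured multiplied by the probability of the loss."""
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be a probability between 0 and 1")
    return sum_adventured * p_loss


# Hypothetical ventures: same probability of loss, very different stakes.
small_stake = de_moivre_risk(100.0, 0.25)
large_stake = de_moivre_risk(1_000_000.0, 0.25)

print(small_stake, large_stake)
```

Note that the result is in units of the sum adventured (money, here), so it is not confined to the interval from 0 to 1.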

I also copied my note to Fairley in this note to the York Safety-Critical Mailing List.

Dorfman agreed that the definition could be misunderstood, but that “I believe the reader is given a fair, complete, and accurate picture of the use of terminology in this area.”. “Accurate”?

What do you do if you are a software engineer working in safety-critical systems? Use the IEEE SE Glossary definition, or use the IEC 61508 definition? Use different definitions for different meetings, depending on who is there? And what happens if you misjudge your audience?

Thayer was dismissive. The entire content of his reply:

The overall title of the glossary is Software Engineering Glossary.  This covers it I believe. 

In other words, he doesn’t care much for the dilemma of the software engineer working in safety-critical systems. One could well wonder why he is editing this vocabulary if he doesn’t care about such issues.

I responded to Thayer and Dorfman:

The use in finance and in PRA of the notion of risk equates it to the expected value of loss. A partial list of standards that use some version of this notion is

* IEC 61508, the international standard on functional safety of E/E/PE
safety-related systems
* IEC 300, the international standard on dependability management, in
Part 3, Section 9, “Risk analysis of technological systems”
* IEEE 1228, the standard for software safety plans
* the American Institute of Chemical Engineers guidelines for safe
automation of chemical processes
* US DoD MIL STD 882C, System Safety Program Requirements
* USAF Systems Command, Software Risk Abatement
* CENELEC 50129, Railway applications: Safety related electronic systems
for signalling (the European norm for railways; derivative from IEC
61508)
* European Space Agency Glossary of Terms
* UK Ministry of Defence Standards 00-56, safety
management requirements for defence systems; and Def Stan 00-58,
HAZOP studies on systems containing programmable electronics
* German Standards Institute (DIN), DIN-V-VDE 0801, Principles for
computers in safety-related systems

In particular, I expressed my concern that the IEEE as an organisation had publicly given two meanings for risk pertaining to software engineering: one in IEEE 1228 on software safety plans, and another in the Glossary proposed by Prof. Fairley. I got no response.

Prof. Fairley responded, inter alia:

Concerning my definition of risk:  In most, if not all, situations encountered in software engineering, “risk” is the composite result of numerous factors.  In the glossary, I characterize these as “risk factors,” each of which is assigned a probability and an impact (or a range of each).  Risk factors are usually interrelated (e.g., an inaccurate size estimate affects schedule, budget, memory usage; an inaccurate schedule estimate affects product quality) so overall risk (i.e., probability of suffering loss) must be calculated using conditional probabilities or Bayesian analysis.  It is not possible to characterize a situation by a simplistic pair of numbers, unless one is dealing with a narrow, well-defined situation such as a game of chance.  It is dangerous and misleading to attempt to characterize a complex situation in this way.

Given the constraints of a glossary, it was not possible to explain the rationale for my definition or why it differs from the traditional definition; nor was it possible to explain the basis of definition for the other terms in the glossary.

Which to my mind is confused. If risk is “the composite result of a number of factors” each of which is “assigned a probability and an impact”, why ignore the impact and define it as a probability? Either it is a probability simpliciter, or it is the composition of a number of items, each of which exhibits a probability and an “impact”. It can’t be both.

That was it. End of story. The section editor thinks the definition is “accurate”; the Glossary editor is unconcerned; the author is confused. No one seems to worry about the IEEE proposing two incompatible definitions of risk in software contexts.

I wrote to some colleagues I thought might be interested: Dave Parnas, John Knight and Bev Littlewood (as well as a couple of German colleagues), explaining my dissatisfaction with this state of affairs.

Dave sympathised with my frustration, which was similar to his. He said he had seen lots of examples, and that he considered trying to write a glossary for SW terms a fool’s errand, and explained why. John thought this situation to be serious, the Fairley definition of risk wrong, and deserving of public correction. He also said that many people are concerned about a lack of precision and took Dave’s comments to reflect that. Bev strongly agreed with both John and Dave. He was particularly concerned about the dismissive response.

Continuing along the same lines, here is the definition of risk from the US National Research Council study Understanding Risk: Informing Decisions in a Democratic Society (National Academies Press, 1996), p215 (you can read this study on-line):


A concept used to give meaning to things, forces or circumstances that pose danger to people or to what they value. Descriptions of risk are typically stated in terms of the likelihood of harm or loss from a hazard and usually include: an identification of what is “at risk” and may be harmed or lost (e.g., health of human beings or of an ecosystem, personal property, quality of life, ability to carry on an economic activity); the hazard that may occasion this loss; and a judgement about the likelihood that harm will occur.

So descriptions include a likelihood of harm and an identification of what may be harmed or lost. Unless you are a software engineer using the IEEE Glossary (but not IEEE 1228), in which case it’s just a number between 0 and 1.

Here is the definition from a standard text, Probabilistic Risk Assessment and Management for Engineers and Scientists, Hiromitsu Kumamoto and Ernest J. Henley, IEEE Press (them again!) 1996, a book “sponsored by the IEEE Reliability Society”, p2:

Primary Definition of Risk: A weather forecast such as “30% chance of rain tomorrow” gives two outcomes together with their likelihoods: (30%, rain) and (70%, no rain). Risk is defined as a collection of such pairs of likelihoods and outcomes:

{(30%,rain), (70%, no rain)}

So they don’t even go for the combination of likelihood and outcome, nor do they designate certain outcomes as harmful. But if you do designate certain outcomes as harmful, then you can combine these values to calculate de Moivre risk and system-safety risk from this set.
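As a rough illustration of that last remark, here is a minimal Python sketch. It assumes that de Moivre risk means expected loss (likelihood times severity, summed over harmful outcomes) and that system-safety risk keeps likelihood and severity side by side for each harmful outcome. The severity figure and the function names are hypothetical and are not taken from Kumamoto and Henley.

```python
# Kumamoto-Henley style risk profile: a set of (likelihood, outcome) pairs.
profile = [(0.30, "rain"), (0.70, "no rain")]

# Hypothetical designation of harmful outcomes with a severity each
# (say, the cost of a rained-off event). This is an assumption for illustration.
severity = {"rain": 50.0}

def de_moivre_risk(profile, severity):
    """Expected loss: likelihood x severity, summed over harmful outcomes."""
    return sum(p * severity[o] for p, o in profile if o in severity)

def system_safety_risk(profile, severity):
    """Likelihood and severity kept as a pair for each harmful outcome."""
    return [(o, p, severity[o]) for p, o in profile if o in severity]

print(de_moivre_risk(profile, severity))
print(system_safety_risk(profile, severity))
```

Both quantities are derived from the same set of pairs; the set itself carries more information than either of them.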

The standard textbook Probabilistic Risk Analysis: Foundations and Methods, Tim Bedford and Roger Cooke, Cambridge University Press, 2001 (not the IEEE for a change :-) ), discusses the definition of risk over some three pages in Section 1.2. They base their notion on that of S. Kaplan and B.J. Garrick, On the Quantitative Definition of Risk, Risk Analysis 1:11-27, 1981.

A risk analysis tries to answer the questions
(i) What can happen?
(ii) How likely is it to happen?
(iii) Given that it occurs, what are the consequences?

Kaplan and Garrick … define risk to be a series of scenarios s_i, each of which has a probability p_i and a consequence x_i. If the scenarios are ordered in terms of increasing severity of the consequences, then a risk curve can be plotted [of severity against probability of at least that level of severity]. The risk curve illustrates what is the probability of at least a certain number of casualties in a given year. Kaplan and Garrick … further refine the notion of risk in the following way [to talk about frequency of an event instead of probability, and then uncertainty associated with a frequency]

Again, this concept is somewhat different from that of a number between 0 and 1.
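The risk curve of the Kaplan-Garrick definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scenarios are taken to be mutually exclusive, so that their probabilities may be added, and the numbers are invented.

```python
# Scenarios as (p_i, x_i): probability per year and consequence
# (number of casualties). Hypothetical figures, assumed mutually exclusive.
scenarios = [(0.05, 0), (0.01, 10), (0.001, 100)]

def risk_curve(scenarios):
    """For each severity level x, the probability of a consequence of at least x."""
    levels = sorted({x for _, x in scenarios})
    return [(x, sum(p for p, xj in scenarios if xj >= x)) for x in levels]

for severity, prob_at_least in risk_curve(scenarios):
    print(severity, prob_at_least)
```

The result is a whole curve, one point per severity level, which is why it cannot be collapsed into a single number without losing information.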

John suggested I contact the then-editor of IEEE Software, Warren Harrison, which I did. Warren suggested that the appropriate action would be a letter to the editor, allowing the author and the section and glossary editors to respond if they wished.

I never did so. I regret it.

So six and a half years later, here I am writing a blog post on it. I doubt the issue will go away. Neither will this note. I do think the IEEE should work to get its definitional house in order.



John McCarthy

11 11 2011

John McCarthy has died. The great John McCarthy. Brilliant and entertaining, fun to be around, accessible unlike many of his stature, who carried an aura about him which blessed you with the feeling, if you came within it, that you were doing the Thinking That Really Mattered. Even if you were just flapping around at a loss for ideas.

The German Wikipedia describes him as a logician and computer scientist. The English version as a computer scientist and cognitive scientist. The German has it right.

John used to be quite happy to get into discussions with everyone about anything and became well known for it as Internet news groups really got started in the mid-1980s. He had a knack for posing simple questions that turned out to be hard to answer.

And not just in AI. For example, check out his proposal for a new civil right on what counts for him as his personal page:

Remark: Ideally one would put all the information that one considers public about oneself on a page like this. When asked to fill out a form, one would simply put down the URL in place of any information that is on the page and tell the recipient of the form to just look it up.

One step beyond that is that any program needing this public information would just take it from the somewhat standardized web page.

More precisely, here’s a proposed new civil right. No Government agency, educational institution or business should ever be able to require anyone to supply anew information that the institution already has or is publicly available.

Typically for John, it is simple, doable, but somehow not done, and has significant social consequences. Let’s consider it a little further.

There are inadvertent violations. I tried to hand in a technical review of a paper submitted to an IEEE Transactions to the IEEE ScholarOne “system” (I use the word loosely) and found it wouldn’t let me do it without requiring me to fill in a lot of personal information. (I sent the review by email, and someone else has now tweaked the system enough to let me file, apparently.)

But the phenomenon is also used – and this, I suspect, is an insight of John – for political purposes. I had been asked on five or six occasions in the last year by a grant-supported institution with which I am associated to deliver information about activities (publications and talks and so on). Always the same stuff, but somehow not in quite the right format, or not quite the right selection. I began to suspect that someone is looking for a “reason” to ease me out, and so it turned out. Bureaucracy-overload as a political instrument; and of course always deniable.

The focus of this institution is, well, the successor “discipline” (I use the word loosely) to AI. John would have loved it!

Jon Hind informed a mailing list on Tuesday 25th October of the Guardian obituary that had just come out. There is a joint obituary with Dennis Ritchie in The Economist which appeared in the November 5th print edition.

The Economist suggests that John did not suffer fools gladly. That is not quite how it was, as I recall. He engaged with all sorts of people – students mouthing off on Internet bulletin boards, for example, which nobody else did at the time. But he didn’t condescend. Anyone could talk to and argue with John, but he didn’t adjust his intellect to your capabilities – you had to adjust yours to his; for almost everyone an impossibly tall order. As well as being exactly what bright Stanford students need.

The Guardian article seems to me to miss most of what John was about during the 1980s and 1990s (the Economist, unusually, even more). Of course, after the invention of LISP, still the longest living programming language with over half a century of use (C, eleven years its junior, still has to catch up), one could regard anything else as a coda. But it was just a start. I’ll talk about the decade I know about, from the mid-1980s to the mid-1990s.

John had discovered, or invented, the Frame Problem, with Pat Hayes, and then came up with the cleanest purely technical proposal for solving it, Circumscription. Unfortunately, Circumscription didn’t turn out to solve the Frame Problem sui generis, but it did start a little industry all of its own. This little industry frustrated people such as Danny Bobrow, then-editor in the 1980s of the premier journal in the field, Artificial Intelligence. Danny is a programmer through and through, who feels that to do AI you have to build stuff, that is, to program. The Circumscription industry consisted of a largish collection of mostly ex-Eastern European logicians, many of them eminent and all of them both capable and productive, who wrote great technical papers in mathematical logic and sent them to the Artificial Intelligence journal, where of course they had to be sent off to be refereed by other members of the group, and they took over about a third of the journal with all that ***** Math!! All good stuff no doubt, but it didn’t seem to some as if much was getting built………

It mirrored a significant split in AI, indeed in all computer science, which continues to this day. There are people who incline to solve problems intellectually before they solve them practically, and there are people who attempt practical solutions and solve, or resolve, issues as they come up to them. In AI in the 1980s, they were known respectively as “neats” and “scruffies”. The neats have it right in that you cannot program solutions you do not have. The scruffies have it right in that a computational solution to a problem consists in an implementation. You might imagine that they could agree on a division of labor, but research is a little messier than that. The neats have it wrong in that abstraction is also a fine way of subtly changing the problem to fit the solution you happen to have, and the scruffies have it wrong in that a clever programmer can build wonderful programs which fail to solve the problem they set themselves, but “almost” do – the permanent, ineffable “almost”, which turns out to mean “never”.

John’s view on progress was that you knew a field was technically mature when you couldn’t understand the work of someone working on a different problem from you. Let’s turn that on itself. In some sense the division of AI research into neats and scruffies, say Danny’s frustrated view of all that math, could thereby have constituted a proof of some sort of maturity, although the way the squabbles were conducted left many wondering if that was the word for it! Maybe that was John’s point?

And John was the living contradiction to this view on progress. Of course. He could explain to anyone with a modicum of understanding of propositional calculus exactly what he was interested in and what problems he thought were worth solving. Check it out at his group WWW site. They were all so simple! Until you realised that, John being John, if they were really as simple as they looked, he would have solved them already. I recall one evening at the IJCAI conference in Detroit in 1989 when a bunch of us formal people were chatting away after a late dinner. Along comes John. Says, “you know, I was thinking about this…… do you know how to do it?” and posed, as usual, what appeared to be a simple problem in propositional logic. Well, after a few minutes, everyone else made their excuses and left. I couldn’t solve it. Then came another problem. And another. All simple, all propositional logic, all needing to be solved if machines were going to mimic human decision capabilities. And, of course, AI meant that machines should be able to do this, somehow, so even if you programmed them with genetic algorithms or neural-network problem solvers, they would still just have been solving John’s “easy” problems in propositional logic. Surely a problem posed in logic can be solved in logic? Well, sometimes.

This went on for an hour and longer. At such sessions one could choose to feel stupid and frustrated at not being able to solve anything, or to revel in the creativity exhibited before your very eyes. For anyone can solve problems, but very, very few people know how to ask exactly the right questions. John was one. Astonishing performances, puzzles rolling off his tongue as if he were discussing the bus timetable. Anyone – and there were many – who claimed that “symbolic AI” was dead just hadn’t been listening properly. Symbolic AI wasn’t and isn’t “dead”. John’s simple problems need to be solved one way or another. But no longer by him, unfortunately.

Circumscription? Let me have a go. Circumscription is a syntactic (and therefore computationally feasible) way of doing the following. Say you have a description of part of the world in front of you, and what is going on in it. Say your description is in some language which allows deductive inference. Circumscription is a way of drawing rich inferences about features (“predicates”) of that scenario under the supposition that the world doesn’t have anything else in it but those objects expressly described plus whatever else needs to be there for the description to be accurate. Not just rich inferences, more than you could obtain with deduction alone, but rich, correct inferences. To logicians, it is a set of inference rules about what is true in certain minimal models of the set of sentences.

That is logically very important. Modern logic arose with Frege considering the logic of arithmetic, of counting and adding and so on. But in Frege’s logic, it turns out that you can’t just restrict your talk to the positive whole numbers. Circumscription was a way of trying to do just that for “worlds” which had a finite number of objects in them. It resolves many of the issues in the Frame Problem (maybe more appropriately called the Framing Problem), by implicitly defining what you are framing. However, it doesn’t neatly resolve the conundrum posed in 1986 by Steve Hanks and Drew McDermott and known as the Yale Shooting Problem. That was first resolved by using other principles. The conundrum it posed has now dropped out of fashion, as far as I know.
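To make the minimal-model reading of Circumscription concrete, here is a toy propositional sketch in Python. It is a brute-force, model-theoretic illustration and not John’s syntactic formulation: the three-atom “bird” theory, the choice of which atom to minimise, and all function names are assumptions made for the example.

```python
from itertools import product

ATOMS = ("bird", "ab", "flies")   # "ab" stands for "abnormal"

def theory(m):
    """bird holds; and (bird and not ab) implies flies."""
    return m["bird"] and ((not (m["bird"] and not m["ab"])) or m["flies"])

def models(atoms, sentence):
    """All truth assignments over the atoms that satisfy the sentence."""
    found = []
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if sentence(m):
            found.append(m)
    return found

def minimal_models(all_models, minimised):
    """Models whose set of true minimised atoms has no strictly smaller rival."""
    def extent(m):
        return frozenset(a for a in minimised if m[a])
    return [m for m in all_models
            if not any(extent(n) < extent(m) for n in all_models)]

def entails(ms, atom):
    """True if the atom holds in every model in ms."""
    return all(m[atom] for m in ms)

all_models = models(ATOMS, theory)
minimal = minimal_models(all_models, minimised=("ab",))

print(entails(all_models, "flies"))   # deduction alone
print(entails(minimal, "flies"))      # after minimising "ab"
```

Deduction alone cannot conclude that the bird flies, because a model in which it is abnormal remains possible. Restricting attention to the models with as few abnormal things as the description allows yields the richer conclusion.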

To see the rich tradition around Circumscription, one may look at the Stanford Encyclopedia of Philosophy entries on Non-Monotonic Logic, on Defeasible Reasoning, on Logic and Artificial Intelligence, on the Frame Problem, and on Ceteris Paribus laws (that is, rules based on “all other things being equal”).

John was not concerned merely with mimicking intelligence with machines, but with the more elusive reasoning phenomenon of common sense, which occurred in the title to a collection of his papers in 1990. There was a whole branch of AI research devoted to “common-sense” reasoning about the world, which in turn spawned a branch of reasoning called Qualitative Physics: how the world works; check out for example this book chapter by Ken Forbus. (John, though, would have distinguished common-sense physics from qualitative physics.) If you put a round ball on a slanting table, it will roll down the slope and drop off the edge and hit the floor just beyond the edge, and how far beyond depends largely on its speed relative to the table when it drops. This phenomenon is known to every two-year-old a couple of decades before they can understand the Newtonian version, but we adults have far more trouble getting a handle on the qualitative reasoning than we do on Mr. Newton’s mathematics. Yet another delicious irony.

And one could go on. Maybe without end. Qualitative physics will not end; it’s a phenomenally hard problem. It may go out of fashion, but it’ll come back. And somebody will have to solve all those common-sense physics problems as well, and maybe differently. Circumscription didn’t solve the Yale Shooting Problem, but it did open up the study of rigorous forms of defeasible reasoning.

And always there was an irony, a delightful little joke in the tail. Somehow, you never quite knew whether you were thinking about a new subject or an elaborate joke. Look closely at the picture in the Guardian. Can you, also, maybe, see the slight smirk that I always thought I saw? Maybe, just maybe, AI was his very biggest joke of all…..



Ensuring Safety Requirements Fulfilment in Possibly-Imperfect Software

16 10 2011

Ludi Benner just asked me privately about the feasibility of dumping stack traces from operating SW in flight. I concluded that it is not a very practical idea for a number of reasons. First, there is a lot of it. Second, you can’t analyse them for every flight, because there aren’t human resources for it, and no automatic tools which can detect coding errors from stack traces. Third, even if you analysed them in the case of an accident, there has as yet been no accident in which coding error was suspected (although there have been accidents and incidents in which requirements or design failure of computer-based systems was a causal factor), so even had they been available, no one would have needed to look at them.

Looking at stack traces is also primarily a measure for assessing software quality. You can tell from a stack trace maybe whether the SW was doing what it was required/designed to do, and thus detect coding errors. But in safety-critical systems you are not interested primarily in deviation from requirements in general, you are interested primarily in deviation from the safety requirements.

There is a general method for formulating safety requirements:
(I) identify hazards (however you might define them), and
(II) then formulate a safety requirement per hazard H as S.H = [either avoid hazard H or exit out of hazard H within Q.H time period], and
(III) define the safety requirements as ( /\/\ S.H), the conjunction being taken over all hazards H

Nancy Leveson defines hazards as states of the system such that……… (Leveson, Safeware, Addison-Wesley 1995, Chapter 9). Others speak of states of the system+environment such that …… or events such that…… (see for example Chapter 4 of my 2001 on-line book, and Chapter 5, and this set of definitions from Causalis)

Let’s use the Leveson definition.
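With hazards taken as states, steps (I) to (III) can be written down as a small Python sketch that checks a recorded trace against the conjunction of the S.H. Everything concrete here is an assumption for illustration: the state representation, the sampled trace, the example hazard and its Q.H value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Trace = List[Tuple[float, State]]      # (timestamp in seconds, state), in time order

@dataclass
class Hazard:
    name: str
    holds: Callable[[State], bool]     # is the state a hazard state H?
    q: float                           # Q.H: maximum tolerated time in H, seconds

def satisfies(trace: Trace, hazard: Hazard) -> bool:
    """S.H on a sampled trace: H is avoided, or each stay in H lasts at most Q.H."""
    entered = None
    for t, state in trace:
        if hazard.holds(state):
            if entered is None:
                entered = t            # first sample at which H is observed
            if t - entered > hazard.q:
                return False           # still in H after Q.H has elapsed
        else:
            entered = None             # H has been exited
    return True

def safety_requirements(trace: Trace, hazards: List[Hazard]) -> bool:
    """The conjunction of S.H taken over all hazards H."""
    return all(satisfies(trace, h) for h in hazards)

# Hypothetical hazard: speed above 100 for longer than 2 seconds.
overspeed = Hazard("overspeed", lambda s: s["speed"] > 100.0, q=2.0)
ok_trace = [(0.0, {"speed": 90.0}), (1.0, {"speed": 110.0}),
            (2.0, {"speed": 120.0}), (3.0, {"speed": 95.0})]
bad_trace = [(float(t), {"speed": 110.0}) for t in range(4)]

print(safety_requirements(ok_trace, [overspeed]))
print(safety_requirements(bad_trace, [overspeed]))
```

A trace that ends while still inside H, but within Q.H, is treated here as not yet violating S.H; a real checker would have to decide how to handle that case.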

The fundamental insight is this. Suppose you have relatively complete safety requirements (the definition from an earlier blog post). Then you can insert monitoring SW to look at SW-state and detect hazard states H when they occur (this can be achieved by techniques for run-time verification – I shall call it here logical monitoring), and then you trap to SW which either exits or mitigates H with worst-case execution time (WCET) Q.H.

For this to work unfailingly, the following conditions have to be fulfilled:

(a) your safety requirements are relatively complete,
(b) the hazard-detection is perfect (a “perfect oracle”), and
(c) the (H-exit or H-mitigation) SW is perfect.

This suffices to ensure that the SW does not engender dangerous behavior. Assumption (a) ensures that fulfilment of the safety requirements suffices to avoid dangerous behavior. Assumptions (b) and (c) ensure the safety requirement associated with hazard H is fulfilled. The assumption of perfection for the detection SW in (b) and the avoidance/mitigation software in (c) is critical. As is the condition that, when the perfect oracle detects the presence of H, the trap to the avoidance/mitigation software is also perfect. However, such traps are HW-based and a failure of such a trap could occur due to a HW problem. (Of course, the victims are unlikely to care whether a trap failure is classified as HW or SW).
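The shape of such logical monitoring can be sketched as follows. This is a toy in Python, not run-time verification as practised: the hazard predicate plays the role of the oracle in (b), the handler plays the role of the exit/mitigation SW in (c), and the thrust example, its limit and all names are hypothetical. Note that nothing here establishes a WCET; Q.H would have to be shown by separate timing analysis.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
# (hazard name, oracle detecting H, handler exiting or mitigating H)
Rule = Tuple[str, Callable[[State], bool], Callable[[State], None]]

THRUST_LIMIT = 1.0   # hypothetical limit on a commanded value

def overthrust(state: State) -> bool:
    """Oracle for the hypothetical hazard state: commanded thrust above the limit."""
    return state["commanded_thrust"] > THRUST_LIMIT

def clamp_thrust(state: State) -> None:
    """Exit handler: force the SW state back out of the hazard."""
    state["commanded_thrust"] = THRUST_LIMIT

RULES: List[Rule] = [("overthrust", overthrust, clamp_thrust)]

def monitored_step(state: State, rules: List[Rule]) -> List[str]:
    """Check every hazard after a computation step; trap to its handler if detected."""
    tripped = []
    for name, detect, mitigate in rules:
        if detect(state):
            mitigate(state)            # the "trap" to the exit/mitigation SW
            tripped.append(name)
            if detect(state):          # handler failed to exit H
                raise RuntimeError(f"hazard {name} not exited")
    return tripped

state = {"commanded_thrust": 1.2}
print(monitored_step(state, RULES), state)
```

The monitor and handlers are deliberately much simpler than the software they guard, which is what makes a claim of possible perfection for them plausible at all.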

Logical monitoring has, I propose, reached the point at which (b) and (c) are practical. John Rushby notified me of this in December 2009, pointing me to a brief survey of his which I found helpful. I don’t belong to the run-time verification “community”, although I knew about it in general (Manuel Blum and others at Berkeley whom I know had been working on it theoretically a couple of decades ago). So I am proposing it as it were on hearsay rather than through personal experience. It seems to me to be plausible that one can synthesise perfect oracles as well as perfect avoidance/mitigation software.

Such software added to safety-critical SW would be possibly-perfect software in the Strigini sense, that is, software which you would like to be perfect, which you have good reason to think is perfect, and the question is mainly the confidence you have in your judgement that it is. Possibly-perfect software and its use in achieving demonstrably-ultrahigh-reliability software has recently been discussed in Littlewood and Rushby’s forthcoming paper in the IEEE Transactions on Software Engineering, which I think is a landmark paper.

There are then two questions.

One is assessing the level of confidence we could have that such logical-monitoring software is indeed perfect, and how that would affect the level of confidence we have in the exceptionless fulfilment of the safety requirements for the otherwise possibly-imperfect SW in which this logical monitoring is inserted. I suspect that techniques such as exhibited in the Littlewood-Rushby paper are applicable.

The second question is also twofold:

(i) whether, for every hazard H, it is the case that a safety requirement of the form S.H = [either avoid hazard H or exit out of hazard H within Q.H time period] suffices to avoid all dangerous consequences of H, and
(ii) whether it is possible to produce such avoidance/mitigation SW with WCET less than Q.H.

Concerning (i), logical monitoring SW cannot help in avoiding H. It detects when H is present (step (b)). So if harmful consequences of H can occur within a shorter time period than it could possibly take to detect, trap and exit H, this approach cannot be guaranteed to fulfil the safety requirements for the SW. However, in such a case I suspect very much that the software should be redesigned to ensure the avoidance of H. Since H is a SW state, I see no reason why this should not be generally possible.



Software Quality and Fitness for Purpose

26 08 2011

Following on from my recent post on certification requirements for commercial aircraft, John Rushby and I have been discussing a paper of his, on commercial aircraft software and the guidelines DO178B, in the invited session on certification at EMSOFT 2011.

John is concerned with whether DO178B “works”, that is, leads to high-quality code which is fit for purpose, and, if so, why and how. I think that is a hard and important question and I commend his bravery in addressing it squarely (rather than hiding behind a blog format, as I do :-) ). I commend his paper to people when he publishes it – I imagine it will be on his publications page sometime in October 2011. The paper is not long, but it is dense. Rather as if it were a poem, I had to read it multiple times, carefully. It took me a week to respond. I concluded I do really prefer Goethe, but then he didn’t talk about avionics SW.

John suggests that DO178B is focused on assuring the correctness of the executable code. I found that surprising; I think both Martyn Thomas and I are concerned that DO178B is focused largely on processes which people thought and still think (often, we claim, without much scientific evidence) correlate with code that is fit for purpose.

I use the term “fit for purpose” as does Martyn. John suggests that is a British term not used in the US. I prefer to use it to using the term “correct”, so let me translate. There are at least two ways in which code may be said to be “correct”. One is that it fulfils its specification; let us call this correct-1. The second is that the code causes the system to behave in a manner appropriate for the task at hand; let us call this correct-2. I call the correctness-1 properties of code its quality, and correctness-2 properties fitness for purpose. But let me say correct-1 and correct-2 for the remainder of this note.

The specification may not subsume the “task at hand” in all cases: quality, correct-1, does not imply fitness for purpose, correct-2. Indeed, there is good evidence dating back some twenty years now, say, in work of Robyn Lutz published in 1993, that most failures of correctness-2 in moderate to complex systems are not failures of correctness-1.
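A deliberately artificial Python sketch may help to keep the two notions apart. The specification, the threshold and the scenario are all hypothetical, invented only to show code that is correct-1 and yet not correct-2 in some situation.

```python
LOAD_THRESHOLD_KG = 6000.0   # hypothetical figure from a hypothetical specification

def on_ground(left_gear_kg: float, right_gear_kg: float) -> bool:
    """Hypothetical spec: report 'on ground' iff BOTH gear loads exceed the threshold."""
    return left_gear_kg > LOAD_THRESHOLD_KG and right_gear_kg > LOAD_THRESHOLD_KG

# correct-1: the code does what the specification says on these cases.
spec_cases = [((7000.0, 7000.0), True),
              ((7000.0, 2000.0), False),
              ((0.0, 0.0), False)]
correct_1 = all(on_ground(*args) == expected for args, expected in spec_cases)

# correct-2: the task at hand is to know when the aircraft really is on the runway.
# Assumed scenario: it is on the runway, but one gear is only lightly loaded.
actually_on_runway = True
correct_2_in_scenario = on_ground(7000.0, 2000.0) == actually_on_runway

print(correct_1, correct_2_in_scenario)
```

The code passes every check against its specification and still gives the wrong answer for the task; the failure lies in the requirements, not in the coding.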

Martyn and I would agree on the lack of evidence that “clean and tidy” SW development processes entail any concrete property of the resulting code (such as fitness for purpose). Martyn would also suggest, citing work of Andy German of QinetiQ, that higher DALs in DO178B do not necessarily correlate with higher-quality software. Here is what Martyn says, information from personal communication with Andy:


Here is some data from the formal analysis of the avionics software of a current American military aircraft that was certified against DO-178B Levels A and B for use in civil airspace.

The following defects were among those reported in the software after certification:

Erroneous signal de-activation.
Data not sent or lost
Inadequate defensive programming with respect to untrusted input data
Warnings not sent
Display of misleading data
Stale values inconsistently treated
Undefined array, local data and output parameters
Incorrect data message formats
Ambiguous variable process update
Incorrect initialisation of variables
Inadequate RAM test
Indefinite timeouts after test failure
RAM corruption
Timing issues – system runs backwards
Process does not disengage when required
Switches not operated when required
System does not close down after failure
Safety check not conducted within a suitable time frame
Use of exception handling and continuous resets
Invalid aircraft transition states used
Incorrect aircraft direction data
Incorrect Magic numbers used
Reliance on a single bit to prevent erroneous operation

The worst module had a defect density greater than 1 defect in 10 lines of code. The best had 1 defect in 250 lines.

The problem, as I see it, is that most software assurance relies on testing, although we have known for at least 40 years that testing can only show the presence of errors and not their absence. Until software assurance is mostly based on mathematically rigorous analysis of the software (which can be done at no increase in cost if the software is developed with this in mind) these unacceptable rates of software defects will continue.

Notice that these are errors in the sense of correctness-1.

I hold it to be significant that a number of these errors could not have occurred had the SW been written in a strongly-typed language and the compiler correctly implemented the strong typing. This has been an issue for forty years, ever since the Algol project gave up. This is one of the demonstrably best-known ways to avoid certain well-defined classes of program error. If DO178B is truly focused on the correctness of the implemented code, why doesn’t it require development in strongly-typed source, and use of a demonstrated type-correct compiler?

Andy also claimed in a paper in Crosstalk 16(11), the Journal of Defence Software Engineering, November 2003, that “no significant difference” had been found with respect to levels of correctness-1 in code developed according to DO178A Design Assurance Level A and Level B. Development according to DAL A is regarded as significantly more resource-intensive than development to DAL B, in part because DAL A requires so-called MC/DC testing (see the helpful tutorial by Kelly Hayhurst, pointed out to me by Mike Holloway), which is quite hard work.
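To give a feel for why MC/DC is hard work, here is a small Python sketch that checks whether a set of test vectors shows each condition independently affecting the outcome of a decision. It implements only the “unique-cause” form (two tests differing in exactly one condition and giving different outcomes); the example decision and the function names are hypothetical, and masking MC/DC is not covered.

```python
from itertools import combinations

def decision(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical decision with three conditions."""
    return a and (b or c)

def mcdc_conditions_shown(decision, tests, n):
    """Indices of conditions shown to affect the outcome independently."""
    shown = set()
    for t1, t2 in combinations(tests, 2):
        differing = [i for i in range(n) if t1[i] != t2[i]]
        if len(differing) == 1 and decision(*t1) != decision(*t2):
            shown.add(differing[0])
    return shown

def achieves_mcdc(decision, tests, n):
    return mcdc_conditions_shown(decision, tests, n) == set(range(n))

# Four tests (n + 1) suffice here; each pair is chosen for one condition.
good_tests = [(True, True, False), (False, True, False),
              (True, False, False), (True, False, True)]
# Two tests give both decision outcomes, but isolate no single condition.
weak_tests = [(True, True, True), (False, False, False)]

print(achieves_mcdc(decision, good_tests, 3))
print(achieves_mcdc(decision, weak_tests, 3))
```

The second test set already exercises both outcomes of the decision, which is what weaker coverage criteria ask for; the extra effort lies in constructing the pairs that isolate each condition.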

BTW, that edition of Crosstalk also includes a fine article on the so-called Ravenscar profile, for interprocess communication in Ada which admits straightforward static analysis, by Brian Dobbing and Alan Burns.

The big UK certification effort on which I understand much of the QinetiQ work was performed was for an aircraft manufactured by Lockheed Martin (BTW, one barely calls them “manufacturers” any more, but rather “system integrators”). John said by way of anecdote that he had indications that internal data of both Boeing and Airbus show “more issues” with DAL B software than with DAL A software.

Assuming these observations are correct, the question here would be how two experienced companies develop quality-improved software using the extra requirements for DAL A, but a third experienced company does not see any improvement.

The answer must be that there are hidden factors at work, factors which actually do lead to an improvement in SW quality, which in two companies are associated with the extra effort required for DAL A development, but which in a third company are not.

Since DO178B misses those factors (for otherwise all companies would show improved quality in DAL A development over DAL B development), isn’t it important to find out what they are, and then write them explicitly into DO178C, which is currently in its final stages?

BTW, if you want my view on what an ideal SW safety standard should say (thank you for asking :-) ), check out slide 22 of my Ada Connection keynote talk of 21 June 2011.

I might point out that it is much shorter than the 150pp of IEC 61508 Part 3 Version 2, which I mutter about on the previous slide.



Coda, Interdisciplinary Work, and Scientific Publishing

15 08 2011

It sounds like a mish-mash, doesn’t it? It will probably read like a mish-mash, too.

Because true interdisciplinary work always looks that way, I think. That is one of the main points I wish to get across. But first, let me get there.

Concerning my last post, Leslie noted that the condition he labels “FAA requirement” in his slide 4, for a 10^-10 probability of failure per hour, was actually a NASA requirement for the SIFT research. SIFT was the first digital flight control computer, and SRI was supposed formally to verify its operating system. The project didn’t succeed in this original goal over a decade, but, as is often the case, we computer people learned far more, and more fruitfully, from this failure than we ever would have had it “succeeded”. For example, I am not aware of any formal proof that such-and-such a non-trivial system S is guaranteed free of Byzantine failures, for any system S that is not artificially constructed just for the proof. And that’s thirty years after the papers were published! Conclusions: Lamport and co put their fingers on some things that we just can’t do. Not only that, but they classified a cross-disciplinary problem in a new way. Byzantine failures, as spoken of by Driscoll et al., are a system problem, a mixture of phenomena which have to do with the electronic design, as well as the materials, of which system components are made. Transistors get cracks in them and turn into condensers (a Space Shuttle Byzantine agreement problem). But Lamport et al. turned their efforts to a pure algorithmic problem and published in pure computer-science journals (indeed, the best). Leslie is not a computer scientist who deals with avionics, he is a computer scientist who deals with computer science.

But right on the boundaries also. One of the most insightful (and to my mind, one of the best) pieces of work he ever did was on the collection of issues about arbitration in converting continuous (“analog”) data into discrete (“digital”) data: Buridan’s Principle, whose purely technical contribution rests on a mathematical theorem he proved with Dick Palais, his thesis advisor. You can read Leslie’s account of the odd results of his attempts to publish. He gave it to me sometime in the 1980s. But since the 1990s, everyone can know about it and read it at will, because he put it on the WWW. Thank heavens for the WWW!

And that is a point about interdisciplinary work with which I have been struggling now for almost twenty years. One writes a paper on the causal analysis of a computer-related aircraft accident using the Lewis semantics (the Counterfactual Test). One sends it to a computer science journal. Review: “that’s got aeronautics in it, no one in computer science understands aeronautics, better to try an aeronautics journal”. One does. Review: “that’s got logic in it, no one in aeronautics understands logic, better to try a logic journal”. One is not stupid, but if one were, one might try to do so. Anticipated review: “that’s got computer science and aeronautics in it, no one in logic understands computer science and aeronautics, better to try a computer-science-and-aeronautics journal.”

And that’s all true and that’s all reasonable. Indeed no one in computer science reads aeronautics journals. No one in aeronautics reads logic journals, and so on. That’s why many engineers working on avionics bus systems still do not know about Byzantine failures, 30 years on.

The result is that most of what I write gets on the WWW and stays there. One can spend one’s time writing, or chasing one’s tail around such publishing conventions, but doing one takes time and effort away from the other, and I prefer writing.

Just to give an indication, one of the pieces of work I performed in the last year of which I am most proud is the analysis of causal explanations of the Concorde accident and assessments of responsibility, which I wrote about in my post Concorde, Ten Years On, Part 2. I see there a series of pressing social and technical issues and their interplay, which people have not satisfactorily come to grips with, and I regard that piece as some kind of a start. As I said, I’m proud of it. One can’t do that kind of work every day, or at least I can’t. One has to seize the moment and I did. Actually, that is the way many successful researchers work in math or computer science. Or philosophy, for that matter. You spend most of your time laying some kind of groundwork as best you can, and then you are somehow handed a moment and you seize it: “I can do that!” and you do. Some more than others.

This wasn’t ever different. Disciplines were partitioned, especially academic disciplines. But one would have thought, as I guessed 15 years ago, that the WWW would make everything different. Mais, plus ça change, plus c’est la même chose.

Some more examples.

I recently organised a Workshop on the Fukushima nuclear accident, inviting largely sociologists and computer-system-safety people. People who read my blog know why I laud the sociologists for their insights into technical matters. When I was thinking we could do this, I asked people about funding. The Scientific Board of CITEC, where I am a PI until November 2012, thought it was a cracking good idea and very relevant and offered financial support. My colleagues at the Centre for Software Reliability in Newcastle upon Tyne, when I called them to apologise that we were withdrawing from their exhibition at the Ada Connection in order to put the money in the Workshop, also offered financial support. Thank you all! And I did approach the German central funding agency for scientific research, the DFG, which had circulated an e-mail saying that in the wake of the tsunami there were instruments available to support cooperative research on the matter in the very short term.

Naive as I am, I took this message literally. I contacted the responsible administrator whose address was on the note. He graciously explained that his “instruments” were limited and didn’t support my workshop idea, and passed my request on to, amongst others, the administrator responsible for the support of engineering research, who replied forthwith in one sentence: “from the point of view of engineering, I don’t see any possibility of support.” What? The world has just experienced one of the two most devastating engineering accidents ever, German politicians scrambled over each other to devise our exit from nuclear power, and the prestigious German academic research support agency says it ain’t interested? I put in a carefully worded query asking whether this could really be so, and received no reply.

Now, me being me, I would think they should be ashamed of themselves. But if I said that, I’m sure it would be indicated to me how inappropriate that would be, and really that I don’t understand the formal courtesy structures at play, and so on. Maybe all true. But the fact is that I have an international reputation in accident research, here was a biggie with major political consequences, I invited a bunch of top people to discuss it, they all said yes by return e-mail, and the engineering research support organisation said it wasn’t interested. There is no way around that fact, no matter how pretty the words.

And that illustrates something that I feel is going more and more wrong with academic-type research over the years in which I have been involved with it. I suspect it is particularly acute in Germany. Academic research here after the first degree is performed by scientific employees, by people in temporary jobs. There are no “graduate students” (although that is beginning to change: there are now narrowly-defined graduate colleges which offer competitive scholarships. We have one in Bioinformatics and Genome Research, another in Situated Communication, which I think is now over, and another in Cognitive Interaction Technology, which I think has absorbed it). You want to offer a research topic in an American university, you do so, to all the graduate students, and someone will be attracted to it and come and talk to you. In Germany, you have to apply for funding (mostly from the DFG) for a temporary position to perform the research, then wait for the job applications, and hire someone on the basis of an interview. It’s a lot more work for the faculty member; there isn’t the same personal connection to the bright young people you already know are capable of the work; it’s less flexible (I got three quarters of the way through two other thesis topics before I hit on the one I could finish, and none of them were connected with each other. You can’t do that in a job. Indeed, it took me three jobs!); and I believe the quality of the product suffers (but then, I was at Berkeley. Unfair comparison? Well, no. No German university makes the top fifty in any of the more well-known rankings and I’m talking about possible reasons for that).

Let me amplify a little on that parenthetical comment. I had a colleague here in Bielefeld with over twenty or twenty-five “scientific assistants” in his group, people working at temporary jobs who hoped thereby to get their doctoral degree. At Berkeley, people, even Turing-award winners, had at most four or five doctoral candidates whom they supervised. The key word here is “supervised”. No one person can supervise twenty-five doctoral candidates to anything like the Berkeley norm. Indeed, supervision, such as it was, was mostly delegated to the post-docs. Of which, to achieve the same ratio, one would need five or six or seven (I recall there were three or four). And these doctoral-work supervisors were not Turing-award and like winners, not even NSF Young-Investigator Award winners, such as at Berkeley. They were people who had got their first research qualification and were mostly at the beginning of seeing whether they could make any kind of name for themselves.

A couple of years after I got to Bielefeld, I discovered that somebody in that group had just written a doctoral thesis on temporal reasoning for artificial agents. Temporal reasoning for artificial agents? That’s the very work that I was known for, partly on the basis of which I was hired (here, one is not hired but “called”). This guy had never talked to me. Curious, I looked at the work. After I read the statement of the problem, it was obvious how to solve it. Then I looked at his solution and it wasn’t anywhere near as good. (But there was some program code behind it.) Happens here. Happens quite a lot here. Doesn’t happen at Berkeley, by and large.

I faulted the research structure. The guy had a job, with a job description. He was a nice, friendly and capable guy. At the end of the job was the expectation of a doctoral degree. Which was duly awarded after satisfying the appropriate formal criteria. All very neat and clean. DFG money apparently well spent. But the sum total to the world’s knowledge of how to solve temporal reasoning problems with artificial agents was essentially nil. His energies, and the funding support, would surely have been better spent had he talked to me, and then worked on a problem of the same level of difficulty, but to which the solution was not known.

This is already a lot of anecdotes. But it is hard to see how to get at the point without recounting lots of anecdotes. For each anecdote has its individual answer: it’s a special case; or I misinterpreted; or I was sour at someone; or I’m just being arrogant; or I’m looking for excuses for something I haven’t done or don’t do. Maybe all true, but it is the number of anecdotes, interpreted as the weight of evidence, that persuades reasonable people that there is something to the set-up which encourages all this.

Indeed, I am convinced that the model in which aspiring researchers pick their own topic from amongst those offered, make personal connections with a senior researcher who is able to judge whether they might be capable of completing the work, are encouraged, indeed required, to correspond with more accomplished others who have worked on and solved similar issues, and have the freedom to change topic completely when the current one won’t work out, is a better way to induce productive research than the research-as-job model.

But this heavy structurally-constrained interpretation of what constitutes effective research goes much further. Recall my anecdote about DFG support for my workshop, above. Along those lines, consider the following. I am a Principal Investigator in CITEC (above), whose charter is coming up for renewal and the proposal is about to be submitted. It turns out that the business of saying what my group (essentially of two: me and my post doc) have accomplished and what we will accomplish in the next five-year period was delegated to a young colleague, whose job is supported through the institute through the five-year cycle, as indeed now are all professorial jobs in Germany (tenure has gone) and is thus dependent on the success of the upcoming proposal.

Despite offers to help, my colleague didn’t talk to me at all. Indeed, it took me a certain amount of effort to find out who was writing what about our work in the proposal, since apparently none of the stuff I wrote was going to make it in. He wrote one sentence about the work my group had accomplished over the four years (with apologies that he couldn’t find more). And he found no relevant publications, despite (he indicates) trawling our publications page. Well, during the course of the last few months I have been asked variously for one key publication; for five key publications; for ten publications not necessarily within the CITEC remit, all by various people, none of whom is he. The Coordinator of CITEC (effectively the director) asked for a meeting, to explain to me that without any publications it didn’t look good for the proposal to include me.

What? People can’t find stuff I’ve written on the safety of mobile automatic devices in the last five years? Well, of course they can, but you see it doesn’t count. The DFG says peer-reviewed journal articles only.

There we go again. Structural constraints. Nobel-memorial-prize-winning economists, and sociologists, and political scientists, and legal scholars all write blogs. Hundreds and thousands of people read them and comment, including their peers, often in their blogs. Peer-reviewed? Most obviously! I just received a copy of a journal article (counts!) written by two colleagues about two essays (cited) I wrote in this blog. Other colleagues read my posts and they comment!

Another example. We started a mailing list in March 2011 on the Fukushima accident and I recently summarised my contributions, which amounted to 117 A4 pages in 12-pt type, for the workshop proceedings. Now, every word I wrote on that mailing list has been read by eminent colleagues on the list, and they have commented, frankly (it is a closed list). And I have commented on their writing in turn. That’s what you do on such a list, if you’re one of the people who do it (others prefer just to read). Peer review? How much more is it possible to get? And more easily?

The WWW has been pervasive for fifteen years and e-mailing lists for thirty. And there is still no measure of quality of contributions that is acceptable to the German research funding agency? (It is not the only one with such a view.) Astounding! It is not as if this is a hard problem. It would get to be a hard problem if what you want to try to define is The Definitive Measure of Scientific Quality, because there can’t be one. But judging the quality of blog posts or sustained mailing-list contributions is no easier or harder a job than judging the quality of peer-reviewed journal publications, indeed it’s often easier because you can ask more people.

Actually, what happened with the CITEC thing is this. Bernd, Jan and I figured a while ago that our textbook on Safety of Computer-Based Systems, which was solicited by a major publisher some years ago, would be written and out by now. And we thought one book would likely suffice to show what we’d done. One book is not five published papers; in this case it’s more like fifteen, and there will be more. But it isn’t out. Since it is a text, we need to be sure that the techniques introduced can actually be used by the target audience, students and engineers, and so some of our contributors belong to that target audience (not all textbook writers do this, but I happen to think it’s a very good idea). They are not necessarily as experienced writers as I am, so it simply takes longer than we’d thought. Quite understandable, one would have thought. But apparently there is no reasonable way to say to the DFG that the book is almost finished. (Someone might even want to say that a textbook isn’t research. But this one is, you know, just like Nancy Leveson’s new text. Read that one too!)

Structural constraints, and how they hinder effective support of effective research. Is everyone convinced by now? At least convinced to look at the issue more closely? Shall I stop here then?

Not quite. One more word, back to the original topic. “Interdisciplinary” is one of the buzz words of the new modes of research support. But the problems indicated above of support, publication and assessment of work which crosses traditional discipline boundaries, or the new boundaries left in place by a country’s Scientific Wise Owls and Funding Agencies, are deeper than a buzz word, or even than a buzz concept. The logicians can’t read aeronautics and the aeronauticists can’t read formal logic and the computer scientists don’t understand aerodynamics and the engineers don’t understand the sociologists and I doubt that is going to change rapidly under the hierarchically-directed research-as-job model, buzz word or no.