Security Vulnerabilities in Commercial-Aircraft SATCOM Kit

14 08 2014

There has been some press in advance of last week’s Black Hat conference speaking of vulnerabilities in commercial-aircraft flight management systems and possible implications for the safety of flight, for example in a Reuters article by Jim Finkle from August 4. The article is technically fairly accurate on the claims made and the manufacturer’s response, but it also includes comments such as this:

Vincenzo Iozzo, a member of Black Hat’s review board, said Santamarta’s paper marked the first time a researcher had identified potentially devastating vulnerabilities in satellite communications equipment.

“I am not sure we can actually launch an attack from the passenger inflight entertainment system into the cockpit,” he said. “The core point is the type of vulnerabilities he discovered are pretty scary just because they involve very basic security things that vendors should already be aware of.”

Which sort of says what the Black Hat program committee know about airworthiness certification of avionics: not very much, if anything at all. The phrases “potentially devastating” and “pretty scary” are to my mind completely out of place. I have also seen some public discussion of the vulnerability claims which suggests the sky could, or is at least theoretically able to, or maybe possibly theoretically able to, fall. I figure it is worth saying a couple words about it here.

This note may seem ponderous, but I think it is important to give the complete background and references. Aviation airworthiness certification is one of the more developed safety assessment regimes and some public discussion is obviously ignorant of it. For example, some contributions fail to make the basic distinction between a vulnerability (which could pose a hazard) and the possible consequences of exploitation of that vulnerability (the severity of the hazard).

This distinction has been basic to safety and security analysis for half a century or more. Its necessity is easy to see. People can demonstrate hacking bank ATMs at security conferences and have them spill banknotes all over the stage. But that doesn’t mean the hacker has access to all the networks at the bank in question and can embezzle trillions from their transaction systems. Indeed, no one thinks it does. The vulnerability is that a bank ATM can be compromised; the severity is (at least) that it loses its contents, and maybe more (maybe hackers can gain access to the central control SW). A bank can routinely cope with losing all the banknotes in an ATM; by all accounts attempts at fraud in financial transaction systems are orders of magnitude more severe and have been for decades. Vulnerability and consequences are connected but separate, and both or either could be rightly or wrongly assessed in any given proposal.

It appears vulnerabilities do exist in the systems investigated by the company IOActive and its associate Ruben Santamarta, but the severities of any such vulnerabilities have already been assessed by regulators during airworthiness certification and have been found to be negligible or minor.

There is a White Paper on their work from the company IOActive. It concerns vulnerabilities in satellite-communications (SATCOM) systems in general, mostly about ships and land-equipment for the military. There is one aviation application, as far as I can see. They claim to have compromised Cobham Aviator 700 and Aviator 700D devices. This kit contains software certified to DO-178B Design Assurance Level (DAL) E, respectively DAL D, they say. They also say it is installed on the military C-130J.

The first paragraph of “Scope of Study” in the company White Paper says that the researcher(s) didn’t have access to all the devices, but “reverse-engineered” those to which they didn’t have access and found vulnerabilities in their reverse-engineered copies.

DAL D software is that installed on kit whose malfunction could have at most a “minor effect”. DAL E software is that installed on kit whose malfunction could have at most “no effect”. These are technical terms: the notion of “effect” is the aviation-certification term for the possible consequences of a failure and corresponds with the more common term “severity” used in other safety-related engineering disciplines. A good general reference on certification of aviation equipment is Chapter 4 of Systematic Safety, E. Lloyd and W. Tye, CAA Publications, London, 1982. Lloyd and Tye categorise a Minor Effect as one “in which the airworthiness and/or crew workload are only slightly affected” and say that “Minor Effects … are not usually of concern in certification”. They don’t include them in the risk matrix which they use to illustrate the certification requirements. The risk matrix shows the slightly differing characterisations of the FAA and JAA certification regimes. The JAA was the former de facto certification authority in Europe and has been subsequently replaced by EASA. Most countries accept FAA and EASA airworthiness certification as adequate demonstration of airworthiness.
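
For readers who want the terminology in one place, the correspondence between DO-178B software levels and failure-condition categories can be written as a small lookup table. This is a minimal sketch: the names `DAL_EFFECT` and `worst_effect` are my own, and the entries for levels A to C are filled in from the general scheme of the standard, not from anything in the IOActive paper.

```python
# Illustrative lookup: DO-178B software level -> most severe failure-condition
# category to which a malfunction of that software may contribute.
DAL_EFFECT = {
    "A": "Catastrophic",
    "B": "Hazardous/Severe-Major",
    "C": "Major",
    "D": "Minor",
    "E": "No Effect",
}


def worst_effect(dal: str) -> str:
    """Return the failure-condition category for a software level letter."""
    return DAL_EFFECT[dal.strip().upper()]


# Levels as reported for the two Cobham devices discussed in this note.
for device, dal in [("Aviator 700", "E"), ("Aviator 700D", "D")]:
    print(f"{device}: DAL {dal} -> {worst_effect(dal)}")
```

The point of the table is only that the level is assigned from the assessed severity, not the other way round.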

The Cobham Aviator series is kit which may or may not be fitted to any specific aircraft. The Cobham WWW site contains a number of data sheets about the Aviator series. It appears to be available for (retro)fit to the Dassault Falcon bizjet series and apparently NASA Armstrong FRC has some: here is a related purchase order.

The airworthiness of the Cobham Aviator 700 and 700D systems is governed by 14 CFR 25.1309 in the US, and Certification Specification 25 (CS-25) clause 25.1309 in Europe. There is an FAA Advisory Circular defining the acceptable means of compliance with this regulation, which includes the definitions of effects and their allowable probabilities: AC 25.1309-1A System Design and Analysis, issued 21 June 1988.

The specific definition of “Minor Effect” from AC 25.1309-1A is

Failure conditions which would not significantly reduce airplane safety, and which involve crew actions that are well within their capabilities. Minor failure conditions may include, for example, a slight reduction in safety margins or functional capabilities, a slight increase in crew workload, such as routine flight plan changes, or some inconvenience to occupants.

The CS-25 definition is similar.

The general vulnerabilities IOActive claim to have found in the Cobham Aviator devices are listed in Table 1 of their report:

Weak Password Reset
Insecure Protocols
Hardcoded credentials

IOActive has informed US-CERT about the vulnerabilities it has found in the Cobham Aviator 700 and 700D kit. The US-CERT entry in the Vulnerability Notes Database contains a rather more precise statement of the vulnerabilities found. The note says that the identified vulnerabilities are

CWE-327: Use of a Broken or Risky Cryptographic Algorithm – CVE-2014-2943
IOActive reports that Cobham satellite terminals utilize a risky algorithm to generate a PIN code for accessing the terminal. The algorithm is reversible and allows a local attacker to generate a superuser PIN code.

CWE-798: Use of Hard-coded Credentials – CVE-2014-2964
IOActive reports that certain privileged commands in the satellite terminals require a password to execute. The commands debug, prod, do160, and flrp have hardcoded passwords. A local attacker may be able to gain unauthorized privileges using these commands.

The Common Weakness Enumeration (CWE) derives from Mitre, and an explanation of, for example, CWE-327 is to be found on the CWE WWW site, as is an explanation of CWE-798.

IOActive says the following about the vulnerabilities of the Cobham 700 and 700D devices. I quote their report in full.

The vulnerabilities listed in Table 1 could allow an attacker to take control of both the SwiftBroadband Unit (SBU) and the Satellite Data Unit (SDU), which provides Aero-H+ and Swift64 services. IOActive found vulnerabilities an attacker could use to bypass authorization mechanisms in order to access interfaces that may allow control of the SBU and SDU. Any of the systems connected to these elements, such as the Multifunction Control Display Unit (MCDU), could be impacted by a successful attack. More specifically, a successful attack could compromise control of the satellite link channel used by the Future Air Navigation System (FANS), Controller Pilot Data Link Communications (CPDLC) or Aircraft Communications Addressing and Reporting System (ACARS). A malfunction of these subsystems could pose a safety threat for the entire aircraft.

This is the entire statement. IOActive is thus explicitly disagreeing with the regulators: they say the vulnerabilities “could pose a safety threat for the entire aircraft” whereas the regulators have determined during airworthiness certification that the consequences of any malfunction of the Aviator 700 and 700D are “No Effect”, respectively a “Minor Effect”.

It is certain that regulator and vendor have a significant amount of paperwork on file purporting to establish the severity of malfunctions of the Cobham Aviator 700 and 700D kit. Much of that will refer in detail to the kit, and therefore will contain proprietary information and will not be available to the general public.

In contrast, IOActive has merely asserted, as above, its deviant view of the severity, as far as I can tell without providing any reasoning to back up its claim.

The vendor has provided the following statement to US-CERT:

Cobham SATCOM has found that potential exploitation of the vulnerabilities presented requires either physical access to the equipment or connectivity to the maintenance part of the network, which also requires a physical presence at the terminal. Specifically, in the aeronautical world, there are very strict requirements for equipment installation and physical access to the equipment is restricted to authorized personnel.

The described hardcoded credentials are only accessible via the maintenance port connector on the front-plate and will require direct access to the equipment via a serial port. The SDU is installed in the avionics bay of the aircraft, and is not accessible for unauthorized personnel.

Cobham SATCOM will continue to evaluate any potential vulnerabilities with its equipment and implement increased security measures if required.

In other words, they don’t think the discovered vulnerabilities affect the use of its kit much at all, and presumably the regulator agrees – that is, it has already agreed in advance during airworthiness certification, and sees no reason to change its mind.

US-CERT judges


A local unauthenticated attacker may be able to gain full control of the satellite terminal.


The CERT/CC is currently unaware of a practical solution to this problem.

I would disagree with use of the words “problem” and “solution” here. Indeed the entire categorisation seems to be somewhat puzzling. Obviously the vendor could fix the vulnerabilities by using better crypto in places, and by using device-access authentication that is not hard-coded; that would surely constitute a “practical solution” and surely CERT is as aware of this as I am and the vendor is. It also appears that neither vendor nor regulator sees the need to undertake any action in response to the revelations. There is no record that the airworthiness certification of the kit has been withdrawn and I presume it hasn’t been.
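
To show how ordinary such a fix is, here is a minimal sketch of device-access authentication that is not hard-coded. It rests entirely on my own assumptions: I have no knowledge of the Cobham firmware, the names `enroll` and `check` and the iteration count are invented for illustration, and real avionics kit would not be running Python. The idea is only that each unit stores a salted, derived verifier set at installation, in place of a password compiled into the software.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor, not a vendor value


def enroll(passphrase: str) -> tuple:
    """Create a random per-device salt and the derived verifier to store."""
    salt = secrets.token_bytes(16)
    verifier = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS
    )
    return salt, verifier


def check(passphrase: str, salt: bytes, verifier: bytes) -> bool:
    """Re-derive from the offered passphrase and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, verifier)


# Usage: the maintenance passphrase is chosen per unit at installation.
salt, verifier = enroll("per-unit maintenance passphrase")
print(check("per-unit maintenance passphrase", salt, verifier))
print(check("debug", salt, verifier))
```

Nothing secret appears in the code itself, so reverse-engineering the software no longer yields a working credential for every installed unit.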

Summary: IOActive and US-CERT have said “you’re using risky or broken crypto, and you’re hard-coding authentication”. Vendor and (implicitly) airworthiness regulator have said “so what?”. End of Story, probably.

None of this is to say that airworthiness certification always gets it right. Indeed, it is clear that every so often it is gotten wrong. But it is a lot more effective than what people without any experience of it seem to be assuming in discussion.

Don Hudson and PBL on the ITU’s proposal for real-time flight data transmission

10 04 2014

The International Telecommunication Union has been conducting its four-yearly meeting. Its president has apparently promised to make possible the real-time transmission of flight data from commercial transport aircraft in flight. This has been supported by the Malaysian delegate. This is all according to a news report by Vineeta Shetty from Dubai: MH370: ITU Commits to Integrate Flight Data Recorders with Big Data and Cloud.

Captain Don Hudson has been a valued colleague for well over a decade. He has some 20,000 or so flying hours, has flown a variety of transports, including Lockheed L1011, Airbus A320 and varieties of A330 and A340 machines and is an active professional pilot although formally retired from scheduled airline flying. While with Air Canada he contributed significantly to the development of the airline’s safety-management and flight-quality systems while he was a captain on intercontinental flights.

Don points out, as have others, that the technology exists to do what the ITU proposes. However, he finds the proposal problematic, as do I, and economically and from a safety point of view barely justified, if at all.

It almost goes without saying that people expert in the standardisation of telecommunications are not necessarily expert with the human and organisational factors involved with aviation safety programs. Don is expert. I recommend that ITU delegates read what he has to say below. Some comments of mine follow.

Captain Don Hudson: Response to the ITU’s proposal for real-time flight data transmission

Some important issues have not been addressed in the ITU’s suggestions.

Aside from any commercial priorities and processing/storage/retrieval issues regarding DFDR and CVR data, a number of important issues are not addressed in this announcement.

I suspect that each individual country would “own” their carriers’ data. Given the difficulties with establishing even a “local” distributed archive of flight data within one country, even purely for safety purposes and with limited access, I doubt that such a flight-data archive will be hosted by a world body anytime soon. Within such a proposed arrangement lie a number of serious impediments to international negotiation, not the least of which is the potential for legal prosecution of one country’s flight crews by other countries. Such data could be a legal goldmine for various interested parties and that is not what flight data recorders and CVRs are for. Their purpose is expressly for flight safety.

I submit that such a suggestion to stream flight (and CVR?!) data would be an initiative to solve the wrong problem – the one of disappearance, of which there have been only two very recent cases among millions of flights and billions of passengers, all carried under a safety record that is enviable by any standards.

The main problem that many are trying to come to grips with is certainly real. We needed to know what occurred to AF447 and the results of that knowledge have materially changed the industry in many, many ways. We need to know what happened on board, and to, MH370.

What makes more sense, in place of a wholesale sending of hour-by-hour flight data from every flight at all times, is a monitoring function something along the lines of the work Flight Data Monitoring people perform on a regular, routine basis, but do so on-board the aircraft, using a set of sub-routines which examine changes in aircraft and aircraft system behaviours to assess risk on a real-time basis.

Flight data analysts look for differences-from-normal – a rapid change in steady states or a gradual change towards specified boundaries, without return to normal within expected durations. It is possible to define a set of normal behaviours from which such differences can trigger a higher rate of capture of key parameters. This approach is already in use on the B787 for specific system failures. A satellite transmission streams the data set until either the differences no longer exist or end-of-flight.

Flight data quantity from aircraft such as the B777 is in the neighbourhood of 3 – 4 MB per flight hour. Most events prior to an accident and relevant to that accident are moments to minutes in duration.

The two industry events which have triggered the ITU interest were a rapid departure from cruise altitude & cruise Mach (AF447) and MH370, in which a sudden change in aircraft routing coincided with a loss in normal air-ground routine transmission by automated equipment (transponder, ADS-B). Both these events lasted moments and would be events that would initiate data-capture and transmission in my proposed scenario. In the MH370 case the transmission would remain active until end-of-flight. If the AF447 aircraft had been recovered from the stall and had made destination, the data would still be in place off-aircraft. Loss of satellite signal occurred on AF447 but the problem would not have prevented an initial data stream (See the BEA Interim Report No. 2 on AF447, p39).

The flight phases defined as “critical” by AF447 and MH370 are from the top-of-climb to the top-of-descent phases, in other words, the cruise phase. From the takeoff phase through the climb to cruise altitude and the descent/approach and landing, no such need for this kind of system exists, because any accident site is going to be within about 150nm or about one-half-hour’s flight time from departure and arrival.

An “on-condition” data transmission would be more practical and cheaper than full-time transmission of flight data, which would bring the notions expressed by many regarding these issues a bit closer to implementation.

Besides flight data, there is cockpit voice recording (CVR). The data issues with CVR transmission require a parallel and separate examination.

Don Hudson

[End of Hudson's Essay]
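
The on-condition capture Don describes can be sketched in a few lines. This is only an illustration under my own assumptions: the `Sample` fields, the tolerance bands and the grace period are invented values, and a real on-board monitor would watch many more parameters than altitude and Mach. Transmission starts once a deviation from the cruise state has persisted for the grace period, and stops when the parameters return to normal or the data ends.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: float            # seconds since start of monitoring
    altitude_ft: float
    mach: float


def on_condition_stream(samples, cruise_alt_ft, cruise_mach,
                        alt_tol_ft=300.0, mach_tol=0.03, grace_s=10.0):
    """Yield the samples that would be transmitted off-aircraft.

    A sample is sent once some parameter has been outside its tolerance
    band continuously for at least grace_s seconds. Sending stops as soon
    as all parameters are back inside their bands.
    """
    out_since = None  # time at which the current deviation began
    for s in samples:
        deviating = (abs(s.altitude_ft - cruise_alt_ft) > alt_tol_ft
                     or abs(s.mach - cruise_mach) > mach_tol)
        if not deviating:
            out_since = None
            continue
        if out_since is None:
            out_since = s.t
        if s.t - out_since >= grace_s:
            yield s


# Usage: a descent away from cruise altitude that later recovers.
flight = [Sample(t, 35000.0 if t < 10 or t >= 30 else 33000.0, 0.82)
          for t in range(0, 35, 5)]
sent = [s.t for s in on_condition_stream(flight, 35000.0, 0.82)]
print(sent)
```

Only the samples from the persistent deviation are sent, which is why the approach needs far less satellite capacity than continuous streaming.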

PBL Comments on Hudson

Concerning the ITU’s statement, I fail to see what either big-data analysis or cloud computing have to do with real-time data transmission from aircraft in flight. I suspect an infection by buzzword.

The ITU is suspected by many people concerned with the Internet, and certainly by most in the IETF, to be a political body more interested in control than in enabling technology. This current announcement does seem opportunistic, and, as one can see by connection to “cloud” and “big data”, some of its senior officers apparently don’t really understand the technology which they want to regulate.

There are further questions which Don does not address above.

Who is going to pay for it, and will the expense be justified? The satellite companies will likely support the proposal gleefully (see the article’s comment on Inmarsat), given that through it they would be rolling in cash. But where shall that cash come from in a boom-and-bust industry such as airlines? Likely from the passengers through an increase in fares. So one would be asking passengers to pay for a service on their flight from which the rest of the world would only benefit if the flight crashed and said passengers died. That seems to be stretching everyday altruism to its limits.

As Don points out, such a proposal would have helped in precisely two cases in five years. But Air France was found, at a cost according to this New York Times article of €115m, and it currently looks as if MH 370 will also be found, so someone will be able to estimate the cost of that. A cost-benefit analysis (CBA) is thus possible, though I guess there would be lengthy argument over the components of the calculations. With a CBA a decision on implementation would come down to attempting to reduce, respectively to raise, the cash paid to satellite companies, and that seems to me to be a commercial issue and not one which governments would or should care to resolve in the absence of demonstrated need. I doubt governments would want to end up paying out of taxes. Surely, any individual government would prefer to put such resources into improving its surveillance capability, and expect to use those covertly in the very rare cases such as MH 370?

To the main point. How would real-time data transmission kit have helped in the search for MH 370?

Likely it would not have helped at all, if the current hypothesis of deliberate human action is validated. Such a system, like the transponder and ADS-B, can be turned off. For reasons of basic electrical safety, a discipline established over 100 years ago, you have to be able to turn off any given electrical circuit in flight. You can make it easier or harder, but no certifiable design can allow that it be prevented. Thus no such system is resilient against intentional human action.


Hijacking a Boeing 777 Electronically

17 03 2014

John Downer pointed me to an article in the Sunday Express, which appears to be one of their most-read: World’s first cyber hijack: was missing Malaysia Airlines plane hacked with mobile phone? by James Fielding and Stuart Winter.

The answer is no. To see why, read on.

The authors interviewed a Dr. Sally Leivesley, who is said to run “her own company training businesses and governments to counter terrorist attacks” and is “a former Home Office scientific adviser”.

…Dr Sally Leivesley said last night: “It might well be the world’s first cyber hijack.”

Dr Leivesley, a former Home Office scientific adviser, said the hackers could change the plane’s speed, altitude and direction by sending radio signals to its flight management system. It could then be landed or made to crash by remote control.

Apparently Ms. Leivesley thinks one can hijack the Flight Management System on a Boeing 777 with a mobile phone.

First point. Suppose you could. When the pilots noted that the aircraft was not doing what they wanted, they would turn off the Flight Management System. Problem solved. It’s nonsense to suggest that the aircraft “could then be landed or made to crash by remote control”.

One needs to make the distinction, clear to people who know anything about aviation systems, between the Flight Management System (FMS) and the Flight Control System (FCS). If you could take over the Flight Control System you would be able to make the airplane land or crash. The Boeing 777 is an aircraft with a computer-controlled FCS, so it is reasonable to ask whether it is vulnerable. Indeed, I did, on a private list which includes the designers of the AIMS critical-data bus (a Honeywell SAFEbus, standardised as ARINC 659; ARINC is an organisation which standardises aviation electronic communication technologies). The FMS and other systems on the Boeing 777 such as the Electronic Flight Instrument System (EFIS) and the Engine Indicating and Crew Alerting System (EICAS) use the AIMS bus to transfer critical data and control commands (from the LRMs, the computers processing the data on the AIMS bus) between components. A good generally-accessible reference is Digital Avionics Systems, Second Edition by Cary R. Spitzer, McGraw-Hill, 1993.

Second point. Both FMS and FCS are electronically well-shielded. They have to be – it is part of their certification requirements. They are not vulnerable to picking up signals from a device such as a mobile phone. In fact, there are laboratories where you can park an airplane in the middle of banks of very powerful electromagnetic radiators and irradiate it, to see whether your electronics are shielded. These installations are mostly used for military aircraft, against which aggressors might use such powerful radiators, but they are used by civilian aircraft too.

Third point. Any communication requires a receiver. If you want an electronic system S to pick up and process signals from a radiating device such as a mobile phone, there has to be a receptive device attached to S. So anyone wanting to put spoof data including control commands on a bus must do so through such a receptive device. Either there is one there already (such as in one of the components sharing the bus) or someone has to insert one (a physical tap). And it has to “speak” the communications protocol used by, in this proposed case, an intruding mobile phone. As far as I know, none of the usual components sharing the critical buses on the Boeing 777 has a mobile-phone-communications receiver/transmitter, but some airlines do attach such devices to their systems so that they can download data from the Quick Access Recorder (QAR, a flight-data recording device, such as a “black box” but with fewer parameters and not a critical component) at destination airports after a flight, for quality management purposes. As far as I know, a QAR is a passive device that is not able itself to put data on the bus, so hacking the QAR through its transmitter wouldn’t give you access.

Fourth Point. Could you pre-install a receiving device somewhere on the buses, to allow someone in flight to communicate with the bus, to perform what is called a “man in the middle” (MitM) attack? An MitM spoofs one or more components transferring data on the bus. Well, theoretically someone on the ground with maintenance-type access could install such a component, but let’s ask what it then can do. The AIMS bus carries pure data between components who know what the data means. The bus is clocked and slotted, which means that data is transferred between components according to a global schedule in specific time slots; the bus time is given by a clock, of which values all bus users have to be aware. Components communicate according to certain “timing constraints”, that is, slot positions, which constraints/positions are only “known” to the components themselves. So to spoof any component you need to reverse-engineer the source code which drives that component to find out what its specific “timing constraints” are. So you need the source code.

Not only that, but the AIMS bus, for example, runs at 30 Mb/s (million bits per second), and there is a 2-bit gap between words on the bus; that is, one fifteen-millionth of a second. It is questionable whether the available mobile-phone protocols are fast enough. The fastest protocol in the CDMA2000 family is EV-DO Rev. B at 15.67 Mb/s, so no; there is no time to perform computations to determine a “gap” and start putting your bits on the bus. (Basic CDMA2000 is 144 kb/s, extendable to 307 kb/s; two orders of magnitude slower.) The HSPA+ protocol, from another mobile-phone protocol family, gets 28 Mb/s upstream and 22 Mb/s downstream, so has half a chance of being compatible. But, really, to synchronise with something running at 30 Mb/s and a 2-bit sync gap you’d need something running at more like double that speed, I should think. The wireless computer-network protocols IEEE 802.11g or 802.11n could do it in their faster versions. You’d need a device (and receiver) speaking these (why would it be a mobile phone?).
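
The arithmetic behind these figures is easy to check. The sketch below uses only the rates quoted in the paragraph above; the names `gap_s` and `bits_in_gap` are mine, and the calculation deliberately ignores latency, framing and processing time, all of which would make matters worse for an intruder.

```python
BUS_RATE_BPS = 30e6   # AIMS bus rate quoted above
GAP_BITS = 2          # gap between words on the bus, quoted above

# Duration of the gap in seconds: 2 bits at 30 Mb/s.
gap_s = GAP_BITS / BUS_RATE_BPS

# Peak link rates as quoted in the text, in bits per second.
links = {
    "CDMA2000 (basic)": 144e3,
    "EV-DO Rev. B": 15.67e6,
    "HSPA+ (28 Mb/s figure)": 28e6,
}


def bits_in_gap(rate_bps: float) -> float:
    """Bits a link at rate_bps can move during one inter-word gap."""
    return rate_bps * gap_s


print(f"gap = {gap_s * 1e9:.1f} ns")
for name, rate in links.items():
    print(f"{name}: {bits_in_gap(rate):.3f} bits per gap")
```

None of the mobile-phone links quoted can move even two bits in the time the bus leaves free, which is the sense in which they cannot keep up.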

To visualise what has to happen, imagine trying to merge with traffic travelling on a densely-packed motorway at rush hour – on your bicycle. Even if you are going fast, you’re not likely to be successful.

Then you need to figure out, not only the timing constraints, but what data you want to put on the bus. Again, to know this, you need to reverse-engineer the source code for the components you want to spoof. Indeed, to put the spoof data on the bus consistent with the timing constraints for the component you are spoofing, it would probably be easier to have stolen the SW and use that to satisfy the very narrow timing constraints. All millions of lines of it.

Rather than tap in to the bus for a MitM attack, it would seem more reasonable to install a receiver/transmitter surreptitiously on one of the LRMs (“line replaceable modules”, the processors in the AIMS cabinet) and swap it in.

Summary: to achieve a MitM attack on the critical buses of the Boeing 777, you would need in advance to modify physically some of the hardware processors which run the bus so that they have a transmitter/receiver for WiFi signals (of the more powerful 802.11 standards family), and someone has to install such a modified LRM on the target aircraft beforehand. Then, the SW for the various components which the attacker intends to spoof must be obtained and reverse-engineered to obtain the timing constraints and the data-interchange formats, many million lines of code in all. That must all be installed on a portable device, probably not a mobile phone, which you then use in flight.

Dr. Leivesley refers to Hugo Teso’s demonstration in Amsterdam a year ago, in which he showed how to transmit messages from a mobile phone to a flight simulator running on a PC, and to modify FMS commands on the simulator via ACARS and ADS-B vulnerabilities. Neat demo, but it didn’t show you could take over a flight control system, as pointed out by many commentators. For one thing, the PC already has relevant communications devices (Bluetooth, WiFi, and one can insert a USB dongle for mobile-phone reception). Second, it’s just flight simulation software. Who knows what kinds of vulnerabilities it has and who cares? It is not the certified SW running on an actual airplane, or on the actual aircraft HW, and has not been developed, analysed and tested to anywhere near the same exacting standards used for flight control software (Design Assurance Level A, according to the RTCA DO-178B standard, to which I believe the Boeing 777 flight control system was certified). Consider even the exacting and resource-intensive MCDC assessment required for DAL A. If you don’t know what MCDC is, let me recommend Kelly Hayhurst’s NASA tutorial. Nobody performs MCDC assessment on PC-based flight simulation SW.

To summarise, the demo was a little like hacking an X-box F1 road race game. It doesn’t mean you can take over Sebastian Vettel’s car in the middle of a real race.

Unlike the authors of the newspaper article, I have put most of these thoughts past a group of experts, including the designers of the SAFEbus. I thank Philippe Domogala, John Downer, Edward Downham, Kevin Driscoll, Ken Hoyme, Don Hudson, Michael Paulitsch, Mark Rogers, Bernd Sieker, and Steve Tockey for a very enlightening discussion.

There has been some thought about whether it is feasible for an interception aircraft with transmission capability to fly formation with MH 370, so that there would only be one blip on primary radar, and accomplish such an electronic takeover. The considerations above would still apply, as far as I can see – you still need to modify the physical HW in advance on the target airplane to allow electronic intrusion from outside.


Pete Seeger

28 01 2014

Pete Seeger died early today. It popped up on my iPad as I was reading the morning news.

There is lots to say about Pete, most of it not by me. The New York Times’s obituary by Jon Pareles does justice to the man. His music speaks for itself. Because, as he would probably say, it’s not his music, it’s our music, of which he was one of the greatest exponents. So here are some samples.

He had a number-one hit in the US before I was born, with a Leadbelly song, Goodnight, Irene. As he wrote in his songbook American Favorite Ballads (Oak, 1961), “six months after Leadbelly died, this song of his sold two million copies on the hit parade.” The Weavers played a reunion concert in Carnegie Hall in 1980, shortly before Lee Hays died. Here is a video of The Weavers singing Goodnight, Irene at the reunion.

There is a splendid version of House of the Rising Sun recorded by Pete in 1958 on American Favorite Ballads volume 2. What a voice! These recordings now belong to the Smithsonian Institution, which took over Folkways Records. Pete writes in his book of the same name (Oak, 1961) that he learned it from Alan Lomax. It’s there as “The Rising Sun” in Alan’s book The Penguin Book of American Folk Songs (Penguin, 1964). The credits say it was originally in Lomax & Lomax’s Our Singing Country (Macmillan, NY, 1941). Lomax says “A ragged Kentucky Mountain girl recorded this modern Southern white song for me in 1937 in Middlesborough, Kentucky, the hardboiled town in the Cumberland Gap on the Tennessee border. This blues song of a lost girl probably derives from some older British piece. At any rate, the house of the Rising Sun occurs in several risqué English songs, and the melody is one of several for the ancient and scandalous ballad Little Musgrave”.

I sing and play in a band, and we sing The Fox, about a fox who steals a goose and a duck out of the farmer’s pen to take home for his cubs to eat: “Daddy, Daddy, you gotta go back again ’cos it must be a mighty fine town-O!” As Pete says (op.cit.), “it’s nice to find the fox for once treated as the hero”. We also sing the ubiquitous song “Rye Whiskey”, otherwise known as “Drunken Hiccoughs” – here is Pete’s version, also on AFB. And just last night we were working on a version of Turn, Turn, Turn, a setting of Ecclesiastes 3:1-8 from the King James Bible, one of the great works of English literature. It turns out to be a very difficult song to sing well, but for Pete it seems to be effortless. There is a version by Judy Collins and of course the Byrds’ Top 20 hit (with a notably slim David Crosby), by which I remember being very struck as a teenager in the 1960s. Fifty years on, the Seeger version stands out as timeless.

So what’s my connection, besides re-singing the music? Thinner than I would like. I never saw, heard or met Pete. But I have a Seeger number of 3, same as my Erdős number. (See the footnote for “Seeger number” – I couldn’t get an internal reference to work.)

How I came by my Seeger number. When I was a kid, I heard Malvina Reynolds’s Little Boxes, which was a big hit on the BBC. Malvina was a collaborator of Pete’s. When I got to Berkeley in the 1970s, I remember visiting San Francisco and going down towards Daly City I saw all these “little boxes … all the same” on the hillside. It turns out that that Daly City development was the exact inspiration for the song. It’s not such a coincidence, more a déjà vu – on the original video for “Little Boxes” played on the BBC there was a photo of these very same houses. Malvina’s accompanist on that video, and at many gigs, was a musician and composer named Janet Smith. I bumped into Janet one day at a music store in Walnut Square in Berkeley. The owner, who I seem to remember was called Mike, was a classical guitarist who had a stock of medieval and baroque sheet music. I had decided to take up playing the recorder again (I prefer the German concept of Blockflöte – block-flute), so I used to go in there every Saturday to look through his stocks and buy what I could. Janet was looking for some folk-type-wind-instrument player to play music she was composing for the Berkeley Repertory Theater’s production of William Saroyan’s My Heart Is In The Highlands. So I got to be the flautist.

That was 1981, I think. My parents visited late in the year, and I took them along to a production. Producing theatre was my Dad’s favorite thing to do as a teacher of English, and they liked the performance. I didn’t say anything about the music. Neither did they. The Rep had spelt my name wrong in the program. I pointed it out at the end. Dad said “Yes, we thought that sounded a bit like you”. I am still unsure whether that was meant as a compliment.

There is a wider scheme of things. Pete is generation two in the tradition of song collection in the field. That started with the availability of electrical recording equipment, specifically with John A. Lomax in the 1930s, aided by his son Alan, who was a near-contemporary of Pete (four years older). As his biography notes, Alan was a founder of the notion of, and collector of, world music. We owe a lot to John and Alan, amongst other things the collection of the Archive of American Folk Song in the US Library of Congress, now part of the Library’s American Folklife Center. Pete and Alan knew each other well, of course.

Such collection, at least for music of the British Isles and Ireland and its continuance in North America, is now more or less over, with the advent of festivals and iPods and iPads and electronic devices in every North American, British and Irish pocket. If there is an unnoted singer in this tradition left anywhere nowadays I would be surprised. But, please, I would be delighted to be surprised!

The US folk-music-archival tradition is not that long. It started with Francis James Child, whose life spanned the nineteenth century and who was the first Professor of English at Harvard (before that, he was Professor of Rhetoric and Oratory). Child researched the folk-poetry tradition. He published in the mid-1800s from his compilations, and realised that most of the work he was publishing stemmed from the Reverend Thomas Percy’s Reliques of Ancient English Poetry, published in 1765. His collaboration with Frederick Furnivall, the founder of the Early English Text Society, turned up a Folio of Percy’s Reliques, and Child started on his 8-volume masterwork, The English and Scottish Popular Ballads. (Scans of all volumes are available through the link.) Child didn’t finish his work – that was left to his successor George Kittredge, who completed the task in 1898. Less than a decade later, John Lomax turned up at Harvard with his interest in folk songs, in particular cowboy songs (Lomax was Texan) and Kittredge encouraged him. When Lomax was back in Texas, he published Cowboy Songs and Other Frontier Ballads.

Child was a literature specialist. He included no tunes. Those were supplied in the 1950s-1970s by Bertrand Harris Bronson, a Professor at the University of California, Berkeley, in his four-volume The Traditional Tunes of the Child Ballads, a reprint of which is again available. I recommend to anyone interested in songs and tunes his smaller one-volume resume, The Singing Tradition of Child’s Popular Ballads. I never met Bronson either, but I used to play blockflute, then fiddle at least once a week in Moe Hirsch’s Tuesday-lunchtime old-time music sessions under a tree on the UC Berkeley campus with Bronson’s assistant Lonnie Herman, a Native American professional folklorist. That’s that for tenuous connections, I promise. I guess they are here because there’s a lot of richness in this world which passes us by until it becomes too late, and there is a lot of that in what I am feeling now.

Pete’s not the only Seeger of note in the field-collection and performance tradition. His half-brother Mike was an avid collector and performer, founder of the New Lost City Ramblers. The song Freight Train, composed by a teenage Elizabeth Cotten, who worked for the Seeger family, was sung by her to Mike. I think the NLCR first published another of our band’s songs, Man of Constant Sorrow, which they got from a 1920s recording of Emry Arthur, who claimed (to someone else) to have written it. They refer to Ralph Stanley’s “G version” of the song as a “classic recording”. It sure is. It’s most recently associated with Dan Tyminski, of the “Soggy Bottom Boys” (that is, singwise, he and Ron Block) in the Coen Brothers’ film O Brother, Where Art Thou?.

Finally, a song from Pete’s songbook that I heard lots on the radio as a child, this time sung by its great exponent Burl Ives: The Big Rock Candy Mountain, which I am listening to as I write. It’s just such a jolly hobo fantasy, composed and then sung by people with nothing other than the clothes on their backs. Music is life. For them. For him. For us, for it’s ours.

** Seeger number: length of the shortest path in the graph of who has jointly performed with whom, with root Pete Seeger.
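For the computationally inclined, the definition above is just a breadth-first search from Pete. Here is a minimal sketch in Python. The edge list is an assumption of mine for illustration: it contains only the chain of joint performances described in this note, and the names `JOINT_PERFORMANCES`, `build_graph` and `seeger_number` are made up for the example.

```python
from collections import deque

# Illustrative edges only: the chain of joint performances described in this note.
JOINT_PERFORMANCES = [
    ("Pete Seeger", "Malvina Reynolds"),
    ("Malvina Reynolds", "Janet Smith"),
    ("Janet Smith", "the author"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (performer, performer) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def seeger_number(graph, person, root="Pete Seeger"):
    """Length of the shortest path from root to person, or None if there is none."""
    if root not in graph:
        return None
    distance = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current == person:
            return distance[current]
        for neighbour in graph[current]:
            if neighbour not in distance:
                # First visit in breadth-first order gives the shortest distance.
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


graph = build_graph(JOINT_PERFORMANCES)
print(seeger_number(graph, "the author"))
```

Anyone not connected to Pete in the graph gets `None` rather than a number; a real version would of course need a far larger edge list than these three pairs.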

A Book on the Fukushima Dai-Ichi Accident

6 12 2013

In August 2011, we held the 11th Bieleschweig Workshop on Systems Engineering. The theme was the accident at the Fukushima Daiichi nuclear power plant.

We have just published a book on it. An Analytical Table of Contents may be found at the end of this note.

I had convened a mailing list in the days after the accident, after receiving a short note from Charles Perrow which he had written in response to a newspaper’s request for comment. He pointed out there was an obvious, indeed known, design feature that left the plant’s emergency electricity generation systems susceptible to flooding, and therefore that this was not a “normal accident” in his sense, but a design-failure accident. The accident clearly had a high organisational/human factors/sociological component, as do many accidents with process plants. The mailing list, which was closed to enable frank discussion, rapidly attracted system safety engineers and sociologists concerned with the safety of engineered systems as they are deployed. Discussion was intense. I surveyed the New York Times, the Washington Post, and the Guardian every day as key sources of information, as well as the BBC News Live Feed, which ran for a month or so, and the daily news reports from the nuclear regulators on technical matters at the Fukushima Dai-Ichi plant.

Indeed, Charles Perrow himself had anticipated the mechanisms of the accident (flooding of the basement taking out the emergency electrical systems and thus rendering cooling systems ineffective) in his 2007 book The Next Catastrophe. Why, I continue to think, is it a sociologist who put his finger right on a hazard which safety engineers had overlooked for some four decades? It does, of course, require a sociologist to answer such a question of organisational weakness.

I became and remain convinced that engineering-safety sociologists are essential partners to engineers in safety matters with high-impact engineering. Their presence is somewhat subdued in those areas with which I have been more concerned, such as rail, air and road transport, but let us hope it can be increased. The first step was to organise a workshop in August 2011 on the Fukushima accident to which system safety engineers, scientists involved in safety, and sociologists concerned with engineering safety were invited. I take workshops seriously: lecturers were asked for 45 minutes of material, given a 90-minute slot, and discussions ran full course.

The University of Bielefeld’s CITEC project, the Excellence Cluster in Cognitive Interaction Technology, which pursues studies in anthropomorphic robotics, generously sponsored the Workshop, thanks to a strong recommendation from its convenor, Helge Ritter, enabling us to bring some stars to Bielefeld as speakers, including Perrow, his colleagues John Downer and Lee Clarke as well as engineering-safety experts Martyn Thomas, Robin Bloomfield and Nancy Leveson. We had some pretty good nosh, sponsored by the UK Safety-Critical Systems Club and Causalis Limited.

The book of essays is now out. The Fukushima Dai-Ichi Accident, ed. Peter Bernard Ladkin, Christoph Goeker and Bernd Sieker, LIT Verlag, Vienna, Berlin, Zürich, Münster, November 2013, 291pp, ISBN 978-3-643-90446-1. List price €39.90. See below for an analytical table of contents.

The book is currently on the WWW page of the publisher, LIT Verlag (there is a language switch button between English and German) and if you click on the image, you get to the product page. The book is also available as a downloadable PDF at a slightly reduced price.

A word about publishing politics. We chose the publisher specifically with a view to (a) keeping the retail price reasonable; and (b) authors retaining our intellectual property. The big scientific publishers generally violate both conditions. For example, apropos (a), a reprint of a single article in a journal from “the big two” scientific publishing firms will cost something similar to the cost of this book. Apropos (b), author contracts for one of those companies require you not only to transfer copyright but also the intellectual property (I personally renegotiated my last contract with this company in 2011. According to the colleagues who assisted that negotiation and who had published with them for twenty years, the company has stopped doing that. My colleagues now self-publish. Their proceedings contain articles from companies as well as academics, and companies do not sign over their intellectual property without compensation – if they are content to do so in a particular case, it’s because what they wrote is anodyne). I don’t agree with either phenomenon and am happy there is an alternative. We availed ourselves of it. My students are now able to afford to buy the book at the student discount of 40%; this is becoming rare in technical subjects. The costs of studying at university are rising, it appears inexorably. A decade ago, having received offers to publish my system safety book, I decided not to publish at a price which I thought students could not afford. It’s taken me that length of time to understand and pursue alternatives. We are very happy to be working with LIT.

LIT Verlag has hitherto been known for its series in the humanities and social sciences. With this book, it is starting a series in engineering, which we hope to continue focusing on engineering in its social context. I am the series editor. If anyone has, or is planning, book-length material which they might wish to publish at a reasonable price, while retaining authors’ intellectual property, please get in touch!

The Fukushima Dai-Ichi Accident: Analytical Table of Contents

Chapter 1: The Fukushima Accident, Peter Bernard Ladkin. Ladkin explains the technical background, the structure of the plant, describes how the severity of a nuclear accident is measured by the IAEA, and comments on what went right and what went wrong in dealing with the events triggered by the Tohoku earthquake and tsunami.

Chapter 2: Hazard Analysis and its Application, Peter Bernard Ladkin. Ladkin explains the background to the safety-engineering technique of hazard analysis (HazAn) in layman’s terms, as well as how one engineers a safety-critical system in general. He compares this ideal picture with what appears to have been done at Fukushima Dai-Ichi and draws some conclusions about safety engineering practice in general.

Chapter 3: The Nuclear Accident at the Fukushima Dai-Ichi Plant: Physics and Parameters, Bernd Sieker. Sieker explains the physics of nuclear power, and then analyses the daily data put out by the operator from its sensors. He concludes inter alia that there was likely only one cooling system at first, and then two, operating for the defueled Units 5 and 6. This suggests a reduction in “defence in depth” (with one system there is no “depth”) which did not cohere with the Japanese self-assessment that Units 5 and 6 suffered an INES Level 0 event. He argues that it should really have been Level 2.

Chapter 4: I’m Warning You, Lee Clarke. Clarke considers the social effectiveness of warnings (and rescinding warnings): what they are meant to do, how they may operate, and whom one trusts in issuing and commenting on warnings. He argues that there is a major question of institutional trust.

Chapter 5: Rationalising the Meltdown, John Downer. Downer argues that, when the public has been assured that a safety-critical system is “safe” and then an accident happens, there are only a very few public responses available to the operators and regulators. He lists and analyses them all.

Chapter 6: Fukushima as a Poster Boy, Charles Perrow. Perrow points out that much about this accident is commonplace or prosaic, and is all but inevitable when we have high concentrations of energy, economic power and political power. He enumerates the resulting phenomena to illustrate this typicality, and the risks we run by indulging it.

Chapter 7: Japan, a Tale of Two Natural Disasters, Stephen Mosley. In this short note, written for a collection of short essays organised by the research group on Communicating Disaster, meeting for the year at Bielefeld’s Centre for Interdisciplinary Research (ZiF), Mosley compares the 1891 Great Nobi earthquake with the 2011 Tohoku earthquake.

Chapter 8: The Destruction of Fukushima Nuclear Power Plant: A Decision Making Catastrophe?, Stefan Strohschneider. In another short note for the ZiF group Communicating Disaster, Strohschneider looks at the decision making in the immediate aftermath and finds it wanting.

Chapter 9: Judging from the color of smoke: What Fukushima tells us about Information Structure Breakdowns and IT development methodologies, Volkmar Pipek and Gunnar Stevens. Pipek and Stevens note that, with all the IT informational systems supposedly available to plant operators, 17 days after the initiation of the accident it was still not clear what had happened and what was going on. They suggest lessons to be learned in the design of such informational systems.

A Fukushima Diary, Peter Bernard Ladkin. Ladkin redacts many of his contributions to the mailing list, as a “diary” of the accident. Many themes arise, from engineering and sociological to political, including the role of the press, of various agencies of various governments and of the UN, along with commentary on the daily reports from the regulator about “progress” in dealing with the accident.

Authors: who they are and what they are professionally known for.

Bibliography: 423 items, most available on the World-Wide Web, although some newspaper reports seem to have disappeared from the public WWW at time of writing, and requests to the newspaper have not yielded replacements.


25 08 2013

When I was in school in the 1960s, detention was what happened to you if you attempted to imitate farting when the French teacher was writing on the board, and he figured out it was you. You spent forty-five minutes after school in a classroom with, quite deliberately, nothing to do.

It turned out to be extremely useful. You learnt thereby how not to be bored, that your own head and its thoughts could keep you occupied.

Next day came the task of analysing with your pals the reason written out by the French teacher, very good training in critical thinking. “Audible expulsion of air through an inappropriate orifice”. That couldn’t be right – burping would have been OK? Must train that ….. “Deliberate audible expulsion of air through a bodily orifice”. That can’t be right either. If you just have to fart, then it can be very uncomfortable if you don’t, as we will advocate to our spouses sixty years later when they note the grandchildren are present. And what if you just failed to make it quiet, even though you tried? There was an unwritten rule that the reason had to be “objective”. “Made a farting noise with his mouth to try to make fun of me while my back was turned” just didn’t hack it, even though 100% true.

All in all, a good education for all of us except the French teacher, who himself probably enjoyed it all too. Not so, though, with all forms of detention.

The past three months have been an uncomfortable revelation. I have spent my life living and working largely in three countries: the UK, California, and Germany, roughly equally. Two of them are involved in substantial surveillance of electronic communications of private citizens of various countries, more than suspected even by many professionals. The third maybe is on some level but has chosen at the government level to protest.

Both the UK and the US defend their activities as necessary to counter terrorism. However, the UK activity is arguably contrary to the 8th article of the European Convention on Human Rights, to which the UK is an original signatory, some 60 years ago. There is no suggestion at present that any other signatory is violating Article 8.

When the revelations about the extent of the surveillance surfaced in June 2013, the British Foreign Secretary, William Hague, said on BBC television “if you are a law-abiding citizen of this country….. you have nothing to fear”. [Ref: the video]

I – we – have heard this so often. We can choose to believe it, or not. In this sense, it is a highly political statement. I don’t believe it, because of my personal experience. You can, in the first place, wonder what “nothing to fear” means. You may not fear that someone is going to put you in jail and throw away the key, but you may well need to worry whether authorities will make your life in certain ways very much harder indeed.

The statement is misleading. If you are detained and questioned by the authorities against your will, it will be claimed that they are trying to decide whether you are law-abiding or not. The people who detain you, of course, do not have that power. The courts do, but there is a long way between being detained and getting a court to rule, and that makes your life harder, as certain journalists for the Sun newspaper can attest. When the court substantiates your claim, the “nothing to fear” phrase, by this time manifestly false, mutates into “justice has been done”.

Mr. Hague should have said, more accurately, that if you are detained and investigated by the authorities on the basis of information gleaned from the surveillance program, and you are later found by the courts to be law-abiding, justice will have been done. Doesn’t sound quite as carefree, does it?

One of the ways in which this process works is this. Suppose you presume yourself to be a law-abiding citizen of another country besides the UK, and you think of the UK as one of the lands of the free. You fly from Germany back home through London Heathrow airport. You are carrying sensitive informational material about which there is considerable dispute that it should be in the public domain. But you are travelling on behalf of two trusted colleagues, one of whom is your life partner, and they are working through a highly respected multinational news organisation headquartered in the UK, through which you are passing. This trust is absolute: your life partner is publicly working on a public project of great significance and your trip is backed and paid for by his widely-respected employer, who has paid considerable, and continuing, public attention to the morality of the entire project. There is no chance that anyone is trying to pull wool over your eyes.

You might worry that Mr. Hague’s statement does not necessarily apply to you, since you are not a UK citizen. But neither are most of the other 244,999,999 people who pass through the UK in transit in a year. You are unlikely to interpret Mr. Hague’s statement in the narrow sense as “citizen of this country… nothing to fear (but anyone else we do what we want with, so watch out!)”.

As you transit in LHR, you are detained by the London Metropolitan Police. They inform you that you are required to answer all their questions, and threaten you with arrest and jail if you don’t. Worse, you are not at all fluent in their language. (You may think you can get by in another language, up until you have an adversarial transaction based on rules about which you don’t have the slightest idea. Happened to me all the time the first few years in my current job.) They won’t let you contact your lawyer. They offer you one chosen by them.

You have lots to fear. You have been threatened with jail. This goes on for eight hours until you can talk to your lawyer. They detain you for another hour, then you are released. Without any of your electronic devices, all of which have been retained.

This is a harrowing experience. I am quite sure that the people conducting it know it is a harrowing experience. It would be reasonable to presume this is part of the point. Whatever the status of this procedure, somebody really doesn’t want you doing it again, whatever “it” is. You think you are couriering material for established journalists on behalf of one of the world’s great news organisations. Someone really doesn’t want you doing that. You think it’s legal. You think you are law-abiding. Your country, indeed, officially protests to the UK.

This all seems outrageous to you, your colleagues and your government. It is an international “incident”. The reinterpretation starts.

The police detained you, questioned you and threatened you with jail if you didn’t answer their questions (which presumably include the passwords for your various personal electronic records, which you had thought were private – but probably not any passwords for any sensitive records you were carrying because you wouldn’t know those). They did so under a power granted by UK law called the Terrorism Act 2000.

This law gives the police the power to detain you without suspicion, specifically to ascertain if you are a terrorist. It allows them to search materials you have with you (for example, all the electronic devices) to that end. It allows them to do so for nine hours. Then, if they haven’t found anything that says you are a terrorist, they must let you go.

Now, surely the authorities aren’t going to do that unless they have some inkling beforehand that you might be a terrorist, are they? That would be – disproportionate, and the law does not allow them to exercise these powers of detention disproportionately.

So, the first part of the reinterpretation. The authorities will be in the course of “trying to determine whether you are a terrorist” for the next little while. They have to do that, for otherwise your detention will clearly have been unlawful. It has been acquiesced in at the highest level of UK government (the government was very careful to explain that acquiescence is the highest level of authority they can exercise over this procedure) – it would be very inconvenient for the government if your detention should be found to be unlawful. So don’t be surprised if you should turn out to be a terrorist, unbeknownst to you.

The second part of the reinterpretation. Just for good measure, there are not one but two grounds for your detention, according to the Home Secretary. One is to determine whether you are a terrorist. The other is to determine whether you are carrying materials whose use by a third party would endanger UK national security. In this case, that would be the information on the electronic devices you were carrying.

The law is clear, as explained by a former government minister who helped draft it and pass it. You may be detained and searched to determine whether you are a terrorist. Full stop.

So, now, expect that the definition of “terrorist” henceforth will include “someone carrying materials whose use by a third party would endanger UK national security”.

Back to Mr. Hague’s statement that “citizens of this country…. have nothing to fear.” Yes, we do. It is that we might be carrying informational materials through a UK airport which, unsuspected by us, would endanger UK national security and therefore, unsuspected by us, we are terrorists as lately clarified.

Who determines whether some material “would endanger UK national security”? Courts convening in secret, now, thanks to recent legislation. People such as myself who think they themselves generally perform public service (my day job, in fact), and know they are carrying material which they wouldn’t want some authorities to see, could find themselves detained, declared “terrorist”, and be unable to find out why, because the reasons are secret.

Suppose you are a lawyer working on a case against one of the UK’s “strategic industries” (we don’t have any – besides surveillance, that is – but I think it is about time we did, just as Germany and France tacitly do). You have material relevant to your client’s case, which you think is good, even though others, including some in the government broadly interpreted (those “supporting” that “strategic industry”), think it isn’t. You go to get on the Eurostar and … zap!… nine hours’ detention (shortly, we are told, to be six), forced to answer each and every question put to you on pain of arrest and jail, and all your kit is retained. You are forced to give passwords to your material. Of course, the authorities are not entitled to read it because it is legally privileged….. so of course they don’t, do they, and because they don’t, of course they can’t pass any of it on to your opponents in that strategic industry, can they. This is Britain after all! But, unfortunately, there are one or two bad apples in any police force ……

The nice Mr. Hague, Mrs. May and Sir Bernard Hogan-Howe wouldn’t let any of that happen, would they. But what of their successors in 10 years? 20 years? When the English Defence League is in government? The point of the law is that we should not have to rely on the beneficence of our politicians to be treated appropriately by the state. I mean “appropriately” here in the supralegal sense of morally decently. What do I mean by morals? I mean that this kind of thing should not, should never, even in principle, be allowed to happen. It must be decisively ruled out in law.

For the moment, though, it seems as if, when UK authorities think you might be carrying information which someone, secretly, could deem detrimental to UK national security for some reason, Mrs. May thinks it’s quite appropriate to detain you for nine hours – shortly to be six – and force you to answer all questions, in any transit area of any UK port, UK citizen or no.

Whatever you think happened last Sunday at LHR, for many professionals engaged in sensitive, even privileged, activities this would seem to contradict Mr. Hague’s assurance that they have nothing to fear.

Saying the Wrong Thing

28 03 2013

The Guardian yesterday wrote an encomium to the UK government’s Chief Scientific Advisor Prof. Sir John Beddington (I hope they don’t mind that I quote in full):

Politics may not be the enemy of scientific method, but they are hardly intimate friends. Science inches along by experiment, evidence and testing (and retesting); politics is often about bold moves executed on personal judgment. So the chief scientific adviser to the government has his or her work cut out. But John Beddington, who has held the post since 2008 and retires this month, has trodden a thin line with grace. Three crises broke on his watch – the Icelandic volcano eruptions, Fukushima and ash dieback disease – and in each he showed a useful caution: compare the political hysteria over Fukushima in Germany with the calm that prevailed here. Mr Beddington has also been an advocate for science, by spearheading the push to install a chief scientist in each Whitehall department. And in raising the alarm about “a perfect storm” of rising population, falling energy resources and food shortages, he did the right and brave thing.

Concerning what he said on Fukushima, I wrote to the ProcEng list on 16.03.2011:

………. (BBC tweet at 1431): “The UK Government’s Chief Scientific Officer, Prof John Beddington, has sought to allay fears of radiation exposure. He told a press conference at the UK embassy in Tokyo: “What I would really re-emphasise is that this is very problematic for the area and the immediate vicinity and one has to have concerns for the people working there. Beyond that 20 or 30 kilometres, it’s really not an issue for health,” he says. The full and very interesting transcript is available on the embassy’s website.”
The key, for those not familiar with British modes of writing, lies in the phrase “very interesting”. I infer that the BBC thinks Beddington['s comment is contentious]. …..

The Guardian cites three well-known events. The Icelandic volcano eruption and the ash-dieback event pose/posed no threat to human life and very little to general human well-being as broadly construed. The British air traffic service provider reaction to the Icelandic volcano eruptions was exemplary, in particular in face of the engineering uncertainty and the pressure from the airlines.

However, the Fukushima event involved some considerable danger to people. He got that wrong, contrary to what The Guardian suggests. At the time he was making his soothing statement above, the Japanese government itself, extremely concerned about the lack of reliable information on the accident it was receiving from TEPCO, was discussing plans to evacuate Tokyo. And not even TEPCO had an accurate idea of how dangerous the circumstances were. The event at Fukushima, as we now know, could have been very much worse than it was and is, and, even though we were spared the very worst, it still could be worse than we think. Sir John, a population biologist and not a safety engineer, was inadvertently misleading his audience on a matter concerning danger.

That is, of course, one of the disadvantages of the job, when one must make public pronouncements on matters on which one is not especially expert. But I wonder why he had not received better advice?

Moving on, it is hard to leave this particular comment of The Guardian alone:

compare the political hysteria over Fukushima in Germany with the calm that prevailed here

The Guardian calling the German reaction “political hysteria” is just silly. There is considerable and long-standing political opposition to nuclear power here in Germany, including a permanent platform from a major party which has been in government, namely the Green Party. Chancellor Merkel simply adopted the Green Party platform, whereas her party had previously been “for” continued use and further building of nuclear power stations. That is normal democratic, opportunistic, representative politics. Considering that the building and use of nuclear power stations involves large amounts of taxpayers’ money being paid to private corporations – in Germany’s case, to assure them a “reasonable profit” to which they claim they have a legal right – there is a moral obligation for politicians to pay significant attention to what ordinary people think on the matter, and some evidence that, apart from the Green Party, they had not been doing so. (A more detailed comment from TheRealPM is on the Guardian page.)

Lest we forget, nobody, not even Germany, has solved the problem of what to do with the waste. It’s fifty years and counting. Someone will have to think of something soon.

Root Cause Analysis

5 02 2013

The International Electrotechnical Commission, IEC, is currently preparing an international standard to be known as IEC 62740 Root Cause Analysis. I prepared some material for potential inclusion in the standards document but as of writing it appears it will not be used. I think it is quite useful, so I hereby make it available.

The paper on the RVS WWW-site, Root Cause Analysis: Terms and Definitions, Accimaps, MES, SOL and WBA, consists of

  • a vocabulary I put together defining the terms I think are needed to talk effectively about root-causal analysis, based on the International Electrotechnical Vocabulary, IEC 60050, which all international electrotechnical standards are required to use. I am not completely happy with a variety of the definitions of fundamental concepts in the IEV. I make my discontent clear through notes which I have added to the IEV definitions. Other concepts are new, and not (yet) in the IEV. Readers might like to compare with the vocabulary which I prepared in 2008 for system safety under the auspices of Causalis Limited, Definitions for Safety Engineering.
  • Brief introductions to the root cause analysis methods for accidents, Accimaps (from Jens Rasmussen, successfully applied by Andrew Hopkins and now the Australian Transport Safety Bureau), Multilevel Event Sequencing (MES, from Ludwig Benner, Jr., formerly of the US National Transportation Safety Board), Safety through Organisational Learning (SOL, from Babette Fahlbruch and SOL-VE GmbH, used in the German and Swiss nuclear industries), and Why-Because Analysis (WBA, originated by me and developed by colleagues at Uni Bielefeld RVS and Causalis Limited, used by two divisions of Siemens and now the German Railways DB, as well as Causalis for its accident analyses for clients). Each method description includes pictures, so readers get an idea of the presentation of results, a short section on process – what one does, and a section on strengths and limitations.

I think it would be a good thing to have similar descriptions for all methods in current industrial use for root cause analysis of significant incidents. My personal list of such methods stands currently as follows:

  • Accimaps (in the document)
  • Barrier Analysis. BA is really an a priori method favored in the process industries, but also used post hoc to determine which barriers failed and why. Typified in Reason’s “Swiss Cheese” diagram.
  • Causes-Tree Method (CTM). Widespread and, I am told, sometimes legally required in France for accident analysis.
  • Events and Causal Factors (ECF) Analysis and Diagrams. ECF is dealt with extensively in Chris Johnson’s Failure in Safety-Critical Systems: A Handbook of Accident and Incident Reporting
  • Fault Tree Analysis (FTA). I had considered FTA primarily an ab-initio risk-analysis method at system design, but Nancy Leveson tells me she has seen more root cause analysis performed with the help of fault trees, sometimes put together after an incident rather than pre-existing, than with any other technique.
  • Fishbone or Ishikawa Diagrams. These are minimally a method, more a presentation technique, and not one I find particularly helpful. More applicable in industrial quality control than in significant-incident analysis, I would think.
  • Multilevel Event Sequencing (MES, and its associated technique STEP), in the document
  • The Reason Model of human operational analysis, involving human error in operations, classification such as skill-based, rule-based and knowledge-based operations (SRK), the notion of latent errors, or misdesign of operations allowing mishap sequences to occur normally, the “Swiss Cheese” model.
  • Safety through Organisational Learning (SOL, with its associated toolset SOL-VE), in the document.
  • STAMP and its associated methods, Leveson’s feedback-control-system model of critical-operational control, applied to the Rasmussen-Svedung hierarchy of operational, organisational and institutional context, dealt with extensively on Nancy Leveson’s WWW site
  • TRIPOD, a method developed over many years by oil companies in cooperation with Jim Reason’s group, and in wide use in the oil industry
  • Why-Because Analysis (WBA), in the document.

Besides these, there are special methods for root cause analysis of incidents involving human operations; maybe one can call these “human factors root cause analysis” methods. Amongst these are:

  • Connectionism Assessment of Human Reliability, CAHR, from Oliver Sträter’s group at Kassel, which has been used in analysing marine accidents and incidents.
  • Human Information-Processing Models. These originated with Peter Lindsay and Don Norman, and include methods sometimes used by NASA’s human factors research group (NASA Ames, at Moffett Field in California). Our PARDIA classification is such a model.
  • Human Factors Analysis and Classification System (HFACS).
  • Management Oversight and Risk Tree (MORT), developed by William Johnson for the US Nuclear Regulatory Commission and widely used in the US nuclear industry.
  • The SHEL model (note that the referenced page spells it mistakenly with two “l”s).
  • Shorrock and Kirwan’s TRACEr model for identifying and classifying cognitive error in air traffic management and control operations. For example, see this paper.

There are other promising methods which I could include, but I don’t know how much industrial “traction” they yet have. If readers could let me know of other worthwhile methods which have found some foothold in industry, I would be grateful. I would be even more grateful for descriptions of methods similar to those that are already in the document! Authorship will of course be acknowledged in the usual manner.

The State of Modus Ponens and of Rational Discussion

28 12 2012

A bit of intellectual biography, prompted by a couple of days’ free time leading me to a paper written 27 years ago by a pal, which I have just read. I say a little of what’s in the paper, to encourage others to read it. And then I comment on a couple of disappointing aspects of the WWW, and of academic work here in Bielefeld.

I am reading a collection of papers on The Law of Non-Contradiction, edited by Priest, Beall and Armor-Garb (Oxford University Press, 2004, reprinted 2011), for a seminar I offer on the subject of paraconsistent logics. Amongst them is a paper by Vann McGee, an MIT logician and philosopher, on Frank Ramsey’s Dialethism. Dialethism is the position that there are logically incompatible assertions that are true. In this case, says McGee, “…sometimes Ramsey is willing to count each of two classically logically incompatible theories as true”.

I am interested in such phenomena because I am interested in reasoning in general, and have been induced by a Bielefeld student, Daniel Milne, who has been following such matters for some time, to become interested in reasoning about and reasoning in stories – fiction. For, one could say, one of the ways in which much fiction works is to induce us to reason about situations invented by the author, who may well not be constrained in general by the “laws” of reasoning applying to the physical world. One can imagine a story in which, while I am sitting listening to you present a paper on dialethism in my seminar, you are simultaneously off waterboarding a tax collector. You cannot be in two different places at once physically, but in a story you can be so without the story appearing to be incoherent. But other stories are incoherent – think of Finnegans Wake. How to mark a difference?

Further, many stories involve people and objects which are not real, which are invented. Do these people and objects “exist” in some way? If so, then certainly not in the way in which you or I exist, for we are “real”, “actual”, or however you might like to describe us, and the invented entities not – they don’t have an address or an ID card or pay radio licence fees and nobody is going to go looking for them to insist they sign up for any of these. But we can’t just say “anything goes”, that we can reason about such invented entities any way we like. What about that superficial referring phrase itself, “invented entities”? Does it refer? In one sense, obviously it does: you know exactly what I am talking about, because I told you: things invented by people to occur in stories they write. In Fregean logic, modern formal logic, or on Russell’s interpretation of putatively-referring terms, however, the term doesn’t refer. But singular or plural terms in “classical” formal logic (that is, the post-Frege traditional complete formulations of propositional and predicate logic) must refer. What, then, does this term do and how does it do it? And what logic, if definitely not classical as just noted, is involved in reasoning concerning it? Say, in helping to explain what this very paragraph means?

I was in grad school at Berkeley with Vann McGee, who entered a year later than I did – or was it two years? We were in the Group in Logic and Methodology of Science, started by Tarski and having 15 or so graduate students pursuing PhDs, and about four times that many faculty members, some of whom we never saw, such as the game theorist John Harsanyi, who was to win a Nobel Prize. Vann entered at the same time, I think, as Shaughan Lavine, a loquacious logician interested in physics – at that time Shaughan wanted to solve the riddle of the intellectual incoherence of quantum mechanics, but thought it would take decades and thus the enterprise couldn’t really start until one had tenure, so he considered it wise to pick lower-hanging fruit for PhD and pre-tenure work. Shaughan left Berkeley after a couple of years because he didn’t see how it was actually possible to get a PhD degree in the Group in the environment prevailing in the 1970s. He worked as an editor for the Physical Review, and came back at the end of the decade to work on a technical problem in mathematical model theory which he thought he could crack in a couple of years (in fact, it took him eight more years, underlining the accuracy of his earlier observation).

I was very interested in mathematical logic. In fact, I came to Berkeley being most interested in the Scott semantics for the lambda calculus, but I found nobody else there interested in them, except a young Japanese scholar, Reiji Nakajima, working with a temporary faculty member in the Computer Science Department who was applying it to programming languages with recursive constructs, so I thought. My interest in computation was theoretical – Turing machines, recursive functions and the like, and I came from a university which had a research group in programming languages, but no department doing work in computing science – the Oxford Computing Laboratory was largely for people who wanted to solve applied-math problems numerically and I was utterly uninterested in those at the time (things would change!). I classified the Berkeley Computer Science Department in that shoebox – one of the many and varied intellectual mistakes I have made in my career, and this one took me a decade to correct.

Even then, I think I was equally or more interested in questions in philosophical logic than in set theory or model theory, but there were more people doing the hard math and I didn’t think you could get a job doing philosophical logic. Further, the math seemed “hard” and the philosophical logic “soft”. The math was hard – it proved too hard for me in the end. But I was worried about career prospects in philosophical logic. I knew some physicists in the mid-1970s who had told me that at that time there was just one tenure-track academic job in theoretical physics offered in the whole of the US. I thought philosophical logic was going the same way. So rather than follow my inclinations away from math, I went into it even more – I even taught myself and taught others numerical analysis (at both the undergrad and grad levels) because I thought I’d have more chance of a job doing something related to what I enjoyed. I didn’t realise that the South Bay was about to explode into Silicon Valley and help logic become one of the largest applications of mathematics after calculus and numerical algebra and analysis. But the varied non-logic mathematical skills I learned have proved invaluable to me; I don’t regret at all the time spent developing them.

Back to Vann. Vann was quiet in classes and conversation, but his observations and conjectures were pertinent and incisive when he made them and he was obviously both very clever and very able, as well as giving us the impression of being quite other-worldly. None of us at that time in the mid-1970s knew how to get out of Berkeley with our PhD degree (indeed, the university itself was to recognise the fact that too many clever graduate students were often having too much demanded of them, and was to initiate change), but Vann gave me the impression of not caring about it that much, as long as he could carry on thinking about technical matters in measurement theory and conditionals and all those problems ignored by the logicians in the Math Department. He finished in 1985, having written not only a thesis on Truth and Necessity in Partially Interpreted Languages, but also having done work in the Theory of Measurement (one of Ernie Adams’s interests, as well as one of Pat Suppes’, down the road at Stanford) and in the Logic of Conditionals (the major Adams theme along with Probability Logic). Some of his work on conditionals was published the year he was awarded his PhD, in the Journal of Philosophy, a – some say the – leading journal.

I just read the paper, after 27 years. Which is part of what prompted this note.

Me, I’d gone “applied”, having taught math and computer science at two California State Universities, San Francisco and then Hayward to try to support myself while working on my degree in the copious free time :-( left to me on a full teaching schedule at a teaching university. I managed to reprove a result of Humberstone in algebraic logic without realising it, as Johan van Benthem noted when I explained my result. My resolution to “stop reading and get down to working!” had been taken two papers too soon :-( . “It shows what you can do!” said Johan helpfully, but it didn’t seem any consolation at the time after that couple of years’ work. I got my first real break in mid-1984, with a temporary job in SRI’s Computer Science Lab. That helped me write half a thesis on eliminating quantifiers in naive set theories, but that effort ended some months later when my job ran out. The second break was at my next job, at the Kestrel Institute starting in late 1985, where I was put to work on devising a computational system for reasoning about time. Cordell Green pointed me at James Allen’s work on intervals in interpretation of reasoning about time in natural languages, and I recognised a Relation Algebra, which is something I knew something about in algebraic logic, and about which my pal Roger Maddux knew much more. We got some significant new mathematical results (largely his) as well as data structures and algorithms (largely mine, some implemented). I had a book contract with MIT Press Bradford Books together with Pat Hayes (which remains to this day unrequited), submitted my thesis and was awarded my PhD degree in 1987. My code, written in the now defunct language REFINE, which was very modular and mostly declarative, persuaded me of the value of declarative languages with strong typing and rigorous modularity.
I spent six months writing code to perform calendrical calculations according to my data structure (to computer scientists a “model”, but not to logicians), for the Project Manager part of the Knowledge-Based Software Assistant project of the USAF. I gave the code along with API to the integrator of the KBSA-PM. She spotted one error (a boundary value) inside a couple of hours of testing – and then the code ran seamlessly for demo at AAAI in 1986 and in the KBSA-PM delivered to the Air Force, for the next ?few years? as far as I know. In the last twenty-five years, we have not gone forward much in industrial programming languages. All the issues I was able to avoid seamlessly by using REFINE still occur all the time in the industrial systems I am acquainted with.

Shaughan finished a year later, in 1988. He had solved a major technical problem in admissible model theory and was successful in his job search at the very time that philosophical logic was suffering the fate of physics a decade earlier – I think he got the one tenure-track job in philosophical logic available at the end of the 1980s. He was at Stanford – although I was in Palo Alto at my job most days in the week, I never met up with him there – and then went to Columbia, where he wrote his book Understanding the Infinite (Harvard University Press, 1994, reprinted 1998). I haven’t seen Shaughan for twenty years, nor Vann for thirty.

Man, what a paper that is which Vann published in 1985! A Counterexample to Modus Ponens. Tim Williamson in The Philosophy of Philosophy (Blackwell, 2007) calls Vann a “distinguished logician” while explaining one of these results (see for example, this citation).

Let A and B be things you assert (sentences, say, or propositions or statements, if you believe in those and can say what they are). “Assert” means something like “claim to be true”. Modus Ponendo Ponens is the inference rule whereby, from an assertion of A and an assertion that if A then B, you may infer B.

According to the Stanford Encyclopedia of Philosophy, Aristotle discussed a forerunner of Modus Ponens called Theophrastus, whereby from the premises if something is F, it is G and x is F one may infer x is G. Modus Ponens concerns general assertions, whereas Theophrastus is concerned with objects having properties or characteristics, and properly belongs to the logic of predicates rather than to propositional logic.

So, what is an inference rule? What are you doing when you “infer”? One common explanation, the “classical” explanation (although “classical” here means largely the 150-year-old Fregean tradition) is that asserting A and A implies B or if A then B means you are taking these sentences to be true. Inference then means that you take the third sentence B also to be true on the basis of the truth of the first two. The rule is said to “preserve truth”. A rule of inference which preserves truth is said to be valid.

There are two main ways of formulating the logic of whole sentences, propositional logic. One is to give a set of axioms – a collection of logical truths (sentences guaranteed to be true just in virtue of their form, such as A implies A, or (A and B) implies B) and just two inference rules: Substitution and Modus Ponens. Substitution says you may replace any schematic letter, such as “A” in the two logical truths just given, by any sentence whatever. This is truth-preserving, because the logical truths are so because of their form, not their content, so no matter what “A” is, something of the form “A implies A” will be true. That “no matter what” phrase is another way of expressing Substitution. No one queries Substitution; it is one of the basic mechanisms of logic as truth/assertability according to form and not content. It looks to be significant for this century-and-a-half-long conception of logic that Modus Ponens may not be truth-preserving when “if…then….” is used in natural-language reasoning! The other way of formulating logic consists of giving no axioms, but plenty of rules of inference, indeed some (“introduction” rules and “elimination” rules) for each logical constant. Modus Ponens is the “elimination rule” for the conditional in this formulation. So either way Modus Ponens is key. (The first type of system is popularly ascribed to the German mathematician David Hilbert, the second to the German logician Gerhard Gentzen.)

In fact, when “implies” is taken to be what is called the “material conditional”, Modus Ponens is truth-preserving, as Vann points out. The material conditional is the interpretation of “implies” whereby “A implies B” is taken to be equivalent to saying “either Not-A or B”. One interpretation of logic, one explanation of the meaning, takes the “logical constants” in propositional logic, the connectives “and”, “or”, “implies” and “not” to be purely functions of the truth or falsity of the sentences they combine. This, along with the claim that every sentence whatever either is true or is false, constitutes the basis of what is called classical propositional logic (that is, the common propositional logic since Frege).

It is easy to see that, when “implies” is the material conditional, Modus Ponens is truth-preserving, as follows. You assert A. A is taken to be true. You assert A implies B, that is, either Not-A or B. So this is taken to be true. But you have taken A to be true, so it follows that you cannot take Not-A to be true as well, for you would be contradicting yourself (the so-called Law of Non-Contradiction is another foundational principle of classical logic, but exactly what it means can be questioned – see the more than 240 different variations pointed out by Patrick Grim’s article in the eponymous op. cit.). If the “Not-A” part of the true either Not-A or B isn’t true, then it must be the “B” part that is true. That shows that Modus Ponens is truth-preserving, because B is exactly what Modus Ponens infers from the first two sentences.
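The argument just given can also be checked mechanically by running through all four combinations of truth values for A and B. What follows is only a minimal sketch in Python; the function names material_implies and modus_ponens_preserves_truth are my own labels for illustration, not anything from Vann’s paper.

```python
from itertools import product


def material_implies(a: bool, b: bool) -> bool:
    """Material conditional: 'A implies B' read as 'either Not-A or B'."""
    return (not a) or b


def modus_ponens_preserves_truth(implies) -> bool:
    """Check every truth-value assignment to A and B.

    The rule fails to preserve truth if there is some assignment in which
    A is true and 'A implies B' is true, yet B is false.
    """
    for a, b in product([False, True], repeat=2):
        if a and implies(a, b) and not b:
            return False
    return True


if __name__ == "__main__":
    print(modus_ponens_preserves_truth(material_implies))
```

The check says nothing about “if…then…” as it is used in natural language; it concerns only a connective that is purely a function of the truth values of its parts. Passing in a different truth-functional connective, say one that is always true, makes the check fail.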

People using formal logic in mathematics generally take “implies” to mean the material conditional when they are using logic or talking about it. And they take this to be settled. But they also infer, as a professional activity: they prove theorems from other mathematical “facts” (theorems). It is prima facie apparent that inference of this sort may well not be the same kind of activity as when, looking out from my room, I see your shadow on the street and infer that the sun is shining. For that is defeasible – somebody may have turned a searchlight on you on a cloudy day. Whereas mathematical theorems are not usually taken to be defeasible in the same way – they are taken to be wrong only if their author has made a technical mistake in reasoning, not if the phenomenon they assert is valid but otherwise explained.

When Vann points out apparent counterexamples to Modus Ponens, he is noting that there are conditionals, “if…then…”-statements, in the language we use, and if one is trying to formulate truth-preserving inferences using those notions of implication, then formal Modus Ponens doesn’t preserve truth.

On the face of it, he’s right. “On the face of it” means that the arguments he uses are formally of the Modus Ponens form (except for a couple of minor typographical differences which are assumed to be contingently grammatical and not substantial). The question is how to explain the phenomenon. Vann suggests it is crucial that the “B” part of his counterexamples is itself a conditional. That is, there is an “if ….. then….” as the “then”-part of an “if….then…..”; known as “nested conditionals”.

There is a substantial amount of work on the logic of conditionals. They seem to be quite tricky, so it is really not surprising that phenomena such as Vann identified have remained unnoticed for so long. Ernie Adams wrote an influential eponymous book on the logic of conditionals, published in 1975. David Lewis addressed it in a number of seminal papers as well as a book, Counterfactuals (Harvard U.P./Blackwell’s 1973, reissued Blackwell’s 2001). Jonathan Bennett has an extensive survey of some 380 substantial pages (A Philosophical Guide to Conditionals, Clarendon Press, Oxford, 2003). One locus classicus is a set of papers edited by Frank Jackson (Conditionals, Oxford University Press 1991, unfortunately out of print).

Vann considers also the interpretation of if A then if B then C as if A and B then C and vice versa (he calls this the “law of exportation”, the “law of importation” being the interpretation of the second as the first), and notes that, if these laws are correct interpretations of conditionals, the difficulty is “basic”: that you are stuck with taking “if … then …” to be the material conditional (which it can’t be, because if so there would be no counterexamples to Modus Ponens) or the logically most powerful conditional called “strict implication”, whereby “if A then B” is true only if in every possible world in which A is true, B is also true. Which wouldn’t seem right: “if I have my brown jacket on, then my grey jacket is at the cleaner’s” tells you something about my clothing habits in this world in which we actually live, and tells you nothing about another world, odd but possible, in which I have a pathological hatred specifically of wearing grey jackets and would never do so, even if there were fifty in my closet and I only had my brown one otherwise.

That is a powerful and surprising result.
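For the material conditional, at least, the two readings discussed above do coincide, and this can be confirmed by brute force over the eight truth-value assignments to A, B and C. The sketch below is illustrative only; the names nested, conjoined and equivalent are hypothetical labels of mine, and the check covers the truth-functional reading alone, not the natural-language conditionals which are Vann’s concern.

```python
from itertools import product


def material_implies(a: bool, b: bool) -> bool:
    """Material conditional: 'A implies B' read as 'either Not-A or B'."""
    return (not a) or b


def nested(a: bool, b: bool, c: bool) -> bool:
    """'if A then if B then C' under the material reading."""
    return material_implies(a, material_implies(b, c))


def conjoined(a: bool, b: bool, c: bool) -> bool:
    """'if A and B then C' under the material reading."""
    return material_implies(a and b, c)


def equivalent(f, g, arity: int) -> bool:
    """True if f and g agree on every assignment of truth values."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=arity))


if __name__ == "__main__":
    print(equivalent(nested, conjoined, 3))
```

Both forms reduce to “Not-A or Not-B or C”, which is why they agree on every row of the truth table.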

He goes further, in showing that Robert Stalnaker’s account of a certain kind of conditional called subjunctive or counterfactual conditionals (conditionals in which the antecedent, the part following the “if” and before the “then”, is not actually true but hypothetical) is “inaccurate” (Stalnaker’s account is in A Theory of Conditionals, in Studies in Logical Theory, American Philosophical Quarterly, Monograph 2, 1968, reprinted in Jackson op. cit.). He means wrong, if the law of exportation holds. This is also a significant result, for at the time the Stalnaker and closely-related Lewis semantics for counterfactual conditionals were held to be the best accounts. (They are still the best available for many purposes. Forty years on, we use the Lewis semantics for counterfactual conditionals in my technique for causal analysis of accidents, Why-Because Analysis, where it works very well in the context of complex engineered sociotechnical systems.) The issues with counterfactual conditionals in particular were, I believe, first raised by Nelson Goodman in a paper The Problem of Counterfactual Conditionals, Journal of Philosophy XLIV(5), February 27, 1947, available through JSTOR to those with access. It is also reprinted as Chapter 1 of his book Fact, Fiction and Forecast (Harvard University Press, 1984).

There is much, much more in this short paper. I am so glad I read it finally.

On to my second theme, somewhat distressing. As I have written before, I thought in the mid-1990s that the advent of the World-Wide Web would render the business models of traditional academic publishing obsolete. That hasn’t happened, to my regret as well as sometimes to my annoyance. But the WWW has led to on-line discussions, and there is a variety of software available to format ongoing discussions of any and all subjects on the WWW. Instead of searching out a bunch of like-minded people to meet to discuss raising blue goldfinches, you can find them right there in the blue-goldfinch forum! What a wonderful enrichment of our lives.

I looked for discussion of Vann’s paper. I only found two discussions in forums on the first few pages of the Google search. The second entry in the Google search for the paper was a discussion on TalkRational: A Republic of Free Thought. A “moderator” brings up McGee’s paper in 2010, a quarter-century after publication. Kudos for drawing attention to it, one might think, but consider his or her comment:

(1) I think that the most obvious problem with McGee’s argument is that he equivocating between two radically different ways of construing the relevant statements. Are there any other problems with the argument that you see?
(2)Is Vann McGee retarded? Seriously, is there any reason whatsoever why his argument should be persuasive?

which is partly personally abusive. He or she says in a later note:

McGee has basically become a rock star in philosophical logic because of this argument, too. It’s a pretty tragic statement on the condition of contemporary philosophy.

The discussion goes downhill from there, quite steeply. Most people seem to want to deprecate McGee personally, as the moderator implicitly does.

Such a combination of incomprehension and abuse is unfortunately rife on WWW forums. It doesn’t seem to happen to anything like the same extent on subscription-only e-mailing lists. This is one area in which e-mail seems to serve a function which the WWW does not, contrary to what one might have anticipated. I regret, and am frustrated by, the low standard of such forum discussion. Recall that this is a discussion which appears high on the Google list responding to the query “vann mcgee modus ponens”.

I wish for a different world, a world in which papers and arguments can be presented and discussed on the WWW the way they are presented and discussed in colloquia, conferences and the better journals. We are unfortunately a long way from that.

On to my third theme.

The first PhD to graduate whom I advised in Bielefeld was Thorsten Scherer. Thorsten built a mobile robot to perform lab assays automatically. I became his advisor after his original advisor left Bielefeld and Thorsten didn’t want to follow. His robot worked in a biotechnology lab. It drew samples from a large (industrial-scale) fermenter, which was producing cells, took them to and installed them in a centrifuge, started the centrifuge, removed them when it stopped and took the results to and installed them in an assay machine. These devices were distributed around the lab. Thorsten had developed the robot to such a degree of reliability that it worked at night when nobody was around. It only spilled stuff one time, near the beginning of development.

I was very impressed by this piece of system engineering. Thorsten had put together algorithms – recognition, motion and control algorithms – some of which he had gleaned from the literature and many of which he had devised himself, and had integrated them into a piece of hardware which performed its chosen task to a demonstrated high level of reliability (achieving the task as wished) and safety (avoiding spills, collisions, breakages).

Readers will appreciate that most academic contraptions of this sort are “proof of concept”, that is, their devisers can get it to do what it is supposed to do some of the time, at least once or twice. Adding dependability to such “proof of concept” devices comes out to around ten times as much work, as an industrial rule of thumb. It is very frustrating to those of us who work in the area that, with some notable exceptions, dependability issues are largely ignored in academic computer science, for they are not intellectually trivial. Most of us end up spending far more time talking with industrial engineers than we do with fellow academics.

I thought this superb work, and proposed Thorsten for a summa cum laude designation. So did his second thesis reviewer, his ex-boss. But it was vetoed by the Chair of his committee (as thesis advisor, I could not be Chair) on the basis that he had taken too long – seven years, I think.

Another example. I had an Indonesian scholar in my group, I Made Wiryana. Made’s thesis was on what I would call practical requirements engineering in culturally very different situations from those in the West. Indonesia has many different cultures. Information technology is helpful and very much needed there, but some of the ways we have of engineering these systems just don’t fit the local cultures. Made devised a means of performing dynamic adjustments to sociotechnical system requirements through causal analysis of cultural issues that came up during initial system development and prototyping. Again, unlike most academic work, this was serious “grown-up” engineering. The examples in his thesis included designing and implementing the system to run the blog of the Indonesian president, whom he had personally advised, and designing and implementing the warning-message function associated with the tsunami early-warning system installed with international help after the December 2004 tsunami.

Again, I thought this work worthy of a summa cum laude designation, as indeed Made’s committee decided. But before the defence, I had a brief chat with one of my colleagues, multiple times Dean of our faculty, known for his very effective fund-raising, and now Rector of my university, who opined strongly that it was inappropriate to consider awarding a summa cum laude to someone who had “taken too long” (Made had been working with my group about a decade).

To my mind, the quality of a PhD lies solely in its achievement. Both of these scholars had achieved way beyond what most German PhDs in computer science achieve, in that they had devised and implemented systems with demonstrated dependability. As I noted, that simply takes longer. Made had to work with a number of organisations, including government, to get his results. Anyone setting a clock ticking on government work anywhere is liable to run out of clock batteries.

Why am I saying this here? By means of contrast. Vann took ten or eleven years to get his PhD. Shaughan took 13, as did I. Was that one seminal paper of Vann’s worth ten years, let alone a PhD? In my view yes, most certainly! Read it, and I bet you’ll agree. But in Germany he would have “taken too long”…

Aerial Collision Avoidance

9 12 2012

Just over a decade ago, in July 2002, there was a catastrophic mid-air collision of a Russian passenger aircraft heading westwards and a freighter aircraft of DHL heading northward, near the town of Überlingen on Lake Constance (Bodensee) in Southern Germany near the Swiss border. I wrote a paper on it about a month later, ACAS and the South German Midair, RVS Technical Note RVS-Occ-02-02, on 12 August 2002, in which I suggested that there were issues concerning the verification of the algorithms used in TCAS, as well as the assumptions about cockpit decision-making upon which the successful use of TCAS depends.

In May 2004 the final report of the investigating body, the German BFU, was published. It is 114 pages long in English, without the appendices. There are mistakes in it, one of which I had already anticipated in my August 2002 note. I then wrote a paper based on my 2002 note, which accompanied an Invited Talk I gave at the Ninth Australian Workshop on Safety-Related Programmable Systems in Brisbane, Australia, in 2004, Causal Analysis of the ACAS/TCAS Sociotechnical System. This paper is also available on the Publications page of the RVS WWW site.

Neale Fulton, a colleague at the state research agency CSIRO in Canberra, who has been working on algorithms for proximity/collision avoidance for some years, recently told me of a paper by Peter Brooker, in the journal Safety Science 46(10), December 2008, entitled The Überlingen Accident: Macro-Level Safety Lessons, which refers to my work. That’s four years ago. Brooker apparently says some things about my work.

I haven’t seen the paper. Gone seem to be the old courtesies by which one forwarded a copy of an academic paper to a colleague whose work was discussed. Our library used to subscribe to the journal, until 2002, but I suppose it became too expensive. It is certainly expensive now: the publisher Elsevier wishes to charge me (or my library) €31.50 for this paper of about 15 pages. As I have said before, I don’t agree with the current commercial politics of many academic publishing houses. Not all authors do as I do to ensure that some version of a published paper appears for free on a WWW site under the auspices of the taxpayer-funded organisations who pay me a salary for this work. I hope Professor Brooker will understand me seasonally donating to charity the €31.50 I have saved by not buying his paper.

Brooker says some odd things about my work. Also, in 2008 the TCAS standard was amended. So it seems time to revisit those considerations.

There is now a TCAS II Minimum Operational Performance Standards document, RTCA/DO-185B. There is an FAA Technical Standard Order (TSO), TSO-C119c, and an EASA TSO, ETSO-C119c, corresponding to TCAS II Version 7.1, as it is now called, which includes two changes, detailed in Change Proposals CP112E and CP115, as in this Honeywell white paper. CP112E is directly relevant to the Überlingen accident, as below.

There are three main points which I wish to address again.

First, I pointed out in my 2004/5 paper (Section 3) that use of TCAS played a direct causal role in the accident. To phrase it technically, the use of TCAS was a necessary causal factor in the collision. I proved this by means of the Counterfactual Test. However, amongst the probable causes which the BFU report lays out, this factor is missing. That is a logical mistake.

I still encounter many technical people in aviation who refuse to accept this observation. I fail to understand why the proof is not routinely accepted. Instead, few seem to want to say in public that use of TCAS was a necessary causal factor in the accident. Maybe politics and wishful thinking triumph over logic once again?
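For readers who want to see the shape of the Counterfactual Test, here is a toy sketch in Python. It is only an illustration, not the proof given in the paper. The scenario model is a deliberate simplification and everything in it is an assumption made for the example: the names `Scenario`, `manoeuvres`, `collision` and `is_necessary_causal_factor` are hypothetical, and "the nearest situation in which the factor did not occur" is approximated crudely by flipping one factor and holding the rest fixed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    tcas_in_use: bool       # TCAS issues RAs to both crews
    atc_says_descend: bool  # the controller instructs the Tupolev to descend


def manoeuvres(s):
    """Return (tupolev, boeing) vertical manoeuvres in this toy model."""
    # Tupolev crew (toy assumption): follows an ATC instruction if there is
    # one, else its RA ("climb") if TCAS is in use, else stays level.
    if s.atc_says_descend:
        tupolev = "descend"
    elif s.tcas_in_use:
        tupolev = "climb"
    else:
        tupolev = "level"
    # Boeing crew (toy assumption): follows its RA ("descend") if TCAS is
    # in use, else stays level.
    boeing = "descend" if s.tcas_in_use else "level"
    return tupolev, boeing


def collision(s):
    # Toy assumption: both aircraft start at the same level on crossing
    # tracks, so vertical separation is lost exactly when both manoeuvre alike.
    tupolev, boeing = manoeuvres(s)
    return tupolev == boeing


def is_necessary_causal_factor(actual, factor, outcome):
    """Simplified Counterfactual Test: the factor and the outcome both
    occurred, and with the factor removed the outcome does not occur."""
    if not (getattr(actual, factor) and outcome(actual)):
        return False
    counterfactual = replace(actual, **{factor: False})
    return not outcome(counterfactual)


actual = Scenario(tcas_in_use=True, atc_says_descend=True)
tcas_necessary = is_necessary_causal_factor(actual, "tcas_in_use", collision)
atc_necessary = is_necessary_causal_factor(actual, "atc_says_descend", collision)
```

In this toy model both the use of TCAS and the ATC instruction pass the test. That is as it should be: passing the Counterfactual Test says a factor was necessary, not that it was the only one.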

Second, my Issue 4.1 of the paper concerns the fact that the Reversal RA mechanism apparently did not operate as it should have. I labelled this a requirements problem. The design of the kit did not operate in the way the requirement intended. People have waffled about this too, but here is the BFU report telling us that the failure to issue a Reversal RA was a necessary causal factor of the collision according to the Counterfactual Test:

A Eurocontrol specialist team has analysed the accident based on three TCAS simulations. Three different data sources and two different analysing tools for TCAS II were used. It is the BFU’s opinion that the following important insights can be drawn from the Eurocontrol study:
The analysis confirmed that the TA’s and RA’s in both airplanes were triggered according to the design of the CAS-logic
The simulation and the analysis of the alert sequence showed that the initial RA’s would have ensured a safe vertical separation of both airplanes if both crews had followed the instructions accurately.
Moreover, Eurocontrol conducted a further analysis how TCAS II would have reacted in this case with the modification CP 112 which had already been developed prior to the accident. According to the results provided, TCAS would have generated a Reversal RA after the initial RA which would have led to a sufficient vertical separation of both aircraft if the Boeing B757-200 [the DHL freighter] crew would have reacted according to the Reversal RA.

Despite this clear statement, this necessary causal factor did not appear amongst the causes in Section 3 of the BFU report.

In fact, it was known to Eurocontrol in 2000 that Reversal RAs did not function as desired. In engineering-scientific parlance, the design of TCAS did not fulfil its requirements specification. Eurocontrol filed a change notice with the committee, CP 112, to get this fixed. Two years later, there occurred the Überlingen collision. Two years after the problem was first openly acknowledged. Then there were other near-misses, detailed in the Eurocontrol SIRE+ project. Finally, in 2008, RTCA accepted the amended CP 112+ as well as another Change Proposal, resulting in TCAS II Version 7.1 (some issues are detailed in the document Decision criteria for regulatory measures on TCAS II version 7.1 by Stéphan Chabert & Hervé Drévillon).

The anomaly was known in 2000. A major accident in which it was a causal factor occurred in 2002. The change was made in 2008. I think it is a scandal that it took so long to remedy this anomaly and that so many were killed on the way.

Third, Issue 4.5 of my paper concerned the cognitive state of the operators (the crews) and the decisions they took. I used an analysis method which I called the Rational Cognitive Model (RCM). Intuitively, it works like this. Suppose the operators were replaced by perfect robots with the same cognitive information and programmed with the TCAS operator procedures, as well as algorithms to make decisions according to the information and procedures. What would the robots do? I pointed out that the robots piloting the Russian aircraft might well have chosen to descend, as the Russian crew did, and for which they have been roundly criticised by all and sundry.

I have subsequently looked at various sociotechnical interactions using RCM. A number of them are analysed in Verbal Communication Protocols in Safety-Critical System Operations, a chapter in the Handbook of Technical Communication, Mouton-de Gruyter, 2012. I have also analysed road accidents, including multiple-vehicle pile-ups on motorways in fog, in The Assurance of Cyber-Physical Systems: Auffahr Accidents and Rational Cognitive Model Checking, which was supposed to be a chapter of a book. I applied RCMs subsequently to same-direction road traffic conflicts (as a bicycle rider, and not necessarily a slow one, I have plenty of experience to draw on). The paper is not yet available.

Ten years on, it is instructive to see how far we have come. I suggested that TCAS be verified using Rational Cognitive Model Checking (RCM-checking). RCM-checking consists in enumerating all the configurations which can occur and determining that the desired operator behaviour under decision-making gives the right outcome. I exhibited in my 2002 note and 2004 paper, and again explicitly in the 2012 Handbook chapter, a situation in which this “right outcome” cannot be assured, namely the Überlingen situation. The 2012 Handbook-chapter formalism makes clear this is a (small) finite-state-machine calculation, well within the ability of existing model checkers.
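To give a flavour of what "enumerate the configurations and check the outcome" means, here is a minimal brute-force sketch in Python. It is not the Handbook-chapter formalism and not a real model checker. The state space, the function names and above all the decision rule are assumptions made for illustration: the toy assumes that RAs are coordinated (one crew gets "climb", the other "descend"), that crew B always follows its RA, and that crew A may rationally choose either its RA or an ATC instruction when it has both.

```python
from itertools import product

SENSES = ("climb", "descend")
ATC_INSTRUCTIONS = (None, "climb", "descend")  # None: no instruction given


def opposite(sense):
    return "descend" if sense == "climb" else "climb"


def permissible_actions(ra, atc):
    """Actions a rule-following agent could choose, given what it was told.
    Toy assumption: an RA and an ATC instruction are both legitimate advice."""
    actions = {ra}
    if atc is not None:
        actions.add(atc)
    return actions


def check():
    """Enumerate every configuration and collect those in which some
    permissible choice by crew A leads to both aircraft manoeuvring alike."""
    counterexamples = []
    for ra_a, atc_a in product(SENSES, ATC_INSTRUCTIONS):
        action_b = opposite(ra_a)  # coordinated RA, followed by crew B
        for action_a in sorted(permissible_actions(ra_a, atc_a)):
            if action_a == action_b:  # same manoeuvre: separation not assured
                counterexamples.append((ra_a, atc_a, action_a, action_b))
    return counterexamples


unsafe = check()
```

Each counterexample is a tuple (RA to A, ATC instruction to A, A's action, B's action). In this toy the unsafe cases are exactly those in which the ATC instruction contradicts the RA, which is the pattern of the Überlingen situation. A realistic check would need far richer state, which is where the scaling question comes in.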

However, verifying a specific scenario for correctness or anomaly is clearly easier than running through all possible scenarios to check. Could current automated model-checkers check and verify all such states for a given system such as TCAS? I put this question to John Rushby, who has applied model checking in similar situations (see, say, his paper from 2002 on mode confusion and other automation surprises, of which I saw the original contribution in Liege in 1999). John has been at it three years longer than I, although I did have a go at Ev Palmer’s “Oops” example also using WBA and PARDIA in 1995-6. The latest version of John’s work with Ellen Bass, Karen Feigh and Elsa Gunter is from 2011. John suggested that checking large numbers of RCMs (say, more than 50 or so different scenarios) might well be difficult with current model checkers.

I am disappointed at the meagre take-up of these model-checking approaches to algorithms involving cooperative operator behavior. The technical material involved is not so very hard – every digital engineer nowadays has to deal with FSMs. Maybe a problem lies in that people still do not consider operator procedures subject to the same kinds of verification as other algorithms. Maybe this will change as more and more robots come “on-line” to replace humans in various activities. The safety of their interactions is surely governed by the international standard for functional safety of E/E/PE systems, IEC 61508, although for industrial fixed-base robots a new international standard is being developed. IEC 61508 requires assurance measures; maybe this will prompt interest in verification.

There are apparently still intellectual hurdles to overcome. One seems to lie in persuading people that sociotechnical procedures can be verified in the rigorous way it is (sometimes) done in informatics. Another is apparently to persuade them that this would yield any advantage. Which brings me to Brooker’s paper. Neale sent me an excerpt. Brooker takes exception to what I suggested should be done, namely
1. Check and fix the Reversal RA misfit so that design fulfils requirement.
2. Check the interaction between ACAS and Reduced Vertical Separation Minima (RVSM) more thoroughly.
3. Determine precisely in which circumstances ACAS algorithms are correct, and where they fall short.
4. Deconflict requirements and advice to pilots on use of ACAS.
5. Causally analyse the operator interactions using Rational Cognitive Models and decision theory.
6. Analyse carefully what happens when one actor has a false model of system state.

Brooker’s comment on all this: “some of Ladkin’s recommendations may not be very wise”.


Brooker explains how he comes to this conclusion by means of an analogy. He discusses in a couple of paragraphs a situation in Ancient Rome, whereby bricks would fall off buildings onto or near passers-by. Apparently wives would push their husbands out of the way. He discusses some decision-theoretic aspects of, well, pushing one’s husband out of the way (as opposed, one might think, to pushing him under).

No arguments for relevance of this situation to that of ACAS are proffered.

So I have to look for clues around and about. Brooker says: “Ladkin says that it ‘should be precisely determined in which circumstances ACAS algorithms are correct and in which circumstances they fail.’ But the first task is precisely what has been done under ICAO’s auspices for decades (Carpenter, 2004)”. I take it from this suggestion that Brooker has little idea of what is involved in verifying algorithms, as that term is understood in informatics. And I take it he is not familiar with my work, despite citing me, or that of Rushby.

I recommend that people take a look at Fulton’s work on collision-avoidance to see what such algorithm verification might look like. And, for those who are unfamiliar with it, at Rushby’s and my work to see some ways of verifying procedures involving operator decisions.

As I indicated, I think that the poor TCAS/ACAS engineering standards which were causally involved in the deaths of 70-odd people ten years ago are a scandal, as is the fact that it took a further six years for them to start to be fixed. We are on the way to developing techniques which can be used to avoid such poor engineering in the future. I think that work should be encouraged. I don’t see any point in denigrating that endeavor through facile commentary.