Germanwings Flight 4U 9525

24 03 2015

07:40 CET on Friday 27th March

Two important points today. First, investigators have detailed apparently-deliberate actions by the First Officer to initiate a descent and keep the Captain from reentering the cockpit. Colleagues with some experience have said that it is premature to rule out actions taken in the course of experiencing a stroke (Schlaganfall in German). Second, the workings of the cockpit door locking mechanism, and the policies concerning a pilot leaving the cockpit, have come into question. I explain the operation of the A320 cockpit door locking below.

First, terminology. Everybody is writing “pilot”, “co-pilot”. The usual term is Captain (CAP) and First Officer (FO), referring to the command roles. The term “pilot” informally refers to the person flying the airplane at a given time, known as Pilot Flying (PF). The other cockpit-crew member is the Pilot Non-Flying (PNF). In this incident, the PF appears to have been the FO.

French investigators have said that the Captain left the cockpit, with the First Officer, the Pilot Flying, remaining at the controls, alone in the cockpit. Shortly afterwards a descent was initiated by – I am here interpolating with some knowledge of the A320 – dialing an “open descent” into the FCU (the autopilot control unit just under the glare shield). An A320-rated colleague says you can set, say, “100ft” as target altitude, and activate, and the aircraft will go into open descent with engines at flight idle at about 4,000 feet per minute right down to 100 ft altitude, i.e. the ground here. In other words, twist and pull one knob.

I would emphasise here that such autopilot systems are not unique to the Airbus A320 but are to be found on most commercial transport aircraft nowadays.

Now to the first major issue. Concerning stroke versus deliberate action, a colleague was present when someone 29 years old had a haemorrhagic stroke.

Inside 30 minutes he went from conversing like normal; to weirdly reticent and uncoordinated; to silently sitting on a bed, clutching an aspirin bottle like a crazy person, totally unresponsive to the world. And in that time he managed to open a laptop and hammer-out an email full of utter nonsense, all for reasons that are still totally lost on him.

During such an event, one may well continue “breathing normally”, as the French press conference is reported to have said the First Officer did.

So it seems to be possible that a confused FO in the course of experiencing a stroke dialed an open descent into the FCU, maybe imagining he had to land. I am a little surprised that medical experts have not yet clearly pointed out such phenomena. It does suggest that concluding murder-suicide is premature at this stage.

It has also been suggested that the FO secured the cockpit door against being opened from outside (that is, he activated the third function below). Evidence for this is that the emergency-entry thirty-second buzzer did not sound. Maybe. No one has yet said whether there is evidence that the Captain in fact tried to activate the emergency-entry function via PIN (the second function below). Apparently he knocked on the door and continued to knock; but nothing else has been said.

Second major issue: the cockpit door locking. The cockpit door on the A320 in normal flight is permanently locked. There are three technical functions.

First function: on the central console between the pilots there is a toggle switch which opens the door when it is used: it must be held in “open” position and reverts to door-locked when released. I emphasise: a pilot must “hold the door switch open” for the door to open, and it locks again when he/she releases the switch.

Second function: there is a keypad mounted in the cabin outside the cockpit by the cockpit door. Someone standing outside can “ring” (press a key) to activate a ringing tone in the cockpit. Or, of course, knock on the door. The Pilot Flying (or another person) can then use the first function to open the door, and the person outside can then enter. Suppose that does not happen for some reason. Then the person outside can enter a PIN code into the keypad (he/she must have knowledge of the PIN code). A warning sound activates in the cockpit for thirty seconds, at the end of which the door unlocks for 5 seconds, when the waiting person can enter, and then reverts to locked. This second function addresses the issue of the incapacitation of the pilot or occupation with other urgent tasks.

The third function is a deactivation: by using a switch in the cockpit, the second function can be deactivated for a preselected period of time (the Operating Manual says between five and twenty minutes; colleagues understand that on the Germanwings aircraft it was five minutes). That means that for this period of time, even use of the PIN code outside does not unlock the cockpit door for entry. The cockpit door can still be unlocked during this time by using the first function, the “unlock” toggle switch. This third function addresses the possibility that a hostile person could physically threaten someone outside the cockpit with knowledge of the PIN code (say CAP or FO who went to the toilet in the cabin) in order to gain entry via the second function.
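For readers who prefer to see the interplay of the three functions written down, here is a minimal sketch in Python. It is a toy model of the description above, not Airbus logic: the class name `CockpitDoor`, its methods and the constants are all hypothetical, the five-minute deactivation period is the value colleagues reported for the Germanwings aircraft, and the sketch assumes that a PIN request whose unlock window falls inside the deactivation period simply does not unlock the door.

```python
from dataclasses import dataclass
from typing import Optional

# Timings taken from the description above; names are illustrative only.
EMERGENCY_DELAY_S = 30    # warning sounds in the cockpit for thirty seconds
UNLOCK_WINDOW_S = 5       # door then unlocks for five seconds
LOCKOUT_S = 5 * 60        # deactivation period (five minutes, as reported)


@dataclass
class CockpitDoor:
    """Toy model of the three door functions (an assumption-laden sketch)."""
    toggle_held: bool = False                # first function: switch held in "open"
    pin_entered_at: Optional[float] = None   # second function: time a PIN was accepted
    lockout_until: float = 0.0               # third function: keypad entry disabled until then

    def hold_toggle(self, held: bool) -> None:
        """First function: the door is open only while the switch is held."""
        self.toggle_held = held

    def deny_entry(self, now: float) -> None:
        """Third function: deactivate keypad entry for the preselected period."""
        self.lockout_until = now + LOCKOUT_S

    def enter_pin(self, now: float) -> bool:
        """Second function: returns True if the emergency sequence starts."""
        if now < self.lockout_until:
            return False                     # keypad entry is deactivated
        self.pin_entered_at = now
        return True

    def is_unlocked(self, now: float) -> bool:
        if self.toggle_held:                 # the toggle always works from inside
            return True
        if self.pin_entered_at is None or now < self.lockout_until:
            return False
        elapsed = now - self.pin_entered_at
        return EMERGENCY_DELAY_S <= elapsed < EMERGENCY_DELAY_S + UNLOCK_WINDOW_S


# Usage: PIN entry with no reaction from the cockpit.
door = CockpitDoor()
door.enter_pin(now=0.0)
print(door.is_unlocked(now=10.0), door.is_unlocked(now=32.0), door.is_unlocked(now=40.0))
```

The point the sketch makes is the asymmetry: the toggle inside the cockpit overrides everything, while the keypad outside can be disabled from inside but not the other way round.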

This is the operation of the door locking/unlocking functions on the Airbus A320. We have not checked and compared with other aircraft.

I am told there is a rule in the USA that there must be two crew members in the cockpit at all times. So if CAP or FO leaves, a cabin-crew member must enter and stay until the cockpit crew member returns. This is not necessarily so in European commercial flying. As far as I know it is consistent with Germanwings operating rules that the PNF can leave the cockpit briefly under certain conditions, leaving just the PF within. (I omit discussion here of whys and wherefores.)

It seems almost certain that there will be considerable technical discussion of whether these cockpit-door-locking procedures and rules are appropriate or need to be modified. I observe that the BBC has listed three apparent-murder-suicide events in commercial flight in the last few decades (I do not know of more), and this might be a fourth (I emphasise again the word “might”). And in at least one of those incidents, the cockpit remained accessible to those outside. In contrast, on one day alone in 2001, four cockpit crews were overwhelmed by attackers from the cabin, and since the door-locking rules have been in force, none subsequently have been. And before that day in 2001, there were many instances of hostile takeover of an aircraft (“hijacking”). So arguments for and against particular methods and procedures for locking cockpit doors in flight are not trivial.

Finally, there seems to be a mistake in one of my observations below. The flight path corresponds more nearly to a 6° descent angle. This is steep, but within the normal range. London City airport has an approach glide path of 6°, and A320-series aircraft fly out of there (although, I believe, not the A320 itself). (Calculation, for the nerds like me: 1 nautical mile = about 6,000 ft so 1 nm/hr = about 100 feet per minute (fpm). So 400 knots airspeed = about 40,000 fpm. Flying at 400 kts and descending at 4,000 fpm is a slope of 1 in 10, which corresponds roughly to one-tenth of a radian which is about 6°.)
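The back-of-envelope calculation can be checked in a few lines of Python. This is only a sketch: the function name `descent_angle_deg` is mine, and it assumes, as the rough calculation does, that the 400 kts can be treated as horizontal speed over the ground (no wind, shallow angle).

```python
import math

FEET_PER_NM = 6076.12  # feet in one nautical mile; the text rounds this to 6,000


def descent_angle_deg(speed_kt, descent_rate_fpm, feet_per_nm=FEET_PER_NM):
    """Descent angle in degrees for a given speed (knots) and sink rate (fpm).

    Assumes the speed is horizontal speed over the ground.
    """
    horizontal_fpm = speed_kt * feet_per_nm / 60.0  # knots -> feet per minute
    return math.degrees(math.atan2(descent_rate_fpm, horizontal_fpm))


# With the rounded 6,000 ft per nautical mile the slope is exactly 1 in 10.
print(round(descent_angle_deg(400, 4000, feet_per_nm=6000), 1))  # 5.7
# With the more precise conversion the angle is marginally shallower.
print(round(descent_angle_deg(400, 4000), 1))                    # 5.6
```

Either way the result is close to 6° and nowhere near 3°, which is the correction made above.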

07:27 CET on Thursday 26th March

John Downer suggested the possibility of

  • an inadvertent behavioral event that did not obviously fit into my classification below. He quotes a colleague on the regular occurrence of the highly unusual: “as Scott Sagan put it: stuff that’s never happened before happens all the time”.

Inadvertent behaviour would likely involve one pilot leaving the cockpit, and the other suffering a medical event. I could then see two ways to achieve the regular flight path: engaging descent mode in the FCU at 4,000 fpm or 3° descent profile (note: Friday 27th March – I think this should be 6°!) or retarding the throttles in speed-hold mode.

Since the throttles are forward at high cruise, I think that slumping on them would cause them to advance, if anything, not to retard. John informs me that, during a stroke, people can become very confused. Thereby manipulating the FCU or retarding the throttles does not seem out of the question. Many thanks to John for pointing out this possibility which didn’t fit into my classification below!

Karl Swarz made us aware of the NYT report “Germanwings pilot was locked out of cockpit before crash” by Nicola Clark and Dan Bilefsky. Karl had sent the note before my conversation with John, but I hadn’t yet read it. It seems this is a scoop – there is also a similar report today in The Guardian but it cites the NYT.

There is some preliminary unconfirmed information from the CVR readout. One pilot did leave the cockpit and could not reenter during the event. There is, as currently analysed, no indication of a reaction from the pilot flying. We may presume that the analysis will become much more precise. It seems the commentators cited by the NYT are ruling out cabin depressurisation; that eliminates one of the (now) six possibilities. It seems to me likely that many of the others will be quickly ruled out.

19:04 CET on Wednesday 25th March.

Update: there is no more information on the behavior of the flight than I reported yesterday (below).

There is discussion of possibilities, and whether my classification is right. It is appropriate and necessary that there should be such discussion. Here, in the next paragraph, is some.

A colleague has suggested that the crew could have been overcome by carbon monoxide in the bleed-air from the engines (which is used to pressurise the aircraft). It has happened before that crew has been overcome by something. In each case, the flight has continued as configured until fuel is exhausted, and then come down. So if this happened here, why did the flight not continue at FL380 until the fuel was exhausted? Another colleague has suggested that the descent rate almost exactly corresponds to a descent profile of 3°, which is normal descent profile for (say) an ILS approach. OK, but why would a crew in cruise flight, continuing cruise enroute to Düsseldorf, change the autopilot setting to a descent profile?

Somebody said on Twitter this morning, in response to my interview with a radio station in Hessen, that enumerating possibilities is speculation and one should just let the investigators do their job (and presumably deliver results).

First, this misunderstands how things are investigated. Speculation is a major component of investigation – one supposes certain things, and tries to rule them out or keep them as active possibilities. And one carries on doing this until possibilities are reduced as far as possible, ideally down to one.

Second, each technology is constrained in behavior. Airplanes can’t suddenly turn left and crash into a lane separator. Cars can’t suddenly ascend at 4,000 feet per minute. Bicycles can’t stop responding to input and show you the blue screen of death. How each artefact can behave in given circumstances is constrained. And even further when there is a given partial behavioral profile. Why not attempt to write that down? If it’s wrong, someone will say so and it can be corrected.

Third, such a process obviously works most efficiently when experts with significant domain knowledge attempt to write it down and other such experts correct. And most inefficiently when people with little domain knowledge write down what they are dreaming, and attempt to argue with those who suggest their dreams are unrealistic. It’s a social process, which works better or worse, but I see no reason why it should generally be deemed inappropriate. Speculation is a necessary component of explanation.

18:42 CET on Tuesday 24th March.

Here is what I think I know at this point.

Germanwings Flight 4U 9525 has crashed against an almost vertical cliff in the Alps. The Flight was enroute from Barcelona to Düsseldorf and took the route which had been flown the day before. At about 0931Z (=10:31CET) he was at FL380 in level flight and started a descent at a rate of about 4,000 feet per minute, which continued more or less constant until about 7,000 ft altitude, when he levelled off. The descent lasted until 0941Z (=10:41CET).

He continued level for either 1 minute or 11 minutes. Contact was reported to have been lost at 0953Z. Such basic facts are often unclear in the first 24 hours, even though they appear to come from reliable sources.

I see five possible contributing events, not all mutually exclusive:

  • Loss of cabin pressure. A crew should react by starting a descent at about this rate, but the descent should have stopped before 7,000 ft altitude;
  • Fire. The crew would wish to descend and land as soon as possible. Emergency descents in excess of 4,000 feet per minute are possible, especially at higher altitudes, and a crew in a hurry to land, as in a case of fire on board, could have been expected to do so;
  • Dual engine problems, maybe flameout. The descent rate at best-glide speed, though, is, I have been informed, somewhere between 2,000 and 3,000 feet per minute. One would not wish to come down faster, since the more time one has to troubleshoot, and then to try to restart, the better;
  • An air data problem affecting the handling of the aircraft. Air data problems with these aircraft, as well as with A330 and A340 aircraft that have almost-identical air data sensorics, have occurred during cruise and other phases of flight since 2008, and there has been a series of Airworthiness Directives from EASA and the FAA in this period, including Emergency Airworthiness Directives within the last few months. However, one would expect aircraft behavior associated with such a problem not to last nine minutes at a constant, moderate rate of descent;
  • Hostile – and criminal – human action on board.

I’ve already given a TV interview in which I only mentioned four of these five. Such is life. Are there more?

In a number of these cases, one would expect a crew to turn towards a nearby adequate airport for landing, such as Marseille. One would certainly not expect them to continue flying towards high mountains! In particular, towards the Alps at 7,000 ft. So the question is raised whether the crew was or became incapacitated during the event.

I’ll update when I know more.

PBL 1800Z/1900CET

Fault, Failure, Reliability Definitions

4 03 2015

OK, the discussion on these basic concepts continues (see the threads “Paper on Software Reliability and the Urn Model”, “Practical Statistical Evaluation of Critical Software”, and “Fault, Failure and Reliability Again (short)” in the System Safety List archive).

This is a lengthy-ish note with a simple point: the notions of software failure, software fault, and software reliability are all well-defined, although it is open what a good measure of software reliability may be.

John Knight has noted privately that in his book he rigorously uses the Avizienis, Laprie, Randell, Landwehr IEEE DSC 2004 taxonomy (IEEE Transactions on Dependable and Secure Computing 1(1):1-23, 2004, henceforth ALRL taxonomy), brought to the List’s attention by Örjan Askerdal yesterday, precisely to be clear about all these potentially confusing matters. The ALRL taxonomy is not just the momentary opinion of four computer scientists. It is the update of a taxonomy on which the authors had been working along with other members of IFIP WG 10.4 for decades. There is good reason to take it very seriously indeed.

Let me first take the opportunity to recommend John’s book on the Fundamentals of Dependable Computing. I haven’t read it yet in detail, but I perused a copy at the 23rd Safety-Critical Systems Symposium in Bristol last month and would use it were I to teach a course on dependable computing. (My RVS group teaches computer networking fundamentals, accident analysis, risk analysis and applied logic, and runs student project classes on various topics.)

The fact that John used the ALRL taxonomy suggests that it is adequate to the task. Let me take John’s hint and run with it.

(One task before us, or, rather, before Chris Goeker, whose PhD topic is vocabulary analysis, is to see how the IEC definitions cohere with ALRL. I could also add my own partial set to such a comparison.)

Below is an excerpt from ALRL on failure, fault, error, reliability and so forth, under the usual fair use provisions.

It should be clear that a notion of software failure as a failure whose associated faults lie in the software logic is well defined, and that a notion of software reliability as some measure of proportion of correct to incorrect service is also possible. What the definitions don’t say is what such a measure should be.

This contradicts Nick Tudor’s suggestion in a List contribution yesterday that “software does not fail ….. It therefore makes no sense to talk about reliability of software”. Nick has suggested, privately, that this is a common view in aerospace engineering. Another colleague has suggested that some areas of the nuclear power industry also adhere to a similar view. If so, I would respectfully suggest that these areas of engineering get themselves up to date on how the experts, the computer scientists, talk about these matters, for example ALRL. I think it’s simply a matter of engineering responsibility that they do so.

In principle you can use whatever words you want to talk about whatever you want. The main criteria are that such talk is coherent (doesn’t self-contradict) and that the phenomena you wish to address are describable. Subsidiary criteria are: such descriptions must be clear (select the phenomena well from amongst the alternatives) and as simple as possible.

I think ALRL fulfils these criteria well.

[begin quote ALRL]

The function of such a system is what the system is intended to do and is described by the functional specification in terms of functionality and performance. The behavior of a system is what the system does to implement its function and is described by a sequence of states. The total state of a given system is the set of the following states: computation, communication, stored information, interconnection, and physical condition. [Matter omitted.]

The service delivered by a system (in its role as a provider) is its behavior as it is perceived by its user(s); a user is another system that receives service from the provider. [Stuff about interfaces and internal/external states omitted.] A system generally implements more than one function, and delivers more than one service. Function and service can be thus seen as composed of function items and of service items.

Correct service is delivered when the service implements the system function. A service failure, often abbreviated here to failure, is an event that occurs when the delivered service deviates from correct service. A service fails either because it does not comply with the functional specification, or because this specification did not adequately describe the system function. A service failure is a transition from correct service to incorrect service, i.e., to not implementing the system function. …… The deviation from correct service may assume different forms that are called service failure modes and are ranked according to failure severities….

Since a service is a sequence of the system’s external states, a service failure means that at least one (or more) external state of the system deviates from the correct service state. The deviation is called an error. The adjudged or hypothesized cause of an error is called a fault. Faults can be internal or external of a system. ….. For this reason [omitted], the definition of an error is the part of the total state of the system that may lead to its subsequent service failure. It is important to note that many errors do not reach the system’s external state and cause a failure. A fault is active when it causes an error, otherwise it is dormant.

[Material omitted]

  • availability: readiness for correct service.
  • reliability: continuity of correct service.
  • safety: absence of catastrophic consequences on the
    user(s) and the environment.
  • integrity: absence of improper system alterations.
  • maintainability: ability to undergo modifications
    and repairs.

[end quote ALRL]

Fault, Failure, Reliability Again

3 03 2015

On the System Safety Mailing list we have been discussing software reliability for just over a week. The occasion is that I and others are considering a replacement for the 18-year-old, incomplete, largely unhelpful and arguably misleading guide to the statistical evaluation of software in IEC 61508-7:2010 Annex D. Annex D is only four and a half pages long, but a brief explanation of the mathematics behind it and the issues surrounding its application resulted in a 14pp paper called Practical Statistical Evaluation of Critical Software which Bev Littlewood and I have submitted for publication. Discussion in closed communities also revealed to me a need to explain the Ur-example of Bernoulli processes, namely the Urn Model, introduced and analysed by Bernoulli in his posthumous manuscript Ars Conjectandi of 1713, as well as its application to software reliability in a paper called Software, the Urn Model, and Failure.

This discussion about statistical evaluation of software has shown that there is substantial disagreement about ideas and concepts in the foundations of software science.

On the one hand, there are eminent colleagues who have made substantial careers over many decades, written seminal papers in software science and engineering, and published in and edited the most prestigious journals in software, on the subject of software reliability.

On the other hand, there are groups of engineers who say software cannot fail. They don’t mean that you and I were just dreaming all our struggles with PC operating systems in the ’90s and ’00s, that those annoying things just didn’t happen. They mean that, however you describe those frustrating events, the concept of failure doesn’t apply to software. It is, as Gilbert Ryle would have said, a category mistake.

I knew that some people thought so twenty years ago, but I had no idea that it is still rife in certain areas of software practice until I was informed indirectly through a colleague yesterday. I have also been discussing, privately, with a member of the System Safety List who holds this view. I have encouraged him to take the discussion public, but so far that hasn’t happened.

The Urn Model can be considered a trope introduced by one man 300 years ago and still not well understood today. Yesterday, I noted another 300-year-old trope that was recognised as mistaken nearly a half century later, but still occurs today without the mistake being recognised, and which I regularly encounter. That is, John Locke’s account of perception and Berkeley’s criticism, which is regarded universally today as valid. It occurs today as what I call the “modelling versus description” question (I used to call it “modelling versus abstraction”), and I encounter it regularly. Last month at a conference in Bristol in questions after my talk (warning, it’s over 50MB!); and again yesterday in a System Safety List discussion. I don’t know when the trope calling software failure a category mistake got started (can someone advise me of the history?) but it’s as well to observe (again) how pointless it is, as follows.

Whatever the reasons for holding that “software cannot fail” as a conceptual axiom, it should theoretically be easy to deal with. There is a definition of something called software failing in the papers referenced above, and I can obviously say it’s that conception which I am talking about. You can call it “lack of success”, or even flubididub, if you like; the phenomenon exists and it’s that about which I – and my eminent colleagues whose careers it has been – are talking. Furthermore, I would say it’s obviously useful.

Another approach is to observe that the concept of software failure occurs multiple times in the definitions for IEC 61508. So if you are going to be engineering systems according to IEC 61508 – and many if not most digital-critical-system engineers are going to be doing so – it behooves you to be familiar with that concept, whatever IEC 61508 takes it to be.

There is, however, a caveat. And that is, whether the conceptions underlying IEC 61508 are coherent. Whatever you think, it is pretty clear they are not ideal. My PhD student Christoph Goeker calculated a def-use map of the IEC 61508 definitions. It’s just under 3m long and 70cm wide! I think there’s general agreement that something should be done to try to sort this complexity out.

What’s odder about the views of my correspondent is that, while believing “software cannot fail”, he claims software can have faults. To those of us used to the standard engineering conception of a fault as the cause of a failure, this seems completely uninterpretable: if software can’t fail, then ipso facto it can’t have faults.

Furthermore, if you think software can be faulty, but that it can’t fail, then when you want to talk about software reliability, that is, the ability of software to execute conformant to its intended purpose, you somehow have to connect “fault” with that notion of reliability. And that can’t be done. Here’s an example to show it.

Consider deterministic software S with the specification that, on input i, where i is a natural number between 1 and 20 inclusive, it outputs i. And on any other input whatsoever, it outputs X. What software S actually does is, on input i, where i is a natural number between 1 and 19 inclusive, it outputs i. When input 20, it outputs 3. And on any other input whatsoever, it outputs X. So S is reliable – it does what is wanted – on all inputs except 20. And, executing on input 20, pardon me for saying so, it fails.

That failure has a cause, and that cause or causes lie somehow in the logic of the software, which is why IEC 61508 calls software failures “systematic”. And that cause or causes is invariant with S: if you are executing S, they are present, and just the same as they are during any other execution of S.

But the reliability of S, namely how often, or how many times in so many demands, S fails, depends obviously on how many times, how often, you give it “20” as input. If you always give it “20”, S’s reliability is 0%. If you never give it “20”, S’s reliability is 100%. And you can, by feeding it “20” proportionately, make that any percentage you like between 0% and 100%. The reliability of S is obviously dependent on the distribution of inputs. And it is equally obviously not functionally dependent on the fault(s) = the internal causes of the failure behavior, because that/those remain constant.
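The example is small enough to write out. The following Python sketch is my own rendering: the names `spec`, `s` and `reliability` are illustrative, and reliability is taken here simply as the proportion of demands on which S does what the specification wants, which is one possible measure among others.

```python
def spec(x):
    """What is wanted: echo natural numbers 1..20, output "X" otherwise."""
    if type(x) is int and 1 <= x <= 20:
        return x
    return "X"


def s(x):
    """What software S actually does: same as spec, except on input 20."""
    if type(x) is int and 1 <= x <= 19:
        return x
    if type(x) is int and x == 20:
        return 3            # the fault: present in every execution of S
    return "X"


def reliability(demands):
    """Proportion of demands on which S delivers the specified output."""
    demands = list(demands)
    correct = sum(1 for x in demands if s(x) == spec(x))
    return correct / len(demands)


# Same software, same fault, different input distributions.
print(reliability([20] * 10))        # 0.0
print(reliability(range(1, 20)))     # 1.0
print(reliability([20, 1, 1, 1]))    # 0.75
```

The code of `s` never changes between the three calls; only the demands do, and the reliability figure moves from 0% to 100% accordingly.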

The plea is often heard, and I support it, to take steps to turn software engineering into a true engineering science. That won’t happen if we can’t agree on the basic phenomena concerning success or failure – call it lack of success if you like – of software execution. And, even if we do agree on the phenomena, not being able to agree on words to call them by.

Quantitative Easing and Helicopter Money

30 10 2014

The US Federal Reserve Bank is to end its programme of quantitative easing (QE). QE was introduced by former chairman Bernanke as a response to the financial crash starting in 2008 (or 2007, or whenever). A colleague asked me a few years ago if I understood QE.

I didn’t. Now I (am under the illusion that I think I) do.

I am prompted to write about it because last week I came across a startlingly readable book by British-educated economist Roger E. A. Farmer, who is at UCLA and still advises the Bank of England as well as various branches of the U.S. Federal Reserve. He won the inaugural 2013 Allais prize, with coauthors Carine Nourry and Alain Venditti of the Université Aix-Marseilles for a paper on why financial markets don’t work well in the real world. Maurice Allais won the 1988 Nobel Memorial Prize, and is most well known for his paradox. He was at the Ecole des Mines in Paris, whence in an earlier era the great mathematician Henri Poincaré entered on his program to synchronise the world’s clocks (Peter Galison, Einstein’s Clocks, Poincaré’s Maps, W. W. Norton, 2003).

Where was I? Roger E. A. Farmer, How the Economy Works, Oxford University Press, 2014.

The point of QE is as follows. Central banks such as the Fed and BoE traditionally attempt to regulate the rate of inflation in the economy by buying three-month government bonds in the market. The price they are prepared to pay says what they intend overall prices to do in the next three months. But suppose the interest rate, as given by government bonds, is 0, or very close to 0. Who’d care about selling bonds to the Fed or BoE? It takes somebody some work (keys have to be pushed on computer terminals and so on) for no gain at all.

So maybe there’s something else to be bought which could influence economic activity? Say, long-term government bonds, and commercial paper. The central banks buy it from the banks which hold it. That’s QE. Commercial paper is short-term bonds issued by companies rather than government. Now, the central bank doesn’t actually have to cash in that paper, and neither did the banks which previously held it. But if your paper is held by the BoE rather than, say, Lehman Brothers, you as a company have a little more security that no one is going to come after your money soon. In other words, your loan has become a retrospective government grant. And the banks which held all those long-term bonds get to cash them out right away – long-term has become short-term. (They could always do that on the markets, but on worse terms; the central bank is working by fiat.)

So, obviously, under QE more cash gets injected into the economy. It’s called “liquidity” (liquidity is stuff which can be moved around the economy quickly. Like cash. People judge the health of economies by the “velocity of money” and it’s publicly measured by, for example, the St. Louis Fed). There was a big discussion whether the 2008 crisis was caused by illiquidity or insolvency. Michael Lewis proposed a third possible factor: greed. This was so obviously right that nobody argued. (Greed, BTW, is not a concept that you find in most economics textbooks. Adam Smith is said to have shown that “greed is good” and it thereafter disappeared from the vocabulary of “serious” economics, or so Wall Street would have us believe. Not all of us agree with that, especially after 2008.) Anyway, this placates the illiquidity proponents, leaving the legal system and courts free to deal with the insolvency bit (um, did that happen? Oh, I forgot: “too big to fail”. That’s all right then).

The big idea is that “the people who know where the money should go”, namely the banks which sold their paper and long-term bonds to the central bank, are then giving it away to those who need it for good purpose. Say, British manufacturers. So British manufacturing should have been on the up and up. Well, it’s a bit up, I understand. So who did go on the up and up? The London housing market in particular and the British housing market in general. Well, duh, isn’t that where we came in?

There is a short article in The Guardian today by one of their economics commentators, Larry Elliott. Elliott compares QE with a mooted alternative, “helicopter money”. That’s Milton Friedman’s idea that you could fly helicopters over the country and drop money from them. Or, more soberly, as Farmer says (I think it’s somewhere in his book), the central bank writes a check for £2,700 to every man, woman, androgyne and child in the country. The idea is they go spend it where they want and thereby demonstrate true demand, which has been thwarted or at least distorted by the crisis. As we might guess, everyone will go out and buy undersea cables, high-performance aircraft engines, and the occasional bit of railway signalling and new track, won’t they? It would be just toooo Victorian to imagine it’ll go on drugs, sex and booze, but if it does, the British government might not be quite so furious at the consequences of sex and drugs now being included in the national accounts.

Actually, provided that Vodafone can in the future find a way around charging people £15,000 for an evening’s phone “usage”, it might well be spent on telecommunications and digital entertainment, that is, demand-attentive networks and the iTunes store. Yes, those phones and tablets are made elsewhere, and (not-)taxed in Ireland, but they all use ARM chip designs from Cambridge. So the British public wins: sex, drugs AND rock n’ roll, all now in the national accounts!

You see, I do understand QE. Don’t I? I suppose you’re thinking “he’s being silly”. But go a little further. If you suppose “he’s being silly, but come to think of it nobody can really do much better”, I could imagine you have thereby deemed me An Economist!

A Continuing Economic Puzzle

26 10 2014

For me at least. I came across a fine article in Vanity Fair magazine from September 2011 in which Michael Lewis endeavors to explain pervasive German attitudes to finance and financial risk. Lewis is well-known for his incisive inquiries into finance. I first read his The Big Short (Allen Lane/Penguin, 2010), about the crash and why it happened, then Boomerang (Allen Lane/Penguin 2011) about what countries (sovereigns) had been doing, and I have just finished Flash Boys (Allen Lane/Penguin 2014), about high-frequency trading and how markets react to your transactions before you’ve completed them. Donald MacKenzie in the London Review of Books also has a fine article about the same phenomenon and the extremely small time frames in which it takes place.

I am a civil servant in Germany and have been for about two decades. I spent a couple of decades in California (nice to see Lewis’s nod in his article towards Alan Dundes – a folk hero at one of my alma maters, U.C. Berkeley) and the couple before that in the UK, of which country I remain a citizen.

My university had its account with WestLB. WestLB was the state bank of my state, North Rhine-Westphalia. Those are the people who held the money which my employer gives me every month. I say “was”. WestLB just sort of disappeared. The state bank – gone. There was a short article on the front page of the local newspaper about it. Nobody asked. It wasn’t a topic of conversation.

I was flabbergasted, and still would be if I hadn’t run out of flabber. I was and am still reading acres of British newspaper articles about what UK banks had been doing, Libor and Forex rigging, handling drug-cartel money, selling inappropriate PPI, and so on, and how and what was being done to correct it. But move to Germany, and WestLB just disappeared. I got a hint as to the financial exposure which caused it to do so through reading, not the local or even a national newspaper, but Lewis’s book The Big Short. Apparently “Düsseldorf” was buying all the US-subprime-backed CDOs which no one else was buying. Lewis’s article explains in more detail that it was a Düsseldorf bank named IKB, which somehow was backed by WestLB.

It seems Lewis set himself in the article the task of trying to figure out what went wrong in Germany and why. I think he only scratched the surface but he got deeper than anyone else I’ve read on the topic, including German sources.

How can it be that our state bank, which held all the money on behalf of all the state enterprises, all the state organisations, the state government and so on, just went belly-up and was seamlessly replaced without anyone batting an eyelid? Why weren’t all my colleagues the civil servants panicking, wondering where our next paycheck was to come from?

The answer may be something like this. People somehow thought that what the bank had was pieces of paper, promissory notes, and that somehow all that was backed up for purposes of the State by something more tangible and SOLID, real CASH, maybe GOLD, and all that SOLID material must somehow still be there even if someone couldn’t keep track of the promissory notes. And if WestLB couldn’t distribute it, then somebody else would just be asked to go over there, pick up all this tangible REAL stuff, convert it into money, and dole it out to the eligible recipients as usual.

Which seems to be what happened, thank heavens. The question which then arises is how this can happen without anyone pointing a finger at Düsseldorf and calling for divine retribution. Because when it happens with other accounts, say, Greece for example, fingers are pointing and tongues are a-tutting at very high volume and have been incessantly for six years. The other question is, if WestLB can be so simply liquidated without fuss, why the same mechanisms can’t be applied to Greek/Irish/Portuguese debt held in Germany.

There are some relevant phenomena which Lewis could have addressed. It is obvious in hindsight that policy around the euro was going to be dominated by its most influential member, which everyone has agreed is Germany. Apparently no one else in Europe, including Britain, noticed that neo-Keynesian mechanisms were largely frowned upon both by the German state and its most influential economists and ministers. For quasi-moral reasons, it seems. It seems to me that Keynes is largely regarded by Those Here Who Matter as a clever and influential man who just, unfortunately, got it wrong as clever men sometimes do. Pace Milton Friedman, “we” are not all Keynesians, or neo-Keynesians, now.

It is surely a recipe for a car crash when the most influential player in a monetary union has economic ideas at variance with those of almost all its partners. I’d like to know more about that. In August 2012 (Wednesday, 22nd August, 2012 to be precise), on the way back from Frankfurt I read an article in the newspaper “Die Welt” by the finance and tax lawyer and former member of the Constitutional Court (Verfassungsgericht), Paul Kirchhof, in which he used the word “Schuld”, which means both “debt” and “guilt” in German, to equivocate between the two concepts in a discussion about the legality of indebtedness in the eurozone. I remarked on it to some English colleagues. I can’t find the article any more on the newspaper’s WWW site. About a year later, George Soros observed that the words for “debt” and for “guilt” are identical so I am not the only person wondering about a connection.

I also want to know how come my neighbors are so incurious about a major bank failure. Why it happened and what the consequences are. In this land of banking rectitude, this seems to me to be anomalous. I’d really like to know how this all works, and am not content to assume simply that there is something SOLID sitting behind my monthly pay check. And my pension. And my savings. And the value of my house. And so on.

So who will tell us how and why WestLB failed? And why it won’t happen again? Will it?

Security Vulnerabilities in Commercial-Aircraft SATCOM Kit

14 08 2014

There has been some press in advance of last week’s Black Hat conference speaking of vulnerabilities in commercial-aircraft flight management systems and possible implications for the safety of flight, for example in a Reuters article by Jim Finkle from August 4. The article is technically fairly accurate on the claims made and the manufacturer’s response, but it also includes comments such as this

Vincenzo Iozzo, a member of Black Hat’s review board, said Santamarta’s paper marked the first time a researcher had identified potentially devastating vulnerabilities in satellite communications equipment.

“I am not sure we can actually launch an attack from the passenger inflight entertainment system into the cockpit,” he said. “The core point is the type of vulnerabilities he discovered are pretty scary just because they involve very basic security things that vendors should already be aware of.”

Which sort of says what the Black Hat program committee know about airworthiness certification of avionics: not very much, if anything at all. The phrases “potentially devastating” and “pretty scary” are to my mind completely out of place. I have also seen some public discussion of the vulnerability claims which suggests the sky could, or is at least theoretically able to, or maybe possibly theoretically able to, fall. I figure it is worth saying a couple words about it here.

This note may seem ponderous, but I think it is important to give the complete background and references. Aviation airworthiness certification is one of the more developed safety assessment regimes and some public discussion is obviously ignorant of it. For example, some contributions fail to make the basic distinction between a vulnerability (which could pose a hazard) and the possible consequences of exploitation of that vulnerability (the severity of the hazard).

This distinction has been basic to safety and security analysis for half a century or more. Its necessity is easy to see. People can demonstrate hacking bank ATMs at security conferences and have them spill banknotes all over the stage. But that doesn’t mean the hacker has access to all the networks at the bank in question and can embezzle trillions from their transaction systems. Indeed, no one thinks it does. The vulnerability is that a bank ATM can be compromised; the severity is (at least) that it loses its contents, and maybe more (maybe hackers can gain access to the central control SW). A bank can routinely cope with losing all the bank notes in an ATM; by all accounts attempts at fraud in financial transaction systems are orders of magnitude more severe and have been for decades. Vulnerability and consequences are connected but separate, and both or either could be rightly or wrongly assessed in any given proposal.

It appears vulnerabilities do exist in the systems investigated by the company IOActive and its associate Ruben Santamarta, but the severities of any such vulnerabilities have already been assessed by regulators during airworthiness certification and have been found to be negligible or minor.

There is a White Paper on their work from the company IOActive. It concerns vulnerabilities in satellite-communications (SATCOM) systems in general, mostly about ships and land-equipment for the military. There is one aviation application, as far as I can see. They claim to have compromised Cobham Aviator 700 and Aviator 700D devices. This kit contains software certified to DO-178B Design Assurance Level (DAL) E, respectively DAL D, they say. They also say it is installed on the military C-130J.

The first paragraph of “Scope of Study” in the company White Paper says that the researcher(s) didn’t have access to all the devices, but “reverse-engineered” those to which they didn’t have access and found vulnerabilities in their reverse-engineered copies.

DAL D software is that installed on kit whose malfunction could have at most a “minor effect“. DAL E software is that installed on kit whose malfunction could have at most “no effect”. These are technical terms: the notion of “effect” is the aviation-certification term for the possible consequences of a failure and corresponds with the more common term “severity” used in other safety-related engineering disciplines. A good general reference on certification of aviation equipment is Chapter 4 of Systematic Safety, E. Lloyd and W. Tye, CAA Publications, London, 1982. Lloyd and Tye categorise a Minor Effect as one “in which the airworthiness and/or crew workload are only slightly affected” and say that “Minor Effects … are not usually of concern in certification”. They don’t include them in the risk matrix which they use to illustrate the certification requirements. The risk matrix shows the slightly differing characterisations of the FAA and JAA certification regimes. The JAA was the former de facto certification authority in Europe and has been subsequently replaced by EASA. Most countries accept FAA and EASA airworthiness certification as adequate demonstration of airworthiness.
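For readers who think in code, the DO-178B software levels and the worst failure condition associated with each can be written as a simple lookup. This is only a sketch: the pairing of levels with categories follows DO-178B, but the names DAL_WORST_EFFECT and worst_case_effect are mine, chosen for illustration.

```python
# DO-178B software levels and the most severe failure condition that
# software at that level may contribute to.
# The dictionary and function names are illustrative, not any standard API.
DAL_WORST_EFFECT = {
    "A": "Catastrophic",
    "B": "Hazardous/Severe-Major",
    "C": "Major",
    "D": "Minor",
    "E": "No Effect",
}


def worst_case_effect(dal: str) -> str:
    """Return the worst failure-condition category for a given level."""
    return DAL_WORST_EFFECT[dal.strip().upper()]


# The two levels at issue for the Aviator kit.
for level in ("D", "E"):
    print(f"DAL {level}: at most '{worst_case_effect(level)}'")
```

The point of the table is that the level is assigned from the assessed effect, not the other way round: kit carrying DAL D or E software has already been judged unable to do worse than a Minor Effect.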

The Cobham Aviator series is kit which may or may not be fitted to any specific aircraft. The Cobham WWW site contains a number of data sheets about the Aviator series. It appears to be available for (retro)fit to the Dassault Falcon bizjet series and apparently NASA Armstrong FRC has some: here is a related purchase order.

The airworthiness of the Cobham Aviator 700 and 700D systems is governed by 14 CFR 25.1309 in the US, and Certification Specification 25 (CS-25) clause 25.1309 in Europe. There is an FAA Advisory Circular defining the acceptable means of compliance with this regulation, which includes the definitions of effects and their allowable probabilities: AC 25.1309-1A System Design and Analysis, issued 21 June 1988.

The specific definition of “Minor Effect” from AC 25.1309-1A is

Failure conditions which would not significantly reduce airplane safety, and which involve crew actions that are well within their capabilities. Minor failure conditions may include, for example, a slight reduction in safety margins or functional capabilities, a slight increase in crew workload, such as routine flight plan changes, or some inconvenience to occupants.

The CS-25 definition is similar.

The general vulnerabilities IOActive claim to have found in the Cobham Aviator devices are listed in Table 1 of their report:

Weak Password Reset
Insecure Protocols
Hardcoded credentials

IOActive has informed US-CERT about the vulnerabilities it has found in the Cobham Aviator 700 and 700D kit. The US-CERT entry in the Vulnerability Notes Database contains a rather more precise statement of the vulnerabilities found. The note says that the identified vulnerabilities are

CWE-327: Use of a Broken or Risky Cryptographic Algorithm – CVE-2014-2943
IOActive reports that Cobham satellite terminals utilize a risky algorithm to generate a PIN code for accessing the terminal. The algorithm is reversible and allows a local attacker to generate a superuser PIN code.

CWE-798: Use of Hard-coded Credentials – CVE-2014-2964 
IOActive reports that certain privileged commands in the satellite terminals require a password to execute. The commands debug, prod, do160, and flrp have hardcoded passwords. A local attacker may be able to gain unauthorized privileges using these commands.

The Common Weakness Enumeration (CWE) derives from Mitre, and an explanation of, for example, CWE-327 is to be found on the CWE WWW site, as is an explanation of CWE-798.

IOActive says the following about the vulnerabilities of the Cobham 700 and 700D devices. I quote their report in full.

The vulnerabilities listed in Table 1 could allow an attacker to take control of both the SwiftBroadband Unit (SBU) and the Satellite Data Unit (SDU), which provides Aero- H+ and Swift64 services. IOActive found vulnerabilities an attacker could use to bypass authorization mechanisms in order to access interfaces that may allow control of the SBU and SDU. Any of the systems connected to these elements, such as the Multifunction Control Display Unit (MCDU), could be impacted by a successful attack. More specifically, a successful attack could compromise control of the satellite link channel used by the Future Air Navigation System (FANS), Controller Pilot Data Link Communications (CPDLC) or Aircraft Communications Addressing and Reporting System (ACARS). A malfunction of these subsystems could pose a safety threat for the entire aircraft.

This is the entire statement. IOActive is thus explicitly disagreeing with the regulators: they say the vulnerabilities “could pose a safety threat for the entire aircraft” whereas the regulators have determined during airworthiness certification that the consequences of any malfunction of the Aviator 700 and 700D are “No Effect”, respectively a “Minor Effect”.

It is certain that regulator and vendor have a significant amount of paperwork on file purporting to establish the severity of malfunctions of the Cobham Aviator 700 and 700D kit. Much of that will refer in detail to the kit, and therefore will contain proprietary information and will not be available to the general public.

In contrast, IOActive has merely asserted, as above, its deviant view of the severity, as far as I can tell without providing any reasoning to back up its claim.

The vendor has provided the following statement to US-CERT:

Cobham SATCOM has found that potential exploitation of the vulnerabilities presented requires either physical access to the equipment or connectivity to the maintenance part of the network, which also requires a physical presence at the terminal. Specifically, in the aeronautical world, there are very strict requirements for equipment installation and physical access to the equipment is restricted to authorized personnel.

The described hardcoded credentials are only accessible via the maintenance port connector on the front-plate and will require direct access to the equipment via a serial port. The SDU is installed in the avionics bay of the aircraft, and is not accessible for unauthorized personnel.

Cobham SATCOM will continue to evaluate any potential vulnerabilities with its equipment and implement increased security measures if required.

In other words, they don’t think the discovered vulnerabilities affect the use of its kit much at all, and presumably the regulator agrees – that is, it has already agreed in advance during airworthiness certification, and sees no reason to change its mind.

US-CERT judges


A local unauthenticated attacker may be able to gain full control of the satellite terminal.


The CERT/CC is currently unaware of a practical solution to this problem.

I would disagree with use of the words “problem” and “solution” here. Indeed the entire categorisation seems to be somewhat puzzling. Obviously the vendor could fix the vulnerabilities by using better crypto in places, and by using device-access authentication that is not hard-coded; that would surely constitute a “practical solution” and surely CERT is as aware of this as I am and the vendor is. It also appears that neither vendor nor regulator sees the need to undertake any action in response to the revelations. There is no record that the airworthiness certification of the kit has been withdrawn and I presume it hasn’t been.
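To make “authentication that is not hard-coded” concrete, here is a minimal sketch of the usual alternative: each device stores a random salt and a derived key for a password set at provisioning, and nothing secret lives in the firmware image. It uses only the Python standard library. The function names, the record layout and the iteration count are my assumptions for illustration; nothing here describes how Cobham’s kit works or would be changed.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Illustrative work factor only; a real design would choose this deliberately.
ITERATIONS = 100_000


def provision(password: str) -> Tuple[bytes, bytes]:
    """Create a per-device (salt, derived_key) record at installation time."""
    salt = secrets.token_bytes(16)  # random, so it differs on every device
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify(password: str, record: Tuple[bytes, bytes]) -> bool:
    """Check a presented password against the stored record."""
    salt, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison, so timing does not leak how much matched.
    return hmac.compare_digest(candidate, expected)


record = provision("set-by-the-installer")
print(verify("set-by-the-installer", record))  # the provisioned password
print(verify("debug", record))                 # a guessed built-in password
```

Since the password is chosen per device and only its salted derivation is stored, reverse-engineering one firmware image no longer yields a credential that works on every unit.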

Summary: IOActive and US-CERT have said “you’re using risky or broken crypto, and you’re hard-coding authentication”. Vendor and (implicitly) airworthiness regulator have said “so what?”. End of Story, probably.

None of this is to say that airworthiness certification always gets it right. Indeed, it is clear that every so often it is gotten wrong. But it is a lot more effective than what people without any experience of it seem to be assuming in discussion.

Don Hudson and PBL on the ITU’s proposal for real-time flight data transmission

10 04 2014

The International Telecommunications Union has been conducting its four-yearly meeting. Its president has apparently promised everyone to make possible the real-time transmission of flight data from commercial transport aircraft in flight. This has been supported by the Malaysian delegate. All according to this news report: MH370: ITU Commits to Integrate Flight Data Recorders with Big Data and Cloud, writes Vineeta Shetty from Dubai

Captain Don Hudson has been a valued colleague for well over a decade. He has some 20,000 or so flying hours, has flown a variety of transports, including Lockheed L1011, Airbus A320 and varieties of A330 and A340 machines and is an active professional pilot although formally retired from scheduled airline flying. While with Air Canada he contributed significantly to the development of the airline’s safety-management and flight-quality systems while he was a captain on intercontinental flights.

Don points out, as have others, that the technology exists to do what the ITU proposes. However, he finds the proposal problematic, as do I, and economically and from a safety point of view barely justified, if at all.

It almost goes without saying that people expert in the standardisation of telecommunications are not necessarily expert with the human and organisational factors involved with aviation safety programs. Don is expert. I recommend that ITU delegates read what he has to say below. Some comments of mine follow.

Captain Don Hudson: Response to the ITU’s proposal for real-time flight data transmission

Some important issues have not been addressed in the ITU’s suggestions.

Aside from any commercial priorities and processing/storage/retrieval issues regarding DFDR and CVR data, a number of important issues are not addressed in this announcement.

I suspect that each individual country would “own” their carriers’ data. Given the difficulties with establishing even a “local” distributed archive of flight data within one country, even purely for safety purposes and with limited access, I doubt that such a flight-data archive will be hosted by a world body anytime soon. Within such a proposed arrangement lie a number of serious impediments to international negotiation, not the least of which is the potential for legal prosecution of one country’s flight crews by other countries. Such data could be a legal goldmine for various interested parties and that is not what flight data recorders and CVRs are for. Their purpose is expressly for flight safety.

I submit that such a suggestion to stream flight (and CVR?!) data would be an initiative to solve the wrong problem – the one of disappearance, of which there have been only two very recent cases among millions of flights and billions of passengers, all carried under a safety record that is enviable by any standards.

The main problem that many are trying to come to grips with is certainly real. We needed to know what occurred to AF447 and the results of that knowledge have materially changed the industry in many, many ways. We need to know what happened on board, and to, MH370.

What makes more sense, in place of a wholesale sending of hour-by-hour flight data from every flight at all times, is a monitoring function something along the lines of the work Flight Data Monitoring people perform on a regular, routine basis, but do so on-board the aircraft, using a set of sub-routines which examine changes in aircraft and aircraft system behaviours to assess risk on a real-time basis.

Flight data analysts look for differences-from-normal – a rapid change in steady states or a gradual change towards specified boundaries, without return to normal within expected durations. It is possible to define a set of normal behaviours from which such differences can trigger a higher rate of capture of key parameters. This approach is already in use on the B787 for specific system failures. A satellite transmission streams the data set until either the differences no longer exist or end-of-flight.

Flight data quantity from aircraft such as the B777 is in the neighbourhood of 3 – 4 MB per flight hour. Most events prior to an accident and relevant to that accident are moments to minutes in duration.

The two industry events which have triggered the ITU interest were a rapid departure from cruise altitude & cruise Mach (AF447) and MH370, in which a sudden change in aircraft routing coincided with a loss of normal routine air-ground transmission by automated equipment (transponder, ADS-B). Both these events lasted moments and would be events that would initiate data-capture and transmission in my proposed scenario. In the MH370 case the transmission would remain active until end-of-flight. If the AF447 aircraft had been recovered from the stall and had made destination, the data would still be in place off-aircraft. Loss of satellite signal occurred on AF447 but the problem would not have prevented an initial data stream (See the BEA Interim Report No. 2 on AF447, p39).

The flight phases defined as “critical” by AF447 and MH370 are from the top-of-climb to the top-of-descent phases, in other words, the cruise phase. From the takeoff phase through the climb to cruise altitude and the descent/approach and landing, no such need for this kind of system exists, because any accident site is going to be within about 150nm or about one-half-hour’s flight time from departure and arrival.

An “on-condition” data transmission would be more practical and cheaper than full-time transmission of flight data, which would bring the notions expressed by many regarding these issues a bit closer to implementation.

Besides flight data, there is cockpit voice recording (CVR). The data issues with CVR transmission require a parallel and separate examination.

Don Hudson

[End of Hudson's Essay]

PBL Comments on Hudson
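Don’s on-condition idea is simple enough to sketch in a few lines. The following is only my illustration of the logic he describes, for a single parameter: start streaming on a rapid change or on an excursion that does not return to normal within an expected duration, and stop when the difference no longer exists. The class name, the thresholds and the sample values are invented for the purpose; real flight-data monitoring looks at many parameters at once. The helper at the end converts his figure of 3 to 4 MB per flight hour into an average link rate, assuming decimal megabytes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OnConditionMonitor:
    """Toy trigger for one parameter; all thresholds are hypothetical."""
    low: float                 # lower bound of the normal band
    high: float                # upper bound of the normal band
    max_step: float            # largest normal change between samples
    max_samples_outside: int   # tolerated samples outside the band
    streaming: bool = field(default=False, init=False)
    _previous: Optional[float] = field(default=None, init=False)
    _outside: int = field(default=0, init=False)

    def update(self, value: float) -> bool:
        """Feed one sample; return True while data should be streamed."""
        rapid = (self._previous is not None
                 and abs(value - self._previous) > self.max_step)
        self._previous = value
        if self.low <= value <= self.high:
            self._outside = 0
        else:
            self._outside += 1
        lingering = self._outside > self.max_samples_outside
        if rapid or lingering:
            self.streaming = True
        elif self._outside == 0:
            self.streaming = False  # back to normal: stop streaming
        return self.streaming


def average_rate_kbps(megabytes_per_hour: float) -> float:
    """Average bit rate, in kb/s, of sending everything all the time."""
    return megabytes_per_hour * 8e6 / 3600 / 1e3


# Cruise altitude in feet, one sample per second (made-up values).
monitor = OnConditionMonitor(low=34000, high=36000,
                             max_step=500, max_samples_outside=3)
for altitude in (35000, 35010, 33000, 31000, 35000, 35000):
    print(altitude, monitor.update(altitude))

print(round(average_rate_kbps(4), 2), "kb/s for 4 MB per flight hour")
```

The arithmetic shows that the raw volume is small, a few kilobits per second averaged over a flight. The case for on-condition transmission rests on not sending it from every aircraft all the time.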

Concerning the ITU’s statement, I fail to see what either big-data analysis or cloud computing have to do with real-time data transmission from aircraft in flight. I suspect an infection by buzzword.

The ITU is suspected by many people concerned with the Internet, and certainly by most in the IETF, to be a political body more interested in control than in enabling technology. This current announcement does seem opportunistic, and, as one can see by connection to “cloud” and “big data”, some of its senior officers apparently don’t really understand the technology which they want to regulate.

There are further questions which Don does not address above.

Who is going to pay for it, and will the expense be justified? The satellite companies will likely gleefully support the proposal (see the article’s comment on Inmarsat), given that through it they would be rolling in cash. But where shall that cash come from in a boom-and-bust industry such as airlines? Likely from the passengers through an increase in fares. So one would be asking passengers to pay for a service on their flight for which the rest of the world would only benefit if the flight crashed and said passengers died. That seems to be stretching everyday altruism to its limits.

As Don points out, such a proposal would have helped in precisely two cases in five years. But Air France was found, at a cost according to this New York Times article of €115m, and it currently looks as if MH 370 will also be found, so someone will be able to estimate the cost of that. A cost-benefit analysis (CBA) is thus possible, though I guess there would be lengthy argument over the components of the calculations. With a CBA a decision on implementation would come down to attempting to reduce, respectively to raise, the cash paid to satellite companies, and that seems to me to be a commercial issue and not one which governments would or should care to resolve in the absence of demonstrated need. I doubt governments would want to end up paying out of taxes. Surely, any individual government would prefer to put such resources into improving its surveillance capability, and expect to use those covertly in the very rare cases such as MH 370?

To the main point. How would real-time data transmission kit have helped in the search for MH 370?

Likely it would not have helped at all, if the current hypothesis of deliberate human action is validated. Such a system, like the transponder and ADS-B, can be turned off. For reasons of basic electrical safety, a discipline established over 100 years ago, you have to be able to turn off any given electrical circuit in flight. You can make it easier or harder, but no certifiable design can allow that it be prevented. Thus no such system is resilient against intentional human action.


Hijacking a Boeing 777 Electronically

17 03 2014

John Downer pointed me to an article in the Sunday Express, which appears to be one of their most-read: World’s first cyber hijack: was missing Malaysia Airlines plane hacked with mobile phone? by James Fielding and Stuart Winter.

The answer is no. To see why, read on.

The authors interviewed a Dr. Sally Leivesley, who is said to run “her own company training businesses and governments to counter terrorist attacks” and is “a former Home Office scientific adviser“.

…Dr Sally Leivesley said last night: “It might well be the world’s first cyber hijack.”

Dr Leivesley, a former Home Office scientific adviser, said the hackers could change the plane’s speed, altitude and direction by sending radio signals to its flight management system. It could then be landed or made to crash by remote control.

Apparently Ms. Leivesley thinks one can hijack the Flight Management System on a Boeing 777 with a mobile phone.

First point. Suppose you could. When the pilots noted that the aircraft was not doing what they wanted, they would turn off the Flight Management System. Problem solved. It’s nonsense to suggest that the aircraft “could then be landed or made to crash by remote control“.

One needs to make the distinction, clear to people who know anything about aviation systems, between the Flight Management System (FMS) and the Flight Control System (FCS). If you could take over the Flight Control System you would be able to make the airplane land or crash. The Boeing 777 is an aircraft with a computer-controlled FCS, so it is reasonable to ask whether it is vulnerable. Indeed, I did, on a private list which includes the designers of the AIMS critical-data bus (a Honeywell SAFEbus, standardised as ARINC 659; ARINC is an organisation which standardises aviation electronic communication technologies). The FMS and other systems on the Boeing 777 such as the Electronic Flight Instrument System (EFIS) and the Engine Indicating and Crew Alerting System (EICAS) use the AIMS bus to transfer critical data and control commands (from the LRMs, the computers processing the data on the AIMS bus) between components. A good generally-accessible reference is Digital Avionics Systems, Second Edition by Cary R. Spitzer, McGraw-Hill, 1993.

Second point. Both FMS and FCS are electronically well-shielded. They have to be – it is part of their certification requirements. They are not vulnerable to picking up signals from a device such as a mobile phone. In fact, there are laboratories where you can park an airplane in the middle of banks of very powerful electromagnetic radiators and irradiate it, to see whether your electronics are shielded. These installations are mostly used for military aircraft, against which aggressors might use such powerful radiators, but they are used for civilian aircraft too.

Third point. Any communication requires a receiver. If you want an electronic system S to pick up and process signals from a radiating device such as a mobile phone, there has to be a receptive device attached to S. So anyone wanting to put spoof data including control commands on a bus must do so through such a receptive device. Either there is one there already (such as in one of the components sharing the bus) or someone has to insert one (a physical tap). And it has to “speak” the communications protocol used by, in this proposed case, an intruding mobile phone. As far as I know, none of the usual components sharing the critical buses on the Boeing 777 has a mobile-phone-communications receiver/transmitter, but some airlines do attach such devices to their systems so that they can download data from the Quick Access Recorder (QAR, a flight-data recording device, such as a “black box” but with fewer parameters and not a critical component) at destination airports after a flight, for quality management purposes. As far as I know, a QAR is a passive device that is not able itself to put data on the bus, so hacking the QAR through its transmitter wouldn’t give you access.

Fourth point. Could you pre-install a receiving device somewhere on the buses, to allow someone in flight to communicate with the bus, to perform what is called a “man in the middle” (MitM) attack? An MitM spoofs one or more components transferring data on the bus. Well, theoretically someone on the ground with maintenance-type access could install such a component, but let’s ask what it then can do. The AIMS bus carries pure data between components which know what the data means. The bus is clocked and slotted, which means that data is transferred between components according to a global schedule in specific time slots; the bus time is given by a clock, of whose values all bus users have to be aware. Components communicate according to certain “timing constraints”, that is, slot positions, which constraints/positions are only “known” to the components themselves. So to spoof any component you need to reverse-engineer the source code which drives that component to find out what its specific “timing constraints” are. So you need the source code.
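A toy model may help to show what “clocked and slotted” means for an intruder. The sketch below is not ARINC 659; the frame layout, the slot lengths, the two-bit gaps between slots and all the names are invented for illustration. It shows only the principle: a transmission counts only if every bit of it falls inside the sender’s own slot in the global schedule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slot:
    start_bit: int    # offset of the slot within the frame, in bit times
    length_bits: int  # duration of the slot, in bit times
    owner: str        # the only component allowed to transmit in it


# Hypothetical frame: three 32-bit slots separated by 2-bit gaps.
FRAME = (
    Slot(0, 32, "FMS"),
    Slot(34, 32, "EFIS"),
    Slot(68, 32, "EICAS"),
)
FRAME_LENGTH_BITS = 102


def owner_at(bit_time: int) -> Optional[str]:
    """Which component owns the bus at this bit time (None in a gap)."""
    offset = bit_time % FRAME_LENGTH_BITS
    for slot in FRAME:
        if slot.start_bit <= offset < slot.start_bit + slot.length_bits:
            return slot.owner
    return None


def transmission_accepted(sender: str, start_bit_time: int, length_bits: int) -> bool:
    """True only if every bit lies inside a slot owned by the sender."""
    return all(owner_at(start_bit_time + i) == sender
               for i in range(length_bits))


print(transmission_accepted("FMS", 0, 32))   # exactly on schedule
print(transmission_accepted("FMS", 1, 32))   # one bit time late
print(transmission_accepted("FMS", 34, 32))  # in someone else's slot
```

Being one bit time out is enough to fail. A spoofer therefore needs the schedule table, which in this toy is sitting in plain view but in the real system is buried in the components’ software.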

Not only that, but the AIMS bus, for example, runs at 30 Mb/s (million bits per second), and there is a 2-bit gap between words on the bus; that is, one fifteen-millionth of a second. It is questionable whether the available mobile-phone protocols are fast enough. The fastest protocol in the CDMA2000 family is EV-DO Rev. B at 15.67 Mb/s, so no; there is no time to perform computations to determine a “gap” and start putting your bits on the bus. (Basic CDMA2000 is 144 kb/s, extendable to 307 kb/s; two orders of magnitude slower.) The HSPA+ protocol, from another mobile-phone protocol family, gets 28 Mb/s upstream and 22 Mb/s downstream, so has half a chance of being compatible. But, really, to synchronise with something running at 30 Mb/s and a 2-bit sync gap you’d need something running at more like double that speed, I should think. The wireless computer-network protocols IEEE 802.11g or 802.11n could do it in their faster versions. You’d need a device (and receiver) speaking these (why would it be a mobile phone?).
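The arithmetic can be checked in a few lines of Python. This is only a back-of-envelope sketch: the rates are the ones quoted above, the labels are mine, and the “double the bus speed” threshold is my rule of thumb, not a property of any standard.

```python
# Back-of-envelope check of the gap arithmetic. Rates are those quoted
# in the text; the dictionary labels and the 2x threshold are assumptions
# made for illustration.

BUS_RATE_BPS = 30e6  # AIMS bus rate, bits per second
GAP_BITS = 2         # gap between words, in bus bit-times

# Duration of the gap: 2 bits / 30e6 bits per second = 1/15e6 s, about 66.7 ns.
gap_seconds = GAP_BITS / BUS_RATE_BPS

LINK_RATES_MBPS = {
    "CDMA2000 (basic)": 0.144,
    "CDMA2000 (extended)": 0.307,
    "EV-DO Rev. B": 15.67,
    "HSPA+ (22 Mb/s direction)": 22.0,
    "HSPA+ (28 Mb/s direction)": 28.0,
}


def bits_in_gap(rate_mbps: float) -> float:
    """How many bits a link running at rate_mbps can move during one gap."""
    return rate_mbps * 1e6 * gap_seconds


# Links that reach the rule-of-thumb threshold of twice the bus rate.
fast_enough = [
    name for name, rate in LINK_RATES_MBPS.items()
    if rate * 1e6 >= 2 * BUS_RATE_BPS
]

if __name__ == "__main__":
    print(f"gap = {gap_seconds * 1e9:.1f} ns")
    for name, rate in LINK_RATES_MBPS.items():
        print(f"{name}: {bits_in_gap(rate):.2f} bits per gap")
    print("at least double the bus rate:", fast_enough)
```

None of the mobile-phone rates listed moves even two bits in the time the gap lasts, and none reaches double the bus rate.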

To visualise what has to happen, imagine trying to merge with traffic travelling on a densely-packed motorway at rush hour – on your bicycle. Even if you are going fast, you’re not likely to be successful.

Then you need to figure out, not only the timing constraints, but what data you want to put on the bus. Again, to know this, you need to reverse-engineer the source code for the components you want to spoof. Indeed, to put the spoof data on the bus consistent with the timing constraints for the component you are spoofing, it would probably be easier to have stolen the SW and use that to satisfy the very narrow timing constraints. All millions of lines of it.

Rather than tap in to the bus for a MitM attack, it would seem more reasonable to install a receiver/transmitter surreptitiously on one of the LRMs (“line replaceable modules”, the processors in the AIMS cabinet) and swap it in.

Summary: to achieve a MitM attack on the critical buses of the Boeing 777, you would need in advance to modify physically some of the hardware processors which run the bus so that they have a transmitter/receiver for WiFi signals (of the more powerful 802.11 standards family), and someone has to install such a modified LRM on the target aircraft beforehand. Then, the SW for the various components which the attacker intends to spoof must be obtained and reverse-engineered to obtain the timing constraints and the data-interchange formats, many million lines of code in all. That must all be installed on a portable device, probably not a mobile phone, which you then use in flight.

Dr. Leivesley refers to Hugo Teso’s demonstration in Amsterdam a year ago, in which he showed how to transmit messages from a mobile phone to a flight simulator running on a PC, and to modify FMS commands on the simulator via ACARS and ADS-B vulnerabilities. Neat demo, but it didn’t show you could take over a flight control system, as pointed out by many commentators. For one thing, the PC already has relevant communications devices (Bluetooth, WiFi, and one can insert a USB dongle for mobile-phone reception). Second, it’s just flight simulation software. Who knows what kinds of vulnerabilities it has and who cares? It is not the certified SW running on an actual airplane, or on the actual aircraft HW, and has not been developed, analysed and tested to anywhere near the same exacting standards used for flight control software (Design Assurance Level A, according to the RTCA DO-178B standard to which I believe the Boeing 777 flight control system was certified. Consider even the exacting and resource-intensive MCDC assessment required for DAL A. If you don’t know what MCDC is, let me recommend Kelly Hayhurst’s NASA tutorial. Nobody performs MCDC assessment on PC-based flight simulation SW).

To summarise, the demo was a little like hacking an X-box F1 road race game. It doesn’t mean you can take over Sebastian Vettel’s car in the middle of a real race.

Unlike the authors of the newspaper article, I have put most of these thoughts past a group of experts, including the designers of the SAFEbus. I thank Philippe Domogala, John Downer, Edward Downham, Kevin Driscoll, Ken Hoyme, Don Hudson, Michael Paulitsch, Mark Rogers, Bernd Sieker, and Steve Tockey for a very enlightening discussion.

There has been some thought about whether it is feasible for an interception aircraft with transmission capability to fly formation with MH 370, so that there would only be one blip on primary radar, and accomplish such an electronic takeover. The considerations above would still apply, as far as I can see – you still need to modify the physical HW in advance on the target airplane to allow electronic intrusion from outside.


Pete Seeger

28 01 2014

Pete Seeger died early today. It popped up on my iPad as I was reading the morning news.

There is lots to say about Pete, most of it not by me. The New York Times’s obituary by Jon Pareles does justice to the man. His music speaks for itself. Because, as he would probably say, it’s not his music, it’s our music, of which he was one of the greatest exponents. So here are some samples.

He had a number-one hit in the US before I was born, with a Leadbelly song, Goodnight, Irene. As he wrote in his songbook American Favorite Ballads (Oak, 1961), “six months after Leadbelly died, this song of his sold two million copies on the hit parade.” The Weavers played a reunion concert in Carnegie Hall in 1980, shortly before Lee Hays died. Here is a video of The Weavers singing Goodnight, Irene at the reunion.

There is a splendid version of House of the Rising Sun recorded by Pete in 1958 on American Favorite Ballads volume 2. What a voice! These recordings now belong to the Smithsonian Institution, which took over Folkways Records. Pete writes in his book of the same name (Oak, 1961) that he learned it from Alan Lomax. It’s there as “The Rising Sun” in Alan’s book The Penguin Book of American Folk Songs (Penguin, 1964). The credits say it was originally in Lomax & Lomax’s Our Singing Country (Macmillan, NY, 1941). Lomax says “A ragged Kentucky Mountain girl recorded this modern Southern white song for me in 1937 in Middlesborough, Kentucky, the hardboiled town in the Cumberland Gap on the Tennessee border. This blues song of a lost girl probably derives from some older British piece. At any rate, the house of the Rising Sun occurs in several risqué English songs, and the melody is one of several for the ancient and scandalous ballad Little Musgrave”.

I sing and play in a band, and we sing The Fox, about a fox who steals a goose and a duck out of the farmer’s pen to take home for his cubs to eat: “Daddy, Daddy, you gotta go back again ’cos it must be a mighty fine town-O!” As Pete says (op.cit.), “it’s nice to find the fox for once treated as the hero”. We also sing the ubiquitous song “Rye Whiskey”, otherwise known as “Drunken Hiccoughs” – here is Pete’s version, also on AFB. And just last night we were working on a version of Turn, Turn, Turn, a setting of Ecclesiastes 3:1-8 from the King James Bible, one of the great works of English literature. It turns out to be a very difficult song to sing well, but for Pete it seems to be effortless. There is a version by Judy Collins and of course the Byrds’ Top 20 hit (with a notably slim David Crosby), by which I remember being very struck as a teenager in the 1960s. Fifty years on, the Seeger version stands out as timeless.

So what’s my connection, besides re-singing the music? Thinner than I would like. I never saw, heard or met Pete. But I have a Seeger number of 3, same as my Erdős number. (See the footnote for “Seeger number” – I couldn’t get internal reference to work.)

How I came by my Seeger number. When I was a kid, I heard Malvina Reynolds’s Little Boxes, which was a big hit on the BBC. Malvina was a collaborator of Pete. When I got to Berkeley in the 1970s, I remember visiting San Francisco and going down towards Daly City I saw all these “little boxes … all the same” on the hillside. It turns out that that Daly City development was the exact inspiration for the song. It’s not such a coincidence, more a déjà vu – on the original video for “Little Boxes” played on the BBC there was a photo of these very same houses. Malvina’s accompanist on that video, and at many gigs, was a musician and composer named Janet Smith. I bumped into Janet one day at a music store in Walnut Square in Berkeley. The owner, who I seem to remember was called Mike, was a classical guitarist who had a stock of medieval and baroque sheet music. I had decided to take up playing the recorder again (I prefer the German concept of Blockflöte – block-flute), so I used to go in there every Saturday to look through his stocks and buy what I could. Janet was looking for some folk-type-wind-instrument player to play music she was composing for the Berkeley Repertory Theater’s production of William Saroyan’s My Heart Is In The Highlands. So I got to be the flautist.

That was 1981, I think. My parents visited late in the year, and I took them along to a production. Producing theatre was my Dad’s favorite thing to do as a teacher of English, and they liked the performance. I didn’t say anything about the music. Neither did they. The Rep had spelt my name wrong in the program. I pointed it out at the end. Dad said “Yes, we thought that sounded a bit like you”. I am still unsure whether that was meant as a compliment.

There is a wider scheme of things. Pete is generation two in the tradition of song collection in the field. That started with the availability of electrical recording equipment, specifically with John A. Lomax in the 1930s, aided by his son Alan, who was a near-contemporary of Pete (four years older). As his biography notes, Alan was a founder of the notion of, and collector of, world music. We owe a lot to John and Alan, amongst other things the collection of the Archive of American Folk Song in the US Library of Congress, now part of the Library’s American Folklife Center. Pete and Alan knew each other well, of course.

Such collection, at least for music of the British Isles and Ireland and its continuance in North America, is now more or less over, with the advent of festivals and iPods and iPads and electronic devices in every North American, British and Irish pocket. If there is an unnoted singer in this tradition left anywhere nowadays I would be surprised. But, please, I would be delighted to be surprised!

The US folk-music-archival tradition is not that long. It started with Francis James Child, whose life spanned the nineteenth century and who was the first Professor of English at Harvard (before that, he was Professor of Rhetoric and Oratory). Child researched the folk-poetry tradition. He published in the mid-1800s from his compilations, and realised that most of the work he was publishing stemmed from the Reverend Thomas Percy’s Reliques of Ancient English Poetry, published in 1765. His collaboration with Frederick Furnivall, the founder of the Early English Text Society, turned up a Folio of Percy’s Reliques, and Child started on his 8-volume masterwork, The English and Scottish Popular Ballads. (Scans of all volumes are available through the link.) Child didn’t finish his work – that was left to his successor George Kittredge, who completed the task in 1898. Less than a decade later, John Lomax turned up at Harvard with his interest in folk songs, in particular cowboy songs (Lomax was Texan) and Kittredge encouraged him. When Lomax was back in Texas, he published Cowboy Songs and Other Frontier Ballads.

Child was a literature specialist. He included no tunes. Those were supplied in the 1950s-1970s by Bertrand Harris Bronson, a Professor at the University of California, Berkeley, in his four-volume The Traditional Tunes of the Child Ballads, a reprint of which is again available. I recommend to anyone interested in songs and tunes his smaller one-volume résumé, The Singing Tradition of Child’s Popular Ballads. I never met Bronson either, but I used to play blockflute, then fiddle at least once a week in Moe Hirsch’s Tuesday-lunchtime old-time music sessions under a tree on the UC Berkeley campus with Bronson’s assistant Lonnie Herman, a Native American professional folklorist. That’s that for tenuous connections, I promise. I guess they are here because there’s a lot of richness in this world which passes us by until it becomes too late, and there is a lot of that in what I am feeling now.

Pete’s not the only Seeger of note in the field-collection and performance tradition. His half-brother Mike was an avid collector and performer, founder of the New Lost City Ramblers. The song Freight Train, composed by a teenage Elizabeth Cotten, who worked for the Seeger family, was sung by her to Mike. I think the NLCR first published another of our band’s songs, Man of Constant Sorrow, which they got from a 1920s recording of Emry Arthur, who claimed (to someone else) to have written it. They refer to Ralph Stanley’s “G version” of the song as a “classic recording”. It sure is. It’s most recently associated with Dan Tyminski, of the “Soggy Bottom Boys” (that is, singwise, he and Ron Block) in the Coen Brothers’ film O Brother, Where Art Thou?

Finally, a song from Pete’s songbook that I heard lots on the radio as a child, this time sung by its great exponent Burl Ives: The Big Rock Candy Mountain, which I am listening to as I write. It’s just such a jolly hobo fantasy, composed and then sung by people with nothing other than the clothes on their backs. Music is life. For them. For him. For us, for it’s ours.

** Seeger number: length of the shortest path in the graph of who has jointly performed with whom, with root Pete Seeger.
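For the computationally minded, the definition is just a breadth-first search from Pete. Here is a minimal sketch in Python. The little graph contains only the chain recounted above, and it assumes that each link in that chain counts as “jointly performed”; “author” is a placeholder node for me, and the function name is my own.

```python
from collections import deque

# Who has jointly performed with whom. Only the chain described in the
# post is included, and treating each link as a joint performance is an
# assumption. "author" is a placeholder for the writer of this post.
PERFORMED_WITH = {
    "Pete Seeger": {"Malvina Reynolds"},
    "Malvina Reynolds": {"Pete Seeger", "Janet Smith"},
    "Janet Smith": {"Malvina Reynolds", "author"},
    "author": {"Janet Smith"},
}


def seeger_number(person, graph=PERFORMED_WITH, root="Pete Seeger"):
    """Length of the shortest path from root to person, found by
    breadth-first search; None if there is no path."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current == person:
            return distance[current]
        for neighbour in graph.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    print(seeger_number("author"))  # 3 for the chain given above
```

Breadth-first search visits performers in order of distance from the root, so the first time it reaches someone, that distance is the shortest one.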

A Book on the Fukushima Dai-Ichi Accident

6 12 2013

In August 2011, we held the 11th Bieleschweig Workshop on Systems Engineering. The theme was the accident at the Fukushima Daiichi nuclear power plant.

We have just published a book on it. An Analytical Table of Contents may be found at the end of this note.

I had convened a mailing list in the days after the accident, after receiving a short note from Charles Perrow which he had written in response to a newspaper’s request for comment. He pointed out there was an obvious, indeed known, design feature that left the plant’s emergency electricity generation systems susceptible to flooding, and therefore that this was not a “normal accident” in his sense, but a design-failure accident. The accident clearly had a high organisational/human factors/sociological component, as do many accidents with process plants. The mailing list, which was closed to enable frank discussion, rapidly attracted system safety engineers and sociologists concerned with the safety of engineered systems as they are deployed. Discussion was intense. I surveyed the New York Times, the Washington Post, and the Guardian every day as key sources of information, as well as the BBC News Live Feed, which ran for a month or so, and the daily news reports from the nuclear regulators on technical matters at the Fukushima Dai-Ichi plant.

Indeed, Charles Perrow himself had anticipated the mechanisms of the accident (flooding of the basement taking out the emergency electrical systems and thus rendering cooling systems ineffective) in his 2007 book The Next Catastrophe. Why, I continue to think, is it a sociologist who put his finger right on a hazard which safety engineers had overlooked for some four decades? It does, of course, require a sociologist to answer such a question of organisational weakness.

I became and remain convinced that engineering-safety sociologists are essential partners to engineers in safety matters with high-impact engineering. Their presence is somewhat subdued in those areas with which I have been more concerned, such as rail, air and road transport, but let us hope it can be increased. The first step was to organise a workshop in August 2011 on the Fukushima accident to which system safety engineers, scientists involved in safety, and sociologists concerned with engineering safety were invited. I take workshops seriously: lecturers were asked for 45 minutes of material, given a 90-minute slot, and discussions ran full course.

The University of Bielefeld’s CITEC project, the Excellence Cluster in Cognitive Interaction Technology, which pursues studies in anthropomorphic robotics, generously sponsored the Workshop, thanks to a strong recommendation from its convenor, Helge Ritter, enabling us to bring some stars to Bielefeld as speakers, including Perrow, his colleagues John Downer and Lee Clarke as well as engineering-safety experts Martyn Thomas, Robin Bloomfield and Nancy Leveson. We had some pretty good nosh, sponsored by the UK Safety-Critical Systems Club and Causalis Limited.

The book of essays is now out. The Fukushima Dai-Ichi Accident, ed. Peter Bernard Ladkin, Christoph Goeker and Bernd Sieker, LIT Verlag, Vienna, Berlin, Zürich, Münster, November 2013, 291pp, ISBN 978-3-643-90446-1. List price €39.90. See below for an analytical table of contents.

The book is currently on the WWW page of the publisher, LIT Verlag (there is a language switch button between English and German) and if you click on the image, you get to the product page. The book is also available as a downloadable PDF at a slightly reduced price.

A word about publishing politics. We chose the publisher specifically with a view to (a) keeping the retail price reasonable; and (b) authors retaining our intellectual property. The big scientific publishers generally violate both conditions. For example, apropos (a), a reprint of a single article in a journal from “the big two” scientific publishing firms will cost something similar to the cost of this book. Apropos (b), author contracts for one of those companies require you not only to transfer copyright but also the intellectual property (I personally renegotiated my last contract with this company in 2011. According to the colleagues who assisted that negotiation and who had published with them for twenty years, the company has stopped doing that. My colleagues now self-publish. Their proceedings contain articles from companies as well as academics, and companies do not sign over their intellectual property without compensation – if they are content to do so in a particular case, it’s because what they wrote is anodyne). I don’t agree with either phenomenon and am happy there is an alternative. We availed ourselves of it. My students are now able to afford to buy the book at the student discount of 40%; this is becoming rare in technical subjects. The costs of studying at university are rising, it appears inexorably. A decade ago, having received offers to publish my system safety book, I decided not to publish at a price which I thought students could not afford. It’s taken me that length of time to understand and pursue alternatives. We are very happy to be working with LIT.

LIT Verlag is hitherto known for its series in the humanities and social sciences. With this book, it is starting a series in engineering, which we hope to continue focusing on engineering in its social context. I am the series editor. If anyone has, or is planning, book-length material which they might wish to publish at a reasonable price, while retaining authors’ intellectual property, please get in touch!

The Fukushima Dai-Ichi Accident: Analytical Table of Contents

Chapter 1: The Fukushima Accident, Peter Bernard Ladkin. Ladkin explains the technical background, the structure of the plant, describes how the severity of a nuclear accident is measured by the IAEA, and comments on what went right and what went wrong in dealing with the events triggered by the Tohoku earthquake and tsunami.

Chapter 2: Hazard Analysis and its Application, Peter Bernard Ladkin. Ladkin explains the background to the safety-engineering technique of hazard analysis (HazAn) in layman’s terms, as well as how one engineers a safety-critical system in general. He compares this ideal picture with what appears to have been done at Fukushima Dai-Ichi and draws some conclusions about safety engineering practice in general.

Chapter 3: The Nuclear Accident at the Fukushima Dai-Ichi Plant: Physics and Parameters, Bernd Sieker. Sieker explains the physics of nuclear power, and then analyses the daily data put out by the operator from its sensors. He concludes inter alia that there was likely only one cooling system at first, and then two, operating for the defueled Units 5 and 6. This suggests a reduction in “defence in depth” (with one system there is no “depth”) which did not cohere with the Japanese self-assessment that Units 5 and 6 suffered an INES Level 0 event. He argues that it should really have been Level 2.

Chapter 4: I’m Warning You, Lee Clarke. Clarke considers the social effectiveness of warnings (and of rescinding warnings): what they are meant to do, how they may operate, and whom one trusts in issuing and commenting on them. He argues that there is a major question of institutional trust.

Chapter 5: Rationalising the Meltdown, John Downer. Downer argues that, when the public has been assured that a safety-critical system is “safe” and then an accident happens, there are only a very few public responses available to the operators and regulators. He lists and analyses them all.

Chapter 6: Fukushima as a Poster Boy, Charles Perrow. Perrow points out that much about this accident is commonplace or prosaic, and is all but inevitable when we have high concentrations of energy, economic power and political power. He enumerates the resulting phenomena to illustrate this typicality, and the risks we run by indulging it.

Chapter 7: Japan, a Tale of Two Natural Disasters, Stephen Mosley. In this short note, written for a collection of short essays organised by the research group on Communicating Disaster, meeting for the year at Bielefeld’s Centre for Interdisciplinary Research (ZiF), Mosley compares the 1891 Great Nobi earthquake with the 2011 Tohoku earthquake.

Chapter 8: The Destruction of Fukushima Nuclear Power Plant: A Decision Making Catastrophe?, Stefan Strohschneider. In another short note for the ZiF group Communicating Disaster, Strohschneider looks at the decision making in the immediate aftermath and finds it wanting.

Chapter 9: Judging from the color of smoke: What Fukushima tells us about Information Structure Breakdowns and IT development methodologies, Volkmar Pipek and Gunnar Stevens. Pipek and Stevens note that, with all the IT informational systems supposedly available to plant operators, 17 days after the initiation of the accident it was still not clear what had happened and what was going on. They suggest lessons to be learned in the design of such informational systems.

A Fukushima Diary, Peter Bernard Ladkin. Ladkin redacts many of his contributions to the mailing list, as a “diary” of the accident. Many themes arise, from engineering and sociology to politics, including the role of the press and of various agencies of various governments and the UN, along with commentary on the daily reports from the regulator about “progress” in dealing with the accident.

Authors: who they are and what they are professionally known for.

Bibliography: 423 items, most available on the World-Wide Web, although some newspaper reports seem to have disappeared from the public WWW at time of writing, and requests to the newspaper have not yielded replacements.