Kissinger on SDI and the Soviet Collapse

13 11 2015

I’ve been reading Henry Kissinger’s “summation” of international relations, World Order, which is as interesting and insightful as people have said.

He says of SDI that

[Reagan] challenged the Soviet Union to a race in arms and technology that it could not win, based on programs long stymied in Congress. What came to be known as the Strategic Defence Initiative – a defensive shield against missile attack – was largely derided in Congress and the media when Reagan put it forward. Today it is widely credited with convincing the Soviet leadership of the futility of its arms race with the United States.

He says later,

…without Reagan’s idealism – bordering sometimes on a repudiation of history – the end of the Soviet challenge could not have occurred amidst such a global affirmation of a democratic future.

By “Reagan’s idealism”, Kissinger explicitly means the idea of the “shining city on a hill”, which he says “was not a metaphor for Reagan; it actually existed for him because he willed it to exist”.

Kissinger uses the “key people in positions of power” theory of the mechanisms of international relations while explaining the continuity of US foreign policy from Nixon through Ford, Carter and Reagan. Such an assertion of continuity might surprise those who were actually present during the period, but Kissinger’s argument for it is coherent, as one might expect.

Kissinger hedges his point about SDI by not actually appropriating it – he says “widely credited”, and that is correct, I think. But that doesn’t mean it’s fact.

Let me propose an alternative view, in which it was one of two major factors (amongst a plethora of others).

George Kennan foresaw how things would progress in 1947. It might be said that his view, more widely spread, established the Cold War and predicted its denouement. It had been clear for a long time by the mid-1980s that US productivity, when channelled into military spending, could outrun that of the Soviet Union in the long term, but no one knew how long that term would be. I seem to recall some reports that the Soviets were putting 40% of their productivity into military kit, and for all anyone knew maybe they could raise that to 60%, because it could have been seen as more important than feeding people. Whereas there was no appetite in the US, after the Vietnam war, for even 20% spending on the military.

SDI was in the first place an escalation of resource consumption. It wasn’t based on a Reagan decision alone; it was based more generally on fantasy in the US military, of which there was a plentiful supply. I remember an eminent colleague in the mid-80s recounting a meeting with a USAF general officer whose vision consisted of a helmet which could read and execute the thoughts of a fighter pilot: “fly there, do that, shoot that; I just THINK about it and it happens”. Thirty years later, bits of that have been implemented. Whereas the SDI vision is having trouble achieving even a 50% success rate in one-on-one anti-ICBM trials, according to the table in this 2014 article. Now, I suspect well-grounded Soviet military technologists knew as well as well-grounded US military technologists that SDI at that point in the 1980s was fantasy. The arguments are not hard; they were, as expressed by David Parnas, convincing, true and public. Some people in the Soviet Union surely must have known that SDI was bluff.

So what was SDI’s role in the Soviet collapse? I suggest it may have been half of it. The other half was Reagan suggesting directly to Gorbachev that both sides could just scrap their nuclear missiles, and meaning it. The Soviet leadership realised they were playing with someone who was far wealthier, who could more or less bet anything he pleased at any point in the game, at whim. If you’re on welfare, and you’re playing poker with a millionaire who has just spent €10,000 in front of you on a tie because he didn’t like the one he was wearing, and he’s offering at the same time to stop the game, it’s not clear what you should best do, but stopping right now must seem an attractive option.

And, of course, if Kennan was right, which apparently everyone now thinks he was, then the collapse would have happened anyway, with or without SDI. But it might have taken a bit longer. Then of course there was that bit about taking down a wall in Berlin that might have had something to do with it.

The Accident to SpaceShip Two

3 08 2015

Alister Macintyre noted in the Risks Forum 28.83 that the

US National Transportation Safety Board (NTSB) released results of their investigation into the October 31, 2014 crash of SpaceShipTwo near Mojave, California.

The NTSB has released a preliminary summary, findings and safety recommendations for the purpose of holding the public hearing on July 28, 2015. All those may be modified as a result of matters arising at the hearing. This is standard procedure for the Board.

Their summary of why the accident happened is

[SpaceShip2 (SS2)] was equipped with a feather system that rotated a feather flap assembly with twin tailbooms upward from the vehicle’s normal configuration (0°) to 60° to stabilize SS2’s attitude and increase drag during reentry into earth’s atmosphere. The feather system included actuators to extend and retract the feather and locks to keep the feather in the retracted position when not in use.

After release from WK2 at an altitude of about 46,400 ft, SS2 entered the boost phase of flight. During this phase, SS2’s rocket motor propels the vehicle from a gliding flight attitude to an almost-vertical attitude, and the vehicle accelerates from subsonic speeds, through the transonic region (0.9 to 1.1 Mach), to supersonic speeds. … the copilot was to unlock the feather during the boost phase when SS2 reached a speed of 1.4 Mach. … However, … the copilot unlocked the feather just after SS2 passed through a speed of 0.8 Mach. Afterward, the aerodynamic and inertial loads imposed on the feather flap assembly were sufficient to overcome the feather actuators, which were not designed to hold the feather in the retracted position during the transonic region. As a result, the feather extended uncommanded, causing the catastrophic structural failure.

This, the Board notes, represents a single point of catastrophic failure which could be, and in this case was, instigated by a single human error.

A hazard analysis (HazAn) is required by the FAA for all aerospace operations it certifies. It classifies effects into catastrophic, hazardous, major, minor and “no”, and certification (administrative law) requires that the probability of events in certain classes is ensured to be sufficiently low, through avoidance or mitigation of identified hazards.
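For concreteness, the arithmetic this imposes can be sketched in a few lines. The probability targets below follow the orders of magnitude conventional in civil-aviation certification guidance (AC 25.1309-style, per flight hour); I am assuming, not asserting, that commercial-spaceflight certification works with comparable figures, so treat the numbers as illustrative.

```python
# Sketch: checking identified hazards against per-class probability targets.
# Targets follow civil-aviation orders of magnitude (per flight hour); the
# actual commercial-spaceflight figures may differ -- these are illustrative.
PROBABILITY_TARGETS = {
    "catastrophic": 1e-9,
    "hazardous":    1e-7,
    "major":        1e-5,
    "minor":        1e-3,
    "no_effect":    1.0,   # no quantitative requirement
}

def hazard_acceptable(severity: str, estimated_probability: float) -> bool:
    """A hazard passes if its estimated probability, after avoidance or
    mitigation, does not exceed the target for its severity class."""
    return estimated_probability <= PROBABILITY_TARGETS[severity]

# A catastrophic single-point failure triggered by one human error under
# high workload plainly cannot meet a 1e-9 per-hour target:
print(hazard_acceptable("catastrophic", 1e-4))  # False
print(hazard_acceptable("major", 1e-6))         # True
```

The point of the sketch is only that the certification requirement is quantitative: whatever the HazAn identifies must be driven, by avoidance or mitigation, below a class-dependent numerical bound.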

HazAn is a matter of anticipating deleterious events in advance. The eternal questions for HazAn are:

  • Question 1. Did you think of everything? (Completeness)
  • Question 2. Does your mitigation/avoidance really work as you intend?

These questions are very, very hard to answer confidently. Imperfect HazAns are almost inevitable in novel operations. In aviation, sufficient experience has accumulated over the decades to ensure that the HazAn process fits the standard kinds of kit and operations and the answers to the questions are to a close approximation yes-yes. In areas in which there is no experience, for example use of lithium-ion batteries for main and auxiliary electric-power storage in the Boeing 787, answers appeared to be no-no. In commercial manned spaceflight, there is comparatively a tiny amount of experience. Certification of a new commercial transport airplane takes thousands of hours. Problems are found and usually fixed. SS1 and SS2 have just a few hours in powered spaceflight so far.

As soon as the accident happened it was almost inevitable that the answer to either Question 1 or Question 2 was “no”. The NTSB summary doesn’t actually tell us whether it was known that unlocking the booms too early would overstress the kit, but given Scaled Composites’ deserved reputation, as well as the strong hint from the NTSB that human factors were not sufficiently analysed, I would guess that the answer to Question 1 is yes, and the answer to Question 2 is partially no: the mitigation works unless the pilot makes an error under the “high workload” (performing many critical tasks under physical and cognitive stress) of transonic flight.

I emphatically don’t buy Macintyre’s suggestion that anyone “cut corners” on test pilot training and HazAn.

These are brand-new operations with which there is very little experience and which (contrary to marketing) are inevitably performed at higher risk than operations with thousands or millions of hours of accumulated experience. Nobody, in particular no one at Scaled, messes around in such circumstances. Scaled has a well-deserved reputation over three decades for designing radically new aerial vehicles to enviably high standards of safety. But things do sometimes go wrong. Voyager scraped a wingtip on takeoff and nearly didn’t make it around the world (they had 48kg of fuel remaining when they landed again at Edwards after nine days of flight in December 1986, enough only for a couple of hours more). Three people were killed during a test of a rocket system in 2007 which was based on a nitrous oxide oxidiser, apparently a novel technology. OSHA investigated. An example of some public commentary is available from Knights Arrow. Scaled has been owned by Northrop Grumman since 2007 (before the rocket-fuel accident). And now a test pilot has lost his life and the craft by performing an action too early.

It may be more apt to note that, like many such analyses of complex systems with proprietary features, the HazAn for WK2/SS2 space operations is substantial intellectual property, whose value will increase thanks to the NTSB’s suggestions on how to improve it.

The purpose of the NTSB’s investigation is to look minutely at all the processes that enabled the accident and to suggest improvements that would increase the chances of a yes-yes pair of answers to the HazAn questions, as well as all other aspects of safety. They said the human factors HazAn could be improved. Since human error was presumed to be the single point of failure, that conclusion was all but inevitable. The NTSB also suggested continuity in FAA oversight – the FAA flight-readiness investigation was carried out by different people for each flight, so there was reduced organisational learning. There were also some other suggestions about how to improve the efficacy of oversight and of organisational learning, such as the mishap database. And the NTSB suggested proactive emergency readiness by ensuring a rescue craft is on active standby (it usually was, but this wasn’t the case for the accident flight).

One wonders what else in the HazAn isn’t quite right. There are plenty of places to look (witness the Knights Arrow report above on the fuel choice). It doesn’t mean the HazAn is bad. But it will be improved. And improved, all with the goal of getting to yes-yes.

Volvo Has An Accident

5 06 2015

……. but not the one you thought!

Jim Reisert reported in Risks 28.66 (Volvo horrible self-parking car accident) on a story in Fusion on 2015-05-26 about a video of an accident with a Volvo car, apparently performing a demo in the Dominican Republic. The story is by Kashmir Hill. Hill says “… [the video] is terrifying”. The video is linked/included in the piece.

The video shows a Volvo car in a wide garage-like area, slowly backing up, with people standing around, including in front of the vehicle. The car stops, begins to move forward in a straight line, accelerates, and hits people who did not attempt to move out of the way. Occupants are clearly visible in the car. The video is about half a minute long.

I didn’t find it terrifying at all. At first glance, I found it puzzling. Why didn’t people move out of the way? They had time.

Fusion reports comments from Volvo. I looked the story up using Google. Lots of articles, many of them derivative, and a reference to Andrew Pam’s corrective comment in Risks 28.67. From the better articles (in my judgement), one would crudely understand:

  • The car was being driven. What you see is not automatic.
  • It wasn’t a demo of self-parking. It was a purported demo of a collision-avoidance function.
  • The other-car collision-avoidance function is standard; the pedestrian-collision-avoidance function is an optional extra.
  • The demo car was not equipped with this optional function.

However, many of the articles still have “self-parking” in the headline or as part of the URL, and journalists asked why other-car collision-avoidance is standard, but pedestrian-collision-avoidance an optional extra. Surely, some journalists expect us to conclude, it would be more reasonable the other way around?

What Volvo actually said in response to journalists’ queries seems to be reasonable (see below). But they appear not to be controlling the narrative, and that is their accident. The narrative appears to be that they have a self-parking car which may instead accelerate into passers-by unless it is equipped with a $3,000 extra system to avoid doing so. And this is demonstrated on video. And this narrative is highly misleading.

Other-car/truck detection and avoidance is nowadays relatively straightforward. These objects are big and solid, have lots of metal and smooth plastic which reflects all kinds of electromagnetic and sound waves, and they behave in relatively physically-limited ways. People, on the other hand, are soft and largely non-metallic, with wave-absorbent outer gear, and indulge in, ahem, random walks. It’s a harder detection problem, and it is thereby much harder to do it reliably – you need absolutely no false negatives, and false positives are going to annoy driver and occupants. Such kit inevitably costs something.
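The tradeoff can be made concrete with a toy decision rule. Everything below is a hypothetical illustration, not how any real pedestrian-detection system works: a single detector score is compared against a threshold, and pushing the threshold down to eliminate missed pedestrians necessarily lets more clutter through as false alarms.

```python
# Toy illustration of the detection-threshold tradeoff (all scores invented).
# Lowering the threshold to catch every pedestrian (no false negatives)
# raises the false-positive rate on non-pedestrian clutter.

pedestrian_scores = [0.35, 0.6, 0.8, 0.9]   # detector scores on real pedestrians
clutter_scores    = [0.1, 0.3, 0.4, 0.5]    # scores on bags, shadows, steam...

def rates(threshold):
    """Return (false negatives, false positives) at a given threshold."""
    false_negatives = sum(s < threshold for s in pedestrian_scores)
    false_positives = sum(s >= threshold for s in clutter_scores)
    return false_negatives, false_positives

print(rates(0.55))  # (1, 0): misses a pedestrian -- unacceptable
print(rates(0.30))  # (0, 3): misses nobody, but brakes for shadows
```

With soft, wave-absorbent, unpredictably-moving targets the two score populations overlap far more than they do for cars and trucks, which is exactly why reliable pedestrian detection costs more: you must buy better sensing to pull the populations apart, not merely move the threshold.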

But there is a laudable aspect to this commentary. Some, even many, journalists apparently think that pedestrian-collision avoidance systems should be standard, and are more important than other-car collision avoidance. I wish everybody thought like that!

Ten years ago, almost nobody did. I recall an invited talk by a senior staff member of a major car company at the SAFECOMP conference in Potsdam in 2004, about their collision-avoidance/detection/automatic-communication-and-negotiation systems and research. 45 minutes about how they were dealing with other vehicles. I asked what they were doing about pedestrians and bicycles. A 5-second reply: they were working on that too.

Pedestrians are what the OECD calls “vulnerable road users”. While accident rates and severities have been decreasing overall for some years, accident rates and severities for vulnerable road users have not – indeed, in some places they have been increasing. Here is a report from 17 years ago. The Apollo program, which is joint between the WHO and the EU, has a policy briefing ten years later (2008).

I am mostly a “vulnerable road user”. I have no car. My personal road transport is a pedelec. Otherwise it’s bus or taxi. Bicycle and pedelec road users need constantly to be aware of other road users travelling too fast for the conditions and posted speed limits, too close to you, and about to cut you off when you have right of way. As well as occasional deliberately aggressive drivers. All of which is annoying when you’re sitting inside another well-padded and designedly-collapsible shell, but means serious injury or death if you’re not.

I am all for people thinking that vulnerable-road-user detection and avoidance systems should be standard equipment on automotive road vehicles.

There are similar reports to that in Fusion elsewhere; I like Caldwell’s Slashgear article far more than the others.

Andrew Del-Colle deals out a lengthy corrective in both Road & Track and in Popular Mechanics.

Three Volvo spokespeople are quoted in these articles: Johan Larsson (Fusion, and derivatively The Independent), Stefan Elfstroem (Slashgear and Money) and Russell Datz (Daily Mail). Volvo’s comment is approximately:

  • The car was equipped with a system called “City Safe” which maintains distance from other cars.
  • City Safe also offers a pedestrian-detection system, which requires additional equipment and costs extra money.
  • The car was not equipped with this additional system.
  • The car appears to be performing a demo. It is being driven.
  • The demo appears to be that of City Safe, not of the self-parking function.
  • The car was apparently being driven in such a way that neither of these systems was operational: the human driver accelerates “heavily” forwards.
  • When an active driver accelerates forwards like this, the detection-and-braking functions are not active – they are “overridden” by the driver command to accelerate.
  • Volvo recommends never to perform such tests on real humans.

All very sensible.

One major problem which car manufacturers are going to have is that, with more and more protective systems on cars, there are going to be more and more people “trying them out” like this. Or following what John Adams calls “risk homeostasis”, driving less carefully while relying on the protective functions to avoid damage to themselves and others. I am also sure all the manufacturers are quite aware of this.

Cybersecurity Vulnerabilities in Commercial Aviation

18 04 2015

The US Government Accountability Office has published a report into the US Federal Aviation Administration’s possible vulnerabilities to cyberattack. One of my respected colleagues, John Knight, was interviewed for it. (While I’m at it, let me recommend highly John’s inexpensive textbook Fundamentals of Dependable Computing for Software Engineers. It has been very well thought through and there is a lot of material which students will not find elsewhere.)

None of what’s in the report surprises me. There are three main points (in the executive summary).

First, the GAO suggests the FAA devise a threat model for its ground-based ATC/ATM systems. (And, I presume, that the FAA respond to the threat model it devises.) I am one of those people who consider it self-evident that threat models need to be formulated for all sorts of critical infrastructure. One of the first questions I ask concerning security is “what’s the threat model?”. If the answer is “there isn’t one” then can anybody be surprised that this is first on the list?

Lots of FAA ground-based systems aren’t geared to deal with cybersecurity threats – many of them are twenty or more years old and cybersecurity wasn’t an issue in the same way it is coming to be. Many systems communicate over their own dedicated networks, so that would involve a more or less standard physical-access threat model. But many of them don’t. Many critical inter-center communications are carried over public telephone lines and are therefore vulnerable to attacks through the public networks, say on the switches. Remember when an AT&T 4ESS switch went down in New York almost a quarter century ago? I can’t remember if it was that outage or another one during which the ATCOs called each other on their private mobiles to keep things working. A human attacker trying to do a DoS on communications would probably try to take out mobile communications also. (So there’s the first threat for the budding threat model: a DoS on communications.)
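A threat model need not be elaborate to be useful. The sketch below is a hypothetical, minimal structure for recording threats like the one just named; the fields and the example entry are my assumptions for illustration, not anything taken from the GAO report.

```python
# Minimal sketch of a threat-model entry. The fields chosen here
# (asset, vector, effect, mitigations) are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class Threat:
    name: str
    asset: str                 # what is attacked
    vector: str                # how the attacker reaches it
    effect: str                # what goes wrong if the attack succeeds
    mitigations: list = field(default_factory=list)

# The first entry for the budding threat model, as in the text:
dos_comms = Threat(
    name="DoS on inter-center communications",
    asset="ATC inter-center voice/data links",
    vector="public telephone network switches; mobile networks as fallback",
    effect="controllers lose coordination between centers",
    mitigations=["dedicated physical networks", "diverse routing"],
)
print(dos_comms.name)
```

Even a flat list of such entries forces the two questions every threat model must answer: what can be attacked, and through what path. The hard work, as with HazAn, is completeness.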

If the FAA don’t want to do a model themselves, couldn’t they just get one from a European ally and adapt it? The infrastructures aren’t that dissimilar on the high level and anything would be a help initially.

Second, when the FAA decided they were OK with the manufacturer putting avionics and passenger in-flight entertainment (IFE) data on the same databuses on the Boeing 787, many of us thought this premature and unwise and said so privately to colleagues (one of them even found the correspondence). We have recently had people claim to be able to access critical systems through the IFE (see below). I have reported on one previous credible claim on vulnerabilities in avionics equipment.

The GAO is suggesting that such configurations be thought through a little more thoroughly. The basic point remains: isn’t it abundantly clear that the very best way to ensure as much non-interference as possible is physical separation? Who on earth was thinking a decade ago that non-interference wouldn’t be that much of an issue? Certainly not me.

Third, the other matters the GAO addressed are organisational, which is important of course for the organisation but of little technical interest.

Concerning accessing critical avionics systems through the IFE, Fox News reports that cybersecurity researcher Chris Roberts was pulled off a US commercial flight and interrogated by the FBI for a number of hours.

A colleague commented that “they are going after the messenger.” But let’s look at this a little more carefully.

Chris Roberts is CTO and founder of One World Labs in Denver. Staff at One World consist of a CEO who is a lawyer, a CFO and a VP of sales and marketing, and two technical employees, one of whom is Roberts, who is the company founder. The board appears to be well-balanced, with a former telecommunications-industry executive and a military SIGINT expert amongst others.

One World claims to have the “world’s largest index of dark content”, something called OWL Vision, to which they apparently sell access. One wonders how they manage to compile and sustain such a resource with only two technical people in the company, but, you know, kudos to them if it’s true.

According to the first line of his CV, Roberts is “Regarded as one of the world’s foremost experts on counter threat intelligence within the cyber security industry”. His CV consists of engagements as a speaker, and press interviews – there is nothing which one might regard as traditional CV content (his One World colleagues provide more traditional info: degrees, previous work experience and so on). His notable CV achievements for 2015 are a couple of interviews with Fox.

Apparently he told Fox News in March, quoted in the article above, “We can still take planes out of the sky thanks to the flaws in the in-flight entertainment systems. Quite simply put, we can theorize on how to turn the engines off at 35,000 feet and not have any of those damn flashing lights go off in the cockpit… If you don’t have people like me researching and blowing the whistle on system vulnerabilities, we will find out the hard way what those vulnerabilities are when an attack happens.”

Read that first sentence again. He can take planes out of the sky due to flaws in the IFE, he says. Does it surprise anybody that the FBI or Homeland Security would want to find out exactly what he means? Maybe before he gets on a flight, taking some computer equipment with him? It is surely the task of security services to ensure he is not a threat in any way. If you were a passenger on that airplane, wouldn’t you like at least to know that he is not suicidal/paranoid/psychotic? In fact, wouldn’t you rather he got on the plane with a nice book to read and sent his kit ahead, separately, by courier?

It has been no secret for fourteen years that if you are going to make public claims about your capabilities you can expect security agencies nowadays to take them at face value. Would we want it otherwise?

Let us also not ignore the business dynamics. You have read here about a small Denver company, its products and claimed capabilities. I am probably not the only commentator. All at the cost to a company employee of four hours’ interrogation and the temporary loss of one laptop. And without actually having to publish their work and have people like me analyse it.

Germanwings 9525 and a potential conflict of rights

11 04 2015

Work continues on the investigation into the crash of Germanwings Flight 9525. I note happily that news media are reverting to what I regard as more appropriate phraseology. Our local newspaper had on Friday 27th March the two-word major headline “Deadly Intention”, without quotation marks, and the BBC and Economist were both reporting as though a First Officer (FO) intention to crash the plane was fact. Written media are now reverting to what most of us would consider the formally more accurate “suspected of” phraseology. (For example, see the German article below.)

Flight International / Flightglobal had as main editorial in the 31 March – 6 April edition a comment deploring the way matters concerning the Germanwings crash are being publicly aired.

I read Flight as suggesting the Marseille procureur was abrupt. Many of us thought so at the time. An article from this week’s Canard Enchaine shows that part of the French (anti-)establishment agrees with that assessment, but for different reasons, concerning some political manoeuvring.

But Flight gets the logic wrong. The procureur was not announcing his “conviction” that the FO was “guilty” of…. whatever; neither was the announcement “surreal” by virtue of the fact that the FO was dead.

  • The procureur was not announcing the degree of his belief. He was making an accusation, in the usual formal manner using the usual formal terminology;
  • He was not judging the FO as “guilty”; that’s neither his job nor his right and he is obviously clear about that. Only a court can pronounce guilt.
  • It is not surreal: as Flight should be aware, in France prosecutions are brought, and are sometimes successful, after accidents in which everyone on board died, viz. Air Inter and Concorde. There is a case to be made that people at the airline had overlooked medical information on the FO which (would have) rendered him formally unfit to fly. There is the further possibility that there existed medical information relevant to his fitness to command a commercial airliner which was not shared with the relevant parts of the airline and/or regulator.

There is also a procedural aspect to the formal announcement by the Marseille procureur on Thursday 26th March which the Flight editorial ignores. Everyone knows the importance of preserving and gathering evidence quickly, in this case evidence about the FO. Presumably everyone agrees that it is a good thing. In order to set that process in motion, there need to be formal legal actions undertaken. The crash event took place within the jurisdiction of Marseille. Formal proceedings therefore need to be opened in Marseille and German legal authorities informed and cooperating in those proceedings in order to gather and preserve evidence in Germany. Obviously this needs to be done ASAP, because who knows how other people with immediate access to such materials are going to react. The question is whether proceedings have to be opened at a florid press conference. In this case it might have been hard to avoid.

In its editorial, Flight suggests the BEA is in a more appropriate position to gather evidence than prosecutors, and that they should be allowed to get on with that job. The other industry stalwart, Aviation Week and Space Technology, also says in a recent editorial that “We find more objectivity in accident investigators’ reports than in prosecutors’ statements.” I disagree. State attorneys’ offices and police are far more experienced at securing the kind of evidence likely to be relevant to the main questions about this crash than are aircraft safety investigators.

It seems to be the case that medical information relevant to the FO’s fitness was not distributed appropriately. For example, information concerning a 2009 depressive episode. The airline knew about this episode, and subsequently flight doctors have regularly judged him fit to fly (he regularly obtained a Class 1 medical certificate according to the annual renewal schedule). However, in April 2013 Germany brought into law the EU regulation that the regulator (LBA) must be informed and also determine fitness when an applicant has exhibited certain medical conditions. The LBA has said that it wasn’t so informed of the 2009 episode. (Here is a German news article on that, short and factual. It also laudably uses the “suspected” terminology.) If so, that seems to be an operational error for which the FO was not at all responsible in any way.

It is exactly right that the Marseille procureur along with his German counterparts is looking at all that and it is also right that that was undertaken very quickly.

There is a wider question. The confidentiality of German medical information is all but sacrosanct. Its confidential status overrides many other possibly conflicting rights and responsibilities, and I understand this has been affirmed by the Constitutional Court. Pilots have an obligation to self-report, so medical confidentiality has not come into conflict with duty of care – yet. But what about a case when medical conditions indicating unfitness to fly are diagnosed, but the pilot-patient chooses not to self-report? The pilot flies for an airline; the airline has a duty of care. If something happens to a commercial flight which this pilot is conducting, which causes harm to the airline’s clients (passengers) and others (people and objects on the ground near a CFIT; relatives of passengers), then the airline has obviously not fulfilled its duty of care to those harmed: the pilot should not have been flying, but was. However, equally obviously, the airline was unable to fulfil its duty of care: it was deprived of pertinent knowledge.

Personality assessments are used by some employers in the US in evaluating employees. See, for example, the survey in the second, third and fourth paragraphs of Cari Adams, You’re Perfect for the Job: The Use and Validity of Pre-employment Personality Tests, Scholars journal 13; Summer 2009, along with the references cited in those paragraphs. It is not clear to me at this point whether it is legal in Germany to require potential employees to undergo such tests. (As I have indicated previously, I do think that some tests, such as MMPI, could identify extreme personality characteristics, which could be associated with future inappropriate behaviour when operating safety-critical systems, in some cases where these would not necessarily be picked up in the usual employee interviews.)

I suggest that this employee medical confidentiality/employer’s duty of care issue is a fundamental conflict of rights that won’t go away. It may be resolved but it cannot be solved. It may turn out that it is currently not so very well resolved in Germany. I would judge it a good thing if this one event opens a wider debate about the conflict.

Thoughts After 4U 9525 / GWI18G

4 04 2015

It is astonishing, maybe unique, about the Germanwings Flight 4U 9525 event how quickly it seems to have been explanatorily resolved. Egyptair Flight 990 (1999) took the “usual time” with the NTSB until it was resolved, and at the end certain participants in the investigation were still maintaining that technical problems with elevator/stabiliser had not been ruled out. Silk Air Flight 185 (1997) also took the “usual time” and the official conclusion was: inconclusive. (In both cases people I trust said there is no serious room for doubt.) There are still various views on MH 370, and I have expressed mine. However, it appears that the 4U 9525/GWI18G event has developed a non-contentious causal explanation in 11 days. (I speak of course of a causal explanation of the crash, not of an explanation of the FO’s behaviour. That will take a lot longer and will likely be contentious.)

A colleague noted that a major issue with cockpit door security is how to authenticate, to differentiate who is acting inappropriately (for medical, mental or purposeful reasons) from who isn’t. He makes the analogy with avionics, in which voting systems are often used.

That is worth running with. I think there is an abstract point here about critical-decision authority. Whether technical or human, there are well-rehearsed reasons for distributing such authority, namely to avoid a single point of decision-failure. But, as is also well-rehearsed, using a distributed procedure means more chance of encountering an anomaly which needs resolving.

What about a term for it? How about distributed decision authority, DDA. DDA is used in voted-automatics, such as air data systems. It is also implicit in Crew Resource Management, CRM, a staple of crew behavior in Western airlines for a long time. Its apparent lack has been noted in some crews involved in some accidents, cf. the Guam accident in 1997 or the recent Asiana Airlines crash in San Francisco in 2013. It's implicitly there in the US requirement for multiple crew members at all times in the cockpit, although here the term "DDA" strains somewhat – a cabin crew member has no "decision authority" taken literally, but rather just a potentially constraining role.

There are also issues with DDA. For example, Airbus FBW planes flew for twenty years with air data DDA algorithms without notable problems: just five ADs. Then in the last seven years, starting in 2008, there have been over twenty ADs. A number of them modify procedures away from DDA. They say roughly: identify one system (presumably the “best”) and turn the others off (implicitly, fly with just that one deemed “best”). So DDA is not a principle without exceptions.
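The voted-automatics idea above can be sketched abstractly. Below is a minimal, purely illustrative Python sketch of median voting over three redundant sensor channels – the simplest form of DDA. The function name, tolerance value and readings are my own assumptions for illustration, not any actual avionics algorithm.

```python
# Hypothetical sketch of median voting over three redundant sensor
# channels, the simplest form of distributed decision authority (DDA).
# Not the actual Airbus air-data algorithm; values are illustrative.

def vote(readings, tolerance):
    """Return the median reading, flagging any channel that deviates
    from the median by more than `tolerance` as anomalous."""
    median = sorted(readings)[len(readings) // 2]
    anomalous = [i for i, r in enumerate(readings)
                 if abs(r - median) > tolerance]
    return median, anomalous

# Three airspeed channels; channel 2 has drifted (e.g. an iced pitot probe).
value, bad = vote([252.0, 251.5, 198.0], tolerance=10.0)
print(value, bad)  # 251.5 [2]
```

The point of the sketch is the trade-off discussed above: the voter removes the single point of decision-failure, but it introduces a new kind of anomaly (disagreement among channels) that must itself be resolved.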

A main question is what we need to do, if anything.

For example, consider the measures following 9/11. Did we need them, and have they worked? Concerning need, I would say a cautious yes. (Although I note the inconvenience has led me to travel around Europe mainly by rail.) The world seems to contain more organisations with, to many of us, alien murderous ideologies. 9/11 was a series of low-technology, robust (multiple actors per incident) hijackings. Attempts have been made since to destroy airliners with moderate technology and solitary actors (the shoe bomber, the underpants bomber, the printer-cartridge bombs) but these have all failed. They are not as robust; in each case there was just one agent, and moderate technology is nowhere near as reliable as low technology: bombs are more complex than knives. Any one of them could have worked, but on one day in 2001 three out of four worked. It seems to me that, in general, we are controlling hijackings and hostile deliberate destruction moderately well.

After 4U 9525, do we need to do something concerning rogue flight crew? Hard to say. Given the intense interest in the Germanwings First Officer's background, it seems to me likely that there will be a rethink of initial screening and on-the-job crew monitoring. Talking pure numbers, seven incidents in 35 years is surely a very low incidence per flight hour, but then it's not clear that statistics are any kind of guide in extremely rare cases of predominantly purposeful behavior. For example, how do we know there won't be a flurry of copycat incidents? (I suspect this might be a reason why some European carriers so quickly instituted a "two crew in cockpit at all times" rule.)

What about classifying airlines by safety-reliability? A cursory look suggests this might not help much. Three of these murder-suicide events, almost half, have been with carriers in Arnold Barnett's higher safety grade. Barnett has published statistical reviews of world airline safety from 1979 through recently (see his CV, on the page above, for a list of papers). His papers in 1979 and 1989 suggested that the world's carriers divided into two general classes in terms of chances of dying per flight hour or per flight. Japan Air Lines, Silk Air (the "low-cost" subsidiary of Singapore Airlines) and Germanwings (the "low-cost" subsidiary of Lufthansa) are all in the higher class.

I consider it certain that DDA with flight crew will be discussed intensively, including cockpit-door technology. Also flight-crew screening and monitoring. What will come of the discussions I can’t guess at the moment.

Germanwings Flight 4U 9525

24 03 2015

19:15 CEST on Friday 3rd April

The BEA have recovered the Flight Data Recorder and read it. They issued a communiqué. Here is my translation of the pertinent paragraph:

At a first reading it appears that the pilot in the cockpit used the autopilot to command a descent to an altitude of 100 ft, then, numerous times during the descent, the pilot modified the autopilot setting so as to increase the rate of descent.


  • There was an initial action to initiate a descent. This was surmised from the ADS-B readouts from Flightradar24, which showed an AP setting of FL 380, then an intermediate altitude setting of some 13,008 ft QNH, then an altitude setting of 96ft QNH (=100 ft, the lowest setting possible in the FCU). This very strongly suggested manual setting of the AP through the FCU, rather than, say, an automatic setting via the FMS. Apparently the manual setting action was heard on the CVR readout. The FDR confirms this manual setting action.
  • There followed multiple subsequent manual actions coherent with the first action, to increase the rate of descent.
  • Interpolating the communiqué, I infer there were no manual actions inconsistent with this command to descend to 100 ft.

I don't see how multiple coherent actions over a period of time are consistent with the kind of brain event which I had been considering up to now as a possibility. Mild epileptic-type events or stroke do not lead to coherent, apparently purposeful action; the actions usually don't cohere at all.

This leaves just one possibility from the six I listed. The very-much-most-likely possibility is deliberate action, namely murder-suicide.

I am at best an amateur at psychology, but I did look hard at the DSM-IIIR a quarter-century ago and like to think I have kept in touch. This deliberate action was either extremely aggressive or extremely unempathetic towards the 149 other people involved. That surely points towards personality disorders which amateurs like me imagine could have been picked up by something like the MMPI. ‘Nuff said from me; others might continue this line of thought.

This is the worst mass murder in recent German history by far, and the fourth worst in recent European history (Srebrenica 1995 is by far the worst, 8,000+ lives taken; then comes Lockerbie, 1988, 270 lives taken; Madrid train bombings, 2004, 191 lives taken). Note that those other three were intended or actual acts of war.

07:40 CET on Friday 27th March

Two important points today. First, investigators have detailed apparently-deliberate actions by the First Officer to initiate a descent and keep the Captain from reentering the cockpit. Colleagues with some experience have said that it is premature to rule out actions in the course of experiencing a stroke (Schlaganfall in German). Second, the workings of the cockpit door locking mechanism, and the policies concerning a pilot leaving the cockpit, have come into question. I explain the operation of the A320 cockpit locking below.

First, terminology. Everybody is writing “pilot”, “co-pilot”. The usual term is Captain (CAP) and First Officer (FO), referring to the command roles. The term “pilot” informally refers to the person flying the airplane at a given time, known as Pilot Flying (PF). The other cockpit-crew member is the Pilot Non-Flying (PNF). In this incident, the PF appears to have been the FO.

French investigators have said that the Captain left the cockpit, with the First Officer, the Pilot Flying, remaining at the controls, alone in the cockpit. Shortly afterwards a descent was initiated by – I am here interpolating with some knowledge of the A320 – dialing an "open descent" into the FCU (the autopilot control unit just under the glare shield). An A320-rated colleague says you can set, say, a 100 ft target altitude, activate it, and the aircraft will go into open descent with engines at flight idle at about 4,000 feet per minute right down to 100 ft altitude, i.e. the ground here. In other words, twist and pull one knob.

I would emphasise here that such autopilot systems are not unique to the Airbus A320 but are to be found on most commercial transport aircraft nowadays.

Now to the first major issue. Concerning stroke versus deliberate action, a colleague was present when someone 29 years old had a haemorrhagic stroke.

Inside 30 minutes he went from conversing like normal; to weirdly reticent and uncoordinated; to silently sitting on a bed, clutching an aspirin bottle like a crazy person, totally unresponsive to the world. And in that time he managed to open a laptop and hammer-out an email full of utter nonsense, all for reasons that are still totally lost on him.

During such an event, one may well continue "breathing normally", as the French press conference is reported to have said the First Officer did.

So it seems to be possible that a confused FO, in the course of experiencing a stroke, dialed an open descent into the FCU, maybe imagining he had to land. I am a little surprised that medical experts have not yet clearly pointed out such phenomena. It does suggest that concluding murder-suicide is premature at this stage.

It has also been suggested that the FO secured the cockpit door against being opened from outside (that is, he activated the third function below). Evidence for this is that the emergency-entry thirty-second buzzer did not sound. Maybe. No one has yet said whether there is evidence that the Captain in fact tried to activate the emergency-entry function via PIN (the second function below). Apparently he knocked on the door and continued to knock; but nothing else has been said.

Second major issue: the cockpit door locking. The cockpit door on the A320 in normal flight is permanently locked. There are three technical functions.

  • First function: on the central console between the pilots there is a toggle switch which opens the door when used: it must be held in the "open" position and reverts to door-locked when released. I emphasise: a pilot must hold the switch open for the door to open, and it locks again when he/she releases the switch.
  • Second function: there is a keypad mounted in the cabin outside the cockpit, by the cockpit door. Someone standing outside can "ring" (press a key) to activate a ringing tone in the cockpit, or, of course, knock on the door. The Pilot Flying (or another person) can then use the first function to open the door, and the person outside can then enter. Suppose that does not happen for some reason. Then the person outside can enter a PIN code into the keypad (he/she must know the PIN code). A warning sound activates in the cockpit for thirty seconds, at the end of which the door unlocks for five seconds, during which the waiting person can enter, and then reverts to locked. This second function addresses the issue of the incapacitation of the pilot, or occupation with other urgent tasks.
  • Third function: a deactivation. By using a switch in the cockpit, the second function can be deactivated for a preselected period of time (the Operating Manual says between five and twenty minutes; colleagues understand that on the Germanwings aircraft it was five minutes). During this period, even use of the PIN code outside does not unlock the cockpit door for entry. The cockpit door can still be unlocked during this time by using the first function, the "unlock" toggle switch. This third function addresses the possibility that a hostile person could physically threaten someone outside the cockpit who knows the PIN code (say CAP or FO who went to the toilet in the cabin) in order to gain entry via the second function.

This is the operation of the door locking/unlocking functions on the Airbus A320. We have not checked and compared with other aircraft.
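For readers who think in code, here is a much-simplified model of the three functions just described, as I understand them. The class, method names and return strings are purely illustrative; the timings follow the description above, not Airbus documentation.

```python
# A much-simplified, purely illustrative model of the three A320
# cockpit-door functions described above. Names, strings and the
# default inhibit period are my own; not from Airbus documentation.

class CockpitDoor:
    def __init__(self):
        self.locked = True          # normal in-flight state
        self.pin_inhibited = False  # is the third function active?

    def toggle_held_open(self):
        # First function: the door opens only while the cockpit
        # toggle switch is held; it relocks on release.
        return "open while held, relocks on release"

    def pin_entry(self):
        # Second function: a correct PIN starts a 30 s warning buzzer,
        # then unlocks the door for 5 s -- unless the crew has
        # inhibited PIN entry from inside.
        if self.pin_inhibited:
            return "denied"
        return "buzzer 30 s, then unlocked 5 s"

    def inhibit_pin(self, minutes=5):
        # Third function: deactivate PIN entry for a preset period
        # (5 to 20 minutes per the Operating Manual).
        self.pin_inhibited = True
        return f"PIN entry inhibited for {minutes} min"

door = CockpitDoor()
print(door.pin_entry())    # buzzer 30 s, then unlocked 5 s
print(door.inhibit_pin())  # PIN entry inhibited for 5 min
print(door.pin_entry())    # denied
```

The model makes the design tension visible: the very feature that defeats a hostile person in the cabin (the inhibit) also defeats a legitimate crew member locked outside.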

I am told there is a rule in the USA that there must be two crew members in the cockpit at all times. So if CAP or FO leaves, a cabin-crew member must enter and stay until the cockpit crew member returns. This is not necessarily so in European commercial flying. As far as I know it is consistent with Germanwings operating rules that the PNF can leave the cockpit briefly under certain conditions, leaving just the PF within. (I omit discussion here of why’s and wherefores.)

It seems almost certain that there will be considerable technical discussion of whether these cockpit-door-locking procedures and rules are appropriate or need to be modified. I observe that the BBC has listed three apparent murder-suicide events in commercial flight in the last few decades (I do not know of more), and this might be a fourth (I emphasise again the word "might"). In at least one of those incidents, the cockpit remained accessible to those outside. In contrast, on one day alone in 2001, four cockpit crews were overwhelmed by attackers from the cabin, and since the door-locking rules have been in force, none subsequently has been. Before that day in 2001, there were many instances of hostile takeover of an aircraft ("hijacking"). So arguments for and against particular methods and procedures for locking cockpit doors in flight are not trivial.

Finally, there seems to be a mistake in one of my observations below. The flight path corresponds more nearly to a 6° descent angle. This is steep, but within the normal range. London City airport has an approach glide path of 6°, and A320-series aircraft fly out of there (although, I believe, not the A320 itself). (Calculation, for the nerds like me: 1 nautical mile = about 6,000 ft so 1 nm/hr = about 100 feet per minute (fpm). So 400 knots airspeed = about 40,000 fpm. Flying at 400 kts and descending at 4,000 fpm is a slope of 1 in 10, which corresponds roughly to one-tenth of a radian which is about 6°.)
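The back-of-the-envelope arithmetic above can be checked with exact figures (1 nautical mile is 6,076 ft rather than the rounded 6,000 ft):

```python
# Exact check of the slope calculation: 400 kt groundspeed with a
# 4,000 fpm descent gives a flight-path angle of a little under 6 deg.

import math

groundspeed_kt = 400.0   # assumed groundspeed
descent_fpm = 4000.0     # observed rate of descent

forward_fpm = groundspeed_kt * 6076.0 / 60.0    # ~40,507 ft/min forward
angle_deg = math.degrees(math.atan(descent_fpm / forward_fpm))
print(round(angle_deg, 1))  # 5.6
```

So the precise figure is about 5.6°, consistent with the rough "about 6°" above.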

07:27 CET on Thursday 26th March

John Downer suggested the possibility of

  • an inadvertent behavioral event that did not obviously fit into my classification below. He quotes a colleague on the regular occurrence of the highly unusual: “as Scott Sagan put it: stuff that’s never happened before happens all the time“.

Inadvertent behaviour would likely involve one pilot leaving the cockpit, and the other suffering a medical event. I could then see two ways to achieve the regular flight path: engaging descent mode in the FCU at 4,000 fpm or 3° descent profile (note: Friday 27th March – I think this should be 6°!) or retarding the throttles in speed-hold mode.

Since the throttles are forward at high cruise, I think that slumping on them would, if anything, cause them to advance, not retard. John informs me that, during a stroke, people can become very confused, so manipulating the FCU or retarding the throttles does not seem out of the question. Many thanks to John for pointing out this possibility, which didn't fit into my classification below!

Karl Swarz made us aware of the NYT report Germanwings pilot was locked out of cockpit before crash by Nicola Clark and Dan Bilefsky. Karl had sent the note before my conversation with John, but I hadn’t yet read it. It seems this is a scoop – there is also a similar report today in The Guardian but it cites the NYT.

There is some preliminary unconfirmed information from the CVR readout. One pilot did leave the cockpit and could not reenter during the event. There is, as currently analysed, no indication of a reaction from the pilot flying. We may presume that the analysis will become much more precise. It seems the commentators cited by the NYT are ruling out cabin depressurisation; that eliminates one of the (now) six possibilities. It seems to me likely that many of the others will be quickly ruled out.

19:04 CET on Wednesday 25th March.

Update: there is no more information on the behavior of the flight than I reported yesterday (below).

There is discussion of possibilities, and whether my classification is right. It is appropriate and necessary that there should be such discussion. Here, in the next paragraph, is some.

A colleague has suggested that the crew could have been overcome by carbon monoxide in the bleed-air from the engines (which is used to pressurise the aircraft). It has happened before that crew has been overcome by something. In each case, the flight has continued as configured until fuel is exhausted, and then come down. So if this happened here, why did the flight not continue at FL380 until the fuel was exhausted? Another colleague has suggested that the descent rate almost exactly corresponds to a descent profile of 3°, which is normal descent profile for (say) an ILS approach. OK, but why would a crew in cruise flight, continuing cruise enroute to Düsseldorf, change the autopilot setting to a descent profile?

Somebody said on Twitter this morning, in response to my interview with a radio station in Hessen, that enumerating possibilities is speculation and one should just let the investigators do their job (and presumably deliver results).

First, this misunderstands how things are investigated. Speculation is a major component of investigation – one supposes certain things, and tries to rule them out or keep them as active possibilities. And one carries on doing this until possibilities are reduced as far as possible, ideally down to one.

Second, each technology is constrained in its behavior. Airplanes can't suddenly turn left and crash into a lane separator. Cars can't suddenly ascend at 4,000 feet per minute. Bicycles can't stop responding to input and show you the blue screen of death. How each artefact can behave in given circumstances is constrained, and constrained even further when there is a given partial behavioral profile. Why not attempt to write that down? If it's wrong, someone will say so and it can be corrected.

Third, such a process obviously works most efficiently when experts with significant domain knowledge attempt to write it down and other such experts correct. And most inefficiently when people with little domain knowledge write down what they are dreaming, and attempt to argue with those who suggest their dreams are unrealistic. It’s a social process, which works better or worse, but I see no reason why it should generally be deemed inappropriate. Speculation is a necessary component of explanation.

18:42 CET on Tuesday 24th March.

Here is what I think I know at this point.

Germanwings Flight 4U 9525 has crashed against an almost vertical cliff in the Alps. The flight was en route from Barcelona to Düsseldorf and took the route which had been flown the day before. At about 0931Z (= 10:31 CET) it was at FL380 in level flight and started a descent at a rate of about 4,000 feet per minute, which continued more or less constant until about 7,000 ft altitude, when it levelled off. The descent lasted until 0941Z (= 10:41 CET).

It continued level for either 1 minute or 11 minutes. Contact was reported to have been lost at 0953Z. Such basic facts are often unclear in the first 24 hours, even though they appear to come from reliable sources.

I see five possible contributing events, not all mutually exclusive:

  • Loss of cabin pressure. A crew should react by starting a descent at about this rate, but the descent should have stopped before 7,000 ft altitude;
  • Fire. The crew would wish to descend and land as soon as possible. Emergency descents in excess of 4,000 feet per minute are possible, especially at higher altitudes, and a crew in a hurry to land, as in a case of fire on board, could have been expected to descend faster;
  • Dual engine problems, maybe flameout. Descent at best-glide speed, though, is, I have been informed, somewhere between 2,000 and 3,000 feet per minute. One would not wish to come down faster, since the more time one has to troubleshoot, and then to try to restart, the better;
  • An air data problem affecting the handling of the aircraft. Recent air data problems with these aircraft, as well as with A330 and A340 aircraft that have almost-identical air data sensorics, have occurred since 2008 during cruise and other phases of flight, and there have been a series of Airworthiness Directives from EASA and the FAA in this period, including Emergency Airworthiness Directives within the last few months. However, one would expect aircraft behavior associated with such a problem not to last nine minutes at a constant, moderate rate of descent;
  • Hostile – and criminal – human action on board.

I’ve already given a TV interview in which I only mentioned four of these five. Such is life. Are there more?

In a number of these cases, one would expect a crew to turn towards a nearby adequate airport for landing, such as Marseille. One would certainly not expect them to continue flying towards high mountains! In particular, towards the Alps at 7,000 ft. So the question is raised whether the crew was or became incapacitated during the event.

I’ll update when I know more.

PBL 1800Z/1900CET

Fault, Failure, Reliability Definitions

4 03 2015

OK, the discussion on these basic concepts continues (see the threads "Paper on Software Reliability and the Urn Model", "Practical Statistical Evaluation of Critical Software", and "Fault, Failure and Reliability Again (short)" in the System Safety List archive).

This is a lengthy-ish note with a simple point: the notions of software failure, software fault, and software reliability are all well-defined, although it is open what a good measure of software reliability may be.

John Knight has noted privately that in his book he rigorously uses the Avizienis, Laprie, Randell, Landwehr IEEE DSC 2004 taxonomy (IEEE Transactions on Dependable and Secure Computing 1(1):1-23, 2004, henceforth ALRL taxonomy), brought to the List’s attention by Örjan Askerdal yesterday, precisely to be clear about all these potentially confusing matters. The ALRL taxonomy is not just the momentary opinion of four computer scientists. It is the update of a taxonomy on which the authors had been working along with other members of IFIP WG 10.4 for decades. There is good reason to take it very seriously indeed.

Let me first take the opportunity to recommend John’s book on the Fundamentals of Dependable Computing. I haven’t read it yet in detail, but I perused a copy at the 23rd Safety-Critical Systems Symposium in Bristol last month and would use it were I to teach a course on dependable computing. (My RVS group teaches computer networking fundamentals, accident analysis, risk analysis and applied logic, and runs student project classes on various topics.)

The fact that John used the ALRL taxonomy suggests that it is adequate to the task. Let me take John’s hint and run with it.

(One task before us, or, rather, before Christoph Goeker, whose PhD topic is vocabulary analysis, is to see how the IEC definitions cohere with ALRL. I could also add my own partial set to such a comparison.)

Below is an excerpt from ALRL on failure, fault, error, reliability and so forth, under the usual fair use provisions.

It should be clear that a notion of software failure as a failure whose associated faults lie in the software logic is well defined, and that a notion of software reliability as some measure of proportion of correct to incorrect service is also possible. What the definitions don’t say is what such a measure should be.

This contradicts Nick Tudor’s suggestion in a List contribution yesterday that “software does not fail ….. It therefore makes no sense to talk about reliability of software“. Nick has suggested, privately, that this is a common view in aerospace engineering. Another colleague has suggested that some areas of the nuclear power industry also adhere to a similar view. If so, I would respectfully suggest that these areas of engineering get themselves up to date on how the experts, the computer scientists, talk about these matters, for example ALRL. I think it’s simply a matter of engineering responsibility that they do so.

In principle you can use whatever words you want to talk about whatever you want. The main criteria are that such talk is coherent (doesn’t self-contradict) and that the phenomena you wish to address are describable. Subsidiary criteria are: such descriptions must be clear (select the phenomena well from amongst the alternatives) and as simple as possible.

I think ALRL fulfils these criteria well.

[begin quote ALRL]

The function of such a system is what the system is intended to do and is described by the functional specification in terms of functionality and performance. The behavior of a system is what the system does to implement its function and is described by a sequence of states. The total state of a given system is the set of the following states: computation, communication, stored information, interconnection, and physical condition. [Matter omitted.]

The service delivered by a system (in its role as a provider) is its behavior as it is perceived by its user(s); a user is another system that receives service from the provider. [Stuff about interfaces and internal/external states omitted.] A system generally implements more than one function, and delivers more than one service. Function and service can be thus seen as composed of function items and of service items.

Correct service is delivered when the service implements the system function. A service failure, often abbreviated here to failure, is an event that occurs when the delivered service deviates from correct service. A service fails either because it does not comply with the functional specification, or because this specification did not adequately describe the system function. A service failure is a transition from correct service to incorrect service, i.e., to not implementing the system function. …… The deviation from correct service may assume different forms that are called service failure modes and are ranked according to failure severities….

Since a service is a sequence of the system’s external states, a service failure means that at least one (or more) external state of the system deviates from the correct service state. The deviation is called an error. The adjudged or hypothesized cause of an error is called a fault. Faults can be internal or external of a system. ….. For this reason [omitted], the definition of an error is the part of the total state of the system that may lead to its subsequent service failure. It is important to note that many errors do not reach the system’s external state and cause a failure. A fault is active when it causes an error, otherwise it is dormant.

[Material omitted]

  • availability: readiness for correct service.
  • reliability: continuity of correct service.
  • safety: absence of catastrophic consequences on the
    user(s) and the environment.
  • integrity: absence of improper system alterations.
  • maintainability: ability to undergo modifications
    and repairs.

[end quote ALRL]

Fault, Failure, Reliability Again

3 03 2015

On the System Safety Mailing list we have been discussing software reliability for just over a week. The occasion is that I and others are considering a replacement for the 18-year-old, incomplete, largely unhelpful and arguably misleading guide to the statistical evaluation of software in IEC 61508-7:2010 Annex D. Annex D is only four and a half pages long, but a brief explanation of the mathematics behind it and the issues surrounding its application resulted in a 14pp paper called Practical Statistical Evaluation of Critical Software which Bev Littlewood and I have submitted for publication. Discussion in closed communities also revealed to me a need to explain the Ur-example of Bernoulli processes, namely the Urn Model, introduced and analysed by Bernoulli in his posthumous manuscript Ars Conjectandi of 1713, as well as its application to software reliability in a paper called Software, the Urn Model, and Failure.
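For the curious, the urn model as applied to software reliability can be sketched in a few lines: each demand on the software is a draw with replacement from an urn in which some fixed, unknown fraction of the balls are "failure" balls. The function name and parameters below are illustrative only; the full treatment is in the papers mentioned above.

```python
# A minimal simulation of Bernoulli's urn model as applied to software
# reliability: each demand is an independent draw with replacement from
# an urn whose (unknown) fraction of "failure" balls is p_failure.
# Parameters and names are illustrative.

import random

def run_demands(p_failure, n, seed=1):
    """Simulate n demands and return the observed failure rate."""
    rng = random.Random(seed)
    failures = sum(rng.random() < p_failure for _ in range(n))
    return failures / n

# With enough demands, the observed rate approaches the urn's true ratio;
# how fast it does so is exactly what Bernoulli analysed in 1713.
print(run_demands(p_failure=1e-3, n=100_000))
```

The statistical-evaluation question is then the inverse problem: given an observed failure count over so many demands, what can one justifiably conclude about the urn's true ratio?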

This discussion about statistical evaluation of software has shown that there is substantial disagreement about ideas and concepts in the foundations of software science.

On the one hand, there are eminent colleagues who have made substantial careers over many decades, written seminal papers in software science and engineering, and published in and edited the most prestigious journals in software, on the subject of software reliability.

On the other hand, there are groups of engineers who say software cannot fail. They don't mean that you and I were just dreaming all our struggles with PC operating systems in the '90s and '00s, that those annoying things just didn't happen. They mean that, however you describe those frustrating events, the concept of failure doesn't apply to software. It is, as Gilbert Ryle would have said, a category mistake.

I knew that some people thought so twenty years ago, but I had no idea that it is still rife in certain areas of software practice until I was informed indirectly through a colleague yesterday. I have also been discussing, privately, with a member of the System Safety List who holds this view. I have encouraged him to take the discussion public, but so far that hasn’t happened.

The Urn Model can be considered a trope introduced by one man 300 years ago and still not well understood today. Yesterday, I noted another 300-year-old trope that was recognised as mistaken nearly a half century later, but still occurs today without the mistake being recognised, and which I regularly encounter. That is, John Locke’s account of perception and Berkeley’s criticism, which is regarded universally today as valid. It occurs today as what I call the “modelling versus description” question (I used to call it “modelling versus abstraction”), and I encounter it regularly. Last month at a conference in Bristol in questions after my talk (warning, it’s over 50MB!); and again yesterday in a System Safety List discussion. I don’t know when the trope calling software failure a category mistake got started (can someone advise me of the history?) but it’s as well to observe (again) how pointless it is, as follows.

Whatever the reasons for holding "software cannot fail" as a conceptual axiom, it should theoretically be easy to deal with. There is a definition of something called software failing in the papers referenced above, and I can obviously say it is that conception which I am talking about. You can call it "lack of success", or even flubididub, if you like: the phenomenon exists, and it is that phenomenon which I – and my eminent colleagues who have made careers of it – am talking about. Furthermore, I would say it's obviously useful.

Another approach is to observe that the concept of software failure occurs multiple times in the definitions for IEC 61508. So if you are going to be engineering systems according to IEC 61508 – and many if not most digital-critical-system engineers are going to be doing so – it behooves you to be familiar with that concept, whatever IEC 61508 takes it to be.

There is, however, a caveat. And that is, whether the conceptions underlying IEC 61508 are coherent. Whatever you think, it is pretty clear they are not ideal. My PhD student Christoph Goeker calculated a def-use map of the IEC 61508 definitions. It’s just under 3m long and 70cm wide! I think there’s general agreement that something should be done to try to sort this complexity out.

What’s odder about the views of my correspondent is that, while believing “software cannot fail“, he claims software can have faults. To those of us used to the standard engineering conception of a fault as the cause of a failure, this seems completely uninterpretable: if software can’t fail, then ipso facto it can’t have faults.

Furthermore, if you think software can be faulty, but that it can’t fail, then when you want to talk about software reliability, that is, the ability of software to execute conformant to its intended purpose, you somehow have to connect “fault” with that notion of reliability. And that can’t be done. Here’s an example to show it.

Consider deterministic software S with the specification that, on input i, where i is a natural number between 1 and 20 inclusive, it outputs i. And on any other input whatsoever, it outputs X. What software S actually does is, on input i, where i is a natural number between 1 and 19 inclusive, it outputs i. When input 20, it outputs 3. And on any other input whatsoever, it outputs X. So S is reliable – it does what is wanted – on all inputs except 20. And, executing on input 20, pardon me for saying so, it fails.

That failure has a cause, and that cause or causes lie somehow in the logic of the software, which is why IEC 61508 calls software failures “systematic”. And that cause, or those causes, are invariant with S: if you are executing S, they are present, and just the same as during any other execution of S.

But the reliability of S – namely, how often, or how many times in so many demands, S fails – depends obviously on how many times, how often, you give it “20” as input. If you always give it “20”, S’s reliability is 0%. If you never give it “20”, S’s reliability is 100%. And you can, by feeding it “20” proportionately, make that any percentage you like between 0% and 100%. The reliability of S is obviously dependent on the distribution of inputs. And it is equally obviously not functionally dependent on the fault(s) = the internal causes of the failure behavior, because that/those remain constant.
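To make the point concrete, here is a sketch (again with illustrative names of my own choosing) showing that the reliability of the same, unchanged S is whatever the input distribution makes it:

```python
# Sketch: the reliability of S over a run of demands depends only on how
# often the failing input 20 appears, not on the (constant) internal fault.

def spec(i):
    return i if isinstance(i, int) and 1 <= i <= 20 else "X"

def S(i):
    if isinstance(i, int) and 1 <= i <= 19:
        return i
    return 3 if i == 20 else "X"  # fails on 20; the fault never changes

def reliability(demands):
    """Fraction of demands on which S conforms to its specification."""
    conforming = sum(1 for d in demands if S(d) == spec(d))
    return conforming / len(demands)

print(reliability([20] * 100))              # 0.0  - always fed "20"
print(reliability(list(range(1, 20)) * 5))  # 1.0  - never fed "20"
print(reliability([20, 1, 2, 3]))           # 0.75 - one failing demand in four
```

The code defining S is identical in every run; only the stream of demands changes, and with it the measured reliability.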

The plea is often heard, and I support it, to take steps to turn software engineering into a true engineering science. That won’t happen if we can’t agree on the basic phenomena concerning success or failure – call it lack of success if you like – of software execution. Nor will it happen if, even when we do agree on the phenomena, we can’t agree on words to call them by.

Quantitative Easing and Helicopter Money

30 10 2014

The US Federal Reserve Bank is to end its programme of quantitative easing (QE). QE was introduced by former chairman Bernanke as a response to the financial crash starting in 2008 (or 2007, or whenever). A colleague asked me a few years ago if I understood QE.

I didn’t. Now I (am under the illusion that I think I) do.

I am prompted to write about it because last week I came across a startlingly readable book by British-educated economist Roger E. A. Farmer, who is at UCLA and still advises the Bank of England as well as various branches of the U.S. Federal Reserve. He won the inaugural 2013 Allais prize, with coauthors Carine Nourry and Alain Venditti of Aix-Marseille Université, for a paper on why financial markets don’t work well in the real world. Maurice Allais won the 1988 Nobel Memorial Prize, and is best known for his paradox. He was at the École des Mines in Paris, whence in an earlier era the great mathematician Henri Poincaré embarked on his program to synchronise the world’s clocks (Peter Galison, Einstein’s Clocks, Poincaré’s Maps, W. W. Norton, 2003).

Where was I? Roger E. A. Farmer, How the Economy Works, Oxford University Press, 2014.

The point of QE is as follows. Central banks such as the Fed and BoE traditionally attempt to regulate the rate of inflation in the economy by buying three-month government bonds in the market. The price they are prepared to pay says what they intend overall prices to do in the next three months. But suppose the interest rate, as given by government bonds, is 0, or very close to 0. Who’d care about selling bonds to the Fed or BoE? It takes somebody some work (keys have to be pushed on computer terminals and so on) for no gain at all.

So maybe there’s something else to be bought which could influence economic activity? Say, long-term government bonds, and commercial paper. The central banks buy these from the banks which hold them. That’s QE. Commercial paper is short-term bonds issued by companies rather than by government. Now, the central bank doesn’t actually have to cash in that paper, and neither did the banks which previously held it. But if your paper is held by the BoE rather than, say, Lehman Brothers, you as a company have a little more security that no one is going to come after your money soon. In other words, your loan has become a retrospective government grant. And the banks which held all those long-term bonds get to cash them out right away – long-term has become short-term. (They could always do that on the markets, but on worse terms; the central bank is working by fiat.)
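As an illustration only – all figures and the two-entry “balance sheets” are hypothetical, a cartoon rather than how central-bank accounting actually works – the asset swap at the heart of QE can be sketched as:

```python
# Toy sketch of a QE purchase. The central bank buys long-term bonds from a
# commercial bank with newly created reserves: the commercial bank swaps an
# illiquid asset for cash. All numbers are made up for illustration.

central_bank = {"reserves_issued": 0, "bonds_held": 0}
commercial_bank = {"cash": 100, "long_term_bonds": 500}

def qe_purchase(amount):
    """Central bank buys `amount` of long-term bonds with new reserves."""
    commercial_bank["long_term_bonds"] -= amount
    commercial_bank["cash"] += amount           # long-term has become short-term
    central_bank["bonds_held"] += amount
    central_bank["reserves_issued"] += amount   # liquidity injected

qe_purchase(200)
print(commercial_bank)  # {'cash': 300, 'long_term_bonds': 300}
```

The point the cartoon captures is just the one in the text: the commercial bank ends up with more cash and fewer long-dated assets, without anyone waiting for the bonds to mature.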

So, obviously, under QE more cash gets injected into the economy. It’s called “liquidity” (liquidity is stuff which can be moved around the economy quickly. Like cash. People judge the health of economies by the “velocity of money” and it’s publicly measured by, for example, the St. Louis Fed). There was a big discussion whether the 2008 crisis was caused by illiquidity or insolvency. Michael Lewis proposed a third possible factor: greed. This was so obviously right that nobody argued. (Greed, BTW, is not a concept that you find in most economics textbooks. Adam Smith is said to have shown that “greed is good” and it thereafter disappeared from the vocabulary of “serious” economics, or so Wall Street would have us believe. Not all of us agree with that, especially after 2008.) Anyway, the injected liquidity placates the illiquidity proponents, leaving the legal system and courts free to deal with the insolvency bit (um, did that happen? Oh, I forgot: “too big to fail”. That’s all right then).

The big idea is that “the people who know where the money should go”, namely the banks which sold their paper and long-term bonds to the central bank, are then giving it away to those who need it for good purpose. Say, British manufacturers. So British manufacturing should have been on the up and up. Well, it’s a bit up, I understand. So who did go on the up and up? The London housing market in particular and the British housing market in general. Well, duh, isn’t that where we came in?

There is a short article in The Guardian today by one of their economics commentators, Larry Elliott. Elliott compares QE with a mooted alternative, “helicopter money”. That’s Milton Friedman’s idea that you could fly helicopters over the country and drop money from them. Or, more soberly, as Farmer says (I think it’s somewhere in his book), the central bank writes a check for £2,700 to every man, woman, androgyne and child in the country. The idea is they go spend it where they want and thereby demonstrate true demand, which has been thwarted or at least distorted by the crisis. As we might guess, everyone will go out and buy undersea cables, high-performance aircraft engines, and the occasional bit of railway signalling and new track, won’t they? It would be just toooo Victorian to imagine it’ll go on drugs, sex and booze, but if it does, the British government might not be quite so furious at the consequences of sex and drugs now being included in the national accounts.

Actually, provided that Vodafone can in the future find a way around charging people £15,000 for an evening’s phone “usage”, it might well be spent on telecommunications and digital entertainment, that is, demand-attentive networks and the iTunes store. Yes, those phones and tablets are made elsewhere, and (not-)taxed in Ireland, but they all use ARM chip designs from Cambridge. So the British public wins: sex, drugs AND rock n’ roll, all now in the national accounts!

You see, I do understand QE. Don’t I? I suppose you’re thinking “he’s being silly”. But go a little further. If you suppose “he’s being silly, but come to think of it nobody can really do much better”, I could imagine you have thereby deemed me An Economist!