Software Quality and Fitness for Purpose

26 08 2011

Following on from my recent post on certification requirements for commercial aircraft, John Rushby and I have been discussing a paper of his on commercial aircraft software and the DO178B guidelines, for the invited session on certification at EMSOFT 2011.

John is concerned with whether DO178B “works”, that is, leads to high-quality code which is fit for purpose, and, if so, why and how. I think that is a hard and important question, and I commend his bravery in addressing it squarely (rather than hiding behind a blog format, as I do :-) ). I commend his paper to readers once it is published; I imagine it will be on his publications page sometime in October 2011. The paper is not long, but it is dense. Rather as if it were a poem, I had to read it carefully, multiple times. It took me a week to respond. I concluded that I really do prefer Goethe, but then he didn’t talk about avionics SW.

John suggests that DO178B is focused on assuring the correctness of the executable code. I found that surprising; I think both Martyn Thomas and I are concerned that DO178B is focused largely on processes which people thought and still think (often, we claim, without much scientific evidence) correlate with code that is fit for purpose.

I use the term “fit for purpose”, as does Martyn. John suggests that it is a British term not used in the US. I prefer it to the term “correct”, so let me translate. There are at least two ways in which code may be said to be “correct”. One is that it fulfils its specification; let us call this correct-1. The second is that the code causes the system to behave in a manner appropriate for the task at hand; let us call this correct-2. I call the correct-1 properties of code its quality, and the correct-2 properties its fitness for purpose. But let me say correct-1 and correct-2 for the remainder of this note.

The specification may not subsume the “task at hand” in all cases: quality (correct-1) does not imply fitness for purpose (correct-2). Indeed, there is good evidence dating back some twenty years now, for example in work by Robyn Lutz published in 1993, that most failures of correctness-2 in moderate to complex systems are not failures of correctness-1.

Martyn and I would agree on the lack of evidence that “clean and tidy” SW development processes entail any concrete property of the resulting code (such as fitness for purpose). Martyn would also suggest, citing work of Andy German of QinetiQ, that higher DALs in DO178B do not necessarily correlate with higher-quality software. Here is what Martyn says, based on personal communication with Andy:


Here is some data from the formal analysis of the avionics software of a current American military aircraft that was certified against DO-178B Levels A and B for use in civil airspace.

The following defects were among those reported in the software after certification:

Erroneous signal de-activation
Data not sent or lost
Inadequate defensive programming with respect to untrusted input data
Warnings not sent
Display of misleading data
Stale values inconsistently treated
Undefined array, local data and output parameters
Incorrect data message formats
Ambiguous variable process update
Incorrect initialisation of variables
Inadequate RAM test
Indefinite timeouts after test failure
RAM corruption
Timing issues – system runs backwards
Process does not disengage when required
Switches not operated when required
System does not close down after failure
Safety check not conducted within a suitable time frame
Use of exception handling and continuous resets
Invalid aircraft transition states used
Incorrect aircraft direction data
Incorrect Magic numbers used
Reliance on a single bit to prevent erroneous operation

The worst module had a defect density greater than 1 defect in 10 lines of code. The best had 1 defect in 250 lines.

The problem, as I see it, is that most software assurance relies on testing, although we have known for at least 40 years that testing can only show the presence of errors and not their absence. Until software assurance is mostly based on mathematically rigorous analysis of the software (which can be done at no increase in cost if the software is developed with this in mind) these unacceptable rates of software defects will continue.

Notice that these are errors in the sense of correctness-1.

I hold it to be significant that a number of these errors could not have occurred had the SW been written in a strongly-typed language and had the compiler correctly implemented the strong typing. This has been an issue for forty years, ever since the Algol project gave up. Strong typing is one of the best-known, demonstrably effective ways to avoid certain well-defined classes of program error. If DO178B is truly focused on the correctness of the implemented code, why doesn’t it require development in a strongly-typed source language, and use of a demonstrably type-correct compiler?

Andy also claimed, in a paper in Crosstalk 16(11), the Journal of Defence Software Engineering, November 2003, that “no significant difference” had been found with respect to levels of correctness-1 in code developed according to DO178A Design Assurance Level A and Level B. Development according to DAL A is regarded as significantly more resource-intensive than development to DAL B, in part because DAL A requires so-called MC/DC testing (see the helpful tutorial by Kelly Hayhurst, pointed out to me by Mike Holloway), which is quite hard work.
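For readers who have not met MC/DC: roughly, for each condition in a decision, the test set must contain a pair of tests that differ only in that condition and produce different decision outcomes, showing that the condition independently affects the outcome. Here is a minimal Python sketch of a check for the “unique-cause” form of MC/DC, assuming decisions are given as Python functions of booleans. The function names are my own, and real coverage tools (and the masking variant of MC/DC discussed in the literature) are more involved.

```python
from itertools import product

def independence_pairs(decision, n):
    """For each condition index i, list the pairs of input vectors that differ
    only in condition i and give different decision outcomes (unique-cause MC/DC)."""
    pairs = {i: [] for i in range(n)}
    for vec in product([False, True], repeat=n):
        for i in range(n):
            if not vec[i]:
                flipped = vec[:i] + (True,) + vec[i + 1:]
                if decision(*vec) != decision(*flipped):
                    pairs[i].append((vec, flipped))
    return pairs

def achieves_mcdc(decision, n, tests):
    """True if every condition has at least one independence pair inside `tests`."""
    tests = set(tests)
    pairs = independence_pairs(decision, n)
    return all(
        any(a in tests and b in tests for a, b in pairs[i])
        for i in range(n)
    )

# Example decision with three conditions: a and (b or c).
def decision(a, b, c):
    return a and (b or c)

T, F = True, False
tests = [(T, T, F), (F, T, F), (T, F, F), (T, F, T)]  # n + 1 = 4 tests
print(achieves_mcdc(decision, 3, tests))       # True
print(achieves_mcdc(decision, 3, tests[:3]))   # False: no independence pair for c
```

Enumerating all 2ⁿ input vectors is fine for a toy decision. The point of the four-test set is that MC/DC can be achieved with far fewer tests than exhaustive testing, although finding such a set for real decisions is where the hard work lies.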

BTW, that edition of Crosstalk also includes a fine article by Brian Dobbing and Alan Burns on the so-called Ravenscar profile for interprocess communication in Ada, which admits straightforward static analysis.

As I understand it, much of the QinetiQ work was performed for a big UK certification effort on an aircraft manufactured by Lockheed Martin (BTW, one barely calls them “manufacturers” any more, but rather “system integrators”). John said, by way of anecdote, that he had indications that internal data of both Boeing and Airbus show “more issues” with DAL B software than with DAL A software.

Assuming these observations are correct, the question here would be how it is that two experienced companies obtain improved software quality from the extra requirements for DAL A, while a third experienced company sees no improvement.

The answer must be that there are hidden factors at work, factors which actually do lead to an improvement in SW quality, which in two companies are associated with the extra effort required for DAL A development, but which in a third company are not.

Since DO178B misses those factors (for otherwise all companies would show improved quality in DAL A development over DAL B development), isn’t it important to find out what they are, and then write them explicitly into DO178C, which is currently in its final stages?

BTW, if you want my view on what an ideal SW safety standard should say (thank you for asking :-) ), check out slide 22 of my Ada Connection keynote talk of 21 June 2011.

I might point out that it is much shorter than the 150pp of IEC 61508 Part 3 Version 2, which I mutter about on the previous slide.



Coda, Interdisciplinary Work, and Scientific Publishing

15 08 2011

It sounds like a mish-mash, doesn’t it? It will probably read like a mish-mash, too.

Because true interdisciplinary work always looks that way, I think. That is one of the main points I wish to get across. But first, let me get there.

Concerning my last post, Leslie noted that the condition he labels “FAA requirement” in his slide 4, for a 10⁻¹⁰ probability of failure per hour, was actually a NASA requirement for the SIFT research. SIFT was the first digital flight control computer, and SRI was supposed formally to verify its operating system. Over a decade, the project didn’t succeed in this original goal but, as is often the case, we computer people learned far more, and more fruitfully, from this failure than we ever would have had it “succeeded”. For example, I am not aware of any formal proof that a non-trivial system S, one not artificially constructed just for the proof, is guaranteed free of Byzantine failures. And that’s thirty years after the papers were published! Conclusion: Lamport and co. put their finger on some things that we just can’t do. Not only that, but they classified a cross-disciplinary problem in a new way. Byzantine failures, as spoken of by Driscoll et al., are a system problem, a mixture of phenomena to do with the electronic design of system components as well as the materials of which they are made. Transistors get cracks in them and turn into condensers (a Space Shuttle Byzantine agreement problem). But Lamport et al. turned their efforts to a purely algorithmic problem and published in pure computer-science journals (indeed, the best). Leslie is not a computer scientist who deals with avionics; he is a computer scientist who deals with computer science.

But he works right on the boundaries also. One of his most insightful pieces of work (and, to my mind, one of his best) was on the collection of issues about arbitration in converting continuous (“analog”) data into discrete (“digital”) data: Buridan’s Principle, whose purely technical contribution rests on a mathematical theorem he proved with Dick Palais, his thesis advisor. You can read Leslie’s account of the odd results of his attempts to publish it. He gave it to me sometime in the 1980s. But since the 1990s everyone has been able to find it and read it at will, because he put it on the WWW. Thank heavens for the WWW!

And that is a point about interdisciplinary work with which I have been struggling now for almost twenty years. One writes a paper on the causal analysis of a computer-related aircraft accident using the Lewis semantics (the Counterfactual Test). One sends it to a computer science journal. Review: “that’s got aeronautics in it, no one in computer science understands aeronautics, better to try an aeronautics journal”. One does. Review: “that’s got logic in it, no one in aeronautics understands logic, better to try a logic journal”. One is not stupid, but if one were, one might try to do so. Anticipated review: “that’s got computer science and aeronautics in it, no one in logic understands computer science and aeronautics, better to try a computer-science-and-aeronautics journal.”

And that’s all true and that’s all reasonable. Indeed no one in computer science reads aeronautics journals. No one in aeronautics reads logic journals, and so on. That’s why many engineers working on avionics bus systems still do not know about Byzantine failures, 30 years on.

The result is that most of what I write gets on the WWW and stays there. One can spend one’s time writing, or chasing one’s tail around such publishing conventions, but doing one takes time and effort away from the other, and I prefer writing.

Just to give an indication: one of the pieces of work from the last year of which I am most proud is the analysis of causal explanations of the Concorde accident and assessments of responsibility, which I wrote about in my post Concorde, Ten Years On, Part 2. I see there a series of pressing social and technical issues and their interplay, which people have not satisfactorily come to grips with, and I regard that piece as a start of some kind. As I said, I’m proud of it. One can’t do that kind of work every day, or at least I can’t. One has to seize the moment, and I did. Actually, that is the way many successful researchers work in math or computer science. Or philosophy, for that matter. You spend most of your time laying some kind of groundwork as best you can, and then you are somehow handed a moment and you seize it: “I can do that!” and you do. Some more than others.

It was never any different. Disciplines were always partitioned, especially academic disciplines. But one would have thought, as I guessed 15 years ago, that the WWW would make everything different. Mais, plus ça change, plus c’est la même chose: the more things change, the more they stay the same.

Some more examples.

I recently organised a Workshop on the Fukushima nuclear accident, inviting mostly sociologists and computer-system-safety people. People who read my blog know why I laud the sociologists for their insights into technical matters. When I was thinking we could do this, I asked people about funding. The Scientific Board of CITEC, where I am a PI until November 2012, thought it was a cracking good idea and very relevant, and offered financial support. My colleagues at the Centre for Software Reliability in Newcastle upon Tyne, when I called them to apologise that we were withdrawing from their exhibition at the Ada Connection in order to put the money into the Workshop, also offered financial support. Thank you all! I also approached the German central funding agency for scientific research, the DFG, which had circulated an e-mail saying that, in the wake of the tsunami, instruments were available to support cooperative research on the matter in the very short term.

Naive as I am, I took this message literally. I contacted the responsible administrator whose address was on the note. He graciously explained that his “instruments” were limited and didn’t support my workshop idea, and passed my request on to, amongst others, the administrator responsible for the support of engineering research, who replied forthwith in one sentence: “from the point of view of engineering, I don’t see any possibility of support”. What? The world has just experienced one of the two most devastating engineering accidents ever, German politicians have scrambled over each other to devise our exit from nuclear power, and the prestigious German academic research support agency says it ain’t interested? I sent a carefully worded query asking whether this could really be so, and received no reply.

Now, me being me, I would think they should be ashamed of themselves. But if I said that, I’m sure it would be pointed out to me how inappropriate that was, and that I really don’t understand the formal courtesy structures at play, and so on. Maybe all true. But the fact is that I have an international reputation in accident research, here was a biggie with major political consequences, I invited a bunch of top people to discuss it, they all said yes by return e-mail, and the engineering research support organisation said it wasn’t interested. There is no way around that fact, no matter how pretty the words.

And that illustrates something that I feel has been going more and more wrong with academic-type research over the years in which I have been involved in it. I suspect it is particularly acute in Germany. Academic research here after the first degree is performed by scientific employees, people in temporary jobs. There are no “graduate students” (although that is beginning to change: there are now narrowly-defined graduate colleges which offer competitive scholarships. We have one in Bioinformatics and Genome Research, another in Situated Communication, which I think is now over, and another in Cognitive Interaction Technology, which I think has absorbed it). If you want to offer a research topic in an American university, you offer it to all the graduate students, and someone will be attracted to it and come and talk to you. In Germany, you have to apply for funding (mostly from the DFG) for a temporary position to perform the research, then wait for the job applications, and hire someone on the basis of an interview. It’s a lot more work for the faculty member; there isn’t the same personal connection to bright young people you already know are capable of the work; it’s less flexible (I got three quarters of the way through two other thesis topics before I hit on the one I could finish, and none of them were connected with each other. You can’t do that in a job. Indeed, it took me three jobs!); and I believe the quality of the product suffers (but then, I was at Berkeley. Unfair comparison? Well, no. No German university makes the top fifty in any of the better-known rankings, and I’m talking about possible reasons for that).

Let me amplify that parenthetical comment a little. I had a colleague here in Bielefeld with some twenty or twenty-five “scientific assistants” in his group, people working in temporary jobs who hoped thereby to get their doctoral degree. At Berkeley, people, even Turing-award winners, had at most four or five doctoral candidates whom they supervised. The key word here is “supervised”. No one person can supervise twenty-five doctoral candidates to anything like the Berkeley norm. Indeed, supervision, such as it was, was mostly delegated to the post-docs, of whom, to achieve the same ratio, one would need five or six or seven (I recall there were three or four). And these doctoral-work supervisors were not Turing-award winners and the like, not even NSF Young-Investigator Award winners, as at Berkeley. They were people who had got their first research qualification and were mostly at the beginning of seeing whether they could make any kind of name for themselves.

A couple of years after I got to Bielefeld, I discovered that somebody in that group had just written a doctoral thesis on temporal reasoning for artificial agents. Temporal reasoning for artificial agents? That was the very work I was known for, and partly on the basis of which I was hired (here, one is not hired but “called”). This guy had never talked to me. Curious, I looked at the work. After I read the statement of the problem, it was obvious how to solve it. Then I looked at his solution, and it wasn’t anywhere near as good. (But there was some program code behind it.) Happens here. Happens quite a lot here. Doesn’t happen at Berkeley, by and large.

I faulted the research structure. The guy had a job, with a job description. He was a nice, friendly and capable guy. At the end of the job was the expectation of a doctoral degree. Which was duly awarded after satisfying the appropriate formal criteria. All very neat and clean. DFG money apparently well spent. But the sum total to the world’s knowledge of how to solve temporal reasoning problems with artificial agents was essentially nil. His energies, and the funding support, would surely have been better spent had he talked to me, and then worked on a problem of the same level of difficulty, but to which the solution was not known.

This is already a lot of anecdotes. But it is hard to see how to get at the point without recounting lots of anecdotes. For each anecdote has its individual answer: it’s a special case; or I misinterpreted; or I was sour at someone; or I’m just being arrogant; or I’m looking for excuses for something I haven’t done or don’t do. Maybe all true, but it is the number of anecdotes, interpreted as the weight of evidence, that persuades reasonable people that there is something to the set-up which encourages all this.

Indeed, I am convinced that a model in which aspiring researchers pick their own topic from amongst those offered, make personal connections with a senior researcher who is able to judge whether they might be capable of completing the work, are encouraged, indeed required, to correspond with more accomplished others who have worked on and solved similar issues, and have the freedom to change topic completely when the current one won’t work out, is a better way to induce productive research than the research-as-job model.

But this heavily structurally-constrained interpretation of what constitutes effective research goes much further. Recall my anecdote about DFG support for my workshop, above. Along those lines, consider the following. I am a Principal Investigator in CITEC (above), whose charter is coming up for renewal; the proposal is about to be submitted. It turns out that the business of saying what my group (essentially of two: me and my post-doc) has accomplished and will accomplish in the next five-year period was delegated to a young colleague whose job is funded through the institute for the five-year cycle, as indeed all professorial jobs in Germany now are (tenure has gone), and is thus dependent on the success of the upcoming proposal.

Despite my offers to help, my colleague didn’t talk to me at all. Indeed, it took me a certain amount of effort to find out who was writing what about our work in the proposal, since apparently none of the material I wrote was going to make it in. He wrote one sentence about the work my group had accomplished over the last four years (with apologies that he couldn’t find more). And he found no relevant publications, despite (he indicates) trawling our publications page. Well, during the course of the last few months I have been asked variously for one key publication; for five key publications; for ten publications not necessarily within the CITEC remit, all by various people, none of whom was he. The Coordinator of CITEC (effectively the director) asked for a meeting, to explain to me that without any publications it didn’t look good for the proposal to include me.

What? People can’t find stuff I’ve written on the safety of mobile automatic devices in the last five years? Well, of course they can, but you see it doesn’t count. The DFG says peer-reviewed journal articles only.

There we go again. Structural constraints. Nobel-memorial-prize-winning economists, and sociologists, and political scientists, and legal scholars all write blogs. Hundreds or thousands of people read them and comment, including their peers, often in their own blogs. Peer-reviewed? Most obviously! I just received a copy of a journal article (counts!) written by two colleagues about two essays (cited) I wrote in this blog. Other colleagues read my posts and they comment!

Another example. We started a mailing list in March 2011 on the Fukushima accident and I recently summarised my contributions, which amounted to 117 A4 pages in 12-pt type, for the workshop proceedings. Now, every word I wrote on that mailing list has been read by eminent colleagues on the list, and they have commented, frankly (it is a closed list). And I have commented on their writing in turn. That’s what you do on such a list, if you’re one of the people who do it (others prefer just to read). Peer review? How much more is it possible to get? And more easily?

The WWW has been pervasive for fifteen years and e-mail lists for thirty. And there is still no measure of quality of contributions that is acceptable to the German research funding agency? (It is not the only one with such a view.) Astounding! It is not as if this is a hard problem. It would become a hard problem if what you wanted to define were The Definitive Measure of Scientific Quality, because there can’t be one. But judging the quality of blog posts or sustained mailing-list contributions is no harder a job than judging the quality of peer-reviewed journal publications; indeed it’s often easier, because you can ask more people.

Actually, what happened with the CITEC thing is this. Bernd, Jan and I figured a while ago that our textbook on Safety of Computer-Based Systems, which was solicited by a major publisher some years ago, would be written and out by now. And we thought one book would likely suffice to show what we’d done. One book is not five published papers; in this case it’s more like fifteen, and there will be more. But it isn’t out. Since it is a textbook, we need to be sure that the techniques introduced can actually be used by the target audience, students and engineers, and so some of our contributors belong to that target audience (not all textbook writers do this, but I happen to think it’s a very good idea). They are not necessarily as experienced at writing as I am, so it simply takes longer than we’d thought. Quite understandable, one would have thought. But apparently there is no reasonable way to say to the DFG that the book is almost finished. (Someone might even want to say that a textbook isn’t research. But this one is, you know, just like Nancy Leveson’s new text. Read that one too!)

Structural constraints, and how they hinder effective support of effective research. Is everyone convinced by now? At least convinced to look at the issue more closely? Shall I stop here then?

Not quite. One more word, back to the original topic. “Interdisciplinary” is one of the buzz words of the new modes of research support. But the problems indicated above of support, publication and assessment of work which crosses traditional discipline boundaries, or the new boundaries left in place by a country’s Scientific Wise Owls and Funding Agencies, are deeper than a buzz word, or even than a buzz concept. The logicians can’t read aeronautics and the aeronauticists can’t read formal logic and the computer scientists don’t understand aerodynamics and the engineers don’t understand the sociologists and I doubt that is going to change rapidly under the hierarchically-directed research-as-job model, buzz word or no.



Certification Requirements for Commercial Airplanes

14 08 2011

I was browsing the invited lectures given in Martín Abadi’s Collège de France lecture series and came across this elegant, simple explanation of so-called Byzantine failures by the gentleman who invented the term, Leslie Lamport. Leslie’s two papers on the subject with Rob Shostak and Marshall Pease in the early 1980s, Reaching Agreement in the Presence of Faults and The Byzantine Generals Problem, are seminal. Kevin Driscoll et al.’s SAFECOMP 2003 paper, Byzantine Fault Tolerance: From Theory to Reality, as well as Kevin’s brilliant keynote talk at SAFECOMP 2010, Murphy Was an Optimist (the slides of which seem no longer to be on the WWW), show how prescient the SRI work was.
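The mechanism behind the Lamport, Shostak and Pease result can be sketched in a few lines. Below is a minimal Python rendering of the recursive oral-messages algorithm OM(m) from The Byzantine Generals Problem, under assumptions of my own: messages are always delivered, a tied vote defaults to “retreat”, and a traitor lies inconsistently by telling even-numbered receivers “attack” and odd-numbered ones “retreat”. It is a toy for seeing why four generals can tolerate one traitor, not a fault-tolerant implementation.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"
DEFAULT = RETREAT  # used when the vote is tied

def send(sender, receiver, value, traitors):
    """Message `receiver` gets from `sender`. Assumed traitor model: a traitor
    ignores `value` and sends different orders to different receivers."""
    if sender in traitors:
        return ATTACK if receiver % 2 == 0 else RETREAT
    return value

def majority(values):
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return DEFAULT
    return counts[0][0]

def om(m, commander, lieutenants, value, traitors):
    """Return the value each lieutenant decides on under OM(m)."""
    # Step 1: the commander sends its value to every lieutenant.
    received = {lt: send(commander, lt, value, traitors) for lt in lieutenants}
    if m == 0:
        return received
    # Step 2: each lieutenant j relays what it received, acting as commander in OM(m - 1).
    relayed = {
        j: om(m - 1, j, [x for x in lieutenants if x != j], received[j], traitors)
        for j in lieutenants
    }
    # Step 3: each lieutenant takes the majority of its direct and relayed values.
    return {
        i: majority([received[i]] + [relayed[j][i] for j in lieutenants if j != i])
        for i in lieutenants
    }

# Four generals (0 commands, 1-3 are lieutenants), one traitor, m = 1.
print(om(1, 0, [1, 2, 3], ATTACK, traitors={3}))  # loyal 1 and 2 decide "attack"
print(om(1, 0, [1, 2, 3], ATTACK, traitors={0}))  # all lieutenants agree
```

With m = 0 and a traitorous commander, lieutenants 1 and 2 end up holding different orders; one round of relaying restores agreement here, and the paper shows that with oral messages one needs more than 3m generals to tolerate m traitors.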

I met Leslie at SRI in 1984. Rob had just left, to finish and then sell his PC database SW “Paradox” with Richard Schwarz, starting his second career as a serial entrepreneur. A colleague commented at the time that the market for PC database software seemed already to be saturated, so leaving a good job for that was risky. I guess that’s how some make millions and some don’t! Marshall was still there; he was reputed to be quite a successful stock-market investor, but is no longer with us.

Leslie’s Slide 2 shows what appears to be an Airbus A380, with computers of some sort issuing pitch-control commands (probably primary pitch control; Byzantine failures in the FMGEC software, which includes the autopilot, would not likely be safety-critical). And Slide 4 speaks of an “FAA requirement” that the “probability of catastrophic failure” of an airplane’s computer be less than “10⁻¹⁰ per hour”.

It is common amongst computer scientists who deal with avionics issues to think that the reliability requirement for critical equipment with safety-related behavior is a probabilistic requirement. But it isn’t so. Probabilities of some sort do enter into assessment processes somewhere, but not so directly. It seems to me to be worthwhile to say some words about certification regulations. They can be somewhat abstruse unless you are a certification engineer (even for the regulator! See John Downer’s Trust and Technology: The Social Foundations of Aviation Regulation).

First, an aside about units: they should be “operational hours”, not simply “hours”. Most people probably correctly assume that. Besides, the difference between “operational hour” and “hour” for most commercial airplanes in continual, regular use is probably only a factor of two to four averaged over the service life of the airplane. Still, best to be precise.

Second, there is a figure known as the “10⁻⁹ xxxxx” (where “xxxxx” is variously “requirement”, “condition” or “criterion”, depending). I guess this is what Leslie is referring to, rather than a “10⁻¹⁰” criterion. There is a 10⁻⁹ criterion in the Acceptable Means of Compliance (allied to the qualitative probability “Extremely Improbable”). The general functional safety standard IEC 61508, which does not apply to commercial aviation, although it is sometimes used for military systems, is written to regard any claim of a dangerous failure rate below 10⁻⁹ per operational hour as unrealistic (Ron Bell, Chair of the Maintenance Team for 61508 Parts 1-2, personal communication. Also, PBL self-communication: I am on the German national committee).

It is possible, though, that there are automotive systems, typically small electronics boxes fitted to many different common models of car, that might well accumulate of the order of 10¹⁰ operational hours (Mike Ellims, personal communication).

The 10⁻⁹ criterion was looked at hard by John Downer in his Cornell PhD thesis, The Burden of Proof (I don’t think it has been published yet, which is a shame. I have a copy).

So, on to the main theme.

The certification requirements for large airplanes (i.e., all commercial transports) are contained in a document known in Europe as CS-25, the 2003 and subsequent versions of which are available from the EASA WWW site.

First observation. Contrary to what it looks like from Leslie’s slide, there is no technical requirement specifically on computers or computer behavior. Computers inherit any conditions on failure behavior solely through the requirements on the pieces of kit which they control, in the sense that there are dangerous-failure requirements on the entire subsystem. And the requirements on the pitch control subsystems are purely functional, saying what loads they must withstand under which conditions, and how they must behave dynamically. (Check them out for yourself here!) No probability, no probability terms, no quantitative probability. So it is misleading to associate any 10⁻ˣ condition with a requirement.

There is, however, an accompanying document to CS-25 called “Acceptable Means of Compliance” (AMC). That is, in order to demonstrate to the satisfaction of the certification authority that subsystem X does this and withstands that (as the certification requires), it is deemed by the authority acceptable to follow the guidance in the AMC. Of course, you can do it some other way also, if you can find one!

This is a notionally subtle but practically significant difference between what is required and what is accepted as evidence that a requirement is fulfilled. If any system (such as the one Leslie illustrates) brings the airplane into a hazardous or catastrophic state, then it is an airworthiness issue and the problem has to be fixed. Full stop. And that is what is done. However, if the requirement were numerical, say “probability of dangerous failure of 1 in 10⁹ per operating hour”, then one instance, or two instances, or even twenty instances, of a hazardous or catastrophic state would be compatible with that numerical requirement, and the problem would not necessarily need to be fixed, since it could be argued that this very small probability had unfortunately been realised way earlier than expected. This difference is significant for lawyers arguing about the distribution of compensation (or “recovery”, as they say), and compensation for loss is a universal principle some thousands of years older than airplanes and their certification.
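To make the point about numerical requirements concrete: if one models dangerous failures as a Poisson process with a constant rate (an assumption, and a contestable one for software), then any number of occurrences over a fleet history has non-zero probability, so no observed count by itself refutes a claimed rate. A minimal Python sketch follows; the function name and the choice of 10⁷ operating hours are mine, for illustration only.

```python
import math

def prob_at_least(k, rate_per_hour, hours):
    """P(at least k events in `hours`), assuming a Poisson process with constant rate."""
    mu = rate_per_hour * hours  # expected number of events
    if k <= 0:
        return 1.0
    if mu == 0:
        return 0.0

    def pmf(i):  # Poisson probability of exactly i events, computed in log space
        return math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))

    if mu >= k:
        # The upper tail is large, so 1 minus the lower sum is accurate.
        return 1.0 - sum(pmf(i) for i in range(k))
    # The upper tail is small: sum it directly to avoid cancellation.
    term, total, i = pmf(k), 0.0, k
    while term > total * 1e-17:
        total += term
        i += 1
        term *= mu / i
    return total

# A claimed rate of 1 in 10^9 per operating hour, over 10^7 operating hours.
for k in (1, 2, 20):
    print(k, prob_at_least(k, 1e-9, 1e7))
```

For k = 1 this gives about 0.00995; for k = 20 the value is astronomically small but still positive, which is exactly why an observed occurrence cannot by itself settle whether a probabilistic requirement was met.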

I note with some embarrassment, however, that IEC 61508 makes “probability of dangerous failure of 1 in 10ˣ per operating hour” into a requirement, suffering the disadvantage I just noted: when a dangerous failure occurs, it is left open whether the requirement has been met or not. I guess the lawyers can expect some business :-)

Actually, the whole business of what “probability” means in “probability of dangerous failure” is a can of worms. Let me leave that for another time.

AMC uses terms for hazard severity: Minor, Major, Hazardous and Catastrophic. It also uses terms for probability: Probable, Remote, Extremely Remote, and Extremely Improbable. These are technical terms, and when they occur in the requirements they are capitalised. The meaning of “Extremely Improbable” is (historically) “not expected to occur within the service life of the airplane type”. Here, “service life of the airplane type” means the total number of operational hours of all airplanes of that type throughout the entire use history of the type (assuming, of course, that the airplanes are maintained as designed). The meaning of “Extremely Remote” is “…..once….”; the meaning of “Remote” is “…once per individual aircraft, and several times in the service life of the type”; “Probable” is “…..several times in the life of an individual aircraft”.

These definitions come from previous versions of the certification documentation (when it was known as JAR 25) and may be found in a 1982 book by Lloyd and Tye, Systematic Safety, published by the UK CAA. They will have applied directly to the certification of the two most popular airplanes flying today, the Boeing 737 series (certified in the mid-1960s) and the Airbus A319/320/321 series (certified in the mid-1980s), but not to the certification of, say, the Airbus A380, which was certified in the mid-2000s. So let’s also look at later versions of the document.

The 2003 AMC-25 uses the terms for subsystem compliance. For example, AMC 25-19 §6(c) says:

(3) Extremely Improbable Failure Conditions: Extremely Improbable Failure Conditions are those so unlikely that they are not anticipated to occur during the entire operational life of all aeroplanes of one type, and have a probability of the order of 1 x 10⁻⁹ or less. Catastrophic Failure Conditions must be shown to be Extremely Improbable.

We see that in the current certification document the qualitative terms are firmly bound to quantitative probability statements.

The reason for this change is that, in the days of Lloyd and Tye, someone did a back-of-envelope calculation and figured that the “service life of the airplane type” could be expected to be somewhat less than ten million hours. It was then! But, for example, Airbus’s safety chief, Yannick Malinge, when giving evidence to a Subcommittee of the Brazilian Parliament in August 2009, pointed out that the A320 fleet had at that time some 55 million operational hours or more, if I remember correctly. (I also did a crude calculation of my own then, based on a guess at operational hours per year for a typical model, a uniform build rate since service introduction in 1988, and a 25-year service life for an individual airplane, and came up with a similar figure.) So for modern purposes that pre-1980s back-of-envelope calculation is at least an order of magnitude too low.

Then, following on with the reasoning as in Lloyd and Tye, people apparently thought there would be about 100 airplane subsystems which could be a single point of catastrophic failure. The condition that no single-point catastrophic failure should occur in the service life then amounts to 1 in 10 million hours (1 in 10⁷), which, divided among 100 airplane systems, gives one in one billion per airplane system, that is, an average “probability” over the service life of 1 in 10⁹ per operational hour.
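The Lloyd and Tye apportionment just described is a two-step division. A minimal Python sketch, using only the two round figures from the text (a fleet service life of about ten million hours and about 100 single-point subsystems), reproduces the 10⁻⁹ figure. The constant and function names are illustrative assumptions and do not come from any certification document.

```python
import math

# Round figures taken from the text (pre-1980s back-of-envelope values).
FLEET_SERVICE_LIFE_HOURS = 10_000_000  # about 10 million operational hours
SINGLE_POINT_SUBSYSTEMS = 100          # subsystems that could each be a single point of catastrophic failure


def per_system_rate(fleet_hours: float, n_systems: int) -> float:
    """Allowed catastrophic-failure rate per subsystem per operational hour.

    Hypothetical helper: budgets at most one catastrophic failure over the
    whole fleet service life and shares that budget equally among subsystems.
    """
    fleet_rate = 1.0 / fleet_hours  # one event per fleet life, expressed per hour
    return fleet_rate / n_systems   # equal share for each subsystem


rate = per_system_rate(FLEET_SERVICE_LIFE_HOURS, SINGLE_POINT_SUBSYSTEMS)
print(f"{rate:.0e} per operational hour")             # 1e-09 per operational hour
print(f"order of magnitude: {math.log10(rate):.0f}")  # order of magnitude: -9
```

Substituting the larger fleet figure mentioned above for the A320 (some 55 million hours) shrinks the per-subsystem figure by the same factor of about 5.5, which illustrates why anchoring the qualitative term directly to 10⁻⁹ avoids redoing this calculation as fleets grow.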

Anyhow, that is where the 10⁻⁹ condition comes from, and nowadays the qualitative term is directly anchored to it, to avoid any calculations over expected fleet lives, since actual fleet lives have proved to be rather different from those expected at certification time. Nobody expected they were going to sell going on for ten thousand airplanes of these types, but it now looks as if that might happen!

And there is nothing in the AMC about the reliability of computers. There are, however, provisions about the reliability of systems driven by computers, for example displays; AMC 25-11 §4(3)(i) says:

(i) Attitude. Display of attitude in the cockpit is a critical function. Loss of all attitude display, including standby attitude, is a critical failure and must be Extremely Improbable. Loss of primary attitude display for both pilots must be Improbable. Display of hazardously misleading roll or pitch attitude simultaneously on the primary attitude displays for both pilots must be Extremely Improbable.

So that’s what the regulations say and the acceptable means of compliance suggest you do. For insight into how this works out in practice, read John Downer!

I offer here many heartfelt thanks to Clive Leyman, quondam Chief Aerodynamicist of Concorde, who has done his best to put me straight on all this over the last few years (I hope he thinks he succeeded!).



Concorde, Ten Years On, Part 2

9 12 2010

The Concorde accident to F-BTSC on 25 July 2000 is, as to its causes, about as well understood as any accident can be. There is also, unusually, a more or less linear chain of causes from an exceptionally rare event: the deposition of a particularly hard and sharp strip of metal (one which shouldn’t have been mounted in the first place, exactly because of such possibilities) on exactly the part of the runway at which Concorde’s tires bear the greatest load, and the aircraft indeed running over it, though it is not a big strip. Concorde’s ground run goes up to just about 200 kts at rotation, I understand, compared with about 160 kts for a Boeing 747. Furthermore, at rotation the delta wing generates some negative load, putting even more weight on the tires, before the load changes to positive and the aircraft lifts off. The sequence of events that then ensued was, as far as I know, not anticipated by anyone in the development, certification or analysis of the aircraft, and it is hard to see how it could have been. To me, this is a freak accident, an instance of «not expected to occur during the operational lifetime of the aircraft», the strictest category of likelihood contemplated in civil aeronautical certification.

But some differ, for example Tom Ferrell in this note to the York Safety-Critical Systems Mailing List. Tom thinks the accident had precursors, which showed, in advance of the accident, that

Regardless of causal agent, the Concorde was susceptible to severe damage from a relatively common occurrence.

He means there had been tire burst incidents, which indicated problems with the design. So is this just a matter of personal taste, say, like wine? Ladkin tastes “freak” and Ferrell tastes “foreseeable” in the same glass, and that’s it? Or is there, as I would prefer to believe, an objective way of evaluating the views, such that one can be shown to be right (or more accurate) and the other wrong (or misleading) in some way?

I think it is partly a matter of what you lump together, and what you don’t. Do you lump together all tire bursts, including this one, and all damage, including this damage, or don’t you? Is this lumping arbitrary, a matter of individual perception? I don’t think so. I think there are objective principles, on which so far I have only an intuitive handle.

How to indicate these principles? I try to show them here by means of a hypothetical cross-examination of Ferrell’s claim. Here goes.

M’lud, regardless of causal agent, the Concorde aircraft was susceptible to severe damage from a relatively common occurrence.

I see, thank you, counsel. What was that common occurrence?

A burst tire, m’lud.

Thank you, counsel. And what was that severe damage?

A 32cm square hole in the lower wing skin, m’lud, which also served as the fuel tank skin.

I see. Had that ever happened before in the history of the airplane?

No, m’lud.

You say “susceptible”. Had damage ever occurred to the lower wing skin, except in this case?

Six times, m’lud.

And how many times was that due to your “common occurrence”, a tire burst?

The lower wing skin was punctured on five occasions when a tire burst, m’lud.

But that is not what I asked you, counsel. I asked you in which of these events the damage to the lower wing skin was due to the tire burst.

It is supposed, three times, m’lud.

You say “supposed”, counsel. Why so?

As far as we know, in those cases, m’lud, the damage sequence was causally initiated by a tire burst. It is conceivable, although very unlikely, that a concurrent but independent damaging event caused the lower-wing-skin penetration, but there was no evidence for that.

I see. Thank you for your care in phrasing this, counsel. And what were the two other events?

In one, on 29 January 1988, the tie bolts holding the two wheel halves together sheared, and in the resulting sequence one of the bolts penetrated the Number 7 tank, leaving a half-inch hole. In the other, on 15 July 1993, there was a braking-system jam, and the Number 8 tank was punctured as a result of the damage sequence.

So, if I understand you, counsel, you tell me that, before the fatal accident at Gonesse, three times it had occurred that the lower wing skin was punctured due to your “common occurrence”, a tire burst.

Yes, m’lud.

And how many years did the Concorde fly in service before the Gonesse accident?

Just over 24 years, m’lud. The first revenue flight was 24 May, 1976.

And how many flight cycles?

About 84,000, m’lud.

I see. That is quite a long time. And, to me, quite a large number of flights, although of course by no means so large as with most aircraft in commercial use nowadays. So are those three occasions a lot or a little, counsel?

With respect, m’lud, I offer no opinion on that question.

So there are these “common occurrences”, which had occurred – how many times, counsel?

Aviation Safety Network has a record of 55 occasions after service introduction in which tires burst, m’lud.

Common enough, I suppose. And these common occurrences had caused damage other than to the tire on – how many occasions, counsel?

Aviation Safety Network has a record of 28 occasions on which other damage occurred, m’lud.

Does that include the two above in which the damage was not initiated by a tire burst, counsel?

Yes, m’lud.

So there were 26 occasions on which, as far as we know, a tire burst initiated damage to other parts of the aircraft?

Yes, m’lud.

So I think you have established, counsel, that a common occurrence, a tire burst, could cause damage, and thus that the aircraft was susceptible to damage from this common occurrence. But you want to establish more than that, don’t you, counsel? You wish to say that the aircraft was susceptible to severe damage.

Yes, m’lud.

Is “severe damage” a technical term used in aviation, counsel?

No, m’lud.

So it is your term, counsel. What do you mean by it?

I mean that the safety of the flight is affected by the damage, m’lud.

Thank you, counsel. Is there any similar term used in aviation?

The U.S. National Transportation Safety Board’s regulation, 49 CFR Part 830, defines an “incident” as an occurrence other than an accident, associated with the operation of an aircraft, which affects or could affect the safety of operations. The same regulation defines an “accident” as an occurrence [associated with the operation of an aircraft] in which any person suffers death or serious injury, or in which the aircraft receives substantial damage.

I see. Is there a definition of “substantial damage”, counsel?

Yes, m’lud. “…..damage or failure which adversely affects the structural strength, performance, or flight characteristics of the aircraft, and which would normally require major repair or replacement of the affected component. Engine failure or damage limited to an engine if only one engine fails or is damaged, bent fairings or cowling, dented skin, small punctured holes in the skin or fabric, ground damage to rotor or propeller blades, and damage to landing gear, wheels, tires, flaps, engine accessories, brakes, or wingtips are not considered ‘substantial damage’ for the purpose of this part.” This definition is similar to other definitions of significant damage, used in definitions of accidents and incidents in, say, the International Civil Aviation Organisation Annex 13, which defines reporting requirements for its member states.

Thank you, counsel. And in which of those 26 tire-burst incidents you enumerated above was “substantial damage”, according to this definition, incurred?

In the incident at Washington Dulles airport on 14 June 1979, m’lud. The performance of the aircraft was affected in that fuel was lost through the debris penetrations of the tank at a rate of up to 4 kg per second. It was unable to continue its flight to London. The aircraft lost 7 tonnes of fuel before it landed again at Washington Dulles.

And in others, counsel?

In no others, according to the definition, m’lud.

I see. Are there incidents in which a fuel tank was penetrated, in which the performance of the aircraft, its structural strength, or its flight characteristics were not substantially affected?

Yes, m’lud. On 29 January 1988, the incident in which the wheel-half tie-bolts broke and a bolt punctured the tank on take-off from London, the flight continued to its destination, New York.

I see. How large was this puncture?

The hole was half an inch, so about 1.3 cm, in diameter, m’lud.

So it appears that a puncture in a fuel tank, even a fairly large hole, does not necessarily count as “substantial damage”?

No, m’lud, it does not necessarily count so.

Are there any other common technical meanings of “severe damage” or “significant damage” which we might want to consider, counsel?

I think so, m’lud. For example, damage which could affect the safety of flight, the definition I suggested.

“Could affect”, counsel, or “does affect”? For example, during the 29 January 1988 incident, was the safety of the flight affected?

Apparently not, m’lud.

Was the safety of flight affected in any of the other tank-penetration incidents besides the 14 June 1979 incident at Washington Dulles?

I don’t believe so, m’lud.

Could it have been?

I believe so, m’lud.

How?

Maybe fuel streaming from a hole can catch fire when it meets engine exhaust, m’lud.

I see. Does it commonly do so, counsel? Do you know of any other incident in commercial aviation when fuel streaming from a smallish hole, such as this, caught fire?

Actually, m’lud, I don’t.

Are there any other ways in which safety of the flight could be affected by such a leak?

When the aircraft lands, m’lud, the brakes heat up, and leaking fuel could fall onto hot brakes and catch fire.

Has this happened, counsel?

Yes, m’lud.

Are there ways to prevent it happening?

Yes, m’lud. If a crew knows they have a leak – and if the leak is substantial you can usually see the stream behind the wing from the rear passenger seats during flight – then they can have fire services meet the plane on landing and cover the brakes and ground under the leak with fire-suppressant foam. This mostly suffices.

Thank you, counsel. So ignition of this fuel is an event for which known and effective countermeasures exist.

Yes, m’lud.

So although such an event “could affect” the safety of flight, it mostly doesn’t do so.

It appears not, m’lud.

So it appears that penetrations of the fuel tank in themselves do not count as “substantial damage”, and they do not necessarily count as damage which affects the safety of the flight. But they might count as events which could affect the safety of flight if we are sufficiently imaginative in devising scenarios.

It seems so, m’lud.

Let us see how imaginative I may be. As far as I understand quantum mechanics, atomic particles may engage in random motion, that is, displacement of position without apparent cause.

As far as I also understand quantum mechanics, m’lud, that is so.

So it could be, counsel, that all the atomic particles in a Concorde translate 4 meters to the left all at the same time, leaving the passengers sitting, well, somewhere in space outside the fuselage.

I suppose it could be, m’lud.

And those passengers would probably fall to the ground and injure themselves or die.

I suppose so, m’lud.

So it could be, counsel, that the Concorde, indeed any aircraft, suddenly leaves its passengers sitting outside the airframe, leading to serious injury or death.

I suppose so, m’lud.

I am, counsel, as you see, sufficiently imaginative in devising scenarios. You have presented me with two partially overlapping definitions of significant damage, of which the second is indeterminate between “could be” and “is”. I don’t find the “could be” interpretation very helpful, as you see, because I am, as you also see, sufficiently imaginative. And I don’t think any objective safety property of a commercial airplane should depend so heavily on my sufficient imagination. So I am going to interpret “severe damage” as meaning damage which is either substantial in the sense of NTSB rule 830 or which does (not “could” but “does”) affect the safety of flight.

Yes, m’lud.

On which occasions, then, did your “common occurrence”, a tire burst, initiate a causal sequence in which severe damage resulted?

On 14 June 1979 at Washington Dulles, m’lud, and on 25 July 2000, resulting in the crash in Gonesse.

The damage which resulted in the Gonesse crash was then, by definition, substantial as well as severe, wasn’t it, counsel?

Yes, m’lud.

So, since this severe damage actually happened on that occasion, we can say that, even before this occurred, the aircraft was susceptible to exactly this severe damage, in the sense that, since it did happen, it follows that the aircraft was susceptible to its happening, simply through the usual meaning of the word “susceptible”.

Yes, m’lud, that is what I claim.

Let’s look a little closer at this word “susceptible”. There are some people who claim that human beings spontaneously ignite. Not often, but occasionally. All that is left is ashes. If that is true, and I believe that this is a very, very big “if”, then human beings are “susceptible to spontaneous combustion”, aren’t they, counsel?

Yes, m’lud. But I share your scepticism of the phenomenon.

The point, counsel, is this. We know whether or not human beings are susceptible to spontaneous combustion only in so far as we know actual examples of human beings spontaneously combusting.

It seems so, m’lud.

And, further, let us suppose that there are certain circumstances C in which human beings spontaneously combust, and if those circumstances do not obtain, then they don’t. Then, surely, we are obliged, by virtue of not wishing to mislead our fellow men and women, to say that human beings are susceptible to spontaneous combustion in circumstances C and to indicate that, if circumstances C do not obtain, there is nothing to worry about.

That seems to me reasonable, m’lud.

And I take it that you do not wish to mislead me, counsel!

Certainly not, m’lud!

Then when you claim that the Concorde is “susceptible to severe damage [resulting from] a common occurrence”, which we have more or less agreed is a phrase which may be able to describe the Concorde aircraft, I now want to know whether there are any circumstances C which you should be telling me about, under which severe damage resulting from your common occurrence, a burst tire, may be realised. Please note the condition: you are to tell me about circumstances C in which, if they obtain, the accident sequence results, and for which, if circumstances C do not obtain, the accident sequence does not result.

Yes, m’lud. The accident sequence is as follows. A titanium strip lay edge-on on the runway at or near the rotation point of the Concorde. It cut sharply into a tire, causing a tire burst resulting in at least two chunks of tire weighing approximately 4.5 kg each. It is presumed that one of these expelled chunks impacted the lower wing skin, the skin of one of the fuel tanks, causing a shock wave which blew out the fuel tank skin from inside, near the impact point of the tire chunk, resulting in a hole about 32 cm square being formed in the fuel tank, and of course fuel streaming out. The fuel ignited, and burned from very near the fuel tank hole, causing varying loss of thrust in two engines, as a result of which the aircraft was unable to attain positive-rate-of-climb flying speed, and was also subject to thermal damage from the fire under the left wing. As the damage progressed, control was lost and the aircraft crashed in Gonesse.

Thank you, counsel. How do we know that the loss of thrust rendered the aircraft unable to attain the appropriate flying speed?

That is elementary aerodynamics of Concorde, m’lud, and is not disputed.

Thank you. How do we know that the fire caused loss of thrust?

Calculations show that, if air is ingested into the engine intakes at a temperature approximating that of the burning fuel, thrust is lost at more or less the observed and recorded rate.

Thank you. How do we know the fire was present at exactly the unfortunate point to be ingested?

Photographs of the accident, m’lud.

Thank you.

How did the fire attain the state in which it was photographed?

We don’t know, m’lud. We would expect a fuel fire to start when it has been ignited by hot gases from the turbine engines, behind the engines, which of course were in reheat at the time. As far as anyone knows, the front of such a flame cannot travel relatively forward at the speed at which the aircraft was travelling.

So we would expect the fire to remain behind the wing structure, behind the engine exhaust?

Indeed so, m’lud.

But this fire didn’t. Its front came forward underneath the wing, and you have indicated we do not know why.

That is correct, m’lud. There is speculation that it might have been ignited by an electrical spark from some wiring in the undercarriage bay.

Do we know that, counsel?

No, we do not know, m’lud. It is speculation, because we cannot otherwise understand how the flame front came forward under the wing.

Thank you. So we basically do not know the causal sequence between fuel released from the tank and the engines consequently operating at reduced power.

It seems so, m’lud. We do not know why the flame front moved forward.

But of course there would have been no flame front had fuel not been streaming out of a hole.

Indeed so, m’lud.

How big was the hole?

About 32 cm square, m’lud.

That is a big hole! Did such holes occur during any other of your “common events”, counsel?

No, m’lud. The largest was 1 inch x 1.5 inches, caused by metal debris on 15 November 1985. The second largest was the hole of 0.5 inch diameter on 29 January 1988, an event we have already mentioned.

So this hole, in the Gonesse accident, was roughly 100 times larger in area than the largest hole which had previously been caused, and roughly 800 times larger than the second-largest. That is an enormous difference, counsel! Why is that?

The hole in the Gonesse accident, m’lud, was not caused through tank penetration by debris, but through shock-wave convergence punching a hole through the tank skin from inside.

That is, if I understand you, counsel, a much larger hole, two to maybe three orders of magnitude larger than any that had previously occurred, made by a completely different mechanism.

That appears to be so, m’lud.
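The size comparison in this exchange can be checked with a few lines of Python. This is a sketch under stated assumptions: “32 cm square” is read as a piece roughly 32 cm by 32 cm, the 1 inch x 1.5 inch hole is treated as a rectangle, and the 0.5 inch hole as a circle of that diameter. Other readings of the shapes would shift the ratios somewhat.

```python
import math

CM_PER_INCH = 2.54

# Assumption: "32 cm square" means a piece of roughly 32 cm x 32 cm.
gonesse_hole_cm2 = 32.0 * 32.0

# Largest earlier hole: 1 inch x 1.5 inches, treated as a rectangle.
rect_hole_cm2 = (1.0 * CM_PER_INCH) * (1.5 * CM_PER_INCH)

# Second-largest earlier hole: 0.5 inch diameter, treated as a circle.
radius_cm = 0.5 * CM_PER_INCH / 2
circ_hole_cm2 = math.pi * radius_cm ** 2

ratio_rect = gonesse_hole_cm2 / rect_hole_cm2
ratio_circ = gonesse_hole_cm2 / circ_hole_cm2

print(f"vs 1 in x 1.5 in hole: {ratio_rect:.0f} times larger")   # 106
print(f"vs 0.5 in diameter hole: {ratio_circ:.0f} times larger")  # 808
print(f"orders of magnitude: {math.log10(ratio_rect):.1f} to {math.log10(ratio_circ):.1f}")  # 2.0 to 2.9
```

Run as is, this gives area ratios of about 106 and 808, that is, roughly two to three orders of magnitude, which is the range the dialogue arrives at.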

And are such kinds of events, reminding me of your phrasing “common occurrence”, counsel, common in commercial aviation?

No, m’lud. This occurrence of the phenomenon is unique in the history of civil aviation as far as we know.

Thank you for your frank answer, counsel. But people knew about this phenomenon, did they?

Military engineers knew of the phenomenon from battle-damage studies, m’lud. It is not clear if any engineer in civil aviation, if anyone involved with civil aviation, knew of this phenomenon before the Gonesse accident. After the accident, military engineers informed the accident investigators of what they knew.

And what of the tire pieces that caused this phenomenon?

It was shown by experiment, m’lud, that a piece of rubber weighing about 4.8 kg and travelling at a relative speed of about 120 m/s, that is, somewhat under 270 mph, which could in theory occur due to a Concorde tire bursting at the point in the take-off sequence at which it did, could trigger the shock-wave phenomenon with a proportionate loss of tank skin.

And you have said that two chunks of tire of about that size were found amongst the runway debris.

That is correct, m’lud.
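For a sense of scale, the experimental figures cited here can be converted to familiar units and turned into an impact energy. This is an illustrative sketch only: it takes the stated 4.8 kg and 120 m/s at face value and applies the standard kinetic-energy formula E = ½mv². It is not a reconstruction of the actual experiment.

```python
# Figures as stated in the dialogue for the post-accident experiment.
MASS_KG = 4.8          # mass of the rubber piece
SPEED_M_PER_S = 120.0  # relative impact speed

METRES_PER_MILE = 1609.344  # international mile
SECONDS_PER_HOUR = 3600

speed_mph = SPEED_M_PER_S * SECONDS_PER_HOUR / METRES_PER_MILE
speed_kmh = SPEED_M_PER_S * SECONDS_PER_HOUR / 1000

# Kinetic energy E = (1/2) m v^2, converted from joules to kilojoules.
kinetic_energy_kj = 0.5 * MASS_KG * SPEED_M_PER_S ** 2 / 1000

print(f"{speed_mph:.0f} mph, {speed_kmh:.0f} km/h")       # 268 mph, 432 km/h
print(f"impact energy about {kinetic_energy_kj:.1f} kJ")  # about 34.6 kJ
```

The energy figure is only an order-of-magnitude illustration of why the size of the chunk matters; it says nothing by itself about the shock-wave mechanism.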

Could any other phenomenon of which we know, say a tank penetration by debris consequent to a “common” tire burst (I use your phrasing), cause the release of a 32 cm square piece of the fuel tank wall?

Not that we know of, m’lud, no.

So we have only one explanation to hand of the known size of the hole?

That is correct, m’lud.

And this explanation, this phenomenon, is otherwise unknown in the history of civil aviation.

That seems to be so, m’lud.

Thank you. If I understand you, this phenomenon was triggered by the impact of a chunk of tire of about 4.5 kg or so?

As far as we can tell, m’lud.

And chunks of this size, of tire pieces or indeed of other material, are frequent, or usual, during your “common occurrences”, tire bursts?

Actually no, m’lud, they are not.

I see. Have they otherwise occurred in any of the tire-burst events, counsel?

Actually, m’lud, they have not.

That is, they are unique to the Gonesse accident?

It appears so, m’lud.

How did they occur?

The tire was apparently cut by a titanium strip lying on the runway near the rotation point of the aircraft, m’lud.

I see. Pieces of metal left lying on the runway cut Concorde tires into 4.5 kg chunks, apparently?

Not any pieces of metal, m’lud, according to experiments undertaken after the accident. Titanium. Titanium is unusually hard. Other metals just crush when the tires run over them.

I see. But titanium strips are to be found lying on runways every so often, I take it?

Actually no, m’lud. This is the only recorded instance ever of a sharp titanium foreign object lying on a runway with commercial operations. Of course we don’t know about the military, since they do not share their records.

Why would that be, counsel, that this is the only instance?

One reason, m’lud, might be poor record-keeping. Another reason might be that titanium is not used on aircraft in places in which it might fall off on a runway.

Oh! So why did it happen here, counsel?

A mistake, m’lud.

I imagine a very, very rare mistake, counsel?

Yes, m’lud. As I mentioned, this is the only recorded instance of such debris lying on a runway at a commercial airport.

So, if I understand you, counsel, the shock-wave phenomenon can only happen via large chunks of debris, and the only way in which large chunks of debris from a burst tire have been known to occur is in this very accident, through cutting by a titanium strip, debris of which there has not been another recorded instance, in part because the use of titanium in a way in which it might separate from the aircraft during take off or landing is proscribed?

That seems to be so, m’lud.

So the circumstances C in which your “common occurrence” can lead to “severe damage” are, as far as we can tell:
(a) a flame front in the streaming fuel from an unusually large hole “moving forward” through an unknown mechanism to burn under the wing, in front of the engine intakes;
(b) a fluid shock wave punching out the unusually large portion of the fuel tank wall to create the hole;
(c) an unusually large chunk of debris creating the shock-wave sufficient to punch out the unusually large hole;
(d) this unusually large chunk indeed impacting the fuel tank wall, rather than being ejected in another direction;
(e) a titanium metal strip lying on the runway near enough to the rotation point to cut a tire which happens to run over it into chunks of the necessary, unusually large, size.

That seems to be so, m’lud.

I conclude, counsel, in the words of your claim, that the Concorde aircraft is susceptible to severe damage resulting from a common occurrence (tire burst) under the circumstances (a)-(e) just elaborated. And that, in order not to mislead us, your claim should include the supplementary phrase “under the circumstances (a)-(e) just elaborated”. If you agree to include that wording, counsel, I shall grant your claim. If not, I shall reject it. Accordingly, do you wish to remain with your original wording, or to amend it?

Counsel’s reply is not recorded because the recorder had used up its batteries.



Concorde, Ten Years On

6 12 2010

I understand that Simon Foreman observed, at a meeting of the RAeS Law Group on the criminalisation of aviation accidents held on 28 April this year (reported here in Flight International by David Learmount), that the French legal system lacks a mechanism which the English legal system has, the inquest, for determining what went on in an accident. It seems to follow that, in France, for the state to determine what indeed went on in an incident of public interest, there must be a criminal trial.

First point: there are at least two reasons for society to determine what went on. The first is to prevent a recurrence. This is the reason for the ICAO-mandated accident investigation bodies, here the BEA. They have long done their job.

The second reason is to apportion responsibility for compensation, an age-old and widespread human activity. Concerning this second reason, it’s a shame for all that France doesn’t have inquests. I imagine many French people might agree. It is particularly harsh for one person by the name of John Taylor.

Second: why an inquest?

Amongst other things, the results of an inquest help figure out who should ultimately pay. There is an ancient general principle of compensating victims of mishaps, and compensation should not only follow rules but also be seen to be “fair”, adjudicating amongst competing claims. That is what an inquest does.

Some commentators, including the BBC in their report, have spoken of “gaining closure” for the victims’ families. This notion is a US import to Europe and not one with which I sympathise, nor did I even when I was living in the US. I don’t sympathise with it in part because it gives cover to seeking revenge, an activity of which I expressly do not approve in the case of accidents.

In particular, an inquest is not a criminal trial. It doesn’t punish anyone. It assigns cause.

Third: some have speculated, as in this note on PPRuNe, that this will be a bonanza for tort lawyers.

If this follows the time scale of most major commercial airline accidents, the process of seeking compensation for victims’ families will mostly be over by now. The airline (that is, the airline’s insurance company) will already have paid to settle most or all tort claims, as is by now the general practice in commercial aviation. The cost is reported to be in the region of €100 million.

Fourth, the ruling is reported to contain the following apportionment: Continental 70%, EADS 30%, everyone else (Air France, DGAC, Paris Airports Authority, etc) 0%. That means that the insurance company will be negotiating with those parties to recover the relevant proportion of its costs. Since there is now a legal ruling which will act as precedent, there would be little point in disputing it in court.

So that will settle the compensation bit.

Fifth, what is this ruling based on?

The ruling is based on the obvious physical ABC of the accident occurrence.

The report said: titanium strip fell off Continental onto the runway; Concorde ran over strip; strip sliced into tire and caused tire burst of unprecedented form and strength; large tire fragment hit tank; impact shock wave caused tank to explode from within; resulting hole allowed fuel to stream out in large quantity; fuel was ignited (not completely sure how, but probably by reheat); fire engulfed critical wing structure and contributed to critical performance degradation of two engines; Concorde cannot accelerate after TO on two engines alone (BTW, there is no evidence that Concorde was overweight at TO) and went down.

That’s what the court found also, as far as I understand the verdict (not yet having read it :-) ).

People have said “missing spacer”. Our work on that said: not causally relevant.

People have said “overweight at dispatch”. Maybe, but not at takeoff, as far as anyone can tell.

People have said “airport should have swept runway better”. Maybe, but that wasn’t a direct contributing cause in the intuitive sense of the above sequence of physical events. It would be like blaming the police for Fred’s broken jaw in a street fight because they weren’t around at the time. Thousands of years of legal tradition say the person responsible for the broken jaw is the person who threw the punch. So here: the court said that the entity responsible for the burst tire is the entity that left the titanium strip on the runway; and further, as I understand it, the person who mounted that titanium instead of an aluminium part (presumably because he was judged to have made a professional error: he should have known to mount a softer metal); as well as, to some degree, the people responsible for the aircraft design, even though (and others will agree with me here loudly) the airplane was a triumph of aeronautical technology, as well as the most beautiful artifact ever to have taken to the skies.

Other people (Continental, apparently) said the plane was on fire before it encountered the strip. Neither the report, nor any of the people I know who know about Concorde, nor indeed physical common sense applied to the undisputed evidence of what happened, offers any explanation of how that could possibly have been the case. The evidence presented is circumstantial – eyewitness testimony from witnesses who were some way away from the scene. After ten years of thinking about it, there is no physical explanation of the accident which coheres with that testimony at all. I take it that that eyewitness testimony was rejected.

Now, that all seems to me, given the system, appropriate, fair, and straightforward.

What is inappropriate, in the minds of many including myself, is that it seems to need a criminal trial, rather than an inquest, to serve this necessary legal function of apportioning the enormous costs of compensation.

It seems to be particularly inappropriate in the case of poor Mr. Taylor. He repaired an airplane. Imagine a wise-owl supervisor, or some angel with perfect foresight, going up and saying “You can’t mount that there! It might fall off in Paris, and Concorde might run over it and lose a huge chunk of tire which causes a fuel tank to explode and dump fuel into the exhaust and lose power and crash!” and him saying “oh, yes, you’re right” and changing it, as it was in Dickens’ Christmas Carol.

Dickens notwithstanding, to English minds there just doesn’t seem sufficient proximity between act and event to justify a criminal-negligence connection. Dickens’ tale was, after all, a Carol. But there he is now, poor chap, with a criminal record, and a 15-month suspended sentence. Mr. Taylor, on behalf of many, probably most, Europeans, I am very, very sorry!

And that is why people are going to shut up and instruct their lawyers, rather than telling accident investigators all about everything they know, if accidents continue to be criminalised. Just as they are already known to do in rail accidents in Germany, for example.



Simulators and Veridicality in Airline Training and Pilot Currency Checks

9 09 2010

In his note in RISKS-26.15, Peter Wayner refers to the article Simulator training flaws tied to airline crashes in USA Today, 31 August 2010 (WWW version), which claims to have shown that «Flaws in flight simulator training helped trigger some of the worst airline accidents in the past decade» and that «More than half of the 522 fatalities in U.S. airline accidents since 2000 have been linked to problems with simulators».

I like to think I keep well up to date with commercial aircraft accidents, their analyses and causes, and am aware of simulator strengths and weaknesses. This suggestion struck me as somewhat thin. But if one reads the sentences literally, with their main verbs “helped trigger” and “have been linked to”, they do not speak of causes or causal factors. I can “help trigger” an accident if some USA Today journalist is so enraged by reading this note on his/her Blackberry that he/she runs a red light. And I can link USA Today with whom I wish simply by mentioning them in the same sentence in a Risks note. I am sure the newspaper intends stronger links than this, but it would be good to know what and how, and the article gives no clue. The NTSB uses the words “probable cause” and “contributing factors” in its conclusions, and these terms have more precise meanings.

The article mentions three accidents: the November 12, 2001 American Airlines Airbus A300-600 loss of control on climb-out from New York; the December 20, 2008 Continental Airlines Boeing 737-500 takeoff loss of directional control at Denver; and the February 12, 2009 Colgan Air Bombardier Q400 loss of control on approach to landing at Buffalo. The abstracts and links to the full reports are to be found on the NTSB WWW site as, respectively, DCA02MA001, NTSB Abstract AAR-10/04 and NTSB Abstract AAR-10/01. I invite readers to take a quick look at these very short synopses. These three accidents total 315 deaths and the USA Today article does not say which other accidents it counts.

Only the Denver accident causes and factors specifically mention simulators. The pilot flying lost directional control of the aircraft on the runway during takeoff, because of very high gusting crosswinds. The gust “exceeded the captain’s training and experience”, and according to the NTSB he failed effectively to use rudder to control the aircraft in the gust. The first contributing factor allows us to conclude that the crew did not receive timely and accurate information on the actual wind strength and direction. The second contributing factor is “inadequate crosswind training in the airline industry due to deficient simulator wind gust modeling”.

It is widely accepted in the industry that the most recurrent feature of large-airplane commercial air accidents worldwide in the last few years has been loss of control. It used to be controlled flight into terrain, but it is now widely accepted that the Ground Proximity Warning System (GPWS) and its enhanced version, which adds terrain mapping using GPS position and terrain databases (EGPWS), have reduced the incidence of such accidents considerably (although they still occur, as to an Airblue Airbus A321 on approach to Islamabad on 28 July 2010 – see the Aviation Safety Net brief report).

The 2001 American Airlines accident was loss of control because of structural failure: the vertical fin separated from the aircraft. The NTSB found that the pilot flying had caused that separation by overstressing it through “rudder reversal” control inputs; contributing were the rudder control system design of Airbus, and American Airlines Advanced Aircraft Maneuvering [sic] Program AAMP. The NTSB heard both that AAMP discussed use of rudder to help recover from upsets, and that the FAA, Airbus and Boeing had expressed concern about this in a letter to American Airlines four years before. The pilot flying had been observed on a previous flight using rudder to control unwanted aircraft movement from environmental disturbance, and the captain on that flight, who gave evidence to the inquiry, had discussed it with him then. I refer Risks readers interested in more to the report, as well as to my paper The Crash of AA587: A Guide. The AAMP does involve simulator work, but a simulator cannot be known accurately to represent what would happen during unusual piloting rudder-reversal behavior because, well, until the accident nobody knew at what point airframe structure would fail (it turned out to be some one-third stronger than required by certification regulations)!

The pilot flying the Colgan Air accident aircraft reacted inappropriately to a stall warning, by pulling on the stick, and holding it back against the attempts of the automatic “stick pusher” system to push it forward. This resulted in the aircraft stalling at low altitude. Pushing the stick forward is the appropriate response. There was considerable discussion of the pilot’s aptitude, his level of awareness (relating to possible fatigue), and his overall Q400 training at Colgan Air. The NTSB remarked on features of that airline’s training program, which of course involves simulator work. But I don’t think it would be appropriate to conclude that there is anything much wrong with the simulators themselves.

Simulators do not necessarily accurately represent the behavior of aircraft close to the “edge” of their “flight envelope”, and they cannot be taken to do so for flight outside the envelope. Aerodynamicists study these “out of envelope” characteristics by use of wind tunnel models, but actual aircraft are not flown in flight test “out of envelope” except for certain restricted manoeuvres prescribed in the certification regulations (such as flying at “maximum operating airspeed” and initiating a 7.5° nose-down dive for 20 seconds, to mimic an overspeed excursion from cruise). For most “out of envelope” flight, aerodynamicists can make very well-educated guesses (from their wind-tunnel modelling) as to what might happen, but they are the first people to say that they are not at all certain. Nobody goes out to flight-test Boeing 747 aircraft in partially-inverted almost-vertical semi-spins, such as what happened to a China Air Lines Boeing 747 over the Pacific near San Francisco in 1985 (see the digitised version of the NTSB accident report in the entry in our Compendium. Incidentally, the human factors chair on this investigation tells me this was a watershed event for the investigation of human biorhythms and possible fatigue as potential contributors to accidents).

So there are limits to what simulators can achieve, and it is a matter for research how much “out of envelope” behavior can be usefully and veridically simulated. Since loss of control is now prominent amongst probable causal factors of accidents, it seems to me obviously worthwhile to perform this research. Where it will lead is anybody’s guess, as with most research. However, the NTSB’s concern in the Denver report is with situations that could be veridically modelled in flight simulators but currently are not. That could be, and probably should be, fixed.



Fully-Automatic Execution of Critical Manoeuvres in Airline Flying

3 09 2010

David Learmount’s semi-annual review of commercial air accidents has just appeared in Flight International (3-9 August, p34). There were three accidents to high-performance large commercial passenger jets: (1) an Ethiopian Airlines Boeing 737-800 took off from Beirut over the sea at night and ended up in the ocean (25 January); (2) an Afriqiyah Airways Airbus A330-200 impacted the ground violently on approach to Tripoli’s RWY 9 (12 May); (3) an Air India Express Boeing 737-800 overran the runway at Mangalore (22 May). Recently, and not included in David’s survey, (4) an Airblue Airbus A321 impacted high terrain while on approach to Islamabad (28 July); (5) an AIRES Colombia Boeing 737 landed and broke up on RWY 6 of San Andres Island (16 August); (6) an Embraer 190 of Henan Airlines impacted short of the runway and broke up on approach to Yichun (24 August).

Taking a random six months of accidents is not a sample conducive to pointing to trends using statistical methods; it is well-known amongst students of commercial air accidents that there are “fashions”, common features which cluster at a certain time, but which then reduce, without anybody necessarily doing anything much different. However, let us start here with the question that is the theme of this note:

Which of these accidents would likely have been avoided had the aircraft been fully automatically controlled?

Unmanned aircraft such as the military Global Hawk reconnaissance aircraft routinely fly complete missions under automatic control, from full stop to full stop. Other unmanned aircraft, such as the Predator «drones» used by the US Military in Afghanistan, and for US southern-border patrol, are remotely piloted, but have had control problems with the remote-piloting regime, as for example in this analysis of a US southern-border accident by Johnson and Shea. I want to emphasise here that we are indeed in the era in which fully automated long-distance flights are routinely flown (if only at present by the US Military, and, soon, other NATO allies with the Euro Hawk).

(1) Ethiopian had taken off into a «black hole» over the ocean at night, in other words into an environment with no outside visual references whatsoever. The aircraft was performing a climbing turn when it started to descend and disappeared from radar. There were electrical storms in the vicinity. The causes are not yet known, but certain factors have been proposed as hypotheses. The accident is almost certainly loss of control (LOC): no one presumes that the pilots committed suicide/murder. First, spatial disorientation of the pilots. This is a historical factor in the records of accidents in night takeoffs and landings in «black holes», such as over oceans. Second, a weather-related upset, say windshear of some kind causing LOC. Such phenomena are also known historical factors. It is understood that no technical defects have yet been identified, but I also understand that the investigation is not yet complete.

If spatial disorientation of the pilots was a causal factor, it would have been avoided by fully automatic control of the takeoff and after-takeoff manoeuvring.

(2) Afriqiyah was approaching RWY 9 at Tripoli, in clear weather but with reported «low, hazy visibility» (Learmount, op. cit.). «Information from the FDR and CVR indicates that there were no technical faults on the aircraft and fuel starvation was not an issue» (Learmount, op. cit.). Aviation Herald confirms this in its report; see in particular the update from the investigator’s information on 14 August. The aircraft impacted the ground heavily (even violently), some vertical distance below the approach path, indicating a high rate of descent. The impact was about 900m from the runway, according to Aviation Safety Net’s report. The ground in the area of the airport is more or less flat. Although the VOR was NOTAMed unreliable, there is an NDB approach to RWY 9. Using its GPS equipment and NDB reference, the aircraft is capable of constructing a «Continuous Descent Approach» (CDA) path, which gives a more-or-less constant rate or angle of descent to the point of touchdown, constructed by the Flight Management Systems using the exterior navigation aids, and it would have been able to do that at this airport at this time, as far as is now known. If the aircraft had been on a CDA, it would have been at a height of about 200 feet at this point (the arithmetic: assuming a 3° approach path, about one-in-twenty, and a touchdown point 300m beyond the runway threshold, the aircraft impacted about 1200m from the touchdown point, at which point it should have been 60m above touchdown zone elevation (TDZE)).
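The parenthetical arithmetic above can be checked in a few lines. This is a minimal sketch, assuming a straight-line 3° path aimed at a touchdown point 300 m beyond the threshold and an impact point 900 m before it, as stated in the text; the names are illustrative, and heights are above touchdown zone elevation over flat ground.

```python
import math

FT_PER_M = 1 / 0.3048  # 1 ft is defined as exactly 0.3048 m


def height_on_path(distance_to_touchdown_m: float, path_angle_deg: float = 3.0) -> float:
    """Height (m) above touchdown zone elevation on a straight descent path."""
    return distance_to_touchdown_m * math.tan(math.radians(path_angle_deg))


impact_before_threshold_m = 900.0   # reported impact distance from the runway
touchdown_past_threshold_m = 300.0  # assumed aiming point, as in the text
distance_m = impact_before_threshold_m + touchdown_past_threshold_m

exact_m = height_on_path(distance_m)  # tan(3°) is about 0.0524
one_in_twenty_m = distance_m / 20     # the "about one-in-twenty" rule of thumb

print(f"distance to touchdown point: {distance_m:.0f} m")  # 1200 m
print(f"3° path: {exact_m:.0f} m = {exact_m * FT_PER_M:.0f} ft")  # 63 m = 206 ft
print(f"1-in-20: {one_in_twenty_m:.0f} m = {one_in_twenty_m * FT_PER_M:.0f} ft")  # 60 m = 197 ft
```

Both the exact tangent and the one-in-twenty rule of thumb give roughly 200 ft, consistent with the figure in the text.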

Automatics are capable of controlling the airplane to within tens of feet of a given path, and routinely do so (indeed, they must do so in certain flight phases, such as cruise in European RVSM airspace). Given that the investigation identified no technical issues with the aircraft, and violent weather was not a factor, a fully-automated CDA would have landed the aircraft on the runway, or at least ensured it was not some 200 ft below where it should have been on a normal 3° continuous-descent approach path.

(3) Apparently the Air India Express Boeing 737 «landed on RWY 24 just beyond the touchdown zone, in fair weather with no rain. It overran the runway end and plunged into a ravine» (Learmount, op. cit.). According to the report by Aviation Herald, the runway has an ILS, the required landing distance was 7,500 ft and the runway length was 8,100 ft. There is no word yet, to my knowledge, on possible causal factors.

This seems to have been a routine landing, with no compromising weather. Such landings are routinely accomplished fully automatically, by the Hawk UAVs.

(4) The Airblue A321 had completed an ILS approach to RWY 30 at Islamabad, had turned right at low altitude and then left, to fly parallel to the runway. Many suppose (and I currently concur, given the information available) that the crew was at this point attempting a circle-to-land (CTL) manoeuvre, likely to land on RWY 12 (the reciprocal of the approach runway). CTL is a routine instrument flight rules manoeuvre, permitted from the ILS approach to RWY 30 as shown in this snippet from an approach plate, posted by «aterpster» in the PPRuNe discussion forum. In a CTL manoeuvre, the pilot, upon «obtaining a visual with» (i.e., seeing) essential parts of the runway or its environment, manoeuvres to land the airplane, provided visual contact is continually maintained. If visual contact is lost, a routine «missed approach» manoeuvre must be immediately initiated. During the manoeuvre, the airplane must be flown within a given radius, just over 5 nautical miles, of a specified point on the airport. A diagram of this circling radius, overlaid on a plan of the airport and environment, appears in this post by «aterpster» in the PPRuNe discussion forum. A first approximation to the crash site, overlaid on a map with some of the navigation detail, including the CTL radius from a post by «aterpster», may be seen in this post by «PJ2», who updated his estimate of the approximate crash location some time later in this post. The crash site is reported by Aviation Herald to be about 10 nautical miles away, and in this early article in FlightGlobal, the WWW site of Flight International, to be 9.66 nm. The print version of the article (Flight International, 3-9 August 2010, p7) says 9.7 nm. There were reported to have been «no technical problems» in a later article in FlightGlobal. So the impact site was at about twice the allowable CTL radius. The CTL radius encloses only flat land; the aircraft impacted «rising terrain», in other words a hill/mountain range nearby, but not so near as to constitute any danger to normal IFR operations.
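The «about twice the allowable CTL radius» comparison can be made explicit. A minimal sketch, assuming the radius is taken as 5.0 nm (the text says «just over 5 nautical miles», so the true radius is slightly larger) and assuming the reported distances are measured from the same airport point as the circling radius; the function name is hypothetical.

```python
# Minimal sketch: compare reported crash-site distances with the circling radius.
# Assumptions: radius taken as 5.0 nm ("just over 5 nm" in the text), and the
# reported distances are measured from the same airport point as the radius.
CIRCLING_RADIUS_NM = 5.0

reported_distance_nm = {
    "Aviation Herald": 10.0,  # "about 10 nautical miles"
    "FlightGlobal (web)": 9.66,
    "Flight International (print)": 9.7,
}


def circling_check(distance_nm: float, radius_nm: float = CIRCLING_RADIUS_NM) -> tuple[bool, float]:
    """Return (inside circling area?, distance as a multiple of the radius)."""
    return distance_nm <= radius_nm, distance_nm / radius_nm


for source, d in reported_distance_nm.items():
    inside, ratio = circling_check(d)
    print(f"{source}: {d} nm, inside: {inside}, {ratio:.2f} x radius")
```

Because the real radius is a little over 5 nm, these ratios slightly overstate the excursion, but all three sources put the impact site at roughly twice the radius.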

There is a question, currently unanswered, as to why the EGPWS terrain-warning equipment did not enable the crew to manoeuvre to avoid the terrain.

Unlike a (presumed-)straightforward approach as at Tripoli, current commercial-aircraft automatics do not assist CTL manoeuvring in any reliable manner; the procedure should be hand-flown. However, following an ILS and circling to land on the reciprocal runway within the given limits is a straightforward manoeuvre, well within the capabilities of automatic control systems such as those on the Global Hawk. Automatics could have accomplished this manoeuvre without going outside the given CTL radius, and therefore without a danger of impacting high terrain.

Furthermore, systems currently in test for the USAF, and shortly to become operational, perform automatic terrain-avoidance manoeuvres, even – especially – during the kind of low-level manoeuvring performed by military pilots. The system is called Auto-GCAS and was recently reported on extensively, and flight-tested, by Aviation Week (August 2, 2010, pp50-57). Here is a short blog post on it by Stephen Trimble of FlightGlobal from last year.

Some proponents of EGPWS have suggested that avoidance manoeuvres in commercial air operations be automatically initiated and flown. This is well within current capability, as Auto-GCAS shows.

(I have mentioned anonymous writers above. Here is what I know of them. “PJ2” is someone I know, and with whom I have discussed accidents for a decade. He is a recently-retired captain for a major airline, where he was deeply involved in setting up the airline’s FOQA program. He is expert in aviation safety matters and I value his advice considerably. I do not know “aterpster”, but have read many public contributions by him. He self-identifies as a former airline pilot who has been officially involved in accident investigations as a designated representative of pilots’ organisations.)

(5) Initial reports of the AIRES accident, for example this report in Aviation Herald, suggest that the aircraft landed short. Weather is reported by FlightGlobal to have included thunderstorms in the vicinity. Some commentators on the PPRuNe thread have suggested that the main gear was torn off upon reaching the runway hard surface, which is elevated slightly above the surrounding terrain (one imagines the wheels sinking into soft ground before the runway, and then impacting the hard runway construction).

It is not possible at this point to estimate the causal influence of the weather – one notes in the above references that the aircraft was reported to have sustained a lightning strike on final approach. But a landing of this sort to the TDZ is routine, even in stormy weather, for digital flight control systems, provided, of course, that they are sufficiently well insulated from the effects of a lightning strike.

(6) The Henan accident was also a landing short, in reportedly benign weather – see for example the report in Aviation Herald – on a non-precision approach (NPA). The weather was reported as «foggy», but of course fog is incompatible with the kinds of atmospheric disturbances which might lead to control problems, and is not an issue for automatic control. A fully automatic landing was possible in these conditions, but not necessarily in the E190 accident airplane.

At this point, there is no public information about any technical problems with the flight. NPAs have been known for decades to be more accident-prone than precision approaches (ILS), but modern automation such as on the Embraer 190 can routinely perform CDAs, as discussed above with respect to the Afriqiyah accident.

None of the final reports are out, or expected yet, for any of these accidents. As things stand at present, the Ethiopian and AIRES accidents could have had the causal involvement of atmospheric disturbance; we don’t know. But other potential causal factors would have been mitigated if the manoeuvres had been performed fully automatically. In the case of the other four accidents, it seems quite reasonable to assert that they would likely have been avoided had the manoeuvres been performed fully automatically: this is outside the current capabilities of commercial-aircraft avionics, but certainly within the routine capabilities demonstrated by Global Hawk UAVs and the USAF’s Auto-GCAS.

There are of course substantial safety issues with fully-automatic flight in civil airspace. It is correct to say that at this point it is not operationally feasible. For a recent review of some issues, see the forthcoming paper Computational Concerns About Integration….. by Johnson, to be read in two weeks at the SAFECOMP conference in Vienna.

So no one is yet suggesting, even for the medium term, pervasive fully-automatic commercial air transportation. But in light of the observations above concerning the six 2010 fatal accidents to large commercial jet aircraft, it does look as if it would be worthwhile to research whether standard approach and landing manoeuvres could be transitioned to routine fully-automatic execution.



Malware and the August 2008 Madrid Spanair Take-Off Accident

27 08 2010

On 20 August 2008, an MD-82 aircraft of the airline Spanair crashed on takeoff (TO) from Madrid-Barajas airport. The high-lift devices on the wing had not been properly configured to give the necessary lift on takeoff, and the aircraft was unable properly to lift off as planned. See Aviation Safety Net’s report of this accident for more details.

There had been a maintenance issue during a previous attempt at departure, and maintenance personnel had addressed this issue. In effecting the repair, however, the takeoff configuration warning horn, which aurally warns the crew that the high-lift devices are not appropriately configured for takeoff, had also been disabled. The crew is required, in the pre-take-off check list which they have to perform, to check that the aircraft is appropriately configured for takeoff, and it seems that they did not do so at the second departure: they performed some of the items, but not the full list.

Spanair uses a ground-based computer to process aircraft logs for maintenance issues. The fault which caused the accident aircraft to return to the gate had apparently occurred more than once the previous day, and been logged. But the press has recently reported that malware in this computer delayed the processing of reports, and so maintenance was not aware of the problem the previous day, when they would have been able to correct it before the ill-fated flight. The press reports have thereby connected this malware with the accident. See, for example, a summary in English of the reports by Daniel Johnson on the University of York Safety Critical Systems Mailing List.

Brian Reynolds commented on these reports that “This is totally bogus” and clarified that he meant that it is “totally bogus” “[t]hat a virus or Trojan in a ground maintenance computer is casually related to this incident”.

Reynolds seems to be denying the claim that malware in a ground-based maintenance computer is causally related to the accident. But he omitted to say what his criterion for causal-relatedness is.

I have one: the concept of necessary causal factor, proposed in 1973 by the philosophical logician David Lewis, who credits the concept to David Hume (his “second definition” of cause). I took over Lewis’s semantics 15 years ago for use in failure analysis.

According to this semi-formal, objective notion of causal factor, there is demonstrably a chain of causal factors leading from the presence of the malware to the accident. According to this concept, Reynolds is provably wrong.

So now let me show this.

Here is the Counterfactual Test:

Let A and B be events or states.

A is a necessary causal factor in the occurrence of B just in case:

If A had not occurred, then B would not have occurred.

This last sentence is called a counterfactual (or contrary-to-fact) conditional. “Conditional” comes from the “if…then…” form; “counterfactual” from the fact that A and B did as a matter of fact happen, and one is supposing what the world would then have been like had A not occurred. In order to determine this, I adapt the Lewis semantics: suppose A had not occurred, but the world stayed otherwise as similar as possible to the actual state of affairs. Did B occur in this possible state of affairs? Most often, we cannot answer with absolute certitude “yes” or “no”, but it turns out that we can most often answer “most likely, yes”, or “most likely, no”. The Counterfactual Test is to ask this question. If the answer is “most likely, no” (without A, B would not have occurred), the Counterfactual Test is “passed” and A is a necessary causal factor of B. If the answer is “most likely, yes” (B would have occurred anyway), then A is not a necessary causal factor of B. We have found the Counterfactual Test to be very useful in complex engineering failure analyses.
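The decision rule of the Counterfactual Test can be written down explicitly; the hard part, judging what happens in the most similar world without A, remains a human engineering judgement and appears here only as an input. This is a minimal sketch; the enum and function names are hypothetical, not taken from any WBA tool, and the mapping follows the definition “If A had not occurred, then B would not have occurred”.

```python
from enum import Enum
from typing import Optional


class Answer(Enum):
    """Assessor's answer to: 'In the world most similar to the actual one,
    but in which A did not occur, did B occur?'"""
    MOST_LIKELY_YES = "most likely, yes"
    MOST_LIKELY_NO = "most likely, no"
    CANNOT_SAY = "cannot say"


def necessary_causal_factor(b_occurs_without_a: Answer) -> Optional[bool]:
    """Counterfactual Test: A is a necessary causal factor of B just in case,
    had A not occurred, B would (most likely) not have occurred."""
    if b_occurs_without_a is Answer.MOST_LIKELY_NO:
        return True   # test passed: without A, no B
    if b_occurs_without_a is Answer.MOST_LIKELY_YES:
        return False  # B would have happened anyway
    return None       # undecided: more information needed


# Example, following the text: had the crew fully performed the pre-takeoff
# check list, would the Spanair aircraft still have lost control on takeoff?
print(necessary_causal_factor(Answer.MOST_LIKELY_NO))  # True
```

Keeping an explicit “cannot say” outcome reflects the point above that answers are rarely certain; such factors are left open rather than forced into the analysis.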

To show a causal connection between the presence of malware on the maintenance computer and the accident, here are five instances to check with the Counterfactual Test:

1. Had the malware not been present, the fault causing the phenomenon would have been noted by maintenance personnel in a timely manner (let us say: at latest, end of the previous day).
2. Had the fault causing the phenomenon been noted by maintenance personnel in a timely manner, it would have been appropriately repaired before the accident flight.
3. Had the fault been appropriately repaired before the accident flight, the TO-configuration warning would have sounded on the accident flight.
4. Had the TO-config warning sounded during TO on the accident flight, the TO would have been aborted when the warning sounded and the aircraft properly configured before subsequent TO.
5. Had the TO been aborted when the warning sounded, the aircraft would not have crashed as it did.

I consider all of these counterfactuals to be true according to the Lewis semantics. It follows:

1a. The presence of the malware was a necessary causal factor in the lack of timely awareness of the fault.
2a. The lack of timely awareness of the fault is a necessary causal factor in lack of timely repair.
3a. The lack of timely repair is a necessary causal factor in the TO-config warning inhibition.
4a. The TO-config warning inhibition is a necessary causal factor in continuing TO to loss-of-control.
5a. Continuing TO to loss-of-control is a necessary causal factor in the accident.

So, there is a chain of six causal factors, chain-length five, connecting the presence of malware to the accident. QED.
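The bookkeeping behind “six causal factors, chain-length five” can be made explicit: each of the five counterfactuals, once judged true, contributes one cause-effect link, and linking them end to end gives the chain. A minimal sketch; the factor labels paraphrase 1a.-5a., the helper name is hypothetical, and the judgements themselves are taken from the text, not computed.

```python
# The six factors, in causal order, as listed in 1a.-5a.
factors = [
    "presence of malware on maintenance computer",
    "lack of timely awareness of the fault",
    "lack of timely repair",
    "TO-config warning inhibition",
    "continuing TO to loss-of-control",
    "accident",
]

# Each counterfactual 1.-5. (judged true, as in the text) yields one link
# (cause, effect): had the cause not occurred, the effect would not have.
links = list(zip(factors, factors[1:]))


def is_chain(links: list[tuple[str, str]]) -> bool:
    """True if every link's effect is the cause in the following link."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(links, links[1:]))


print(f"{len(factors)} causal factors, chain length {len(links)}, chain: {is_chain(links)}")
print(" -> ".join(factors))
```

This prints “6 causal factors, chain length 5, chain: True” followed by the chain itself.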

I emphasise, just to avoid misunderstanding, that these are by no means the only causal factors relevant to the accident: that the crew failed adequately to perform the pre-takeoff check list on the accident flight is most certainly a necessary causal factor in the loss of control. The reader is invited to try out the Counterfactual Test to assure him/herself of this.

Applying the Counterfactual Test rigorously throughout the list of potentially relevant factors, to see which ones are indeed causally relevant and which not, is the core of our analysis method, Why-Because Analysis (WBA). For those interested in seeing relatively quickly how we perform WBAs nowadays, there is available a case study on how to perform a WBA using the SERAS Reporter and SERAS Analyst tools. Here is some general info concerning our experience with Why-Because Analyses. Typically, depending on the level of detail provided by the investigation, a detailed causal analysis (which we represent in graphical form as a Why-Because Graph, WBG) ends up showing a hundred to a couple of hundred individual factors, of which a quarter to a third are “root-causal factors”, that is, causal factors which are not regarded as themselves having pertinent causes. So WBA also includes a fair amount of bookkeeping, or “complexity control”, or whatever one wants to call it. For example, given a WBG with a couple of hundred items, one would assemble these causal factors into a small number of subgroups, and give these subgroups appropriate titles, to provide an “executive summary” of the analysis. The SERAS Reporter and SERAS Analyst software is available as freeware from Causalis Limited.
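To illustrate the bookkeeping, a Why-Because Graph can be held as a mapping from each factor to its causal factors; the root-causal factors are then simply the nodes with no recorded causes of their own. This is a minimal sketch, assuming a plain dictionary representation: it is not the data format of SERAS Reporter or SERAS Analyst, and the small graph is a hypothetical fragment built from the Spanair factors discussed above, not a real WBA.

```python
# Hypothetical Why-Because Graph fragment: effect -> list of its causal factors.
why_because = {
    "accident": ["continuing TO to loss-of-control"],
    "continuing TO to loss-of-control": [
        "TO-config warning inhibition",
        "pre-takeoff checklist incompletely performed",
    ],
    "TO-config warning inhibition": ["lack of timely repair"],
    "lack of timely repair": ["lack of timely awareness of the fault"],
    "lack of timely awareness of the fault": ["presence of malware"],
}


def all_factors(graph: dict[str, list[str]]) -> set[str]:
    """Every node appearing in the graph, as an effect or as a cause."""
    nodes = set(graph)
    for causes in graph.values():
        nodes.update(causes)
    return nodes


def root_causal_factors(graph: dict[str, list[str]]) -> list[str]:
    """Factors with no recorded causes of their own, sorted by name."""
    return sorted(n for n in all_factors(graph) if not graph.get(n))


roots = root_causal_factors(why_because)
print(roots)
print(f"{len(roots)} of {len(all_factors(why_because))} factors are root-causal")
```

In a full WBG of a hundred or more factors, the same query gives the list of root-causal factors from which subgroups for an executive summary could be assembled.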

We can well expect a full WBA of the Spanair accident to contain between a hundred and a couple of hundred factors.



Understanding Aerodynamics of Stalls

28 07 2010

Recently, most commercial transport airplane manufacturers have been revisiting their FCOM procedures for “stall recovery” (actually, procedures to prevent an approach to stall from turning into a stall). This may be related to the spate of recent accidents in which commercial airplanes have been stalled: Colgan Air in Buffalo, Turkish Airlines in Amsterdam, XL Airways in Perpignan. Such a spate of loss-of-control (LOC) accidents is a sudden new development in aviation accident statistics. People are concerned it might signal a trend and are looking for possible causes of this trend, if it is one.

A discussion on such matters started on the Professional Pilot’s Forum PPRuNe on a thread with the ungrammatical title of New (2010) Stall Recovery’s @ high altitudes. I agree with the moderator, who goes by the handle of John_Tullamarine, that the discussion has been stimulating, although I had my doubts as it started, which readers of the thread may observe.

The discussion has been enlightening in a number of respects. One aspect which startled me is the limited degree of understanding of stall – what it is, when it happens, and its functional relationship with other aerodynamic parameters. The stall is one of the most important, if not the most important, phenomena with which pilots must cope (preferably by avoidance). So is buffet. I conclude that such understanding amongst line pilots could be easily improved. There are graphs used by aerodynamicists; they are all more or less the same shape no matter what the airplane. You can find them in any intro-aero book, say John D. Anderson Jr.’s Introduction to Flight, or Richard Shevell’s Fundamentals of Flight, without any numbers on them, as well as in many FCOMs with numbers on them. Why are they not studied in type training and the knowledge tested?

It could be – has been in the thread – argued that “pilots don’t need to know” such things. As off and on a professional educator for the last few decades, I have participated in enough discussions about what technical practitioners need, respectively don’t need, to know. I have seen what happens, at enough places. We as a profession are now – have been for a decade or two – giving computer science degrees to people who can’t really program very well, at least not according to the standards we used to have. Do I think this is – ever – a good idea? No, I don’t. Do I think people should be professionally flying complex airplanes without understanding the aerodynamics presented in the FCOM? No, I don’t. Although I imagine not everyone will agree.

My practical philosophy of education is as follows. My default answer to what people need to know is: everything. That said, there are practical limitations (of ability, of time) which entail prioritising knowledge and intellectual skills in using that knowledge (which, while related, are not the same thing).

Can we reduce such knowledge to algorithms, to operational instructions, as has been suggested in the thread: “if this happens, do Y”? I am sceptical. Choosing the correct action as a pilot requires appropriate situational awareness.

There was, for example, considerable debate about tailplane stalls and training syllabuses following the Colgan Air upset in Buffalo. Colgan Air used a NASA video about stalls in icing as a training aid, and this video emphasised so-called tailplane stalls due to icing, for which the remedy is apparently inconsistent with the action required for a main-wing stall. The Q400 in the Buffalo crash is not at all susceptible to tailplane stalls, and is equipped with a stick pusher to prevent main-wing stalls. However, the pilot flying repeatedly pulled the stick back, overpowering the stick pusher. That is exactly the wrong thing to do if the main wing is on the point of stalling (it is, after all, the purpose of the stick pusher to do the right thing in this situation), although it might be appropriate for a tailplane stall. It was therefore questioned whether the pilot had appropriate awareness of the aerodynamic situation the aircraft was in, and it has been concluded that he did not.

Having appropriate situational awareness requires understanding the phenomena, as well as understanding the limits of that understanding. One needs, for example, to distinguish between what some people call “stalled” (namely, at or just beyond the maximum value C_L_Max of the coefficient of lift, C_L, when large parts of your wing may still be flying) and “fully stalled” (when none of your wing is flying). The question arose in the thread whether one may use the ailerons to lift a dropped wing at the stall. The obvious answer is that you can if that part of the wing is still flying, but you most definitely should not if it is not. How do you tell which situation you are in? Trying it and seeing is not a wise option.

Consider, for example, the FCOM procedure for a “stall warning” after lift-off on a popular large airplane (A330, 3.04.27 P 5a):

THRUST LEVERS ......... TOGA
At the same time:
PITCH ATTITUDE ........ REDUCE
BANK ANGLE ............ ROLL WINGS LEVEL
SPEED BRAKES .......... CHECK RETRACTED

This assumes that you are at high angle of attack (AoA) but not yet stalled, and that the parts of the wing carrying the ailerons are still flying. The stall warning may also sound at high altitude, in which case the procedure is: “relax the back pressure on the sidestick and reduce bank angle, if necessary”. In both cases, it is assumed that the wing is flying, but that bits of it are telling you they might not be for much longer, and that you need to back away from that point. These procedures obviously won’t help much at all if your nose gets to be way up in the air at 45° to 60° of pitch, as happened at Perpignan with a related airplane.

The answer to telling which situation you are in is probably found in a good intuitive understanding of the aerodynamics in the FCOM, and for that one needs a good basic understanding of aerodynamics in general. One illustration is the suggestion, made in the thread, of a potential means of discriminating stall buffet from Mach buffet: the felt frequency of the buffet.

This also illustrates the limitations of simulation, a topic on which not all thread contributors seem to be clear. Many people still seem to think that flight simulators, including the expensive full-motion kind used for airline pilot training and recurrent training, are veridical in upset scenarios. How on earth do these people suppose simulators can reproduce veridical buffet? That some aerodynamicist has sampled the frequencies of buffet in the wind tunnel and given them to some simulator programmer to reproduce, as well as to some engineer to make sure that none of them coincides with the resonant frequencies of the simulator? And most of the wind-tunnel models that generate the data fly without horizontal tail surfaces; what is the effect of the tail? Mostly, one doesn’t actually know, but extrapolates from one’s experience as an aerodynamicist. I feel that a basic understanding of aerodynamics would cure many illusions about how veridical flight-simulator behavior is outside the normal flight envelope.

Whatever one thinks about what pilots should or should not know, it seems to me a good idea to clean up the vocabulary, as the following examples suggest.

“Stall” is a term of art: sometimes it means the same as “at C_L_Max”, and sometimes it means “the point at which buffet is severe enough to discourage further increase in AoA” (cf. the definitions used in the airworthiness certification document, CS 25). Does being at or beyond the stall mean you have no lift? No. Just beyond C_L_Max you may actually have more lift than in most other regimes of flight, even though you might be shaking severely; with the AoA well beyond that for C_L_Max, you may have much less.

Another terminological inexactitude resides in the terms “low-speed stall” and “high-speed stall”. The first refers to the situation in which the AoA is too high for the speed. The second often refers to a transonic overspeed situation, in which lift is reduced by the formation of shock waves over certain parts of the wing. Because these waves form at or near the leading edge, they reduce lift forward and thereby move the center of aerodynamic lift rearwards, producing a nose-down moment about the aircraft’s center of gravity, which gives a nose-down pitch, or “Mach tuck”. Use of this terminology leads to the anomalous-sounding phenomenon of a “low-speed” stall at “high speed”. Maybe “high-alpha stall” or “high-AoA stall” would be preferable to “low-speed stall”, and “transonic” preferable to “high-speed” when indicating the effects of shock waves on lift?

Another vocabulary hang-up occurred in the discussion on the thread of V_s1g, or stall speed at 1g. Is it a constant speed or not? If not, with which aerodynamic parameters does it vary?

V_s1g is the speed at which lift at C_L_Max equals weight (W). Lift = q x S x C_L, where q is dynamic pressure and S is an area term, usually taken to be the area of the wing planform. So at V_s1g, W = q x S x C_L_Max. Given that q = ½ x density x V^2, we can solve for V: V_s1g = Sqrt((2 x W)/(density x S x C_L_Max)). S is obviously constant for a given airplane. What about C_L_Max? If you can ignore compressibility effects (i.e., below about 0.3 Mach for most wings), then C_L_Max is effectively constant for a given configuration, as is the AoA at which C_L_Max is achieved.
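The solution for V above can be checked numerically. The Python sketch below solves W = q x S x C_L_Max for V at sea-level density; the mass, wing area and C_L_Max are hypothetical round numbers chosen for illustration, not figures from any FCOM, and compressibility is ignored, as in the text.

```python
import math

G = 9.80665      # standard gravity, m/s^2
RHO_0 = 1.225    # sea-level standard-atmosphere density, kg/m^3
KT = 0.514444    # metres per second in one knot

def v_s1g(weight_n: float, density: float, wing_area_m2: float, cl_max: float) -> float:
    """Speed (m/s) at which lift at C_L_Max equals weight.

    From W = q x S x C_L_Max with q = 1/2 x density x V^2, solved for V.
    """
    return math.sqrt(2.0 * weight_n / (density * wing_area_m2 * cl_max))

# Hypothetical illustrative values (assumptions, not taken from any FCOM):
MASS_KG = 180_000.0   # aircraft mass, kg
S = 360.0             # wing planform area, m^2
CL_MAX = 1.5          # C_L_Max for the configuration considered

w = MASS_KG * G
v = v_s1g(w, RHO_0, S, CL_MAX)
print(f"V_s1g at sea-level density: {v:.1f} m/s ({v / KT:.0f} kt)")

# V_s1g scales with the square root of weight: four times W gives twice V_s1g.
print(f"ratio at four times the weight: {v_s1g(4 * w, RHO_0, S, CL_MAX) / v:.3f}")
```

Because weight sits under the square root, a 10% heavier airplane has a V_s1g only about 5% higher.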

Now consider density. Air density obviously varies with altitude, and indeed with the properties of the atmosphere on the day and at the place. So if one wants V_s1g to represent true airspeed (TAS), then it obviously varies, with a bunch of parameters not measurable with equipment on board most commercial aircraft. However, aerodynamicists like to talk in terms of equivalent airspeed (EAS), in which, inter alia, density is taken to be the sea-level standard-atmosphere density, 1.225 kg per m^3 (kg.m^(-3)). That is, EAS is the speed which, at that density, gives the same dynamic pressure as the actual TAS at the actual density: q = ½ x 1.225 x EAS^2.

So V_s1g, as EAS, varies only with (the square root of) weight. Weight obviously varies (with load, fuel burn and so on), but it is not an aerodynamic parameter, and is usually considered constant when talking aerodynamics. It follows that V_s1g, expressed as EAS, is constant for a given weight.
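The argument can be illustrated with a short Python sketch: compute the TAS at which lift at C_L_Max equals weight for several densities, then convert each to EAS with EAS = TAS x Sqrt(density/1.225). The densities are approximate ISA values at low altitudes, where the text’s no-compressibility assumption is reasonable; the weight, wing area and C_L_Max are hypothetical.

```python
import math

RHO_0 = 1.225  # sea-level standard-atmosphere density, kg/m^3

def stall_tas(weight_n: float, density: float, wing_area_m2: float, cl_max: float) -> float:
    """True airspeed (m/s) at which lift at C_L_Max equals weight."""
    return math.sqrt(2.0 * weight_n / (density * wing_area_m2 * cl_max))

def tas_to_eas(tas: float, density: float) -> float:
    """EAS: the speed giving the same dynamic pressure at sea-level density."""
    return tas * math.sqrt(density / RHO_0)

W, S, CL_MAX = 1.77e6, 360.0, 1.5  # hypothetical weight (N), wing area (m^2), C_L_Max

# Approximate ISA densities (kg/m^3) at 0, 1,000, 2,000 and 3,000 m.
for rho in (1.225, 1.112, 1.007, 0.909):
    tas = stall_tas(W, rho, S, CL_MAX)
    print(f"density {rho:.3f}: TAS {tas:5.1f} m/s, EAS {tas_to_eas(tas, rho):5.1f} m/s")
```

The TAS column rises as density falls, while the EAS column repeats the sea-level value, because the Sqrt(density) factors cancel.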

However, V_s1g as given, say, in the A330 FCOM (3.01.20 P7) is expressed as calibrated airspeed (CAS), which is the pitot-static-measured airspeed corrected (usually digitally) for the effects of how the sensors are positioned in the air stream. Expressed in CAS, V_s1g carries a correction for pressure altitude, which starts at about 20,000 ft for lighter weights and at about 5,000 ft for heavier weights.
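One contributor to such an altitude dependence is that CAS and EAS differ by a compressibility correction which grows with altitude and Mach number. The Python sketch below converts EAS to CAS in the International Standard Atmosphere using the standard subsonic impact-pressure relations (ratio of specific heats 1.4). It is restricted to the ISA troposphere, the 140 kt input is hypothetical, and it deliberately does not model the other effect, the fall of C_L_Max as Mach number rises, which also makes V_s1g vary with altitude; it illustrates the mechanism and is not a reconstruction of the FCOM tables.

```python
import math

A0 = 340.294   # ISA sea-level speed of sound, m/s
KT = 0.514444  # metres per second in one knot
FT = 0.3048    # metres in one foot

def pressure_ratio(pressure_alt_ft: float) -> float:
    """ISA static-pressure ratio delta = p/p0 (troposphere only, up to 11,000 m)."""
    h = pressure_alt_ft * FT
    if not 0.0 <= h <= 11_000.0:
        raise ValueError("this sketch covers only the ISA troposphere")
    return (1.0 - 0.0065 * h / 288.15) ** 5.25588

def eas_to_cas(eas: float, pressure_alt_ft: float) -> float:
    """Convert EAS to CAS (both in m/s) for subsonic flow, gamma = 1.4."""
    delta = pressure_ratio(pressure_alt_ft)
    mach = eas / (A0 * math.sqrt(delta))                  # since EAS = a0 * M * sqrt(delta)
    qc_p0 = delta * ((1.0 + 0.2 * mach**2) ** 3.5 - 1.0)  # impact pressure divided by p0
    return A0 * math.sqrt(5.0 * ((qc_p0 + 1.0) ** (2.0 / 7.0) - 1.0))

eas_kt = 140.0  # hypothetical V_s1g expressed as EAS
for alt_ft in (0, 5_000, 10_000, 20_000, 30_000):
    cas_kt = eas_to_cas(eas_kt * KT, alt_ft) / KT
    print(f"{alt_ft:>6} ft: EAS {eas_kt:.0f} kt = CAS {cas_kt:.1f} kt")
```

At sea level the two coincide; above it, the same EAS corresponds to a higher CAS, with a gap that is small at stall speeds and grows with altitude.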

So, as a “practical” matter, is V_s1g (at fixed weight) constant or not? An aerodynamicist, liking EAS, would say yes; a pilot, preferring CAS because that is what the airspeed indicator shows, would say no. That could be a source of genuine confusion at times.

A more obvious but less insidious vocabulary hang-up is Mach number. Is it a speed? Strictly speaking, no. It has no units: it is a ratio of speeds, true airspeed to the speed of sound (which varies with air temperature), whereas speeds have units of length per unit time (m.s^(-1) or ft.s^(-1)). However, in response to the question “how fast were you going?” one might well answer “at 0.8 Mach”, and indeed at high altitude Mach number is used in preference to airspeed for many purposes. For example, the dive limit is expressed both as a speed and as a Mach number, as are the turbulent-air penetration speed, the maximum operating speed (max. cruise), and so on.
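A small Python sketch makes the point that Mach number is a dimensionless ratio and not a speed: the same Mach 0.8 corresponds to different true airspeeds at different air temperatures. It assumes dry air behaving as an ideal gas (ratio of specific heats 1.4, gas constant 287.05 J/(kg.K)); the two temperatures are the ISA sea-level and tropopause values.

```python
import math

GAMMA = 1.4          # ratio of specific heats for dry air
R_AIR = 287.05287    # specific gas constant for dry air, J/(kg.K)
KT = 0.514444        # metres per second in one knot

def speed_of_sound(temp_c: float) -> float:
    """Speed of sound (m/s) in dry air at static air temperature temp_c (deg C)."""
    return math.sqrt(GAMMA * R_AIR * (temp_c + 273.15))

def mach(tas_ms: float, temp_c: float) -> float:
    """Mach number: true airspeed divided by the local speed of sound (no units)."""
    return tas_ms / speed_of_sound(temp_c)

# The same Mach 0.8 at ISA sea level (15 C) and at the ISA tropopause (-56.5 C).
for temp_c in (15.0, -56.5):
    tas = 0.8 * speed_of_sound(temp_c)
    print(f"{temp_c:6.1f} C: Mach 0.80 is a TAS of {tas:5.1f} m/s ({tas / KT:.0f} kt)")
```

Mach 0.8 comes out at roughly 529 kt TAS in sea-level ISA air but about 459 kt at the tropopause, which is why a Mach limit and a speed limit are not interchangeable.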

Other vocabulary hang-ups occurred in the thread when talking about “approach to stall recovery” and “stall recovery”, and these I feel are insidious. Some correspondents (including the thread originator) insist they have been practicing “stall recovery” in an airplane with a stick pusher, despite the obvious point that if the airplane has a stick pusher and you respect the pusher, it is not stalled. Indeed, many “stall recovery procedures” are more accurately described as stall avoidance procedures, or approach-to-stall recovery. Surely such confusion would be resolved through a little aerodynamical knowledge and some common sense about safety-system design?

One correspondent, when asked repeatedly whether he thought that test pilots had been going up and stalling Airbus airplanes in order to rewrite the “stall recovery” FCOM procedures (actually “Stall Warning” in those for the A330 referred to above) and to calibrate simulators, wisely declined to answer. As a veteran pilot with the handle 411A said, “Has anyone here actually stalled a large swept-wing airliner? I[f] so, what were the results?” Another, Airclues, replied, “In the early 80’s I was co-pilot on several C[ertificate] of A[irworthiness] air tests on the Boeing 747 when a full stall was completed (I believe that the UKCAA was the only authority that required this)”, and described his experiences. In other words, actually high-alpha-stalling large commercial aircraft, even for certification, is ancient history. I very much doubt it was done just to rewrite stall avoidance procedures and calibrate simulators.

A useful discussion indeed, but I suspect it will take more than a pilots’ forum thread to sort these issues out.



Risk Assessment of Volcanic Ash to Commercial Aviation

28 05 2010

Paul Marks of the New Scientist has a couple of good recent articles on the volcanic-ash problem for commercial aviation, one from today and one from last week.

I talked about a simple calculation of this risk in my Risk course this morning, since it is topical, it shows practical issues well, and it fits into about an hour’s lecturing (with anecdotes). Few people, it seems, want to or can perform an elementary risk calculation about flying in the volcanic ash from Eyjafjallajökull. Here goes. It’s very crude, but it still leads to some insight.

Let us first classify the possible outcomes of a flight. I choose four categories:

1. No damage
2. Engine needs thorough inspection and cleaning
3. Engine needs major overhaul
4. Engines stop in flight.

All of these have happened: 1 to the majority of recent airline flights; 2 to a couple of Ryanair planes, and to the Finnish F-18s that had an encounter on April 15, the day before the first ban (reported here previously); 3 to the (in)famous NASA DC-8 (at a cost of $3.2m, so one reads); 4 to Eric Moody and his crew on the famous BA 747 in 1982.

One can read the severities almost directly off these. Let us treat the units as interchangeably pounds, euros or dollars. The sign “^” means “to the power of”, so, e.g., 10^4 = 10,000 and 10^6 = 1,000,000.

Severity of events (event classes) 1-4
1. 0
2. 10^4 to 10^5
3. 10^6 to 10^7
4. If a catastrophe is caused (i.e. the airplane does not succeed in making a dead-stick landing at an airport), then 10^8 to 10^9

It is curious that these four categories fit so crudely but neatly into powers of 10, covering the range.

So the risk is (the old De Moivre definition from 1711):

probability(1) x severity(1) + probability(2) x severity(2) + probability(3) x severity(3) + probability(4) x severity(4)

In fact, this is only a crude estimate of severity. If an engine is found to be damaged, then the engines on all airplanes that flew into or out of the airports that engine visited, or along the routes it flew, will have to be inspected as well, and that might run into the hundreds. This calculation does not take account of such knock-on effects.

Using severity(1) = 0, the risk per flight then lies between

10^4 x prob(2) + 10^6 x prob(3) + 10^8 x prob(4)

and
10^5 x prob(2) + 10^7 x prob(3) + 10^9 x prob(4)

(using the lower and upper ends, respectively, of the severity ranges)
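The De Moivre definition and the two bounds above amount to an expected-value sum. The Python sketch below encodes the four outcome classes with the severity ranges given earlier and returns the lower and upper estimates of risk per flight. The probabilities in the example are invented placeholders, purely to show the call; they are not estimates for any real flight, and knock-on inspection costs are not included.

```python
# Severity ranges (low, high) for outcome classes 1-4, in currency units.
SEVERITY = {
    1: (0.0, 0.0),   # no damage
    2: (1e4, 1e5),   # engine needs thorough inspection and cleaning
    3: (1e6, 1e7),   # engine needs major overhaul
    4: (1e8, 1e9),   # engines stop in flight, assumed catastrophic
}

def risk_bounds(prob: dict[int, float]) -> tuple[float, float]:
    """Expected loss per flight, the sum of prob(k) x severity(k), evaluated
    first at the low and then at the high end of each severity range."""
    low = sum(p * SEVERITY[k][0] for k, p in prob.items())
    high = sum(p * SEVERITY[k][1] for k, p in prob.items())
    return low, high

# Placeholder probabilities, invented only to demonstrate the call.
example = {1: 0.98899, 2: 1e-2, 3: 1e-3, 4: 1e-5}
low, high = risk_bounds(example)
print(f"risk per flight between {low:,.0f} and {high:,.0f}")
```

Because each upper severity is ten times the lower one (and class 1 costs nothing), the upper bound is always ten times the lower bound, which is what lets the criterion below be divided through by 10.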

Consider your average intra-European flight, say Air Berlin flying Paderborn to London Stansted: a Boeing 737NG with, let’s say, 150 people on board (this is an overestimate), each paying €100 per seat (actually, it’s lower, and much of that is airport tax). Your revenue for the flight is at most €15,000 (and a lot less if you take out airport tax). Your expected loss, the risk above, must therefore be less than this if you hope to do better than by not flying at all. So your decision criterion is

10^4 x prob(2) + 10^6 x prob(3) + 10^8 x prob(4) < 15,000

if you take the lower estimate of risk, and

10^5 x prob(2) + 10^7 x prob(3) + 10^9 x prob(4) < 15,000, that is

10^4 x prob(2) + 10^6 x prob(3) + 10^8 x prob(4) < 1,500

if you take the higher.

Let us take the lower estimate. You can handle a cleaning event without much trouble (even if one were certain, its lower-end cost of 10^4 is below the revenue), but, to break even, you had better be sure that you have at most one chance in about 67 flights of an overhaul event, and at most one chance in about 6,700 flights of an engine-out event (taking each kind of event on its own).
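The break-even figures follow from dividing the revenue by each lower-end severity, taking one outcome class at a time with the others assumed to contribute nothing. A Python sketch, using the illustrative 150 passengers at €100 each from above:

```python
REVENUE = 150 * 100.0  # illustrative: 150 passengers paying EUR 100 each (from the text)

# Lower-end severities of the damaging outcome classes (class 1 costs nothing).
LOW_SEVERITY = {2: 1e4, 3: 1e6, 4: 1e8}

def break_even_probability(severity: float, revenue: float = REVENUE) -> float:
    """Revenue divided by severity: for a single outcome class taken on its own,
    any per-flight probability below this keeps expected loss under the revenue."""
    return revenue / severity

for k, sev in LOW_SEVERITY.items():
    p = break_even_probability(sev)
    if p >= 1:
        print(f"class {k}: affordable even if certain (bound {p:g} exceeds 1)")
    else:
        print(f"class {k}: p < {p:.3g}, i.e. at most 1 chance in {1 / p:,.0f} flights")
```

Using the upper-end severities instead divides each threshold by ten, giving about one in 670 flights for an overhaul and one in 67,000 for an engine-out event.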

Given what was known on April 16th about outcomes (for example, that the Finnish engines might be trashed), I wonder how much of what we heard from airline chiefs complaining about not being able to fly was political manoeuvring for government handouts to “compensate” them for being forced to do what a risk analysis would have told them to do anyway?

PBL