Thoughts on Engineering Communication (with a bit on Ice Particle Icing and AF447)

21 08 2009

I have been thinking recently about professional engineering communication.

I was reminded once again of the lack of consensus by Nancy Leveson’s comment that “[t]he type of limited interaction that is possible by email is just not conducive to communication” as well as her regret at being “… pulled into one of these web debates because it takes so much time and produces so little”, on the University of York safety-critical systems list http://www.cs.york.ac.uk/hise/safety-critical-archive/2009/0369.html. I don’t agree with this view on email. I am a heavy user of email, both for longer essay-style pieces (although I am now moving more towards blogging) and for short exchanges. I consider e-mail lists such as that run by York to be an appropriate and helpful form of professional communication. I might agree partially with her view on WWW forums, because I find some forms problematic for professional purposes, but then again I think some of them work well (for example, the York list archive is a WWW forum).

I think no one medium available to us satisfies all the communicative needs of engineers in a developing field. I propose that prowess in engineering communication, traditionally required for evaluation of academic personnel, be based on more than traditional journal- and conference-paper publishing.

Advance in engineering depends on communication somehow. If one person in the world finds out how to solve engineering problem X, then unless heshe spreads the word, or word gets around via hisher customers, that technique remains hidden and others will not use it to solve problem X.

For the solution of specific engineering problems, or for the communication of engineering problems themselves (such as the “hot topic” of ice particle icing), it seems to me that traditional journal and conference publication works quite well, even though there are all sorts of problems with peer review procedures.

However, for discussion of current practice, or historical practice, and for discussion in general, declamatory articles such as those which appear in journals or conference proceedings don’t work that well. Neither do the magazines (because articles are by their nature declamatory). Journal or magazine letters pages also don’t work that well – witness the recent interchange between Keith Miller and myself on the Gotterbarn/Miller paper in the June 2009 IEEE Computer, which proceeded much more rapidly and fruitfully, but also privately, by e-mail than it did through the letters/reply section in the magazine (IEEE Computer, August 2009). See previous blog posts here for the public exchange.

I hold discussion to be very important in the engineering profession. Witness, if you will, again, the Ladkin/Miller exchange. Had this not occurred, Messrs Gotterbarn and Miller would be on record as holding that the recent A330 incidents were an instance of SE ethical problems of a certain sort, whereas they now agree that the issues are more subtle, if not other, than they originally proposed. A change of view arrived at through discussion.

Consider another example: how does one best handle issues of best practice, such as formal-language specifications versus natural-language specifications? Such issues need discussion: some think “natural language specs are best”; others think “formal language specs are best”, and there are different communities of practice built around these views. If you work in safety-critical electronics in the European railway industry, you must use natural-language requirements specifications because the standard says so, even though you might think this is a load of junk. Whereas if you work in one of the more prestigious sectors of avionics, you would likely do formal-language specs, even if you were a nat-lang-spec enthusiast.

Some people think the standardisation processes suffice for communication of best practice. Others think, as I do, that neither the standardisation process nor the emission of standards suffices to communicate best practice. Indeed, I would go further. I also do not think the emission of standards necessarily embodies best practice, as my contributions over the years on the functional-safety standard IEC 61508 on the York list may indicate.

So what does embody best practice and how does one tell? Well, one thing to observe about the engineering profession is that there is no one way to skin a cat. There are many, and the best engineers will be intimately familiar with all of them, or at least with as many as they can be. One engineer may prefer one way, another engineer another way. What could they suggest to a third engineer, also attempting to skin a cat?

Engineer A: “Do it my way.” Engineer B: “Don’t do it his way; do it my way”

or

Engineer A: “Do it my way.” Engineer B: “Yes, do it his way; don’t do it my way”

or

Engineer A: “I do it this way, but any other way will work. However, I can help you best with my way.”

All these answers are possible from responsible engineers, who would have taken into account their interrogator’s environment and that of hisher task.

Engineers must interact this way. It is an important part of what they do. It is communication, it is necessary, and the question I wish to address is how, using what form, it may best be accomplished.

Let’s make it more concrete, with a concocted example whose content appears regularly on the York list.

Question: “I am building such and such a safety-critical system and we have to use the programming language C because that is what we have a compiler for, for the chosen hardware. Is this OK or should I veto the project?”

Answer 1: “Your source code, if it is written in C, will have no well-defined unique meaning. C compilers have odd quirks such as producing different object-code behavior depending on which order one writes the arguments to a test, and ………. So you will not be able to tell exactly what your object code does and thereby not be able to assure the behavior of your system to the required degree. To get the highest degree of assurance attainable by any practice to date, use, say, SPARK and an Ada compiler to avoid the problems with C detailed above, and to take advantage of the documented quality of SPARK code development. This may necessitate changing the underlying hardware if there is no Ada compiler targeted to your hardware. If you can’t change the hardware then recommend SPARK for the above reasons and at the same time veto the project.”

Answer 2: “There exists enough experience with C and C subsets such as the MISRA subset and C static analysis tools that you can be fairly assured of a more-or-less unique meaning for what your object code does, providing you pay a lot of attention to the known weaknesses of C constructs as listed in [a ten-year-old book] and you are careful about your choice of compiler and carefully research the known problems with the compiler and avoid them. The available analysis tools aren’t perfect but they are pretty good for most purposes. And, besides, Engineer Y has shown one can [read: he can] do this in a significant project. And, besides, everybody does it. And, besides, if you are stuck with this hardware, as you say you are, you have no real choice.”

Riposte from Answerer 1: “Sure, Y is one of five people, or fifty people, or one hundred people in the world that have a track record of doing this. Hire him. Or one of the other 5/50/100. Then you might be OK. Else, do it the way I said.”
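One well-known source of the kind of ambiguity Answer 1 alludes to (though not necessarily the specific quirk the answerer has in mind) is that ISO C leaves the order of evaluation of most operands unspecified. The following is a minimal sketch; the names `next_reading` and `difference` are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical shared state: each call returns the next value 1, 2, 3, ... */
static int counter = 0;

static int next_reading(void)
{
    return ++counter;
}

/*
 * ISO C does not specify which operand of '-' is evaluated first.
 * Left operand first:   1 - 2 == -1.
 * Right operand first:  2 - 1 ==  1.
 * Both results are conforming, so the source text alone does not fix the value.
 */
int difference(void)
{
    counter = 0;
    int d = next_reading() - next_reading();
    assert(d == -1 || d == 1);  /* the only two outcomes the standard permits */
    return d;
}
```

Which of the two values a given build returns depends on the compiler and possibly on its options. Coding subsets of the kind Answer 2 mentions typically forbid expressions whose value depends on such ordering.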

Now, imagine you are buying the car in which this equipment is installed, one of a few thousand, or a few hundred thousand, or a few million built, for your family. Wouldn’t you rather that such a discussion had taken place in a highly prestigious forum, which as many eminent engineers as possible read, and can contribute their views, as required? And that some sort of consensus had developed as to what the questioner should do, and that some sort of assurance was available that heshe had done it?

So what would that forum be? The York mailing list? Not really: not all professionals read that list, and some of them think about it that “[t]he type of limited interaction that is possible by email is just not conducive to communication.” Leading journals that everybody reads? Well, it doesn’t happen. Or, better said, in my experience the journals in which such things appear are not much worth reading. Why is that?

It is that way, I propose, because this kind of discussion is not accorded the prestige which, say, journal publication of research accrues. In my view, a way should be found to value participation in insightful and fruitful discussion as prestigiously as journal publication, because such discussion is equally vital to engineering, as I hope I have just shown.

Well, a gainsayer might say, Engineer A can publish hisher view in a journal. Then Engineer B can reply. And then Engineer C, and so on.

I don’t think that will work in general. Consider the following recent example.

On June 1, an Air France A330 crashed into the South Atlantic in an area of unstable weather, having sent a series of cryptic maintenance messages from the Central Maintenance Computer as its last communications. Bits of the aircraft have been found, but not the bits most important to knowing why it went down.

Somebody found and published a report from another airline of a flight which had suffered similar phenomena at a similar altitude. And then other reports surfaced. People who had access to these reports had their own professional interests which would induce them to certain behavior, such as keeping them quiet or broadcasting them. Broadcasting is the only stable state: you cannot keep something under wraps once it has been broadcast. One of the major players is an anonymous broadcaster, a WWW site, called eurocockpit.com. The advantage of broadcast in this instance is that all the various pieces of data, available only to some people and not to others, have been brought together into the public domain.

The result of this communication activity has been that, probably within a month and certainly within two months, almost all pilots are aware of and wary of a phenomenon which on May 31, 2009 was not known to exist: high-altitude ice-particle icing of air data sensors. There were individual incidents, indeed many, but nobody knew about them all, and if you just know about one or two perplexing incidents there are many possible causes of it or them. But when you have a dozen, or a couple of dozen, and another one occurs as you are wondering, then it concentrates the mind wonderfully. The result is that EASA has published a proposal for an Airworthiness Directive aimed at replacing all those sensors thought to be more susceptible to ice particle icing than others.

The odd thing about this example is that the airplane in question has been in service for well over a decade, indeed much nearer two, and these incidents have apparently only occurred since March 2008. Explain that one! (Anyone who says “global warming” must go stand in the corner for an hour :-) )

My view is that you cannot explain it at the moment, but that the communication behavior around whatever symptoms of whatever phenomenon we are talking about here (likely ice particle icing) could have been different from what it was up to the loss of Air France flight 447 on June 1, 2009, which apparently suffered these symptoms. And maybe it could have been different in such a way as to have led to measures which could have averted the loss of an airplane and its occupants? A fine article on this history, which raises this question, has recently been written by Jens Flottau and appeared in Aviation Week and Space Technology on August 10, 2009: Response to Airbus Pitot Tube Incidents Under Scrutiny.

To be clear: I am talking here about forms of communication which we use, and not at all about any specific individual or organisational behavior. I am not suggesting that any individual, group or organisation did less than the very best they could about the evolving issue. Indeed, this remark serves to strengthen my suggestion that the communication forms themselves can give us a level of control over engineering developments, such as experiencing, recognising and then handling ice particle icing of air data sensors, which we do not currently possess.

It is not just ice particle icing of air data sensors. Ice particle icing caused engine problems to one type of engine on the BAe 146 airplane. It was not known to occur to others, but some Boeing and Honeywell engineers looked at incidents of surge, flameout and other anomalous events at altitude on other airplanes and came to the conclusion that they were due to icing phenomena at high altitude, sometimes in cloud which was so thin that it barely hindered visibility. This stuff has appeared in the journal literature: see The Ice Particle Threat to Engines in Flight by Mason, Strapp and Chow, 2006, which refers to Cloud Particle Measurements in Thunderstorm Anvils and Possible Threat to Aviation by Lawson, Angus and Heymsfield, 1998. And in 2006 there came NTSB Recommendations to the FAA. But there were still 20,000-hour long-haul pilots (for all I know, still are), a group of people to whom this phenomenon would surely be of great interest, who apparently do not know of this work. One said even as late as a month ago that he does not accept that ice forms below -40°C: http://www.pprune.org/tech-log/381558-ice-crystals-2.html#post5070024, and http://www.pprune.org/tech-log/381558-ice-crystals-3.html#post5074951.

It is through the communication of incidents, each of which was previously known only to a few people, many of those people being different people, that a dangerous phenomenon, ice particle icing of air data sensors at high-altitude and cold temperatures, has been identified. This is a significant engineering achievement. How did it happen? WWW. E-Mail Lists. WWW Forums. And, also, traditional methods of communication amongst appointed representatives of involved organisations. But by no means solely the latter.

So, given that discussion and communication are vital to engineers, and the traditional form of journal publication does not suffice, how should the contribution of, say, a research engineer be assessed? (For purposes, for example, of awarding a prize, or awarding tenure, or of getting an academic job in the first place.) I propose that such assessment also look at participation in these other essential communicative activities and not just traditional publications. I agree there is a problem of parameters and quality control. Just getting hits on your blog isn’t necessarily a good measure; but getting the most hits on your blog of anybody working in your area just might be.

To finish up: what forms of communication work, and how?

1. Obviously peer-reviewed journals and conference papers work.

2. Obviously WWW sites with journal-style papers work.

3. I would contend that moderated, selective forums such as the Risks Forum work.

4. I think some sorts of blogs work. I am sceptical of the frequently-written 200-word anecdotal variety of the sort the IEEE is promoting, but I do like the weekly-essay variety employed to such notable effect by people such as Nobel laureate Gary Becker and Judge Richard Posner at the University of Chicago in their blog. It is by following such blogs for a while that I believe I have come to understand what they are good for, and have started trying to emulate them.

5. For specific purposes, such as the wider collection and dissemination of controlled information, carefully-moderated anonymous forums such as eurocockpit.com work.

These are all declamatory forms, with only limited possibility, asymmetrical, for discussion. What works for the kind of essential discussion I illustrate above?

6. Not anonymous WWW forums. I don’t yet know a forum which can be successfully followed unless one has lots of free time and a huge tolerance for purposeless commentary or for poseurs. For example, I have made two unsuccessful attempts to develop a presence on PPRuNe, the professional aviation people’s forum, and PPRuNe seems to me to be head and shoulders above anything else in which one can discuss aviation accidents. The main issue seems to be that moderation attempts are often overwhelmed by the task on high-interest topics, and no one seems to have a good solution to this phenomenon.

7. Yes, non-anonymous controlled-access WWW forums. Such as the York mailing list. (Note that its archive makes this list into such a forum.) A colleague to whom I once mentioned that I had been contacted to write a textbook on safety suggested that all I had to do was collect what I had written on the York list over the years and organise it. (Yes, well, the organisation part. It was simpler to start writing from scratch :-) )

8. Something that does not exist, but well might. Peer-reviewed or moderated (same thing, maybe?) non-anonymous forums for publication of essays and for discussion. There is a fundamental tension between encouraging comment, insight and debate, and insisting on quality. Quality means taking time over composition, which in turn discourages people from contributing. There are such forums at present, for example the functional safety area on the IEC WWW site, but they are not hives of intellectual activity.

9. Jan Sanders suggested using video. A forum in which engineering questions could be put, and engineers give their answers verbally in a video, and videoconferencing could be used to resolve, or at least further to discuss, discrepancies. Like written forums, this would be moderated to ensure quality. The advantage of videos would be that it takes many people less time both to record their views and to receive the views of others through speech than it does through writing, and speech is most effective when one sees the speaker speaking.

I am a fan of debating, I like mailing lists and, newly for me, blogs and I wish there were some way of professionally assessing contributions to these forms of communication.



Eight Themes in System Safety Engineering

16 08 2009

I was led recently to think of some of the main issues in safety engineering of systems with computer-based components, when they occurred in the course of a discussion on the University of York safety-critical systems mailing list (look for “Certification of Tools/Components” in the archive). Here are some of these issues and my views on them.

Issue 1. “Where is the hard part of safety engineering? At the requirements level, or below?”

I think there is consensus that many of the most significant failures of the last decades have come down to requirements failures (mismatch between the engineering functional specification and the actual environment in which the system was operating). This was shown most visibly by Robyn Lutz on NASA mission failures some two decades ago. It was also substantiated by an HSE report whose name I don’t recall. And it fits with the experience of most of us.

So here is a heretical thought. Maybe this is changing as the systems become more complex? The 1997 Mars Pathfinder failure occurred because of a known weakness of an operating-system prioritisation scheme. It turned out that the designers didn’t know the right computer science; others did. With simpler systems than a Unix-based real-time kernel with priority schemes, it is likelier that everyone involved would have known the right science and any problems would not have lain in the OS.
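The post does not name the weakness, but the Pathfinder resets are widely reported to have been a case of priority inversion: a high-priority task waits for a lock held by a low-priority task, which is itself kept off the processor by a medium-priority task. The following toy simulation is a sketch under invented assumptions (three tasks, one lock, made-up tick counts) and is not the Pathfinder software. It compares when the high-priority task finishes with and without priority inheritance, one standard remedy.

```c
#include <assert.h>
#include <stdbool.h>

enum { LOW, MED, HIGH, NTASKS };

typedef struct {
    int  priority;    /* larger number = more urgent */
    int  release;     /* tick at which the task becomes ready */
    int  work;        /* ticks of processor time still needed */
    bool needs_lock;  /* whole job runs inside the shared critical section */
} Task;

/* Returns the tick at which the HIGH task finishes (hypothetical workload). */
int high_finish_time(bool inherit)
{
    Task t[NTASKS] = {
        [LOW]  = { 1, 0, 3, true  },
        [MED]  = { 2, 1, 5, false },
        [HIGH] = { 3, 1, 2, true  },
    };
    int owner = -1;  /* index of the current lock holder, or -1 if free */

    for (int now = 0; now < 100; now++) {
        /* Invariant: whoever holds the lock still has work to do. */
        assert(owner == -1 || t[owner].work > 0);

        int best = -1, best_prio = -1;
        for (int i = 0; i < NTASKS; i++) {
            if (t[i].work == 0 || t[i].release > now)
                continue;  /* finished or not yet released */
            if (t[i].needs_lock && owner != -1 && owner != i)
                continue;  /* blocked on the lock */

            int prio = t[i].priority;
            if (inherit && i == owner) {
                /* Lock holder inherits the priority of any ready waiter. */
                for (int j = 0; j < NTASKS; j++) {
                    if (j != i && t[j].needs_lock && t[j].work > 0 &&
                        t[j].release <= now && t[j].priority > prio)
                        prio = t[j].priority;
                }
            }
            if (prio > best_prio) {
                best = i;
                best_prio = prio;
            }
        }
        if (best < 0)
            continue;  /* processor idle this tick */

        if (t[best].needs_lock)
            owner = best;  /* take (or keep) the lock */
        if (--t[best].work == 0) {
            if (owner == best)
                owner = -1;  /* release the lock on completion */
            if (best == HIGH)
                return now + 1;
        }
    }
    return -1;  /* HIGH never finished within the simulated horizon */
}
```

With these invented numbers, `high_finish_time(false)` yields 10 and `high_finish_time(true)` yields 5: without inheritance the urgent task waits for the entire medium-priority job, although the two share no resource.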

The B777 has suffered two significant problems. One came to light in the 2005 pitch-excursion incident in Western Australia. The other was a series of Byzantine failures on the control-system bus which almost led to the airworthiness certificate being withdrawn. Neither of these was a requirements failure.

The A330 has also suffered multiple pitch-excursion incidents, to two aircraft in Western Australia in late 2008. That seems to have something to do with the fault-tolerance architectural behavior. No requirements failure there.

Except, of course, that one person’s design is another person’s requirements. Dealing with requirements involves developing one specification. Design involves comparing two. Because of social constraints (subcontracting, proprietary information, and so on), maybe many engineers are working with one specification when they should really be comparing two? That doesn’t make associated failures into requirements problems, although the organisations involved may like to characterise them this way. They are traceability and refinement issues.

Issue 2. “Where should we concentrate effort in safety-critical system design?”

The obvious answer is: anywhere and everywhere there is a weakness. There are weaknesses all the way from initial conception (“requirements elicitation”) down to object code and thence to the HW.

One issue: Almost any code into which anyone has ever looked has bugs in it. Indeed, since Rod Chapman discovered the bug in the Praxis/NSA Tokeneer project, to that point thought likely to be bug-free, while writing the project up for public release, I think I can eliminate the word “almost”. (Recent work at NICTA in Sydney has verified a real-time operating system kernel in C, using Larry Paulson’s Isabelle prover. However, I would like to know what NICTA assumes about the operational semantics of the precise C code they verified, and how they assure that semantics in the compiler they recommend using: if they do so, I’ll put “almost” back in.)

These “bugs” are design features which lead in certain circumstances to the system behaving differently from intended. That is, the SW engineers knew what their programs were supposed to do, but what they delivered does not always do that. Sometimes it can be humdrum, but sometimes it can be very tricky: huge amounts of intellectual effort are expended by some of the best computing-scientific minds on the planet trying to solve such issues as Byzantine disagreement on an abstract level.

And here we come to an issue I address again below: there is some feeling that all appropriate techniques should be usable by an average engineer in the field. My view is rather that if it takes people with PhDs in computer science to do this work of ensuring as-bug-free-as-possible, then, please, let them, indeed, encourage them. I don’t think your average bachelor’s-level engineer can check for or handle Byzantine agreement problems. Indeed, I don’t think your average bachelor’s-level engineer even knows what they are.

I am not very sympathetic to views which suggest that any part of this general problem is not important, but some prominent people in the field appear to hold such views.

Issue 3. “Formal methods are only “math” and not “real world”. ”

This issue comes up regularly, and has come up regularly for the last quarter-century I have been in the field. It rests on a “folk” view of the nature of math, which to my mind has been largely discredited by philosophers of mathematics and science. Nevertheless, it remains current amongst some of us in system engineering.

Consider the numerical mathematicians who have built codes to calculate the exact aerodynamic behavior of modern aircraft before they are even put into a wind tunnel.

Consider the logicians who analyse a requirements specification and point out that such a system cannot be built (because the requirements are inconsistent).

I don’t think you would find either group sympathetic to the suggestion that what they do doesn’t have real-world import. I don’t see why anyone else should be sympathetic to it either.
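The logicians' check mentioned above can be sketched by brute force for a very small case. The three requirements below are invented for illustration (they come from no real standard or project), as are all the function names. The sketch enumerates every assignment to three Boolean state variables and counts those that satisfy all the requirements.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of a toy train-door controller. */
typedef struct { bool emergency, doors_open, moving; } State;

static bool implies(bool p, bool q) { return !p || q; }

/* R1: if the doors are open, the train must not be moving. */
static bool r1(State s) { return implies(s.doors_open, !s.moving); }
/* R2: in an emergency, the doors must be open. */
static bool r2(State s) { return implies(s.emergency, s.doors_open); }
/* R3: in an emergency, the train must keep moving to the next station. */
static bool r3(State s) { return implies(s.emergency, s.moving); }
/* Scenario: an emergency is actually in progress. */
static bool in_emergency(State s) { return s.emergency; }

typedef bool (*Requirement)(State);

/* Counts the states (out of 8) that satisfy all n requirements. */
int count_models(const Requirement *reqs, int n)
{
    assert(n >= 0);
    int models = 0;
    for (int bits = 0; bits < 8; bits++) {
        State s = { bits & 1, (bits >> 1) & 1, (bits >> 2) & 1 };
        bool ok = true;
        for (int i = 0; i < n; i++)
            ok = ok && reqs[i](s);
        if (ok)
            models++;
    }
    return models;
}

/* R1-R3 alone: satisfiable, but only by states with no emergency. */
int models_of_requirements(void)
{
    const Requirement reqs[] = { r1, r2, r3 };
    return count_models(reqs, 3);
}

/* R1-R3 plus an actual emergency: no state satisfies them all. */
int models_in_emergency(void)
{
    const Requirement reqs[] = { r1, r2, r3, in_emergency };
    return count_models(reqs, 4);
}
```

Here `models_of_requirements()` is 3 and `models_in_emergency()` is 0: the requirements can be met only by a system that never has an emergency, so a system that must cope with one cannot be built to this specification. Real specifications need a proper solver or prover rather than enumeration, but the question asked is the same.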

Issue 4. “There was plenty of rigorous reasoning around before formal logic was invented. So we don’t need formal logical languages to write and check requirements specifications.”

The first statement is true. The very same history, however, gives general reasons to doubt the second. Most system engineers and computer scientists do not have sufficient background in the history of reasoning to draw this conclusion effectively for themselves.

I think there is general agreement that Aristotle’s analysis of syllogistic reasoning was largely accurate. However, for the next couple of thousand years people had difficulty formulating the principles of reasoning with multiple quantifiers (a quantifier is a term such as “every”, “each”, “some”). The medievals had an elaborate theory of “distribution” of terms. It didn’t work. Frege solved the issue with the notion of (what we nowadays call) scope and a formal language which had a means of indicating scope. There is no doubt that, after Frege, humankind had the means of saying rigorously what it meant and being understood. And those means were formal languages based on first-order predicate logic. And then others.
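A small illustration of what scope buys, using an invented predicate Monitors(p, s) read as "processor p monitors sensor s". The English sentence "every sensor is monitored by some processor" can be taken in two ways, and the formal language forces a choice between them:

```latex
% Reading 1: each sensor has a monitoring processor, possibly a different one each time.
% Reading 2: one single processor monitors all the sensors.
\[
  \forall s\, \exists p\; \mathit{Monitors}(p, s)
  \qquad \mbox{versus} \qquad
  \exists p\, \forall s\; \mathit{Monitors}(p, s)
\]
% Reading 2 implies Reading 1, but not conversely.
```

The two readings differ only in the order, and hence the scope, of the quantifiers, yet they describe architectures with very different single-point-of-failure properties.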

Meanwhile, linguists are still researching the semantic structure of natural language quantifiers. John Barwise discovered thirty years ago that some sentences appear to have a semantics best indicated with parallel sequences of quantifiers. Isn’t it a good idea in safety-critical system specification to stick to language structures we know all about and can handle, for example Fregean scoping, rather than possibly to have to confront all these semantic subtleties unwittingly? How does one check a natural-language requirement for consistency if the linguistic jury is still out on quite what it could mean? Yet some safety-critical system development standards, for example those for European railway electronics, still require natural language requirements specifications.
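As a sketch of what "parallel sequences of quantifiers" means, the example usually cited in this literature (attributed to Hintikka) is "some relative of each villager and some relative of each townsman hate each other". On the branching reading the choice of y depends only on x and the choice of w only on z, which no linear ordering of the four quantifiers expresses. The predicate letters below are assumptions for illustration: V for villager, T for townsman, R(x, y) for "y is a relative of x", H for "hate each other".

```latex
% Branching (Henkin) prefix; pmatrix requires the amsmath package.
\[
  \begin{pmatrix} \forall x\, \exists y \\ \forall z\, \exists w \end{pmatrix}
  \bigl( V(x) \wedge T(z) \rightarrow R(x,y) \wedge R(z,w) \wedge H(y,w) \bigr)
\]
% Equivalent second-order form, with explicit choice functions f and g:
\[
  \exists f\, \exists g\; \forall x\, \forall z\;
  \bigl( V(x) \wedge T(z) \rightarrow
         R(x,f(x)) \wedge R(z,g(z)) \wedge H(f(x),g(z)) \bigr)
\]
```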

Another consideration. One of the main assurance issues in design is that of comparing two specifications and showing that the one implements the other. Indeed, this assurance issue arises at every stage of design all the way down to the box in your hand except for one. The work of three out of the four most significant logicians in the history of humankind (Frege, Gödel, Tarski) has shown unequivocally in the last one hundred and some years that the most effective method for comparing two complex, rigorous statements (which is what specifications are) is to use the formal languages of logic.
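In a logical setting, "the one implements the other" is commonly rendered as implication; this is, for instance, how TLA treats implementation. The one-variable example below is invented purely for illustration, with x' denoting the value of x after a step.

```latex
% I implements S when every behaviour allowed by I is also allowed by S:
\[
  \models\; I \Rightarrow S
\]
% Hypothetical specification S and implementation I:
\[
  S \;:=\; 0 \le x' - x \le 2,
  \qquad
  I \;:=\; x' = x + 1
\]
% I implies S, because x' - x = 1 lies between 0 and 2.
```

Checking the implication is a matter of proof or calculation, which is exactly what the natural-language versions of S and I would not support.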

A claim that, say, the language of first-order logic is not expressive enough for formulating all the specifications that might be necessary for any engineered system just seems to me to be ludicrous. You can say anything you want in first-order logic, as the great philosopher W. V. O. Quine pointed out regularly for fifty years until his death in 2000. You can’t necessarily say it in the slickest possible way – although you can sometimes, despite the plethora of formal languages we have around us nowadays – but you can do it, and then check it rigorously for consistency.

Indeed, we tried this recently. Bernd Sieker derived a system for mimicking the required communications for train dispatching on German railways (these requirements are part of German administrative law) and using formal refinement with hand proofs was able to derive SPARK code for the mimic system from a demonstrably complete set of high-level safety requirements, which were derived and written in first-order logic. It works, just like Quine said. Maybe not always, but a lot more often than people in system engineering seem to think.

Issue 5. “The current formal languages proposed for various system purposes are inadequate.”

Well, sure. Or equally not. Almost any tool, such as a formal language, has areas for which it works well and areas for which it doesn’t. I am not sure how much sense blanket approbation or condemnation makes. And, of course, we can do better. No matter how good or bad we are in any area of human endeavor, we can do better.

For example, SPARK seems to work very well for control of coding at a level corresponding to Ada source code. It is not targeted, for example, at requirements elicitation and I can’t think it would work so well there. Indeed, its producer itself uses other methods for requirements elicitation and analysis.

Z works well for analysis of architecturally-complex systems in which scoping is complex and tricky. There are many of those. I don’t think it works so well for specification and analysis where operator interface plays a significant role.

TLA+ works well for analysis of algorithmically-complex systems in which scoping is tricky. There are many of those also. Same caveat on operator interfaces.

SCADE works well (at least, I presume it does, for I have no experience. We can’t run it on any of our machines) for systems which predominantly have state-machine-like functionality (i.e. things Lustre can talk easily about). It might do better on operator interfaces than TLA+ or Z, since many of those have state-machine character even if their designers have no clue what a state machine is.

There are a lot of issues which these formal languages and tools try to solve simultaneously. Having unambiguous meaning; handling complex scoping questions precisely and well; being expressive, especially for those purposes for which they were devised (which might conflict with expressiveness for other purposes); being automatically checkable for consistency; enabling formal manipulation for checking statements for certain properties (safety and liveness, for example); and all of this as easily as possible. When one thinks about all this, it is quite understandable why there is no one simple, best solution to all these varying needs. And, when one thinks about it more, maybe there never will be.

Issue 6. “We need a requirements specification language which is easily understandable by domain experts and systems engineers alike.”

It sure would be nice. And probably more efficient. But I am not sure the issue is language. More than “a language”, I think it is important that all kinds of engineers understand the techniques involved. I mean, French is the language of diplomacy but that doesn’t mean everyone can write it like Molière or Yasmina Reza. I certainly can’t. But it would be good to have some idea of how they do it and to try to copy it, if I were communicating to folks in that language.

In the aftermath of the Überlingen mid-air collision, I looked hard at the algorithms behind TCAS, including the human procedures (algorithms for humans). I needed to introduce some simple state-machines-evolving-through-time techniques to point out to TCAS domain experts that the cognitive architecture of TCAS had problems. These cognitive-architecture problems are present with all versions of TCAS which emit Resolution Advisories and they were not discovered by previous analysts, some of whom are quite distinguished practitioners. I did have the benefit of an actual accident to work from in which the conditions were present for the problems to manifest themselves. The point here is that the domain experts needed to understand a formal means of expressing the problem – the techniques, not the language – before they could acknowledge that a problem was there.

Issue 7: “The techniques we use should be usable by the average engineer with a Bachelor’s or Master’s degree.”

I want the brakes on my car to work (at least, I would if I had one). I really don’t care if they were designed by 20 engineers with Bachelor’s degrees or one genius who had to be hired away from Harvard. And neither will the company if they don’t work, something bad happens, and it gets slapped with the bill. Ask a certain automobile components supplier to whom this happened a few years ago. And this company is stocked to the gills with people with doctorates in computer science, engineering and physics.

Boeing and Airbus got where they are by identifying and using exceptional talent. Airbus develops its SW using tools that have come from an institution, one of whose scientists has won the Turing Award in the area this tool exploits.

It would be great to leverage the work of the average engineer by exploiting tools heshe can use with only a Bachelor’s degree. But it’s not a necessity. If my brake company cannot hire away the Harvard genius, I am quite content for them to stick with hydraulics as long as the brakes will work.

Issue 8: “A sample of students makes more mistakes in specification when trying to do so in a formal language with rigorous semantics than the students make when specifying more “naturally”.”

Maybe, but so what? One could imagine that James Watt and Richard Trevithick would probably have made more mistakes if they’d had to use Sadi Carnot’s thermodynamics to design their steam engines, because there is no evidence that either of them was much good at mathematics.

As for me, I usually discover a lot of issues I had overlooked when I actually do sit down to formulate ideas formally. Martyn Thomas, a founder of the system and tool builders Praxis HIS, says this matches his experience too, as well as the continuing experience of Praxis.

It takes many people a dickens of a time to formulate their bicycle braking distances in calculus, and most of those to whom I have suggested it (all of them computer science students) turned out to do better, and more accurately, when I asked them to assess braking distances qualitatively. But I’m not like that. I know how to check my calculus for mistakes and I use it to assess the braking performance of my bicycles and the phenomena involved, rather than the other way round. If there is any issue here, it is that of training the skill in others who apparently don’t have it.

Similarly, I see the main issue as that of training the skill set of people who write specifications, not that of adapting the specification languages to their existing, rather fragile skill set.



AF447: Issues Clarified by the BEA Report

4 08 2009

There are some significant issues which are clarified by the BEA’s preliminary factual report, issued at the beginning of July: specifically the uncertainties and certainties in the meaning and partial interpretation of the maintenance messages received by ACARS; the question of structural integrity; the attitude and flight path of the aircraft on impact with the ocean surface; and the weather phenomena in the vicinity of the flight at the time it was presumed to be lost. The ACARS messages indicate strongly that there was a situation with unreliable airspeed indication. Since the accident more incidents of unreliable airspeed indications at high altitude have come to light. I comment on these continuing developments in a separate post. I comment here on structural integrity and what it tells us about how the airplane may have behaved; weather and position; contacts with ATC; and the interpretation of the ACARS maintenance messages received.

The vertical tail of the aircraft was the major piece of structure found during the search. It had separated, taking some parts of the fuselage, including box-section pieces, with it at one attachment. The question arose whether it could have separated in flight. Our collaborator, the aerodynamicist Clive Leyman, showed in work in June that it would not be possible above, say, FL 170 to FL 200 to generate enough dynamic pressure on the vertical stabiliser of the A330 to cause it to fail. And even at that general altitude, overspeed would be a necessary contributing factor to any failure. His conclusion was based on dynamic-pressure calculations, using the datum that the A330 vertical stabiliser failed during destructive testing at 2.0 times design load. The aircraft was cleared at FL 350. So we knew from Clive’s work that an upset would have been a necessary precursor to loss of structural integrity. The main question thus is: what would have caused an upset?
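To give a feel for why altitude matters in this argument, here is a minimal sketch of how dynamic pressure at a fixed Mach number grows as the aircraft descends. It is not Clive's calculation: it uses only the International Standard Atmosphere and the identity q = ½γpM², and the Mach number of 0.82 is my assumption for illustration. The A330's actual design loads are not given here, so the sketch shows the trend only, not the failure threshold.

```python
import math

# International Standard Atmosphere constants
GAMMA = 1.4          # ratio of specific heats for air
P0 = 101325.0        # sea-level pressure, Pa
T0 = 288.15          # sea-level temperature, K
LAPSE = 0.0065       # tropospheric lapse rate, K/m
G0 = 9.80665         # standard gravity, m/s^2
R_AIR = 287.05287    # specific gas constant for air, J/(kg K)
H_TROPO = 11000.0    # ISA tropopause height, m


def isa_pressure(alt_m: float) -> float:
    """Static pressure in Pa at a pressure altitude in metres (valid to 20 km)."""
    exponent = G0 / (LAPSE * R_AIR)
    if alt_m <= H_TROPO:
        return P0 * (1.0 - LAPSE * alt_m / T0) ** exponent
    # Isothermal layer above the ISA tropopause
    t11 = T0 - LAPSE * H_TROPO
    p11 = P0 * (t11 / T0) ** exponent
    return p11 * math.exp(-G0 * (alt_m - H_TROPO) / (R_AIR * t11))


def dynamic_pressure(flight_level: int, mach: float) -> float:
    """Dynamic pressure q = 0.5 * gamma * p * M^2 in Pa."""
    alt_m = flight_level * 100 * 0.3048  # flight level -> feet -> metres
    return 0.5 * GAMMA * isa_pressure(alt_m) * mach ** 2


MACH = 0.82  # assumed value, for illustration only

for fl in (350, 200, 170):
    q = dynamic_pressure(fl, MACH)
    print(f"FL{fl}: q = {q / 1000:.1f} kPa")
```

At the same Mach number, the dynamic pressure at FL 200 comes out at nearly twice that at FL 350, simply because the static pressure is nearly twice as high.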

Indeed, the BEA determined from inspection of the retrieved wreckage – over 600 individual pieces – that the aircraft hit the water intact, in more or less level attitude with a high vertical rate of descent. This does not conform with the flight path of an aircraft under full control. It suggests, indeed, that the aircraft was aerodynamically stalled when it hit the ocean surface.

The BEA determined, from the loading of the aircraft on takeoff and the estimated fuel burn over the flight profile, that the aircraft had an estimated weight of about 205 tonnes and CG between 37.3% and 37.8% MAC at around the time of disappearance. The half-percentage variation in CG estimate comes from the fact that fuel is pumped around between fuel tanks at cruise, to optimise the lift-to-drag ratio of the aircraft, and there is a limit of 0.5% MAC on the CG shift allowed to occur through pumping. There has been some speculation on the Internet about the margins between stall speed and limiting Mach number at FL350 and weight of 205 tonnes. The margin is some 80-100 kts; this is large enough to allow the pilots considerable leeway in dealing with any in-flight abnormalities, such as having to fly the airplane on “pitch and power” when airspeed indications are unreliable. However, severe or extreme turbulence could make dealing with abnormalities such as unreliable airspeed a very tricky control situation indeed, at any moderately high flight level. It is plausible that an upset could thereby have occurred. The BEA report is factual and does not speculate on this.

There was considerable convective activity in the ITCZ at the time of AF447’s passage. The weather, though, was pretty typical for the time of year, and had no unusual features from the point of view of meteorologists. There was a convective mass extending about 400km E-W, which the route of flight of AF447 crossed. This convective mass had formed at about 0130Z by the fusion of four powerful storm masses, deriving from convective columns (“towers” in French), which had reached their limit and spread out horizontally as their tops reached the tropopause. The strongest of these had attained its most powerful stage many hours before. At 0200Z, the cumulonimbus clouds forming the mass had for the most part attained their mature stage. Although there may have been new columns forming between the mature columns underneath the top of the spread-out mass, there is no evidence for that in the form of a later “overshoot” into the stratosphere, which happens in the case of the most powerful columns. The temperature at the tops of the mass was by and large similar to that of the tropopause, around -80°C, as recorded by satellite 7 minutes before and after the presumed passage of AF447. The tropopause was estimated by the climate model ARPEGE to be at around FL520 at the date and time of the disappearance of the aircraft. Another aircraft participating in real-time weather data collection via AMDAR passed along the route half an hour later at FL325, then climbed to FL350, and did not record anything unusual, confirming largely what one may infer from the satellite images.

The BEA says it is “very likely” that some of the cloud mass contained significant turbulence at FL 350. Electrical activity was also “possible” at this FL. But, crucially for those wondering whether the pitots iced up because the aircraft may have flown into heavy supercooled-rain clouds, the presence of supercooled water was said to be “not very likely” and would necessarily have been limited to very small quantities. I consider the developments with possible pitot icing in a separate article.

The last known position of AF447 was transmitted automatically over ACARS at 0210Z. This position was N2°58.800′W30°35.400′, or N2.98°W30.59° in decimal degrees. The position transmitted was that contained in the “Flight Management” data, which is partly based on the inertial reference system. It could be, said the BEA, that the GPS position differed slightly from this.
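The conversion between the two notations is simple arithmetic: the decimal minutes are divided by 60 and added to the whole degrees. A minimal sketch follows; the function name and the convention of negative values for S and W are my own choices for illustration.

```python
def dm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees and decimal minutes to signed decimal degrees.

    North and East are positive, South and West negative.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value


# Last position transmitted over ACARS at 0210Z
lat = dm_to_decimal(2, 58.800, "N")
lon = dm_to_decimal(30, 35.400, "W")
print(f"{lat:.2f}, {lon:.2f}")  # prints "2.98, -30.59"
```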

This position puts the flight in or close to the column of what had been the most powerful of the fused storms, whose column had attained its most powerful stage many hours before and was at the time in its mature stage. The position is between ORARO and TASIL waypoints and looks to be slightly off the airway.

The last verbal contact with AF447 was by the controller of FIR ATLANTICO, in Brazil, at 0135:43Z. The controller then asked AF447 four times for his estimate at TASIL, without response. There were apparently three attempts at an ADS-C connection with DAKAR, at 0133Z, 0135Z and 0201Z. These failed with code FAK4, indicating either the absence of a flight plan, or a significant discrepancy between flight number, reported position, and planned position. Section 1.9.2 says that at 0146 the DAKAR controller asked for information about AF447 because there was no flight plan. ATLANTICO gave type (A332), airport of origin, destination airport, and SELCAL sign. DAKAR created and activated a flight plan, but there was no connection with the aircraft either on voice or ADS-C. So the first two ADS-C attempts were rejected because of, we may presume, lack of a flight plan with DAKAR at those times. The report does not determine whether the flight plan at DAKAR was activated before or after the last ADS-C connection attempt at 0201Z. Although the transcript of the exchange between ATLANTICO and DAKAR at 0135Z is included in the appendices, the later exchange is not.

As I mentioned in my note of 11 June, the order of the ACARS messages received does not necessarily reflect their order of occurrence. The reasons why are largely the reasons I gave there, with one addition. Fault messages received by the CMC are cached but not sent for a minute, to accumulate and summarise in one ACARS transmission other messages associated with that fault from other avionics devices. These associated messages are indicated by including the reporting device in the fault message compiled by the CMC (using a * for associated messages of type 2, which are not reported to crew because they have no “operational consequences”). There is prioritisation within the CMC, as well as possible race conditions from various BITE devices to the CMC, as well as prioritisation of transmission: the report explains how ACARS messages are prioritised by class. And, of course, possible delays in the transmission and processing of messages through the ACARS transmission system itself.
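The effect of the one-minute hold on ordering can be seen in a toy model. This is a sketch under loud assumptions, not a description of the real CMC: the message labels and times are invented, the hold is modelled as a fixed 60-second delay per held message rather than a shared window, and I assume purely for illustration that some other messages are forwarded without delay.

```python
from dataclasses import dataclass

HOLD_S = 60  # the one-minute hold on fault messages described above


@dataclass(frozen=True)
class Message:
    occurred_s: int  # time of occurrence in seconds (invented values)
    label: str       # hypothetical label, not a real ACARS message
    held: bool       # True if cached before sending (assumption)


def send_time(msg: Message) -> int:
    """Time at which the toy CMC would hand the message on for transmission."""
    return msg.occurred_s + HOLD_S if msg.held else msg.occurred_s


def order_received(messages: list[Message]) -> list[str]:
    """Labels in the order a ground station would see them in this toy model."""
    return [m.label for m in sorted(messages, key=send_time)]


messages = [
    Message(0, "fault A", held=True),
    Message(10, "effect B", held=False),
    Message(30, "effect C", held=False),
]
print(order_received(messages))  # prints ['effect B', 'effect C', 'fault A']
```

Fault A occurs first but arrives last. Prioritisation, race conditions and transmission delays, none of which are modelled here, would only add to the indeterminacy.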

The interpretation of the messages is, as the BEA says, “delicate”. This is not just because of the indeterminacy of order, but also because, while a fault may be recorded, a subsequent return to normal is not reported; certain alarms such as overspeed are not registered; and although all faults (type 1) are accompanied by a cockpit effect (type 2), not all faults have their cockpit effect registered, and not all cockpit effects have the associated fault registered.

Of the type 2 effects, the BEA says it has not succeeded in explaining the meaning of the cockpit effect NAV TCAS FAULT (cockpit effect is a flag on the PFD and ND) but has explained the significance of the others.
There are five type 1 fault messages, of which the significance of two is unexplained:

the ADIRU2 fault (IR2), associated with messages from EFCS1, IR1 and IR3. The involvement of EFCS1 is a type 2 message, and it is suggested that the correlation window may have been opened by this message;
the FMGEC1 message that was the last received before the cabin pressure warning.

The BEA concludes that the type 1 and 2 messages taken together show that there had been unreliable airspeed measurements and their consequences.

That is it. Not a whole lot more than we knew in mid-June, but some of it more firmly established, especially the interpretation of the weather and the integrity of the airframe.