Some Figures from Industry on Use and Training of Formal Methods

22 09 2009

On 18 August I wrote an essay on eight themes in System Safety Engineering which addressed the use (or not) of so-called formal methods. On 28 August, Rod Chapman of Praxis HIS wrote a note to the University of York Safety-Critical Systems Mailing List which gave some figures for Praxis’s experience on a medium-large project in which they delivered requirements specifications and software performing to those specifications.

Praxis’s “Correct-by-Construction” (CbyC) approach to developing software aims to minimise errors in the development process at all stages. They use a requirements-engineering methodology called REVEAL, which produces requirements in the formal specification language Z. The requirements in Z are read by engineers who produce source code using the SPARK toolset. The SPARK language is an annotated operational programming language whose declarative part consists of assertions about how the executable part behaves, and whose executable part is an behaviorally-unambiguous subset of Ada. Various SPARK tools such as the Examiner and Prover are used to validate statements in the assertional part. Recent Praxis projects have also used the GNAT Pro Ada toolset from AdaCore. In short, Praxis HIS is a heavy user of formal methods.

I follow Praxis’s experiences from the sidelines, because they have quite simply the best record for quality of delivered code over the last decade of any company anywhere, and I believe that quality of code is a very important characteristic of safety-critical systems development. Praxis has regularly achieved quality levels for medium-size systems (between 50,000 and 500,000 lines of source code – LOC) of less that one error in 25,000 LOC, which is some 25 to 75 times better than what are reputed to be current “industry standards” of 1 to 3 errors per 1,000 LOC (or kilo-LOC, kLOC. For some, a kLOC is 1,024 LOC but here it makes no difference). I wish that Praxis’s policy of monitoring and publishing their achieved code quality levels would become industry standard.

Rod reports on a project whose (requirements) specification phase involved 14 people in the team, only 3 with significant experience with Z. 11 were trained to read and write Z specifications. Only 3 of the 14 had Ph.D.-level qualifications. For maintenance of the specification, 2 more team members with some Z skills were added. For the implementation team, the tasks were to read Z and derive from the specifications either SPARK or test cases. Three training courses were run. The first, teaching participants how to read and interpret Z, involved 31 Praxis staff, 10 client staff and 47 staff and consultants from the partner company (I take it the system integrator). The second, Introductory SPARK, had 20 Praxis, 12 client and 25 partner/consultant participants and the third, Advanced SPARK, 9 Praxis, 4 client and 3 partner/consultant participants.

Praxis draws the following conclusions:

[begin Praxis conclusions]

1) It seems we can teach customers and domain-experts to
read and write Z effectively. In particular, these
people have made a significant contribution to the specification
production and maintenance. The Z-writing learning curve is
non-trivial though – probably 3 months after initial training
before real Z-writing fluency is achieved. The Z-reading learning
curve is much shorter and shallower.

2) We have been able to take on a significant number of contractors
and train them to both read Z and produce SPARK.

[End Praxis conclusions]

This experience bears on various issues in my essay of 18 August, Eight Themes in System Safety Engineering. The first two issues, which raised the questions of where the “hard part” of the engineering lies, and where to concentrate the effort, is answered here: the main effort on this project went into the implementation of the specifications, not in the requirements specification itself. Issue 5 concerned whether current formal languages were adequate. Praxis uses Z and SPARK successfully, and they are driven by their data on code quality, so, whatever the pros and cons of these languages, they are at the forefront of the production of quality executable code.

Issues 6 and 7 concerned whether requirements specifications should be easily understandable by domain experts and system engineers alike, and whether the techniques are usable by “average engineers” with Bachelor’s or Master’s degrees. Praxis says, in their conclusions, essentially that Z satisfies this condition.

Praxis’s main business lies in delivering high-quality software systems for critical applications. They do so through use of Z and the SPARK Toolset, both of which count in my book as formal methods. They also demonstrate that they produce higher-quality code for such systems than others are able to show that they do.

It is a mystery to me why any debate about the use of formal methods in developing critical systems has not been long settled through such examples. (And I do emphasise that Praxis’s methodology is just one example.) There are two conclusions I would wish to draw. First, methods that enable rigor in the development of systems really do that, and do so, as Praxis’s example shows, at little if any cost. And, second, rigor is important in assuring high quality of software systems under development.

Sort-of-disclaimer: I am not necessarily a fan of Z (too much special notation for my taste) although I am a fan of SPARK (used successfully by my Ph.D. advisee Bernd Sieker in his thesis for a system development in which the source code is proven to fulfil a demonstrably complete set of safety requirements).



Screwy Reasoning and Its Study

7 09 2009

Those of us interested in commercial aviation accidents have to deal with a lot of what I shall call screwy reasoning.

Last week, I read a September 2 article in The Times on the crash of AF447 and its aftermath which I felt was somewhat screwy. It suggested that Air France’s attempt to introduce specialised training for the scenario of loss of airspeed data in cruise on A330/340 machines had “provoked anger”. The angered party was unspecified, but the initial paragraph had a “pilot’s union” “accusing” AF of trying to cover up the cause of the AF447 crash.

At the end of the article, the author, Charles Bremner, says that simulator training for “speed problems” at cruising altitude had not previously been offered by Air France, but this is now being done “at the request of all airlines’ unions.”

Assuming the pilots’ union was the party to have been angered, there is an obvious contradiction between a union requesting training, and being provoked to anger by (apparently) the same. So, if not the pilots’ union, it would be good to know who it was who is being angered by Air France’s specialised training.

Not that Air France has much choice in the matter. European Aviation Safety Agency (EASA) Safety Information Bulletin 2009-17, issued on 9 June,2009, which may be found by searching EASA’s Safety Information Bulletin list, requires inter alia that “familiarisation of flight crews with unreliable airspeed indication procedures should be provided through adequate training.” It is hard to imagine how anger can be provoked through airlines following a regulatory safety directive.

The Times article began by suggesting the the Air France pilots’ union had accused “accident investigators”, by which I take it is meant the BEA-convened AF 447 investigation committee, of “covering up the cause” of the crash of AF447. I doubt this is correct, whatever the union is annoyed about. One can only cover up a cause if one knows what the cause is, and I doubt that any professional thinks the BEA or AF or anyone else knows.

Bremner has written quite a lot on AF447. He is a pilot (a private pilot I take it) and he does have useful information on the negotiations between stakeholders (pilots’ unions, airline, government, manufacturer, investigating agency, maybe even passengers….) to share. However, he seems to do so in a selective way which I often find unhelpful, as in this case, and his phraseology is sometimes unfortunate, also as here.

I wrote some comments on the article and the discussion, which appear in the Commentary section under the article. Colleagues who look at the aviation discussion forums have been amazed at how often the 1988 Habsheim accident comes up whenever anything Airbusy surfaces, and so it does in Commentary to this article. But, in these contributions, it is no longer Habsheim. It is the “Paris Air Show”. One wonders what, if anything, is going on in these people’s heads.

Well, there are partial answers. Sociologists have been studying people’s actual reasoning behavior in detail, but I was always missing a link into the literature. Now I have one, courtesy of the New York Times’s TierneyLab blog, which discusses the issue of who was first to the North Pole. One possible answer: Roald Amundsen in an airship in 1926. On the ground – well, ice – it might well have been Ralph Plaisted in 1968 on snowmobile, and Wally Herbert a year later by dogsled.

Tierney points to research on belief formation and retention, by Steve G. Hoffmann and colleagues, who used the Saddam-9/11 non-link to conduct a survey and then interview some responders, and wrote an article about it in the journal Sociological Inquiry, published in 2009. A number of belief formation and retention mechanisms have been proposed in the literature, and Hoffmann et al consider them. I find the paper very readable and it contains a number of references to related work. So this is a way in to studies of reasoning phenomena, for those of us who might be curious about it.