Don Hudson and PBL on the ITU’s proposal for real-time flight data transmission

10 04 2014

The International Telecommunications Union has been conducting its four-yearly meeting. Its president has apparently promised everyone to make possible the real-time transmission of flight data from commercial transport aircraft in flight. This has been supported by the Malaysian delegate. All according to this news report: MH370: ITU Commits to Integrate Flight Data Recorders with Big Data and Cloud, writes Vineeta Shetty from Dubai

Captain Don Hudson is a valued colleague for well over a decade. He has some 20,000 or so flying hours, has flown a variety of transports, including Lockheed L1011, Airbus A320 and varieties of A330 and A340 machines and is an active professional pilot although formally retired from scheduled airline flying. While with Air Canada he contributed significantly to the development of the airline’s safety-management and flight-quality systems while he was a captain on intercontinental flights.

Don points out, as have others, that the technology exists to do what the ITU proposes. However, he finds the proposal problematic, as do I, and economically and from a safety point of view barely justified, if at all.

It almost goes without saying that people expert in the standardisation of telecommunications are not necessarily expert with the human and organisational factors involved with aviation safety programs. Don is expert. I recommend that ITU delegates read what he has to say below. Some comments of mine follow.

Captain Don Hudson: Response to the ITU’s proposal for real-time flight data transmission

Some important issues have not been addressed in the ITU’s suggestions.

Aside from any commercial priorities and processing/storage/retrieval issues regarding DFDR and CVR data, a number of important issues are not addressed in this announcement.

I suspect that each individual country would “own” their carriers’ data. Given the difficulties with establishing even a “local” distributed archive of flight data within one country, even purely for safety purposes and with limited access, I doubt that such a flight-data archive will be hosted by a world body anytime soon. Within such a proposed arrangement lie a number of serious impediments to international negotiation, not the least of which is the potential for legal prosecution of one country’s flight crews by other countries. Such data could be a legal goldmine for various interested parties and that is not what flight data recorders and CVRs are for. Their purpose is expressly for flight safety.

I submit that such a suggestion to stream flight (and CVR?!) data would be an initiative to solve the wrong problem – the one of disappearance, of which there have been only two very recent cases among millions of flights and billions of passengers, all carried under a safety record that is enviable by any standards.

The main problem that many are trying to come to grips with is certainly real. We needed to know what occurred to AF447 and the results of that knowledge have materially changed the industry in many, many ways. We need to know what happened on board, and to, MH370.

What makes more sense, in place of a wholesale sending of hour-by-hour flight data from every flight at all times, is a monitoring function something along the lines of the work Flight Data Monitoring people perform on a regular, routine basis, but do so on-board the aircraft, using a set of sub-routines which examine changes in aircraft and aircraft system behaviours to assess risk on a real-time basis.

Flight data analysts look for differences-from-normal – a rapid change in steady states or a gradual change towards specified boundaries, without return to normal within expected durations. It is possible to define a set of normal behaviours from which such differences can trigger a higher rate of capture of key parameters. This approach is already in use on the B787 for specific system failures. A satellite transmission streams the data set until either the differences no longer exist or end-of-flight.

Flight data quantity from aircraft such as the B777 is in the neighbourhood of 3 – 4 MB per flight hour. Most events prior to an accident and relevant to that accident are moments to minutes in duration.

The two industry events which have triggered the ITU interest were a rapid departure from cruise altitude & cruise Mach (AF447) and MH370, with which a sudden change in aircraft routing concurred with a loss in normal air-ground routine transmission by automated equipment, (transponder, ADS-B). Both these events lasted moments and would be events that would initiate data-capture and transmission in my proposed scenario. In the MH370 case the transmission would remain active until end-of-flight. If the AF447 aircraft had been recovered from the stall and had made destination, the data would still be in place off-aircraft. Loss of satellite signal occurred on AF447 but the problem would not have prevented an initial data stream (See the BEA Interim Report No. 2 on AF447, p39).

The flight phases defined as “critical” by AF447 and MH370 are from the top-of-climb to the top-of-descent phases, in other words, the cruise phase. From the takeoff phase through the climb to cruise altitude and the descent/approach and landing, no such need for this kind of system exists, because any accident site is going to be within about 150nm or about one-half-hour’s flight time from departure and arrival.

An “on-condition” data transmission would be more practical and cheaper than full-time transmission of flight data, which would bring the notions expressed by many regarding these issues a bit closer to implementation.

Besides flight data, there is cockpit voice recording (CVR). The data issues with CVR transmission require a parallel and separate examination.

Don Hudson

[End of Hudson's Essay]

PBL Comments on Hudson

Concerning the ITU’s statement, I fail to see what either big-data analysis or cloud computing have to do with real-time data transmission from aircraft in flight. I suspect an infection by buzzword.

The ITU is suspected by many people concerned with the Internet, and certainly by most in the IETF, to be a political body more interested in control than in enabling technology. This current announcement does seem opportunistic, and, as one can see by connection to “cloud” and “big data”, some of its senior officers apparently don’t really understand the technology which they want to regulate.

There are further questions which Don does not address above.

Who is going to pay for it, and will the expense be justified? Likely will the satellite companies gleefully support the proposal (see the article’s comment on Inmarsat), given that through it they would be rolling in cash. But where shall that cash come from in a boom-and-bust industry such as airlines? Likely from the passengers through an increase in fares. So one would be asking passengers to pay for a service on their flight for which the rest of the world would only benefit if the flight crashed and said passengers died. That seems to be stretching everyday altruism to its limits.

As Don points out, such a proposal would have helped in precisely two cases in five years. But Air France was found, at a cost according to this Newy York Times article of €115m, and it currently looks as if MH 370 will also be found, so someone will be able to estimate the cost of that. A cost-benefit analysis (CBA) is thus possible, though I guess there would be lengthy argument over the components of the calculations. With a CBA a decision on implementation would come down to attempting to reduce, respectively to raise, the cash paid to satellite companies, and that seems to me to be a commercial issue and not one which governments would or should care to resolve in the absence of demonstrated need. I doubt governments would want to end up paying out of taxes. Surely, any individual government would prefer to put such resources into improving its surveillance capability, and expect to use those covertly in the very rare cases such as MH 370?

To the main point. How would real-time data transmission kit have helped in the search for MH 370?

Likely it would not have helped at all, if the current hypothesis of deliberate human action is validated. Such a system, like the transponder and ADS-B, can be turned off. For reasons of basic electrical safety, a discipline established over 100 years ago, you have to be able to turn off any given electrical circuit in flight. You can make it easier or harder, but no certifiable design can allow that it be prevented. Thus no such system is resilient against intentional human action.


Hijacking a Boeing 777 Electronically

17 03 2014

John Downer pointed me to an article in the Sunday Express, which appears to be one of their most-read: World’s first cyber hijack: was missing Malaysia Airlines plane hacked with mobile phone? by James Fielding and Stuart Winter.

The answer is no. To see why, read on.

The authors interviewed a Dr. Sally Leivesley, who is said to run “her own company training businesses and governments to counter terrorist attacks” and is “a former Home Office scientific adviser“.

…Dr Sally Leivesley said last night: “It might well be the world’s first cyber hijack.”

Dr Leivesley, a former Home Office scientific adviser, said the hackers could change the plane’s speed, altitude and direction by sending radio signals to its flight management system. It could then be landed or made to crash by remote control.

Apparently Ms. Leivesley thinks one can can hijack the Flight Management System on a Boeing 777 with a mobile phone.

First point. Suppose you could. When the pilots noted that the aircraft was not doing what they wanted, they would turn off the Flight Management System. Problem solved. It’s nonsense to suggest that the aircraft “could then be landed or made to crash by remote control“.

One needs to make the distinction, clear to people who know anything about aviation systems, between the Flight Management System (FMS) and the Flight Control System (FCS). If you could take over the Flight Control System you would be able to make the airplane land or crash. The Boeing 777 is an aircraft with a computer-controlled FCS, so it is reasonable to ask whether it is vulnerable. Indeed, I did, on a private list which includes the designers of the AIMS critical-data bus (a Honeywell SAFEbus, standardised as ARINC 659; ARINC is an organisation which standardises aviation electronic communication technologies). The FMS and other systems on the Boeing 777 such as the Electronic Flight Instrument System (EFIS) and the Engine Indicating and Crew Alerting System (EICAS) use the AIMS bus to transfer critical data and control commands (from the LRMs, the computers processing the data on the AIMS bus) between components. A good generally-accessible reference is Digital Avionics Systems, Second Edition by Cary R. Spitzer, McGraw-Hill, 1993.

Second point. Both FMS and FCS are electronically well-shielded. They have to be – it is part of their certification requirements. They are not vulnerable to picking up signals from a device such as a mobile phone. In fact, there are laboratories where you can park an airplane in the middle of banks of very powerful electromagnetic radiators and irradiate it, to see whether your electronics are shielded. These installations are mostly used for military aircraft, against which aggressors might use such powerful radiators, but they are used by civilian aircraft too.

Third point. Any communication requires a receiver. If you want an electronic system S to pick up and process signals from a radiating device such as a mobile phone, there has to be a receptive device attached to S. So anyone wanting to put spoof data including control commands on a bus must do so through such a receptive device. Either there is one there already (such as in one of the components sharing the bus) or someone has to insert one (a physical tap). And it has to “speak” the communications protocol used by, in this proposed case, an intruding mobile phone. As far as I know, none of the usual components sharing the critical buses on the Boeing 777 has a mobile-phone-communications receiver/transmitter, but some airlines do attach such devices to their systems so that they can download data from the Quick Access Recorder (QAR, a flight-data recording device, such as a “black box” but with fewer parameters and not a critical component) at destination airports after a flight, for quality management purposes. As far as I know, a QAR is a passive device that is not able itself to put data on the bus, so hacking the QAR through its transmitter wouldn’t give you access.

Fourth Point. Could you pre-install a receiving device somewhere on the buses, to allow someone in flight to communicate with the bus, to perform what is called a “man in the middle” (MitM) attack? An MitM spoofs one or more components transferring data on the bus. Well, theoretically someone on the ground with maintenance-type access could install such a component, but let’s ask what it then can do. The AIMS bus carries pure data between components who know what the data means. The bus is clocked and slotted, which means that data is transferred between components according to a global schedule in specific time slots; the bus time is given by a clock, of which values all bus users have to be aware. Components communicate according to certain “timing constraints”, that is, slot positions, which constraints/positions are only “known” to the components themselves. So to spoof any component you need to reverse-engineer the source code which drives that component to find out what its specific “timing constraints” are. So you need the source code.

Not only that, but the AIMS bus, for example, runs at 30 Mb/s (million bits per second), and there is a 2-bit gap between words on the bus; that is, one fifteen-millionth of a second. It is questionable whether the available mobile-phone protocols are fast enough. The fastest protocol in the CDMA2000 family is EV-DO Rev. B at 15.67 Mb/s, so no; there is no time to perform computations to determine a “gap” and start putting your bits on the bus. (Basic CDMA2000 is 144kb/s, extendable to 307kb/s; two orders of magnitude slower.) The HPSA+ protocol, from another mobile-phone protocol family, gets 28Mbp/s upstream and 22 Mb/s downstream, so has half a chance of being compatible. But, really, to synchronise with something running at 30Mb/s and a 2-bit sync gap you’d need something running at more like double that speed, I should think. The wireless computer-network protocols IEEE 802.11g or 802.11n could do it in their faster versions. You’d need a device (and receiver) speaking these (why would it be a mobile phone?)

To visualise what has to happen, imagine trying to merge with traffic travelling on a densely-packed motorway at rush hour – on your bicycle. Even if you are going fast, you’re not likely to be successful.

Then you need to figure out, not only the timing constraints, but what data you want to put on the bus. Again, to know this, you need to reverse-engineer the source code for the components you want to spoof. Indeed, to put the spoof data on the bus consistent with the timing constraints for the component you are spoofing, it would probably be easier to have stolen the SW and use that to satisfy the very narrow timing constraints. All millions of lines of it.

Rather than tap in to the bus for a MitM attack, it would seem more reasonable to install a receiver/transmitter surrepticiously on one of the LRMs (“line replaceable modules”, the processors in the AIMS cabinet) and swap it in.

Summary: to achieve a MitM attack on the critical buses of the Boeing 777, you would need in advance to modify physically some of the hardware processors which run the bus so that they have a transmitter/receiver for WiFi signals (of the more powerful 802.11 standards family), and someone has to install such a modified LRM on the target aircraft beforehand. Then, the SW for the various components which the attacker intends to spoof must be obtained and reverse-engineered to obtain the timing constraints and the data-interchange formats, many million lines of code in all. That must all be installed on a portable device, probably not a mobile phone, which you then use in flight.

Dr. Leivesley refers to Hugo Teso’s demonstration in Amsterdam a year ago, in which he showed how to transmit messages from a mobile phone to a flight simulator running on a PC, and to modify FMS commands on the simulator via ACARS and ADS-B vulnerabilities. Neat demo, but it didn’t show you could take over a flight control system, as pointed out by many commentators. For one thing, the PC already has relevant communications devices (Bluetooth, WiFi, and one can insert a USB dongle for mobile-phone reception). Second, it’s just flight simulation software. Who knows what kinds of vulnerabilities it has and who cares? It is not the certified SW running on an actual airplane, or on the actual aircraft HW, and has not been developed, analysed and tested to anywhere near the same exacting standards used for flight control software (Design Assurance Level A, according to the RTCA DO-178B standard to which I believe the Boeing 777 flight control system was certified. Consider even the exacting and resource-intensive MCDC assessment required for DAL A. If you don’t know what MCDC is, let me recommend Kelly Hayhurst’s NASA tutorial. Nobody performs MCDC assessment on PC-based flight simulation SW).

To summarise, the demo was a little like hacking an X-box F1 road race game. It doesn’t mean you can take over Sebastian Vettel’s car in the middle of a real race.

Unlike the authors of the newspaper article, I have put most of these thoughts past a group of experts, including the designers of the SAFEbus. I think Philippe Domogala, John Downer, Edward Downham, Kevin Driscoll, Ken Hoyme, Don Hudson, Michael Paulitsch, Mark Rogers, Bernd Sieker, and Steve Tockey for a very enlightening discussion.

There has been some thought about whether it is feasible for an interception aircraft with transmission capability to fly formation with MH 370, so that there would only be one blip on primary radar, and accomplish such an electronic takeover. The considerations above would still apply, as far as I can see – you still need to modify the physical HW in advance on the target airplane to allow electronic intrusion from outside.


We Had An Accident

2 03 2010

On Friday evening, 26th February, we suffered an accident. In what is known as the Swiss Cheese Model, stemming in all but name from Jim Reason, all the holes lined up.

Pictures of the Swiss Cheese Model abound, for example here and here and
here. The idea of the Swiss Cheese Model is that there are multiple «layers of protection» against unwanted events (the slices of cheese), each with weaknesses (the holes in the slice), and so for an unwanted event to occur (a straight trajectory through the slices), all the weaknesses must be exploited simultaneously (the holes line up). It is a compelling image, some quarter century old now, cited often by people in on-line forums such as PPRuNe discussing unwanted events such as airplane accidents.

We do Why-Because Analysis, of course. We also get pretty pictures (Why-Because Graphs, WBGs), but the pictures in this case are not all the same. Indeed, we have observed the odd phenomenon that you can tell what accident is being shown, just by looking at the shape of the WBG and leaving out all of the factual information. We haven’t quite got a handle on this cognitive phenomenon yet, but it does suggest the truth of the old adage that, whatever similarities one may see, all accidents are causally unique. (This observation also indicates the limits of the Swiss Cheese Model as explanatory device, for all one can vary schematically in that are the number of slices.)

So what happened to us on Friday evening? Our mail server fell over. I have been running my own mail server and WWW server independent of faculty resources since 1996, and have thereby had much better service than my colleagues for most of that time (the administration was performed for almost a decade by Marcel Holtmann, for many years the maintainer of the Linux Bluetooth suite, BlueZ, whose Diploma thesis contained a thorough analysis of the first successful Bluetooth exploits). Recently, the faculty services have become more dependable, due to personnel changes, and indeed my Sysadmin and AbnormalDistribution blogger Jan Sanders works for the services group. However, the personpower is very thinly spread. The informatics part of our faculty has grown from six research groups to seventeen and is also provided essential services for the CITEC Excellence Cluster and the CoR-Lab robotics institute, with still only two full-time-equivalent positions. Our backups alone are more than the entire rest of the Uni combined! So shouldn’t our faculty find the resources for an appropiate increase in personnel? Sure, but German faculty negotiations resemble the situation so well described by Lamport, Shostak and Pease almost thirty years ago in their canonical papers Reaching Agreement in the Presence of Faults and the more colorful The Byzantine Generals Problem. The third piece of work relevant to such negotiations is Garrett Hardin’s Tragedy of the Commons, passably described in this Wikipedia article.

So what happened with the mail server? It fell over Friday early evening, the chosen time for essential electronic services to croak. Normally Jan restarts it remotely and all is well – happens about once a year. Not this time. Saturday afternoon he removed the hardware, took it back to his office – and couldn’t even get a console to start. First hole in the cheese. There wasn’t any obvious sign of electronic or thermal damage when we looked at it, and both disks fired up quietly when they got juice, and they are RAID-configured, so the data would be there.

We decided over the weekend to migrate the service to the faculty machines. We assumed that messages over the weekend would be stored on the university computer center machines for forwarding when our mail domain name awoke again, because that’s what the MX records say. And the mails I had already received were stored not just on the server, under IMAP, but on my laptop, as well as on the external USB disk I have configured to be a Time Machine backup server. But I haven’t backed up in a month or so, because we had the builders in at home and I stored it away from all the fine dust (some vain hope!). Second hole in the cheese. I use email as a form of notepad, so losing the last month’s worth of notes as well as messages did not appeal to me.

Bernd Sieker came in specially, Monday morning early, to help rescue the data. He and Jan figured how to get the machine to boot. One of the disks had bad blocks, but the other seemed OK, and it is RAID, so – wait a minute, it’s configured for RAID but there isn’t RAID running on it (we take it somebody ran out of patience and didn’t finish the job). Third hole in the cheese. All the data is on the bad disk.

No matter, I have the mails on my laptop and can copy them over to my IMAP account on the faculty machine when it’s up. Jan configures it; I give the relevant server data to my mail client, and we call up the friendly university administrator to change the MX record for our mail server in the local DNS database. Then I call my uni mail box and see, not the thousands of message in my Inbox I have since July 2009 (the last time I archived it), but one new message. Where have they gone? I look in the local mail storage files on my laptop and see – they are just gone! Just like that! No dialog box «do you really want to do this?» or anything. Just gone. Fourth hole in the cheese. So much for trusting manufacturer SW that is supposed to be archiving and synchronising.

But not quite all the holes lined up. It seems as if the client mail storage on our original mail server is mostly or maybe completely healthy. So we should be able to get those files copied across. And in any case I have everything up to about the last month on the Time Machine. Or so I think. By the time we are finished, it should be four or five person-days of work. Because we thought we had a passable set-up, but it turns out we didn’t. That is, I didn’t – everybody else did but me. And I am supposed to be expert. How embarrassing!

Gee, I hope all this new kit works the way Jan assures me it does……..