Assurance of Cyber-Physical Systems

17 11 2011

I attended Seminar 11441 on Science and Engineering of Cyber-Physical Systems at the Leibniz Centre for Informatics at Schloss Dagstuhl in the Saarland on 1-4 November, 2011. It was organised by Holger Giese, Bernhard Rumpe, Bernhard Schätz and Janos Sztipanovits. There is huge interest in cyber-physical systems in the US at the moment, backed by plenty of research resources, and in Germany also, although on a lesser scale, somewhat more industrially-oriented and mostly concentrated in the South, it appears.

I attached myself to the subgroup concerned with the assurance and certification of such systems.

We all seemed to have a whale of a time figuring out what a cyber-physical system (CPS) is. Tom Maibaum and others wondered how they might differ from embedded systems. People said, well, it is important that there are lots of subsystems interacting more loosely than with a hierarchically-developed complex embedded system. So John Fitzgerald wondered whether they were mostly systems of systems. (Actually, the “so” is causally misplaced. John, being an “F”, had his one-minute say before Tom, being an “M”). Social systems of mostly artificial agents, of which many examples were given, seemed to fit the “cyber-physical” conception, so CPS includes at least those. Platooning road and rail vehicles, swarms of robotic aircraft or ground robots, coordinated flying or other motion, coordinated searching tasks, and so on. There are enough examples to point and say “that’s what we mean!”.

I also learnt, once again (strange how short one’s memory can be!) to avoid uttering the phrase “emergent behavior”, at the risk of inciting a riot, or at least the closest one can come to a riot at a Dagstuhl seminar.

So what about assurance of such systems? Sadly, as I was on my way back, having had a beautiful bike ride back over the Hunsrück to Trier and caught the train, there occurred a horrendous road accident in Britain on the M5. You can read commentary about it on the York safety-critical systems mailing list. Go to The 2011 collection, sort by date, read the contributions on Sunday 6 November through Tuesday 8 November including “M5 Road Accident” in the title, or go to Paul Cleary’s initiating query and follow the thread(s) through (there are two slightly different titles, but the thread-following links persist through). I also had some private correspondence with Gérard Le Lann, who now works on road-vehicle platooning algorithms and associated questions.

As a result of the Dagstuhl discussions, and the e-mail discussions of the accident, I was able more concretely to formulate what I think is a new assurance problem which arises with (this conception of) cyber-physical systems. It is a little too long for a blog post, so I wrote it in a note called The Assurance of Cyber-Physical Systems: Auffahr Accidents and Rational Cognitive Model Checking and put it on the RVS WWW site Publications page.



The Definition of Risk – Yet Again

16 11 2011

In a message to the York Safety-Critical Systems Mailing List, Tracy White recounted a discussion with someone from the field of “Risk Management” who was taking a course he was giving on system safety. There is apparently a series of international standards, designated ISO 31000, on “Risk Management” (so says Wikipedia). Tracy says

The term ‘risk’ in 31000 is described as the ‘effect of uncertainty on objectives’ where one of the ‘effects’ can be ‘a deviation from the expected’ (4360 describes it more succinctly as: ‘a chance of something happening’). These ‘risk’ definitions differ markedly from…..

…the standard definition which has been around for 300 years and 10 months: Abraham de Moivre, De Mensura Sortis, or On the Measurement of Chance, Phil. Trans. Roy. Soc No. 329, January, February, March 1711, reprinted with a commentary by O. Hald in International Statistical Review 52(3):229-262, 1984, which may be retrieved from JSTOR. The definition given there is, in modern terms, that risk is the expected value of loss. “Expected value” is a technical term from probability. I give the word-for-word de Moivre definition below.

This definition is also that used for “risk” in finance. See Peter L. Bernstein, Against the Gods: The Remarkable Story of Risk, John Wiley & Sons, 1996/1998. Which book, as the publisher proudly proclaims on the cover, was a “Business Week, New York Times Business, and USA Today Bestseller” and includes praise from reviews by Galbraith, Heilbroner, the NYT, the WSJ and The Economist on its cover. (Indeed, Bernstein is where I got my original lead to de Moivre.)

The meaning of the term in system safety is always close to that of de Moivre, but usually avoids the explicit arithmetic of finance, expected value of loss, by saying “combination of” likelihood and severity. There are good reasons for being somewhat vague, namely that in many cases in system safety the numbers are not there to enable a calculation of expected value. Especially, for example, in a completely new type of system. (An example I am currently working on is the recharging systems for electric road vehicles. There aren’t many around, so in particular there are no reliable numbers on frequencies of untoward things happening.) In response to this common situation, engineers have developed “qualitative” and “semi-quantitative” methods for assessing risk.

One of the issues then becomes what you take the word to mean in technical contexts. Any definition which is not equivalent to the expected value of loss defines a different concept from that, but the same word, “risk”, is used. For good reason: most definitions are conceptually related and the main issue is to get “close” while not having all the numbers.

So what do you do when some branch of human activity, indeed apparently some standard, takes the same word, “risk”, and uses it to mean something different? (I don’t actually know what “effect of uncertainty on objectives” is supposed to mean. I don’t see how “objectives” can be affected by uncertainty. I can see how your chances of attaining them are.)

Well, maybe you cite de Moivre, the finance industry, and system safety use, and say to your correspondent “you mean something different. I think that is unhelpful; and indeed our notion has historical precedence, so for the purposes of this conversation let’s use a different word for your new notion.” Or he or she could say the same to you. In any case, you agree to use two different words.

And for good measure, you write a blog post about it, as here.

This is not a new issue. Here’s a story from six and a half years ago. In the May/June 2005 issue of IEEE Software, Richard Fairley proposed a definition of risk for the Software Engineering Glossary of the IEEE (which is supposed to be canonical, although it turns out that Prof. Fairley doesn’t think so):

(Richard Fairley, proposed IEEE Software Engineering Glossary): The probability of incurring a loss or enduring a negative impact.

So a risk is to be a probability, which means all risks have values between 0 and 1. Tell that to Lehman Brothers. Well, I guess you can’t any more. Try Bear Stearns and Morgan Stanley. But we’re talking software, not money.

In common use, someone talking to his teenager speaking of “the risk of your not catching the bus in time” is likely talking about the chances of that event. Someone talking of “the risk that Lehman Brothers will go under” is likely also meaning the chances. But someone talking of “the risk of Lehman Brothers going under” is likely thinking of the repercussions as well as just the chances. So much meaning can a relative pronoun versus a copula+gerund carry! As with any other term you wish to be a technical term, you need to decide which meaning (of, here, two) you are going to use. And stick with it. What should be clear is that software engineers working in safety-critical systems need to speak both of likelihoods or chances, and about expected levels of loss. It seems obvious to use “chance” or “likelihood” or “probability” for the former, and some other word for the latter. Since it has been called “risk” for 300 years, why not carry on doing so? And so it is. But some people choose differently. If one is then going to use “risk” to mean “likelihood”, what word does one choose to mean the combination of likelihood and severity? There is not an obvious candidate. But you do need a word for it.
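To make the difference between the two readings concrete, here is a minimal sketch in Python. The two hazards, their probabilities and their loss figures are invented purely for illustration; they are assumptions, not data from any glossary or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    name: str           # hypothetical label
    probability: float  # chance that the loss occurs, between 0 and 1
    loss: float         # size of the loss if it occurs (arbitrary units)


def likelihood(h: Hazard) -> float:
    """'Risk' read as a bare probability."""
    return h.probability


def expected_loss(h: Hazard) -> float:
    """'Risk' read as expected value of loss: probability times loss."""
    return h.probability * h.loss


# Illustrative numbers only (assumed, not measured).
missed_bus = Hazard("missed bus", probability=0.01, loss=1.0)
bank_failure = Hazard("bank failure", probability=0.01, loss=1_000_000.0)

for h in (missed_bus, bank_failure):
    print(h.name, likelihood(h), expected_loss(h))
```

On the probability-only reading the two hazards carry the same “risk”; on the expected-loss reading they differ by six orders of magnitude. That is the gap for which one needs a second word.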

I wrote to the author, Prof. Fairley, Richard Thayer, the person overall responsible for the SW Glossary, and Merlin Dorfman, I believe the IEEE editor responsible for the section, pointing out de Moivre’s definition, the definition from Nancy Leveson’s book Safeware (Addison-Wesley, 1995), and that from the standard for functional safety of E/E/PE systems, IEC 61508, which all cohere modulo the caveats above.

Here is de Moivre:

The Risk of losing any sum is the reverse of Expectation, and the true measure of it is, the product of the Sum adventured multiplied by the Probability of the Loss

Here is Nancy Leveson:

the hazard level combined with (1) the likelihood of the hazard leading to an accident… and (2) hazard exposure or duration…

[The notion of hazard level is] the combination of severity and likelihood of occurrence.

Here is IEC 61508:

combination of the probability of the occurrence of harm and the severity of that harm

I also copied my note to Fairley in this note to the York Safety-Critical Mailing List.

Dorfman agreed that the definition could be misunderstood, but said that “I believe the reader is given a fair, complete, and accurate picture of the use of terminology in this area.” “Accurate”?

What do you do if you are a software engineer working in safety-critical systems? Use the IEEE SE Glossary definition, or use the IEC 61508 definition? Use different definitions for different meetings, depending on who is there? And what happens if you misjudge your audience?

Thayer was dismissive. The entire content of his reply:

The overall title of the glossary is Software Engineering Glossary.  This covers it I believe. 

In other words, he doesn’t care much for the dilemma of the software engineer working in safety-critical systems. One could well wonder why he is editing this vocabulary if he doesn’t care about such issues.

I responded to Thayer and Dorfman:

The use in finance and in PRA of the notion of risk equates it to the expected value of loss. A partial list of standards that use some version of this notion is

* IEC 61508, the international standard on functional safety of E/E/PE
safety-related systems
* IEC 300, the international standard on dependability management, in
Part 3, Section 9, “Risk analysis of technological systems”
* IEEE 1228, the standard for software safety plans
* the American Institute of Chemical Engineers guidelines for safe
automation of chemical processes
* US DoD MIL STD 882C, System Safety Program Requirements
* USAF Systems Command, Software Risk Abatement
* CENELEC 50129, Railway applications: Safety related electronic systems
for signalling (the European norm for railways; derivative from IEC
61508)
* European Space Agency Glossary of Terms
* UK Ministry of Defence Standards 00-56, safety
management requirements for defence systems; and Def Stan 00-58,
HAZOP studies on systems containing programmable electronics
* German Standards Institute (DIN), DIN-V-VDE 0801, Principles for
computers in safety-related systems

In particular, I expressed my concern that the IEEE as an organisation had publicly given two meanings for risk pertaining to software engineering: one in IEEE 1228 on software safety plans, and another in the Glossary proposed by Prof. Fairley. I got no response.

Prof. Fairley responded, inter alia:

Concerning my definition of risk:  In most, if not all, situations encountered in software engineering, “risk” is the composite result of numerous factors.  In the glossary, I characterize these as “risk factors,” each of which is assigned a probability and an impact (or a range of each).  Risk factors are usually interrelated (e.g., an inaccurate size estimate affects schedule, budget, memory usage; an inaccurate schedule estimate affects product quality) so overall risk (i.e., probability of suffering loss) must be calculated using conditional probabilities or Bayesian analysis.  It is not possible to characterize a situation by a simplistic pair of numbers, unless one is dealing with a narrow, well-defined situation such as a game of chance.  It is dangerous and misleading to attempt to characterize a complex situation in this way.

Given the constraints of a glossary, it was not possible to explain the rationale for my definition or why it differs from the traditional definition; nor was it possible to explain the basis of definition for the other terms in the glossary.

Which to my mind is confused. If risk is “the composite result of a number of factors” each of which is “assigned a probability and an impact”, why ignore the impact and define it as a probability? Either it is a probability simpliciter, or it is the composition of a number of items, each of which exhibits a probability and an “impact”. It can’t be both.

That was it. End of story. The section editor thinks the definition is “accurate”; the Glossary editor is unconcerned; the author is confused. No one seems to worry about the IEEE proposing two incompatible definitions of risk in software contexts.

I wrote to some colleagues I thought might be interested: Dave Parnas, John Knight and Bev Littlewood (as well as a couple of German colleagues), explaining my dissatisfaction with this state of affairs.

Dave sympathised with my frustration, which was similar to his. He said he had seen lots of examples, and that he considered trying to write a glossary for SW terms a fool’s errand, and explained why. John thought this situation to be serious, the Fairley definition of risk wrong, and deserving of public correction. He also said that many people are concerned about a lack of precision and took Dave’s comments to reflect that. Bev strongly agreed with both John and Dave. He was particularly concerned about the dismissive response.

Continuing along the same lines, here is the definition of risk from the US National Research Council study Understanding Risk: Informing Decisions in a Democratic Society (National Academies Press, 1996), p215 (you can read this study on-line):


A concept used to give meaning to things, forces or circumstances that pose danger to people or to what they value. Descriptions of risk are typically stated in terms of the likelihood of harm or loss from a hazard and usually include: an identification of what is “at risk” and may be harmed or lost (e.g., health of human beings or of an ecosystem, personal property, quality of life, ability to carry on an economic activity); the hazard that may occasion this loss; and a judgement about the likelihood that harm will occur.

So descriptions include a likelihood of harm and an identification of what may be harmed or lost. Unless you are a software engineer using the IEEE Glossary (but not IEEE 1228), in which case it’s just a number between 0 and 1.

Here is the definition from a standard text, Probabilistic Risk Assessment and Management for Engineers and Scientists, Hiromitsu Kumamoto and Ernest J. Henley, IEEE Press (them again!) 1996, a book “sponsored by the IEEE Reliability Society”, p2:

Primary Definition of Risk: A weather forecast such as “30% chance of rain tomorrow” gives two outcomes together with their likelihoods: (30%, rain) and (70%, no rain). Risk is defined as a collection of such pairs of likelihoods and outcomes:

{(30%,rain), (70%, no rain)}

So they don’t even go for the combination of likelihood and outcome, nor do they designate certain outcomes as harmful. But if you do designate certain outcomes as harmful, then you can combine these values to calculate de Moivre risk and system-safety risk from this set.
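As a rough sketch of that last step, the following Python takes the Kumamoto and Henley collection of pairs and combines it into an expected value of loss. The loss figures attached to each outcome are hypothetical, chosen only to show the arithmetic; designating which outcomes are harmful, and how harmful, is exactly the part the pairs alone do not give you.

```python
import math

# The textbook example: a collection of (likelihood, outcome) pairs.
forecast = [(0.30, "rain"), (0.70, "no rain")]

# Hypothetical losses per outcome (an assumption for illustration):
# say rain ruins an outdoor event costing 200 units, no rain costs nothing.
loss_if = {"rain": 200.0, "no rain": 0.0}


def expected_loss(pairs, losses):
    """Combine (likelihood, outcome) pairs with per-outcome losses."""
    total_probability = sum(p for p, _ in pairs)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("likelihoods of the outcomes should sum to 1")
    return sum(p * losses[outcome] for p, outcome in pairs)


risk = expected_loss(forecast, loss_if)
print(risk)  # approximately 60 loss units under the assumed figures
```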

The standard textbook Probabilistic Risk Analysis: Foundations and Methods, Tim Bedford and Roger Cooke, Cambridge University Press, 2001 (not the IEEE for a change :-) ), discusses the definition of risk over some three pages in Section 1.2. They base their notion on that of S. Kaplan and B.J. Garrick, On the Quantitative Definition of Risk, Risk Analysis 1:11-27, 1981.

A risk analysis tries to answer the questions
(i) What can happen?
(ii) How likely is it to happen?
(iii) Given that it occurs, what are the consequences?

Kaplan and Garrick … define risk to be a series of scenarios s_i, each of which has a probability p_i and a consequence x_i. If the scenarios are ordered in terms of increasing severity of the consequences, then a risk curve can be plotted [of severity against probability of at least that level of severity]. The risk curve illustrates what is the probability of at least a certain number of casualties in a given year. Kaplan and Garrick … further refine the notion of risk in the following way [to talk about frequency of an event instead of probability, and then uncertainty associated with a frequency]

Again, this concept is somewhat different from that of a number between 0 and 1.
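The construction of such a risk curve can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kaplan and Garrick's own formulation: the scenarios are taken to be mutually exclusive with distinct consequence values, and the three scenarios and their numbers are invented.

```python
from typing import List, Tuple

# (scenario name s_i, probability p_i, consequence x_i)
Scenario = Tuple[str, float, float]


def risk_curve(scenarios: List[Scenario]) -> List[Tuple[float, float]]:
    """Return points (x, probability of a consequence of at least x).

    Assumes mutually exclusive scenarios with distinct consequences.
    """
    ordered = sorted(scenarios, key=lambda s: s[2])  # increasing severity
    points = []
    cumulative = 0.0
    # Walk from the most severe scenario down, accumulating probability.
    for _, p, x in reversed(ordered):
        cumulative += p
        points.append((x, cumulative))
    points.reverse()  # present the curve in order of increasing severity
    return points


# Hypothetical scenarios; consequences in casualties per year.
example = [
    ("serious", 0.01, 10.0),
    ("catastrophic", 0.001, 100.0),
    ("minor", 0.05, 1.0),
]
curve = risk_curve(example)
for x, p_at_least in curve:
    print(x, p_at_least)
```

The result is a list of points, not a single number, which is the contrast with a definition that collapses risk to one probability.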

John suggested I contact the then-editor of IEEE Software, Warren Harrison, which I did. Warren suggested that the appropriate action would be a letter to the editor, allowing the author and the section and glossary editors to respond if they wished.

I never did so. I regret it.

So six and a half years later, here I am writing a blog post on it. I doubt the issue will go away. Neither will this note. I do think the IEEE should work to get its definitional house in order.



John McCarthy

11 11 2011

John McCarthy has died. The great John McCarthy. Brilliant and entertaining, fun to be around, accessible unlike many of his stature, who carried an aura about him which blessed you with the feeling, if you came within it, that you were doing the Thinking That Really Mattered. Even if you were just flapping around at a loss for ideas.

The German Wikipedia describes him as a logician and computer scientist. The English version as a computer scientist and cognitive scientist. The German has it right.

John used to be quite happy to get in discussions with everyone about anything and became well known for it as Internet news groups really got started in the mid-1980s. He had a knack for posing simple questions that turned out to be hard to answer.

And not just in AI. For example, check out his proposal for a new civil right on what counts for him as his personal page:

Remark: Ideally one would put all the information that one considers public about oneself on a page like this. When asked to fill out a form, one would simply put down the URL in place of any information that is on the page and tell the recipient of the form to just look it up.

One step beyond that is that any program needing this public information would just take it from the somewhat standardized web page.

More precisely, here’s a proposed new civil right. No Government agency, educational institution or business should ever be able to require anyone to supply anew information that the institution already has or is publicly available.

Typically for John, it is simple, doable, but somehow not done, and has significant social consequences. Let’s consider it a little further.

There are inadvertent violations. I tried to hand in a technical review of a paper submitted to an IEEE Transactions to the IEEE ScholarOne “system” (I use the word loosely) and found it wouldn’t let me do it without requiring me to fill in a lot of personal information. (I sent the review by email, and someone else has now tweaked the system enough to let me file, apparently.)

But the phenomenon is also used – and this, I suspect, is an insight of John – for political purposes. I had been asked on five or six occasions in the last year by a grant-supported institution with which I am associated to deliver information about activities (publications and talks and so on). Always the same stuff, but somehow not in quite the right format, or not quite the right selection. I began to suspect that someone is looking for a “reason” to ease me out, and so it turned out. Bureaucracy-overload as a political instrument; and of course always deniable.

The focus of this institution is, well, the successor “discipline” (I use the word loosely) to AI. John would have loved it!

Jon Hind informed a mailing list on Tuesday 25th October of the Guardian obituary that had just come out. There is a joint obituary with Dennis Ritchie in The Economist which appeared in the November 5th print edition.

The Economist suggests that John did not suffer fools gladly. That is not quite how it was, as I recall. He engaged with all sorts of people – students mouthing off on Internet bulletin boards, for example, which nobody else did at the time. But he didn’t condescend. Anyone could talk to and argue with John, but he didn’t adjust his intellect to your capabilities – you had to adjust yours to his; for almost everyone an impossibly tall order. As well as being exactly what bright Stanford students need.

The Guardian article seems to me to miss most of what John was about during the 1980s and 1990s (the Economist, unusually, even more). Of course, after the invention of LISP, still the longest living programming language with over half a century of use (C, eleven years its junior, still has to catch up), one could regard anything else as a coda. But it was just a start. I’ll talk about the decade I know about, from the mid-1980s to the mid-1990s.

John had discovered, or invented, the Frame Problem, with Pat Hayes, and then came up with the cleanest purely technical proposal for solving it, Circumscription. Unfortunately, Circumscription didn’t turn out to solve the Frame Problem sui generis, but it did start a little industry all of its own. This little industry frustrated people such as Danny Bobrow, then-editor in the 1980s of the premier journal in the field, Artificial Intelligence. Danny is a programmer through and through, who feels that to do AI you have to build stuff, that is, to program. The Circumscription industry consisted of a largish collection of mostly ex-Eastern European logicians, many of them eminent and all of them both capable and productive, who wrote great technical papers in mathematical logic and sent them to the Artificial Intelligence journal, where of course they had to be sent off to be refereed by other members of the group, and they took over about a third of the journal with all that ***** Math!! All good stuff no doubt, but it didn’t seem to some as if much was getting built………

It mirrored a significant split in AI, indeed in all computer science, which continues to this day. There are people who incline to solve problems intellectually before they solve them practically, and there are people who attempt practical solutions and solve, or resolve, issues as they come up to them. In AI in the 1980s, they were known respectively as “neats” and “scruffies”. The neats have it right in that you cannot program solutions you do not have. The scruffies have it right in that a computational solution to a problem consists in an implementation. You might imagine that they could agree on a division of labor, but research is a little messier than that. The neats have it wrong in that abstraction is also a fine way of subtly changing the problem to fit the solution you happen to have, and the scruffies have it wrong in that a clever programmer can build wonderful programs which fail to solve the problem they set themselves, but “almost” do – the permanent, ineffable “almost”, which turns out to mean “never”.

John’s view on progress was that you knew a field was technically mature when you couldn’t understand the work of someone working on a different problem from you. Let’s turn that on itself. In some sense the division of AI research into neats and scruffies, say Danny’s frustrated view of all that math, could thereby have constituted a proof of some sort of maturity, although the way the squabbles were conducted left many wondering if that was the word for it! Maybe that was John’s point?

And John was the living contradiction to this view on progress. Of course. He could explain to anyone with a modicum of understanding of propositional calculus exactly what he was interested in and what problems he thought were worth solving. Check it out at his group WWW site. They were all so simple! Until you realised that, John being John, if they were really as simple as they looked, he would have solved them already. I recall one evening at the IJCAI conference in Detroit in 1989 when a bunch of us formal people were chatting away after a late dinner. Along comes John. Says, “you know, I was thinking about this…… do you know how to do it?” and posed, as usual, what appeared to be a simple problem in propositional logic. Well, after a few minutes, everyone else made their excuses and left. I couldn’t solve it. Then came another problem. And another. All simple, all propositional logic, all needing to be solved if machines were going to mimic human decision capabilities. And, of course, AI meant that machines should be able to do this, somehow, so even if you programmed them with genetic algorithms or neural-network problem solvers, they would still just have been solving John’s “easy” problems in propositional logic. Surely a problem posed in logic can be solved in logic? Well, sometimes.

This went on for an hour and longer. At such sessions one could choose to feel stupid and frustrated at not being able to solve anything, or to revel in the creativity exhibited before your very eyes. For anyone can solve problems, but very, very few people know how to ask exactly the right questions. John was one. Astonishing performances, puzzles rolling off his tongue as if he were discussing the bus timetable. Anyone – and there were many – who claimed that “symbolic AI” was dead just hadn’t been listening properly. Symbolic AI wasn’t and isn’t “dead”. John’s simple problems need to be solved one way or another. But no longer by him, unfortunately.

Circumscription? Let me have a go. Circumscription is a syntactic (and therefore computationally feasible) way of doing the following. Say you have a description of part of the world in front of you, and what is going on in it. Say your description is in some language which allows deductive inference. Circumscription is a way of drawing rich inferences about features (“predicates”) of that scenario under the supposition that the world doesn’t have anything else in it but those objects expressly described plus whatever else needs to be there for the description to be accurate. Not just rich inferences, more than you could obtain with deduction alone, but rich, correct inferences. To logicians, it is a set of inference rules about what is true in certain minimal models of the set of sentences.
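The minimal-model reading can be illustrated with a toy example in Python. To be clear about what this is and is not: Circumscription proper is a syntactic schema over predicates, whereas the sketch below is a brute-force, purely propositional enumeration of models, and the three atoms and the little bird theory are the usual textbook-style illustration assumed here, not anything taken from John's papers.

```python
from itertools import product

ATOMS = ("bird", "abnormal", "flies")


def theory(m):
    """Toy theory (assumed): it is a bird; birds that are not abnormal fly."""
    premise = m["bird"] and not m["abnormal"]
    return m["bird"] and ((not premise) or m["flies"])


def models(atoms, holds):
    """Enumerate all truth assignments that satisfy the theory."""
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if holds(m):
            yield m


def minimal_models(atoms, holds, minimised):
    """Keep models whose set of true minimised atoms is subset-minimal.

    All atoms outside `minimised` are allowed to vary freely.
    """
    all_models = list(models(atoms, holds))

    def true_set(m):
        return frozenset(a for a in minimised if m[a])

    return [
        m for m in all_models
        if not any(true_set(other) < true_set(m) for other in all_models)
    ]


def entails(candidates, atom):
    """An atom follows if it is true in every candidate model."""
    return all(m[atom] for m in candidates)


classical = list(models(ATOMS, theory))
circumscribed = minimal_models(ATOMS, theory, minimised=("abnormal",))

print(entails(classical, "flies"))      # deduction alone
print(entails(circumscribed, "flies"))  # after minimising "abnormal"
```

Deduction alone cannot conclude that the bird flies, because one model makes it abnormal. Restricting attention to the models in which as little as possible is abnormal licenses the richer inference, and adding a fact that the bird is abnormal would withdraw it again.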

That is logically very important. Modern logic arose with Frege considering the logic of arithmetic, of counting and adding and so on. But in Frege’s logic, it turns out that you can’t just restrict your talk to the positive whole numbers. Circumscription was a way of trying to do just that for “worlds” which had a finite number of objects in them. It resolves many of the issues in the Frame Problem (maybe more appropriately called the Framing Problem), by implicitly defining what you are framing. However, it doesn’t neatly resolve the conundrum posed in 1986 by Steve Hanks and Drew McDermott and known as the Yale Shooting Problem. That was first resolved by using other principles. The conundrum it posed has now dropped out of fashion, as far as I know.

To see the rich tradition around Circumscription, one may look at the Stanford Encyclopedia of Philosophy entries on Non-Monotonic Logic, on Defeasible Reasoning, on Logic and Artificial Intelligence, on the Frame Problem, on Ceteris Paribus laws (that is, rules based on “all other things being equal”).

John was not concerned merely with mimicking intelligence with machines, but with the more elusive reasoning phenomenon of common sense, which occurred in the title to a collection of his papers in 1990. There was a whole branch of AI research devoted to “common-sense” reasoning about the world; which in turn spawned a branch of reasoning called Qualitative Physics: how the world works; check out for example this book chapter by Ken Forbus. (John, though, would have distinguished common-sense physics from qualitative physics.) If you put a round ball on a slanting table, it will roll down the slope and drop off the edge and hit the floor just beyond the edge, and how far beyond depends largely on its speed relative to the table when it drops. This phenomenon is known to every two-year-old a couple of decades before they can understand the Newtonian version, but we adults have far more trouble getting a handle on the qualitative reasoning than we do on Mr. Newton’s mathematics. Yet another delicious irony.

And one could go on. Maybe without end. Qualitative physics will not end; it’s a phenomenally hard problem. It may go out of fashion, but it’ll come back. And somebody will have to solve all those common-sense physics problems as well, and maybe differently. Circumscription didn’t solve the Yale Shooting Problem, but it did open up the study of rigorous forms of defeasible reasoning.

And always there was an irony, a delightful little joke in the tail. Somehow, you never quite knew whether you were thinking about a new subject or an elaborate joke. Look closely at the picture in the Guardian. Can you, also, maybe, see the slight smirk that I always thought I saw? Maybe, just maybe, AI was his very biggest joke of all…..