4 Requirements Errors

This chapter is about requirements errors. When people talk about software errors or bugs, they usually think of programming mistakes. Many software errors, however, are due to requirements engineering mistakes. This chapter defines requirements errors and explains how they differ from implementation errors. It then describes three broad categories of requirements errors (misunderstanding stakeholder goals, relying on invalid domain assumptions, and formulating inadequate machine requirements due to incorrect reasoning) and concludes with two examples of requirements errors that contributed to catastrophic failures: one in an airplane ground braking system and one in an ambulance dispatching system.

4.1 Failures, Defects and Errors

Before talking about requirements errors, we first need to recall what engineers mean by failures, defects and errors. In general engineering terms, a failure is a situation where a system does not behave as required or expected; a defect (also called a fault) is a flaw in an engineering artefact that can cause failures; and an error is a human mistake that introduces a defect.

We will apply these concepts to software engineering, starting with failures.

Machine failures vs. system failures

In software engineering, we must consider two types of failures: machine failures (also known as ‘software failures’) and system failures. A machine failure is a situation where the machine violates its requirements. A system failure is a situation where the system as a whole, composed of the machine and its environment, fails to satisfy some essential stakeholder goals.


Observe that machine failures and system failures are not necessarily linked. A machine failure can happen without a system failure: the machine could violate its requirements without leading to a violation of important stakeholder goals. Conversely, a system failure can happen without a machine failure: the machine could satisfy all its requirements and yet some important stakeholder goal could be violated.

We will later see two examples of system failures without machine failures: the airplane ground braking system involved in the 1993 Warsaw accident and the London Ambulance Service dispatching system that failed in 1992.

In practice, determining whether a system has failed can be subject to interpretation. In some cases, like the two examples above, the system failures are clear. In other situations, different stakeholders may have different views of whether the system has failed or not, or they may argue about the extent of the failure. This is usually due to different views about what stakeholder goals are essential.

Determining machine failures can also be subject to interpretation. Sometimes, the situation is clear-cut: the machine clearly fails to satisfy its expected behaviours. In other cases, people may disagree about whether the machine has failed or not. This is usually because the description of the machine requirements is missing, incomplete, or too vague. These disagreements can be amplified by diverging interests in recognising the presence or absence of a software failure, particularly when the issue involves determining blame and liability in a court case.

The important point is that software engineering is about preventing both machine and system failures. Software engineers are responsible for ensuring that their code meets the requirements and that the system as a whole meets the stakeholder goals. This means that understanding stakeholder goals and defining suitable machine requirements to satisfy these goals are essential parts of software engineering.

Implementation defects vs. requirements defects

We consider two types of defects.


Implementation defects, also known as bugs, are properties of the software code. They are located in the code, and they may cause machine failures. They are detected through testing, program verification, code reviews, or during the machine's operation if they have not been caught earlier.

Requirements defects are characteristics of requirements descriptions. Examples of requirements defects are missing requirements, ambiguous requirements, untestable requirements, infeasible requirements, etc. We will study requirements defects in more detail in Chapter 13.

Requirements defects can cause system failures. If the requirements description is incorrect with respect to essential stakeholder goals, this could lead to a situation where the essential goals are not satisfied, even if the machine satisfies its requirements.

Requirements defects can also cause machine failures. If the requirements description is too vague, incomplete or hard to read, the development team could misunderstand the requirements and write code that violates the intended requirements. The development team can dispute the machine failure by claiming that the machine does not violate the requirements as they are written or as they understood them. For the requirements authors, however, the situation would still be a machine failure: the machine does not behave as they intended.

Not all requirements defects cause machine or system failures (luckily!). Many requirements defects, like minor ambiguities, are innocuous or can easily be resolved during the development process. Some requirements defects, however, can have dramatic consequences or can be very expensive to fix if they are not caught early. Chapter 7 will present empirical studies and expert opinions about the prevalence, criticality, and cost of requirements defects.

Implementation errors vs. requirements errors

The next distinction is between implementation errors and requirements errors. Implementation errors are mistakes made by people when writing the code. Requirements errors are mistakes made when identifying, understanding, or formulating the stakeholder goals, domain assumptions, and machine requirements.


A symptom of requirements errors is a situation where the machine satisfies its requirements, but the system as a whole fails to satisfy some essential stakeholder goals.

Requirements errors can be made in the process of discovering, analysing and writing requirements.

What is less obvious is that requirements errors can also be made by programmers while coding! Because the machine requirements are rarely (if ever) defined completely and unambiguously, programmers have to fill in the gaps. Sometimes they can do this by talking to the requirements engineers, the product owner, or stakeholder representatives, but often they will rely on their own knowledge and beliefs about the application domain and stakeholder goals. In this process, they could make wrong inferences about what needs to be implemented. These are requirements errors because, from the programmer's perspective, if the machine behaves as they intended but not as required, the mistake lies in understanding the requirements, not in writing the code.
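To illustrate, consider a hypothetical requirement (this example is not from the book or any real project) written as "the system must lock the account after three failed login attempts". The requirement does not say whether the failures must be consecutive, so two programmers could quite reasonably write different code:

```python
# Hypothetical illustration: two plausible readings of an under-specified requirement,
# "lock the account after three failed login attempts".

# Reading A: three *consecutive* failures lock the account.
def should_lock_consecutive(attempts: list[bool]) -> bool:
    """attempts[i] is True if the i-th login attempt succeeded, False if it failed."""
    streak = 0
    for succeeded in attempts:
        streak = 0 if succeeded else streak + 1
        if streak >= 3:
            return True
    return False

# Reading B: three failures *in total* lock the account.
def should_lock_total(attempts: list[bool]) -> bool:
    return sum(1 for succeeded in attempts if not succeeded) >= 3

history = [False, True, False, False, True]   # fail, succeed, fail, fail, succeed
print(should_lock_consecutive(history))       # False: never three failures in a row
print(should_lock_total(history))             # True: three failures in total
```

Both functions satisfy the requirement as written. If the reading the programmer picks is not the one the stakeholders intended, the resulting defect is a requirements error, not a coding error, even though it was introduced while coding.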

4.2 Three Types of Requirements Errors

In Chapter 3, we saw that requirements engineering involves three broad activities:

  1. identifying, formulating and agreeing stakeholder goals (Goals),
  2. identifying and formulating valid domain assumptions (Dom), and
  3. formulating machine requirements (Req) such that Req, Dom \(\vdash\) Goals.

Requirements errors can be made in each of these activities, as summarised schematically after the list below:

Three types of requirements errors
  1. Misunderstanding stakeholder goals: the set Goals does not accurately describe the essential stakeholder goals;
  2. Invalid domain assumptions: some assumptions in Dom are not true of the World; and
  3. Inadequate machine requirements due to incorrect reasoning: the machine requirements Req are insufficient to satisfy the stakeholder goals Goals under the domain assumptions Dom; in other words, Req, Dom \(\vdash\) Goals does not hold.
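Using the notation of Chapter 3, the three error types can be summarised as three ways in which the satisfaction argument can break down (this is only a compact restatement of the list above):

\[
\mathit{Req},\ \mathit{Dom} \vdash \mathit{Goals}
\quad\text{breaks down when}\quad
\begin{cases}
\mathit{Goals}\ \text{does not capture the essential stakeholder goals} & \text{(type 1)}\\[2pt]
\text{some assumption in}\ \mathit{Dom}\ \text{is false in the World} & \text{(type 2)}\\[2pt]
\mathit{Req},\ \mathit{Dom} \not\vdash \mathit{Goals} & \text{(type 3)}
\end{cases}
\]

In the first two cases, the argument may be logically valid but proves the wrong thing or relies on a false premise; in the third, the argument itself is flawed.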

These error types highlight once again the importance of discovering and analysing stakeholder goals and domain assumptions. Paying attention to domain assumptions is particularly important because many system failures are caused by invalid assumptions. The next two sections present two such failures: one involving an airplane ground braking system and one involving an ambulance dispatching system.

4.3 Example 1: Airplane Ground Braking System

Consider again the ground braking system’s goals, requirements, and domain assumptions of Section 3.3. The top part of Figure 4.1 recalls how goal G1 is satisfied by requirement R1 under domain assumptions D1 and D2. The logical reasoning is correct: R1, D1, D2 imply G1.
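As a rough propositional sketch, the satisfaction argument has the following shape (the predicate names and exact formulations below are illustrative paraphrases of the Section 3.3 statements, not the book's wording):

\[
\begin{aligned}
\textbf{G1:}\quad & \mathit{BrakingDeployed} \leftrightarrow \mathit{MovingOnRunway}\\
\textbf{D1:}\quad & \mathit{WheelsTurning} \leftrightarrow \mathit{MovingOnRunway}\\
\textbf{D2:}\quad & \mathit{WheelPulsesOn} \leftrightarrow \mathit{WheelsTurning}\\
\textbf{R1:}\quad & \mathit{BrakingDeployed} \leftrightarrow \mathit{WheelPulsesOn}
\end{aligned}
\]

Chaining R1 with D2 and D1 gives \(R1, D1, D2 \vdash G1\): the machine decides on the basis of the wheel pulse sensors, and the two assumptions link the sensor readings to the plane actually moving on the runway.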

However, if the domain assumptions are wrong, a machine satisfying R1 may no longer be sufficient to satisfy G1. The red boxes in Figure 4.1 are called obstacles. They show a situation in which assumption D1 would be invalid: D1 would not hold if the plane is moving on the runway while its wheels are not turning, which could be caused by the plane skidding on a wet runway.

This violation of D1 did happen and was one of the contributing factors to the crash of an Airbus in Warsaw in 1993. The plane landed on a wet runway, the wheels were skidding and, as a result, the software did not allow the pilot to deploy the full braking system for a certain time. This resulted in the plane crashing at the end of the runway.

Obstacle Analysis: A Preview

In practice, one way to reduce errors due to invalid domain assumptions is to examine each domain assumption one by one to identify possible ways in which the assumption could be violated and possible ways to revise or extend the requirements to prevent, reduce or mitigate such violations. This is the idea of a goal-oriented requirements analysis technique called Obstacle Analysis that we will study in Chapter 19. An obstacle is a property that violates some goal or some assumption. For example, the plane skidding on a wet runway is an obstacle to assumption D1. Figure 4.1 shows how obstacles are represented in goal models. The two red boxes are obstacles that block the satisfaction of the assumption: O1 is the negation of D1, and O1.1 is a subobstacle that implies O1. Chapter 19 will describe obstacle analysis in more detail.
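In the same illustrative notation as above (again a paraphrase rather than the book's exact formulation), the two obstacles of Figure 4.1 can be written as:

\[
\begin{aligned}
\textbf{O1:}\quad & \neg D1,\ \text{i.e.}\ \neg(\mathit{WheelsTurning} \leftrightarrow \mathit{MovingOnRunway})\\
\textbf{O1.1:}\quad & \mathit{MovingOnRunway} \wedge \neg\,\mathit{WheelsTurning}\quad\text{(e.g. skidding on a wet runway)}
\end{aligned}
\]

O1.1 implies O1. When O1.1 holds, a machine satisfying R1 keeps the braking disabled even though the plane is moving on the runway, so G1 is violated without any machine failure.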

Airbus A320 Crash in Warsaw in 1993

Our ground braking system example is a simplified version of the real system. The conditions that led to the 1993 Warsaw crash are more complex than described above.

In the real system, different conditions were used to activate the reverse thrust, the ground spoilers, and the wheel brakes. The software checked not only whether the wheels were turning but also whether the struts on the landing gears were compressed. For example, one of the machine requirements looked like FR1 from Chapter 1:

[FR1] The ground braking system must be enabled when, and only when, at least one of the following conditions is met:

  1. the wheels on both main landing gears are turning, or
  2. the shock absorbers on both main landing gears are compressed.

The domain assumption on which the system relied was that, during landing, the two main landing gears of the plane would be on the ground with either their wheels turning or their shock absorbers compressed. That assumption turned out to be wrong on that particular landing. On that day, due to heavy rain and wind, the plane landed on a single landing gear and the wheels skidded on the wet runway, which meant that the software did not allow the pilot to brake for 9 seconds after touchdown. The accident investigation report attributed the causes of the accident to a combination of pilot errors and inadequate software design. Further information about the accident can be found in the end-of-chapter notes.
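The shape of this logic can be sketched in code. The sketch below is a simplified illustration under assumed signal names and an assumed wheel-speed threshold; it is not the actual A320 implementation:

```python
# Simplified sketch of an FR1-style enabling condition (illustrative, not the real A320 logic).
from dataclasses import dataclass

@dataclass
class MainLandingGear:
    wheel_speed_kts: float    # measured wheel rotation speed, in knots
    strut_compressed: bool    # shock absorber compressed by the weight of the plane

def ground_braking_enabled(left: MainLandingGear, right: MainLandingGear,
                           wheel_speed_threshold_kts: float = 72.0) -> bool:
    """Enable ground braking when, and only when, at least one condition holds:
    (1) the wheels on both main landing gears are turning above a threshold, or
    (2) the shock absorbers on both main landing gears are compressed."""
    wheels_turning = (left.wheel_speed_kts > wheel_speed_threshold_kts
                      and right.wheel_speed_kts > wheel_speed_threshold_kts)
    struts_compressed = left.strut_compressed and right.strut_compressed
    return wheels_turning or struts_compressed

# A Warsaw-like scenario: the plane touches down on one gear in a strong crosswind
# and its wheels aquaplane on the wet runway.
left = MainLandingGear(wheel_speed_kts=10.0, strut_compressed=True)    # wheels skidding
right = MainLandingGear(wheel_speed_kts=5.0, strut_compressed=False)   # gear not yet loaded
print(ground_braking_enabled(left, right))   # False: braking stays disabled
```

A machine faithfully implementing this kind of requirement keeps braking disabled while the plane is speeding down the runway: the failure lies in the domain assumption that both gears would be loaded or both sets of wheels turning on landing, not in the code.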

4.4 Example 2: 1992 Failure of the London Ambulance Service

Let’s now look at our second example: the failure of the London Ambulance Service in 1992. The following description is based on the detailed report of an inquiry that analysed the failure.

The Previous Paper-Based System

Up until 1992, the London Ambulance Service used a paper-based system to allocate ambulances: call takers recorded incident details on paper forms, the forms were passed between control-room staff who identified the incident location and the nearest available ambulance, and the chosen ambulance was then mobilised by telephone or radio.

This paper-based system was slow, laborious and inefficient. The ambulance service was thus unable to meet the Government target requiring that, for at least 95% of incidents, an ambulance arrives at the incident scene within 14 minutes after the first call.

The New Computer-Based System

The ambulance service decided to replace the paper-based system with a new computer-based system combining Computer-Aided Despatch (CAD) software in the control room, a vehicle location system (using GPS) to track the position of every ambulance, and Mobile Data Terminals inside the ambulances for communication between the crews and the CAD software.

Following a procurement process, the contract for building this system was awarded to a consortium led by a small software development company with no previous experience with similar systems. Because of political pressure, the project had a very short deadline and had to be launched in a single phase. There was no time for small-scale trials.

The system was launched on time, but things did not work as planned. Instead of arriving faster, ambulances started to take gradually longer to arrive. In one case, the ambulance arrived 11 hours after the call. Eventually, the system suffered a complete failure. Call handlers had to improvise to recover the situation. They could no longer use the paper-based system because the infrastructure had been disassembled.

Goals, Requirements, Assumptions

To understand the failure, let’s first look at the goals, requirements and assumptions for this system.

The main stakeholder goal is:

(G1) An ambulance should arrive at the incident scene within 14 minutes after the first call reporting the incident.

To contribute to that goal, the Computer-Aided Despatch software relies on a series of domain assumptions, including:

(D1) Call takers encode the incident’s details and location correctly.

(D2) The GPS gives the correct ambulance location.

(D3) When an ambulance is allocated to an incident, the ambulance crew drives the ambulance to the incident location.

(D4) When the ambulance arrives at the incident location, the ambulance crew signals arrival on their Mobile Data Terminal.

The software requirements were such that one can be reasonably convinced that if the software satisfies its requirements and the domain assumptions are valid, then the stakeholder goal will be satisfied. So far, so good.
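To see how this satisfaction argument depends on the assumptions, here is a minimal sketch of the kind of allocation rule a Computer-Aided Despatch system might apply. The data structures and the nearest-ambulance rule are illustrative assumptions, not the design of the 1992 system:

```python
# Illustrative sketch: allocate the closest available ambulance according to the
# positions recorded by the software (which are not necessarily the true positions).
import math
from dataclasses import dataclass

@dataclass
class Ambulance:
    call_sign: str
    recorded_position: tuple   # (x, y) position as known to the software (assumption D2)
    available: bool            # availability as known to the software (assumptions D3, D4)

def allocate(recorded_incident_position: tuple, fleet: list) -> Ambulance:
    """Return the available ambulance closest to the *recorded* incident position (D1)."""
    candidates = [a for a in fleet if a.available]
    return min(candidates,
               key=lambda a: math.dist(a.recorded_position, recorded_incident_position))

fleet = [Ambulance("A1", (0.0, 0.0), True), Ambulance("A2", (5.0, 5.0), True)]
print(allocate((4.0, 4.0), fleet).call_sign)   # "A2", according to the recorded data
```

Such a requirement can be fully satisfied by the machine while goal G1 is still violated: if the recorded incident location is wrong (D1), the recorded ambulance positions are stale (D2), or the availability information is wrong because crews do not signal their status (D3, D4), the software correctly allocates the wrong ambulance.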

Invalid Domain Assumptions

Unfortunately, many of these domain assumptions turned out to be incorrect. The software often had wrong or out-of-date information about ambulance locations and statuses, crews did not always drive the allocated ambulance to the allocated incident, and they did not always press the right buttons on their Mobile Data Terminals to signal their status.

The global effect was a complete system failure, as ambulances gradually took longer to arrive at incidents, up to the point where the control room decided to stop using the software and improvised a paper-based solution.

The Inquiry Report

Soon after the failure, an inquiry was set up to investigate the causes. The publicly available inquiry report provides detailed information about the whole project, from procurement to operation. It describes the many errors that led to the system failure. One of the main conclusions is that the system failure was not due to a software failure:

“On 26 and 27 October 1992, the computer system itself did not fail in a technical sense. Response times did on occasions become unacceptable, but overall the system did what it had been designed to do. However, much of the design had fatal flaws that would, and did, cumulatively lead to all of the symptoms of systems failure.”

The report identifies many other factors that contributed to the system’s failure, such as the unrealistic timetable, a procurement process driven by cost rather than quality, the contractor’s lack of experience, incomplete testing, inadequate training of control-room staff and ambulance crews, and poor project management. Many of these factors reflect common problems that we continue to see in software projects today.

Reading the inquiry report provides valuable insights into what can go wrong in the procurement and management of software projects. These insights can help you recognise the early signs of a dysfunctional project if you ever encounter one in your career.

4.5 Notes and Further Reading

The Airbus ground braking system was first used to illustrate the role and importance of domain assumptions in software development in Michael Jackson’s ICSE’95 keynote talk (Jackson 1995b) and his requirements engineering book (Jackson 1995a). Peter B. Ladkin made the report about this accident publicly available (Link) and published an analysis of the technical description of the Airbus A320 Braking System (Ladkin 1995). Ian Sommerville discusses the accident from a software engineering perspective in an informative video (Link). You can read about later evolutions of the A320 ground spoiler logic in the Airbus Safety First magazine (Boissenin and Salavy 2010).

Details of the 1992 failure of the London Ambulance Service can be found in the inquiry report (South West Thames Regional Health Authority 1993). Anthony Finkelstein and John Dowell gathered a series of resources related to this failure (Link) and introduced that system as a case study for the software engineering community (Finkelstein and Dowell 1996). Countless research papers have used that system as a motivating or illustrative example, including the work from my PhD thesis (Letier 2001).

Boissenin, Stéphane, and Elisabeth Salavy. 2010. “A320 Family/Evolution of Groundspoiler Logic.” Safety First Magazine 09.

Finkelstein, Anthony, and John Dowell. 1996. “A Comedy of Errors: The London Ambulance Service Case Study.” In Proceedings of the 8th International Workshop on Software Specification and Design.

Jackson, Michael. 1995a. Software Requirements & Specifications: A Lexicon of Practice, Principles and Prejudices. ACM Press/Addison-Wesley Publishing Co.

———. 1995b. “The World and the Machine.” In 17th International Conference on Software Engineering (ICSE’95).

Ladkin, Peter B. 1995. “Analysis of a Technical Description of the Airbus A320 Braking System.” High Integrity Systems 1: 331–50.

Letier, Emmanuel. 2001. “Reasoning about Agents in Goal-Oriented Requirements Engineering.” PhD thesis, Université catholique de Louvain.