Diagnostic skill in process operation

(Note : I use the word 'skill' here to mean a developing level of ability between beginner and expert, not to label a specific type of cognitive processing.)

In process operation, fault diagnosis can be a crucial skill, and it is of increasing interest because the operators’ main task in automated systems may be to deal with things that go wrong.

This paper was published in 1984. Some papers published up to the early 90s are listed at the end.

Hopefully there has been much relevant work on this since then. And operators’ control consoles now have digital displays and keyboard controls, rather than a room full of dials and knobs. But I think the studies done up to that time still raise many important points.

The cognitive processes people use in diagnosing have been studied in detail in a couple of tasks - faults in dynamic continuous processes, and faults in electronic equipment. These studies show that different cognitive processes are used in each, so it is not helpful to extrapolate from one to the other. 'Diagnosis' is not done in the same way in all contexts.

When something goes wrong in an industrial process, process operators are not only diagnosing but simultaneously trying to keep a dynamic process under control, and the process’ responses to control actions may provide information about what is wrong, so diagnosis and compensation are not clearly distinct. Also they only work with one process, about which they have a great deal of specific knowledge.

Maintenance technicians, by contrast, mostly do not have to control as well as diagnose, and they need a diagnostic strategy which works for many different types of equipment.

So rather different models of their thinking may be needed to represent what they are doing, and different display formats may best support their activities.

Topics and main points :

Introduction - during plant faults, operators have to both diagnose the fault and control the process, and these are not independent. The process’ behaviour in response to efforts at fault management is part of the evidence used in diagnosis.

Evidence about operators’ fault management :

Operator aspects of nuclear power plant incidents - summary of the operators’ thinking during 6 real nuclear power plant incidents, identified from post-event interviews.

Review of the difficulties operators had with diagnosis and fault recovery in those incidents, and the practical implications.

Some comments on the data required from operator studies.

Diagnosis and compensation behaviour in a simulated incident.

General methods of diagnosis by process operators :

- origin of hypotheses about fault; testing hypotheses;

- words and models used to describe cognitive processes - comments on the adequacy of skill/rule/knowledge based accounts.

Form of the operators' knowledge, and display support :

- images or sentences;

- process information used in diagnosis - variables or components, comparison of diagnosis by maintenance technicians and process operators leads to different display recommendations;

- what types of information are used by the operators, and what might be the best form of display for them; problems because it may not be easy to map from one representation to another;

- levels of knowledge - aggregation/abstraction, are they sufficiently distinct or stable in an operator’s thinking to design specific displays.

General aspects of interface design and performance prediction : perceptual-motor skill; predicting performance and working memory capacity.

DIAGNOSTIC SKILL IN PROCESS OPERATION

Lisanne Bainbridge

Department of Psychology, University College London

August 1984

Proceedings of the 1984 International Conference on Occupational Ergonomics, Volume 2 : Reviews. May 7-9, Toronto, Canada, pp. 1-10.

Introduction

Interest in the ergonomics of process control has increased recently. This has been partly as a result of the Three Mile Island nuclear power plant incident, in which operators made errors which could be attributed to inadequacies in both interface and training (Malone et al, 1980). Also, the recently developed potential for supporting human decision making by computer has led control system designers to ask what form this support should take.

Earlier research focussed on the process operator as a controller (e.g. Edwards & Lees, 1974; Bainbridge, 1981). In large modern processes, the operator is expected mainly to deal with infrequent plant transients such as start up, shut down, and system failure. The operator has to detect and diagnose as well as to recover from system failure, so the emphasis is on their cognitive functions ('cognitive' refers to memory, attention, interpretation and thinking).

Recent theoretical work has emphasised diagnosis rather than control (e.g. Rasmussen & Rouse, 1981), although the reason for this emphasis is not clear, as studies of operator behaviour during real incidents (see below) show that the operator may have more difficulty during compensation/ recovery/ fault management than with diagnosis. It is frequently stated that perceptual-motor control skills are no longer used in process control but in fact this is not the case. The operator does have to use control skills in these transient situations (e.g. Ainsworth & Whitfield, 1983), and has little opportunity for practice.

This paper will cover two main topics : the way operators deal with system failure, and the implications of these findings for interface design, under the following headings :

- Diagnosis and compensation during process faults

- Operator aspects of nuclear power plant incidents.

- Data required from operator studies.

- Diagnosis and compensation behaviour in a simulated incident.

- General methods of diagnosis by process operators.

- Interface support.

- Form of the operators' knowledge.

- General aspects of interface design and performance prediction.

- - -

Diagnosis And Compensation During Process Faults

Operator Aspects Of Nuclear Power Plant Incidents

The six nuclear power plant incidents commented on here have been reported as follows :

These reports give detailed post-event analyses, made with operators, of what happened during each incident. Reference numbers below refer to numbered decisions in the above reports.

Operator activities in these incidents :

Fault diagnosis :

In 3 out of 6 of these incidents there were no problems with diagnosing the system failure.

In at least two cases the operators considered alternative causes for the plant failure symptoms, and looked for confirming evidence (PI/1 and G).

There were 3 diagnostic failures :

1. At TMI-2, diagnosis took a long time because an indicator implied that a valve was closed when it was actually open.

2. In OY/2 and OY/3 one operator assumed wrongly that another operator had followed earlier instructions correctly, and made further decisions on this basis. (It is possible that interface design led to the initial error).

3. In G/3 the operators accepted inadequate evidence as confirmation of their hypothesis about the fault, possibly because they were busy.

Compensation for system failure :

The operators predicted the effects of alternative actions in order to choose between them.

In two cases they clearly showed good control (NA/2, OC/4).

There were various reasons why the operators' other action decisions were made under high uncertainty :

1. The action had unpredictable and risky effects (PI/3).

2. The displays gave inadequate information about present system state, either because of poor interface design (OY/1, G/1B, OY/2, OY/3, OC/3, G/5, TMI/1, TMI/3) or through instrumentation failure (OC/2).

3. The operator assumed wrongly that another operator had made the correct actions (OY/2 and OY/3), or that the process automatics had functioned correctly (OC/3, TMI/2).

4. The operators could think through the direction of change in a cause-effect chain, that is whether variables increase or decrease (for a formal representation of such chains see Nakamura et al, 1982). However the operators did not have sufficient knowledge of process dynamics to predict the size and timing of these effects (NA/2, G/2, G/5).

5. The operators could think through a cause-effect chain, but did not have enough knowledge about the conditions in which some actions should not be used (G/1C1, TMI/1).

6. The operator did not know enough about cause-effect chains in the plant, due to inadequate training (TMI/1).

7. The reports show that operators did not follow 'procedures' blindly, but thought out the effects of suggested actions and assessed whether they were appropriate. There were several occasions when the operators had difficulty in deciding whether the required procedure was the best action in the circumstances (PI/3, NA/2, G/1A, TMI/4).

8. The operators were sometimes distracted (NA/1, NA/3) and sometimes preoccupied (G/1C2, G/3).
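Point 4 in the list above separates knowing the direction of change along a cause-effect chain from knowing its size and timing. The following is a minimal sketch of direction-only reasoning, not a model taken from the incident reports or from Nakamura et al. The variable names and the signs on the links are hypothetical and chosen only for illustration.

```python
# Hypothetical signed cause-effect links between process variables.
# +1 means the effect moves in the same direction as the cause,
# -1 means it moves in the opposite direction.
# Names and signs are illustrative assumptions, not real plant data.
LINKS = {
    "feed_flow": [("coolant_level", +1)],
    "coolant_level": [("core_temperature", -1)],
    "core_temperature": [("steam_pressure", +1)],
}


def propagate_direction(start, direction, links=LINKS):
    """Return the qualitative direction of change (+1 up, -1 down)
    for every variable reachable from `start`."""
    result = {start: direction}
    stack = [start]
    while stack:
        cause = stack.pop()
        for effect, sign in links.get(cause, []):
            if effect not in result:
                # Direction of the effect = direction of the cause times link sign.
                result[effect] = result[cause] * sign
                stack.append(effect)
    return result


# Example: feed flow falls (-1); which way does each downstream variable move?
effects = propagate_direction("feed_flow", -1)
```

The sketch yields only whether each variable rises or falls. It holds no gains or time lags, so it cannot say how large a change will be or when it will arrive, which is the knowledge the operators were reported to lack.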

Practical Implications

These findings suggest several practical recommendations :

A. Operator faith in automatic equipment can be misplaced, so it could be a mistake to allocate task functions or train operators on the assumption that automatic equipment is failure proof, unless there is adequate back-up equipment.

B. The operators' control activities during plant transients must be better supported. The findings suggest that operators usually know about the causal chains in the process, but not enough about the dynamics (size and timing) of the process changes. The operators would have less difficulty in evaluating alternative actions if they had less uncertainty about the future development of the present plant state, and the effect of actions on it.

Information about causal chains is intellectual, and can be learned from lectures or conversations.

Information about dynamics is a 'feel' skill which can best be learned through hands-on experience with a well designed interface. It is important to ensure that operators can identify time lags and rates of change from the interface. It is also interesting to ask whether it is possible for the operators to learn 'feel' skills by using a keyboard and button pressing sequence to indicate the size of the control action they want to make, rather than a control on which effort of movement correlates with size of effect.

C. The studies showed that procedures could be difficult to look up, and ambiguous. Also :

1. The procedures were treated as advice, but the operators could not evaluate alternative actions without uncertainty and therefore risk, when they knew too little about plant dynamics.

2. The operators assessed whether a procedure did not allow for the particular circumstances and should therefore be overridden. Process operators are frequently instructed to take this approach. The operators did not always think that NRC procedures suggested the best action, but these actions have a regulatory status so the operator was under extra stress in evaluating them.

3. In each case where the operators questioned the use of a procedure they were concerned that if they followed it they would lose some important and preferred controlling function, such as a method of cooling. This suggests that operator training should include experience of controlling the plant when major control functions are not available, so that they have a better basis for evaluating whether loss of a function will be crucial.

D. Distraction is known to be a major source of human error (Reason & Mycielska, 1982). Several ways of combating this can be suggested.

1. Interface design should support considering all eventualities, and give feedback that actions have been made.

2. Operators should have experience of completing tasks, when they have other competing tasks of high importance, so that they are aware of distraction as a human limitation.

Data Required From Operator Studies

The above reports all contain detailed information about what the operators thought and did, from which readers can draw their own conclusions about what was going on. At this stage of our understanding, when there are no generally accepted concepts for cognitive tasks, let alone a theory sufficient to account for them, then detailed data is necessary. For example, it is not adequate to study only incident reports, from which one is simply likely to discover the implicit accident theory of the people making out the reports.

In considering support for decision making, the crucial questions are about how the decision making is done, not just the end result. We need information about the operators' understanding of the situation and their thinking, not only a record of their actions. (Duncan, 1982, distinguishes 'process' models - models of thinking, from 'product' models - models of actions.)

For example, in Hollnagel's (1981) study of verbal ('think aloud') protocols collected during simulated nuclear plant operation, the operator said (S1 at 03.15) :

'so if I run it [the rod bank] up now, then I have to take in some water to get the rods back in again'.

This appears in the activity summary as :

determine status of system

increase water batch.

The protocol shows that the operator is anticipating the need for an action; he is not looking at the system state without prior expectations or intentions and then deciding to respond by increasing the water flow. If only the activity summary had been reported in this study, it would not be possible for a reader of this report to make this interpretation.

Pew et al (1981) divided the operators' behaviour into categories of : available information, event signalled, knowledge or belief about system state, intention, expectation, decision/action, source for decision/action, immediate feedback.

Woods (1982) used : detect, interpret, control, feedback.

Both these reports give a brief summary of the behaviour in the category, so these categories provide a structured precis of events, described with a cognitive emphasis.

If verbal protocols, rather than interviews, have been collected, they should be presented in expanded form. Operators tend to talk cryptically, and say things like 'I must do that because ....', without completing the phrase. The reason must be obtained from an interview, and the meaning of 'that' must be identified, for use by a reader who does not know the plant well. A diagram of the plant and components mentioned is also necessary for a reader not familiar with the industry.

Diagnosis And Compensation Behaviour In A Simulated Incident

The reports on the nuclear plant incidents give rich information about what the operators did. The analyses were however made after the event. The operators' reports may have been influenced by changes in memory for events. Also, in post-event interviews people can give reasons which were not thought about explicitly while doing the task (e.g. Bainbridge, 1981). Post-event interviews may not therefore give valid information about whether operators reached their conclusions unconsciously or by thinking through at the time. Post-incident interviews may also focus on the strategies used to find the fault, while at the time the operators may have focussed on explaining or anticipating unusual process behaviour.

The above reports must therefore be supplemented by data gathered during diagnosis and compensation, rather than after the event. I know of only one study (on a full scale simulator, using experienced operators who did not know which fault to expect so that the situation was as much like a real incident as possible) in which detailed protocols have been collected during response to the failure, and analysed from a cognitive viewpoint. Page et al (1983) studied a team of 3 commissioning engineers working in a PWR training simulator.

In this 'incident', the first few things to happen were :

1. audible and visual alarms indicated that a pump had failed.

2. the Shift Leader initiated the procedures for stabilising the plant in response to this failure.

3. the Shift Leader asked the Reactor and Turbine Operators to monitor for the possible effect of the out-of-action pump.

4. he then telephoned the technician on the plant to ask him to find out what was wrong with the pump.

Even this brief extract shows that 'diagnosis' and 'recovery' are not single processes, but are general words for several different types of activity.

There are three ways of detecting changes such as non-normal plant conditions :

1. responding to an alarm. To psychologists this is an 'orienting response'.

2. thinking of something which needs to be checked. This is active attention, or hypothesis testing, and is difficult to distinguish from diagnosis.

3. incidentally noticing that something is wrong while doing something else. In psychology and artificial intelligence this may be called the operation of a 'demon' (Charniak, 1972). [In some process operation studies this has been called 'serendipity'.]

At this stage in this incident these operators thought that they knew what was wrong with the plant (information that this was not true only appeared later). Their main concern was to maintain system integrity, to prevent the 'disturbance' from becoming large enough to set off the shut-down systems. This is an important reminder. The six real incidents above were analysed because the reactor had tripped and a dangerous possibility had arisen. As the worst failure in a PWR can develop within a few seconds it is of course important to design the safety systems to cope with this, but this should not distract from the fact that in many more everyday failures the operators are not dealing with a situation in which the shut-down safety systems have operated.

The operators in this case knew what part of the plant was not functioning correctly, at the level of the component which caused the abnormal process behaviour, i.e. the pump, but did not investigate in more detail. A technician was given the task of finding out which component within the pump needed to be replaced or repaired.

There can be several phases of stabilisation/ compensation/ recovery, e.g. :

1. The operator tries to keep the process in, or return it to, a stable state.

The operator may know what is wrong and how this affects plant behaviour ('compensation for fault', Rouse, 1982),

or the operator may not know what is wrong ('compensation for symptoms'). Compensation for symptoms may be necessary either because the operator is still trying to diagnose the fault, using in part the information gained by trying to stabilise it, or because the operator does not have adequate knowledge of plant causality in this fault situation.

2. a technician repairs the faulty component.

3. the operator brings the process back up to its normal operating level.

- - -

General Methods Of Diagnosis By Process Operators

Diagnosis is a form of problem solving: the operators have hypotheses about what is wrong, and these hypotheses have to be tested. General models of problem solving can include a first stage of devising a problem solving strategy. Experienced process operators do not appear to do this, which suggests that their general strategies are already developed.

Origin of hypotheses

The models for how operators produce their hypotheses about the reasons for plant failure, which are reviewed by Rouse (1982), are of two basic types :

The selection of things to consider further could arise 'unconsciously', that is they may be thought of without any conscious awareness of the mental processes by which the alternatives were suggested,

or they may occur as a result of thinking explicitly about the potential alternatives.

It is known that the unconscious process can be a highly efficient way of suggesting appropriate behaviour, given experience, and it is an important form of cognitive skill.

Before asking which of these methods is used by experienced process operators, we need a method for identifying them. Any method has to depend on reports by the operators, and 'introspection' has well known difficulties. For present purposes I suggest a 'negative' method of inferring from verbal protocols. We know, from the protocols of individual operators and the conversations of teams, that the operators do explicitly think through the effects of causal chains in the process when comparing alternative actions during compensation. I suggest that if material of this sort does not appear in the protocols, then the operators have not explicitly thought through causal chains to identify possibilities. Of course this is weak evidence about the operators' conscious experience, but it is clearly identifiable.

On this basis, there is unpublished evidence from Page et al's (1983) study that the operators think of the hypotheses to test by unconscious cognitive skill [i.e. they know what to do without thinking it through]. If this is so, then a major implication is that studies of plant fault diagnosis must be done using skilled operators, as extrapolation from the methods of inexperienced operators may be invalid. The efficiency of this unconscious process will depend on the operators' experience of faults and knowledge of the process plant, so may be incomplete and must be supported by the interface design.

Two further points must be made.

One is that in the Page et al example the operators were commissioning engineers, who would be expected to have more experience of dealing with plant failure than the average operator.

The second point is about the type of fault training. Training of English operators, at least until 1981, was in the form : they are told what fault has occurred, and trained to work out what the process behaviour will be as a result. This is the reverse of a real fault situation, in which they see the abnormal process behaviour and have to find out what caused it. Unfortunately cognitive processes are not instantly reversible. It might be that one would find explicit reasoning sequences in the generation of hypotheses by operators who had been trained to think about faults the appropriate way round. However German operators receive mixed training, with some faults presented without prior warning, and these operators still appear not to reason explicitly in the symptom-fault direction. (Reasoning from event to effect is used by the operators in the compensation part of their task, when they anticipate the effects of actions in order to choose between them. This is the easier direction to handle as it reduces the combinatorial explosion of possibilities to consider.)

Testing the hypotheses

There are three ways in which the operators could test their hypotheses about what is wrong with the plant :

a. by checking the interface or the plant for direct information about whether the hypothesised faulty component is working correctly. There were many examples of this in the 6 real incidents and in the Page et al study.

b. by deliberately making a change to the process which will have one anticipated and useful effect if the hypothesised component is faulty and another if it is not. There was one example of this in the 7 analyses which give information about the operators' intentions (OY, time 1421-3). This is evaluative diagnosis (see below). There is [happily !] no example in these process operation incidents where the operator 'injects a test signal' into the process just to see what happens, as maintenance technicians do.

c. Thinking through and evaluating predicted consequences only seems to occur (in these examples) during fault compensation, as analysed above. Although both diagnosis and compensation are problem solving situations, the hypothesis testing stages are essentially different.

During diagnosis the hypotheses are about the state of the external world, which must be checked directly.

During compensation, the hypotheses are about 'good' actions, and evaluating the 'goodness' of a proposed action consists of mentally thinking of its consequences and comparing these predictions with known criteria.

Cognitive skill

The above analysis makes use of the notion that operators could think of hypotheses to test either unconsciously/ automatically, or by thinking through causal chains. This is a superficial categorisation of the possibilities, which is convenient for this level of discussion but is inadequate as an account of the nature of cognitive skill ['skill' meaning amount of expertise, rather than a particular type of processing]. Whether an operator uses automatic skilled behaviour, or thinks out what to do (which can itself be more or less skilled) depends on their experience and on the unpredictability of the environment. A highly skilled operator is more likely to act automatically, but should be able to change freely to 'thinking it out' methods if something unusual happens. Operators should use both interchangeably as required. There is some discussion of this flexibility in Bainbridge (1978).

[Issues with the words and models used to describe cognitive processes]

The account of cognitive behaviour which control engineers may be most familiar with is that given by Rasmussen (e.g. 1983a), which is simply based on this automatic/think through distinction. This simple categorisation does have value. Pew et al (1981) found both Rasmussen's behaviour taxonomy and his pyramid model were useful in explaining to operators what Pew et al were interested in finding out about. The diagrams are useful for giving a basic idea, to people who know nothing about cognitive processes, of the sequence of stages in making a decision, the ways a decision can be made, and the flexibility of the processes. However they do not give an account of cognitive mechanisms which is sufficient for a specialist making design decisions. Two types of way in which Rasmussen's account is incomplete can be illustrated.

One is the interpretation of the words 'skill', 'rule', and 'knowledge'. Rasmussen (e.g. 1983a) suggests three main types of cognitive behaviour, which he calls skill based, rule based, and knowledge based. There are problems with using this taxonomy.

For example, when an operator detects abnormal plant behaviour they may immediately think, without conscious deliberation, what faults it could be due to. This immediate thought, in which the person is not aware of the processes mediating between input and response, might be described by a psychologist as cognitive skill. It is conditional, i.e. 'if x then y', behaviour, so it could be described as a rule or production system. It is also knowledge based, in the sense that it could only be done effectively by someone who knows a great deal about the process.

Inversely, the word 'rule' may be used to describe :

following a given procedure or 'algorithm', or

using a standard method which has developed on the basis of experience, or

using a 'heuristic' or 'rule of thumb'.

In accounts of 'rule' and 'knowledge' based behaviour it seems to be implicit that an 'if variable value is i then do action j ' sequence is an example of a 'rule', while 'if variable A changes then variable B changes' or 'if component x changes then variable y changes' are 'knowledge'. In an expert system these could all be rules in the knowledge base.
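The point that all three kinds of statement could sit in one knowledge base can be illustrated with a minimal sketch. This is an assumption-laden toy, not a description of any particular expert system: the `Rule` class, the `forward_chain` helper, and the rule contents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    condition: str   # a fact that must already be known
    conclusion: str  # a fact (or recommended action) that is then added


# All three kinds of statement share one 'if ... then ...' form.
# The contents are hypothetical examples, not real plant knowledge.
KNOWLEDGE_BASE = [
    Rule("level is low", "do: open feed valve"),   # variable value -> action
    Rule("flow changes", "level changes"),         # variable -> variable
    Rule("pump speed changes", "flow changes"),    # component -> variable
]


def forward_chain(facts, rules=KNOWLEDGE_BASE):
    """Apply rules repeatedly until no new conclusions can be added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.condition in known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


derived = forward_chain({"pump speed changes"})
```

Nothing in the data structure distinguishes a 'rule' from 'knowledge'; the distinction lies only in how a reader labels the content of each entry.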

These are not just semantic quibbles. The difficulties arise because more than 3 different types of processing are required of the operator, and it is difficult to find a way of assigning them to only 3 categories which people will agree on. In the taxonomies which have proved most useful in analysing real tasks, the categories [labels for types of cognitive processing] used have distinguished the function of the behaviour within the operator's thinking (e.g. Pew et al, 1981), rather than making assumptions about mechanism. For example, Pew et al (1981) distinguished : knowledge or belief about the process state, intention, expectation, decision. Whether it is appropriate to use more detailed categories of cognitive behaviour (such as compare, explain, recall) depends on the task being investigated and the purpose of the study, e.g. Umbers (1975), Ainsworth & Whitfield (1983).

The second problem [apart from problems with the meaning of the words 'skill', 'rule', 'knowledge'] is with the associated model for the organisation of cognitive behaviour, described by Rasmussen as a pyramid. Again people with no knowledge of cognitive processes find this gives them useful insights, but it does not contain mechanisms sufficient to account for complex behaviour.

In Rasmussen's 'pyramid' model (e.g. 1983b) he places 'skilled' behaviour at the base of the pyramid and 'knowledge' based behaviour at the top. Indeed anyone with an academic background is taught to consider conscious problem solving as the highest form of mental activity. However, some of the most important contributions to problem solving come in 'eureka' or 'creative' experiences, when a solution appears without any conscious thinking activity. [Evidently the person has been doing mental processing about the problem unconsciously.]

Rasmussen shows various routes through his three behaviour types, from the input stimulus to the output action. All the routes through the model are from stimulus to response ['bottom up']. This can be misleading as it does not include most of the ['top down' - using existing knowledge to initiate or sequence behaviour] aspects of cognitive behaviour which make human thinking so powerful, and does not recognise important aspects of its flexibility which must be supported by interface design.

[more on this in later papers, see 'Development of skill', and 'Change in concepts', also 'Types of Representation', 'Multi-plexed VDTs', 'Planning Training']

There is also the issue that Rasmussen studied the detailed cognitive processes used in diagnosis by maintenance technicians. Such people do not simultaneously have to keep something under control, so they do not need a 'mental picture' of the current state of the device and how this relates to its dynamic behaviour. That means there is no need in his cognitive models for 'working storage', which is such an important part of the cognitive processes of people doing dynamic tasks.

At least the following mechanisms affect the operators' behaviour relative to current goals and anticipated events :

1. Feedback : Feedback of information obtained as a result of doing the action, so errors can be identified and parts of the previous behaviour repeated.

2. Recursion : While someone is solving a problem they may come across another problem which must be solved before they can solve the first one. This embedding of the same operation within itself is called recursion.

3. Mental simulation and anticipation : the primary use of a 'mental model' of the behaviour of the outside world is as a basis for preplanning, anticipating events, or for thinking through the effects of an action to evaluate it before a signal arrives or an action is made.

4. Working memory and multiple goals : All the foregoing types of behaviour are coordinated by reference to the person's knowledge of the actual and desired state of the external world. Goals can be adapted to the present possibilities (e.g. Hayes-Roth & Hayes-Roth, 1979).
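Of the four mechanisms listed above, recursion is the easiest to show in a few lines. The sketch below is a toy illustration only, assuming a hypothetical table of goals and the sub-problems that must be solved first. It is not a model of operator thinking, and the goal names are invented.

```python
# Hypothetical goals and the sub-problems that must be solved first.
# Names are illustrative assumptions, not taken from any study.
PREREQUISITES = {
    "restore normal output": ["stabilise plant"],
    "stabilise plant": ["identify failed component"],
    "identify failed component": [],
}


def solve(goal, prerequisites=PREREQUISITES, done=None):
    """Solve `goal`, first solving any unsolved sub-problem it depends on.

    The function calls itself for each sub-problem, which is the
    embedding of an operation within itself described in point 2.
    Returns the order in which goals were completed.
    """
    if done is None:
        done = []
    for sub_goal in prerequisites.get(goal, []):
        if sub_goal not in done:
            solve(sub_goal, prerequisites, done)
    done.append(goal)
    return done


order = solve("restore normal output")
```

The `done` list plays a very rough analogue of working storage: it records what has already been achieved so that the outer problem can be resumed once the embedded one is finished.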

All these aspects of cognitive behaviour are powerful, and it would be difficult to produce a single diagram which showed how they interact with each other. The complexity of the sequencing processes in cognitive behaviour, and the flexibility of interchange between 'skill' and 'problem solving' types of behaviour are indicated in Bainbridge (1981 [about the type of model used to describe the human contribution to systems]), which is also too simple an account.

- - -

Interface Support

Form Of The Operators' Knowledge

We are beginning to know a little about how the operator thinks when responding to a process fault. What sorts of knowledge does this thinking depend on and build up, and how can this be supported by interface design ? The general term for this knowledge is the operator's 'mental model', but this knowledge is not some sort of simple unit. A recent collection of papers on mental models (Gentner & Stevens, 1983) is of interest to psychological researchers in this area, but the papers use such general terms as 'schema' without giving explicit information about the form of the user's knowledge from which one could make interface design recommendations.

The process operator could have a large number of different types of knowledge, the uses of which are interrelated, and which might be generated/ inferred from each other by more or less obvious mappings. Are there basically different types of knowledge, which are processed in different ways ? What information is used by the operator in diagnosis ? During diagnosis, is the focus of the operator's attention on the process variables or the plant components ? What forms of knowledge do they need, and what are the problems of moving between these forms of knowledge ?

Visual images or sentences

The models of diagnosis which Rouse (1982) reviews are concerned either with unconscious skill or with explicit thinking through. They can also be characterised on another dimension : the data they work with are either in the form of images/patterns or of logical predicates/ propositions/ language. Information in pattern or language form may be best represented by different types of display. There is a major debate in psychology about whether there actually are these two distinct forms of mental representation, and whether it is ultimately possible to distinguish between them (for the debate in action see Kosslyn et al, 1979).

In this paper, the concern is whether a task can be done more easily using one form of display than another. Such a result is usually taken to have implications about the nature of the mental representation used in doing the task, but questions about whether this inference is valid are not of concern here. Presumably anyway, in any complex behaviour all possible forms of mental representation will be used.

Display support. A few examples can be quoted :

1. In laboratory experiments which have compared the performance of people using image and language forms of display, studies using deductive reasoning tasks (i.e. ones in which the answer lies within the information given) give inconclusive results (e.g. Mayer, 1976; Polich & Schwartz, 1974), while there is evidence that a spatial representation is easier to use in more creative tasks, in which the solution requires an integration and interpretation of the information given (e.g. Gerwin & Newstead, 1977; Carroll et al, 1980).

2. Thorndyke & Hayes-Roth (1982), in a study of training for spatial knowledge, found that training in using a map led to better performance in some tests, while experience of walking around the space led to greater ability on others. After considerable experience the difference between the two forms of training disappeared. This does not mean that the two forms of representation are equivalent, but suggests that after practice the user develops both forms of mental representation. Starting with a map, some spatial relations are easier to handle, but knowledge about paths through the space develops with experience, and vice versa. Propositions about individual parts of the structure can be extracted from a pattern, and a pattern can be built up by relating individual propositions, so the question in display design is which form of knowledge needs to be most easily accessible.

In most aspects of process control the operator is concerned with the structure of relations between facts, rather than with isolated pieces of information, so one might infer that visual patterns would be a more effective display format. There are general points as well as experimental findings which support using spatial displays for structured information. Information can often be expressed in pictures more succinctly and using a less specialised professional vocabulary. The problems of transforming predicates about spatial relations disappear. Patterns can contain implicit information, from which propositions can be extracted if necessary.

However an important problem with graphic displays is that it is difficult to represent some types of information, e.g. to represent the many types of organisation of cognitive behaviour in a single diagram. An example of the power of visual presentations comes from engineering papers on cognitive processes : the diagrams or tables are called 'models' and are considered more rigorous [and more easily communicated] than the verbal passages which discuss the difficulties of a simple account, such as appear in the text accompanying Rasmussen's diagrams.

Process information used in diagnosis

[Different diagnosis tasks, e.g. diagnosis by maintenance technicians or by process controllers, may have different optimum strategies and use different information, so it may not be possible to extrapolate task design recommendations from one to the other.]

As usual, observation of actual behaviour shows that the theoretical analyses leave out some important aspects. Rasmussen has published the most interesting analyses of the ways in which information is actually used in diagnosis. He has published two different studies and uses the term 'topographic search' in both, with different meanings - which confused me considerably. Rasmussen studied diagnosis by electronic maintenance technicians, and has extrapolated from his findings to make suggestions about diagnosis in process control. Now we have more information about process control diagnosis it is important to ask whether this extrapolation is valid.

One use of the term 'topographic' is concerned with the information used in diagnosing abnormal process behaviour. Rasmussen (1981) has distinguished between :

1. 'topographic search' : using information about the normal level of the variable being checked to identify abnormal behaviour.

2. 'symptomatic search' : using specific information about the relation between actual symptoms and particular faults, so that it is possible to go straight to the component or group of components which might be faulty.

Symptomatic search does not necessarily require knowledge of the causal chain linking fault and symptoms. Duncan (1981) has devised ingenious techniques for training operators to recognise such symptoms. Symptomatic search is very effective, but other methods are also necessary when dealing with faults which have not been experienced by the operator or anticipated by the plant/training designers.
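The difference between these two uses of information can be sketched in a few lines of Python. This is only an illustration of the distinction as described here, not Rasmussen's own formulation; the variable names, normal ranges and symptom-fault table are all invented for the example.

```python
# Hypothetical normal operating ranges (invented values, arbitrary units).
NORMAL_RANGES = {
    "steam_pressure": (95.0, 105.0),
    "feed_flow": (40.0, 60.0),
    "drum_level": (0.45, 0.55),
}

# Hypothetical learned associations between symptom patterns and faults.
SYMPTOM_TABLE = {
    frozenset({("feed_flow", "low"), ("drum_level", "low")}): "feed pump failure",
    frozenset({("steam_pressure", "high")}): "relief valve stuck shut",
}


def deviation(name, value):
    """Compare one reading with the normal range for that variable."""
    low, high = NORMAL_RANGES[name]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def topographic_search(readings):
    """Use knowledge of normal levels to find which variables are abnormal.

    This works for faults never met before, but it only says where the
    behaviour is abnormal, not which fault caused it.
    """
    found = {}
    for name, value in readings.items():
        state = deviation(name, value)
        if state != "normal":
            found[name] = state
    return found


def symptomatic_search(readings):
    """Go straight from the pattern of symptoms to a known fault.

    Returns None when the pattern is not in the table, i.e. the fault has
    not been experienced or anticipated.
    """
    pattern = frozenset(
        (name, deviation(name, value))
        for name, value in readings.items()
        if deviation(name, value) != "normal"
    )
    return SYMPTOM_TABLE.get(pattern)
```

With a familiar pattern (low feed flow and low drum level) the symptomatic lookup names a fault directly. With an unfamiliar one (high feed flow only) it returns nothing, and only the comparison with normal levels still gives usable information.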

Rasmussen's second analysis concentrates on the information guiding the sequence of tests made during diagnosis. Rasmussen (1983b; Rasmussen & Jensen, 1974) distinguishes three types of sequencing :

1. 'topographic search' : the electronic maintenance technicians studied by Rasmussen usually used the wiring diagrams [the 'topography' of the device being repaired] to indicate the information flow in the equipment they were testing, and followed this in their sequence of tests, so they did not need to understand the system in order to check it. Rouse (1982) calls this 'context free search'.

2. 'functional search' : The technicians less frequently used knowledge of the system to test functionally related sub-units in the equipment. The result of one test (good/ bad) led to a functionally related next test.

3. 'evaluation' : the technicians had richer knowledge, a 'mental model', of the equipment which enabled them to relate system function and specific behaviour, so they could immediately go to components.

In analysing the evidence from process operation it is helpful to make a finer division. [Letters refer to lines in the following table. There is not only information about components and links between them, but also about dynamic changes over time.]

The operator could :

deal with each variable independently (A), or

could know which variables change together, the structure in the process.

Within such functional groups the operator could :

test each variable separately (B), or

know patterns of variable values which may occur in one of these groups, or use the functional knowledge to think out what might be wrong (E).

The specific knowledge about patterns might be either :

knowledge of the normal patterns of behaviour used to recognise that 'something' is wrong (C), or

specific fault patterns (D).

The alternative accounts of the information used in diagnosis can be mapped onto each other as in the following table. (For interest, a recent account of automated maintenance diagnosis is included : Davis, 1983). [For more information about Rouse’s study, see below.]

Which of these methods are used by process operators ? Rasmussen & Jensen (1974) found that electronic maintenance technicians primarily use topographic search. Technicians work with a variety of equipment and this strategy does not require special knowledge of each. The technicians do not show what would be considered classic problem-solving behaviour. What they do is less efficient in terms of testing procedure but requires less complex mental work. Rouse (1981) has shown that inexperienced technicians practising diagnosis on randomly connected components acquire some general skill which transfers to related real diagnosis situations.

In contrast, process operators work for many years with one system. They are not in need of a strategy which reduces the problems of dealing with several different systems each day, so one might expect they will show some form of context specific behaviour. The dangers of extrapolating from Rasmussen's maintenance technician results to process control are illustrated by Rouse's (1982) theoretical suggestions. He suggests that context specific pattern recognition behaviour should be easier than context free sequential search, the opposite of Rasmussen's findings with technicians.

In the Page et al (1983) study, the operators worked by recognising that a pattern of process behaviour is not normal (C), as far as can be identified. It is important to note that they did not have the sort of alarm annunciator panel (matrix of alarm lights) which is characteristic of English and US control rooms. Operators using those claim to be able to recognise specific fault patterns, unless too many alarms go off at once.

Process variables or plant components

The second question is whether the operators' patterns of knowledge are primarily in terms of relations between process variables, or between plant components. The maintenance technicians studied by Rasmussen & Jensen (1974) interpreted the results of their tests in terms of acceptability of components. This has led Rasmussen & Lind (1981) to recommend displays for process plant diagnosis which are based on the state of components, or of functional groupings in the plant. Their suggested process representation is a network in which functional components or component groupings are the nodes, and process variables are implicitly represented by branches between the nodes.

However an important difference between technicians and operators is that operators have a primary responsibility for maintaining plant stability [their primary responsibility is to keep the process 'safe', faulty components are replaced by someone else], while maintenance technicians usually work on equipment which is not simultaneously in operation [their primary responsibility is to replace faulty components]. During normal process control the operators focus on acceptability of process variable values rather than the adequacy of the plant components. Baerentsen et al (1983) present the 'mental model' of operators controlling a conventional oil-fired power plant, based on the plant knowledge mentioned by the operators in verbal protocols and interviews. This mental model is a network with the variables as nodes, the emphasis of the representation. The functions relating the variables are represented as branches between the nodes.
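The two kinds of network can be compared using a small sketch. The plant fragment below is invented for illustration and is not taken from Rasmussen & Lind or from Baerentsen et al; it simply shows that the same connection data can be arranged with components as nodes or with variables as nodes.

```python
from collections import defaultdict

# Hypothetical plant fragment: (source component, variable, target component).
CONNECTIONS = [
    ("fuel_valve", "fuel_flow", "burner"),
    ("burner", "heat_input", "boiler"),
    ("feed_pump", "feed_flow", "boiler"),
    ("boiler", "steam_pressure", "turbine"),
]


def component_network(connections):
    """Components are the nodes; variables label the branches between them."""
    network = defaultdict(list)
    for source, variable, target in connections:
        network[source].append((variable, target))
    return dict(network)


def variable_network(connections):
    """Variables are the nodes; the component relating two variables labels the branch."""
    network = defaultdict(list)
    for _, variable_in, component in connections:
        for source, variable_out, _ in connections:
            if source == component:
                network[variable_in].append((component, variable_out))
    return dict(network)
```

In the component form the first question that can be asked is 'what comes out of the boiler ?'. In the variable form it is 'what does feed flow affect ?', which is closer to the way the operators in these studies describe the process.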

This leads one to ask whether operators structure their knowledge of the present plant state, during diagnosis, with the main focus in terms of variables rather than components. The Page et al study (Bainbridge and Reinartz, 1984) shows that those operators thought in terms of explaining why variables were not behaving normally, for the purpose of which they checked whether component states were acceptable. This focus on variables has the advantage that it is usable for the parallel task of compensation. As mentioned above, once the operators have identified which component is not functioning correctly, at the level of component which influences the behaviour of the process variables, further exploration and diagnosis of the components is passed to technicians. There may be a different allocation of responsibility in different countries or in different industries, or even within one industry (de Keyser, 1984). However, as the results of this one example are the inverse of the recommendations made by Rasmussen and colleagues it is important to investigate this further.

Several different display formats could be available to the operators for the two tasks of diagnosis and compensation. However the Page et al study suggests that operators focus on the same aspects of process information in both tasks. Even if this is not the case, it could cause difficulties for the operator to use different displays for tasks between which they interchange as frequently and flexibly as they do during real incidents, unless both displays are available simultaneously.

Descriptive mode

The operator uses at least five different forms of information about the process. Three of these describe overall relations, and two describe events over time.

1. A representation of the cause-effect relations in the process, which focusses on the process variables, such as a signal-flow-graph.

An SFG shows each variable by a standard symbol, connected by branches labelled to indicate the dynamics of the connecting function. Standard symbols do not have the same mnemonic effectiveness as a mimic diagram, but this representation can show the main causal chains more clearly than the mimic as it can show energy flows and chemical changes explicitly.

2. A mimic diagram of the plant.

This shows the plant structures by symbolic representations (using the word 'symbol' with the display design rather than semiotic meaning) linked by the major flow paths. This shows the plant context within which process changes occur. Static mimics focus on plant mechanisms rather than process behaviour. We have seen that the operators' focus is on variables; this could explain why static mimics are not much used by experienced operators. Dynamic mimics can include analogue or digital information about variable values. For mass flows, mimics can show causal changes and conditions on events, for example by showing the status of pumps or valves. There is not necessarily a 1:1 mapping between mimic and SFG, and as they show different aspects of the process the optimum spatial layout for each may be different.

3. The geographical location of displays and controls on the interface, and of parts of the process on the plant.

In a well designed interface there should be a meaningful functional mapping between interface and process. There is no necessary mapping between geographical location of parts of the plant and their function.

4. A representation of major phases during a process transient such as start-up or shutdown, extracting the most important causal chains and the most important dynamic changes in each phase.

This makes explicit information which is implicit in chart recordings, but which takes time and experience to extract, and is not represented in a form which is easy to think about.

In some task types, sequences of events can be described by a state-transition network, in which states of the task are the nodes, and actions which change the state from one to another are represented on the branches. I do not find such networks helpful in describing most process operations, as the networks require discrete states [while process changes are continuous], and networks include no mechanism for describing the dynamic changes in process state with which the operators' actions are concerned.

5. A chart recording of changes in the process over time, as a result of step changes in input variables or in manipulated variables. In theory there is a 1:1 mapping between this and the SFG. The SFG makes explicit ('compiled') the overall relationships in the plant, with a description of the connecting functions, from which its behaviour over time could be generated, given information about changes in input variables not under the control of the operator. In practice this prediction is difficult, especially as the connecting functions are often not known sufficiently accurately.

Inversely it should be possible to infer the causal relationships in the plant from the chart recording, which makes explicit the process behaviour over time, but it is only easy to do this when you already know what you are looking for. There are however impressive examples of operators who have discovered a great deal about observable functional relations in the plant, after extensive experience of processes which are not theoretically well understood.
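The idea that behaviour over time could be generated from an SFG can be shown with a minimal sketch. It rests on strong assumptions that are not in the original text: every branch is treated as a first-order lag with an invented gain and time constant, each variable has at most one incoming branch, and all values are deviations from a steady state.

```python
# Hypothetical SFG: (source, target) -> (gain, time constant in seconds).
BRANCHES = {
    ("fuel_flow", "temperature"): (2.0, 10.0),
    ("temperature", "pressure"): (0.5, 5.0),
}


def simulate(branches, inputs, steps, dt=1.0):
    """Generate a 'chart recording' from the SFG by simple Euler integration.

    Each target variable moves toward gain * source at a rate set by the
    time constant. Input variables are held at their stepped value.
    """
    names = {name for pair in branches for name in pair}
    state = {name: 0.0 for name in names}
    state.update(inputs)  # step change applied at time zero
    record = [dict(state)]
    for _ in range(steps):
        new_state = dict(state)
        for (source, target), (gain, tau) in branches.items():
            new_state[target] += dt * (gain * state[source] - state[target]) / tau
        state = new_state
        record.append(dict(state))
    return record
```

Going from graph to recording is mechanical once the connecting functions are known. The sketch gives no help with the inverse problem of recovering the branches from the recording, which is the part that takes knowledge and experience.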

There are therefore several types of plant description which do not map onto each other, and several types which do contain other information implicitly but to extract it takes knowledge and time. This is not to mention the potential that computer generated displays give for showing data which have been transformed or inferred in some way, e.g. displays for monitoring for off-normal states or for showing temperature distributions. These suggestions also do not cover all the different types of data which are used in cognitive skill, and all this ignores the fact that in many real plants the operators get much of their information from talking to other people.

Several of these display formats contain unchanging information about process structure, and the general properties of its behaviour over time, which do not need to be displayed to operators who are continually refreshing this knowledge by interaction with the process. In the diagnosis context the main problems arise however in unfamiliar situations, where knowledge support is needed.

Levels of knowledge

Within these types of information, the process operators' knowledge could be at several levels of detail. One of the many interesting things which Rasmussen has focussed discussion on is the use of computer generated displays which give information about the state of combined functions of the process or parts of the plant, rather than individual components. Rasmussen & Lind (1981) propose two hierarchies of combination :

1. 'aggregation' refers to the level of resolution at which parts of the plant are described, for example a pump or a cooling system.

2. 'abstraction' refers to the type of descriptive mode being used, e.g. a physical component or a mass-energy flow.

In practice the two types of usage may be correlated, as one type of concept may be more appropriate for describing a given level of detail. These proposed hierarchies raise the question of whether operators do actually think using different levels of representation.

Aggregation. The evidence suggests that operators do work at at least two 'levels' of functional detail. They consider the behaviour of individual process variables, but they also think in terms of aggregates such as cooling. Individual variables are mentioned repeatedly in their reports and conversations. Evidence that they think of aggregates is given by their behaviour in real incidents, for example when they are questioning whether to follow a procedure because it will remove the availability of a cooling function.

Other considerations however suggest that it is not possible to identify one 'level' of a given variable or component which is true in all circumstances. Instead, the knowledge is in the form of a heterarchy/ network, which appears as a hierarchy in relation to the focus of attention in a particular task. Consider for example the relation between main cooling water pump and reactor in a PWR. The reactor and pump are at different 'levels' from the perspective of production, as the reactor is primarily concerned with energy production while the pump is part of a subsidiary function of maintaining process efficiency. However the pump and reactor are at the same 'level' from the point of view of fault management, as the most effective compensatory action if the pump fails is to reduce heat output from the reactor. This suggests that it might be rash to devise separate displays for the operator which show process information at different levels of aggregation, or at least this should only be done after careful study of the contexts in which a given piece of information might be used.

The evidence quoted from the Page et al (1983) study suggests that operators work mainly with changes in process variables, and the flow or other functions which can affect these. At least in this example they do not go down to the level of combinations of components, within a pump for example. This level of consideration is handed over to the technicians.

Abstraction. Rasmussen and Lind (1981) use this term to refer to the type of 'descriptive mode'. They relate descriptive modes to a hierarchy, and suggest that the operator is concerned with at least three levels of abstraction :

- the purpose of the process, the goal : 'why' things are done.

- the nature of the process : 'what' it does.

- the physical properties of the process : 'how' functions are implemented.

Actually it is confusing to use these 'why', 'what', 'how' words in association with a hierarchy of abstraction, as each of these words can be used within one 'level' of 'abstraction'. Suppose for example that the operator knows that :

increased fuel flow increases temperature.

This information can be used to answer several types of question :

what happens if fuel flow increases ?

why has temperature increased ?

how can temperature be increased ?

why has fuel flow been increased ?

It cannot be used alone at this level to answer the question :

why has temperature been increased ?

which can only be answered by reference to the next item in the causal chain, i.e. to what is affected when temperature changes.
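The way one relation serves several questions can be made explicit in a short sketch. The representation (a list of cause-effect pairs) and the function names are assumptions made for the illustration, not a claim about how operators store this knowledge.

```python
# One item of knowledge: an increase in the first variable increases the second.
KNOWLEDGE = [("fuel flow", "temperature")]


def effects_of(variable, knowledge):
    """'What happens if <variable> increases ?' : follow the relation forwards."""
    return [effect for cause, effect in knowledge if cause == variable]


def causes_of(variable, knowledge):
    """'Why has <variable> increased ?' and 'how can it be increased ?' : follow it backwards."""
    return [cause for cause, effect in knowledge if effect == variable]


def purpose_of(variable, knowledge):
    """'Why has <variable> been increased ?' : needs the next item down the causal chain."""
    return effects_of(variable, knowledge)
```

Here `purpose_of("fuel flow", KNOWLEDGE)` gives `["temperature"]`, but `purpose_of("temperature", KNOWLEDGE)` gives an empty list, because nothing in this single relation says what temperature in turn affects.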

Operators do tend to give explanations within this type of description, e.g. Ainsworth & Whitfield (1983) (MA.3 at 0.12) :

'the temperature on this mill has perhaps dropped by 2 degrees, that's all, since we started, that's as a result of reducing in the PA [primary air] flow, because we have put very little coal in the mill'.

This might not be what an engineer would consider as an explanation, but we are concerned here with the mental models of operators not the mental models of design engineers. However, although an operator gives this level of explanation while under pressure of work, they may be able to give a fuller account in an interview. Cuny (1977) has done extensive interviews to investigate how much operators do understand about their process at the technical rather than the functional level. He found that fewer than a third of the explanations given by 3 experienced operators were at a technical level; the majority of explanations were in terms of empirically observable relationships.

There is an interesting and important question about the level at which operators need to call on specialist advice when responding to plant failure. Within the seven nuclear power incidents analysed above, there is one example when the Technical Support Center was manned (Woods, 1982, G/5). There was extensive discussion between the TSC and control room operators about whether use of the safety injection pumps should be continued or terminated. From the summary provided, it appears that both were concerned about the process behaviour, and with establishing the availability of future recovery functions, given insufficient knowledge about both the present state and system response. From the data provided it appears that they differed in the priority which they assigned to the availability of different functions, rather than in their knowledge of the process dynamics or the way they reasoned. (One might add that in retrospect it appears the TSC were incorrect.)

General Aspects Of Interface Design And Performance Prediction

Knowing what the operator needs to know does not ensure a good interface. If this information is conveyed badly, perhaps using indiscriminate symbols or a layout which does not clearly map the structure, then this sort of factor, which is apparently a detail, can outweigh the effect of an excellent task analysis. There are three main aspects to consider. However most of the issues are not unique to process control so they will not be discussed in detail here.

Perceptual-motor skill. This heading refers to the physical and cognitive skills of using the interface, of looking at or reaching to the correct place on the interface and interpreting the information there, rather than to the perceptual-motor skills of controlling the process. It should be trivially easy to find a particular place in a data base, and to interpret the information once it has been found. If the operators have to solve problems in order to obtain the information they need, this will interrupt their thinking about their main task. The above analyses of real nuclear incidents give many examples where the operators' problems were increased by the interface.

Performance capacity. Engineers looking for advice from ergonomists tend to ask for absolute numbers for performance levels. 'We don't want to know about cognitive processes, just tell us what is the human error rate/ information transmission capacity/ memory capacity/ perception capacity, and we will design the system accordingly.' Unfortunately the task categories used as a basis for asking for such numbers are too simple.

[And it is important to know the cognitive processes, to know why and what are the best interface supports.]

Suppose for example one asks for human failure rates in 'deductive reasoning'. The '3-term series' problem (if A is bigger than B and C is smaller than B, which is the biggest ?) is a task used in reasoning studies which supplies a simply described example. Hunter (1957) found (to adapt his results for this purpose) that the number of people who can say which is the biggest item within a given time period depends on the way the task is presented :

(With reference to the previous discussion of patterns and language, and the best interface design for a specific task, imagine the effect on performance in this task if visual analogues of line length had been used, rather than a propositional sentence.)

If changing the presentation of the information can more than double the 'failure' rate, then the detailed transformations that people have to carry out, to get the information in a form in which they can do the task (see Johnson-Laird, 1983), may have more effect on failure rate than the overall task of 'deductive reasoning'. Detailed knowledge of the cognitive processes involved in a task, and how these are affected by interface design, may be necessary before performance predictions can be made with any accuracy.
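The notion of 'detailed transformations' can be illustrated with a sketch of the 3-term series task. This is not Hunter's model and reproduces none of his data. It only counts how many premises have to be converted from 'smaller' to 'bigger' form before the answer can be read off, as one crude, assumed measure of the extra work that a given presentation demands.

```python
def normalise(premise):
    """Rewrite (x, relation, y) as a (bigger item, smaller item) pair.

    Also reports whether the premise had to be converted to do so.
    """
    x, relation, y = premise
    if relation == "bigger":
        return (x, y), False
    if relation == "smaller":
        return (y, x), True
    raise ValueError(f"unknown relation: {relation}")


def biggest(premises):
    """Return the biggest item and the number of conversions that were needed."""
    pairs = []
    conversions = 0
    for premise in premises:
        pair, converted = normalise(premise)
        pairs.append(pair)
        conversions += converted
    smaller_items = {small for _, small in pairs}
    candidates = {big for big, _ in pairs} - smaller_items
    (answer,) = candidates  # fails if the premises do not give a unique answer
    return answer, conversions
```

The two presentations 'A is bigger than B, B is bigger than C' and 'A is bigger than B, C is smaller than B' have the same answer, but in this sketch the second needs one conversion and the first needs none.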

It is important to have data which identify priorities for financial investment to improve operator performance. In many cases, available data on relative performance with different types of equipment can be used. Time pressures tend not to be critical in process control, so engineers are concerned only about interface aspects which will double the time taken, or increase human error rates by an order of magnitude. In office automation, and rapid response tasks such as flying, much smaller differences in performance become critical. This paper concentrates on the content of what should be displayed, rather than the technology of how it should be displayed, so the relevant data will not be reviewed.

Working memory. The operator's knowledge of the current state of the process provides the context for making rapid wise decisions. Knowledge about how this information develops, and what form it takes, are important in design decisions about manual take-over, or the number of VDU pages of information which must be available simultaneously. For example, in modest installations where investment does not justify installing more than one or two VDUs, it might be better to get the computer to drive conventional instruments, so that all the information needed in a decision can be available together.

Interesting recent work by cognitive psychologists supports the notion that working memory develops as a function of thinking about the task, using knowledge available in longer-term memory about what potentially could happen, and that this working memory provides the context which determines optimum future thinking, e.g. Chase and Ericsson (1981), Johnson-Laird (1983). (Confusingly for us, what Johnson-Laird calls a 'mental model' is more akin to the notion of working storage as used in ergonomics - while ergonomists may use 'mental model' to label the knowledge of the process, rather than awareness of its current state.) These studies reinforce and expand our understanding of the underlying cognitive processes, but they do not have added implications for design issues so will not be discussed here.

Obviously there is a daunting amount of research to be done, on the best techniques for task analysis, the best interface formats, and the best mapping between the two. Some recommendations can be based on what is known to cause difficulty in using an interface. The memory load problems caused by having to call up a sequence of VDU pages carrying information which needs to be cross referenced, argue strongly for having sufficient VDUs to display all the information needed in any one decision, and without having to do information extraction tasks which require complex cognitive processes.

Conclusion

In the last few years enough detailed studies have been made, of the behaviour of operators during plant failures, to be able to draw quite strong conclusions about what operators do and how this should be supported by interface and training design. Data on cognitive processes [by process operators, not by people doing any sort of 'diagnosis' task] is essential for this to be possible. Operators show high levels of cognitive skill, so such studies must be done on experienced personnel. The studies suggest that the operators use the same type of information structure during diagnosis as they do during control, i.e. one in which the focus of attention is on process variables rather than plant components. Their behaviour is very flexible, so care must be taken not to restrict this flexibility by giving displays which can be used only for limited purposes.

The author would like to thank Susan J. Reinartz (formerly Page) for comments on an earlier version of this manuscript.

General References database

Some other relevant sources which appeared between 1984 - 1997 :

1988 Addendum :

Types of Knowledge and Display Design

Two of my recent papers are relevant to this :

Bainbridge (1987) VDU/VDT interfaces for process control.

Bainbridge (1988) Types of representation.

Charles Brennan's M.Sc. Thesis (1987) compared mimic and signal-flow-graph representations. His results suggest that mimics are better display formats for observers who know about the process being studied, as mimics provide many reminder cues about things to consider. SFGs may be better for inexperienced observers who do not already know the underlying causal structure.

1997 Addendum :

Some other interesting papers

Hukki and Norros (1993) Diagnostic orientation in control of disturbance situations. Ergonomics, 35, 1317-1328.

Marshall et al (1981) Panel diagnosis training for major hazard continuous process installations. The Chemical Engineer, 365, 66-69.

Patrick (1993) Cognitive aspects of fault-finding : training and transfer. Le Travail Humain. 56, 185-210.

Shepherd et al (1977) Control panel diagnosis : a comparison of three training methods. Ergonomics, 20, 347-361.
