Inferring from Verbal Reports to Cognitive Processes

 This paper focusses on the analytic techniques and limitations of using verbal protocols as evidence about cognitive processes, rather than on the findings.

Topics :


Preparing the material : 

- using the analyst’s natural language understanding and task knowledge to separate individual phrases, and to combine phrases into groups.

- inferring activities that are not mentioned in the report.

Analysing content : 

- developing reliable categories whose frequencies can be counted.

- choosing categories which include the information needed to answer the empirical question.

- similar methods for groups of phrases.

Inferring determinants of sequences in the content :

This is possible when there are many examples of the same behaviour, plus detailed information about the state of the environment whenever behaviour changes, and about items which the speaker has previously considered which may still be in working memory.

Inferring from Verbal Reports to Cognitive Processes

Lisanne Bainbridge

Department of Psychology, University College London

in Brenner, M., Brown, J. and Canter, D. (eds.) The Research Interview : uses and approaches. 1985, Academic Press, London, pp. 201-215.

'Think aloud' verbal reports are frequently used to obtain information about cognitive processes during complex behaviour. This chapter, which is primarily methodological, describes techniques that are available to analyse these data. The first task is to develop the verbal material into a form for further analysis, by inferring referents and connecting material. Because verbal data are too rich in detail for complete analysis, the next stage is to develop reliable categories for describing the material, relative to a particular empirical question. A description of the material in terms of these categories can be made from frequency counts of the categories. It is more complex to identify sequences of occurrence of statement types, which one hopes describe the programmes or routines underlying a particular cognitive process or the decisions determining the sequence of behaviour.

What can verbal reports tell us about how speakers think and the knowledge they have ?  This is obviously a very broad question, and the brief discussion here will concentrate on methodology.  The examples are oriented to the analysis of data from operators in an industrial process control plant.

Recent commentators, particularly Nisbett and Wilson (1977), have shown that verbal reports may not be a valid description of mental processes.  Indeed, there is no way in which the validity of a report of mental processes can be checked independently.  There have been several types of reaction to Nisbett and Wilson’s paper (including some discussion by Brown and Canter, Chapter 10 in this volume).  White (1980) has criticised them on both theoretical and methodological grounds.  Other writers, such as Smith and Miller (1978) and Bainbridge (1979) have emphasised that, as verbal reports give interesting data, it is important to try to find the circumstances in which their validity is high.  Problems with the validity of reports may arise because the underlying thought processes are not available to conscious access or are not verbal in form, so that the thoughts are distorted in translating from one medium to another.  Ericsson and Simon (1980) have discussed the different types of verbal report that are required in different experiments and the types of data that may be available for verbal report.  They suggest a general model for the way in which verbal reports may be generated, and review the available data.

This chapter hurdles over the validity issues and instead concentrates on how verbal material can be analysed.  At a minimum such an analysis will lead to hypotheses that can be investigated further by other methods.  The assumption will be that verbal reports provide "strong" data that can be analysed in detail, although the full possibilities will not be explored here.

The question of what the speaker thinks and knows is obviously a broad one, and the coverage here must be very brief. The methods so far available are still inadequate and incomplete; more detailed discussion is available in Bainbridge (1979). The methods used follow from those developed by Newell and Simon (reported in e.g. 1972), who asked someone to 'think aloud' while doing a task. This gives material that differs in syntax and type of content from the material obtained in interviews, as illustrated in the following report fragments :

1.  I shall have to cut [furnace] E off, it was the last to come on, what is it making by the way?  E make stainless [steel], oh that’s a bit dicey, I shall not have to interfere with E then.

2.  If a furnace is making stainless, it’s in the reducing period, obviously it’s silly, when the metal temperature and the furnace itself is at peak temperature, it’s silly to cut that furnace off.

These two examples come from the same furnace operator, the first while he was doing the [furnace control] task, and the second during a lull in activity a few minutes later, which he filled by talking about his general control strategy to the investigator.   

[In this paper, phrases from a report are in italics, categorisations of the phrases by the analyst are in bold.]

Further data on the difference in verbal behaviour in different verbalising tasks have been reported by e.g. Benjafield (1969). One infers that in an interview situation the verbal reports give better information on the content and interrelations in the speaker’s knowledge, rather than on how this knowledge is used in a particular task, while verbal protocols collected from someone who is actually doing a task give information about the dynamics of the use of knowledge, but not about its full range. For further speculations on this see Bainbridge (1979, Tables 2 and 3).

As will be seen, natural language understanding and obtaining agreement between judges are the basic tools used in the data analysis. I have been fortunate to study industrial process control situations, in which the things that the speaker might be talking about are more or less constrained to the process plant and product, so that identifying the range of the speaker’s referents is relatively simple. In an investigative interview with more wide-ranging referents, the problems of analysis are much greater. This problem is at its most intense in trying to understand schizophrenic language, which is well known to be difficult to interpret (see e.g. Salinger et al., 1970).


The first stage is to segment the verbal report into the basic analysable units.  This can be done at two levels : dividing the material into a sequence of separate phrases, and into groups of phrases.  One then infers the referents in the phrases, the interconnections between them, and any missing material where possible.


[The recording of the spoken report is transcribed into written form, then] the continuous text is divided into phrases, using the analyst’s natural language understanding.  The following piece of report will be used in the examples.  (The dots indicate pauses in the audio recording, when the operator did not say anything.)

C is on oxidation now that’s something you can make an estimate for it’s a quality so I must leave it alone. . . oxidation average length is one hour 30 minutes for C and started at time zero no it didn’t it started at time 33 minutes how confusing of it so it’s got nearly one and a half hours to run . . . I’d better check that oxidation for C one hour 30 minutes started 50 minutes ago so it’s got 37 minutes to go. . .

Separated into phrases this text becomes :

1 C is on oxidation

2 now that’s something you can make an estimate for

3 it’s a quality

4 so I must leave it alone

5 oxidation average length is one hour 30 minutes for C

6 and started at time zero

7 no it didn’t

8 it started at time 33 minutes

9 how confusing of it

10 so it’s got nearly one and a half hours to run

11 I’d better check that

12 oxidation for C one hour 30 minutes

13 started 50 minutes ago

14 so it’s got 37 minutes to go

Notes : in this section of protocol, the controller is talking about a furnace called 'C'. Phrases 1-4 talk about general properties of the 'oxidation' stage through which the furnace is going, and phrases 5-14 make an estimate of the time at which C will finish oxidising. Figure 1 shows graphically how these phrases are interrelated [see justification in next section]. The average length is given in a job aid booklet, and the time the stage started is on the controller’s display panel.

[The analyst needs to know about the industrial process and the operator’s task and interface to be able to make these interpretations of the verbal report.]

The division of the text into phrases (which might loosely be described as minimum grammatical units, though the language in such reports is often not at all well formed) is done by natural language understanding of judges.  This can often be done by people who have no knowledge of the specialist content of the material.  Because these judges are using unobservable processes to make the analysis, it is necessary to use several judges working independently and to measure agreement between them, either by counting the percentage of occasions when they agree or by using a statistical technique to measure concordance.
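The two ways of measuring agreement mentioned above can be sketched as follows. This is a minimal illustration, not the procedure used in the original studies: the boundary decisions are invented, and Cohen's kappa is only one possible choice of statistic for concordance between two judges.

```python
from collections import Counter

def percent_agreement(judge_a, judge_b):
    """Share of decision points on which two judges made the same decision."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("judges must rate the same non-empty set of points")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return matches / len(judge_a)

def cohens_kappa(judge_a, judge_b):
    """Agreement between two judges, corrected for chance agreement."""
    n = len(judge_a)
    observed = percent_agreement(judge_a, judge_b)
    freq_a = Counter(judge_a)
    freq_b = Counter(judge_b)
    # Chance agreement: probability that both judges pick the same label
    # if each used their own label frequencies independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical data: one entry per gap between words,
# 1 = "a phrase boundary falls here", 0 = "no boundary".
judge_a = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]
judge_b = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]

print(percent_agreement(judge_a, judge_b))        # 0.9
print(round(cohens_kappa(judge_a, judge_b), 3))   # 0.783
```

Percentage agreement is easy to read but can look high simply because most gaps are not boundaries; a chance-corrected measure makes that visible.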

Figure 1 : Graphic representation of the interrelationships of phrases 1-14.

Combining phrases into groups 

Two methods can be used for combining phrases into groups; both make use of the semantic content.  The first approach is to identify the pronominal referents, as these indicate cross-references between phrases.

There are three ways of identifying pronominal referents. The most reliable applies to descriptions of an ongoing task. It requires an independent record of the states of the environment during the task, from which the referents of such phrases as ’it’s at 35 now’ can be identified. Another method involves going through the report afterwards with the speaker to check on the meaning (this can only be done if the verbal report is recorded in short segments and transcribed immediately). This may raise some problems as it allows an additional opportunity for the speaker to rationalise what he or she was doing, perhaps adding after-the-event material that the speaker was not thinking about at the time. The third method is to use the judges’ semantic knowledge of the task.

After the links between phrases provided by pronominal referents have been made, further groupings can be identified on the basis of judges’ knowledge of what items go together in the task.  The result of doing this for the material above is shown in Figure 1.  Because a judge is using his or her own semantic knowledge to make this analysis, many people trying this for the first time think that all they are managing to do is to attribute their own knowledge to the speaker.  This is in fact unavoidable as it is how all language understanding is done.  However the method can at least be given reliability if this grouping is done by several judges and a concordance obtained.

One can also take advantage of this necessity by making an explicit record of one’s own knowledge used in the analysis, and using this as a record of the knowledge one is attributing to the speaker. For example, in Figure 1, lines 1-4 list properties of C furnace and lines 5-14 recount a method of calculation. These points, as shown in the notes above, are also what one has to mention to a person unfamiliar with the task before he or she can understand the report.

Inferring connecting material

As mentioned in the introduction, there are many reasons why verbal reports may be incomplete.  

One is that the speaker may not mention things that he or she thinks are obvious to the listener.  These are inferred, both in natural language understanding and in the present type of analysis.

Another reason for incompleteness is that thought is often faster than speech. It is not possible to reconstruct material that has passed through someone’s mind so quickly that no clue has been given about it in the report.  An example of this type of thought is that possibilities may be reviewed and rejected very rapidly while problem solving. 

[Also thought may be non-verbal, and many people have difficulty putting their thoughts into words.]

It can be possible however to reconstruct some types of intervening material when the speaker says something that must be the result of some thinking that he or she has not mentioned.  For example, in lines 10 and 14 of the above example the speaker states the result of a calculation, so he must have made the calculation in some way.  The report does not necessarily indicate how the intervening processes were carried out, only that they must have been done.

However in some tasks in which the same situation recurs frequently, as happens for example in many industrial process control tasks, it may be possible to combine what is said on different occasions to obtain a fuller account of what is happening.  In the above example, lines 5-10 can be interpreted as :




(Lower case indicates operations that are inferred, upper case items that are explicitly mentioned. This interpretation and generalisation has already involved considerable inference about underlying processes; this will be discussed further below.)

When these lines are combined with lines 12-14, a complete picture of the processes between line 8 and line 10 can be inferred :

(5) 12 read STAGE LENGTH


read time now

13 TIME SO FAR = time now - stage start time = Y

(10) 14 TIME TO GO = stage length - time so far = X

Together with the results from the identification and grouping of phrases, these inferences give the material that is used in further analyses.


There are many styles of analysis called content analysis.  These typically involve counting frequencies for categories of material.  There are standard computer programmes that count word frequencies, which have been used in many studies (see e.g. Hays, 1967; Dolezel and Bailey, 1969; also Tagg, Chapter 8 in this volume).  This section will give brief examples of analyses using frequency counts on categories of material that are more complex than words.

Having prepared the material as described in the previous section, one can analyse the frequency of occurrence of particular categories : within phrases, whole phrases, groups of phrases.  These categories, their instances, and the frequency counts can be used as the origin for further analyses of category members or of the contexts in which they occur.

[Reliable categories]

One of the main problems is to develop the categories to use, because they must be both relevant to the investigation and reliably usable by the judges who assign the material to them.  Rasmussen and Jensen (1974) describe the iterative method that must be used in developing reliable categories [categories which give repeatable results when used by many judges].  First, several judges independently develop a set of categories.  Then they attempt to use each other’s categorisation schemes.  This both pools the inferences the judges have made about the important distinctions to be made in analysing the material, and also tests whether different people can repeatedly make the same allocation of material to the categories.  If not, then an analysis using these categories cannot give reliable [repeatable] data, and the categories must be revised, again with the judges working independently during development, and coming together for assessment.

[Categories which include the information needed to answer the empirical question]

The categories developed, to be useful, must be ones that encourage further inferences about the material.  For example, the distribution of frequencies may suggest emphasis in the way the speaker thinks about the topic, or the categories can be used simply as a preliminary sort before further analyses. For example, the analyst could look further at all phrases that have been categorised as "comment on own behaviour" to see if the phrases have any common properties.  If the categories are based on semantic or syntactic aspects of the reports, one might wish to count only the occurrence of overall concepts (e.g. birds) or to count the frequency of individual instances (e.g. robins vs. blackbirds).  Different aspects of semantic structure could be differentiated.  For example one might wish to distinguish between active and passive voice as reflecting different emphasis by the speaker.  Or one could differentiate between different types of conditional statement, for example comparing "A therefore B" with "B because A".  

Whether one can do this depends not only on whether one is interested in this level of analysis but also on whether the concepts at this level occur with sufficient frequency to make a frequency count something more informative than a simple listing of categories used. The categories may also imply inferences about the types of cognitive process underlying them : for example "statement of fact" and "comment about own behaviour" may imply different types of underlying cognitive activity. The categories are therefore always a function of particular empirical questions. If there is a set of categories that can be applied in many different circumstances, the categories are likely to be so general that the results of using them will not be very rich.

To give further examples of the categorising approach, we can look at the task of identifying characteristic structures within the phrases of reports made during an industrial process control task.  We will look at categories at two levels : the types of referent words that it is useful to identify within the phrases, and the characteristic patterns in which these occur.  For example the phrase

the temperature is 45º

can be interpreted as a statement of the form

VARiable has VALue

[There’s a shortage of coding tools. In this paper capital letters have been used with 2 meanings :

In the previous section, upper case indicated what is explicitly mentioned in the report.

Here capital letters indicate categories of content in phrases.]

The phrase

the steam pressure is 101

can then be categorised as a statement of the same form. 

Having identified all statements of this type, one might then look at the categories further, e.g. finding how many different instances of "VARiable" occur and with what frequency, or concentrating on the "VALue" instances and seeing how accurately they are specified.  
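As a rough sketch of this kind of coding, the fragment below assigns phrases to the "VARiable has VALue" form and counts the VARiable instances. The regular expression is a hypothetical stand-in for the judges' language understanding: it only recognises phrases of the literal shape "the X is N", and anything it misses would still have to be coded by hand.

```python
import re
from collections import Counter

# Hypothetical pattern for phrases such as "the temperature is 45º".
# An optional degree mark after the number is tolerated.
VAR_HAS_VAL = re.compile(
    r"^(?:the\s+)?(?P<variable>[a-z ]+?)\s+is\s+(?P<value>\d+(?:\.\d+)?)\s*[º°]?$"
)

def categorise(phrase):
    """Return (variable, value) if the phrase fits 'VARiable has VALue', else None."""
    match = VAR_HAS_VAL.match(phrase.strip().lower())
    if match is None:
        return None
    return match.group("variable"), float(match.group("value"))

phrases = [
    "the temperature is 45º",
    "the steam pressure is 101",
    "there’s steam temperature - it’s rising",  # not a numeric VALue
]

coded = [categorise(p) for p in phrases]
variable_counts = Counter(c[0] for c in coded if c is not None)

print(coded)
print(variable_counts)
```

The third phrase is deliberately left uncoded by this pattern, which mirrors the decision discussed next about whether it belongs in the same category.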

One also has to consider whether to categorise 

there’s steam temperature - it’s rising

as a statement of the same type, or whether this would lose some of the information in the report, so that this statement should instead be interpreted as

VARiable has CATEGorised VALue

This statement also gives an example of the way in which categorising can retain information considered important for a particular analysis but loses other aspects : this phrase is also syntactically different from the previous instances, but it has been assumed this is not relevant to the question about how VARiable VALues are processed by the speaker.  The syntactic change rather than the semantic one might, of course, be the emphasis of an analysis being made for other purposes.

This simple statement type also occurs as a component in more complex ones, e.g.

I’ll try to run the temperature down to about 400º

ACTion gives VARiable has VALue

The statements of this type in the report give a sample of the speaker’s knowledge of how changes in the outside world can be effected.  Again the way in which this is expressed may give useful information.  The speaker can also give information about his or her knowledge of the conditions under which certain effects occur, which indicate his or her knowledge of the wider interactions in the plant behaviour :

we have to have 50º superheat before we can run it up

could be described as

when VARiable has VALue then ACTion gives VARiable has VALue

or more simply, if one is not interested in distinguishing between different ways of expressing conditional knowledge (e.g. to test whether they typically occur in different task contexts), as

ACTion given VARiable

A collection of condition statements made by the speaker also gives a sample of their knowledge.  In the process control task, it is interesting to ask what other event sequences in the plant a speaker might be able to think about by following through sequences of the knowledge of conditions on events that they have expressed.  Unfortunately there are two problems with this type of data.  One is that, in the situation of thinking aloud while doing the task, one will only obtain a limited sample of the controller’s knowledge, relative to the particular situations in the test period, so this corpus of the speaker’s knowledge will be very incomplete.  A fuller corpus may be obtained from interviews (see e.g. Cluny, 1979).  However, there is also a problem with this.  Cooke (1965) found that a controller may be able to express some knowledge about a process, but his or her control actions may not reflect this knowledge.  In other words, the speaker may make a statement about knowledge that is solely at an 'intellectual' level.  And also vice versa [their behaviour shows they know something which they do not mention in an interview].  Anyone analysing verbal reports must always keep this sort of proviso in mind.

[Grouping of items]

These examples have been from the identification of categories of material within phrases. Similar methods can be used for studying categories of phrases, or groups of phrases, though the categories are necessarily more general and therefore more care may be needed to ensure that they are unambiguous to judges. As an example, the phrases in Figure 1 might be categorised as follows [using the small group of categories : fact, prediction, strategy, comment] (these categories are just given as an example and have not been properly tested for reliability as described above) :

1 fact

2 strategy

3 fact

4 strategy

5 fact

6 fact

7 comment

8 fact

9 comment

10 prediction

11 strategy 

12 fact

13 fact

14 prediction 

Again it may be useful to distinguish sub-categories, depending on the question one is interested in and the amount of data available. It might e.g. be useful to distinguish between statements of fact about past and present, or between comments expressing general strategy compared with rules for behaviour at this particular time.
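The frequency count for the example categorisation of phrases 1-14 can be sketched as below. The codes are the illustrative ones listed above, which have not been tested for reliability, so the counts only show the mechanics.

```python
from collections import Counter

# Category assigned to each phrase number in the illustrative coding above.
coded_phrases = {
    1: "fact", 2: "strategy", 3: "fact", 4: "strategy",
    5: "fact", 6: "fact", 7: "comment", 8: "fact",
    9: "comment", 10: "prediction", 11: "strategy",
    12: "fact", 13: "fact", 14: "prediction",
}

counts = Counter(coded_phrases.values())
total = sum(counts.values())

# Print each category with its count and its share of all phrases.
for category, n in counts.most_common():
    print(f"{category:<10} {n:>2}  {n / total:.0%}")
```

With so few phrases the proportions mean little; the same tally over a full protocol is what would support the kind of sub-category distinctions just described.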

The categories used in describing groups of phrases have to be even more general, i.e. there may be an even more distant relationship between the actual material in the report and the way in which it is described.  As an example, the explanatory notes given for the main example in this chapter are equivalent to a categorisation of types of activity in that piece of report.  Rasmussen and Jensen (1974) and Umbers (1981) give examples of this type of analysis in maintenance and process control tasks.


Rasmussen and Jensen (1974) use their analysis of groups of phrases into categories as a basis for studying sequences of activity, to identify the speaker’s general strategies.  This can also be done at the level of sequences of individual phrases.  The level of categorisation at which sequences in the material can be sought depends on the range of referents in the material. 

For example, in the fairly small 'world' of controlling a simple process plant [in which the same control situation occurs repeatedly], the frequency of occurrence of very similar statements is sufficient to allow one to analyse the sequence of activity phrase by phrase.  

Rasmussen and Jensen studied maintenance technicians.  In each report the speaker was working on a different piece of equipment [with a different fault], and each time maintenance activities differed.  Consequently, in this type of task the behaviour is not repeated at the level of individual phrases.  To search for common properties of the behaviour in these different situations, one must look at a more global level.

The most frequently used, fully specified, rigorous procedure for analysing sequences is to make a Markovian analysis, i.e. to find the probability of transition from one item to another.  Unfortunately, this method gives a very limited description of the properties of a sequence.  

For example, one could make a Markovian analysis of the tune of 'Three Blind Mice', obtaining a table of the probability of a note at one pitch being followed by a note at each of the other pitches in the tune. This is not, however, a very helpful description of the tune because 'Three Blind Mice' is not a probabilistic sequence. It is exactly the same each time, and this important feature has completely disappeared in the analysis. Markovian analysis can be a useful preliminary technique to determine which transitions are most frequent and therefore are most likely to be rewarding to study further.
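A first-order transition table of this kind can be sketched as follows. Rather than a tune, the sketch is applied to the illustrative category codes for phrases 1-14 of the main example; those codes are untested and fourteen phrases are far too few for stable estimates, so the output is only meant to show what the method keeps and what it discards.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate first-order transition probabilities from one observed sequence."""
    pair_counts = defaultdict(Counter)
    # Count how often each item is immediately followed by each other item.
    for current, following in zip(sequence, sequence[1:]):
        pair_counts[current][following] += 1
    # Convert the counts in each row into proportions.
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in pair_counts.items()
    }

# Illustrative category codes for phrases 1-14, in order of occurrence.
sequence = [
    "fact", "strategy", "fact", "strategy", "fact", "fact", "comment",
    "fact", "comment", "prediction", "strategy", "fact", "fact", "prediction",
]

probabilities = transition_probabilities(sequence)
for current, row in probabilities.items():
    print(current, {nxt: round(p, 2) for nxt, p in row.items()})
```

The table records only which item follows which; the order of the sequence as a whole cannot be recovered from it, which is the limitation the tune example points to.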

However if one prefers to assume that the people producing verbal reports are not acting in a random way, and that their reports reflect activity that is at least to some extent structured and repeatable [and related to features of the task environment]  (one may even wish to infer the goals that underlie this structure), then one would like to use techniques that increase the probability of finding more determinate sequences in the behaviour.  The techniques that will be described are also applicable to sequences in non-verbal behaviour. 

Sequences of phrases

The sequence in which individual items are mentioned in the report can indicate the standard "routines" or "programmes" with which the speaker thinks about a particular topic. One can only reach any strong conclusions about this, however, if one has several examples of each type of behaviour. The reports are always incomplete at some level, and one may have several hypotheses about the processes intervening between two phrases. Unless one has other examples of the same behaviour, which constrain the number of hypotheses that can be used to account for them, such an analysis remains very speculative and unwieldy. For example, in the main example as analysed in the section above on combining phrases into groups, it would be possible for phrases 8 and 10 to be linked by the following calculation :

    stage end time = stage start time + stage length

    time to go = stage end time - time now

The actual method that was used is indicated by phrase 13, which appears in another occurrence of the same behaviour.  That example also makes it clear that in this sequence analysis one is working with "categorised" phrases, which have been identified as representing a particular type of general activity, rather than with the individual details of the language in which these activities were expressed.
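The point that the final result alone cannot discriminate between the two hypothesised routines can be sketched as below. The function names are hypothetical, and the numbers are illustrative: a stage length of 90 minutes and a start time of 33 are spoken in the report, while a current time of 83 is an assumption implied by "started 50 minutes ago". The sketch does not try to reproduce the operator's spoken result.

```python
def via_elapsed_time(stage_length, stage_start_time, time_now):
    """Routine suggested by phrase 13: work out the elapsed time first."""
    time_so_far = time_now - stage_start_time
    return {"time so far": time_so_far,
            "time to go": stage_length - time_so_far}

def via_end_time(stage_length, stage_start_time, time_now):
    """Alternative hypothesis: work out the stage end time first."""
    stage_end_time = stage_start_time + stage_length
    return {"stage end time": stage_end_time,
            "time to go": stage_end_time - time_now}

# Times in minutes (time_now is an assumed value, see above).
elapsed_route = via_elapsed_time(90, 33, 83)
end_time_route = via_end_time(90, 33, 83)

print(elapsed_route)   # {'time so far': 50, 'time to go': 40}
print(end_time_route)  # {'stage end time': 123, 'time to go': 40}
```

Both routines give the same time to go, so only a spoken intermediate value, such as the elapsed time in phrase 13, indicates which one the speaker followed.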

The frequently occurring sequences identified in this way might be considered as cognitive programmes, and this is the main technique of researchers who use verbal protocols as data when developing simulations for cognitive processes.  There is a fairly extensive literature on this type of theory development, of which Newell and Simon (1972) is a classic example.

Sequences of groups of phrases

Identifying the sequence of groups of phrases is more difficult because one wants to infer what influences the speaker to move from one topic to another. One cannot take the speaker’s word for it, for as Nisbett and Wilson (1977) have shown, speakers do not necessarily have good access to this type of information about their behaviour.

To do this analysis one therefore needs a record of the earlier report because previous items discussed (if taken as reflecting the speaker’s thoughts) can affect the choice of later behaviour.  A record of the environment is also needed because changes in this may influence what is the most appropriate item for the speaker to consider next.  Again, making a record of the environment and its changes is relatively simple in a small "world" such as a simple process control task, which may be monitored in full during the time period that the controller is thinking aloud.  It may be much more difficult, or impossible, in interview situations with a wide range of possible referents, and so analyses of this type of material must be much more speculative.

To do this analysis one identifies the instances of transition from one type of behaviour to others, e.g. behaviour A may be followed by behaviour B or by behaviour C.  This can be identified from a Markov analysis. One then looks at the whole context of the speaker’s behaviour and the environment to see whether any dimension consistently has one value when A is followed by B and another when A is followed by C.  If so, then one infers that the value of this dimension determines the behaviour sequence at this point.
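The search for a determining dimension can be sketched as follows. The observations, the dimension names ("alarm", "last_topic") and the strict all-or-nothing criterion are all invented for illustration; real records are likely to be noisier and would call for a tabulation of frequencies rather than a perfect split.

```python
from collections import defaultdict

def separating_dimensions(observations):
    """Return dimensions whose value consistently goes with one following behaviour."""
    followers = defaultdict(lambda: defaultdict(set))
    # For every dimension and value, collect the behaviours that followed A.
    for obs in observations:
        for dimension, value in obs["context"].items():
            followers[dimension][value].add(obs["next"])
    result = []
    for dimension, by_value in followers.items():
        # Each value of the dimension must go with exactly one following behaviour...
        one_follower_per_value = all(len(nxt) == 1 for nxt in by_value.values())
        # ...and different values must lead to different behaviours.
        distinguishes = len({next(iter(nxt)) for nxt in by_value.values()}) > 1
        if one_follower_per_value and distinguishes:
            result.append(dimension)
    return result

# Invented toy records of what followed behaviour A, with the context at that moment.
observations = [
    {"next": "B", "context": {"alarm": "on", "last_topic": "furnace"}},
    {"next": "B", "context": {"alarm": "on", "last_topic": "power"}},
    {"next": "C", "context": {"alarm": "off", "last_topic": "furnace"}},
    {"next": "C", "context": {"alarm": "off", "last_topic": "power"}},
]

print(separating_dimensions(observations))  # ['alarm']
```

The context here mixes a feature of the environment with an item from the earlier report, since either kind of record may turn out to carry the determining dimension.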

The example given here again comes from a process control task.  The speaker frequently made remarks such as "it’s above now", "it’s below", and the problem was to identify what dimension of the process he was using to make this judgement, i.e. on what dimension did the values determine whether he used behaviour X (judge 'below') or behaviour Y (judge 'above').  There were two main candidates for the basis of his judgements : a display that showed the total power being used at the time, and a display showing the discrepancy between present power usage and target power usage.  Table 1 shows the distribution of judgements at different levels of the total power display [target value 50].  Table 2 shows the distribution of judgements at different levels of the discrepancy meter.  It is clear that the use of the judgements correlates with the discrepancy meter reading and not with the total power display, so the speaker is assumed to be using the discrepancy meter reading in making his judgements.

Note that this example does not come from the analysis of sequences of groups of phrases; here the technique has instead been used to identify pronominal referents. [The same method can be used, when there is enough evidence, to identify the reason for changes from one behaviour to another in longer segments of a report. This example has been used here because the result is very clear.] This illustrates that many of the techniques used in analysing verbal reports can be used to study several levels in the organisation of the material.

Methods of identifying the sequence of sections of the verbal report can be used in a specific way to identify a speaker’s decision determinants and make inferences about their working memory (see e.g. Bainbridge, 1975), or in a more general way to identify the speaker’s overall strategy (see e.g. Rasmussen and Jensen, 1974).


Papers that discuss the difficulties of collecting verbal reports and the distortions they may contain have been referred to in the Introduction. This methodological review takes an optimistic approach, even though analysing verbal reports is not easy, nor are there many time-saving techniques that can be applied.

The choice of complex behaviour [by the speaker] is influenced not only by immediate circumstances but also by planning in relation to the predicted future or by reference to similar past events.  It is difficult or impossible to get sufficient evidence from observed non-verbal behaviour to suggest or constrain hypotheses about such cognitive activities. As a consequence, we know very little about the processes underlying complex behaviour. Verbal report analysis is currently one of the richest ways of investigating the nature of behaviour that is a function of either past or future.  The methods described here make explicit the flexible analytic techniques that can be used.


© 1997, 2021  Lisanne Bainbridge
