John Staddon
Duke University
As I look back at my own research life, I am occasionally amazed to see things that seem absolutely obvious now but were completely unrecognized at an earlier time: common practices that were then taken for granted but now seem absolutely mistaken. One example is what I have called the object of inquiry problem. An experiment or series of experiments arrives at a result or a law. The law is assumed to describe the organism, but often it says more about the system (the organism together with the associated apparatus and reinforcement contingency) than about the organism alone. But first, an example of how relevant procedural details can be missed.
Neglected variables: The ‘frustration effect’
Frustration theory was a dominant idea in the 1950s and ‘60s but has fallen into desuetude in recent decades. The theory rests on a plausible datum: that failing to get a reward when you expect it ‘energizes behavior.’ The door won’t open to your first push, so you push it harder. An organism gets frustrated and consequently runs or presses faster — does more of whatever response he’s been using to get the food — when expected food fails to appear.
The definitive proof for frustration theory was an experiment reported by Amsel and Roussel in the Journal of Experimental Psychology in 1952. In the experiment, hungry rats ran in a “double runway” — that is, a runway with two goal boxes: one a short way from the start box and the second some distance after that. The first runway was short (3.5 ft), the second much longer (10 ft). In training, the rat runs to the first goal box (midbox) and gets a bit of food, then runs to the second goal box (endbox) and gets another bit. Thus, the rat learns to expect food in the midbox. The experimenter measures how fast the rat runs in the first part of the long second runway.
In the second phase, the rat gets food in the midbox on only half the trials: rewarded on half, ‘frustrated’ on half. The question: How fast does it run in the long runway after food and after no-food, in the midbox? The answer is: after training, it runs faster after no-food in the midbox compared to when there is food, especially in the first third of the long runway. This is the frustration effect (FE). Abe Amsel, author of the idea, attributed the effect to similarity. In the first phase the rat learned to associate the midbox with food, yet now (on half the trials) suddenly there is no food, hence ‘frustration’. The more similar a given situation is to one where food is expected, the greater the frustration when food is omitted — hence the more vigorous the response.
There is a conceptual problem with this idea. It implies that frustration increases as the similarity of the no-food situation to the food situation increases. But the two situations are of course maximally similar when there actually is food. So, should frustration be maximal when the rat actually gets food? But I am concerned here with the structure of the apparatus, the double runway. (a) Why is the second runway much longer than the first, especially as measurements show that most of the effect of nonreward is on the time taken to traverse the first bit of the long second runway? Why not just a short second runway? (b) Why so many rats? The ‘frustration effect’ is a within-subject effect, so why use 18 animals, why average? In fact, at that time, outside Skinnerian enclaves, the norm was group-average experiments. I doubt that Amsel thought twice about his method. The same excuse cannot be offered for Darryl Bem, a well-known social psychologist who in 2011 used a hundred human subjects to demonstrate precognition. Bem should have known better: a result that, if valid, would have required no more than one or two subjects to demonstrate proved to be unreplicable (see Stuart Ritchie’s book Science Fictions, Metropolitan Books, 2020).
It turns out that the long second runway was critical to Amsel’s result, because it imposed a small delay between food in the midbox and food in the endbox. Experiencing a time-to-food delay, initiated by a memorable event like a bit of food, animals invariably learn to hesitate. The hesitation is controlled by the food; absent the food on the 50% unreinforced trials, the rats run faster. In other words, the frustration effect is an example not of excitation by nonreward but inhibition by reward: midbox food, signaling a delay to the next food, causes the rat to hesitate, hence midbox “no food” leads to faster running.
The neglected independent variable was the length of the second runway. Had the experiment been done with a 20-ft rather than a 10-ft second runway, I don’t doubt that fewer rats would have been needed to get a significant ‘frustration’ effect.
So, when designing an experiment it is vital to ensure that the details of the apparatus and reinforcement contingency do not tend to favor an initial prejudice. Every detail of the procedure, including, in the FE case, the length of the second runway, should be scrutinized to ensure that it does not bias the result in some way. I return to this topic in connection with operant choice in a moment.
Behavioral Economics: Prospect Theory
But first, a version of the object of inquiry problem. Claude Bernard, the father of experimental medicine, once wrote, “Science does not permit exceptions.” But statistics, the null hypothesis significance testing (NHST) method, exists because of exceptions. If an experimental treatment gave the same, or at least a similar, result in every case, statistics would not be necessary. But going from the group, which is the subject matter of statistics, to the individual, which is the usual objective of psychology and biomedicine, poses problems that are frequently ignored.
A couple of examples may help. In the first case, the subject of the research really is the group; in the second, the real subject is individual human beings, but the group method is used.
Polling uses a sample to estimate the preferences of a whole population. Let’s say that our sample shows 30% favoring option A and 70% favoring B. How reliable are these numbers as estimates of the population as a whole? The answer depends on the size of the sample in relation to the size of the population. If the population is not much larger than the sample, the sample is likely to give an accurate measure of the state of the population.
If the population is large, this direct approach is not possible; a statistical model is needed. Whether the population is large or small, the aim is to draw conclusions not about individual decision makers but about the population as a whole. The method does not violate Bernard’s maxim: since the conclusion is about the group, there are no exceptions.
Not so with the most famous choice experiments in psychology: the prospect theory studies of Daniel Kahneman and Amos Tversky who, in 1979, came up with a clever way to study human choice behavior. K & T consulted their own intuitions and came up with simple problems which they then asked individual subjects to solve. The results, statistically significant according to the standards of the time, generally confirmed their intuition. They replicated many results in different countries. This work eventually led to many thousands of citations and an economics Nobel prize in 2002.
In a classic paper, subjects were asked to pick one of two choices, for example: A: 4000 (Israeli currency; today about 1000 U.S. dollars) with probability 0.2, or B: 3000 with probability 0.25. 65% of the subjects picked A, which has an expected gain of 800, over B, picked by 35%, with an expected gain of 750. This represents rational choice on the part of 65% of choosers. Other experiments seemed to violate rationality. The outcome of many such experiments was a list of cognitive biases — certainty effect, reflection, isolation, confirmation and very many others.
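The ‘rational’ label rests on nothing more than expected value; the arithmetic for the example above can be checked in a couple of lines:

```python
# Expected value of each gamble in the example above.
# A: 4000 with probability 0.20; B: 3000 with probability 0.25.
ev_a = 4000 * 0.20
ev_b = 3000 * 0.25

# A has the higher expected value, so the 65% majority chose 'rationally'.
print(ev_a, ev_b)  # 800.0 750.0
```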
Each effect is presented as a property of human choice behavior. Because the group effect is reliable, individual exceptions are ignored. But there is little doubt, for example, that a lesson or two in probability, or phrasing the question slightly differently (e.g., not “What is your choice?” but “What would a statistician choose?”), would greatly change the results. Nevertheless, this theory, which is more a tabulation of biases than a hypothetico-deductive system, has gained some acceptance as an accurate picture of individual human choice. In short, prospect theory is not a theory about individual human beings at all, but about the behavior of groups — groups large enough to allow positive results from standard tests of statistical significance. It is opinion polling with a clever set of questions.
To go beyond a group average, the experimenters would need to look at the causal factors that differentiate individuals who respond differently. What is it, about the constitution or personal histories of individuals, that makes them respond differently to the same question? Solving this problem, satisfying Claude Bernard’s admirable axiom, is obviously much tougher than simply asking “do you prefer 3000 for sure or a 0.8 chance of 4000?” But until this problem is solved, prospect theory — and numberless other psychological theories — give a distorted picture of individual human behavior. The results are in fact uninterpretable, meaningless as individual choice.
Kahneman and Tversky were not alone in treating their group theory as a theory of individual choice behavior. The field of NHST made and continues to make the same mistake: a statistically significant group result is treated as a property of people in general.
Matching
Operant choice is also an example of the object of inquiry problem, in the sense that the orderly behavior of a feedback system, in which organism and environment exert reciprocal effects, is presented as a property of the organism alone.
In the early 1960s, Richard Herrnstein discovered that under many conditions animals long exposed to choice between a pair of intermittent (variable-interval, VI) schedules tend to stabilize at rates of response such that the average payoff probability (total reinforcers divided by total responses, counted over a few hours) is the same for both choices: x/R(x) = y/R(y), where x and y are total responses to each choice in an experimental session and R(x) and R(y) are the reinforcers obtained for each. In other words, if the reinforcers received are in a 2-to-1 ratio, so will be the ratio of responses to the two alternatives. This relation is known as the matching law.
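In symbols, matching just says the obtained payoff probabilities are equal. A minimal numerical check, with hypothetical session totals chosen for illustration:

```python
# Hypothetical session totals illustrating the matching law:
# responses x, y and obtained reinforcers R(x), R(y).
x, y = 2000, 1000    # responses in a 2-to-1 ratio
Rx, Ry = 40, 20      # reinforcers in the same 2-to-1 ratio

# Payoff probability (reinforcers per response) is equal for both choices...
assert Rx / x == Ry / y
# ...which is the same statement as: response ratio = reinforcer ratio.
assert x / y == Rx / Ry
print("matching holds")
```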
On any time-based schedule, a reinforcer set-up is almost certain after a long enough wait. So, on a VI 60-s schedule with a time between responses greater than 10 min, say, almost every response will be reinforced: matching is guaranteed. If response rate is the more usual 40 to 100 per minute, with the same VI 60s, not only will most responses be unreinforced, but reinforcement rate will be more or less independent of response rate. If we see matching under these conditions, therefore, it will seem more interesting because it is not forced by the reinforcement contingency.
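The near-independence of reinforcement rate from response rate on VI can be seen in a toy simulation (a sketch under simple assumptions, not any published procedure): responses occur at a fixed rate, and a reinforcer ‘sets up’ after an exponentially distributed interval averaging 60 s, to be collected by the next response.

```python
import random

def simulate_vi(mean_interval_s=60.0, responses_per_min=40, hours=50, seed=0):
    """Simulate a VI schedule: a reinforcer sets up after an exponentially
    distributed wait (mean = mean_interval_s); the first response after
    setup collects it and starts the next wait."""
    rng = random.Random(seed)
    irt = 60.0 / responses_per_min          # time between responses, seconds
    t, t_end = 0.0, hours * 3600.0
    setup_at = rng.expovariate(1.0 / mean_interval_s)
    reinforcers = 0
    while t < t_end:
        t += irt                            # next response
        if t >= setup_at:                   # a reinforcer was waiting: collect it
            reinforcers += 1
            setup_at = t + rng.expovariate(1.0 / mean_interval_s)
    return reinforcers / hours              # obtained reinforcers per hour

slow = simulate_vi(responses_per_min=40)
fast = simulate_vi(responses_per_min=100)
# Both obtained rates sit just below the programmed maximum of 60/hr,
# despite a 2.5x difference in response rate.
print(slow, fast)
```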
Do we in fact get matching when pigeons must choose between two VI schedules? Sometimes. But more commonly we see what is termed undermatching: the ratio of responses, x/y, is closer to unity (indifference) than the ratio of reinforcements, R(x)/R(y).
So far, so uninteresting: undermatching is not too exciting. The function relating x/y to R(x)/R(y) can always be fitted by a suitable power function:

x/y = a[R(x)/R(y)]^b.

Unless the parameters a and b can be related to some procedural feature, or at least are invariant in some way (the same across subjects, for example), all we have done is fit a smooth and totally plausible curve with a very flexible function. If Herrnstein had reported undermatching in 1961 we should probably have heard little more about it.
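The flexibility is easy to see: taking logarithms turns the power function into a straight line, log(x/y) = log a + b·log[R(x)/R(y)], so any monotonic data set yields some a and b. A sketch with made-up undermatching data (the ratios below are illustrative, not from any experiment):

```python
import math

# Hypothetical reinforcer ratios R(x)/R(y) and response ratios x/y
# showing undermatching (response ratios closer to 1 than reinforcer ratios).
reinf_ratios = [0.25, 0.5, 1.0, 2.0, 4.0]
resp_ratios  = [0.40, 0.62, 1.0, 1.62, 2.51]

# Least-squares fit of log(x/y) = log(a) + b*log(R(x)/R(y)).
xs = [math.log(r) for r in reinf_ratios]
ys = [math.log(r) for r in resp_ratios]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
a = math.exp(my - b * mx)
print(f"a = {a:.2f}, b = {b:.2f}")  # b < 1 indicates undermatching
```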
But Herrnstein got simple matching, x/y = R(x)/R(y), by making a modification to the concurrent VI VI procedure called a changeover delay (COD): “Each time a peck to one key followed a peck to the other key, no reinforcement was possible for 1.5 seconds. Thus, the pigeon never got fed immediately after changing keys.” Perfect matching is only obtained when switching is suppressed by the COD: “The precise correspondence between relative frequency of responding and relative frequency of reinforcement broke down when the COD was omitted.”
The feedbacks inherent in the concurrent VI VI procedure ensure a monotonic relation between reinforcement and response rates. But to get perfect matching, you need a more complex arrangement involving not just two independent schedules but a changeover delay. Yet, the matching relation is in fact pretty robust. Consequently, it has been the focus of a huge amount of empirical and theoretical research in the fifty-odd years since Herrnstein’s original paper. But what does it tell us about the individual organism, as opposed to the whole organism-schedule system?
The answer seems to be precious little: almost any reward-following process, from momentary maximizing, to melioration, to hill-climbing is sufficient to yield matching. How to decide between them?
A convergence of two lines of work some years ago led us to one possibility. The argument is as follows. A series of experiments on reversal-learning in pigeons showed that they can learn to switch their choice faster and faster across successive daily reversals. They never reverse spontaneously but soon learn to switch their preference after the first one or two reinforcements each day.
We found that a very simple model adequately describes this behavior. The model assumes that choices are driven by the cumulative reinforcement probabilities for each choice response in winner-take-all fashion.
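A cumulative-effects-style choice rule can be sketched as follows (my own minimal reconstruction from the description above, not the published model: the function name, initial counts, and tie-breaking rule are all assumptions): each alternative keeps lifetime totals of responses and reinforcers, and on every time step the model picks the alternative with the higher cumulative payoff probability R_i/x_i.

```python
import random

def ce_choices(prob_left, prob_right, steps=5000, seed=1):
    """Winner-take-all choice between two random-ratio schedules with
    reinforcement probabilities prob_left and prob_right."""
    rng = random.Random(seed)
    resp = [1.0, 1.0]    # cumulative response counts (start at 1 to avoid 0/0;
    reinf = [1.0, 1.0]   # initial conditions are the model's only free parameters)
    picks = [0, 0]
    probs = (prob_left, prob_right)
    for _ in range(steps):
        ratios = [reinf[i] / resp[i] for i in (0, 1)]
        if ratios[0] == ratios[1]:
            i = rng.randrange(2)                      # tie: choose at random
        else:
            i = 0 if ratios[0] > ratios[1] else 1     # winner-take-all
        resp[i] += 1
        picks[i] += 1
        if rng.random() < probs[i]:                   # random-ratio payoff
            reinf[i] += 1
    return picks

rich = ce_choices(0.2, 0.2)    # two equal rich schedules (random-ratio 5)
lean = ce_choices(0.01, 0.01)  # two equal lean schedules (random-ratio 100)
print(rich, lean)
```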

The model, unsurprisingly, is compatible with matching but also makes decidedly non-matching predictions under other conditions. For example, compare choice behavior when the subject (pigeon or model) must choose between two identical random-ratio schedules. Figure 1 shows the results. The large graph shows the behavior of the model choosing between two identical random-ratio schedules, either 5 each or 100 each: preferences after twenty runs each of 4,993 time steps. The graph plots total responses, X vs. Y, for the two ratios, 5 and 100. Exclusive choice is favored at the smaller ratio (black diamonds), indifference at the large value (gray squares). The right-hand graph plots the proportion of choices of the right-hand alternative across daily sessions for each of four pigeons for two equal random-ratio conditions, p = 1/75 and p = 1/20, in ABA sequence. In both simulation and data, exclusive choice is favored at the high probability, indifference at the low.
The cumulative effects (CE) model is a first step in understanding the process of operant choice. It is worth attention because it is consistent with more than a single experimental result: daily reversal and concurrent variable-ratio performance, as well as matching. But it lacks a dynamic element (time does not figure in it) and any notion of contextual control. The only parameters of the model are its initial conditions; are they context-linked, and if so, how? But, unlike the matching law and prospect theory, CE can lay claim to being a theory of individual choice behavior. It’s a start!