Bad Science: Quadrants of Operant Conditioning
People become dog trainers for various reasons. Often, these individuals will talk about a dog’s “performance,” yet the term has a variety of interpretations. After all, what is performance? Is it speed? Strength? Accuracy? Reliability? Chat up a few trainers involved in any professional sport (canine or human) and you will see that there are numerous beliefs about which methods produce the best results for the desired performance, and about why they do. Should our toes be pointing straight ahead or at an angle when doing a squat? Should we stretch before or after an activity? With dogs, though, the question is even more convoluted because the concerns are not just about performance: they are also about welfare.
Animal welfare is a vast topic and one that cannot be covered from A to Z in a single sitting. Many philosophers and scientists devote their entire lives to traversing the quagmires of non-human animal welfare issues, so I am not going to put all of my roulette chips down on 28 black and defend my choice in the never-ending spin of the animal welfare debate wheel. In a perfect world, conversations are always productive. In the actual world, productive conversations are rare. But to the point: the conversation that might be the least productive in dog welfare is the assertion that techniques which use positive reinforcement and negative punishment are ethical, while techniques which use positive punishment and negative reinforcement are unethical. This issue is so emotionally charged and so entrenched in the industry that often the supporting evidence for a claim about the ethical nature of a technique revolves solely around the interpretation of which quadrant of operant conditioning the technique relies on.
For example, many trainers claim that a technique called Behavior Adjustment Training (BAT) is unethical because they see it as negative reinforcement. For those unfamiliar, BAT removes a stimulus (which a dog finds threatening) at a distance great enough for the dog to remain calm and not show signs of being overly agonistic (such as growling, snarling, barking, etc.) [note: typically it is the dog that is moving, not the scary stimulus]. Because you repeatedly remove something (in this case, the thing the dog doesn’t “like”) to reinforce calmer behavior, many trainers label this type of training as negative reinforcement, and because negative reinforcement is claimed to be unethical, BAT must therefore be unethical. Problematically, BAT was designed to steer owners away from using harsh punishments, and the method itself produces no signs of undue harm to the dog. So if the interpretation of the quadrants of operant conditioning causes trainers to conclude that BAT is unethical, then there is a serious problem with the convention, because calling BAT unethical is like calling Mr. Snuffleupagus from Sesame Street a serial rapist.
Since 1975, various scientists have pointed out that learning events can rarely, if ever, be labeled solely as positive or negative (e.g. Michael, 1975; Baron & Galizio, 2005; Baron & Galizio, 2006; Tonneau, 2007). For example, imagine a rat in a black box at freezing temperature. The rat has a lever which activates a heater for a short period of time. As the rat stays in the box, lever-pressing increases over time (reinforcement). Here is the paradox: does lever-pressing increase because of the addition (positive) of heat or the cessation (negative) of cold? The answer is yes.
[This example is paraphrased from an actual experiment conducted by Weiss & Laties published in Science in 1961]
In the physical universe, the addition of one stimulus is always met with the removal of another stimulus. Regardless of what type of matter (energy) this stimulus is, energy cannot be created or destroyed, and so within any closed system you have to remove something to add something and you have to add something to remove something. This is a fundamental property of the universe and is analogous to the idea that two opposing baseball teams cannot win the same game: in order for one team to win, the other team has to simultaneously lose. This prompts us to ask two questions. First, are the quadrants of operant conditioning mutually exclusive? Second, if they are not mutually exclusive, are we able to stipulate that they are not occurring at the same time during a learning event?
Most examples of what dog trainers consider positive reinforcement rely significantly on negative reinforcement elements (e.g. the removal of hunger). Food is great, but as a motivator it works by removing hunger (negative reinforcement); it is also positive reinforcement for the obvious reason that we are adding food. This might seem unimportant for the lives of most dogs, who are fed to the point of obesity. In behavior research, however, most animals are deprived of food before reinforcement begins in learning paradigms, so the food offered as “positive reinforcement” is being given to an animal that has been deprived of enough food prior to testing to cause a 15% decrease in body mass. For perspective, imagine a 180-pound man losing 27 pounds before being handed a cheeseburger as reinforcement, and you might appreciate how removing deprivation is not only perhaps a better description of the actual science of reinforcement but also a significant motivator for a rat to start pressing levers in its black box.
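The cheeseburger arithmetic is easy to check. The snippet below is only a minimal sketch; the function name `deprivation_loss` is made up for illustration and does not come from any of the cited papers.

```python
def deprivation_loss(body_weight, percent_lost):
    """Return (weight lost, weight remaining) for a given percentage loss."""
    lost = body_weight * percent_lost / 100
    return lost, body_weight - lost


# The 180-pound man from the paragraph above, at a 15% loss in body mass.
lost, remaining = deprivation_loss(180, 15)
print(lost, remaining)  # 27.0 153.0
```

The same function applies to any starting weight and any unit, which is why a 15% criterion reads so differently once it is scaled up to a human body.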
A classic example used in popular psychology textbooks is aspirin as a negative reinforcer. The idea is that the removal of a headache might increase future aspirin-taking behavior, thus the removal of the aversive headache could be said to increase the frequency of the behavior. More concisely, the aspirin is negatively reinforcing aspirin-taking behavior. However, we are adding aspirin to the system, so what do we say about the addition of a stimulus that causes the removal of another stimulus, where the overall consequence increases or decreases the frequency of the preceding behavior? By “aspirin-logic,” the addition of food that removes the feeling of hunger would have to be negative reinforcement as well. By now it should be clear that there is no mutual exclusivity to the reasoning behind the popular interpretations of the quadrants of operant conditioning, and therefore any conclusion that relies on such a demarcation is neither logical nor scientific. Simply put, analyzing behavior with a system that relies on the Tweedledee-Tweedledum characterization of reinforcement and punishment (Marr, 2006) in a universe that is beholden to the conservation of energy is a massive and improper oversimplification.
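The lack of mutual exclusivity can be made concrete with a toy bookkeeping exercise. The sketch below is not a model of learning; `LearningEvent` and `quadrant_labels` are hypothetical names invented here, and the only assumption is that each event is described by what appears and what goes away after the behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearningEvent:
    behavior: str
    added: str      # stimulus that appears after the behavior
    removed: str    # stimulus that goes away after the behavior
    behavior_increases: bool


def quadrant_labels(event):
    """Return every quadrant label the event's description supports."""
    kind = "reinforcement" if event.behavior_increases else "punishment"
    labels = []
    if event.added:
        labels.append(f"positive {kind} (adds {event.added})")
    if event.removed:
        labels.append(f"negative {kind} (removes {event.removed})")
    return labels


# The three examples discussed in the text, each described both ways.
events = [
    LearningEvent("lever-pressing", "heat", "cold", True),
    LearningEvent("sitting for a treat", "food", "hunger", True),
    LearningEvent("aspirin-taking", "aspirin", "headache", True),
]

for event in events:
    print(event.behavior, "->", quadrant_labels(event))
```

Every event comes back with two labels. The bookkeeping gives no reason to prefer one over the other, so whichever label a trainer settles on reflects a choice of description, not a measurement.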
It should be appreciated that the difficulty of negotiating positive versus negative effects within a system is common to science. For example, biologists who enjoy old-fashioned terminology will describe the movement of an organism in relation to a stimulus as a “taxis”; positive taxis is therefore movement toward a stimulus, while negative taxis is movement away from a stimulus. If the reference point for the behavior is the change to the environment (e.g. the appearance of a prey animal), then naturally we would describe the motion of a predator toward the prey as positive taxis. However, let us instead change the animal to an herbivore like an elk. Imagine a large group of elk munching away on some delicious, savory grass. Over time, the elk wear down the grass in the area where they are feeding, and they then move toward another area which has more grass. Are the elk moving toward an area with more food to forage on or away from an area with less food to forage on (i.e. is it positive taxis toward new grass or negative taxis away from no grass)?
It is important to remember that much of what we use to categorize nature is simply convention, and sometimes a convention’s creation is no more sophisticated than what one person decided while reading the latest issue of Science while sitting on the can. One of my favorite illustrations of the sometimes arbitrary nature of conventions is the way physicists describe torque. In physics, a torque that generates counterclockwise movement is assigned a positive sign and a torque that generates clockwise movement is assigned a negative sign. Why? Because if you curl the fingers of your right hand in the direction of an object moving counterclockwise, your thumb points up, and if you curl them in the direction of an object moving clockwise, your thumb points down. This convention is known as the right-hand rule.
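For readers who want to see the sign convention at work, here is a small sketch for the flat, two-dimensional case. It assumes the standard definition of torque as the cross product of a position vector and a force; the function names `torque_z` and `rotation_sense` are illustrative only.

```python
def torque_z(r, force):
    """z-component of the cross product r x F for 2-D vectors given as (x, y)."""
    rx, ry = r
    fx, fy = force
    return rx * fy - ry * fx


def rotation_sense(tz):
    """Translate the sign of the torque into the direction of rotation."""
    if tz > 0:
        return "counterclockwise"
    if tz < 0:
        return "clockwise"
    return "none"


# A force applied one unit to the right of the pivot.
arm = (1.0, 0.0)
print(rotation_sense(torque_z(arm, (0.0, 1.0))))   # counterclockwise (push up)
print(rotation_sense(torque_z(arm, (0.0, -1.0))))  # clockwise (push down)
```

Nothing in the physics forces counterclockwise to be the positive direction. Had the convention been built around the left hand, every sign would flip and the predictions would be unchanged.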
Despite this pervasive problem, research papers and essays are still frequently published describing events as “positive reinforcement” or “negative reinforcement,” so this is by no means just a dog industry issue. Furthermore, responses to these criticisms fail to address the issue head on, are unable to provide sound counterarguments, and/or fall back on the pragmatic argument: “well, we don’t have anything better so it is better than nothing.” There are a couple of problems with the pragmatic argument. First, what is “better”? Quadrants create a paradigm view that cannot be supported without the existence of quadrants, so if “better” requires a convention that maintains the theory-laden beliefs of operant conditioning then I would say the pragmatists are correct, just as creationism cannot exist without a God who created the universe as its central hypothesis. If “better” requires only the ability to describe learning events, then the pragmatists are definitively wrong, because the concepts of reinforcement and punishment are descriptive enough in and of themselves, while positive/negative distinctions always have to be clarified further with methodological explanation.
But all of this sidesteps the heart of the issue: harsh punishment creates the negative and deleterious results we are familiar with because of the threat it presents to the organism. The ethics here are measured through actual harm, not through the way an animal learned something. Indeed, many dogs might not learn much of anything that is objectively quantifiable in an operant classification after being swung around on a choke chain in a helicopter swing or kicked in the ribs. We therefore couldn’t say these events belong to any quadrant, because we would first have to establish the learned behavior that is operating on the environment.
Ethics does not have a quadrant. It is a complex web of issues that are rarely cut and dried, and conversations about dog training framed through positive and negative quadrant distinctions only obfuscate the discussion at hand. Kicking a dog is unethical because it is harmful and cruel, not because it is “positive punishment.” Dangling a dog in the air as it suffocates is unethical because it too is harmful, cruel and abusive. You cannot design an experiment to show that “the Yankees won” is true but “the Red Sox lost” is false; in the same way, there is no falsifiable test of whether it is the addition of a treat or the removal of hunger that is acting during a learning event. Pragmatists will say “oh, whatever, it’s not a big deal because I know the difference.” Problematically, the distinction is not only unhelpful to the conversation but also unscientific. Science is falsifiable; if a claim is not, it is not science.
Baron, A., & Galizio, M. (2005). Positive and negative reinforcement: Should the distinction be preserved? The Behavior Analyst, 28(2), 85–98.
Baron, A., & Galizio, M. (2006). The distinction between positive and negative reinforcement: Use with care. The Behavior Analyst, 29(1), 141.
Marr, M. J. (2006). Through the Looking Glass: Symmetry in Behavioral Principles? The Behavior Analyst, 29(1), 125.
Michael, J. (1975). Positive and Negative Reinforcement, a Distinction That Is No Longer Necessary; Or a Better Way to Talk about Bad Things. Behaviorism, 3(1), 33–44.
Tonneau, F. (2007). Behaviorism and Chisholm’s Challenge. Behavior and Philosophy, 35, 139–148.
Weiss, B., & Laties, V. G. (1961). Behavioral Thermoregulation. Science, 133(3464), 1588–1588. doi:10.1126/science.133.3464.1588