The Emperor’s new clothes

I have recently been reading a number of articles, responses to questions on Facebook or comments made in relation to the use of aversive stimuli with negative reinforcement (pressure and relief) for controlling the behaviour of horses or for training repeatable responses with horses.

It reminds me of the story of the Emperor’s new clothes.

In the story, by Hans Christian Andersen, the vain Emperor hires two rogues who promise to make him an amazing new outfit from a fabric that is invisible to anyone who is stupid. The garments do not of course exist and the Emperor himself, his ministers, and the townsfolk, pretend to be able to see and admire his new suit, for fear of being considered stupid.

Then a child in the crowd, through which the Emperor is parading in his new clothes, shouts out the truth: that the Emperor is wearing nothing at all. The Emperor, embarrassed and suspecting that the assertion is true and that he has been duped, continues the procession in his birthday suit.

These questions and answers I have seen going around about the use of aversives in training reminded me of this story because everyone seems to be going along with the tale of the miraculous properties of the fabric used for the Emperor’s new clothes, and no-one wants anyone else to think they are stupid, even though they can see for themselves that it can’t be true.

Negative reinforcement is said to have happened when a behaviour is strengthened (happens more frequently in future) because that behaviour results in the removal, escape or avoidance of an aversive (unpleasant) stimulus.

In English, this means that if something we don’t like happens to us, and the action we take to escape it is successful, we will repeat that behaviour next time. Not only that, but we will start to notice what happens before the bad thing happens, and start to take action sooner, to avoid the bad thing.

In order for that behaviour to be more likely to be repeated, the stimulus (the unpleasant thing) needs to be sufficiently unpleasant to prompt some behaviour that is a step in the direction of the required response, and the stimulus must be removed or reduced in strength the instant that a correct response occurs.

Pressure and relief is a way to deploy the phenomenon that is negative reinforcement, to train repeatable behaviours. In a nutshell, the handler or rider must apply an aversive stimulus (one that the horse either innately dislikes or has learned to dislike by association) until a desired response (desired by the person, not the horse) is performed, and then provide the horse with relief from the unpleasant stimulus by removing it, reducing it, or ensuring that the horse escapes or avoids it by performing that correct response.

It is possible to teach commands for behaviours by introducing the command for the behaviour immediately before applying the aversive. With repetition, the horse learns that the command is a signal that warns that an aversive will follow, and the horse will begin to respond to the command to avoid the aversive. The command given can be anything the horse can perceive – a visual body language gesture, an audible verbal command word or sound, a shift in the weight on his back, or a touch from the leg, or movement of the rein by the rider. Even the rider or handler taking in or letting out a breath. These can all be conditioned to predict aversive onset.

It must be noted, however, that any command given comes to be learned by the horse as a reliable predictor that an aversive will follow if s/he does not respond to the command, and that this learned and previously neutral command comes to be aversive (producing the same emotional response) in its own right, by association with what follows.

A typical example of this might be the use of verbal commands in lunging, where the horse is given the command “Walk on” or “and….Trot” just before the application of the lunge whip behind the horse, which, when applied at a sufficiently aversive level, will produce the desired response.

Another example might be pointing and turning our body in the direction we want the horse to go, before applying an aversive both through steady pressure on the halter through the rope or lunge line, and aversive pressure with a stick and string or lunge whip or flag behind the drive line of the horse to cause it to seek relief away from each of those pressures by going out on a circle.

Alternatively a seat aid for halt – experienced by the horse as an alteration in the weight, balance or feel of the rider on his back – is given immediately prior to the application of an aversive through the reins connected to the bit in the mouth of the horse. This seat aid – which initially means nothing to the horse since s/he has had to be first desensitised to the movement of a rider on top – comes to predict an unpleasant sensation in his mouth, and the horse will, if correctly trained, learn to halt from the seat aid to (hopefully) avoid the unpleasant sensation in his mouth from the bit. If the seat aid does not work, then of course the rider will apply the aversive pressure to the mouth of the horse through the reins to make the horse stop.

Many equine behaviourists and trainers insist that having horses learn how to respond to pressure from handlers and riders is essential for the safety of both.

I used to agree with that, but the more I look at how people handle horses, the less I do.

For one thing, horses already know how to “respond” to stimuli they do not like – they often tend to pull against or lean into pressure so as to avoid being pulled off balance.

For another, I rarely see people handling horses who understand how to take pressure off when the horse does respond. So I don’t see negative reinforcement of desired behaviour happening for the most part in any event. Even if we trained our horses to respond to light pressure, unless these “other people” handling them are equally well trained to apply pressure lightly and wait for the horse to respond, the horse is going to be hurried and pulled and pushed and prodded into position anyway.

The best we can do is to train the horse to respond to commonly used verbal or visual body language or touch cues, and desensitise the horse to all the things that might detract from his willingness to respond to those.

However, there is another issue I want to address and it is the fact that I keep seeing people repeating what they hear said by others, that it is possible to teach horses behaviours in small steps, using pressure-relief (an aversive stimulus being used to prompt a desired behaviour) without escalating the aversive stimulus. They go along with that idea, perhaps because they don’t feel brave enough to question an authority figure, much the same as did the townsfolk watching the Emperor parade naked.

Escalation would involve either increasing the strength of the aversive or adding another type of aversive to increase the overall aversive experience of the horse, provoking the horse to want to take action to escape that situation.

Even the act of keeping an aversive stimulus “on” at one apparent level of strength results in escalation in the sense that the stimulus can become increasingly aversive to the horse. Imagine for instance being repeatedly tapped on the back of your leg with a whip. Even though each application of the whip may on its own be exactly the same in terms of the physical amount of effort used by the person holding the whip, over time the sensation would become increasingly intolerable.

The same can apply to any form of steady constant pressure. As an example, imagine holding a 2 kilo bag of sugar in your hand, with your arm stretched out in front of you for a few seconds. Not so bad. Now do it for a whole minute. Now do that for 10 minutes. Over that time, even though the bag of sugar weighs the same amount, it will begin to feel heavier and heavier to you as you hold out your arm, and more and more unpleasant to do so, due to the muscle fatigue you will be experiencing, resisting the forces of gravity.

So even though an aversive might not appear to us to be being escalated – the person is for example only pulling on the head of the horse with the same amount of pressure as was originally applied – the muscle fatigue involved in resisting that force is such that a stimulus maintained at that same level with no more pressure apparently being added can eventually become sufficiently aversive to a horse that he will act to make it stop. That pressure gets “heavier” or stronger or more intense, on its own.

Part of the reason for suggesting that we can use aversives without escalating is, I am sure, that most equine behaviourists, and many very dedicated, self-educated and experienced consultants in equine management, behaviour and training, generally tend to discourage the use of particular techniques for escalating pressure to either suppress or increase behavioural responses, such as are deployed in some natural horsemanship methods in particular.

Natural horsemanship methods are often criticised for promoting outdated ideas about dominance between animals, flooding (forced, restrained exposure to high strength and enduring frightening stimuli or events – the “throw them in at the deep end and leave them there until they swim” approach), aversive punishment (corrections) and the use of escalating pressure, which involves increasing strengths of aversive stimuli, sometimes escalated quite quickly to high strengths that can cause considerable fear or aggression or both in horses. Natural horsemanship has no monopoly on those things, just a convenient label. A trip to a weekend local horse show would be enough to see all of those things happening, although arguably with less skill in the giving of relief.

However, the question that I have debated with myself and with other qualified and experienced psychologists, equine behaviourists, riding teachers and trainers was the extent to which it is ever possible to use aversives to form and reinforce behaviour or to suppress it, without escalating.

I say that because I am seeing equine behaviourists and horse trainers of all persuasions suggest to those less knowledgeable about learning theory, that it is possible to form and reinforce complete behaviours from horses using very light pressure and relief, or very mild aversives, without ever escalating.

That means using some form of very mild aversive to get a response and removing that aversive when the horse responds without ever increasing the strength of the aversive.

It seems to me that while they might know something about negative reinforcement – you can produce a level of response with an aversive and get it again if you repeat it – they have forgotten about the other form of very simple learning that is always going on, and that is habituation.

If I take the use of light leg “aids” as an example, then if we were to be able to only use light aversive stimulation (pressure) to cause a horse to go forwards, then we would be able to just squeeze the horse lightly with our legs on his sides, or touch him with our heels, and the horse would walk forwards.

But this simply makes no sense. Folks who tell us this would have us believe that if we have a horse who has never been ridden but has been desensitised to things pressing around his middle (such as a girth or cinch), then touching the horse with something else in more or less the same area is going to produce a response.

It won’t. If it does, it probably means the horse has not been desensitised to the girth or the rider’s legs hanging down by his sides at all and is still reacting fearfully or with annoyance to having things in contact with his sides. Because the thing we must do before ever riding a horse is to ensure that he is not in any way reacting to the tightness of the girth or cinch around his middle and that he accepts touch all over – meaning that when touched with the hands, with the rider’s legs or with any item we might want to wear or carry, he does not react at all. That is an essential prerequisite to safe riding.

If he is reacting, then further desensitisation must take place such that the horse completely ignores the feel of the tightness of the girth or the feel of a rider’s legs touching his sides before we ever attempt to ride. If we don’t, the horse will try to get away from that sensation by moving, or by bucking or kicking out or biting or tail swishing at the source of the annoyance or discomfort, much as he would if he had a fly on him. Much the same way as horses do when ridden with spurs.

What actually has to happen for a horse to learn to respond to a very “light” aid or signal in a pressure-relief model is that this light leg touch or squeeze from the rider needs to be applied first and then it needs to be followed quite soon afterwards by some higher strength aversive stimulus in order to get a response.

The touch from the leg then, after some repetitions, comes to be the warning signal that another aversive stimulus is coming – whatever the rider chooses to use that is sufficiently unpleasant to cause the horse to walk forwards to get away. Through repetitions of this, the leg aid or touch or squeeze will come to be associated with that more significant aversive that follows and the horse will act sooner to avoid that. And it is also essential, if we are trying to teach the horse to walk on for instance, that the aversive is removed only when steps of walk forwards happen.

Various methods are recommended for applying an aversive to cause a response, but the rider has to make a response happen pretty soon after that “light” touch or verbal signal or whatever is used, or the horse will remain (or become) habituated to that light aid, much as he should to the rider’s legs just hanging there in contact with his sides. And when he habituates, he will ignore it.

In some systems the use of a whip tap to the horse behind the rider’s leg, on the boot of the rider or on the shoulder or rump of the horse might be recommended either to cause annoyance or a little stinging pain or to startle the horse into a flight response forwards.

In others, the rider might be advised to slap themselves or the horse on the body with a rope or with a special rope device (called by the fun sounding name “whip-whop”) to cause the horse to startle and move off. Some natural horsemen like to use a coiled up lariat and slap themselves with it on the leg and then the horse on the rump, which startles the horse into going forwards.

Some systems might advocate a little kick with the heels be used to follow the squeeze. Others seem to favour clucks and smooches given verbally by the rider or instructor, followed by some leg slapping or rope shaking, or some kind of general commotion from behind on top of the horse perhaps also with verbal chastisement to move. “GET ON!” seems to be a favourite local to me.

Whatever the system of pressure-relief that you follow and whoever you copy, there is no system that will teach you to just sit there squeezing lightly with your legs if the horse doesn’t respond to the neutral or mildly aversive feel of the squeeze. Because all that will happen if you do, is that the horse will habituate to that level of pressure (be that physical pressing or psychological warning threat that an aversive will follow non-response) and stop responding to it at all.

Whichever particular application is used, it is necessary to apply a sufficiently strong aversive and to either continue to apply it, or to increase it until a response is produced, whereupon that stimulus should be removed.

Should a horse fail to respond to a light pressure or mild aversive then unless that aversive is:

  1. maintained at that level until it becomes aversive enough that the horse acts to escape it, or
  2. increased in strength in some way, or
  3. supplemented with another, additional type of aversive to provide additional motivation to act

…then the horse will almost certainly habituate to the mild aversive or light pressure being applied at that level and future responses will be fewer and / or weaker.

Trainers actually rely on this process of habituation – where animals learn to ignore stimuli – in the first place during the halter training and backing process when we desensitise the horse to things it would otherwise fear or resist – like the feeling of the weight and pressures of the head collar or bridle on its head, the saddle on its back or the tightness of the girth or cinch around the middle.

Unless and until the horse shows no response at all to the feeling of these things, it would not be safe to ride or lead at all.

And that’s the trouble with mild aversives. If you’ve been trying to use mild aversives with a horse either to cause it to move or to stop it from moving and this isn’t working, then it is because either the horse is distracted by or attracted to something more important than your very light pressure, or the horse has simply habituated to it and needs to be re-sensitised – sometimes called “reminded” – that non-response will result in escalation.

If the horse experiences repeated applications of very light pressures or very mild aversives without any escalation or added aversive stimulus to produce a response, the horse will habituate and ignore those stimuli. And that means that at some point when you use that same mild aversive it won’t produce anything at all by way of a response. And it is inevitable that this will happen at some point due to the many things that compete for the attention of the horse.

To add to that, we also know that responses to aversive stimuli tend, in any event, to be reduced over time to produce only the effort that is necessary to achieve escape or avoidance. Once the horse has learned that he only needs to go forwards a step or two to make the aversive from the rider stop, then that is going to be all he will do. Which is why riders (and some instructors do teach riders to do this, so I do not judge the rider for this practice) seem to end up continuing to “remind” the horse that he has to keep going.

One alternative, in a pressure-relief only paradigm, is strong escalation, and it is this that is disliked by those objecting to the natural horsemanship form of escalation. That kind of strong escalation means that when the horse breaks gait to a lower (or higher) gait, the aversive is applied quickly and fairly severely to a level where the horse will not break gait because he fears what will happen if he does. For horses breaking gait down, pressure is added quickly to reproduce the desired gait. For those breaking gait up, pressure is added quickly to reproduce the lower gait. Arguably this kind of practice effectively punishes any failure to maintain gait and the horse experiences relief and freedom from aversives only when he maintains the gait asked for. Some horses take to this quite quickly. They want an easy life. It’s easy to get them to move if they are wanting to move anyway from nervous energy or because they have been stabled for a long time. I am thinking competition dressage horses. It’s not so easy to persuade those horses to slow or stop. And the converse is true with so-called “lazy” horses that actually often become slower and less responsive in this kind of system due to learned helplessness.

It is also worth mentioning that the inability to maintain gait without constant reminders or corrections in an otherwise fit horse should be a warning sign of possible physical discomfort. Always eliminate pain as a cause of unwillingness to stop or unwillingness to go.

So, back to the use of mild aversives / light pressures.

Once a horse has learned that the light leg pressure will be removed if he walks on (and escalated to some degree if he doesn’t, until he does walk on) then what are we to do to produce a trot? Because the same light leg pressure used to produce the walk is only sufficient to produce the walk.

What about when we want a faster trot, or even canter? If we apply the same light leg pressure when in walk to get the trot and there is no response from the horse, then what are we to do? Because in reality, if we get no increase in the level of response, we are on the road to also having the horse habituate to and ignore that light leg aid for walk as well as for trot.

And by the laws of habituation, we know that frequent and repeated exposure to low strength stimuli results in a reduction in response, and eventually in no response at all. So we can’t keep using mild aversives because the horse will habituate to them, and if we don’t want the horse to ignore our aid or command we need to make the behaviour happen to maintain the association between that command, the aversive used to produce the behaviour and the negative reinforcement (aversive removal) that follows a correct response.

With horses that have – through breeding or management – a greater desire to move, and whose breeding is such that they have a strong flight response, it tends to be quite easy to produce movement with very mild aversives. That is why we see so many of those breeds of horse such as thoroughbreds and warmbloods performing in dressage, compared to more “cold-blooded” native breeds and cobs.

Getting the so-called hotter blooded horses to move with aversives is not so difficult, because they have been bred for the flight response as well as for the long ground covering legs and light bodies. Getting them to stop tends to be more challenging and that tends to be more where we see aversives being escalated through the bit.

Cobs and many native breeds can be the opposite – they tend to be much easier to persuade to stand still or go slowly (unless they are afraid). They can be notoriously difficult to motivate to use any energy in schools and arenas, and yet are the safest and most fun if all you want to do is have a pootle around the block on a hack. Getting them to put in any more effort than that without escalating aversives, if aversives is all you have, is very difficult and goes a long way to explaining why they don’t feature quite so frequently on the dressage scene.

So, what I really wanted to do was to make sure that no one is still feeling stupid, and I hope I’ve done my job. If you have been trying to use very mild pressure or a very mild aversive with a horse to produce a behaviour, and the horse’s response is getting no better or has died off altogether, it’s not because you are ignorant or doing it incorrectly. It is because this is what happens.

The fact is that it is simply not realistic to expect that, in a pressure-relief paradigm, we can use very mild pressure or a learned visual or vocal or touch signal to produce the same or increased effort in behaviour forever, and never escalate with an aversive to make the behaviour happen.

The use of mild pressure or a mild aversive with no response from the horse results only in one thing and that is habituation to the stimulus, such that there will, in very few repetitions of that, be no response at all.

So the question is, what DO we have to do then if mild aversives won’t work?

Well, if you wish to pursue training a horse in a pressure-relief model, then it is going to be essential to keep adding or increasing the strength of aversive until you do get a response. It’s unavoidable. How you choose to go about doing that is a personal choice and the extent to which you are willing to escalate is a moral decision. Sometimes people find themselves working harder and harder to get a response and the horse does less and less, because they are just not the kind of person who is willing to escalate to the extent that might be required to get a “snappier” response. They love their horse and they don’t want to have to do that and frighten or make him mad to get more from him.

Sometimes when I am teaching pressure-relief to people and horses – something I prefer to do usually only where there are physical imperatives for the horse to exercise, such as weight management and physiotherapy, or where the owner has yet to learn how to use pressure-relief correctly – I might try to find out what the person thinks is the least aversive thing that will produce a response from the horse and go from there.

I will try to use very mild aversives (the smallest increment necessary to get some movement) and mark and positively reinforce that for horses that are ready to be trained with positive reinforcement. That way, I can get the horse thinking that there is something more in it for him than the threat of more pressure if he doesn’t and we can often improve performance and effort hugely by just using the mild aversive a few times to form the behaviour initially and then get it on a non-aversive cue and build from there using successive approximations with positive reinforcement alone.

There is still an issue with that though from two perspectives. One is that if the behaviour breaks down then we have no other way to reproduce it other than using the original aversive used to form it.

The other is that it is all too easy to slip back into old habits and escalate. I must stress that this is not because the training doesn’t work, but because people find using aversives to get behaviour very reinforcing for themselves, and have usually practised that a lot.

For that reason I prefer to use target training for many things for which aversives are commonly used. It just takes the temptation out of it, for everyone who escalates automatically without thinking because this is what they have been trained to do for so long!

What I will never do though is to mislead anyone who wishes or who is for the time being compelled (for example on vet’s orders to exercise a horse) to continue using pressure and relief, to expect that they will only ever need to use a teeny tiny mild aversive and never increase it if they are expecting to get either longer duration or more effort or indeed continued effort from the horse.

It’s an impossible dream.

If we only have aversives available to form and reinforce behaviour, we will always have to escalate for some behaviour at some point, to keep the association going between the command for the behaviour from the handler or rider, the actual aversive stimulus that will produce the response from the horse and the reinforcement produced by his effort to escape or avoid that.

Anyone who was wondering how it can possibly be that we can get more output from a horse in an aversive paradigm without increased input was right. The Emperor, sadly, is in fact naked.