A question I am sometimes asked is whether it’s acceptable to use an aversive stimulus (pressure) to produce behaviour from a horse, then both remove the aversive AND mark and reward the desired response with food or a scratch, and to call that positive reinforcement training.
The way I want to answer this is to help you to think less about the reinforcement method (and there are only two – negative and positive) and more about how the behaviour was produced.
Because with every behaviour there is emotion – how the horse feels.
And there are two types of learning that are always in play and that go hand in hand.
One is classical conditioning – which is how horses (all of us in fact!) form perceptions or associations with things – how we all come to feel about things – our conditioned (learned) emotional responses to stimuli and events. Classical conditioning is all about feelings and about how we respond because of how we feel.
And the other is how they learn as a result of the consequence of their behaviour – and how they feel about that consequence. So how the animal perceives the consequence of his or her behaviour will determine whether that behaviour is repeated in future. That is operant conditioning. Learning by experiencing consequences for behaviour.
And when we want to train behaviour, there are always two types of change in the environment that involve stimuli affecting the horse. One comes before the behaviour and causes it to happen, as an activator or trigger of the behaviour. The other comes after the behaviour, as its consequence. And this consequence determines whether the horse will repeat the behaviour.
But both the stimulus that comes before (the antecedent, as it is known in behaviour science) and the one that comes afterwards (the consequence) evoke emotional responses.
So the question is: can we really reward a horse for performing a behaviour under pressure, even if we use food as well as relief as a consequence?
So let’s start with making sure we understand how reinforcement works.
When we use positive reinforcement, it is important to remember that the reinforcer always comes after a behaviour has been performed.
The same is true of negative reinforcement.
Positive reinforcement involves adding (hence positive or the plus + sign) something that causes the horse to want to repeat the behaviour that the reinforcement follows.
Negative reinforcement involves removing (hence negative or the minus – sign) something that caused the behaviour, as soon as we get the behaviour or a try towards it.
Both are reinforcement and will strengthen the behaviour they immediately follow, but one is aversive reinforcement (an aversive that has been used to produce the response is removed) and the other is appetitive reinforcement (an appetitive is added, as an immediate consequence of the response).
If we use pressure, or existing pressure-trained aids or cues, and then remove the aversive when the horse does something we want, that is negative reinforcement – pressure, followed by relief. By pressure-trained aids or cues I mean those that would be followed by an escalation if ignored: either the aversive used to produce the behaviour persists until the animal acts to escape it, or the aversive is increased in strength, or another type of aversive is added.
If we also give food after the behaviour has happened, or we intentionally bridge (using a marker signal that the horse has learned predicts that food is coming) as we remove the aversive and then give the horse a treat, that is not what I would call positive reinforcement for the horse, because the behaviour was produced under aversive stimulation.
I would describe that as an attempt at counter-conditioning. The horse performed the behaviour under conditions in which they were either afraid or in some discomfort or annoyed. If they were not in any discomfort or annoyance or fear they would ignore the stimulus used to produce the behaviour and there would be nothing to reinforce.
As a bare minimum, even if we think we are using shaping to produce a response using aversives, for negative reinforcement to work, the stimulus applied to the horse has to be unpleasant enough for the horse to want to act to escape it.
All we can hope to do if we give the horse food after he has performed a behaviour to escape or avoid an aversive stimulus is to change how he feels about the stimulus he just experienced. And that is classical conditioning. Not positive reinforcement.
And the trouble with that is that people are SO likely to escalate if the horse ignores that light aversive (because being able to make the horse do what we want is positively reinforcing for the human) that we can end up trying to counter-condition for ever, because we keep re-associating the cue with aversive onset.
If you want to train using positive reinforcement, then the best way to do it is to learn how to produce behaviour without the use of aversives, pressure, discomfort – call it what you will – meaning any stimulus whose removal the animal values.
Positive reinforcement goes hand in hand with target training, where we make use of the natural investigative behaviour of horses to approach novel objects. We classically condition a marker signal to mean that food or a scratch is coming, and then we use that marker signal to reinforce the horse for approaching the novel object that we plan to use as the target.
Looking at, approaching or touching that object will result in the trainer giving the marker signal and then offering the horse some food or a lip-curling scratch.
Within seconds you now have a way to cause the horse to move, to stand still and to alter his posture without ever using any aversive (pressure) or learned aversive (aid or cue learned by association with aversive onset) to produce that movement.
You can even train a horse to target other body parts to a target prop or to your own body – to your hand or leg for instance. I’ve taught my horse to target his belly to my leg when I am standing on a mounting block or rock or gate so that I can get on. He knows to position himself until his belly comes into contact with my leg, so that he is lined up to make it easy for me either to swing a leg over or to put my foot in the stirrup.
Together with good use of other objects such as mats or poles or pens to form posture or movement, we can use target training with positive reinforcement without ever associating the behaviour or ourselves or the environment in which we are training the horse with aversives.
Now wouldn’t that be good for the relationship!