One very common question I see from people who are learning about positive reinforcement is this. “Will I always need to use food?” or “When can I fade out the food?” or “Other trainers say that food is just needed to kick-start behaviours and then the horse will learn to like doing it and do it for its own sake. Why am I still having to use food?”
It’s a fascinating set of questions because it got me reflecting on the alternatives to using food in training, and whether the aversive can ever be “faded out”.
Let’s clear up some of the misconceptions.
There is one situation in which you can very often discontinue the use of food in training and that is when food is being used for perception modification.
That involves following a very mild or weak stimulus (small, quiet, far away, short-lasting) with something appetitive, in an effort to switch the animal’s perception of that stimulus from fear to happy anticipation.
An example would be following the bark of a single small dog in the distance with a handful of pony nuts for a pony who is worried about dogs that jump and run and bark around him (while ensuring the dog could not come any closer). Pretty soon the pony would come to associate the dog bark with food coming. Done skilfully and gradually this can change the pony’s perception of dogs as a threat.
We do use counter-conditioning regularly, together with systematic desensitisation (which is more or less what I have described above), and sometimes we use it on its own.
An example of using it on its own would be feeding the horse some pieces of carrot while watching the dustcart / dustbin lorry drive slowly past his field gate.
If you use food as counter-conditioning for the purposes of perception modification then it’s perfectly possible to discontinue the use of food once the animal has lost their fear of a stimulus or situation.
Perception modification changes behaviour by changing how the animal responds emotionally to stimuli. An animal who is no longer fearful will not try to run away.
However, if we are using positive reinforcement, which involves adding an appetitive stimulus as a consequence of a behaviour in an effort to strengthen it, then, just as with negative reinforcement, the behaviour must continue to be reinforced to be maintained.
A behaviour that no longer produces reinforcement will begin to weaken. Imagine you were working for someone who paid you either in cash or in board, lodging and food.
If they stopped paying you, or paid you less, or paid you less frequently or sometimes didn’t provide any food for you at all, or provided food that you didn’t want to eat, would you want to continue to put in the same effort?
The same is true with behaviours elicited using aversive stimuli (also sometimes called pressure – anything the animal finds psychologically or physically unpleasant).
For example, if, when using an aversively trained cue (sometimes called an aid) for a behaviour, we failed to enforce the cue with an actual aversive stimulus for non-response, we would expect the behaviour produced by that cue to weaken because it would no longer be associated with an aversive. And if we failed to remove the aversive, or we continued to use the conditioned stimulus (the aversively trained cue) when the animal did the correct behaviour, then we would also expect responsiveness to weaken (or the animal to try a different behaviour) due to lack of reinforcement.
Responses to actual aversives (things the animal automatically finds unpleasant) are called escape behaviours. It’s escaping from the aversive that provides the relief that acts as reinforcement.
Responses to conditioned aversives (commands or aids that the animal has learned are predictors of aversive onset) are avoidance behaviours. The behaviour is performed to avoid aversive onset. In order for this to happen the aid has to predict aversive onset and have been “fear-conditioned”. The animal responds to the aid in fear of it escalating to an actual aversive.
Both of these are forms of reinforcement. Escape and avoidance behaviours are all negatively reinforced – either by making an aversive stop or avoiding it being applied.
An example of escape behaviour would be the horse coming to a halt when the rein pressure is applied to the bit. For that behaviour to be reinforced, the bit pressure must immediately be removed for a correct response.
An example of avoidance behaviour would be the horse coming to a halt when we breathe out, sit deeper into the saddle, or say “whoa”, because we consistently follow those things by rein pressure to the bit or noseband if the horse does not halt. The horse expects rein pressure for non-response and so acts when he perceives those other cues, to avoid the rein pressure.
Both forms of the behaviour of coming to a halt are negatively reinforced.
What this means is that when we handle or ride our horses correctly (in this case I mean using negative reinforcement correctly) in traditional riding or using classical or natural horsemanship methods, every movement the horse makes is negatively reinforced either by aversive escape or aversive avoidance.
When we handle or ride our horses using positive reinforcement our aim is to produce the behaviour without anything that causes the horse to seek to escape or avoid something aversive.
Instead we elicit the behaviour without using anything that is an actual aversive or a threat of an aversive, and we reinforce the behaviour by (usually, for precision purposes) marking it and then adding something appetitive as a reinforcer.
But, just like negative reinforcement, we still have to reinforce that behaviour to maintain it. There has to be something in it for the horse to make the effort to perform the behaviour in preference to doing his own thing.
Horses don’t do anything much that involves effort without some form of reinforcement. There is of course reinforcement in searching for forage and playing with friends (if you are in a playful mood).
But a horse trained using either aversives or appetitives would never choose to perform a dressage test, jump around a cross country course or walk, trot and canter on an endurance ride for 25 miles right past a plentiful supply of food under his feet, without very frequent reinforcement – negative or positive.
So if a trainer suggests to you that it’s possible to train and maintain behaviours without ongoing positive reinforcement with a primary reinforcer (food), then think again about whether this really makes much sense. Food might be given less frequently once you have trained the horse to perform behaviours for longer (we call this shaping for duration), but it still has to be given.
Yes, it is possible for horses to find it reinforcing to go out for walks, in company with others, on foraging expeditions with quite intermittent additional reinforcement from us because they find that activity enriching and they do so in an expectation of finding reinforcement in the hedges or verges.
But it’s unrealistic to expect to be able to do something like dressage or jumping or behaviours involving a lot of physical effort without regular and indeed very frequent reinforcement.
And there’s nothing wrong with that. We all, I think, need to realise that there’s no shame in providing ongoing positive appetitive reinforcement for desired behaviour. If we aren’t doing that then we don’t have any option but to be using a lot of negative reinforcement. Because there are only two kinds.
I am very happy to regularly positively reinforce behaviour produced without pressure.
If I wasn’t using appetitives (food mainly) to train repeatable desirable behaviours I’d be having to use some kind of aversive reinforcement (escape or avoidance) for EVERY move my horse was making.
And for reasons that have to do with my own ethics and desire for a “different” type of relationship, I’d rather not be associated with anything the horse finds unpleasant.
So using lots of food to maintain behaviours as I develop them is no big deal for me.
We have a choice, but it’s really only a choice between two things. I am much more comfortable using food on an ongoing basis than I ever was or will be using aversives.