We have recently launched a new Facebook group called Horse Charming for people who want to learn more about training with positive reinforcement.
You can apply to join us here: https://www.facebook.com/groups/HorseCharming/
As part of the process for applying to join the group we ask people to explain what they understand by positive reinforcement.
We get a variety of responses that range from precise psychology definitions to the view that it involves being kind, firm but fair, or that it somehow involves using energy.
I decided to jot down some principles that we hope will help people to better understand what training with positive reinforcement involves, and what it does not.
It seems increasingly important to explain what it does not involve, because there are many misunderstandings. Perhaps people have seen others using a clicker to mark behaviour produced using traditional “aids”, or perhaps they believe that it somehow involves food deprivation. While there are people who do routinely employ these techniques in an effort to get animals to do what they want, or to please other people, in our opinion these do not meet the description of training with positive reinforcement.
What we care most about is how the animal feels during training. We care much less about what he does in terms of any performance of behaviour.
Here are the principles that we regard as being consistent with a description of positive reinforcement:
1) The behaviour happens without an aversive prompt.
An aversive prompt or event is something the horse finds painful, uncomfortable, unpleasant, frustrating, irritating or annoying in and of itself.
Examples would be being approached by another animal or human that he fears, being touched or tickled (with the hand or an object) somewhere he doesn’t like, being tapped or hit with a stick, being squeezed, nudged, kicked or jabbed by the rider’s heel or spur, or being pulled by the rein or lead rope. It could also include having an object thrown or waved or swung towards him, such as a clod of earth, a stick, a flag or a string or rope.
If the horse appears to act to make something stop or go away, seeks to move away, seeks to move something away from himself, and repeats a behaviour that made that prompt stop or go away, it was probably aversive to the horse.
An aversive prompt can also be something that the horse has come to believe will result in something unpleasant (uncomfortable, painful, annoying, frustrating) happening to them if they don’t act to avoid it. Examples include someone holding a stick in a position in which it has been held before being waved at or used to hit or tap the horse, someone saying “Walk on!” before squeezing the horse with their leg, prior to kicking or jabbing with a spur, or someone changing body position or making a sound (like “Whoa”) before pulling the rope or rein.
2) The behaviour either occurs spontaneously, or it is prompted using a food lure or a target.
Behaviour that happens spontaneously is rarely accidental when we are proactively training. That doesn’t mean that we don’t opportunistically capture behaviour the horse performs incidentally, but most of the time we would work to intelligently arrange the environment so that the behaviour is more likely to occur.
An example of this would be laying a line of poles on the track where the horse lives so that he has to walk or trot over them to get around the track to the hay.
Horse owners regularly use food lures to produce behaviour. Eagerness to be caught is often produced by an owner giving a treat to the horse once haltered. Enthusiasm and confidence in a stable area might be produced by the horse finding that there are other horses there and some feed in the stable. Carrot stretches form a regular part of every conscientious horse owner’s exercise plan for their horse. All of these are forms of luring.
Target training involves teaching a horse to touch a specific object (called a target) with their nose or with any other body part. Target training can be used to produce any of the movements or postures that would be traditionally produced with aversives.
Target training is achieved using the natural tendency of animals to explore novel objects, together with positive reinforcement.
It is also possible to use physical manipulation to produce behaviour – handling a horse’s feet for cleaning and trimming often involves manipulation. Because horses often dislike this at first, we prefer to address that dislike using systematic desensitisation and counter conditioning before going on to mark and reinforce the horse for consenting to being handled physically.
Once behaviour has been taught using food lures and targets then it’s usual to switch to a learned cue for the behaviour and to fade out the lure or target in favour of reinforcing the behaviour these have been used to produce.
3) A performance of a correct behaviour results in something appetitive being gained by the horse.
Positive reinforcement is so called because it involves the addition of something that the animal desires and which they are motivated to obtain or gain.
We use the + sign in the shorthand way of writing positive reinforcement to indicate that it involves addition. It can be written either as “+R” or “R+”.
The catch-all term for “things the animal likes and wishes to obtain” is “appetitive”. It’s easiest to remember this if you think that it’s something for which they have an appetite.
Positive reinforcement is said to have happened when the animal chooses to repeat a behaviour because something they valued was added or gained as a consequence of (immediately following) their behaviour.
This will usually be food or scratches.
Food is the single most commonly used and most powerful form of positive reinforcer, for as long as the animal has an appetite for that kind of food.
Never assume that the horse will find being scratched on a particular body part appetitive. It is necessary to seek the consent of the horse to be scratched and to test whether and where they enjoy it.
Not all horses will find scratching reinforcing, and their interest in being scratched will change seasonally.
When attempting to use scratching as a reinforcer, be aware that most horses will not find a couple of scratches for a few seconds in any way motivating, and they can become frustrated if they aren’t scratched “correctly”! Watch horses groom each other. It’s done with teeth and it lasts a good few minutes! If your fingertips are not on fire and your nails aren’t filled with grime, you need to try harder, or find a good hard scratching brush.
4) Typically, an audible marker signal (bridging stimulus) is used to pinpoint the moment in time when the animal has performed a desired behaviour.
This helps the animal to understand which behaviour is being reinforced and improves precision and comprehension.
The animal will be motivated to repeat whatever he was doing when you marked that behaviour. The food reinforcer must immediately follow the marker signal.
5) Comfort, safety, confidence and freedom from pressure are pre-requisites to training with positive reinforcement. These are neither denied nor withheld so as to achieve desired behaviour or to discourage unwanted behaviour. We do not create discomfort to motivate the animal to behave for the purposes of seeking comfort.
Confidence, comfort and safety and freedom from pressure are constants we aim to maintain at all times during training with positive reinforcement.
6) Positive reinforcement does not involve the removal of food and does not require (nor should it involve) the animal being in a state of hunger.
In fact, hungry animals would be in a state of discomfort, and since this goes against principle 5, causing an animal to experience hunger is inconsistent with the objectives of positive reinforcement. This is partly because food deprivation creates anxiety and frustration, both of which are aversive states, and partly because it can cause physical damage in horses, including ulcers or colic.
Using food with hungry animals could be considered to be negative reinforcement, because the food relieves the discomfort of hunger.
This is why, in our particular application of positive reinforcement, we always advocate that the animal has eaten for a period of time before training and that there is other food available to the horse when training with food, at least in the initial stages.
This gives the animal choices either to eat the food that is available or to engage in training to obtain additional or different food.
7) While scratching can be used as a reinforcer, this may or may not be a positive reinforcer to the animal.
For the chronically itchy animal (such as an animal with sweet itch, severe fly bites or another skin irritation), being scratched could arguably be negatively reinforcing as it may provide temporary relief from discomfort.
Negative reinforcement increases the frequency of a behaviour that results in reduction or removal of something the animal finds unpleasant. In this case a behaviour that is followed by scratching may be strengthened because the intense irritation of their itching is temporarily relieved by scratching.
That of course doesn’t mean we are suggesting you don’t scratch the itchy horse (although scratching can exacerbate some conditions, so it may not actually be advisable). We would prefer to recommend common ways to manage conditions like sweet itch, such as barrier fly rugs, oil-based topical barrier creams and lotions, and shelter from midges at critical times.
The difference of course between this and other types of negative reinforcement is that we don’t intentionally cause the horse to be itchy so as to use scratching as a reinforcer.
8) Positive reinforcement is a form of operant conditioning. It increases the strength or frequency of the behaviour it immediately follows: the animal is likely to repeat the behaviour or make more effort to perform it.
For positive reinforcement to be effective (and in fact for any form of reinforcement or punishment to be effective) it must be immediate.
Reinforcement must quickly follow the specific behaviour or stream of behaviour to be strengthened, and the reinforcer must be of a type and quantity that is valued by the animal.
The value of the reinforcer must exceed the cost to the animal of the effort required to perform the behaviour. It’s a net sum game!
If the behaviour is not seen (over time) to be increasing in strength or frequency, then either something is punishing it, the animal is obtaining more valuable reinforcement (positive or negative) for doing some other behaviour, or it is not being sufficiently well reinforced.
9) We do not reinforce the animal. It’s the behaviour that we are attempting to strengthen through reinforcement.
Only the animal can decide what they find reinforcing and punishing, and they tell us through their behaviour. The future behaviour of the animal tells us whether they found the outcome of their past behaviour to be reinforcing or punishing.
10) There is always an associative learning (classical conditioning) element to training, whether with positive or negative reinforcement or with positive or negative punishment.
What that means is that the animal’s perception of us is formed on the basis of what they see us as representing in terms of their own emotional experience.
Our entire relationship is formed on the basis of what the animal associates with us and our behaviour.
Our behaviour can, in essence, come to predict “treats” (something appetitive) or be seen as a threat.
With positive reinforcement training we attempt to associate ourselves with only good things and to avoid any use of coercion, force or threats.
We avoid making ourselves aversive to the horse and we try never to associate ourselves with, or be the bearer of, any kind of pressure or unpleasantness.