Using Bayesian inference to predict fishing success: It was surprisingly effective.
The mild weather earlier in January 2023 had passed and rural Surrey was in the grip of a bitter cold spell, something that’s generally not conducive to catching fish. In fact, as mentioned in the blog post about the first species of the year (the zander), I carried out a scientific study a few years back which demonstrated that I caught fewer zander when the weather was colder!
A simple way to think about surprise
In my
review of Bournville (see link above), I noted how effectively the author (Jonathan
Coe) had used sudden surprising revelations about some of the central
characters. I therefore decided that the link from that book to something
statistical would be an analysis of how scientists and/or statisticians might
think about, and model, the surprisingness of an event.
I'll introduce some statistical ideas first, before we see how we might analyse
the surprisingness of the results of the fishing trip on January 25th.
A
future event might be thought of as surprising if it is unpredicted.
This natural and obvious way to think about surprise requires us to consider at
least two separate timepoints, although they might be very close to each other
in time. The event of interest might (or might not) occur at the second
timepoint. At the first timepoint, occurring before the event of interest, an
individual's brain might have made a prediction about what happens at timepoint
two. For example, perhaps it is strongly predicted that a particular event will
occur at that later timepoint. The prediction is made based on any relevant
information about the world that was available at timepoint one (and also potentially
information carried forward from many previous timepoints). For the moment, we
can set aside questions of whether the prediction needs to be made explicitly
or intentionally. As discussed below, popular modern theories of how the brain
works suggest that it is continually, and largely automatically, making
predictions about future events, and then comparing those predictions with what
actually occurs. If the brain's prediction turns out to be wrong and the
predicted event does not occur, or happens to a lesser extent than was
expected, then there will be a certain amount of surprise, related to the extent of
the prediction error.
Let's
make these ideas more concrete with some everyday examples. First, let's consider the scenario where a predicted event does not happen when it was expected to
occur. Imagine a Doomsday cult who predict that the world will end on a
particular date and time. The cult members might be huddled together in a bunker as that
moment approaches, expecting the end of the world. However, with no sign of the
apocalypse at the appointed hour, the members of the cult would presumably be
very surprised. It seems reasonable to suggest that an individual cult member's
degree of surprise would be proportional to the strength of their belief in the
prophecy of the world's end.
A
complementary scenario -- the occurrence of an unpredicted event -- can also
create surprise. As a cricket batsman you expect to face a series of balls
bowled at you by the opposition team's bowlers. The bowlers are always trying
to surprise you to a certain extent by varying exactly how fast they bowl and
where the ball will bounce before it reaches you. However, the laws of cricket
do not allow the bowler to bowl a ball which does not bounce AND is above waist
level when it reaches the batsman. Such an illegal ball is colloquially known
as a "beamer". It is therefore always a completely unexpected
occurrence -- a big surprise -- when the ball slips out of the bowler's hand
and flies directly towards your head or chest without bouncing. You can watch
a YouTube video of
some of the worst examples from professional cricket. The surprisingness of the
event means that the batsman is often unable to take effective evasive action.
Over the last two decades, the popular "predictive coding" theory (and
variations of that account) has proposed that our brains actively try to make sense
of noisy information gleaned from the world around us, by continuously making
predictions or inferences based on that information. According to Bogacz
(2017), predictive coding suggests that the brain's "cortex infers the
most likely properties of stimuli from noisy sensory input. The inference in
this model is implemented by a surprisingly simple network of neuron-like
nodes. The model is called 'predictive coding’, because some of the nodes in
the network encode the differences between inputs and predictions of the
network." In other words, the models will compute the prediction
errors noted above.
Bogacz's
manuscript is a tutorial paper which is intended to help students and researchers
understand how to construct a variety of predictive coding models. It is quite
technical but has a simple underlying message. It also gives some
"toy" examples of model code, in Matlab, at the end of the paper.
The
first example Bogacz presents is a straightforward implementation of Bayesian inference and
makes no pretence of being at all brain-like. Later in the paper Bogacz shows
how brain-like neural models can do the same kind of prediction: the general
point is that brains have evolved to make predictions about future events by using
neural apparatus to approximate a version of Bayesian inference. In an earlier
blog (and
associated videos) I introduced the notion of Bayes' theorem and Bayesian
statistics. These ideas, of course, lie at the heart of Bayesian inference.
We will start by sticking to the example used by Bogacz for the first model: he considers trying to predict (or infer) the size of a spherical food item, which has a radius denoted v. In the model, the size prediction is based solely on the intensity of the food's visual image as captured by a light-sensitive receptor cell. The single cell's information is inevitably an error-prone ("noisy") estimate of the light intensity (denoted u). In the model, the error associated with u is assumed to be normally distributed. It is also assumed that the mean level of estimated light intensity will be a function of the size of the food pellet, denoted g(v). The size of the food pellet directly influences (we might even say "causes") the level of light intensity as detected by the cell. Bogacz uses g(v) = v² in his example but that choice is not very important. This relationship means that u, in effect, "contains" some information about v. Bayesian inference allows one to leverage this information in u (the thing you can estimate roughly), to predict v (the thing you want to infer).
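This generative set-up is easy to simulate. Below is a minimal Python sketch (the blog's own code is in Matlab); the noise level sigma_u = 1 is an assumed value for illustration, following the unit values Bogacz uses elsewhere:

```python
import random

def g(v):
    """Mean light intensity produced by a food item of size v (Bogacz uses g(v) = v**2)."""
    return v ** 2

def sample_u(v, sigma_u=1.0, rng=random):
    """Simulate one noisy receptor reading u for a food item of size v.

    The reading is normally distributed around g(v); sigma_u = 1 is an
    assumed noise level for illustration.
    """
    return rng.gauss(g(v), sigma_u)

# A food item of size v = 3 produces readings scattered around g(3) = 9.
random.seed(1)
readings = [sample_u(3.0) for _ in range(5)]
print(readings)
```

Because each reading scatters around g(v), a collection of readings carries recoverable information about v, which is exactly what the inference below exploits.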
Bogacz combines the above relationships into Equation 1 of his paper:
Equation 1 (in the notation of Bogacz's paper): p(u | v) = f(u; g(v), Σu)
Expressed in words this means that the probability of detecting an image intensity u given (in mathematical notation | means "given") a food item of actual size v is described by a normal probability density function, or normal pdf, denoted f. We will see below what this pdf looks like. The pdf for u has a mean intensity of g(v) and a standard deviation (s.d.) of Σu. We also have some prior information about the likely size of the food object. This will have been learned through previous experience. Again, it is assumed that this can be expressed using another normal pdf, f. Bogacz writes the prior probability for v in Equation 3, thus:
Equation 3: p(v) = f(v; vp, Σp)
In words
this means that the prior probability that the food is
of size v is described by a normal pdf with mean of vp
and an s.d. of Σp. You will probably be familiar with the
bell-shaped normal pdf. Below is an example where vp=3
and Σp=1 (these are values which Bogacz used). This pdf
implies that roughly 95% of the values of v will lie between 1 and 5. The scale for v is arbitrary.
Bayes' theorem (Equation 4 in Bogacz's paper) combines the prior and the likelihood, divided by the so-called evidence, p(u), to give the posterior probability of v given u: p(v | u) = p(v) p(u | v) / p(u)
The evidence is just the product of the prior and the likelihood, summed across all values of v. In mathematical terms this means integrating prior*likelihood over all values of v. Bogacz gives this integral explicitly in Equation 5 of his paper.
Equation 5: p(u) = ∫ p(v) p(u | v) dv
When we run Bogacz's model using my slightly enhanced version of his code, we get the figure below. In my code, and in Bogacz's original code snippet, the integration is done numerically. This involves dividing the range of possible values of v into very small intervals, computing the prior*likelihood for each interval, and adding the results up. In his paper Bogacz just gives the posterior probability distribution, shown in red below.
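The grid method just described can be reproduced in a few lines. Below is a Python sketch (not the blog's Matlab code); the observed intensity u = 2, the prior (mean 3, s.d. 1) and the noise s.d. of 1 follow the values used in Bogacz's worked example:

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_grid(u, g, v_min=0.01, v_max=5.0, dv=0.01,
                   v_p=3.0, sigma_p=1.0, sigma_u=1.0):
    """Posterior p(v|u) on a grid: prior * likelihood, normalised numerically."""
    vs = [v_min + i * dv for i in range(int((v_max - v_min) / dv) + 1)]
    unnorm = [normal_pdf(v, v_p, sigma_p) * normal_pdf(u, g(v), sigma_u) for v in vs]
    evidence = sum(w * dv for w in unnorm)   # numerical version of Equation 5
    post = [w / evidence for w in unnorm]
    return vs, post

vs, post = posterior_grid(u=2.0, g=lambda v: v ** 2)
mode = vs[post.index(max(post))]
print(round(mode, 2))   # ≈ 1.57 for Bogacz's example values
```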
Plotting the posterior and prior in one figure allows you to see easily two ubiquitous features of Bayesian inference: shrinkage and regularisation. A really good account of regularisation is provided in this blog.
Shrinkage results in the posterior distribution having less error (and thus a smaller standard deviation) than the prior distribution. This is clear in the above figure. We are more confident in our estimate of v (the size of the food pellet) after using the information from the light intensity estimate, u, than we were before (without the light intensity information). A sensible form of inference/prediction must lead to this effect whenever the information being employed in our inference is relevant. In this case the degree of shrinkage will be affected by the accuracy of our estimate of the light intensity. One can alter the s.d. of the light intensity estimates in my code to demonstrate this.
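The effect of the light-intensity noise on shrinkage can be demonstrated directly. This Python sketch (a stand-in for the experiment the blog suggests running in its Matlab code) computes the posterior s.d. numerically for two assumed noise levels:

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_sd(u, sigma_u, v_p=3.0, sigma_p=1.0, dv=0.01):
    """Numerical s.d. of the posterior p(v|u) for the g(v) = v**2 model."""
    vs = [0.01 + i * dv for i in range(500)]
    w = [normal_pdf(v, v_p, sigma_p) * normal_pdf(u, v ** 2, sigma_u) for v in vs]
    z = sum(w)
    mean = sum(v * wi for v, wi in zip(vs, w)) / z
    var = sum((v - mean) ** 2 * wi for v, wi in zip(vs, w)) / z
    return math.sqrt(var)

# The posterior is narrower than the prior (shrinkage), and it gets narrower
# still as the light-intensity estimate becomes more accurate.
print(posterior_sd(u=2.0, sigma_u=1.0))   # well below the prior s.d. of 1
print(posterior_sd(u=2.0, sigma_u=0.5))   # smaller again
```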
We
can now move on to consider a fishing inference/prediction problem by adapting
Bogacz's code for the above example. My experience is that, whenever someone
provides a model in a scientific paper, I never really understand it fully
until I have adapted it and changed it to capture an analogous but different
problem. This is exactly what I am going to illustrate below. In future blogs
we will build up other predictive coding accounts using more brain-like models
including ones that directly compute prediction errors.
How did the fishing go?
In fact, the fishing trip on Jan 25th was full of surprises. First, when I arrived at Clover Lake, its surface was frozen solid. I haven't often fished the lake in winter before, but this was the first time I had ever seen it with ice on it, let alone completely frozen over. It's quite a big lake and the temperatures overnight hadn't dipped that low (they hovered just below freezing) so it was even more surprising that the ice was still there in the late afternoon.
In
normal conditions, the easiest fish to catch in Clover Lake are gudgeon (Gobio
gobio). Normally a river species, these amazing, super-aggressive small
fish are everywhere in this lake for some unknown reason. They can often be a
pest, as they snatch baits meant for larger species. When fishing Clover Lake
in summer I would expect to have a gudgeon eating my bait within 5 minutes or
so. When I saw the ice covering the lake, I had no idea how active they might
be in midwinter. Nevertheless, I thought that I would gently break the surface
of the lake, and put some handfuls of ground bait in the resulting hole. Maybe
this would bring the gudgeon into my fishing area (this is known as a
"swim" and it’s the area in front of where you are sitting on the
bank). I used the butt end of my landing net handle (which is about 2m long) to
break the ice very gently. The hole I created was perhaps 1-1.5 metres from the
bank and about 1 metre across. At least it was close enough to throw the ground
bait accurately into the hole. A few minutes later I lowered my maggot-baited
hook through the ice hole to see what would happen. I did once do some proper ice
fishing on a huge frozen lake in Finland; that was completely different though
and much, much colder.
After
about 20-30 minutes I had the first clear signs of interest from a fish. Small
"knocks" (i.e., small sudden twitching movements) occurred on my
float as the fish on the lake bottom began nudging and tugging at the maggots
on my hook. After missing a few bites I eventually caught a gudgeon, about an
hour after I arrived at the lake. These tiny fish usually weigh less than 1
ounce (28g). They have a pretty iridescent flank and blackish spots on their
tails. See the one I caught in the picture below.
I
packed up my gear, returned to the car, and reparked by Jenny's Lake. My
favourite roach swim was just about the only area of the lake that wasn't
completely frozen. At least I didn't have to break any ice this time. I had
roughly a 3m x 3m area of unfrozen water in front of me. Were there any active roach
(i.e., hungry and feeding) in this swim? How many (if any) might I catch? This scenario
and set of questions gave me the idea for a prediction problem to which I could
apply the Bayesian inference code from the Bogacz (2017) paper.
Using Bayesian inference to make fishing predictions
Whenever I begin fishing, I am usually childishly excited (I lead a simple life). It stems from a mild thrill of the unknown: am I going to catch anything? Will I even get a bite? What will I catch? The time from when the first cast hits the water until the first bite is, for me, a key indicator. I have always used it as a rough way to predict how many fish are "available" in the swim in front of me. Intuitively, if the first bite comes quickly then my expectation is that there are lots of fish available; if it comes more slowly, then there are fewer. The more fish that are available, the more I am likely to catch. This act of prediction while I am fishing can be mapped directly onto Bogacz's first model, where he tries to use the estimated intensity of a visual image (u) of a piece of food to estimate its size (v). Remember that this is also based on the intuitively reasonable belief that the image intensity is caused by the size of the food, with a function, g, relating u to v; namely, u = g(v).
In
the analogous fishing inference problem, I am trying to predict the number of
fish available (v) to be caught in my swim from my mental estimate of the time
between the first cast and the first bite (u). The number of fish available directly affects the time to the first bite. The mental estimate of
the time elapsed is error-prone, as I don’t time it with a watch. This is just
like the Bogacz example where the brain cell’s estimate of the food’s image
intensity has error associated with it (it is a “noisy” estimate). If u
reflects v, then u can be used to predict v, even if the
value of u is known only approximately.
Next,
I need to consider the form of the functional relationship, u=g(v),
between the noisy time estimation value (u) and the thing I ultimately want
to infer (the number of available fish in my swim, v).
There
are lots of possibilities so I tried to come up with a plausible and reasonable
functional relationship. Later we will vary the function and see if it affects
the general pattern of our predictions. An “available” fish in my swim means
one that is present in front of me, is hungry, and one I am able to catch. The
total number available is therefore limited by the time limits implicit in each
cast of my fishing line (a single cast is an attempt to catch a fish).
Each
cast involves baiting the hook, throwing out the line, waiting to get a bite,
reeling in and, if a fish is successfully landed, detaching it safely and returning it to the water, and then re-baiting the hook. The minimum
period to do all of this is about 1 minute. I had about 45 minutes of roach
fishing in this swim before dark, so I could have a maximum of 45 bites in the
session and, at best, catch a maximum of 45 fish. The time to the first bite (which will be denoted by u) could come after 1 minute or after any other period up
to 45 minutes, or not at all. So, if the first bite occurred after 1 minute
there could be a maximum of 45 fish available to be caught in my swim. Obviously there may be more than 45 in the swim in total but in effect there are a maximum of 45 available that I could catch, because of the physical limits in the fishing process just discussed. If no
bites came in the 45 minutes this suggests there may be no available fish in my
swim.
A
simple non-linearly decreasing function between these two limits, for both u
and v, can be written as
u = Max(v)/(v+1)
where v ranges from 0 to 45 fish available in the swim and Max(v) = 45. In the left-hand panel of the figure below you can see a plot of this non-linear relationship. The non-linear form is a reasonable choice because of a phenomenon called competitive feeding. Anglers love to suggest that they catch more fish when competitive feeding is occurring. The idea is that the more fish there are in the swim, the more quickly one of them will grab the bait, in order to beat the other fish to it (at least whenever there is limited food available). Thus, going from a solitary available fish to 10 available fish shrinks the predicted time to the first bite far more sharply than a linear relationship would. Hence, the function is non-linear.
An even simpler alternative would make the time to the first
bite a linearly decreasing function of the number of fish available to catch from your swim. Using this relationship implies that there is no competitive feeding. The linear relationship is shown in the right-hand panel of the figure below. The full Matlab
code I used for the simulations in this blog, based on Bogacz’s code snippet, can be found here.
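The two candidate functions amount to a couple of one-liners. Here is a Python sketch (the blog's own code is in Matlab); the exact linear form is not spelled out above, so u = 45 - v below is an assumption chosen to match the 35-minute wait quoted below for a prior mean of 10 fish:

```python
MAX_WAIT = 45.0   # minutes in the session, and also the maximum number of available fish

def time_to_first_bite_nonlinear(v):
    """Competitive feeding: the wait falls steeply as fish numbers rise."""
    return MAX_WAIT / (v + 1)

def time_to_first_bite_linear(v):
    """No competitive feeding (assumed form: u = 45 - v)."""
    return MAX_WAIT - v

# With a prior mean of 10 available fish:
print(round(time_to_first_bite_nonlinear(10), 1))  # 4.1 minutes
print(round(time_to_first_bite_linear(10), 1))     # 35.0 minutes
```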
How long did I have to wait for the first bite?
Very soon I had my actual estimate for u. In fact, it was only about 90 seconds before I had my first bite. The roach were clearly in my swim and were feeding and thus available to be caught. We will use the value of u=1.5 minutes to run our Bayesian inference below.
Before running the model, we can ask how surprising this value was in relation to the prior distribution. Our estimated prior mean of 10 available fish converts to a wait of 4.1 minutes for the first bite under the non-linear function, and 35 minutes under the linear function. Either way, the first bite on this freezing day came surprisingly quickly.
I landed the roach shown in the picture below on the first or second bite. Roach can occasionally grow to be 3 or 4 lbs in the UK but often they are little ones like this. This first roach weighed maybe 1-1.5 ounces.
What did our Bayesian inference model predict?
I
applied the model with the non-linearly decreasing function linking u and v, illustrated
above, coupled with an estimate for u=1.5 mins, with the s.d. for u and
the prior pdf for v set as described above.
The posterior probability distribution for the number of fish available had a mode (the most probable value) of 13.3. This is our inference about the likely number of fish available to be caught in 45 minutes, and we can think of it as the most likely number of bites I would have in the session. With my assumption that I catch a fish from three-quarters of the bites in a roach-fishing session, this leads to a prediction of about 10 fish caught. Given my past experience in this swim, it is very likely that the fish caught would almost all be roach.
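For completeness, the fishing inference can be run with the same grid method used for Bogacz's example. This Python sketch is not the blog's Matlab code, and the prior s.d. of 4 fish and the timing-noise s.d. of 1 minute are purely illustrative assumptions, so the mode will not reproduce the quoted 13.3 exactly:

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fishing_posterior_mode(u, v_prior=10.0, sd_prior=4.0, sd_u=1.0, dv=0.01):
    """Posterior mode for the number of available fish v, given an estimated
    time to first bite u, with u = 45 / (v + 1) as the generative function.
    sd_prior and sd_u are illustrative assumptions, not the blog's values."""
    vs = [i * dv for i in range(int(45 / dv) + 1)]
    post = [normal_pdf(v, v_prior, sd_prior) * normal_pdf(u, 45 / (v + 1), sd_u)
            for v in vs]
    return vs[post.index(max(post))]

mode = fishing_posterior_mode(u=1.5)
print(round(mode, 1))   # lands between the prior mean (10) and the
                        # noise-free inversion 45/1.5 - 1 = 29 fish
```

Whatever s.d. values are chosen, the pattern is the same: the rapid bite pulls the estimate of available fish well above the prior mean.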
Overall fishing stats
After this
second trip my fishing stats looked like this: total fish for the year=33; total
number of species caught=4; number of species satisfying the challenge rules=2.
Here is a link to all the fish photos I
have taken during the 2023 challenge sessions (including this one), and another
link to a spreadsheet with all
the gory details. The blogs are lagging well behind, and so the spreadsheet
shows that I’ve caught more fish and additional species since this second trip.
References
Bogacz, R. (2017). A tutorial on the free-energy
framework for modelling perception and learning. Journal of
Mathematical Psychology, 76(Pt B), 198–211.
Although the models presented in this blog are for illustration purposes, there is something paradoxical about them that is bugging me. As it is untidy/slightly annoying, I'll need to create a post-script shortly to sort this out. As usual, I still need to check the compatibility of the code with Octave, too.
An Octave-compatible version of the code for the model in this blog can be found at this repository: https://github.com/Alan-Pickering/Bayesian_inference