Debugging computer code can sometimes be very challenging. We do a lot of computer modeling in the lab, and in this case I discovered a bug while having a computer model, which we call Tempus, attempt to respond the way human learners do under the same conditions. This is done through a process called ‘fitting’: you give a model some initial settings, compare its behaviour with some target (the human data), and then adjust those settings (for instance, using gradient descent on the error) so that the model produces data that looks more like the human data. In this case, some training data was withheld from both the humans and the model during learning, in order to see how well the learning generalizes to novel stimuli. This is a comparison of the model with human data as of last fall:
These fits look decent, but we then needed to modify our approach to run the model multiple times at each setting and take an average of its behaviour in order to describe the effect of noise. After implementing this change, a quick look showed the model doing quite poorly on this particular measure.
If this had been the only thing that changed, it would have been easier to narrow down the source of the problem. However, I had used this update as an opportunity to make a few other small adjustments to the model (it can be time consuming to verify output after every seemingly inconsequential commit). When I started to see poor results on this particular measure, I assumed that the fitting process would just need a bit of time to find good settings again. When the fitting was unable to improve this measure, however, I started to get concerned. What was going on? I decided to look at single runs of the model instead of the noisy averages. I knew immediately that something was wrong with the averaging: single runs of the model were coming exceptionally close to the human output. An example subject and its associated fit looked like this:
To check this, I needed to look at the actual distribution of simulations that was being averaged. And here we find our snake in the grass: a piece of code that produces the programmer’s worst nightmare, an error whose output retains the same shape and scale as appropriate data.
modelIndividualTProbs = reshape(cell2mat(lookupTable(constParamRows,5)),length(constParamRows),...
    lengthTprobs);
What this code is designed to do is take data that is shaped like this:
1
0.5
1
0
0
1
0.5
0.5
1
1
0
0
And reshape it into data that looks like this:
1    0.5  1
0    0    1
0.5  0.5  1
1    0    0
That is, there are 4 simulations in this example, with 3 pieces of data each. Because the data originally came in a format that doesn’t let you distinguish the individual simulations very easily, it needs to be massaged into a shape that has the same number of elements (12) but is a 4×3 matrix instead of 12×1. But what does the code I had written do?
1    0    1
0.5  1    1
1    0.5  0
0    0.5  0
Notice that this output is incorrect, but it looks very similar to the kind of output we might expect. So rather than getting an average of:
0.625 0.25 0.75
we get:
0.625 0.5 0.5
The corrected piece of code should look like:
modelIndividualTProbs = reshape(cell2mat(lookupTable(constParamRows,5)),...
    lengthTprobs,length(constParamRows))';
This works because reshape fills its output column by column. The corrected call first puts each simulation in its own column, giving a 3×4 matrix, and then transposes the result into the 4×3 matrix we want. So now we can rest easy with results that make a bit more sense.
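For anyone who wants to see the difference directly, here is a minimal sketch that runs both versions on the 12 values from the example above. The names v, nSims and nVals are made up for illustration and stand in for the lookup-table data and lengths in the real code.

```matlab
% Toy data from the example: 4 simulations, 3 values each, stacked end to end.
% v, nSims and nVals are illustrative names, not variables from the real code.
v = [1 0.5 1 0 0 1 0.5 0.5 1 1 0 0]';   % 12x1 column
nSims = 4;
nVals = 3;

% Buggy version: reshape fills column by column, so the values of one
% simulation run down a column instead of along a row.
buggy = reshape(v, nSims, nVals);
buggyMean = mean(buggy, 1);             % [0.625 0.5 0.5]

% Corrected version: one simulation per column (3x4), then transpose to 4x3.
fixed = reshape(v, nVals, nSims)';
fixedMean = mean(fixed, 1);             % [0.625 0.25 0.75]

% Sanity check: row 2 of the fixed matrix is the second simulation, v(4:6).
assert(isequal(fixed(2, :), [0 0 1]));
```

Both matrices are 4×3 and contain the same twelve numbers, which is exactly why the bug was so hard to spot. A quick check that one row of the reshaped matrix matches one known simulation, like the assert at the end, is one way to catch this kind of mix-up early.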