Dealing with “drift” in desk-mounted eye-tracking experiments

Despite what this blog may imply, we manage to get a lot of sciencing done around here. (In fact, a whopping 25% of all StarCraft-related activities conducted in this lab are research-related, but I’ll let someone else fill you in on those details.) In addition to struggling with big-picture questions about human cognition, though, we also need to address the nitty-gritty details of – and problems with – collecting data with an eye-tracker.

Here in the cogslab we use four desk-mounted Tobii X120 eye-trackers for data collection. These machines record the location of your eye-gaze 120 times in a single second – that is, once every 8.3 milliseconds. Wow! As you can imagine, a single participant generates thousands and thousands of data points. In an experiment with 480 trials that takes about 45 minutes to complete, for example, the average participant yields 201,200 samples. We use a modified dispersion-threshold algorithm (Salvucci & Goldberg, 2000) to condense this raw data into “fixations” by identifying points on the screen where the eyes pause (or “fixate”). Fixations are essentially [x,y] coordinates and durations, and our average participant in the aforementioned experiment has only 4215 of these.
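For the curious, here is a minimal Python sketch of the basic dispersion-threshold idea from Salvucci & Goldberg (2000). This is not our modified version: the thresholds (35 pixels, 100 ms), the `Fixation` type, and the function names are all assumptions made up for illustration, and it ignores missing samples entirely.

```python
from typing import List, NamedTuple, Sequence, Tuple

SAMPLE_MS = 1000 / 120  # one gaze sample every ~8.3 ms at 120 Hz


class Fixation(NamedTuple):
    x: float            # centroid x in pixels
    y: float            # centroid y in pixels
    duration_ms: float  # how long the eyes paused there


def dispersion(window: Sequence[Tuple[float, float]]) -> float:
    """Spread of a run of samples: (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(
    samples: Sequence[Tuple[float, float]],
    max_dispersion_px: float = 35.0,  # assumed threshold, not the lab's value
    min_duration_ms: float = 100.0,   # assumed threshold, not the lab's value
) -> List[Fixation]:
    """Collapse raw (x, y) gaze samples into fixations."""
    min_len = max(1, round(min_duration_ms / SAMPLE_MS))
    fixations: List[Fixation] = []
    start = 0
    while start + min_len <= len(samples):
        end = start + min_len
        if dispersion(samples[start:end]) <= max_dispersion_px:
            # Grow the window until adding one more sample spreads it too far.
            while (end < len(samples)
                   and dispersion(samples[start:end + 1]) <= max_dispersion_px):
                end += 1
            window = samples[start:end]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append(Fixation(cx, cy, len(window) * SAMPLE_MS))
            start = end
        else:
            # Not a fixation yet: drop the first sample and try again.
            start += 1
    return fixations
```

With these assumed settings, the eyes have to stay within a small region for 12 consecutive samples before anything counts as a fixation, which is how hundreds of thousands of samples shrink to a few thousand fixations.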

The next step is coding these fixations in a meaningful way that allows us to perform data analysis on a participant’s distribution of attention. We do this by identifying areas of interest (AOIs) that, in a category learning experiment, correspond to the locations of category features. Our AOIs are typically defined as circles around the location of features with an extra 40 pixels of padding:

A quick mock-up of how AOIs are defined.

If a fixation falls within an AOI, it receives a special location code. Later, we use the experiment-level properties of an individual subject (since everything is counter-balanced) to determine what was inside each of these AOIs, and each fixation within a location receives a special functional relevance code. Here, for example, is a scatterplot of the locations of each fixation for a participant in a recent experiment. This shows all the fixations made during the part of a trial when they were looking at the stimulus (as opposed to the part of the trial where they are looking at feedback). Fixations that fall outside the AOIs I’ve defined for this stimulus are colored black. Fixations that fall within the AOIs are color-coded based on their functional relevance.

Fixations during the stimulus phase
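The two coding steps described above can be sketched in a few lines of Python. The 40 pixels of padding comes from our AOI definition; everything else here (the feature coordinates, the feature radius, the relevance labels, and the function names) is hypothetical and only there to make the example run.

```python
import math
from typing import Dict, Optional, Tuple

AOI_PADDING_PX = 40     # extra padding around each feature (from the post)
FEATURE_RADIUS_PX = 60  # hypothetical feature size

# Hypothetical feature centres in screen pixels, keyed by location code.
FEATURE_LOCATIONS: Dict[int, Tuple[float, float]] = {
    1: (640.0, 200.0),
    2: (380.0, 650.0),
    3: (900.0, 650.0),
}


def location_code(fx: float, fy: float) -> Optional[int]:
    """Return the location code of the AOI containing the fixation, or None."""
    for code, (cx, cy) in FEATURE_LOCATIONS.items():
        if math.hypot(fx - cx, fy - cy) <= FEATURE_RADIUS_PX + AOI_PADDING_PX:
            return code
    return None


def relevance_code(loc: Optional[int], subject_layout: Dict[int, str]) -> str:
    """Map a location code to its functional relevance for one subject.

    subject_layout records which feature sat at which location for this
    subject, since locations are counterbalanced across subjects.
    """
    if loc is None:
        return "outside"
    return subject_layout[loc]


# Hypothetical counterbalancing for one subject.
layout = {1: "relevant", 2: "irrelevant", 3: "relevant"}
print(relevance_code(location_code(650.0, 280.0), layout))
```

The point of keeping the two steps separate is that the location code only depends on screen geometry, while the relevance code depends on what each individual subject was shown there.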

A little complicated, but pretty straightforward, right? Not quite! You see, we like using desk-mounted eye-trackers because they allow people to sit comfortably in front of the computer while playing our categorization games as the eye-tracker quietly and unobtrusively records away. If human beings were perfectly still robots, this set-up would be perfect. Alas, we are not. Even the most diligent participants can slowly begin to slouch over the course of an hour, or lean forward a bit during a block, or sit slightly differently than they did during the initial eye-tracker calibration. Although the Tobii X120 can generally deal with a bit of drift, and relative locations of eye-movements tend to be preserved, it (understandably) can’t quite get absolute locations correct if someone’s head is in a different position than it was to begin with.

Here, again, is a scatterplot of all the fixations made by subject 1195 in one of our older experiments. Remember, black fixations are coded as fixations outside of the AOI, and coloured fixations are coded as functionally relevant:

All of a subject's fixations during the "stimulus" phase.

If we had just looked at the data without visualizing the fixations, we might think that this participant was able to learn the categories by only looking occasionally at a single feature! Clearly, this isn’t so. The participant is definitely making fixations to features – we just aren’t coding them correctly.

In order to correct for this “drift”, Michael, Gordon, and I sat down and brainstormed a number of different ways to move the centre of the display so that the fixations are coded in a reasonable way. As humans with eyes, we can easily pick out where these fixations should be – but because this is science, we need a principled way of detecting and shifting these fixations. After a bit of discussion I was able to implement a set of tools that ultimately give us better data and a more accurate picture of what our participants are looking at.

Here’s how it works: at the beginning of each trial in our experiments, participants are required to look at a central fixation cross. We can safely assume that most participants (although not all) are looking at this fixation cross during this phase of a trial. (In addition to outright asking them to please look at the fixation cross during this part of the trial, we have a number of ways of encouraging eye-movements to this area, depending on the experiment.) My algorithm uses the frequency distribution of fixations along the x and y axes during this “fixation cross” part of the trial to determine the amount of x and y drift that has occurred in the experiment. Because drift can be more extreme at the end of the experiment than at the beginning (a participant may have shifted in her chair during a break, for example), the algorithm uses a sliding window over trials to determine the correction. For example:

The position of a participant's eye-gaze relative to the "true" position of the screen's centre during the beginning of a trial

The above image shows all the fixations subject 1195 made at the beginning of each trial, during the “fixation cross” phase. Each ‘o’ marker represents the detected centre of the screen (assuming the participant is gazing at the centre of the screen) using a sliding window size of 25. (All this means is that corrections for trial i are based on the peak of the [x,y] distribution of fixations during the fixation cross phase across trials i-12 through i+12.) We apply the same sliding window to all subjects in a given experiment. So, how well does this work?
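Before the result, a rough sketch of the idea in Python (with NumPy) may help. This is an illustration under assumptions, not the lab's actual code: the screen centre, the 10-pixel histogram bins, the function names, and the choice to simply truncate the window at the first and last trials are all mine for the sake of the example.

```python
import numpy as np

SCREEN_CENTRE = (640.0, 512.0)  # assumed 1280x1024 display
BIN_PX = 10                     # assumed histogram bin width in pixels


def histogram_peak(values, bin_px=BIN_PX):
    """Centre of the most populated bin: the peak of the distribution."""
    values = np.asarray(values, dtype=float)
    lo = np.floor(values.min() / bin_px) * bin_px
    hi = np.floor(values.max() / bin_px) * bin_px + bin_px
    n_bins = int(round((hi - lo) / bin_px))
    edges = lo + bin_px * np.arange(n_bins + 1)
    counts, edges = np.histogram(values, bins=edges)
    k = int(np.argmax(counts))
    return (edges[k] + edges[k + 1]) / 2


def drift_offsets(cross_fixations, n_trials, window=25, centre=SCREEN_CENTRE):
    """Estimate an (x, y) drift offset for every trial.

    cross_fixations: rows of (trial_index, x, y) from the fixation-cross phase.
    Returns an array of shape (n_trials, 2).
    """
    cross = np.asarray(cross_fixations, dtype=float)
    half = window // 2  # 12 trials either side for a window of 25
    offsets = np.zeros((n_trials, 2))
    for i in range(n_trials):
        in_window = (cross[:, 0] >= i - half) & (cross[:, 0] <= i + half)
        if not in_window.any():
            continue  # no fixation-cross data nearby: leave trial uncorrected
        offsets[i, 0] = histogram_peak(cross[in_window, 1]) - centre[0]
        offsets[i, 1] = histogram_peak(cross[in_window, 2]) - centre[1]
    return offsets


def apply_offsets(stim_fixations, offsets):
    """Shift (trial_index, x, y) stimulus-phase fixations back by the drift."""
    stim = np.asarray(stim_fixations, dtype=float).copy()
    trials = stim[:, 0].astype(int)
    stim[:, 1:3] -= offsets[trials]
    return stim
```

Using the peak of the distribution rather than the mean means a handful of stray fixations during the fixation cross phase won't drag the estimated centre around, as long as most fixations in the window really are on the cross.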

Fixations with offsets applied. Those that fall within the circles will now be counted as fixations to features.

Pretty neat, huh? I believe that this method really helps to clean up data. For our friend subject 1195, we now have 1572 fixations inside AOIs instead of 78! In the 480-trial experiment I described at the beginning of this post, this method was able to increase the total detected number of fixations to areas of interest by 14,777. Participants who have good, clean data already are essentially unaffected by this method, and participants whom we would normally disregard for lack of data can now be included in our analyses. It is important to keep in mind that this method introduces two sources of experimenter bias, though I think most would agree that these are acceptable (that is, preferable to having no data at all): first, the experimenter needs to determine the best window size for the offset calculation; and second, the experimenter needs to inspect fixation scatterplots like the one above to ensure that the offsets do not make the fixation coding wildly worse. For example, some participants like to look at the location of their favorite feature even during the fixation cross stage of a trial:

This subject looks at a feature location before the feature appears

The algorithm is unable to account for participants who do this, so it is possible to exclude subjects like this from the offset correction. I’m not actively working on ways to improve this at the moment, though I am keeping in mind possible ways to more objectively evaluate which sliding window should be used on an experiment (e.g., discard participants like 1904 for whom the calculation will not work; then, assume that the correction works on the remaining participants, and calculate the number of fixations lost and the number of fixations gained given a set of potential sliding windows; go with the window size that recovers the most fixations in areas of interest). From experience, I can tell you that mouse-driven experiments can handle smaller window sizes, while joystick-driven experiments need larger window sizes (because people are quicker to advance past the fixation cross during joystick-driven experiments, so there are fewer fixations per trial during that phase of the trial). The key is encouraging participants to look at that fixation cross – through interface design and through explanation and encouragement.
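The window-selection idea in that parenthetical could look something like the Python sketch below. To be clear, this is something I have not implemented; the function names are hypothetical, and reading "recovers the most fixations" as gained minus lost is an assumption. It expects you to have already coded every fixation (pooled over the remaining participants) as inside or outside an AOI, once without correction and once per candidate window.

```python
import numpy as np


def score_window(in_aoi_before, in_aoi_after):
    """Count fixations gained and lost by one candidate window size.

    Both arguments are boolean sequences with one entry per fixation:
    True if the fixation falls inside any AOI.
    """
    before = np.asarray(in_aoi_before, dtype=bool)
    after = np.asarray(in_aoi_after, dtype=bool)
    gained = int(np.sum(~before & after))  # outside before, inside after
    lost = int(np.sum(before & ~after))    # inside before, outside after
    return gained, lost


def best_window(in_aoi_before, in_aoi_after_by_window):
    """Pick the window size with the largest net gain (gained - lost).

    in_aoi_after_by_window maps a window size to the boolean coding
    obtained after correcting with that window.
    """
    scores = {
        window: score_window(in_aoi_before, after)
        for window, after in in_aoi_after_by_window.items()
    }
    best = max(scores, key=lambda w: scores[w][0] - scores[w][1])
    return best, scores


# Toy example with six fixations and three candidate window sizes.
before = [True, False, False, False, True, False]
after_by_window = {
    5: [True, True, False, False, False, False],
    25: [True, True, True, False, True, False],
    51: [True, True, False, False, True, False],
}
window, scores = best_window(before, after_by_window)
print(window, scores[window])
```

Tracking gained and lost fixations separately, rather than just the total inside AOIs, makes it easier to spot a window size that recovers a lot of fixations for some participants while quietly breaking the coding for others.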

Another problem the algorithm is unable to account for is “squishers” – people who lean away from the computer partway through the experiment. Eyes that are farther from the screen don’t need to move as much in order to foveate the category features, and the eye-tracker registers a pattern that implies the three features were very close to each other in the centre of the screen. This happens very rarely, however, because our RAs are very diligent about asking people to remain seated in the same position for the entirety of the experiment.

I’m a nerd, so if you’re from an eye-tracking lab and would like to discuss this kind of thing further, drop me a line (kmm1 at