--- title: "Analyzing Pupil Data with GazeR" date: r format(Sys.time(), "%d %B, %Y") output: html_document ---

This is an R markdown file explaining pupil preprocessing functions contained in the gazeR package. The example dataset is from a lexical decison task that had individuals judge the lexicality of printed stimuli and cursive stimuli. Cursive stimuli are ambigious and non-segmented making them really hard to recognize. It is predicted that it should be harder to recognize cursive words than printed words. Good thing we have this new package to help us analyze the data!

One thing you have probably noticed while perusing the pupillometry literature is how variable pupil preprocessing can be across labs. It is quite common for researchers to develop in-house code, written in different programming languages (e.g., R, Python, or Matlab). This can make computational reproducibility difficult. In the current climate where replicality and transparencey are becoming the norm, some sort of standardization is needed. To this end, we have created several functions that will hopefully aid researchers in analyzing pupil data.

Preparing your data

For your own data, you will first have set your working directory. You should have a specified folder that has all your data. You will save this folder path.

rm(list = ls()) # clear ws
setwd("") # set working directory 
file_list <- list.files (path='', pattern=".xls") #folder path 

Before using this package, a number of steps are required: First, your data must have been collected using an SR Research Eyelink eye tracker. Second, your data must have been exported using SR Research Data Viewer software. Third, I recommend extending blinks 100 ms before and after a blink in Data Viewer. However, that is not neccessary as we have a function that will do that for you! A Sample Report must be generated with the following columns:


You should also include columns important for your experiment.

Load gazeR


Load packages

You will need these packages.


We need to read in the file that contains data from several participant. I could not host the whole data file becasue it is large (> 1 million rows).

pupil_path <- system.file("extdata", "Pupil_file1.xls", package = "gazer")

To use with your own data (that includes many Ss and could include millions of rows), the function merge_pupil_files will take all your pupil files from a folder path and merge them together. It will also: rename columns, make all columns lowercase, and adds a new column entitled time that places time in ms instead of tracker time.

You will first have set your working directory. You should have a specified folder that has all your data. You will save this folder path.

Then you apply the merge_pupil_files function.

Behavioral Data

If you are also interested in analyzing behavioral data (RTs and accuracy), the behave_data function will cull the important behavioral data from the Sample Report. The function will return a data frame without errors when omiterrors=TRUE or a data frame with errors for accuracy/error analysis when omiterrors=FALSE. The columns relevant for your experiment need to be specified within the behave_col names argument. This function does not eliminate outliers, however. You must use your preferred method. I recommend Jonathan Gange trimr package to do this.

behave_data<-behave_pupil(pupil_files, omiterrors = FALSE, behave_colnames = c("subject","script","alteration", "trial", "target","accuracy","rt", "block", "cb"))
##      subject  script alteration trial     target accuracy   rt block cb
## 1        10b   print       word     1 sprigp.png        1 2539     0  2
## 960      10b cursive       nwtl     2  nypmh.png        1 3254     0  2
## 2117     10b Cursive       nwtl     3 seivep.png        0 1755     0  2
## 2882     10b cursive       word     4  mourn.png        1 2435     0  2
## 3821     10b Cursive       word     5  noisy.png        1 2200     1  2
## 5197     10b Cursive       word     6  ridge.png        1 1952     1  2

In our data, we want to remove subject accuracy lower than .75 percent and item accuracy below .60 percent. We can take the file generated above to calculate this when argument omiterrors=FALSE. We then merge accuracy by items and subejects into the main pupil file.

itemacc<-behave_data %>% 
  dplyr::group_by(target) %>% 
  dplyr::summarise(meanitemacc = mean(accuracy[block>0 & alteration=="word"])) #overall item accuracy
subacc<-behave_data %>% 
  dplyr::group_by(subject) %>% 
  dplyr::summarise (meansubacc = mean(accuracy[block>0 & alteration=="word"]))#subject accuracy
dataraw1<-merge(pupil_files, itemacc)#merge into main ds
dataraw2<-merge(dataraw1, subacc)#merge into main ds

Let’s now clean up our data a bit. First, we will transform pupil size from arbituary units to mm. Second, we will remove practice blocks, incorrect responses, conditions that are not words, subjects with accuracy below 75%, and items with accuracy below 60%.

pupil_files1<-dataraw2 %>%
  dplyr::mutate(pupilmm= (pupil*5)/5570.29) %>%
  dplyr::filter(block>0  & accuracy==1 & alteration=="word" & meanitemacc>.60 & meansubacc>.74) %>% 
  arrange(subject, target, trial, time)

Pupil Preprocessing is now ready to begin!

Interpolation of Pupil Values

Pupil values need to be linearlly interpolated. The interpolate_pupil function sets all pupil values with blinks to NA and linearlly interpolates those values and returns a tibble with a column containing interpolated values. if you had Data Viewer extend blinks, the extendblink argument must be set to FALSE. If you used our function, the extendblink argument should be set to TRUE.It is important to note that SR only extends the blink column and does not set blinks to NA in the sample report.

  pup_interp<-interpolate_pupil(pup_sd1, extendblinks=FALSE, type="linear")
## Turning pupil size with blinks to NA
## Performing linear interpolation

For a sanity check, let’s make sure that our interpolation did it was supposed to do. In the plot below we have the uninterpolated data (in black) and our interpolated data (in red) for one trial.

Looks good!

Median Absoulate Deviation (MAD)

Artifacts can arise when the size of the pupil changes too quickly (Kret & Sjak-Shie, in press). The max_dilation function calculates the nomralized dilation speed, which is the max absolute change between samples divided by the temporal speration for each sample, preceding or succeeding the sample. To detect outliters, the median abosulate deviation is calcualted from the speed dilation variable, multipled by a constant, and added to the median dilation speed variable–values above this threshold are then removed.

  group_by(subject, trial) %>% 

mad_pup<-max_pup %>% 
  group_by(subject, trial) %>% 

mad_removal<-mad_pup %>% 
  filter(speed < MAD)



Pupil data can be extremely nosiy! One way to reduce some of this noise is to use a n-point moving average to smooth the data. You can do this using the movingaverage function. To use this function, you need to specificy the column that contains the interpolated pupil values and how many samples you want to avearge over. In this example, we use a 5-point moving average (n= 5). However, you can set the moving average to whatever you would like.

rolling_mean_pupil_average<-as.data.frame(mad_removal) %>% #must be in a data.frame
  dplyr::select(subject, trial,target, script,alteration, time, interp, sample_message) %>%
  dplyr::mutate(movingavgpup= movingaverage(mad_removal$interp,n=5))

To make life easier, let’s place the data into timebins (useres can specifiy a timebin to use). For this example, we will use 200 ms timebins.


To make life easier, let’s place the data into timebins (useres can specifiy a timebin to use).The downsample.pupil function takes your data and a specified bin length as arguments and returns downsampled data.

timebins1<- downsample.pupil(rolling_mean_pupil_average, 200)

Baseline correction

To control for variability arising from non-task related processing, baseline correction is commonly done. There several different types of baseline correction. In a recent paper by Mathot (2018), it was reccomended that a subtractive baseline correction be done based on the median. The baseline_correction_pupil function finds the median pupil size during a specified baseline period for each trial. In this exmaple, we used a baseline window between 500 ms and 1000 ms (when the fixation cross was on screen) and subtracted this value from each pupil trace.

baseline_pupil<-baseline_correction_pupil(timebins1, baseline_window=c(500,1000))
## Calculating baseline
## Calculating median baseline from:500-1000
## Merging baseline
## Performing baseline correction

Trial Clipping

The stimulus of interest comes on screen 1 s after trial onset. To start our trial at zero, and not 1 s, we take each time bin and subtract it from the first. Next, we trucnate the trial to 4200 ms–this seems to be when conditions begin to diverge for our data after visual inspection.

baseline_pupil_onset<-baseline_pupil %>% 
  dplyr::filter(time > 1000) %>% 
  mutate(timebinonset=time - time [[1]]) %>% 
  dplyr::filter(timebinonset <=3000)

Aggreating Data

We have a whole lot of data. To reduce the amount of data we are working with, we aggregate the data across subject,conditon, and bin onset. This produces an average pupil diameter for each bin,subject, and condition.

agg_subject<-baseline_pupil_onset %>% 
  dplyr::group_by(subject, script,timebinonset) %>% dplyr::summarise(baselinep=mean(baselinecorrectedp)) %>% 


Pupillary Time Course

We can see here that recognizing cursive words resulted in a larger pupillary response.

## Loading required package: lattice
## Loading required package: plyr
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
##     compact
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##     set_names
## The following object is masked from 'package:tidyr':
##     extract
runningT <- cursive_new %>% 
  dplyr::filter(cursive_new$timebinonset<=2500) %>% 
  split(.$timebinonset) %>% 
  purrr::map(~t.test(baselinep~script, paired  = TRUE, data=.))
runningSE <- cursive_new %>%
  dplyr::filter(cursive_new$timebinonset<=2500) %>% 
  split(.$timebinonset) %>% 
  map(~Rmisc::summarySEwithin(data = ., measurevar = "baselinep", withinvars = "script", idvar="subject"))

cur1 <- dplyr::filter(cursive_new, cursive_new$timebinonset<=2500)

pvals <- data.frame(
  Time = unique(cur1$timebinonset),
  p.value = map_dbl(runningT,"p.value")

pvals$crit <- 0+(pvals$p.value <= .05)
pvals$crit[pvals$crit == 0] <- NA

WSCI <- map_df(runningSE, extract) %>%
    Time = rep(unique(cur1$timebinonset), each = 2) 
    #Note, you'll have to change 2 to match the number of conditions

levCat.plot <- ggplot(WSCI,aes(Time, baselinep))+
  scale_color_brewer(palette = "Set1")+

l1<- levCat.plot+
  stat_summary(fun.y = "mean",geom = "line",size = 1,aes(colour = script))+
  labs(x = "Time (ms)",y = "baseline-corrected", colour = "")+
  geom_hline(yintercept = 0,linetype = "dashed")

l2<- l1 +
  geom_ribbon(data = WSCI, aes(ymin = baselinep-ci, ymax = baselinep+ci,
                               fill = script, colour = script), 
              linetype="dashed", alpha = 0.3)+
  guides(fill = "none")+
  stat_summary(fun.y = mean, geom = "line", size = 1, aes(colour = script))+
  labs(x = "Time (ms)", y = "Baseline-corrected Pupil (in Arbitrary Units)", colour = "")+
  geom_line(data = pvals, aes(x = Time, y = crit-3),na.rm = TRUE,size = 2)+
  geom_hline(yintercept = 0, linetype = "dashed")


Mean Pupil Size

You may want to examine the mean pupil size, as this is one of the most popular DVs in the literature.

mean_pup<-subset(cursive_new, cursive_new$timebinonset<=2500) %>% 
  dplyr::group_by(subject, script) %>%
  dplyr::summarise(meanpup=mean(baselinep), maxpup=max(baselinep)) %>%

 mean<-ggplot(mean_pup, aes(script, meanpup, fill = script))  + 
    geom_point(shape = 21, color = "black", size = 2.3, alpha = 0.6, position = position_jitter(w =   0.03, h = 0))   + stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax = "mean", geom = "crossbar", color = "black", size = 0.4, width = 0.3) + labs(y="Mean Pupil Size", x="Script", title= "Mean Pupil Size (in Arbitrary Units)")

Max Pupil Size

or you may want to examine max pupil size.

max_pup<-subset(cursive_new, cursive_new$timebinonset<=2500) %>% 
  dplyr::group_by(subject, script) %>%
  dplyr::summarise(maxpup=max(baselinep)) %>%
  max<-ggplot(max_pup, aes(script, maxpup, fill = script))  + 
    geom_point(shape = 21, color = "black", size = 2.3, alpha = 0.6, position = position_jitter(w =   0.03, h = 0))   + stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax = "mean", geom = "crossbar", color = "black", size = 0.4, width = 0.3)+labs(y="Max Pupil Size", x="Script", title= "Max Pupil Size (in Arbitrary Units)")