--- title: "Analyzing Pupil Data with GazeR" date: r format(Sys.time(), "%d %B, %Y") output: html_document ---
This is an R markdown file explaining pupil preprocessing functions contained in the gazeR package. The example dataset is from a lexical decison task that had individuals judge the lexicality of printed stimuli and cursive stimuli. Cursive stimuli are ambigious and non-segmented making them really hard to recognize. It is predicted that it should be harder to recognize cursive words than printed words. Good thing we have this new package to help us analyze the data!
One thing you have probably noticed while perusing the pupillometry literature is how variable pupil preprocessing can be across labs. It is quite common for researchers to develop in-house code, written in different programming languages (e.g., R, Python, or Matlab). This can make computational reproducibility difficult. In the current climate where replicality and transparencey are becoming the norm, some sort of standardization is needed. To this end, we have created several functions that will hopefully aid researchers in analyzing pupil data.
For your own data, you will first have set your working directory. You should have a specified folder that has all your data. You will save this folder path.
rm(list = ls()) # clear ws setwd("") # set working directory file_list <- list.files (path='', pattern=".xls") #folder path
Before using this package, a number of steps are required: First, your data must have been collected using an SR Research Eyelink eye tracker. Second, your data must have been exported using SR Research Data Viewer software. Third, I recommend extending blinks 100 ms before and after a blink in Data Viewer. However, that is not neccessary as we have a function that will do that for you! A Sample Report must be generated with the following columns:
You should also include columns important for your experiment.
You will need these packages.
library(gazer) library(tidyverse) library(zoo)
We need to read in the file that contains data from several participant. I could not host the whole data file becasue it is large (> 1 million rows).
pupil_path <- system.file("extdata", "Pupil_file1.xls", package = "gazer") pupil_files<-read.table(pupil_path)
To use with your own data (that includes many Ss and could include millions of rows), the function
merge_pupil_files will take all your pupil files from a folder path and merge them together. It will also: rename columns, make all columns lowercase, and adds a new column entitled time that places time in ms instead of tracker time.
You will first have set your working directory. You should have a specified folder that has all your data. You will save this folder path.
Then you apply the
If you are also interested in analyzing behavioral data (RTs and accuracy), the
behave_data function will cull the important behavioral data from the Sample Report. The function will return a data frame without errors when omiterrors=TRUE or a data frame with errors for accuracy/error analysis when omiterrors=FALSE. The columns relevant for your experiment need to be specified within the behave_col names argument. This function does not eliminate outliers, however. You must use your preferred method. I recommend Jonathan Gange trimr package to do this.
behave_data<-behave_pupil(pupil_files, omiterrors = FALSE, behave_colnames = c("subject","script","alteration", "trial", "target","accuracy","rt", "block", "cb")) print(head(behave_data))
## subject script alteration trial target accuracy rt block cb ## 1 10b print word 1 sprigp.png 1 2539 0 2 ## 960 10b cursive nwtl 2 nypmh.png 1 3254 0 2 ## 2117 10b Cursive nwtl 3 seivep.png 0 1755 0 2 ## 2882 10b cursive word 4 mourn.png 1 2435 0 2 ## 3821 10b Cursive word 5 noisy.png 1 2200 1 2 ## 5197 10b Cursive word 6 ridge.png 1 1952 1 2
In our data, we want to remove subject accuracy lower than .75 percent and item accuracy below .60 percent. We can take the file generated above to calculate this when argument omiterrors=FALSE. We then merge accuracy by items and subejects into the main pupil file.
itemacc<-behave_data %>% dplyr::group_by(target) %>% dplyr::summarise(meanitemacc = mean(accuracy[block>0 & alteration=="word"])) #overall item accuracy subacc<-behave_data %>% dplyr::group_by(subject) %>% dplyr::summarise (meansubacc = mean(accuracy[block>0 & alteration=="word"]))#subject accuracy dataraw1<-merge(pupil_files, itemacc)#merge into main ds dataraw2<-merge(dataraw1, subacc)#merge into main ds
Let’s now clean up our data a bit. First, we will transform pupil size from arbituary units to mm. Second, we will remove practice blocks, incorrect responses, conditions that are not words, subjects with accuracy below 75%, and items with accuracy below 60%.
pupil_files1<-dataraw2 %>% dplyr::mutate(pupilmm= (pupil*5)/5570.29) %>% dplyr::filter(block>0 & accuracy==1 & alteration=="word" & meanitemacc>.60 & meansubacc>.74) %>% arrange(subject, target, trial, time)
Pupil Preprocessing is now ready to begin!
Pupil values need to be linearlly interpolated. The
interpolate_pupil function sets all pupil values with blinks to NA and linearlly interpolates those values and returns a tibble with a column containing interpolated values. if you had Data Viewer extend blinks, the extendblink argument must be set to FALSE. If you used our function, the extendblink argument should be set to TRUE.It is important to note that SR only extends the blink column and does not set blinks to NA in the sample report.
pup_interp<-interpolate_pupil(pup_sd1, extendblinks=FALSE, type="linear")
## Turning pupil size with blinks to NA
## Performing linear interpolation
For a sanity check, let’s make sure that our interpolation did it was supposed to do. In the plot below we have the uninterpolated data (in black) and our interpolated data (in red) for one trial.
Artifacts can arise when the size of the pupil changes too quickly (Kret & Sjak-Shie, in press). The max_dilation function calculates the nomralized dilation speed, which is the max absolute change between samples divided by the temporal speration for each sample, preceding or succeeding the sample. To detect outliters, the median abosulate deviation is calcualted from the speed dilation variable, multipled by a constant, and added to the median dilation speed variable–values above this threshold are then removed.
max_pup<-pup_interp%>% group_by(subject, trial) %>% mutate(speed=speed_pupil(interp,time)) mad_pup<-max_pup %>% group_by(subject, trial) %>% mutate(MAD=calc_mad(speed)) mad_removal<-mad_pup %>% filter(speed < MAD) mad_removal<-as.data.frame(mad_removal)
Pupil data can be extremely nosiy! One way to reduce some of this noise is to use a n-point moving average to smooth the data. You can do this using the
movingaverage function. To use this function, you need to specificy the column that contains the interpolated pupil values and how many samples you want to avearge over. In this example, we use a 5-point moving average (n= 5). However, you can set the moving average to whatever you would like.
rolling_mean_pupil_average<-as.data.frame(mad_removal) %>% #must be in a data.frame dplyr::select(subject, trial,target, script,alteration, time, interp, sample_message) %>% dplyr::mutate(movingavgpup= movingaverage(mad_removal$interp,n=5))
To make life easier, let’s place the data into timebins (useres can specifiy a timebin to use). For this example, we will use 200 ms timebins.
To make life easier, let’s place the data into timebins (useres can specifiy a timebin to use).The
downsample.pupil function takes your data and a specified bin length as arguments and returns downsampled data.
timebins1<- downsample.pupil(rolling_mean_pupil_average, 200)
To control for variability arising from non-task related processing, baseline correction is commonly done. There several different types of baseline correction. In a recent paper by Mathot (2018), it was reccomended that a subtractive baseline correction be done based on the median. The
baseline_correction_pupil function finds the median pupil size during a specified baseline period for each trial. In this exmaple, we used a baseline window between 500 ms and 1000 ms (when the fixation cross was on screen) and subtracted this value from each pupil trace.
## Calculating baseline
## Calculating median baseline from:500-1000
## Merging baseline
## Performing baseline correction
The stimulus of interest comes on screen 1 s after trial onset. To start our trial at zero, and not 1 s, we take each time bin and subtract it from the first. Next, we trucnate the trial to 4200 ms–this seems to be when conditions begin to diverge for our data after visual inspection.
baseline_pupil_onset<-baseline_pupil %>% dplyr::filter(time > 1000) %>% mutate(timebinonset=time - time []) %>% dplyr::filter(timebinonset <=3000)
We have a whole lot of data. To reduce the amount of data we are working with, we aggregate the data across subject,conditon, and bin onset. This produces an average pupil diameter for each bin,subject, and condition.
agg_subject<-baseline_pupil_onset %>% dplyr::group_by(subject, script,timebinonset) %>% dplyr::summarise(baselinep=mean(baselinecorrectedp)) %>% ungroup()
We can see here that recognizing cursive words resulted in a larger pupillary response.
data(cursive_new) library(purrr) library(Rmisc)
## Loading required package: lattice
## Loading required package: plyr
## You have loaded plyr after dplyr - this is likely to cause problems. ## If you need functions from both plyr and dplyr, please load plyr first, then dplyr: ## library(plyr); library(dplyr)
## ## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr': ## ## arrange, count, desc, failwith, id, mutate, rename, summarise, ## summarize
## The following object is masked from 'package:purrr': ## ## compact
## ## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr': ## ## set_names
## The following object is masked from 'package:tidyr': ## ## extract
runningT <- cursive_new %>% dplyr::filter(cursive_new$timebinonset<=2500) %>% split(.$timebinonset) %>% purrr::map(~t.test(baselinep~script, paired = TRUE, data=.)) runningSE <- cursive_new %>% dplyr::filter(cursive_new$timebinonset<=2500) %>% split(.$timebinonset) %>% map(~Rmisc::summarySEwithin(data = ., measurevar = "baselinep", withinvars = "script", idvar="subject")) cur1 <- dplyr::filter(cursive_new, cursive_new$timebinonset<=2500) pvals <- data.frame( Time = unique(cur1$timebinonset), p.value = map_dbl(runningT,"p.value") ) pvals$crit <- 0+(pvals$p.value <= .05) pvals$crit[pvals$crit == 0] <- NA WSCI <- map_df(runningSE, extract) %>% mutate( Time = rep(unique(cur1$timebinonset), each = 2) #Note, you'll have to change 2 to match the number of conditions ) levCat.plot <- ggplot(WSCI,aes(Time, baselinep))+ scale_color_brewer(palette = "Set1")+ theme_classic() l1<- levCat.plot+ stat_summary(fun.y = "mean",geom = "line",size = 1,aes(colour = script))+ labs(x = "Time (ms)",y = "baseline-corrected", colour = "")+ geom_hline(yintercept = 0,linetype = "dashed") l2<- l1 + geom_ribbon(data = WSCI, aes(ymin = baselinep-ci, ymax = baselinep+ci, fill = script, colour = script), linetype="dashed", alpha = 0.3)+ guides(fill = "none")+ stat_summary(fun.y = mean, geom = "line", size = 1, aes(colour = script))+ labs(x = "Time (ms)", y = "Baseline-corrected Pupil (in Arbitrary Units)", colour = "")+ geom_line(data = pvals, aes(x = Time, y = crit-3),na.rm = TRUE,size = 2)+ geom_hline(yintercept = 0, linetype = "dashed") print(l2)
You may want to examine the mean pupil size, as this is one of the most popular DVs in the literature.
data(cursive_new) mean_pup<-subset(cursive_new, cursive_new$timebinonset<=2500) %>% dplyr::group_by(subject, script) %>% dplyr::summarise(meanpup=mean(baselinep), maxpup=max(baselinep)) %>% ungroup() mean<-ggplot(mean_pup, aes(script, meanpup, fill = script)) + geom_point(shape = 21, color = "black", size = 2.3, alpha = 0.6, position = position_jitter(w = 0.03, h = 0)) + stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax = "mean", geom = "crossbar", color = "black", size = 0.4, width = 0.3) + labs(y="Mean Pupil Size", x="Script", title= "Mean Pupil Size (in Arbitrary Units)") print(mean)
or you may want to examine max pupil size.
max_pup<-subset(cursive_new, cursive_new$timebinonset<=2500) %>% dplyr::group_by(subject, script) %>% dplyr::summarise(maxpup=max(baselinep)) %>% ungroup() max<-ggplot(max_pup, aes(script, maxpup, fill = script)) + geom_point(shape = 21, color = "black", size = 2.3, alpha = 0.6, position = position_jitter(w = 0.03, h = 0)) + stat_summary(fun.y = "mean", fun.ymin = "mean", fun.ymax = "mean", geom = "crossbar", color = "black", size = 0.4, width = 0.3)+labs(y="Max Pupil Size", x="Script", title= "Max Pupil Size (in Arbitrary Units)") print(max)