Analzying GazePoint Pupil Data With GazeR

Using gazeR to analyze GazePoint Pupil Data

By Jason Geller in R

April 21, 2021

In this vignette I am going to show you how to read in a GazePoint data file along with some behavioral data and use gazeR to preprocess the data.

Special thanks to Matthew K Robinson (Twitter:@matthewkrobinson) for letting me use some data from an auditory oddball task he conducted on himself (we do what we have to do as researchers :D): see Tweet below.

To get started, we need to load in some important packages and read in the GP data files.

Load Packages

library(tidyverse)
library(remotes)
remotes::install_github("dmirman/gazer")
## Rcpp       (1.0.7        -> 1.0.8       ) [CRAN]
## pillar     (1.6.4        -> 1.7.0       ) [CRAN]
## glue       (1.6.0        -> 1.6.2       ) [CRAN]
## withr      (2.4.3        -> 2.5.0       ) [CRAN]
## rlang      (0.4.12       -> 1.0.2       ) [CRAN]
## magrittr   (2.0.1        -> 2.0.2       ) [CRAN]
## jsonlite   (1.7.2        -> 1.8.0       ) [CRAN]
## evaluate   (0.14         -> 0.15        ) [CRAN]
## desc       (1.4.0        -> 1.4.1       ) [CRAN]
## crayon     (1.4.2        -> 1.5.0       ) [CRAN]
## cli        (3.1.1        -> 3.2.0       ) [CRAN]
## testthat   (3.1.1        -> 3.1.2       ) [CRAN]
## tidyselect (1.1.1        -> 1.1.2       ) [CRAN]
## dplyr      (1.0.7        -> 1.0.8       ) [CRAN]
## openssl    (1.4.6        -> 2.0.0       ) [CRAN]
## sass       (f14841535... -> f7a954027...) [GitHub]
## tinytex    (0.36         -> 0.37        ) [CRAN]
## xfun       (0.29         -> 0.30        ) [CRAN]
## yaml       (2.2.1        -> 2.3.5       ) [CRAN]
## rmarkdown  (2.11         -> 2.13        ) [CRAN]
## clipr      (0.7.1        -> 0.8.0       ) [CRAN]
## colorspace (2.0-2        -> 2.0-3       ) [CRAN]
## tidyr      (1.1.4        -> 1.2.0       ) [CRAN]
## broom      (0.7.11       -> 0.7.12      ) [CRAN]
## readr      (2.1.1        -> 2.1.2       ) [CRAN]
## nloptr     (1.2.2.3      -> 2.0.0       ) [CRAN]
## dtplyr     (1.2.0        -> 1.2.1       ) [CRAN]
## lme4       (1.1-27.1     -> 1.1-28      ) [CRAN]
## 
##   There is a binary version available but the source version is later:
##           binary source needs_compilation
## rmarkdown   2.12   2.13             FALSE
## 
## 
## The downloaded binary packages are in
##  /var/folders/w7/b8pm_hmx32n02968yffrgxb80000gn/T//Rtmp0KDX3k/downloaded_packages
## 
## * checking for file ‘/private/var/folders/w7/b8pm_hmx32n02968yffrgxb80000gn/T/Rtmp0KDX3k/remotes781c18876f38/rstudio-sass-f7a9540/DESCRIPTION’ ... OK
## * preparing ‘sass’:
## * checking DESCRIPTION meta-information ... OK
## * cleaning src
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
## Removed empty directory ‘sass/scripts’
## * building ‘sass_0.4.0.9000.tar.gz’
## 
## * checking for file ‘/private/var/folders/w7/b8pm_hmx32n02968yffrgxb80000gn/T/Rtmp0KDX3k/remotes781c963a5f6/dmirman-gazer-c8b35ed/DESCRIPTION’ ... OK
## * preparing ‘gazer’:
## * checking DESCRIPTION meta-information ... OK
## * checking for LF line-endings in source and make files and shell scripts
## * checking for empty or unneeded directories
##   NB: this package now depends on R (>= 3.5.0)
##   WARNING: Added dependency on R >= 3.5.0 because serialized objects in
##   serialize/load version 3 cannot be read in older versions of R.
##   File(s) containing such objects:
##     ‘gazer/data/cursive_agg_data.rda’
## * building ‘gazer_0.1.tar.gz’
library(gazer)
library(data.table)
library(here)
remotes::install_github("tmalsburg/saccades/saccades", dependencies=TRUE)
library(saccades)

Read Data

pd <- fread('oddball_eye_13.tsv') # eye data 
bs<-fread('oddball_13.tsv') # behave data

head(pd)
##      CNT     TIME    TIME_TICK    FPOGX     FPOGY    FPOGS   FPOGD FPOGID FPOGV
## 1: 77433 1251.964 102150661899 -4.40518 -13.98604 1251.883 0.08093   1917     1
## 2: 77434 1251.980 102150821625 -4.39508 -13.95200 1251.883 0.09692   1917     1
## 3: 77435 1251.996 102150983307 -4.36427 -13.84678 1251.883 0.09692   1917     0
## 4: 77436 1252.013 102151149297 -4.40694 -13.99395 1251.883 0.09692   1917     0
## 5: 77437 1252.028 102151307171 -4.42096 -14.04383 1251.883 0.09692   1917     0
## 6: 77438 1252.045 102151469190 -4.40730 -14.00445 1251.883 0.09692   1917     0
##      LPOGX   LPOGY LPOGV    RPOGX     RPOGY RPOGV    BPOGX     BPOGY BPOGV
## 1: 0.71795 0.50537     1 -9.38697 -28.00086     1 -4.33451 -13.74775     1
## 2: 0.71795 0.50537     1 -9.38697 -28.00086     1 -4.33451 -13.74775     1
## 3: 0.70711 0.51136     1 -9.14516 -27.18135     1 -4.21902 -13.33500     1
## 4: 0.70151 0.43055     1 -9.60071 -28.71280     1 -4.44960 -14.14113     1
## 5: 0.70270 0.42562     1 -9.60071 -28.71280     1 -4.44900 -14.14359     1
## 6: 0.69827 0.43625     1 -9.34484 -27.89350     1 -4.32329 -13.72862     1
##       LPCX    LPCY      LPD     LPS LPV    RPCX    RPCY      RPD RPS RPV
## 1: 0.34860 0.59214 25.23420 0.83829   1 0.64310 0.60511 28.49240   1   1
## 2: 0.34860 0.59216 25.27127 0.83829   1 0.64314 0.60509 28.33148   1   1
## 3: 0.34845 0.59234 25.09805 0.83829   1 0.64293 0.60517 28.69580   1   1
## 4: 0.34846 0.59237 25.15947 0.83829   1 0.64286 0.60519 28.47081   1   1
## 5: 0.34838 0.59240 25.32249 0.84484   1 0.64285 0.60518 28.46784   1   1
## 6: 0.34830 0.59240 25.07680 0.85139   1 0.64269 0.60527 28.50588   1   1
##       LEYEX    LEYEY   LEYEZ LPUPILD LPUPILV   REYEX    REYEY   REYEZ RPUPILD
## 1: -0.03362 -0.01726 0.56538 0.00455       1 0.03052 -0.01989 0.57411 0.00512
## 2: -0.03438 -0.01769 0.57821 0.00451       1 0.03056 -0.01997 0.57644 0.00513
## 3: -0.03438 -0.01769 0.57821 0.00449       1 0.03056 -0.01997 0.57644 0.00510
## 4: -0.03438 -0.01769 0.57821 0.00451       1 0.03056 -0.01997 0.57644 0.00509
## 5: -0.03501 -0.01795 0.58649 0.00448       1 0.03130 -0.02049 0.59132 0.00509
## 6: -0.03501 -0.01795 0.58649 0.00450       1 0.03130 -0.02049 0.59132 0.00508
##    RPUPILV      CX      CY CS BKID BKDUR BKPMIN    LPMM LPMMV    RPMM RPMMV
## 1:       1 0.33333 0.33333  0    0     0     20 4.54567     1 5.11614     1
## 2:       1 0.33333 0.33333  0    0     0     20 4.51065     1 5.12683     1
## 3:       1 0.33333 0.33333  0    0     0     20 4.48607     1 5.10490     1
## 4:       1 0.33333 0.33333  0    0     0     20 4.50748     1 5.09460     1
## 5:       1 0.33333 0.33333  0    0     0     20 4.47957     1 5.08827     1
## 6:       1 0.33333 0.33333  0    0     0     20 4.50258     1 5.07552     1
##     DIAL DIALV GSR GSRV HR HRV HRP TTL0 TTL1 TTLV            USER
## 1: 0.088     1   0    0  0   0 454   -1   -1    0               0
## 2: 0.088     1   0    0  0   0 484   -1   -1    0 STARTEXPERIMENT
## 3: 0.088     1   0    0  0   0 456   -1   -1    0 STARTEXPERIMENT
## 4: 0.088     1   0    0  0   0 451   -1   -1    0           START
## 5: 0.088     1   0    0  0   0 482   -1   -1    0           START
## 6: 0.088     1   0    0  0   0 454   -1   -1    0           START
head(bs)
##    subject trial tone   rt response
## 1:      13     1   lo 2113     None
## 2:      13     2   lo 2102     None
## 3:      13     3   lo 2107     None
## 4:      13     4   lo 2108     None
## 5:      13     5   lo 2107     None
## 6:      13     6   lo 2103     None

What we are going to do is run the GazePoint file through the merge_gazepoint function. The function below takes a list of files called file_list and merges all the files together, appends a subject column, creates a trial column using the USER column (GazePoint only allows messages through this channel), creates a time variable (in milliseconds). In the merge_gazepoint function the trail_msg argument requires users to denote a message used in the USER column that references the start of the trial–in our case the START message denotes the start of a new trial. This is a solution by Matt Robinson, but there are other ways one could extract the trial number. What I have done in the past is append a message with the trial iteration (e.g., START_1) in Python and use the separate function to get the trial number.

# A "monocular mean" averages both eyes together. If data is available in just
# one eye, use the available value as the mean, unless we need_both is TRUE.
#' @param x1 pupil left
#' @param x2 pupil right
#' @return vector with monocular mean values
compute_monocular_mean <- function(x1, x2) {
  xm <- rowMeans(cbind(x1, x2), na.rm = TRUE)
  # NaN => NA
  ifelse(is.nan(xm), NA, xm)
}


# function for processing GazePoint data
merge_gazepoint <- function (file_list, trial_msg = "START"){
  #file list is path to .xls files
  #vroom is faster
  library(data.table)
  
  file_ids=str_replace_all(basename(file_list),"([:alpha:]|[:punct:])","") # remove everything but numeric values
                   
  data <- map2(file_list, file_ids, ~fread(.x) %>% 
    mutate(id = .y))  %>% 
    bind_rows()

  
  d = data %>%
    dplyr::rowwise() %>%
    dplyr::mutate(pupil=compute_monocular_mean(RPMM, LPMM)) %>% # average both eyes
             dplyr::ungroup() %>%
           dplyr::mutate(pupil = ifelse(RPMMV == 0|LPMMV == 0, 0, pupil),  #missing data labeled as blinks
         new_trial = ifelse(USER == trial_msg & lag(USER) != trial_msg, 1, 0), # Label new trials
         trial = cumsum(new_trial), # Create a trial variable
         time = floor(TIME*1000)) %>%
    group_by(trial) %>%
    dplyr::mutate(time=time - min(time)) %>%
    ungroup() %>%
  dplyr::select(id, time,trial,pupil,BPOGX, BPOGY, USER) %>%
    dplyr::rename("message" = "USER", "subject"= "id", "x" = "BPOGX", "y" = "BPOGY") %>% 
    dplyr::filter(trial > 0)
  
  return(d)
}

Merge Files

setwd(here()) # setwd

gp_file<-list.files(here::here(), pattern = "eye_13.tsv") # get files 

setwd(here())
      
d=merge_gazepoint(gp_file, trial_msg = "START")

d$subject<-as.numeric(d$subject)

pdb <- full_join(bs, d)

pdb <- as_tibble(pdb)

pdb
## # A tibble: 73,724 × 10
##    subject trial tone     rt response  time pupil     x     y message
##      <dbl> <dbl> <chr> <int> <chr>    <dbl> <dbl> <dbl> <dbl> <chr>  
##  1      13     1 lo     2113 None         0  4.80 -4.45 -14.1 START  
##  2      13     1 lo     2113 None        16  4.78 -4.45 -14.1 START  
##  3      13     1 lo     2113 None        32  4.79 -4.32 -13.7 START  
##  4      13     1 lo     2113 None        48  4.80 -4.58 -14.5 START  
##  5      13     1 lo     2113 None        64  4.80 -4.58 -14.5 START  
##  6      13     1 lo     2113 None        81  4.79 -4.63 -14.7 START  
##  7      13     1 lo     2113 None        97  4.81 -4.42 -14.0 START  
##  8      13     1 lo     2113 None       113  4.80 -4.42 -14.0 TONE   
##  9      13     1 lo     2113 None       129  4.78 -4.23 -13.5 TONE   
## 10      13     1 lo     2113 None       145  4.77 -4.34 -13.8 TONE   
## # … with 73,714 more rows

Plot Interpolated Trial

interp_graph <- smooth_interp  %>%
  dplyr::filter(trial=="400")

bold <- element_text(face = "bold", color = "black", size = 14) #axis bold
#Graph interpolation
pup_g<- ggplot(interp_graph, aes(x= time, y= pupil)) + geom_point()+ geom_line(colour="black") +
  geom_line(aes(x=time, y=pup_interp), colour="darkgreen") + xlab("Time (ms)") + ylab("Pupil Size (mm)") + theme_bw() + theme(axis.title.y=element_text(size = 16, face="bold"), axis.title.x = element_text(size=16, face="bold"), axis.text.x=element_text(size = 12, face="bold"), axis.text.y=element_text(size=12, face="bold"))
print(pup_g)

Baseline Correction

Here we will do a subtractive baseline correction taking 250 ms before the onset of the tone as baseline.

#use messages to baseline correct
baseline_pupil<-baseline_correction_pupil(smooth_interp, pupil_colname="pup_interp", baseline_window=c(0,250), baseline_method = "sub")

head(baseline_pupil)
## # A tibble: 6 × 13
##   subject trial baseline  time pupil     x     y message tone  blink extendpupil
##     <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <chr> <dbl>       <dbl>
## 1      13     1     4.80     0  4.80 -4.45 -14.1 START   lo        0        4.80
## 2      13     1     4.80    16  4.78 -4.45 -14.1 START   lo        0        4.78
## 3      13     1     4.80    32  4.79 -4.32 -13.7 START   lo        0        4.79
## 4      13     1     4.80    48  4.80 -4.58 -14.5 START   lo        0        4.80
## 5      13     1     4.80    64  4.80 -4.58 -14.5 START   lo        0        4.80
## 6      13     1     4.80    81  4.79 -4.63 -14.7 START   lo        0        4.79
## # … with 2 more variables: pup_interp <dbl>, baselinecorrectedp <dbl>

Missing Data

Let’s see how much missing data there is and remove trials with greater than 20% missing data.

pup_missing<-count_missing_pupil(baseline_pupil, missingthresh = .5)
# remove outliers

I remove about 15 percent of trials.

Unlikely Pupil Sizes

Now let’s keep pupil diameter sizes between 2 mm and 9 mm

pup_outliers<-pup_missing %>%
  dplyr::filter (pup_interp  >= 2, pup_interp <= 9)

MAD

Get rid of artifacts we might have missed during some earlier steps.

  #MAD removal
max_removal<-pup_missing  %>%
  dplyr::group_by(subject, trial) %>%
  dplyr::mutate(speed=speed_pupil(pup_interp,time)) %>%
  dplyr::mutate(MAD=calc_mad(speed)) %>%
  dplyr::filter(speed < MAD)

Onset

Let’s only look fron the start of the trial until 1500 ms

baseline_pupil_onset<-max_removal %>%
  dplyr::group_by(subject, trial) %>%
  dplyr::filter(time <= 1500) %>%
  select(subject, trial, time,baselinecorrectedp, tone, time,message,baselinecorrectedp)

Downsample

Downsample the time-course to 50 ms.

#downsample
timebins1<- downsample_gaze(baseline_pupil_onset, bin.length=50, timevar = "time", aggvars = c("subject", "tone", "timebins"), type="pupil")
## Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
timebins1
## # A tibble: 62 × 4
##    subject tone  timebins aggbaseline
##      <dbl> <chr>    <dbl>       <dbl>
##  1      13 hi           0    0.00130 
##  2      13 hi          50   -0.000752
##  3      13 hi         100   -0.00206 
##  4      13 hi         150    0.000331
##  5      13 hi         200    0.00530 
##  6      13 hi         250    0.00503 
##  7      13 hi         300    0.00826 
##  8      13 hi         350    0.00930 
##  9      13 hi         400    0.0116  
## 10      13 hi         450    0.0125  
## # … with 52 more rows

Visualize Time-course

cursive_plot <-ggplot(timebins1)+
  aes(timebins, aggbaseline, linetype=tone, color=tone) +
  stat_summary(fun = "mean", geom = "line", size = 1) +
  theme_bw() +
  labs(x ="Time (ms)",y ="Pupil Dilation (baseline - pupil))") +
  geom_hline(yintercept=0.0)

print(cursive_plot)

This looks very similar to the one in the Tweet albeit a bit smoother as a result of the extra preprocessing done.