Analyzing GazePoint Data with GazeR

In this vignette I am going to show you how to read in a GazePoint data file along with some behavioral data and use gazeR to preprocess the data.

Special thanks to Matthew K Robinson (Twitter:@matthewkrobinson) for letting me use some data from an auditory oddball task he conducted on himself (we do what we have to do as researchers :D): see Tweet below.

To get started, we need to load in some important packages and read in the GP data files.

Load Packages

library(tidyverse)
#install_github("dmirman/gazer")
library(gazer)
library(data.table)
library(here)
remotes::install_github("tmalsburg/saccades/saccades", dependencies=TRUE)
library(saccades)

Read Data

pd <- fread(here::here('oddball_eye_13.tsv')) # eye data 
bs<-fread(here::here('oddball_13.tsv')) # behave data

head(pd)
##      CNT     TIME    TIME_TICK    FPOGX     FPOGY    FPOGS   FPOGD FPOGID FPOGV
## 1: 77433 1251.964 102150661899 -4.40518 -13.98604 1251.883 0.08093   1917     1
## 2: 77434 1251.980 102150821625 -4.39508 -13.95200 1251.883 0.09692   1917     1
## 3: 77435 1251.996 102150983307 -4.36427 -13.84678 1251.883 0.09692   1917     0
## 4: 77436 1252.013 102151149297 -4.40694 -13.99395 1251.883 0.09692   1917     0
## 5: 77437 1252.028 102151307171 -4.42096 -14.04383 1251.883 0.09692   1917     0
## 6: 77438 1252.045 102151469190 -4.40730 -14.00445 1251.883 0.09692   1917     0
##      LPOGX   LPOGY LPOGV    RPOGX     RPOGY RPOGV    BPOGX     BPOGY BPOGV
## 1: 0.71795 0.50537     1 -9.38697 -28.00086     1 -4.33451 -13.74775     1
## 2: 0.71795 0.50537     1 -9.38697 -28.00086     1 -4.33451 -13.74775     1
## 3: 0.70711 0.51136     1 -9.14516 -27.18135     1 -4.21902 -13.33500     1
## 4: 0.70151 0.43055     1 -9.60071 -28.71280     1 -4.44960 -14.14113     1
## 5: 0.70270 0.42562     1 -9.60071 -28.71280     1 -4.44900 -14.14359     1
## 6: 0.69827 0.43625     1 -9.34484 -27.89350     1 -4.32329 -13.72862     1
##       LPCX    LPCY      LPD     LPS LPV    RPCX    RPCY      RPD RPS RPV
## 1: 0.34860 0.59214 25.23420 0.83829   1 0.64310 0.60511 28.49240   1   1
## 2: 0.34860 0.59216 25.27127 0.83829   1 0.64314 0.60509 28.33148   1   1
## 3: 0.34845 0.59234 25.09805 0.83829   1 0.64293 0.60517 28.69580   1   1
## 4: 0.34846 0.59237 25.15947 0.83829   1 0.64286 0.60519 28.47081   1   1
## 5: 0.34838 0.59240 25.32249 0.84484   1 0.64285 0.60518 28.46784   1   1
## 6: 0.34830 0.59240 25.07680 0.85139   1 0.64269 0.60527 28.50588   1   1
##       LEYEX    LEYEY   LEYEZ LPUPILD LPUPILV   REYEX    REYEY   REYEZ RPUPILD
## 1: -0.03362 -0.01726 0.56538 0.00455       1 0.03052 -0.01989 0.57411 0.00512
## 2: -0.03438 -0.01769 0.57821 0.00451       1 0.03056 -0.01997 0.57644 0.00513
## 3: -0.03438 -0.01769 0.57821 0.00449       1 0.03056 -0.01997 0.57644 0.00510
## 4: -0.03438 -0.01769 0.57821 0.00451       1 0.03056 -0.01997 0.57644 0.00509
## 5: -0.03501 -0.01795 0.58649 0.00448       1 0.03130 -0.02049 0.59132 0.00509
## 6: -0.03501 -0.01795 0.58649 0.00450       1 0.03130 -0.02049 0.59132 0.00508
##    RPUPILV      CX      CY CS BKID BKDUR BKPMIN    LPMM LPMMV    RPMM RPMMV
## 1:       1 0.33333 0.33333  0    0     0     20 4.54567     1 5.11614     1
## 2:       1 0.33333 0.33333  0    0     0     20 4.51065     1 5.12683     1
## 3:       1 0.33333 0.33333  0    0     0     20 4.48607     1 5.10490     1
## 4:       1 0.33333 0.33333  0    0     0     20 4.50748     1 5.09460     1
## 5:       1 0.33333 0.33333  0    0     0     20 4.47957     1 5.08827     1
## 6:       1 0.33333 0.33333  0    0     0     20 4.50258     1 5.07552     1
##     DIAL DIALV GSR GSRV HR HRV HRP TTL0 TTL1 TTLV            USER
## 1: 0.088     1   0    0  0   0 454   -1   -1    0               0
## 2: 0.088     1   0    0  0   0 484   -1   -1    0 STARTEXPERIMENT
## 3: 0.088     1   0    0  0   0 456   -1   -1    0 STARTEXPERIMENT
## 4: 0.088     1   0    0  0   0 451   -1   -1    0           START
## 5: 0.088     1   0    0  0   0 482   -1   -1    0           START
## 6: 0.088     1   0    0  0   0 454   -1   -1    0           START
head(bs)
##    subject trial tone   rt response
## 1:      13     1   lo 2113     None
## 2:      13     2   lo 2102     None
## 3:      13     3   lo 2107     None
## 4:      13     4   lo 2108     None
## 5:      13     5   lo 2107     None
## 6:      13     6   lo 2103     None

What we are going to do is run the GazePoint file through the merge_gazepoint function. The function below takes a list of files called file_list and merges all the files together, appends a subject column, creates a trial column using the USER column (GazePoint only allows messages through this channel), creates a time variable (in milliseconds). In the merge_gazepoint function the trail_msg argument requires users to denote a message used in the USER column that references the start of the trial–in our case the START message denotes the start of a new trial. This is a solution by Matt Robinson, but there are other ways one could extract the trial number. What I have done in the past is append a message with the trial iteration (e.g., START_1) in Python and use the separate function to get the trial number.

# A "monocular mean" averages both eyes together. If data is available in just
# one eye, use the available value as the mean, unless we need_both is TRUE.
#' @param x1 pupil left
#' @param x2 pupil right
#' @return vector with monocular mean values
compute_monocular_mean <- function(x1, x2) {
  xm <- rowMeans(cbind(x1, x2), na.rm = TRUE)
  # NaN => NA
  ifelse(is.nan(xm), NA, xm)
}


# function for processing GazePoint data
merge_gazepoint <- function (file_list, trial_msg = "START"){
  #file list is path to .xls files
  #vroom is faster
  library(data.table)
  
  file_ids=str_replace_all(basename(file_list),"([:alpha:]|[:punct:])","") # remove everything but numeric values
                   
  data <- map2(file_list, file_ids, ~fread(.x) %>% 
    mutate(id = .y))  %>% 
    bind_rows()

  
  d = data %>%
    dplyr::rowwise() %>%
    dplyr::mutate(pupil=compute_monocular_mean(RPMM, LPMM)) %>% # average both eyes
             dplyr::ungroup() %>%
           dplyr::mutate(pupil = ifelse(RPMMV == 0|LPMMV == 0, 0, pupil),  #missing data labeled as blinks
         new_trial = ifelse(USER == trial_msg & lag(USER) != trial_msg, 1, 0), # Label new trials
         trial = cumsum(new_trial), # Create a trial variable
         time = floor(TIME*1000)) %>%
    group_by(trial) %>%
    dplyr::mutate(time=time - min(time)) %>%
    ungroup() %>%
  dplyr::select(id, time,trial,pupil,BPOGX, BPOGY, USER) %>%
    dplyr::rename("message" = "USER", "subject"= "id", "x" = "BPOGX", "y" = "BPOGY") %>% 
    dplyr::filter(trial > 0)
  
  return(d)
}

Merge Files

setwd(here()) # setwd

gp_file<-list.files(here::here(), pattern = "eye_13.tsv") # get files 

setwd(here())
      
d=merge_gazepoint(gp_file, trial_msg = "START")

d$subject<-as.numeric(d$subject)

pdb <- full_join(bs, d)

pdb <- as_tibble(pdb)

pdb
## # A tibble: 73,724 x 10
##    subject trial tone     rt response  time pupil     x     y message
##      <dbl> <dbl> <chr> <int> <chr>    <dbl> <dbl> <dbl> <dbl> <chr>  
##  1      13     1 lo     2113 None         0  4.80 -4.45 -14.1 START  
##  2      13     1 lo     2113 None        16  4.78 -4.45 -14.1 START  
##  3      13     1 lo     2113 None        32  4.79 -4.32 -13.7 START  
##  4      13     1 lo     2113 None        48  4.80 -4.58 -14.5 START  
##  5      13     1 lo     2113 None        64  4.80 -4.58 -14.5 START  
##  6      13     1 lo     2113 None        81  4.79 -4.63 -14.7 START  
##  7      13     1 lo     2113 None        97  4.81 -4.42 -14.0 START  
##  8      13     1 lo     2113 None       113  4.80 -4.42 -14.0 TONE   
##  9      13     1 lo     2113 None       129  4.78 -4.23 -13.5 TONE   
## 10      13     1 lo     2113 None       145  4.77 -4.34 -13.8 TONE   
## # … with 73,714 more rows

Plot Interpolated Trial

interp_graph <- smooth_interp  %>%
  dplyr::filter(trial=="400")

bold <- element_text(face = "bold", color = "black", size = 14) #axis bold
#Graph interpolation
pup_g<- ggplot(interp_graph, aes(x= time, y= pupil)) + geom_point()+ geom_line(colour="black") +
  geom_line(aes(x=time, y=pup_interp), colour="darkgreen") + xlab("Time (ms)") + ylab("Pupil Size (mm)") + theme_bw() + theme(axis.title.y=element_text(size = 16, face="bold"), axis.title.x = element_text(size=16, face="bold"), axis.text.x=element_text(size = 12, face="bold"), axis.text.y=element_text(size=12, face="bold"))
print(pup_g)

Baseline Correction

Here we will do a subtractive baseline correction taking 250 ms before the onset of the tone as baseline.

#use messages to baseline correct
baseline_pupil<-baseline_correction_pupil(smooth_interp, pupil_colname="pup_interp", baseline_window=c(0,250), baseline_method = "sub")

head(baseline_pupil)
## # A tibble: 6 x 13
##   subject trial baseline  time pupil     x     y message tone  blink extendpupil
##     <dbl> <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <chr>   <chr> <dbl>       <dbl>
## 1      13     1     4.80     0  4.80 -4.45 -14.1 START   lo        0        4.80
## 2      13     1     4.80    16  4.78 -4.45 -14.1 START   lo        0        4.78
## 3      13     1     4.80    32  4.79 -4.32 -13.7 START   lo        0        4.79
## 4      13     1     4.80    48  4.80 -4.58 -14.5 START   lo        0        4.80
## 5      13     1     4.80    64  4.80 -4.58 -14.5 START   lo        0        4.80
## 6      13     1     4.80    81  4.79 -4.63 -14.7 START   lo        0        4.79
## # … with 2 more variables: pup_interp <dbl>, baselinecorrectedp <dbl>

Missing Data

Let’s see how much missing data there is and remove trials with greater than 20% missing data.

pup_missing<-count_missing_pupil(baseline_pupil, missingthresh = .5)
# remove outliers

I remove about 15 percent of trials.

Unlikely Pupil Sizes

Now let’s keep pupil diameter sizes between 2 mm and 9 mm

pup_outliers<-pup_missing %>%
  dplyr::filter (pup_interp  >= 2, pup_interp <= 9)

MAD

Get rid of artifacts we might have missed during some earlier steps.

  #MAD removal
max_removal<-pup_missing  %>%
  dplyr::group_by(subject, trial) %>%
  dplyr::mutate(speed=speed_pupil(pup_interp,time)) %>%
  dplyr::mutate(MAD=calc_mad(speed)) %>%
  dplyr::filter(speed < MAD)

Onset

Let’s only look fron the start of the trial until 1500 ms

baseline_pupil_onset<-max_removal %>%
  dplyr::group_by(subject, trial) %>%
  dplyr::filter(time <= 1500) %>%
  select(subject, trial, time,baselinecorrectedp, tone, time,message,baselinecorrectedp)

Downsample

Downsample the time-course to 50 ms.

#downsample
timebins1<- downsample_gaze(baseline_pupil_onset, bin.length=50, timevar = "time", aggvars = c("subject", "tone", "timebins"), type="pupil")
## Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
timebins1
## # A tibble: 62 x 4
##    subject tone  timebins aggbaseline
##      <dbl> <chr>    <dbl>       <dbl>
##  1      13 hi           0    0.00130 
##  2      13 hi          50   -0.000752
##  3      13 hi         100   -0.00206 
##  4      13 hi         150    0.000331
##  5      13 hi         200    0.00530 
##  6      13 hi         250    0.00503 
##  7      13 hi         300    0.00826 
##  8      13 hi         350    0.00930 
##  9      13 hi         400    0.0116  
## 10      13 hi         450    0.0125  
## # … with 52 more rows

Visualize Time-course

cursive_plot <-ggplot(timebins1)+
  aes(timebins, aggbaseline, linetype=tone, color=tone) +
  stat_summary(fun = "mean", geom = "line", size = 1) +
  theme_bw() +
  labs(x ="Time (ms)",y ="Pupil Dilation (baseline - pupil))") +
  geom_hline(yintercept=0.0)

print(cursive_plot)

This looks very similar to the one in the Tweet albeit a bit smoother as a result of the extra preprocessing done.

Avatar
Jason Geller
Research Scientist