2022-05-31: Multi-Disciplinary Reading Patterns of Digital Documents

As researchers, we keep up with new developments in our fields by reading scientific literature. Although reading and cognitive behaviors are often assumed to be stochastic, recent studies suggest otherwise. Recently, we (NirdsLab, WS-DL) have been studying researchers' reading patterns using eye tracking and have published a series of studies on our findings. This blog post summarizes these studies, highlighting our discoveries, realizations, and future directions.

For the discussion, we consider three peer-reviewed poster publications.

  1. Analyzing the Effect of Reading Patterns using Eye Tracking Measures (DOI, JCDL 2020, Best Poster)
  2. Analyzing Unconstrained Reading Patterns of Digital Documents Using Eye Tracking (DOI, JCDL 2021, Best Poster)
  3. Multidisciplinary Reading Patterns of Digital Documents (DOI, ETRA 2022, In Press)

We briefly review each publication, highlighting its key features, novelties, and limitations. Then we identify potential research avenues for extending the studies based on our realizations. Finally, we discuss a few long-term applications of our research.

1. Analyzing the Effect of Reading Patterns using Eye Tracking Measures

In our first study on reading patterns, Gavindya et al. analyzed how participants read a two-page poster paper through a pilot study. The study displayed the article full-screen in a static viewport and recorded eye-tracking data with a PupilLabs Core eye tracker. After the reading task, they annotated each gaze position with the corresponding section of the paper and analyzed the eye-tracking data using a modified version of the Real-time Advanced Eye Movements Analysis Pipeline (RAEMAP), an eye-tracking pipeline that facilitates the computation of eye-movement and pupillary metrics.
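
RAEMAP's internals are not described in this post. As a rough illustration of the first step behind fixation-based metrics, the sketch below groups raw gaze samples into fixations using the standard dispersion-threshold identification (I-DT) algorithm. This is a swapped-in, textbook technique, not necessarily what RAEMAP does; the sample format `(timestamp_s, x_px, y_px)`, the function name `detect_fixations`, and the thresholds (30 px dispersion, 100 ms minimum duration) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start: float  # timestamp of the first sample (s)
    end: float    # timestamp of the last sample (s)
    x: float      # centroid x (px)
    y: float      # centroid y (px)

    @property
    def duration(self) -> float:
        return self.end - self.start


def _dispersion(window):
    # I-DT dispersion: horizontal extent plus vertical extent of the window.
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=30.0, min_duration=0.1):
    """I-DT over (timestamp_s, x_px, y_px) samples sorted by time (hypothetical thresholds)."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window spanning at least min_duration seconds.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough remaining samples for another fixation
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while dispersion stays under the threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            fixations.append(Fixation(
                start=window[0][0],
                end=window[-1][0],
                x=sum(s[1] for s in window) / len(window),
                y=sum(s[2] for s in window) / len(window),
            ))
            i = j + 1
        else:
            # Not a fixation: drop the first sample and slide forward.
            i += 1
    return fixations


# Made-up 120 Hz data: ~0.25 s near (101, 100), then a jump to ~(400, 300).
samples = [(k / 120, 100 + (k % 3), 100) for k in range(30)]
samples += [(k / 120, 400, 300 + (k % 2)) for k in range(30, 60)]
fixes = detect_fixations(samples)
print(len(fixes))  # → 2
```

Fixation count and fixation duration per region then follow directly from the returned list once each fixation centroid is assigned to a region.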

They analyzed the eye-tracking data using fixation count, fixation duration, the Index of Pupillary Activity (IPA), and the scan path sequence. The study found the highest fixation count and fixation duration in the methodology section, followed by the motivation, abstract, and conclusion sections. In terms of cognitive load, as expressed through IPA, participants experienced the highest load while reading the paper's title, followed by the motivation, conclusion, abstract, and methodology sections.
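
Once every fixation is labeled with the section it falls in, fixation count, fixation duration, and the scan path sequence reduce to simple aggregations. The sketch below assumes fixations arrive in temporal order as hypothetical `(section, duration_s)` tuples; the function names and the toy data are made up for illustration and are not taken from the study. It ranks sections by total fixation duration and collapses consecutive fixations on the same section into a scan path.

```python
from collections import defaultdict


def section_metrics(fixations):
    """Return {section: (fixation_count, total_duration_s)} for (section, duration_s) tuples."""
    counts = defaultdict(int)
    durations = defaultdict(float)
    for section, duration in fixations:
        counts[section] += 1
        durations[section] += duration
    return {s: (counts[s], durations[s]) for s in counts}


def scan_path(fixations):
    """Collapse consecutive fixations on the same section into a single scan path step."""
    path = []
    for section, _ in fixations:
        if not path or path[-1] != section:
            path.append(section)
    return path


# Made-up fixations in temporal order.
fixations = [
    ("title", 0.4), ("abstract", 0.3), ("abstract", 0.5),
    ("methodology", 0.6), ("methodology", 0.7), ("abstract", 0.2),
    ("methodology", 0.4), ("conclusion", 0.3),
]
metrics = section_metrics(fixations)
ranked = sorted(metrics, key=lambda s: metrics[s][1], reverse=True)
print(ranked)               # → ['methodology', 'abstract', 'title', 'conclusion']
print(scan_path(fixations))
# → ['title', 'abstract', 'methodology', 'abstract', 'methodology', 'conclusion']
```

Keeping the scan path as a list of section names makes it easy to compare reading orders across participants, for example by counting transitions between sections.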

However, the study had several limitations. First, the experimental setup did not capture natural reading behavior, since participants were not allowed to zoom in or out during the task. Further, the study involved only three computer science doctoral students, restricting the generalizability of its findings. Finally, the manual annotation process used in the study does not scale to large studies.

2. Analyzing Unconstrained Reading Patterns of Digital Documents Using Eye Tracking

To address some of the issues identified in the first study, we modified the experimental setup and the analysis procedure. First, we removed the constraint on the user's viewport by allowing readers to zoom and pan, which let us capture more naturalistic behavior. Further, we automated the annotation of the eye-tracking data using a predefined segmentation of the paper.
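
With a predefined segmentation, annotation becomes a point-in-region lookup once a gaze position is expressed in the paper's own coordinates. A minimal sketch, assuming each section is stored as one or more hypothetical axis-aligned bounding boxes `(page, x0, y0, x1, y1)` in page coordinates; the `SEGMENTATION` layout values and the `section_at` name are made up for illustration.

```python
from typing import Optional

# Hypothetical segmentation: section name -> list of (page, x0, y0, x1, y1) boxes.
SEGMENTATION = {
    "title":       [(1, 50, 40, 560, 90)],
    "abstract":    [(1, 50, 100, 560, 220)],
    "methodology": [(1, 50, 230, 300, 760), (2, 50, 40, 300, 400)],
    "results":     [(1, 310, 230, 560, 760), (2, 310, 40, 560, 400)],
}


def section_at(page: int, x: float, y: float, segmentation=SEGMENTATION) -> Optional[str]:
    """Return the section whose box contains (x, y) on the given page, or None."""
    for section, boxes in segmentation.items():
        for box_page, x0, y0, x1, y1 in boxes:
            if box_page == page and x0 <= x <= x1 and y0 <= y <= y1:
                return section
    return None  # e.g. page margins or gutters


print(section_at(1, 120, 150))  # → abstract
print(section_at(2, 400, 100))  # → results
print(section_at(1, 10, 10))    # → None
```

Allowing several boxes per section handles sections that wrap across columns or pages.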

Instead of a head-mounted eye tracker, we used a desktop-mounted GazePoint GP3 eye tracker, which allowed us to record the participant's viewport through screen capture. However, this choice restricted the participant's head pose. Otherwise, we collected data following an approach similar to the previous study.

Mapping between participant viewport (right) and article content (left). Green: matching features, Blue: gaze positions.

The first step of the analysis was to annotate the eye-tracking data with the section of the paper at the user's gaze location. For this purpose, we matched SIFT features between the user's viewport and the article, and from the candidate matches we selected the top two. Using linear interpolation between these two matches, we mapped the gaze position to the corresponding location on the paper and then identified the section it belongs to. We then analyzed the eye-tracking data using fixation count and dwell time computed with RAEMAP. The results were consistent with the previous study, with participants spending the most time in the methodology section, followed by the results section.
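
The post does not spell out the exact interpolation, so the sketch below shows one plausible reading of the two-match mapping: each of the two best SIFT matches pairs a point in the viewport frame with a point on the article, and from these two pairs we solve an independent linear map (scale and offset) per axis and apply it to the gaze point. The SIFT matching itself (for example with OpenCV) is omitted. The function name `map_gaze` and the coordinates are hypothetical, and the per-axis model assumes the viewport is only zoomed and panned, not rotated.

```python
def map_gaze(gaze, match_a, match_b):
    """Map a viewport gaze point to article coordinates using two matched keypoints.

    gaze: (x, y) in viewport pixels.
    match_a, match_b: ((vx, vy), (ax, ay)) viewport -> article coordinate pairs.
    Assumes zoom/pan only (no rotation), so each axis is an independent linear map.
    """
    (va, aa), (vb, ab) = match_a, match_b
    mapped = []
    for axis in (0, 1):
        dv = vb[axis] - va[axis]
        if dv == 0:
            raise ValueError("matches share a coordinate; cannot interpolate this axis")
        scale = (ab[axis] - aa[axis]) / dv
        # Linear interpolation (or extrapolation) along this axis.
        mapped.append(aa[axis] + (gaze[axis] - va[axis]) * scale)
    return tuple(mapped)


# Made-up example: viewport zoomed 2x and panned so that
# article point (100, 200) appears at viewport pixel (50, 80).
match_a = ((50, 80), (100, 200))
match_b = ((250, 480), (200, 400))
print(map_gaze((150, 280), match_a, match_b))  # → (150.0, 300.0)
```

Choosing the two matches far apart from each other makes the per-axis scale less sensitive to small keypoint localization errors.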

Despite these improvements, the results still lack generalizability, since the study involved only three participants reading a single publication.

3. Multidisciplinary Reading Patterns of Digital Documents

In our latest study, we tried to reduce the bias in our participant pool by recruiting volunteers from disciplines other than computer science (mathematics and physics) and by studying reading patterns on two articles, one from computer science and one from physics. Further, we presented our initial work toward creating a large-scale dataset for analyzing reading patterns.

The study used a head-mounted PupilLabs Core eye tracker operating at 120 Hz to record eye-tracking data. We annotated the dataset using a panel of three manual annotators and categorized the paper's content into five areas: (1) Title, (2) Abstract, (3) Introduction and related work, (4) Methodology, and (5) Figures. When defining these areas, we confirmed the validity of the categorization by contacting the author of the publication. Finally, we conducted a preliminary analysis of the eye-tracking data based on average fixation count, pupil dilation, and the Low/High Index of Pupillary Activity (LHIPA).
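
A preliminary analysis like this averages metrics per area across participants. The sketch below shows one way to compute the average fixation count per participant and the mean pupil diameter for each (discipline, area) pair from per-fixation records. The record layout, the function names, and the values are hypothetical; mean pupil diameter is used as a simple stand-in for dilation (no baseline correction), and LHIPA, which requires a wavelet decomposition of the pupil signal, is not reproduced here.

```python
from collections import defaultdict
from statistics import mean

# Made-up per-fixation records: (participant, discipline, area, pupil_diameter_mm)
records = [
    ("P1", "physics", "abstract", 3.1), ("P1", "physics", "methodology", 3.4),
    ("P1", "physics", "methodology", 3.5), ("P2", "physics", "methodology", 3.3),
    ("P3", "mathematics", "abstract", 2.9), ("P3", "mathematics", "methodology", 3.2),
    ("P3", "mathematics", "methodology", 3.6), ("P3", "mathematics", "methodology", 3.4),
]


def average_fixation_count(records):
    """Mean fixations per participant for each (discipline, area).

    Participants with no fixations in an area count as zero for that area.
    """
    per_participant = defaultdict(int)  # (participant, discipline, area) -> count
    participants = defaultdict(set)     # discipline -> participants
    areas = set()
    for participant, discipline, area, _ in records:
        per_participant[(participant, discipline, area)] += 1
        participants[discipline].add(participant)
        areas.add(area)
    return {
        (discipline, area): mean(per_participant[(p, discipline, area)] for p in people)
        for discipline, people in participants.items()
        for area in areas
    }


def mean_pupil_diameter(records):
    """Mean pupil diameter (mm) over all fixations for each (discipline, area)."""
    groups = defaultdict(list)
    for _, discipline, area, diameter in records:
        groups[(discipline, area)].append(diameter)
    return {key: mean(values) for key, values in groups.items()}


avg = average_fixation_count(records)
pupil = mean_pupil_diameter(records)
print(avg[("physics", "methodology")], avg[("mathematics", "methodology")])
```

Counting absent areas as zero keeps a participant who skipped a section from inflating that section's average.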

Sample recording from the experiment; the circle illustrates the gaze position on screen.

The results indicated consistent behavior across participants irrespective of their discipline, with participants spending more time in the methodology section than in other sections. Although we observed higher cognitive load during the introduction and methodology sections, we did not find that domain familiarity affected cognitive load as expressed through pupillary data.

Future Research

One critical improvement to our study is increasing participant diversity by recruiting participants from a wider range of disciplines and levels of expertise. Although we attempted to address this issue in our latest study, our volunteers were still doctoral students in science and mathematics. Similarly, we can improve the experiment by using different types of literature, such as posters, short papers, and long papers. However, there are practical limitations to integrating these improvements. For instance, analyzing reading patterns on long articles might not be viable, since researchers tend to spend more time on them and read them in multiple passes, making experiments tedious and time-consuming.

Another critical improvement is automating the entire analysis process, from identifying areas of interest to generating eye-tracking metrics. Although we successfully automated the whole process for desktop eye trackers, doing so is challenging for head-mounted eye trackers. We found that the outward-facing camera on head-mounted eye trackers produces blurry images during sudden head movements, which degrades the feature mapping between the user's viewport and the publication. Further, we observed that the algorithm proposed in our second study produces false-positive matches when used with head-mounted eye trackers.

Moreover, we used only a selected set of eye-tracking and pupillary measures in our analysis, overlooking alternative metrics.


The main application of our research is understanding the information-seeking characteristics of researchers and how they differ by discipline and level of expertise. We can use this understanding to improve information systems and to identify behaviors that distinguish researchers. In a slightly modified experimental setting, we can also explore using reading patterns as a biomarker for identifying neurophysiological disorders.


Paper (arXiv): https://arxiv.org/abs/2205.08475

Dataset: https://github.com/nirdslab/Multidisciplinary-Reading-Patterns

Read more on Eye-tracking and Pupillometric measures: Eye Movement and Pupil Measures: A Review 

Review on first poster:  2020-04-25: Effect of Reading Patterns of Novice Researchers using Eye Tracking

--Bhanuka (@mahanama94)