VIP: A unifying framework for eye-gaze research



Eye-gaze is affected by three factors: Visual stimulus, Intent and Person (VIP).

The eye-gaze data, E, is defined as:

E = g({(t_i, x_i, y_i, p_i, q_i, d_i, s_i, b_i, c_i)^{L/R}})

where g is a function of a sequence of eye-gaze samples, recorded for the left (L) and right (R) eyes.

In an ideal world,

E = f(V, I, P)

In practice, due to sensor noise, computational limits, incomplete modeling, etc., f and g are not equivalent, i.e.

E ≈ f(V, I, P)


We have collected the first fixation dataset that captures all three VIP factors.

The images were selected from the NUSEF dataset, which contains both neutral and affective images. Out of the 758 NUSEF images, 150 were randomly selected. 75 subjects were recruited from a mixture of undergraduate, postgraduate and working-adult populations. Male and female subjects were recruited separately to ensure an even gender distribution. They were tasked to view the 150 images in either a free-viewing (i.e. without an assigned task) or an anomaly-detection setting. Each image was displayed for 5 seconds, followed by 2 seconds of viewing a gray screen. The images were displayed in random order. Eye-gaze data were recorded with a binocular infra-red remote eye-tracking device (SMI RED 250) at 120 Hz. The subjects were seated at a distance of 50 centimeters from a 22-inch LCD monitor with a resolution of 1680x1050. This setup is similar to those used in other eye-gaze research.

Before the start of the viewing experiment, the subjects also provided their demographic data: gender, age group, ethnicity, religion, field of study/work, highest education qualification, income group, expenditure group and nationality. Three personality-type questions were posed, based on Jung's psychological types. The recorded eye-gaze data were preprocessed with the SMI SDK to extract fixations from each subject's preferred eye.
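Since the SMI SDK's fixation extraction is proprietary, a common stand-in for this preprocessing step is dispersion-threshold (I-DT) fixation detection. The sketch below is illustrative only, not the dataset's actual pipeline; the sample layout, dispersion threshold and minimum duration are assumptions chosen for a 120 Hz recording.

```python
# Illustrative dispersion-threshold (I-DT) fixation detection, standing in
# for the proprietary SMI SDK preprocessing. Thresholds are assumptions.

def _dispersion(window):
    """Dispersion of a window of (t, x, y) samples: (max-min x) + (max-min y)."""
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=100.0):
    """samples: list of (t_ms, x_px, y_px) tuples from one eye, time-ordered.
    Returns a list of fixations as (start_ms, end_ms, centroid_x, centroid_y)."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window spanning at least min_duration_ms.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while dispersion stays under threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            xs = [s[1] for s in window]
            ys = [s[2] for s in window]
            fixations.append((window[0][0], window[-1][0],
                              sum(xs) / len(xs), sum(ys) / len(ys)))
            i = j + 1
        else:
            i += 1  # no fixation starts here; slide the window forward
    return fixations
```

At 120 Hz (one sample roughly every 8.3 ms), a 100 ms minimum duration corresponds to about 12 consecutive samples within the dispersion threshold.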

Download and copyright

The VIP dataset can be downloaded as a single zip file. The VIP dataset is available for research purposes only. By downloading or using the dataset, you agree to its terms and conditions.

If you are using this dataset, please cite:

A Unifying Framework for Computational Eye-Gaze Research.
Keng-Teck Ma, Terence Sim and Mohan Kankanhalli.
4th International Workshop on Human Behavior Understanding. Barcelona, Spain. 2013. [pdf]

Trait inference

Eye-gaze has been used to infer attention, emotion, intention, task and identity. We are the first to use eye-gaze to infer demographic profile and personality.

Many personal traits, such as gender, age, culture and personality type, are routinely collected by organizations. These traits are collectively known as a demographic/personality profile, and such profiling is used for marketing, personnel screening, etc. The advantages of eye-gaze over other modalities are its low latency, its non-obtrusiveness, and the fact that no deliberate responses are required. Personal trait inference is analogous to administering a survey: the eye-gaze response to an image is like a survey answer given at a sub-conscious level. Instead of questions, visual stimuli are presented, and, as with survey questions, only eye-gaze data for purpose-selected stimuli can accurately determine the value of the intended trait. Using correlation analysis and a linear SVM, we achieve good accuracy, as shown in the table below.

Accuracy of the classifiers

Attribute            Best accuracy
Male/Female          0.75
Religious/None       0.78
Extrovert/Introvert  0.66
Sensing/Intuition    0.76
Thinking/Feeling     0.63
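The classification step can be sketched as follows. The paper pairs correlation analysis with a linear SVM; in this self-contained sketch a simple perceptron stands in as the linear classifier, and the idea of feeding per-subject fixation features (e.g. fixation count, mean duration) to it is an illustrative assumption, not the paper's exact feature set.

```python
# Illustrative linear trait classifier: a perceptron stands in for the
# linear SVM reported in the paper. Features and labels are hypothetical
# per-subject fixation statistics and binary traits (+1 / -1).

def train_linear(features, labels, epochs=50, lr=0.1):
    """Train a linear classifier on (feature vector, +1/-1 label) pairs."""
    w = [0.0] * (len(features[0]) + 1)  # last weight is the bias term
    for _ in range(epochs):
        for x, y in zip(features, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if y * score <= 0:  # misclassified: nudge weights toward y
                for k in range(len(x)):
                    w[k] += lr * y * x[k]
                w[-1] += lr * y
    return w

def predict(w, x):
    """Return the predicted trait label (+1 or -1) for feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if score > 0 else -1
```

Accuracy figures like those in the table would then come from evaluating such a classifier on held-out subjects, e.g. via cross-validation.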



Email: ktma [at]