Coordination of gaze and hand movements for tracking and tracing in 3D

Andrés Lamont

August 2009
Contents

1 Abstract
2 Introduction
3 Methods
   3.1 Subjects
   3.2 Experimental set-up
   3.3 Experiments
       3.3.1 Shapes
   3.4 Calibration
       3.4.1 Mathematics of the PSOM
   3.5 Analysis
4 Results
   4.1 Calibration
   4.2 Tracking
   4.3 Tracing
   4.4 Gaze-finger lead time
5 Discussion and conclusion
   5.1 Calibration
   5.2 Tracking
   5.3 Tracing
   5.4 Conclusion
Chapter 1
Abstract

In this study finger and gaze movement in three-dimensional space was investigated. Since accurately measuring 3D gaze is difficult, most studies focus on planar movement in the frontal plane. To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, the binocular eye movements and finger movements of subjects were measured in a series of trials. In particular we investigated how the coordination between eye and finger movement differs between directions (frontal plane and depth) during tracking and tracing conditions. This was accomplished by determining the lead time of gaze position relative to finger position. Binocular gaze was measured with two scleral coils, which were calibrated by fixating on points of a virtual three-dimensional cube. Depth was estimated by mapping the azimuth and elevation of both eyes, recorded in the calibration trial while fixating various targets in 3D space, with a parametrized self-organizing map with a 3D interpolation function. Lead times were determined by taking the cross-covariance between eye position and finger position. Results show that finger position almost perfectly follows gaze position while tracking a moving target in the frontal plane. However, the depth component of the gaze lags behind the finger position by about 80 ms because vergence is relatively slow, while the speed of the finger is the same in all directions. While tracing a completely visible path the gaze position leads the finger position by approximately 270 ms, while the vergence or depth component only leads by about 100 ms. These results show that it is not necessary to have the target foveated to accurately anticipate and assist in planning of finger movements in three dimensions.
Chapter 2
Introduction

It has long been known that version and vergence eye movements have very different dynamics (Collewijn et al. [1995], Erkelens [1988]). To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, we measured the binocular eye movements and finger movements of subjects in a series of trials. Several studies have already investigated the relationship between finger movements and gaze. However, because of the difficulty of measuring gaze in three dimensions, very few studies have gone further than measuring gaze in the frontal plane ([Gielen et al., 2009]). In the present study gaze was determined in three dimensions with a relatively new approach involving parametrized self-organizing maps [Essig et al., 2006]. This approach estimates the gaze position in three dimensions by training a special kind of neural network, a parametrized self-organizing map (PSOM), with the gaze positions of known targets. This allows the PSOM to map the pitch and yaw of both eyes to the corresponding point in three dimensions. A PSOM is essentially a continuous self-organizing map which is capable of learning highly non-linear functions. Two conditions of the coordination of hand and eye movement were investigated. The first was the tracking condition, in which subjects were asked to track a target moving along an invisible path in three dimensions with the tip of their right index finger. In the tracing condition subjects were instructed to trace a completely visible path in three dimensions. Tracking produces smooth pursuit eye movements while tracing produces saccades along the path.
Chapter 3
Methods

Subjects were asked to track a target, moving in 3D, with the tip of the index finger, or to move the index finger along a completely visible path in 3D space. The position of the finger tip and the gaze position were measured to determine the time lag between gaze and finger position.
3.1 Subjects

Five subjects (aged between 24 and 56 years) of the Radboud University with normal or corrected-to-normal visual acuity participated in this experiment. Two of the subjects were female. All subjects had already taken part in previous eye tracking experiments with scleral coils and reported that they had no problems perceiving depth in the presented anaglyph stimuli. Furthermore, all subjects were right-handed and none of the subjects had any known neurological or motor disorder.
3.2 Experimental set-up

The experimental setup is shown in figure 3.1. The subjects were seated in an office chair with a high back rest. A bicycle helmet attached to the chair was used to fixate the subject's head and prevent head movement. In front of the subject a large back-projection screen of 2.5 m x 2.0 m was placed. The subject's eyes were positioned approximately 70 cm from the screen. The height of the chair was adjusted so that the subject's cyclopean eye (i.e. midway between the two eyes) was right in the middle of the projection area.

Data was measured in a right-handed Cartesian coordinate system with its origin in the center of the screen, the x- and y-axes in the horizontal and vertical
directions, respectively, and the z-axis perpendicular to the screen, pointing towards the subject (see figure 3.1). The visual stimuli were back-projected on the projection screen with an LCD projector (Philips ProScreen 4750) with a refresh rate of 60 Hz. A red-green anaglyph stereoscopic system was used to allow for perception in three dimensions. The color of the projection was calibrated to exactly match the red and green filters used in the anaglyph 3D glasses. This ensures that each eye can only see its respective stimulus. To project accurate 3D stimuli the perspective transformations had to take into account the distance between the two eyes and the distance from the cyclopean eye to the screen. Therefore, the eye distance was measured before the experiment. Typical distances were 6.5 cm for the inter-eye distance and 70 cm for the distance to the screen.
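As an illustration of the perspective transformation involved, the following sketch (Python/NumPy, standing in for the original MATLAB stimulus code) projects a 3D point onto the screen plane once per eye, assuming pinhole eyes at ±3.25 cm from the cyclopean axis and 70 cm from the screen; the function name and the left/right labelling are illustrative, not taken from the original software.

```python
import numpy as np

def project_to_screen(p, eye_sep=6.5, d_screen=70.0):
    """Project a 3D point p = (x, y, z) (cm; origin at screen centre,
    z pointing from the screen towards the subject) onto the screen
    plane z = 0, once per eye. The eyes sit at (+-eye_sep/2, 0, d_screen)."""
    p = np.asarray(p, dtype=float)
    out = {}
    for name, ex in (("left", -eye_sep / 2), ("right", +eye_sep / 2)):
        eye = np.array([ex, 0.0, d_screen])
        t = eye[2] / (eye[2] - p[2])          # ray parameter where z reaches 0
        out[name] = eye[:2] + t * (p[:2] - eye[:2])
    return out

# a point 30 cm in front of the screen (towards the subject)
print(project_to_screen([5.0, 3.0, 30.0]))
```

A point between the screen and the eyes then lands at different horizontal positions for the two images, producing the disparity that the red and green filters deliver to the appropriate eye.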
Figure 3.1: Schematic overview of the experimental setup with the projection system (Philips ProScreen 4750), the Optotrak 3020 system and the position of the chair relative to the projection screen. The height of the chair was adjusted so that the head of the subject was right in the middle in front of the projection area. The x-, y-, z-coordinate system had its origin in the center of the projection screen.
The finger position was measured with a Northern Digital Optotrak 3020 system at a sampling frequency of 120 Hz.
This system uses three cameras, placed at the upper-right hand side of the subject and tilted down at an angle of 30º, to track the position of strobing infrared-light-emitting diodes (markers). The Optotrak system distinguishes between multiple markers by measuring the strobing frequency of each marker and is capable of tracking the markers with a root-mean-square accuracy of 0.1 mm for the y- and z-coordinates, and 0.15 mm for the x-coordinate. One marker was placed at the tip of the right index finger, oriented towards the Optotrak cameras. Two more markers were placed on the temples of the anaglyph stereo glasses to measure any head movements. The stimuli were not adapted in real time to head movement; head movement data was only used to verify that the subjects had not moved their head during the experiment. Gaze was measured using scleral coils (Skalar) in both eyes simultaneously in a large magnetic field (Remmel Labs). The three orthogonal components of this magnetic field, at frequencies of 48 kHz, 60 kHz and 80 kHz, were produced by a
3 × 3 × 3 m³ cubic frame. The subject was placed near the center of the magnetic field, where the field is most homogeneous. The signals from both coils were captured at a sampling frequency of 120 Hz and filtered by a 4th-order low-pass Bessel filter to prevent aliasing of the coil signal. Bessel filters have a very slow transition from pass band to stop band, but have a linear phase response in the pass band. This property makes Bessel filters preserve the wave shape of the filtered signal. Using this system the yaw and pitch of the eyes could be measured with a resolution of about 0.25º (Frens et al. [1995]).

Two PCs were used to conduct the experiment. The first PC was used to control the Optotrak system. The second computer ran MATLAB, which presented the stimuli and captured the coil signals with an analog data-acquisition system that buffered the input to ensure no samples were missed. The start of the data acquisition on both PCs was synchronized by sending a signal to the parallel port.
3.3 Experiments

All subjects were asked to complete two calibration trials. First a calibration rose was shown. This calibration trial was not used in the current study but was carried out for compatibility with previous studies. The second trial was the calibration cube trial. This trial was used for calibration of the coil voltages with a parametrized self-organizing map (see section 3.4 for details). After calibration the subjects were tested with different shapes in different orientations and conditions over 20 trials (see table 3.1). The two conditions tested were the 'tracking' and 'tracing' conditions. For the tracking condition subjects were asked to track a dot moving along an invisible 3D path. In the
tracing condition the entire path was visible and the subjects were asked to trace the path with the right index finger at approximately the same speed as during the tracking condition. Two different shapes, a Cassini shape and a Limaçon shape, were presented for the tracking conditions, with one additional shape (a helix) for the tracing conditions. Each shape was presented in two orientations in space: the frontal orientation, with the shape in the x-y plane, and the oblique orientation, with the same shape rotated 45º left-handed about the x-axis. Each shape was traced or tracked four times, beginning at the top. This was done in 10 seconds per cycle (fast trials) and 15 seconds per cycle (slow trials), resulting in 8 trials per shape. Furthermore, four additional tracing trials were included. These additional trials consisted of a large helix (R = 10 cm) and a small helix (R = 6 cm) with their principal axis on the x-axis or on the z-axis. The subjects were asked to trace the helix back and forth three times. In all, the subjects were first tested on the 8 Cassini trials, then on the 4 helices, and last on the 8 Limaçon trials, for a total of 20 trials taking approximately 30 minutes. More trials per subject was not practical because wearing scleral coils is limited to 30 minutes per day.
3.3.1 Shapes

The frontal Cassini shape (figure 3.2a) was defined as

$$\begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} \tfrac{3}{2} R\,(1 + a\cos(2\omega t))\sin(\omega t) \\ R\,(1 + a\cos(2\omega t))\cos(\omega t) \\ z \end{pmatrix}$$

where $R = 12$ cm, $a = 0.5$ and $z = 30$ cm.
The equation for the frontal Limaçon shape (figure 3.2b) was

$$\begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} b\sin(\omega t) + \tfrac{a}{2}\sin(2\omega t) \\ \tfrac{a}{2} + b\cos(\omega t) + \tfrac{a}{2}\cos(2\omega t) \\ z \end{pmatrix}$$

where $a = 20$ cm, $b = 10$ cm and $z = 30$ cm.
Both shapes were rotated 45º about the x-axis, with the bottom part closer to the subject, to obtain the oblique orientations. For the slow and fast conditions $\omega = \frac{2\pi}{15}$ rad/s and $\omega = \frac{2\pi}{10}$ rad/s, respectively.

The helix (figure 3.2c) was defined as
Trial  Shape             Orientation  Speed  Condition
1      Calibration cube  -            -      -
2      Cassini           Frontal      Slow   Tracking
3      Cassini           Frontal      Slow   Tracing
4      Cassini           Frontal      Fast   Tracking
5      Cassini           Frontal      Fast   Tracing
6      Cassini           Oblique      Slow   Tracking
7      Cassini           Oblique      Slow   Tracing
8      Cassini           Oblique      Fast   Tracking
9      Cassini           Oblique      Fast   Tracing
10     Small helix       Horizontal   -      Tracing
11     Small helix       Depth        -      Tracing
12     Large helix       Horizontal   -      Tracing
13     Large helix       Depth        -      Tracing
14     Limaçon           Frontal      Slow   Tracking
15     Limaçon           Frontal      Slow   Tracing
16     Limaçon           Frontal      Fast   Tracking
17     Limaçon           Frontal      Fast   Tracing
18     Limaçon           Oblique      Slow   Tracking
19     Limaçon           Oblique      Slow   Tracing
20     Limaçon           Oblique      Fast   Tracking
21     Limaçon           Oblique      Fast   Tracing

Table 3.1: Shape (Cassini, Limaçon or helix), orientation (frontal or oblique for the Cassini and Limaçon trials; horizontal or depth for the helix trials), speed (fast or slow) and condition (tracking or tracing) for each trial.
$$\begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} R\cos(t) \\ R\sin(t) \\ \frac{b\,t}{2\pi} + c \end{pmatrix}$$

where $R = 6$ cm for the small helices, $R = 10$ cm for the large helices, and $b$ is the pitch (the displacement along the axis per full turn). For the helices in the depth direction $c = 12$ cm. The horizontal helices were constructed by swapping the x- and z-axes and setting $c = -24$ cm.
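The three parametrizations translate directly into code. The sketch below (Python/NumPy; the original stimuli were generated in MATLAB) samples the frontal Cassini and Limaçon shapes and applies the oblique rotation; the sign convention chosen for the 'left-handed' 45º rotation is one plausible reading of the text.

```python
import numpy as np

def cassini(t, R=12.0, a=0.5, z0=30.0, omega=2 * np.pi / 15):
    """Frontal Cassini shape (cm); omega = 2*pi/15 rad/s for slow trials."""
    x = 1.5 * R * (1 + a * np.cos(2 * omega * t)) * np.sin(omega * t)
    y = R * (1 + a * np.cos(2 * omega * t)) * np.cos(omega * t)
    return np.stack([x, y, np.full_like(t, z0)], axis=1)

def limacon(t, a=20.0, b=10.0, z0=30.0, omega=2 * np.pi / 15):
    """Frontal Limaçon shape (cm)."""
    x = b * np.sin(omega * t) + 0.5 * a * np.sin(2 * omega * t)
    y = 0.5 * a + b * np.cos(omega * t) + 0.5 * a * np.cos(2 * omega * t)
    return np.stack([x, y, np.full_like(t, z0)], axis=1)

def oblique(xyz):
    """Rotate 45 degrees about the x-axis (sign convention assumed)."""
    c = s = np.sqrt(0.5)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]])
    return xyz @ Rx.T

t = np.linspace(0.0, 15.0, 1800)   # one slow 15 s cycle sampled at 120 Hz
path = oblique(cassini(t))
```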
Figure 3.2: Shapes used during the tracking and tracing conditions (not to scale): (a) Cassini, (b) Limaçon, (c) helix.
3.4 Calibration

Precise mapping of gaze position requires calibration. Usually during calibration different points on the screen are presented and the subject is asked to fixate on these points. This provides a set of samples relating the coil signals to a certain gaze position on the screen. An interpolation function, usually a second or third degree polynomial, is fitted to the data to approximate a mapping function for the entire presentation area for both eyes. This method determines the gaze direction of both eyes separately. To determine the fixation point of the eyes in three dimensions, the intersection point of the visual axes of both eyes has to be calculated. In three dimensions two lines generally do not intersect, and therefore the fixation point is estimated as the midpoint of the shortest straight line connecting the visual axes. This approach works well for 2D gaze positions in the frontal plane but is noisy in the depth direction. This is mainly attributed to small errors in the vergence, which give rise to relatively large errors in 3D gaze position, especially if the fixation point is relatively far from the subject.

To overcome this 3D eye tracking problem a neural network approach was
used. This method was motivated by results of previous work on 3D calibration: specialized, individually calibrated neural networks can be used to reduce the error in 3D gaze-position measurement [Essig et al., 2006]. The network used was a parametrized self-organizing map (PSOM), which is a rapid-learning variant of Kohonen's self-organizing maps [Kohonen, 1998].

In order to calibrate the gaze in 3D with a PSOM, a virtual calibration cube was constructed (see figure 3.3). It consisted of three planes at Z = 10, Z = 25 and Z = 40 cm, each containing 3 × 3 points with a distance of 15 cm between them, resulting in a total of 27 points spanning a volume of 30 × 30 × 30 cm.
Figure 3.3: Stereoscopic projection of the calibration cube. The darker lines are only seen by the right eye and vice versa. In the experiment green and red images were used with red-green anaglyph glasses.
Subjects were asked to fixate exactly on the highlighted point and then press a button. The button press triggered the acquisition of the coil voltages for one second, and the mean value and standard deviation were recorded. After that the next point was highlighted, until all points had been shown. Showing the entire calibration cube at once enhances the subjects' perception of virtual depth and their ability to perform precise eye movements toward the designated target. However, only one plane at a time was shown to avoid visual interference between the planes. The data obtained was used to train the PSOM.
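For concreteness, the 27 calibration targets can be generated as below (a sketch in Python/NumPy; the assumption that the 3 × 3 grids are centred on the cyclopean axis is mine, the text only gives the plane depths and the 15 cm spacing).

```python
import numpy as np

# Three frontal 3x3 planes at Z = 10, 25 and 40 cm, points 15 cm apart;
# centring of the grid on the x- and y-axes is assumed.
xs = ys = np.array([-15.0, 0.0, 15.0])
zs = np.array([10.0, 25.0, 40.0])
targets = np.array([(x, y, z) for z in zs for y in ys for x in xs])
assert targets.shape == (27, 3)   # spans a 30 x 30 x 30 cm volume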
3.4.1 Mathematics of the PSOM

A self-organizing map (SOM) is an artificial neural network that is able to produce a lower-dimensional discretized representation of the input space. SOMs differ from other artificial neural networks in that they use a neighborhood function to preserve the topological properties of the input space. Also, SOMs are capable of learning very non-linear functions [Walter and Ritter, 1996]. The ability to map high-dimensional spaces to lower-dimensional spaces would make SOMs useful for accurate mapping from coil voltages to 3D coordinates. However, SOMs only supply the position of the most stimulated neuron in a 'neuron lattice' (figure 3.4) instead of a continuous output. For applications where relatively high-dimensional spaces with a high resolution are needed, the size of the neuron lattice quickly becomes unmanageable. In this study this would mean that every voxel in the presentation space would require a neuron; with a conservative resolution of 1 mm in all directions this would result in a staggering 30 million neurons. Furthermore, SOM learning requires thousands of training samples, which makes it impossible to collect sufficient calibration data in a reasonable amount of time.

Parametrized self-organizing maps do not have these disadvantages. They supply a continuous output and do not require large amounts of training samples. Instead the PSOM is trained with selected input-output pairs as parameters. However, this super-fast learning comes at a cost. With only a few examples to learn from, one can only determine a rather small number of adaptable parameters. As a consequence the learning system must either be very simple or have a structure that is well matched to the task being learned. Fortunately, the type of mapping the PSOM has to do for the calibration of the coils is the same in every trial and subject, and thus its structure only has to be matched once to the task at hand.

The estimation of 3D gaze positions with a PSOM consists of two steps.
First the 3D gaze positions $(x_{3d}, y_{3d}, z_{3d})$ are mapped onto the coil voltages $(x_l, y_l)$ and $(x_r, y_r)$ through interpolation. Then the inverse of this mapping is computed to create the desired mapping from coil voltages to 3D gaze position. A PSOM with five input neurons, 27 inner neurons, and 3 output neurons was used. Each inner neuron was trained with one of the 27 calibration points $k \in A$, where

$$A = \{\,k_{xyz} \mid k_{xyz} = x\,\hat{e}_x + y\,\hat{e}_y + z\,\hat{e}_z;\ x, y, z = 0, 1, 2\,\}.$$

These points were arranged in a $3 \times 3 \times 3$ grid with coordinates in each dimension from 0 to 2. All further PSOM calculations will be shown in this coordinate system. Naturally the results have to be scaled and translated to match the laboratory coordinate system.
Figure 3.4: 3D visualization of the neuron lattice. Each sphere represents a neuron. Note that the lattice has the same topography as the presented calibration points.
Each point $k$ receives information about the gaze parameters. These parameters are stored in a training vector

$$\tilde{w}_k = (x_{lk}, y_{lk}, x_{rk}, y_{rk}, x_{dk})$$

where $x_{lk}$, $y_{lk}$, $x_{rk}$ and $y_{rk}$ are the coil voltages for azimuth and elevation of the left eye and the right eye, respectively. $x_{dk}$ is the divergence, which is defined as $x_{dk} := x_{rk} - x_{lk}$. This fifth dimension was added because the depth coordinate depends mainly on the divergence. The differences in divergence are smaller than those in the frontal plane, so the divergence is weighted with a specific factor to approximate the range of x and y. This leads to a faster determination of the gaze position as a function of the coil voltages.

The PSOM now knows the gaze parameters which correspond to the calibration points in 3D space. In the feed-forward step the PSOM interpolates this mapping between the training vectors
$\tilde{w}_k$ to estimate the gaze parameters for any position in or near the calibration cube. The interpolation function can be constructed as a superposition of basis functions, one for each inner neuron, weighting the contribution of its training vector $\tilde{w}_k$ depending on the location $s$ relative to the location of the neuron $k$:

$$f(s) = \sum_{k \in A} H(s, k)\, \tilde{w}_k$$

By specifying a training vector for each neuron location $k \in A$, a topological order between the training points is introduced. Thus training vectors assigned to neurons that are neighbors in $A$ are perceived to have a specific neighborhood relation. This allows the PSOM to draw extra curvature information from the training set. It can be quite difficult or impossible to find a suitable set of basis functions $H(s, k)$ if the data is not topologically ordered, i.e. one must be able to associate each data sample with a point in some discrete lattice in a topology-preserving fashion (otherwise the interpolation will lead to unacceptably large generalization errors) [Walter and Ritter, 1996]. The basis functions can be constructed in many ways but must always meet two criteria. First:

$$H(s, k) = \delta_{s,k} \quad \forall s, k \in A$$

Furthermore, the sum of all contribution weights should be one:

$$\sum_{k \in A} H(s, k) = 1 \quad \forall s$$

These criteria lead to

$$f(s) = \tilde{w}_k \quad \forall s \in A$$

This guarantees that the interpolation passes through the calibration points. For the construction of the basis functions a product ansatz of three 1D functions can be used:
$$H(s, k) = H^{1D}(s_x, x) \cdot H^{1D}(s_y, y) \cdot H^{1D}(s_z, z)$$

Here $x$, $y$ and $z$ can only be 0, 1 or 2. Thus the 1D functions must have the property

$$H^{1D}(q, n) = \delta_{q,n} \quad \forall q, n \in \{0, 1, 2\}$$

Because $n$ has only three possible values, only three basis functions are needed. The simplest functions which have the required two roots are second degree polynomials:

$$H_0(q) = \tfrac{1}{2} q^2 - \tfrac{3}{2} q + 1$$
$$H_1(q) = -q^2 + 2q$$
$$H_2(q) = \tfrac{1}{2} q^2 - \tfrac{1}{2} q$$
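Putting the pieces together, the feed-forward map $f(s)$ is a loop over the 27 lattice points. The sketch below (Python/NumPy, illustrative rather than the study's implementation) evaluates the product ansatz with the three quadratic basis functions; W is assumed to map lattice points to the five-dimensional training vectors $\tilde{w}_k$.

```python
import numpy as np
from itertools import product

def h1d(q, n):
    """Quadratic 1D basis H_n on the nodes {0, 1, 2} (H0, H1, H2 above)."""
    if n == 0:
        return 0.5 * q**2 - 1.5 * q + 1.0
    if n == 1:
        return -q**2 + 2.0 * q
    return 0.5 * q**2 - 0.5 * q

def psom_forward(s, W):
    """f(s) = sum_k H(s, k) w_k over the 3x3x3 lattice A.

    s: 3-vector in lattice coordinates (each component in [0, 2]);
    W: dict mapping lattice tuples (x, y, z) to 5D training vectors."""
    f = np.zeros(5)
    for k in product(range(3), repeat=3):
        H = h1d(s[0], k[0]) * h1d(s[1], k[1]) * h1d(s[2], k[2])
        f += H * W[k]
    return f
```

At a lattice point $s = k$ the product of the 1D deltas leaves exactly $\tilde{w}_k$, so the interpolation passes through the calibration data as required.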
Figure 3.5: The three basis functions $H_0$, $H_1$ and $H_2$ used for the interpolation function, plotted for $q \in [0, 2]$. Note that the value of each basis function is one at the corresponding neuron and zero at the other neurons.
So far the PSOM can map the 3D gaze coordinates onto the coil voltages. However, to estimate the 3D gaze position from the coil voltages the opposite is needed. This is accomplished by calculating the inverse $f^{-1}$ of $f$. This inverse can be implemented by an iterative minimization of an error function:

$$E(s) = \frac{1}{2}\left(f(s) - f_{coils}\right)^2$$

The error function is the deviation of the measured coil voltages $f_{coils}$ from the coil voltage estimate $f(s)$ calculated by the PSOM for the current 3D gaze position $s$. A very simple approach to calculate the minimum of the error function is gradient descent:

$$s(t + 1) = s(t) - \epsilon\,\frac{\partial E(s)}{\partial s} \cdot \Omega$$

with learning parameter $\epsilon > 0$, where $\Omega$ is the weight vector that normalizes the different input dimensions. For $t = 0$ the current finger or stimulus position can be used; if no finger position is available the center of the calibration cube is a good alternative. While this method works very well in many cases, the choice of the learning parameter $\epsilon$ is critical: too large values make the gradient descent diverge and too small values take unnecessarily many iteration steps. In this study a much more robust and efficient method, the Levenberg-Marquardt algorithm [Levenberg, 1944, Marquardt, 1963], was used. This algorithm can find the minimum of the error function approximately ten times faster than a conventional gradient descent. The final value of $s$ indicates the subject's 3D gaze position as a function of the current coil voltages.
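A compact way to perform this inversion offline is to hand the residual $f(s) - f_{coils}$ to an off-the-shelf Levenberg-Marquardt solver. The sketch below uses scipy.optimize.least_squares with method="lm"; the divergence weighting $\Omega$ is omitted for brevity, so this illustrates the idea rather than reproducing the study's implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def psom_inverse(f_coils, W, s0=(1.0, 1.0, 1.0)):
    """Find the lattice position s whose predicted coil vector f(s)
    matches the measured vector f_coils (Levenberg-Marquardt).

    s0 is the starting point, e.g. the centre of the calibration cube;
    psom_forward is the feed-forward map sketched above."""
    res = least_squares(lambda s: psom_forward(s, W) - np.asarray(f_coils),
                        x0=np.asarray(s0, dtype=float), method="lm")
    return res.x   # lattice coordinates; scale/translate to lab coordinates
```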
3.5 Analysis

Prior to analyzing the delay between finger position and gaze position, the first and last eighth of the data points were discarded. The gaze position data was filtered with a 3rd-order Savitzky-Golay filter [Savitzky and Golay, 1964] with a frame size of 21 data points (175 ms).
This filter effectively performs a local polynomial regression (3rd degree) on a window of equally spaced points (21 data points). Gaze data points during blinks were cut out and interpolated with a piecewise cubic Hermite spline. These are third-degree splines with each polynomial of the spline in Hermite form, i.e. each cubic segment is determined by the function values and first derivatives at its two endpoints. On a normalized segment $t \in [0, 1]$ with endpoint values $y_0$, $y_1$ and slopes $m_0$, $m_1$, the Hermite form reads

$$p(t) = (2t^3 - 3t^2 + 1)\,y_0 + (t^3 - 2t^2 + t)\,m_0 + (-2t^3 + 3t^2)\,y_1 + (t^3 - t^2)\,m_1$$
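Both preprocessing steps map onto standard SciPy routines; a minimal sketch follows. The order of the two operations (bridging blinks before smoothing) is one plausible reading of the text, and PchipInterpolator is one common choice of piecewise cubic Hermite interpolant (it sets the endpoint slopes from the data).

```python
import numpy as np
from scipy.signal import savgol_filter
from scipy.interpolate import PchipInterpolator

def clean_gaze(t, gaze, blink_mask):
    """Bridge blink gaps with a piecewise cubic Hermite interpolant,
    then smooth with a 3rd-order, 21-point Savitzky-Golay filter
    (21 samples = 175 ms at 120 Hz)."""
    good = ~blink_mask
    filled = PchipInterpolator(t[good], gaze[good])(t)
    return savgol_filter(filled, window_length=21, polyorder=3)
```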
Missing finger data was also interpolated with the same piecewise cubic Hermite spline. The delay time between finger position and gaze position was determined by calculating the cross-covariance of the filtered gaze position and the unfiltered finger position:
$$c_{xy}(m) = \begin{cases} \displaystyle\sum_{n=0}^{N-|m|-1} \left( x(n+m) - \frac{1}{N}\sum_{i=0}^{N-1} x_i \right) \left( y_n^* - \frac{1}{N}\sum_{i=0}^{N-1} y_i^* \right) & m \ge 0 \\[2ex] c_{yx}^*(-m) & m < 0 \end{cases}$$
where x and y are the 1D gaze and finger position, respectively. The data was first multiplied with a Hann window,

$$w(n) = 0.5\left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right),$$

to avoid problems at the boundaries of the data, before the cross-covariance was calculated. The result was normalized by setting the auto-covariance at zero lag to 1. The time $\tau$ at the maximum value of the cross-covariance was taken as the time lag between finger and gaze position. A negative value for $\tau$ implies that the gaze leads the finger. Trials where the maximum of $c_{xy}(m)$ was $c_{max} < 0.9$ for tracking (12 of 43 trials) or $c_{max} < 0.7$ for tracing (18 of 70 trials) were not considered to be sufficiently correlated and were discarded. The lag between finger and gaze position in the depth direction was not determined for the frontal stimuli.
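As a sketch of this lag estimation (Python/NumPy standing in for the original MATLAB xcov-style analysis), the windowing, normalization and peak pick described above can be written as below; the returned peak value corresponds to the $c_{max}$ used in the trial rejection criterion.

```python
import numpy as np

def gaze_finger_lag(gaze, finger, fs=120.0):
    """Lag from the normalized cross-covariance of two equally long
    1D traces; negative tau means the gaze leads the finger."""
    n = len(gaze)
    w = np.hanning(n)                          # Hann window, as in the text
    x = (gaze - gaze.mean()) * w
    y = (finger - finger.mean()) * w
    c = np.correlate(x, y, mode="full")        # c(m) = sum_n x(n+m) y(n)
    c /= np.sqrt(np.dot(x, x) * np.dot(y, y))  # auto-covariance at lag 0 -> 1
    lags = np.arange(-(n - 1), n)              # sample lags -(n-1) .. (n-1)
    m = int(np.argmax(c))
    return lags[m] / fs, c[m]                  # (tau in seconds, peak value)

# toy check: finger trails gaze by 20 samples -> tau ~ -20/120 s
g = np.sin(np.linspace(0, 20 * np.pi, 1200))
print(gaze_finger_lag(g, np.roll(g, 20)))
```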
Chapter 4
Results

4.1 Calibration

Figure 4.1 shows the calibration results of a typical tracking trial for the new calibration method with parametrized self-organizing maps (left) and for the conventional method, which fits a third order polynomial to a calibration trial and calculates depth by estimating the intersection point of the two visual axes (right). The blue, red and dotted black lines represent gaze position, stimulus and finger position, respectively.

Calibration by the conventional geometric regression method consists of first fitting a number of predefined points in a calibration trial with a third order polynomial for both eyes separately. The gaze position is defined by the intersection point of both visual axes. However, lines in three dimensions generally do not intersect. To overcome this problem gaze is estimated by calculating the midpoint of the shortest line connecting both visual axes. Because of the very small vergence angles at relatively great distances from the subject, accuracy in the depth direction is very low at those distances. This is clearly seen in the z(x)-plot (figure 4.1b, bottom-left panel). In this example the geometric regression method fails to accurately estimate depth further than approximately 40 cm from the subject's eyes.

The calibration method involving parametrized self-organizing maps used in this study clearly has a great advantage in the depth direction over the geometric regression calibration method.
Figure 4.1a shows that the gaze position estimate is much more accurate than with the geometric calibration method (figure 4.1b), especially in the depth direction. While accuracy of the gaze position at greater depth is worse than at shorter distances because of the small vergence angles, the effect is much smaller than in the geometric regression method. Gaze almost perfectly follows the stimulus in all directions.
(a) Calibrated with PSOM. (b) Calibrated with geometric regression.

Figure 4.1: Results of calibration using a parametrized self-organizing map (a) and a conventional geometric regression approach (b). Each method is shown as projections onto the x-y, x-z and y-z planes (all axes in cm). The blue, red and dotted black lines represent gaze position, stimulus and finger position, respectively. While the geometric approach does a reasonably good job of estimating gaze position in the frontal plane, it fails to do the same in the depth direction. Note that points relatively close to the screen (far from the subject in the depth direction) are estimated as too close to the subject. This results in a squashed gaze position estimate in the depth direction. The calibration done with the PSOM shows slightly better results in the frontal plane and is much more accurate than the geometric approach in the depth direction. The gaze position very closely follows the stimulus (and finger position) without loss of accuracy at greater distances from the subject.
4.2 Tracking

Figure 4.2a shows a three-dimensional plot of gaze and finger position for the oblique Limaçon shape in the tracking condition for a single subject. The blue line represents the gaze position and the black line the finger position. Gaze produces a smooth line during smooth pursuit tracking and superimposes very well on the finger position, except for short deviations in depth during saccades, which are well known in the literature [Chaturvedi and Van Gisbergen, 2000].

Figure 4.3a-c shows the finger and gaze position versus time for tracking for the x, y and z direction, respectively, for the same trial and subject. It is clearly seen that gaze position is generally a smooth line with very few saccades and occasionally a small peak indicating a blink. Furthermore, the gaze position and the finger position almost perfectly superimpose for the x and y directions. In the depth direction (z direction, figure 4.3c) the noise is much larger than in the frontal plane (figures 4.3a and 4.3b) and the signal is not as smooth. The peaks correspond to small saccades in the frontal plane. Occasionally larger negative
Figure 4.2: Finger position and gaze position in three dimensions for the oblique Limaçon shape while tracking (a) and tracing (b).
peaks (not visible here) are due to (unfiltered) blinks and can be as large as 40 cm in the z direction. Also, steps are visible which occur when the two eyes make saccades in opposite directions.
4.3 Tracing

Figure 4.2b shows a three-dimensional plot of gaze and finger position for the oblique Limaçon shape in the tracing condition of a single subject. During tracing the eyes make saccades along the completely visible shape, and as a result gaze position does not superimpose as well on the finger position as during the tracking condition. In the depth direction the variance is very large because of changes in vergence during saccadic eye movements [Chaturvedi and Van Gisbergen, 2000].

Figures 4.3d-f show finger and gaze position versus time for tracing for the x, y and z direction, respectively, for the same trial and subject. Unlike in the tracking condition, the gaze position is not a smooth line and does not superimpose very well on the finger position. Saccades are clearly seen as steps. Furthermore, it is clear that gaze position significantly leads finger position (see figure 4.3d-e). As in figure 4.3c, it can be seen in figure 4.3f that the variance in the depth direction is very large. The jumps in the depth direction are much smaller than in the frontal plane, and peaks can be seen which correspond with large saccades in the frontal plane.
Figure 4.3: Finger and gaze position versus time for tracking (panels a-c: x-, y- and z-axis) and tracing (panels d-f: x-, y- and z-axis). The time axis is plotted in bins of 1/120 s (8.3 ms) each.
4.4 Gaze-finger lead time

The lag between gaze position and finger position can be quantified by calculating the cross-covariance function between the gaze position and the finger position. The cross-covariance function has a maximum close to the origin, but slightly shifted. This shift represents the lag time. Figure 4.4 shows the gaze-finger lag time in ms for all trials. Trials with an absolute lead time greater than 500 ms were ignored. Moreover, trials where the finger position was asymmetric by more than 200 ms (see figure 4.5) were also removed. In the tracking condition the lag of the finger with respect to the gaze position in the frontal plane (x-y plane) is not significantly different from 0. However, in depth (z axis) the finger position leads the gaze position. This indicates that the target does not have to be foveated for the finger to accurately follow the stimulus. All ignored trials are shown in grey in figure 4.4. Table 4.1 shows the mean values and standard deviations of the lag for the x, y and z directions. The mean values were calculated by averaging over all trials and all subjects.

For the tracing condition, Table 4.1 shows that in the frontal plane the gaze position leads the finger position by about 260 ms. During tracing the eyes make saccades which fixate on future finger positions until the finger has reached that position, to help lead the finger. After that, the eyes fixate on a point further on
Figure 4.4: The gaze-finger lead times (in ms, y-axis) for all trials (x-axis), for the X, Y and Z directions (rows). The left column is for tracking, the right column for tracing. Trials shown in grey were ignored in the determination of the gaze-finger lead times. Note the much larger deviation in the Z direction. A negative value means that gaze leads the finger. Trials where the stimuli were only presented in the frontal plane have a lead time of zero in the z-direction; this explains the 'missing trials' in the bottom panels.
          X                      Y                      Z
Tracking  −18 ± 30 ms (N = 31)   −49 ± 57 ms (N = 31)   +83 ± 137 ms (N = 10)
Tracing   −274 ± 82 ms (N = 52)  −264 ± 92 ms (N = 52)  −104 ± 230 ms (N = 35)

Table 4.1: Mean lag between gaze and finger position (± standard deviation) and the number of trials used to calculate the lag. Negative values indicate that the gaze leads. These values were obtained by averaging over all trials and all subjects.
Figure 4.5: Asymmetry in one of the trials (z-position in cm versus time in s). Note that the distance between peak A and peak B is 4.6 s, as opposed to the distance between peak B and peak C, which is 5.2 s, while both should be approximately 5 s. This asymmetry of 600 ms makes it impossible to calculate a reliable lead time from the cross-covariance, because the lead of peak B shifts the maximum of the cross-covariance function to the left and thus produces a spuriously large gaze-finger lead time.
the path. In the tracking condition this is not possible because the future target position is not known. In the depth direction the gaze also leads the finger. However, vergence is much slower, so the lead time of the gaze in depth is much smaller than the lead time in the frontal plane.
Chapter 5
Discussion and conclusion

To gain more insight into eye movements during hand movements in three dimensions and the coordination between eye and hand movement, the binocular eye movements and finger movements of subjects were measured in a series of trials. In particular we investigated how the coordination between eye and finger movement differs between directions (in the frontal plane and in depth) during tracking and tracing conditions. This was accomplished by determining the lead time of gaze position relative to finger position in three dimensions. When tracking a moving target in three dimensions, finger position follows gaze position almost directly, with very little or no lag in the frontal plane. In depth the gaze position lags behind the finger position by about 80 ms. When tracing a completely visible path, gaze always leads finger position by a significant time: in the frontal plane gaze leads finger position by about 270 ms and in depth gaze leads by about 100 ms.
5.1 Calibration

Calibration of the scleral coils was accomplished by training a parametrized self-organizing map. This relatively new approach was chosen because it estimates the gaze position more robustly and accurately than a conventional method such as the geometric regression calibration method or regular feed-forward neural networks. Calibration with a PSOM showed a significant improvement in calibration performance, especially in the depth direction, where conventional methods have difficulty estimating gaze position accurately at relatively great distances from the eyes because vergence angles become very small at those depths. The calibration with the PSOM proved to be much more robust and accurate in the depth direction, with much less deterioration of accuracy at relatively great depths. In this study a calibration trial of 27 points
arranged in a cube was used. If more points are used, for example a cube with 4 × 4 × 4 points for 64 points in total, more information is given to the interpolation function used during the PSOM calculations, which should significantly increase the accuracy of the calibration.
5.2 Tracking

The results show that during the tracking condition the gaze-finger lead time in the frontal plane is very small, −18 ± 30 ms and −49 ± 57 ms for the horizontal and vertical direction respectively. Given the large standard deviations, this effectively means that there is no significant lag between gaze and finger. The time between generating a motor command in the motor cortex and the actual eye and hand movement is about 10 ms and 120 ms, respectively. For the gaze position and the finger position to arrive at the same position simultaneously, the motor commands cannot be issued at the same time. The results imply that motor commands to the hand precede the motor commands to the eye by about 110 ms.

For tracking in the depth direction the gaze lags behind(!) the finger position by about 80 ms. While the standard deviation is very large, due to a relatively small number of successful tracking trials with a depth component, it is clear that finger position leads gaze position. It is well known that vergence is much slower than version, while finger movement is equally fast in all directions. Thus the results for the depth direction support the results in the frontal plane, which imply that motor commands to the hand precede motor commands to the eye by about 110 ms.
5.3 Tracing

During tracing of a completely visible path the results show that gaze position leads finger position by about 270 ms in the frontal plane. This large lead time is due to saccades which jump ahead on the path to anticipate hand movement. After a saccade the gaze position stays fixed at the same position until the hand nears the gaze position, after which the gaze again jumps ahead to anticipate hand position. In depth, gaze position leads hand position by only about 100 ms. This supports the fact that vergence is much slower than version. The difference in lead time between the frontal plane and depth is about 170 ms. This is comparable to tracking, for which this value is about 120 ms.
5.4 Conclusion

In this study we investigated finger and gaze movements in three-dimensional space. Binocular eye movements and finger movements of subjects were measured in a series of trials. All subjects were asked to track a moving target or trace a completely visible path in three dimensions. The results show that motor commands from the motor cortex to the eye and the finger are not issued simultaneously, but that motor commands to the hand precede those to the eye by about 110 ms. This implies that the brain anticipates the hand movement. Thus it is not necessary to have the target foveated to plan accurate hand movements towards the target, which leaves time to compensate for the 120 ms delay between the motor commands from the motor cortex to the arm and the actual arm movement.
Bibliography

V. Chaturvedi and J.A.M. Van Gisbergen. Stimulation in the rostral pole of monkey superior colliculus: effects on vergence eye movements. Experimental Brain Research, 132(1):72-78, 2000.

H. Collewijn, C.J. Erkelens, and R.M. Steinman. Voluntary binocular gaze-shifts in the plane of regard: dynamics of version and vergence. Vision Research, 35(23-24):3335-3358, 1995.

C.J. Erkelens. Fusional limits for a large random-dot stereogram. Vision Research, 28(2):345, 1988.

K. Essig, M. Pomplun, and H. Ritter. A neural network for 3D gaze recording with binocular eye trackers. International Journal of Parallel, Emergent and Distributed Systems, 21(2):79-95, 2006.

M.A. Frens, A.J. Van Opstal, and R.F. Van der Willigen. Spatial and temporal factors determine auditory-visual interactions in human saccadic eye movements. Perception & Psychophysics, 57(6):802-816, 1995.

C.C.A.M. Gielen, T.M.H. Dijkstra, I.J. Roozen, and J. Welten. Coordination of gaze and hand movements for tracking and tracing in 3D. Cortex, 45(3):340-355, 2009.

T. Kohonen. The self-organizing map. Neurocomputing, 21(1-3):1-6, 1998.

K. Levenberg. A method for the solution of certain non-linear problems in least squares. Quart. Appl. Math., 2(2):164-168, 1944.

D.W. Marquardt. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics, pages 431-441, 1963.

A. Savitzky and M.J.E. Golay. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36(8):1627-1639, 1964.

J. Walter and H. Ritter. Rapid learning with parametrized self-organizing maps. Neurocomputing, 12(2-3):131-153, 1996.