Eye Tracking Techniques for Perceptually Adaptive Graphics

Andrew T. Duchowski, Clemson University
email: andrewd@vr.clemson.edu

Introduction

A primary goal of Perceptually-Adaptive Computer Graphics (PACG) research is to develop CG methods which in some way adapt to the human observer. On the one hand, interactive approaches aim to dynamically alter the rendered image in real-time to match the perceptual limits of the Human Visual System (HVS) at the viewer's Point Of Regard (POR). On the other hand, non-interactive approaches typically involve generating static imagery designed to match the HVS. For example, adaptive image codecs and renderers have been developed to efficiently encode similar spatial or color regions which are below the human Just Noticeable Difference (JND) threshold. Both approaches can benefit from, and in fact may rely on, the use of eye trackers. The aim of this paper is to survey current PACG-related eye tracking techniques and, where appropriate, highlight potential future research opportunities.

Eye Tracking Techniques

Advancements in eye tracking technology, specifically the availability of cheaper, faster, more accurate and easier to use trackers, have inspired increased interdisciplinary eye movement and eye tracking research efforts. A good example of this type of diverse interest can be found in the proceedings of the recent Eye Tracking Research & Applications Symposium (organized and co-chaired by the author) [DKS00]. Although these proceedings contain exemplary work from several disciplines, only a few papers are directly related to CG research. Work most closely related to CG is currently conducted by the Human Computer Interaction (HCI) community; however, here CG is often used as a means to an end rather than being the objective of the research. The use of eye trackers in graphical systems falls into two general application types: diagnostic and interactive.
Interactive eye tracking systems typically respond in some way to the location of the user's gaze. Such interactive systems may be classified by two application sub-types: selective and gaze-contingent. In selective applications, the user's gaze acts as an alternate mode of input, often compared to a pointing device. For example, Tanriverdi and Jacob have recently used an eye tracker as a selection device in VR [TJ00]. While these types of interactive applications are interesting in their own right, they are not necessarily relevant to the development of PACG techniques. Interactive techniques more concerned with developing perceptually-adaptive graphics fall under the gaze-contingent systems category, as shown in the hierarchy in Figure 1.

Figure 1: Hierarchy of eye tracking applications.

In contrast to interactive applications, diagnostic eye tracking systems typically involve the recording of eye movements over time, known as scanpaths, for analysis of the user's overt visual attention over a given stimulus. These systems generally do not require the display to react to the user's gaze. Numerous examples of general diagnostic applications can be found (see for example [DV00]); however, diagnostic eye tracking methods for PACG have not been widely adopted. This paper focuses on eye tracking techniques relevant to PACG, surveying interactive gaze-contingent and non-interactive diagnostic applications.

Interactive Systems

Interactive eye tracking methods for perceptually-adaptive graphics typically rely on the use of an eye tracker to manipulate the CG scene, contingent on the viewer's location of gaze. The motivation behind gaze-contingent systems is to minimize overall display bandwidth by reducing peripheral information in concordance with the perceptual limits of the HVS.
Gaze-contingent displays are generally partitioned into two spatial regions that are, ideally, imperceptibly distinct: a high-resolution foveal Region Of Interest (ROI) surrounded by a low-resolution peripheral region. There are two main approaches: screen-based and model-based. The screen-based approach deals with the manipulation of framebuffer contents just prior to display. The periphery is often masked or smoothed in some way, reducing the bandwidth by compressing the information (in bits-per-pixel) required to display or transmit the final image. Since early peripheral degradation attempts, development of screen-based approaches has continued to show promise for gaze-contingent display with minimal cost to either perception or performance. The model-based approach, while not as common, aims at reducing resolution by directly manipulating the geometric models prior to rendering. To increase display rates above those currently provided by view-dependent Level Of Detail (LOD) rendering methods, it has been suggested that an eye tracker is required to enable the presentation of high resolution portions of the scene or object only at the point of highest visual acuity, i.e., at the foveal ROI [LVCW+00].

Screen-Based Approaches

The idea of gaze-contingent displays is not new and dates back to early military applications, specifically eye-slaved flight simulators [Koc87,LTFW+89]. In the Super Cockpit Visual World Subsystem, Kocian considered visual factors including contrast, resolution, and color in the design of a head-tracked display. In their Simulator Complexity Testbed (SCTB), Longridge et al. included an eye-slaved ROI as a major component of the Helmet Mounted Fiber Optic Display (HMFOD). This ROI provided a high resolution inset in a low resolution (presumably homogeneous) field which followed the user's gaze. The precise method of peripheral degradation was not described apart from the criterion of low resolution.
However, the authors did point out that a smooth transition between the ROI and background was necessary in order to circumvent the possibility of a perceptually disruptive edge artifact.

Since early flight simulators, various approaches have been developed for ROI-based image and video coding [LM00,PCN00,KG96,TEHM96,NLO94,ST94].

For screen-based Virtual Reality rendering, the work of Watson et al. is particularly relevant [WWH+97]. The authors studied the effects of LOD peripheral degradation on visual search performance. Both spatial and chrominance detail degradation effects were evaluated in Head Mounted Displays (HMDs). To sustain acceptable frame rates, two polygons were texture mapped in real-time to generate a high resolution inset within a low resolution display field. The authors suggested that spatial and chrominance complexity can be reduced by almost half without degrading performance.

In an approach similar to Watson's, Reddy used a view-dependent screen-based LOD technique to evaluate both perceptual effects and system performance gains [Red98]. The author reported a perceptually modulated LOD system which afforded a factor of 4.5 improvement in frame rate. It is not entirely clear how the LOD model was constructed, e.g., what the method of degradation was, nor is it clear what kind of apparatus was used. Reddy's empirical evaluation of the LOD model was performed on a 43.6 x 33.4 degree Field Of View (FOV) display, presumably a desktop monitor without the use of an eye tracker.

Although these reports are encouraging, it should be noted that head tracking alone appears to have been used as an estimate of gaze direction. The lack of an eye tracker may have potentially serious implications for the interpretation of results, since the true point of regard could not be verified.
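To make the screen-based idea concrete, the following is a minimal sketch of compositing a high-resolution inset into a low-resolution field with a smooth transition band, as the flight simulator work above called for. It is not taken from any of the cited systems: the function names, the smoothstep blend, and the inner/outer radius parameters (in pixels) are all illustrative assumptions.

```python
import math


def smoothstep(t: float) -> float:
    """Cubic ease from 0 to 1; t is clamped to [0, 1] first."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def foveal_weight(px, py, gaze_x, gaze_y, inner_radius, outer_radius):
    """Blend weight for one pixel: 1.0 inside the ROI, 0.0 in the periphery.

    Between inner_radius and outer_radius the weight falls off smoothly,
    which avoids a hard edge at the ROI boundary (assumed blend shape).
    """
    if outer_radius <= inner_radius:
        raise ValueError("outer_radius must exceed inner_radius")
    dist = math.hypot(px - gaze_x, py - gaze_y)
    t = (dist - inner_radius) / (outer_radius - inner_radius)
    return 1.0 - smoothstep(t)


def composite(high, low, gaze_x, gaze_y, inner_radius, outer_radius):
    """Blend two same-sized grayscale images given as lists of rows of floats."""
    out = []
    for y, (high_row, low_row) in enumerate(zip(high, low)):
        row = []
        for x, (h, l) in enumerate(zip(high_row, low_row)):
            w = foveal_weight(x, y, gaze_x, gaze_y, inner_radius, outer_radius)
            row.append(w * h + (1.0 - w) * l)
        out.append(row)
    return out
```

In a real system `low` would be a masked, smoothed, or subsampled version of the frame and the gaze position would come from the eye tracker each frame; here both are simply inputs.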
Model-Based Approaches

The technique of simplifying the resolution of geometric objects as they recede from the viewer, as originally proposed by Clarke [Cla76], is now standard practice, particularly in real-time applications such as VR [Vin95]. Clarke's original criterion of using the projected area covered by the object for descending the object's LOD hierarchy is still widely used today. However, as Clarke suggested, the LOD management scheme typically employed by these polygonal simplification schemes relies on pre-computed fine-to-coarse hierarchies of an object. This leads to uniform simplification of objects, i.e., simplification that is isotropic in terms of resolution degradation.

A gaze-contingent adaptation of a model-based adaptive rendering scheme was proposed by Ohshima et al., where three visual characteristics were considered: central/peripheral vision, kinetic vision, and fusional vision [OYT96]. Their LOD algorithm generated isotropically degraded objects at different visual angles. Although the use of a binocular eye tracker was proposed, the system as discussed used only head tracking as a substitute for gaze tracking.

Isotropic object degradation is not always desirable, especially when viewing large objects at close distances. Numerous multiresolution mesh modeling techniques suitable for gaze-contingent viewing have recently been developed [ZS00]. Techniques range from multiresolution representation of arbitrary meshes to the management of LOD through peripheral degradation within an HMD where gaze position is assumed to coincide with head direction [LKRH+96,MJ96,Hop97,ZSS97,SS97]. Although some of these authors address view and gaze dependent object representation, few results concerning display speedup are as yet available showing successful adaptation of these techniques within a true gaze-contingent system, i.e., one where an eye tracker is employed.
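The area-based criterion attributed to Clarke above can be illustrated with a short sketch. This is an assumed simplification, not Clarke's published algorithm: the object is approximated by a bounding sphere, the projection is a simple pinhole model with a hypothetical focal length in pixels, and the area thresholds are made-up values. Note that the whole object receives a single level, which is exactly the isotropic behavior discussed in this section.

```python
import math


def projected_area_px(radius, distance, focal_length_px):
    """Approximate screen area (in pixels) of a bounding sphere.

    Uses a pinhole projection: projected radius = focal_length * radius / distance.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    projected_radius = focal_length_px * radius / distance
    return math.pi * projected_radius ** 2


def select_lod(area_px, thresholds):
    """Pick a level from a pre-computed fine-to-coarse hierarchy.

    thresholds lists the minimum projected area required for each level,
    finest first (descending values). Level 0 is the finest; the value
    len(thresholds) denotes the coarsest fallback level.
    """
    for level, min_area in enumerate(thresholds):
        if area_px >= min_area:
            return level
    return len(thresholds)


# Hypothetical thresholds: the same object at three distances.
thresholds = [10000.0, 1000.0, 100.0]
for d in (10.0, 100.0, 1000.0):
    area = projected_area_px(1.0, d, 1000.0)
    print(d, select_lod(area, thresholds))
```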
Due to advances in multiresolution modeling techniques and to the increased affordability of eye trackers, it is now becoming feasible to extend the LOD approach to gaze-contingent displays, where models are rendered nonisotropically.

An early example of a nonisotropic model-based gaze-contingent system, where gaze direction is directly applied to the rendering algorithm, was presented by Levoy and Whitaker [LW90]. The authors' spatially adaptive near real-time ray tracer for volume data displayed an eye-slaved ROI by modulating both the number of rays cast per unit area on the image plane and the number of samples drawn per unit length along each ray as a function of local retinal acuity. The ray-traced image was sampled by a nonisotropic convolution filter to generate a 12-degree foveal ROI within a 20-degree mid-resolution transitional region. Based on preliminary estimates, the authors suggested a reduction in image generation time by a factor of up to 5. An NAC Eye Mark eye tracker was used to determine the user's Point Of Regard (POR) while viewing a conventional 19-inch TV monitor. A chin rest and immobilization strap were used to eliminate the need for head tracking.

Recently, Luebke et al. developed a gaze-directed LOD technique to facilitate the gaze-contingent display of geometric objects [LHNW00]. To test their rendering approach, the authors employed a table-mounted monocular eye tracker to measure the viewer's real-time location of gaze over a desktop display. While this work shows the feasibility of employing an eye tracker, the implementation framework used by the authors lacked a head tracker and required a chin rest to ensure tracker accuracy.

In our own work, we are developing gaze-contingent geometric modeling techniques for VR display. We use an off-the-shelf binocular eye tracker built into an HMD. In one scenario, we use the eye tracker as an indicator of gaze in a gaze-contingent multiresolution terrain navigation environment [DDGM00].
A surface, represented as a quadrilateral mesh, is divided into fixed-size (number of vertices) sub-blocks, which permits rendering for variable LOD on a per-sub-block basis. Resolution level is chosen per sub-block and is based on viewer distance. The resolution level is not discrete; it is interpolated between the pre-computed discrete levels to avoid "popping" effects. The approach used is reasonably effective; however, it is not clear whether the technique is applicable to arbitrary meshes.

More recently, we have developed an object-based LOD method similar to that of Ohshima et al. where objects are modeled for gaze-contingent viewing. However, unlike Ohshima's approach, our resolution degradation method is applied nonisotropically, i.e., objects are not necessarily degraded uniformly. Our modeling technique is based on that of Eck et al. [EDDH+95], where the original mesh is partitioned into Voronoi tiles and remeshed into multiresolution form. A 3D spatial degradation function is used for LOD selection. This function differs significantly from the area-based criterion originally proposed by Clarke. Instead of evaluating the screen coverage of the projected object, our degradation function is based on the evaluation of visual angle in world coordinates.

Besides the need to further develop nonisotropic model-based PACG techniques, a good many subsidiary problems remain. Due to the speed of saccadic eye movements, gaze-contingent displays often result in jerky or noisy scene rendering. Real-time analysis of fixations would no doubt promote greater rendering stability. In addition, if the eye movement shows little variation, as is the case during a fixation, only the geometry subtended by the fovea need be reconstructed; cached geometry from previously rendered frames could be used in the periphery. Promising techniques are available which have yet to be exploited in gaze-contingent displays.
Particularly exciting is real-time eye movement prediction, which can be used to alleviate the response lag inherent in all eye tracking systems.
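As an illustration of what real-time fixation analysis might involve, the sketch below groups gaze samples into fixations using a simple velocity threshold. This is a commonly used scheme swapped in for illustration; the paper does not prescribe a specific algorithm. The function name, the sample format, and the default threshold and minimum duration are assumptions chosen only to make the example concrete.

```python
import math


def detect_fixations(samples, velocity_threshold=30.0, min_duration=0.1):
    """Group gaze samples into fixations by thresholding point-to-point velocity.

    samples: list of (t, x, y), t in seconds, x and y in degrees of visual
    angle, sorted by t. Returns a list of (start_t, end_t, mean_x, mean_y).
    Both default parameter values are illustrative assumptions.
    """
    fixations = []
    current = []

    def flush():
        # Keep the current run only if it lasted long enough to be a fixation.
        if current and current[-1][0] - current[0][0] >= min_duration:
            n = len(current)
            fixations.append((
                current[0][0],
                current[-1][0],
                sum(s[1] for s in current) / n,
                sum(s[2] for s in current) / n,
            ))
        current.clear()

    for prev, cur in zip(samples, samples[1:]):
        dt = cur[0] - prev[0]
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt
        if speed < velocity_threshold:
            if not current:
                current.append(prev)
            current.append(cur)
        else:
            flush()  # a fast movement (saccade) ends the current run
    flush()
    return fixations
```

A gaze-contingent renderer could update the foveal ROI only when a new fixation is reported, rather than on every raw sample, which is one way to reduce the jerky rendering described above.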
Figure 2: Martian terrain: gaze-contingent rendering.
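The visual-angle criterion described above for nonisotropic degradation can be sketched as follows. This is only an illustration under stated assumptions, not the degradation function used in the work described: the angle is measured in world coordinates between the gaze ray and the direction from the eye to a point on the model, and the band limits (in degrees) mapping angle to resolution level are hypothetical.

```python
import math


def visual_angle_deg(eye, gaze_dir, point):
    """Angle in degrees between the gaze ray and the eye-to-point direction.

    eye and point are 3D world positions; gaze_dir is a 3D direction vector
    (it does not need to be normalized).
    """
    to_point = [p - e for p, e in zip(point, eye)]
    norm_g = math.sqrt(sum(c * c for c in gaze_dir))
    norm_p = math.sqrt(sum(c * c for c in to_point))
    if norm_g == 0.0 or norm_p == 0.0:
        raise ValueError("gaze direction and eye-to-point vector must be non-zero")
    cos_a = sum(g * p for g, p in zip(gaze_dir, to_point)) / (norm_g * norm_p)
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding error
    return math.degrees(math.acos(cos_a))


def lod_for_angle(angle_deg, band_limits=(5.0, 15.0, 30.0)):
    """Map eccentricity to a resolution level; 0 is finest.

    band_limits are hypothetical upper bounds (degrees) for levels 0, 1, 2.
    """
    for level, limit in enumerate(band_limits):
        if angle_deg <= limit:
            return level
    return len(band_limits)


# Different parts of one large object can receive different levels.
eye, gaze = (0.0, 0.0, 0.0), (0.0, 0.0, -1.0)
for vertex in [(0.0, 0.0, -5.0), (1.0, 0.0, -5.0), (5.0, 0.0, -5.0)]:
    print(vertex, lod_for_angle(visual_angle_deg(eye, gaze, vertex)))
```

Because the level is evaluated per vertex or per tile rather than once per object, a single mesh is degraded nonuniformly around the gaze direction, in contrast with the area-based, per-object selection.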
Figure 3: Nonisotropic gaze-contingent geometric LOD rendering.

Diagnostic Systems

Human visual perception of imagery is an important contributing factor to the design of perceptually-based image and video display systems. Human observers have been used in various aspects of display design, including the development of corrective display functions (e.g., gamma function) dependent on models of human color and luminance perception, color spaces (e.g., CIE Lab color space), and image and video codecs. JPEG and MPEG both use quantization tables based on the notion of JNDs to quantize colors of perceptually similar hue [Wal91]. Perceptually adaptive image renderers (e.g., ray tracers and radiosity engines) have been developed to efficiently encode similar spatial or color regions which are below the human JND threshold [BM98,FPSG96]. Often, however, these studies are based on automatically located image regions, which may or may not correspond to foveally viewed segments of the scene. That is, these studies do not necessarily employ an eye tracker to verify the ROI-based coding schemes. Instead, a figure-ground assumption is often used to argue for more or less obvious foveal candidates in the scene, or other signal-based approaches are used to identify candidate foveal regions. This research area is particularly suitable for diagnostic eye movement studies which can be used to corroborate the figure-ground assumption.

Diagnostic eye tracking systems for perceptually-adaptive graphics are not widespread, although these systems are generally easier to assemble than interactive ones. For non-immersive displays (e.g., a monitor vs. an HMD), a table-mounted remote monocular eye tracker is often sufficient to monitor gaze over the CG-generated stimuli.
A scanpath recorded by such a device provides compelling evidence of the viewer's overt visual attention, which in turn facilitates fairly simple comparisons between ROIs automatically identified by the perceptually-adaptive algorithm and those actually foveated by the observer.

Instead of assuming a feature-based approach to foveal (high-resolution) image encoding, an eye tracker can be used to directly establish the foveal ROI. A suitable image degradation scheme may then be employed to render detail within this region (or regions). This motivated research by the author to find a suitable image degradation scheme which would match foveal acuity [Duc00]. Using a short video clip of a TV news anchor, one foveal region was selected over the TV anchor's right eye and another over the "timebox" located in the bottom right corner of the image. These ROIs were identified as commonly foveated regions by human subjects. The resolution of the reconstructed image drops off smoothly, matching visual acuity on the television display used during experimentation.

Spatial visual acuity is but one characteristic of vision that can be exploited in the design of non-interactive CG displays. Further degradation may be possible in terms of color, contrast, and motion, gaining greater perceptually-adaptive compression savings.
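The comparison between automatically identified ROIs and regions actually foveated, mentioned at the start of this discussion, can be as simple as the following sketch. It assumes, purely for illustration, that fixations have already been extracted from the scanpath as screen coordinates and that the algorithm's ROIs are axis-aligned rectangles; the function name and data layout are hypothetical.

```python
def fixation_hit_rate(fixations, rois):
    """Fraction of fixations that fall inside at least one predicted ROI.

    fixations: list of (x, y) screen positions in pixels.
    rois: list of (left, top, right, bottom) rectangles in pixels.
    Returns 0.0 when there are no fixations.
    """
    if not fixations:
        return 0.0

    def inside(x, y, roi):
        left, top, right, bottom = roi
        return left <= x <= right and top <= y <= bottom

    hits = sum(1 for x, y in fixations if any(inside(x, y, r) for r in rois))
    return hits / len(fixations)


# Two hypothetical ROIs and four recorded fixations.
predicted = [(100, 50, 200, 150), (500, 400, 600, 450)]
observed = [(150, 100), (550, 420), (300, 300), (120, 60)]
print(fixation_hit_rate(observed, predicted))
```

A low hit rate would suggest that the automatically chosen regions do not correspond to where viewers actually look; weighting each fixation by its duration would be a natural refinement.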
Figure 4: Image reconstruction and wavelet coefficient resolution mapping (assuming 50dpi screen resolution).

Conclusion

Examples of Perceptually-Adaptive Computer Graphics techniques where eye tracking can and does play a vital role have been discussed. Interactive gaze-contingent techniques are still relatively new, and many opportunities exist for advancing both screen-based and particularly model-based modeling approaches matching perception. Diagnostic eye tracking techniques for PACG are as yet underutilized. Clearly, objective evidence of where viewers actually foveate presents a benefit to any future PACG methodology where Regions Of Interest are selected automatically for perceptual compression.

This work was supported in part by a University Innovation grant (#1-20-1906-51-4087) and NSF CAREER award #9984278.

References

[BM98] Bolin, M. R., and Meyer, G. W., A Perceptually Based Adaptive Sampling Algorithm. In Computer Graphics SIGGRAPH'98 Proceedings (1998), ACM, pp. 299–309.
[Cla76] Clarke, J. H., Hierarchical Geometric Models for Visible Surface Algorithms. Communications of the ACM 19, 10 (October 1976), 547–554.
[DDGM00] Danforth, R., Duchowski, A., Geist, R., and McAliley, E., A Platform for Gaze-Contingent Virtual Environments. In Smart Graphics (Papers from the 2000 AAAI Spring Symposium, Technical Report SS-00-04) (Menlo Park, CA, 2000), AAAI, pp. 66–70.
[DKS00] Duchowski, A., Karn, K. S., and Senders, J. W., Eds., Eye Tracking Research & Applications (ETRA) (Palm Beach Gardens, FL, 2000), ACM. URL: http://www.vr.clemson.edu/eyetracking/et-conf/
[Duc00] Duchowski, A. T., Acuity-Matching Resolution Degradation Through Wavelet Coefficient Scaling. IEEE Transactions on Image Processing 9, 8 (August 2000), 1437–1440.
[DV00] Duchowski, A. T., and Vertegaal, R., Course 05: Eye-Based Interaction in Graphical Systems: Theory & Practice. ACM SIGGRAPH, New York, NY, July 2000. SIGGRAPH 2000 Course Notes, URL: http://www.vr.clemson.edu/eyetracking/sigcourse/
[EDDH+95] Eck, M., DeRose, T., Duchamp, T., Hoppe, H., Lounsbery, M., and Stuetzle, W., Multiresolution Analysis of Arbitrary Meshes. In Computer Graphics (SIGGRAPH '95) (New York, NY, 1995), ACM, pp. 173–182.
[FPSG96] Ferwerda, J. A., Pattanaik, S. N., Shirley, P., and Greenberg, D. P., A Model of Visual Adaptation for Realistic Image Synthesis. In Computer Graphics (SIGGRAPH '96) (New York, NY, 1996), ACM, pp. 249–258.
[Hop97] Hoppe, H., View-Dependent Refinement of Progressive Meshes. In Computer Graphics (SIGGRAPH '97) (New York, NY, 1997), ACM.
[Koc87] Kocian, D., Visual World Subsystem. In Super Cockpit Industry Days: Super Cockpit/Virtual Crew Systems (Air Force Museum, Wright-Patterson AFB, OH, 31 March–1 April 1987), Air Force Systems Command/Human Systems Division/Armstrong Aerospace Medical Research Laboratory.
[KG96] Kortum, P., and Geisler, W. S., Implementation of a foveated image coding system for bandwidth reduction of video images. In Human Vision and Electronic Imaging (Bellingham, WA, January 1996), SPIE, pp. 350–360.
[LW90] Levoy, M., and Whitaker, R., Gaze-Directed Volume Rendering. In Computer Graphics (SIGGRAPH '90) (New York, NY, 1990), ACM, pp. 217–223.
[LKRH+96] Lindstrom, P., Koller, D., Ribarsky, W., Hodges, L. F., Faust, N., and Turner, G. A., Real-Time, Continuous Level of Detail Rendering of Height Fields. In Computer Graphics (SIGGRAPH '96) (New York, NY, 1996), ACM, pp. 109–118.
[LTFW+89] Longridge, T., Thomas, M., Fernie, A., Williams, T., and Wetzel, P., Design of an Eye Slaved Area of Interest System for the Simulator Complexity Testbed. In Area of Interest/Field-Of-View Research Using ASPT (Interservice/Industry Training Systems Conference) (Brooks Air Force Base, TX, 1989), T. Longridge, Ed., National Security Industrial Association, Air Force Human Resources Laboratory, Air Force Systems Command, pp. 275–283.
[LM00] Loschky, L. C., and McConkie, G. W., User Performance With Gaze Contingent Multiresolutional Displays. In Eye Tracking Research & Applications Symposium (Palm Beach Gardens, FL, 2000), ACM, pp. 97–103.
[LHNW00] Luebke, D., Hallen, B., Newfield, D., and Watson, B., Perceptually Driven Simplification Using Gaze-Directed Rendering. Tech. Rep. CS-2000-04, University of Virginia, 2000.
[LVCW+00] Luebke, D., Varshney, A., Cohen, J., Watson, B., and Reddy, M., Course 41: Advanced Issues In Level Of Detail. ACM SIGGRAPH, New York, NY, 2000. SIGGRAPH 2000 Course Notes.
[MJ96] MacCracken, R., and Joy, K., Free-Form Deformations With Lattices of Arbitrary Topology. In Computer Graphics (SIGGRAPH '96) (New York, NY, 1996), ACM, pp. 181–188.
[NLO94] Nguyen, E., Labit, C., and Odobez, J.-M., A ROI Approach for Hybrid Image Sequence Coding. In International Conference on Image Processing (ICIP)'94 (November 1994), IEEE, pp. 245–249.
[OYT96] Ohshima, T., Yamamoto, H., and Tamura, H., Gaze-Directed Adaptive Rendering for Interacting with Virtual Space. In Proceedings of VRAIS'96 (March 30–April 3 1996), IEEE, pp. 103–110.
[PCN00] Parkhurst, D., Culurciello, E., and Niebur, E., Evaluating Variable Resolution Displays with Visual Search: Task Performance and Eye Movements. In Eye Tracking Research & Applications Symposium (Palm Beach Gardens, FL, 2000), ACM, pp. 105–109.
[Red98] Reddy, M., Specification and Evaluation of Level of Detail Selection Criteria. Virtual Reality: Research, Development and Application 3, 2 (1998), 132–143.
[SS97] Schmalstieg, D., and Schaufler, G., Smooth Levels of Detail. In Proceedings of VRAIS'97 (March 1–5 1997), IEEE, pp. 12–19.
[ST94] Stelmach, L. B., and Tam, W. J., Processing Image Sequences Based on Eye Movements. In Conference on Human Vision, Visual Processing, and Digital Display V (San Jose, CA, February 8–10 1994), SPIE, pp. 90–98.
[TJ00] Tanriverdi, V., and Jacob, R. J. K., Interacting with Eye Movements in Virtual Environments. In Human Factors in Computing Systems: CHI 2000 Conference Proceedings (2000), ACM Press, pp. 265–272.
[TEHM96] Tsumura, N., Endo, C., Haneishi, H., and Miyake, Y., Image compression and decompression based on gazing area. In Human Vision and Electronic Imaging (Bellingham, WA, January 1996), SPIE.
[Vin95] Vince, J. A., Virtual Reality Systems. Addison-Wesley, Reading, MA, 1995.
[Wal91] Wallace, G. K., The JPEG Still Picture Compression Standard. Communications of the ACM 34, 4 (April 1991), 30–45.
[WWH+97] Watson, B., Walker, N., Hodges, L. F., and Worden, A., Managing Level of Detail through Peripheral Degradation: Effects on Search Performance with a Head-Mounted Display. ACM Transactions on Computer-Human Interaction 4, 4 (December 1997), 323–346.
[ZS00] Zorin, D., and Schroder, P., Course 23: Subdivision for Modeling and Animation. ACM SIGGRAPH, New York, NY, 2000. URL: http://www.mrl.nyu.edu/dzorin/sig00course/
[ZSS97] Zorin, D., Schroder, P., and Sweldens, W., Interactive Multiresolution Mesh Editing. In Computer Graphics (SIGGRAPH '97) (New York, NY, 1997), ACM.

© Copyright is held by the author, Andrew Duchowski, 2001