New Framework Enhances AR Experience by Predicting Where Users Will Look

Augmented reality (AR) devices like smart glasses may soon be able to predict where a user will look and provide an enhanced interactive experience.

Fiona Ryan, a Ph.D. student in Georgia Tech’s School of Interactive Computing, is pioneering research that tracks and predicts user gaze from a first-person perspective in 3D environments.

Currently, most AR devices react to where users look, playing catch-up. Ryan’s method could give these devices a heads-up and make the user experience more seamless.

“It allows an AR system to anticipate what the person will interact with next and where they’re going to look next so it can proactively render the experience,” she said.

Ryan is the lead author of the paper “Forecasting 3D Scanpaths in Egocentric Video,” which she will present next week at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in Denver.

While there is existing research on predicting user gaze from 2D still images, her work is the first to address the issue through a 3D framework.

“Because we live in a 3D world and people are dynamically moving around from multiple points of view, we need to predict gaze in 3D rather than 2D,” she said. “What we’re seeing is a path of the person’s attention in 3D through space. Our paper is the first to attempt to model this.”

Ryan conducted most of the research while interning at Meta, where she used data from Meta’s Aria Digital Twin dataset. The dataset contains first-person video footage of users interacting with objects in an apartment.

“We chose that dataset because it has a high-fidelity 3D reconstruction of a full environment, which helps us get a ground-truth 3D gaze,” she said. “We can trace eye movement and see how it intersects with the environment.”
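To illustrate what tracing a gaze ray into a reconstructed scene could look like, here is a minimal sketch in Python. It is not taken from Ryan's paper: it assumes the scene is a triangle mesh and uses the standard Möller-Trumbore ray-triangle test, and the function name `gaze_hit_point` and its parameters are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def gaze_hit_point(
    origin: Vec3,
    direction: Vec3,
    triangle: Tuple[Vec3, Vec3, Vec3],
    eps: float = 1e-9,
) -> Optional[Vec3]:
    """Return the 3D point where a gaze ray hits one mesh triangle, or None.

    Hypothetical helper: Moller-Trumbore ray-triangle intersection.
    `origin` stands in for the eye position and `direction` for the gaze
    direction; a real scene would test many triangles and keep the nearest hit.
    """
    v0, v1, v2 = triangle
    edge1 = _sub(v1, v0)
    edge2 = _sub(v2, v0)
    p = _cross(direction, edge2)
    det = _dot(edge1, p)
    if abs(det) < eps:  # ray is parallel to the triangle's plane
        return None
    inv_det = 1.0 / det
    t_vec = _sub(origin, v0)
    u = _dot(t_vec, p) * inv_det
    if u < 0.0 or u > 1.0:  # hit lies outside the triangle
        return None
    q = _cross(t_vec, edge1)
    v = _dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:  # hit lies outside the triangle
        return None
    t = _dot(edge2, q) * inv_det
    if t < eps:  # triangle is behind the viewer
        return None
    return (
        origin[0] + t * direction[0],
        origin[1] + t * direction[1],
        origin[2] + t * direction[2],
    )
```

Calling the function on every triangle in a scene and keeping the closest hit would give one 3D gaze point per eye-tracking sample, under the assumptions above.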

A video demonstration of Ryan’s work shows her software tracking a user’s path toward a table with a cup on it. Once the user picks up the cup, the software correctly predicts the direction the user will turn next.

“When we look at a scene, we don’t take in everything in full detail all at once,” she said. “We fixate on certain areas, and our gaze is a sequence of fixations, which might depend on what we’re trying to do. If we want to pick up a cup, we might look toward that and then the next step would be looking at where we’re going to put it down.”
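The idea of gaze as a sequence of fixations can be sketched in a few lines of Python. This is a simplified distance-threshold grouping chosen for illustration, not the method from the paper; the function `group_fixations` and the `max_dist` and `min_samples` parameters, along with their default values, are assumptions.

```python
from math import dist
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def _centroid(points: Sequence[Point3]) -> Point3:
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def group_fixations(
    gaze_points: Sequence[Point3],
    max_dist: float = 0.1,   # assumed spatial threshold, in scene units
    min_samples: int = 2,    # assumed minimum samples to count as a fixation
) -> List[Point3]:
    """Collapse consecutive 3D gaze samples into an ordered list of fixations.

    A new group starts whenever a sample lands farther than `max_dist`
    from the centroid of the current group. Groups with fewer than
    `min_samples` samples are treated as eye movement in transit and dropped.
    """
    fixations: List[Point3] = []
    group: List[Point3] = []
    for point in gaze_points:
        if group and dist(point, _centroid(group)) > max_dist:
            if len(group) >= min_samples:
                fixations.append(_centroid(group))
            group = []
        group.append(point)
    if len(group) >= min_samples:
        fixations.append(_centroid(group))
    return fixations
```

In the cup example, samples resting on the cup would form one fixation and samples on the spot where it is set down would form the next, giving the ordered path of attention that a forecasting model would try to extend.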

Ryan said the software can typically predict a user’s gaze up to three seconds into the future, and as far as 10 seconds in some cases. That’s enough time for the AR system to proactively render an enhanced environment.

“We’re not looking that far into the future right now, but it would be interesting to explore longer forecasting windows,” she said. “I think potential futures would diverge pretty quickly, so we’re trying to explore what can reasonably be predicted from a short segment of a person looking and moving through space.”

Ryan said her paper serves as a proof of concept and that much work remains to be done. She already has some ideas.

“I think future models can include different scenarios to help narrow down possibilities. Sometimes a person’s gaze stays on one thing for a long time. If we know what someone is trying to do, we’ll have a better idea of the likely path their attention might go.”

Her work could also have implications for robotics research.

“It could potentially be used for training algorithms for robots to emulate active human perception. If we can understand what a person looks at as they perform a task, we could use that to facilitate a robot learning to do that same task.” 

  • Created by: Nathan Deen
  • Created: 05/27/2026