Keeping Games up to Date in the Cloud

GeForce NOW keeps your favorite games automatically up to date, so you can skip waiting on game updates and patches. Simply log in, click PLAY, and enjoy an optimal cloud gaming experience. Here’s an overview of how the service keeps your library game-ready at all times. Updating Games for All GeForce NOW Members: When a gamer downloads an update …

Interactively Visualizing a Drive Time Radius from Any Point in the US

What is geospatial drive time?

Geospatial analytics is an important part of real estate decisions for businesses, especially for retailers. There are many factors that go into deciding where to place a new store (demographics, competitors, traffic) and such a decision is often a significant investment. Retailers who understand these factors have an advantage over their competitors and can thrive. In this blog post, we’ll explore how RAPIDS’ cuDF, cuGraph, cuSpatial, and Plotly Dash with NVIDIA GPUs can be used to solve these complex geospatial analytics problems interactively.

Figure 1: Animation of the Plotly Dash dashboard built with cuDF, cuSpatial, and cuGraph, interactively visualizing a drive-time radius from any point in the US. See our video of it in action.

Let’s consider a retailer looking to pick the next location for a store or, in these times of a pandemic, a delivery hub. After the retailer picks a candidate location, they consider a surrounding “drive-time trade area”, or more formally, an isochrone. An isochrone is the polygon that results from starting at one geographic location and driving in all possible directions for a specified time.

Figure 2: Example of an isochrone, the polygon that results from starting at one geographic location and driving in all possible directions for a specified time. (Source: https://gis.stackexchange.com/questions/164156/travel-time-calculator-qgis)

Why are isochrones sometimes used instead of an “as the crow flies” radius (i.e., a circle)? In retail, time is one of the most important factors when traveling to a store or shipping out a delivery. A location might be 2 miles from a customer, but dense traffic could make it a 10-minute drive, while hopping on the highway and driving 5 miles might take only 5 minutes.

Isochrones are also more robust to the differences between urban, suburban and rural areas. If one uses an “as the crow flies” methodology and specifies a 5 mile radius, one might be including too many customers in an urban area or excluding customers in a less dense, rural area.

Once a retailer has calculated the isochrone, they can combine it with other data like demographics or competitor datasets to generate insights about that serviceable area. How many customers live in that isochrone? What is the average household income? How many of my competitors are in the area? Is that area a “food desert” (limited access to supermarkets, general affordable resources) or a “food oasis”? Is this area highly trafficked? Answers to these questions are incredibly valuable when making a decision about a potentially multi-million dollar investment.

However, demographics, competitor, and traffic analytics datasets can be large, diverse, and difficult to crunch – even more so if complex operations like geospatial analytics are involved. Additionally, these algorithms can be challenging to scale across large geographic areas like states or countries and even harder to interact with in real time.

By combining the accelerated compute power of RAPIDS cuDF, cuGraph, and cuSpatial with the interactivity of a Plotly Dash visualization application, we are able to transform this complicated problem into an application with a simple user interface.

Mocking up a GPU-accelerated workflow

Let’s think about the algorithm: how can one calculate a drive-time area? The general flow is as follows (a minimal code sketch follows the list):

  • Pick a point
  • Build a network representing roads
  • Identify the node in that network that is closest to that point
  • Traverse that network using an SSSP (single source shortest path) algorithm and identify all the nodes within some distance
  • Create a bounding polygon from the furthest nodes
  • Append data such as demographics and competitor information, filtered by point-in-polygon operations against the resulting polygon area
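
Below is a minimal sketch of that flow under a few simplifying assumptions: the road nodes and edges are already in GPU memory as cuDF DataFrames (nodes with vertex/x/y columns, edges with src/dst and a precomputed "minutes" travel-time weight), and the bounding polygon is a convex hull built on the CPU with SciPy. Column names and the helper itself are illustrative, not the exact implementation behind this post.

import cugraph
from scipy.spatial import ConvexHull

def isochrone_polygon(nodes, edges, click_x, click_y, max_minutes):
    # 1. Find the road-graph node closest to the clicked point (squared
    #    Euclidean distance is fine over a small, clipped extent).
    d2 = (nodes["x"] - click_x) ** 2 + (nodes["y"] - click_y) ** 2
    nearest = int(d2.reset_index(drop=True).idxmin())
    source = int(nodes["vertex"].iloc[nearest])

    # 2. Build the graph and run single-source shortest path (SSSP)
    #    with travel time as the edge weight.
    G = cugraph.Graph()
    G.from_cudf_edgelist(edges, source="src", destination="dst",
                         edge_attr="minutes")
    dist = cugraph.sssp(G, source)

    # 3. Keep every node reachable within the drive-time budget.
    reachable = dist[dist["distance"] <= max_minutes]
    pts = (reachable.merge(nodes, on="vertex")[["x", "y"]]
           .to_pandas().to_numpy())

    # 4. Bound the reachable nodes with a polygon.
    hull = ConvexHull(pts)
    return pts[hull.vertices]  # polygon vertices, in hull order

A concave hull (alpha shape) would hug the road network more tightly than the convex hull used here, at the cost of extra compute.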

However, the idealized workflow above needs to be translated into real world functionality. Like all data visualization projects, it will take a few iterations to dial in.

Sourcing data and testing the workflow

While there are large and highly current datasets available for this type of information, they are generally not publicly available or are very costly. For the sake of simplicity and accessibility of this demo, we will use open-source data. For added accessibility, we will also scope the application to run on a single GPU with around 24-32GB of memory.

We started out by prototyping a proof-of-concept workflow in a notebook to compute isochrones. Despite the broad data science tooling required to tackle this problem (from graph to geospatial), the ability of the different RAPIDS libraries to work easily in conjunction with one another, without moving data out of GPU memory, greatly sped up development.

Our initial workflow used the Overpass API for OpenStreetMap to download the open-source street data in a graph format. However, a query for a 25 mi radius took approximately 4-5 minutes to complete, which was far too long for an interactive visualization application.

The next step was to then pre-download the entire United States OpenStreetMap data, optimize it, and save it as a parquet file for quick loading by cuDF. With the isochrones workflow producing good results, we then moved on to adding demographic data.
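
Loading those preprocessed files back into GPU memory is then a single call per file with cuDF; the file and column names below are assumptions for illustration.

import cudf

nodes = cudf.read_parquet("us_osm_nodes.parquet")   # vertex, x, y
edges = cudf.read_parquet("us_osm_edges.parquet")   # src, dst, length_mi

print(len(nodes), "road nodes and", len(edges), "road edges in GPU memory")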

We conveniently had formatted 2010 census data from a previous Plotly Dash Census Visualization available. All together, the datasets were as follows:

  • 2010 Census for Population (expanded to individuals) combined with the 2010 ACS for demographics (~2.9 GB)
  • US-street data ~240M nodes (~4.4 GB)
  • US-street data ~300M edges (~3.3 GB)

Combining and filtering datasets

We then prototyped the full workflow with the census data in another notebook. With such a large, combined dataset, computational spikes often resulted in OOM (out of memory) errors or, in certain conditions, took longer than ~10 seconds to complete. We want to enable a user to click on ANY point in the continental US and quickly get a response back. Yet, after that initial click interaction, only a small fraction of the total dataset is needed. To increase speed and reduce memory spikes, we had to encode some boundary assumptions (sketched in code after the list):

  • For this use case, a retailer would not be interested in demographic information more than a 30 mi radius away from the point.
  • While datasets exist for more realistic road speeds (speed limit and traffic based), for simplicity we would assume an average speed.
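
A rough sketch of how these two assumptions might be applied is shown below: clip the national road graph to a 30-mile radius around the clicked point, then weight every edge by a single assumed average speed. The column names, the 30 mph figure, and the degrees-per-mile conversion are illustrative assumptions rather than the values used in the actual application.

MAX_RADIUS_MI = 30.0
AVG_SPEED_MPH = 30.0          # assumed average road speed
DEG_PER_MILE = 1.0 / 69.0     # rough degrees-per-mile at mid-US latitudes

def clip_and_weight(nodes, edges, click_x, click_y):
    # nodes/edges are cuDF DataFrames as loaded above.
    # Keep only nodes within ~30 miles of the click (cheap bounding circle
    # in degree space; a geodesic filter would be more precise).
    r_deg = MAX_RADIUS_MI * DEG_PER_MILE
    d2 = (nodes["x"] - click_x) ** 2 + (nodes["y"] - click_y) ** 2
    local_nodes = nodes[d2 <= r_deg ** 2]

    # Keep edges whose endpoints both survived the clip.
    keep = local_nodes[["vertex"]]
    local_edges = (edges
                   .merge(keep, left_on="src", right_on="vertex")
                   .drop(columns=["vertex"])
                   .merge(keep, left_on="dst", right_on="vertex")
                   .drop(columns=["vertex"]))

    # Edge weight = travel time in minutes at the assumed average speed.
    local_edges["minutes"] = local_edges["length_mi"] / AVG_SPEED_MPH * 60.0
    return local_nodes, local_edges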

After adding in these conditions, we were confident that we could port the notebook into a full Plotly Dash visualization dashboard.

Building an interactive dashboard with Plotly Dash

Why build an interactive dashboard? If we can already do the compute in a notebook, why should it be necessary to go through the effort of building a visualization application and optimize the compute to make it more interactive? Not everyone is familiar with using notebooks, but most can use a dashboard (especially those designed for ease of use). Furthermore, an interface that reduces the mental overhead and friction associated with doing what we actually want, asking questions of our data, encourages more experimentation and exploration. This leads to better, higher quality answers.

Figure 3: Sketch mockup used to define the dashboard’s general functionality, features, and components.

Starting out with a quick sketch mock-up (highly recommended for any visualization project), we went through several iterations of the dashboard to further optimize memory usage, reduce interaction times, and simplify the UI.

We found that by taking advantage of Dask for segmented data loading, we could drastically reduce loading times. Optimizing cuSpatial’s PiP (point in polygon) and cuDF groupby calls further reduced the compute times to around ~2-3 seconds per query in a sparsely populated area, and ~5-8 seconds in a densely populated area like New York City.
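
For the demographic step, the pattern is roughly the sketch below: a point-in-polygon test of every census point against the isochrone, a boolean filter, and a groupby-style aggregation, all without leaving GPU memory. It assumes the columnar cuspatial.point_in_polygon API from the RAPIDS releases of that era, and the census column names (x, y, income_bracket) are placeholders.

import cudf
import cuspatial

def demographics_in_isochrone(census, poly_xy):
    # poly_xy: (N, 2) array of isochrone polygon vertices (x, y).
    poly_offsets = cudf.Series([0])   # one polygon ...
    ring_offsets = cudf.Series([0])   # ... with a single ring
    poly_x = cudf.Series(poly_xy[:, 0])
    poly_y = cudf.Series(poly_xy[:, 1])

    # Columnar point-in-polygon: returns one boolean column per polygon.
    inside = cuspatial.point_in_polygon(
        census["x"], census["y"],
        poly_offsets, ring_offsets, poly_x, poly_y)

    selected = census[inside[inside.columns[0]]]

    # Example aggregation over the people inside the isochrone.
    return len(selected), selected["income_bracket"].value_counts()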

We also experimented with using quad-tree based point in polygon calculations for a 25-30% reduction in compute time, but because the current PiP only takes ~0.4 – 0.5 seconds in our optimized data format, the added memory overhead was not worth the speed up in this case.

For our final implementation, running a chain of operations to filter a radius of road data from the entire continental US around a selected point, load that data, find the nearest point on the road graph, compute the shortest path for the selected drive-time distance, create a polygon, and filter 300 million+ individuals’ demographic data typically takes just 3-8 seconds!
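
Wiring that chain into Plotly Dash then comes down to a single callback that listens for map clicks and the drive-time slider, runs the GPU pipeline, and returns an updated figure. The sketch below assumes a compute_isochrone() helper stitched together from the pieces above and uses the Dash 2.x import style; component IDs, slider values, and the map styling are placeholders.

from dash import Dash, dcc, html, Input, Output
import plotly.graph_objects as go

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id="map", figure=go.Figure(go.Scattermapbox())),
    dcc.Slider(id="minutes", min=5, max=30, step=5, value=15),
    html.Div(id="summary"),
])

@app.callback(
    Output("map", "figure"),
    Output("summary", "children"),
    Input("map", "clickData"),
    Input("minutes", "value"),
)
def update(click_data, minutes):
    if not click_data:
        return go.Figure(go.Scattermapbox()), "Click anywhere on the map."
    point = click_data["points"][0]              # Scattermapbox click payload
    poly, population = compute_isochrone(point["lon"], point["lat"], minutes)
    fig = go.Figure(go.Scattermapbox(
        lon=poly[:, 0], lat=poly[:, 1], mode="lines", fill="toself"))
    fig.update_layout(mapbox_style="open-street-map",
                      margin=dict(l=0, r=0, t=0, b=0))
    return fig, f"People within {minutes} minutes: {int(population):,}"

if __name__ == "__main__":
    app.run_server(debug=True)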

Conclusion

Being able to click on any point in the continental US and, within seconds, compute both the drive-time radius and the demographic information inside it is impressive. Although this demo uses only limited open-source data for a generalized business use case, adapting it into a fully functional application would only require more precise data and a few minor optimizations.

While the performance of using GPUs is noteworthy, equally important is the speedup from using RAPIDS libraries together. Their ability to seamlessly provide end-to-end acceleration from notebook prototype to stand-alone visualization application enables interactive data science at the speed of thought.

NVIDIA Transforms Mainstream Laptops into Gaming Powerhouses with GeForce RTX 30 Series

Brings RTX Real-Time Ray Tracing and AI-Based DLSS to Tens of Millions More Gamers and Creators with $799 Portable Powerhouses. SANTA CLARA, Calif., May 11, 2021 (GLOBE NEWSWIRE) — NVIDIA today …

Create in Record Time with New NVIDIA Studio Laptops from Dell, HP, Lenovo, Gigabyte, MSI and Razer

New NVIDIA Studio laptops from Dell, HP, Lenovo, Gigabyte, MSI and Razer were announced today as part of the record-breaking GeForce laptop launch. The new Studio laptops are powered by GeForce RTX 30 Series and NVIDIA RTX professional laptop GPUs, including designs with the new GeForce RTX 3050 Ti and 3050 laptop GPUs, and the …

My girlfriend left me due to an NLP Competition

Yep, seems odd but I am unsure how to take it now.

A little bit of background: I have been participating in Deep Learning related competitions for a pretty long time. I am 25 right now, started in the field when I was 20. Started from Keras, then PyTorch then eventually chose Tensorflow because that gave me an edge with GPU parallelization and every firm around my region uses TF / helping me get a better job.

I dropped out of college, left my degree that majored in statistics and realised that AI was something that I could learn without spending a buck. It worked out pretty well. I eventually applied to a Speakers Giant ( can’t name ofc ) for the position of a data scientist and they gave it to me. Which was pretty fricking nuts given that I was doing research in NLP on a scale that was not professional.

That jump gave me hope in life that I won’t die as a dork with new cash. I eventually got a girlfriend. She was my colleague there, she left tho, to work at another firm 5 months ago.

The downfall started when I came across this ML challenge at a website called AIcrowd. I had participated and won a ton at Kaggle but this was new. I started making submissions and boom there it was. The ping noise. The sheer tension between me and time as I rushed to make more submissions. They were giving the participants like $10000+ for winning this thing and I was not even focused on that. Four days went away before I actually got out of my room, these were my stretches. Separating these sounds and making submissions felt like sex. I had never seen anything like this before.

I stopped answering my girlfriend’s texts initially, I would complete office assignments at night. I pretended to have a throat allergy to avoid office calls. We had been dating for a year and she had always hated my instinct to participate in these challenges which she looked down upon. She felt that I did them for the quick money, when I did it for the growth in my skills (Well Kaggle paid okayish, this AIcrowd’s prizes are nuts tho).

She called me a couple of days ago, asked me what would I do if she left due to my obsession. I told her that let me tighten up my ranking and then we could talk. Lol she broke up over text. F*ck.

I am still on top of the challenge’s leader board tho.

W

submitted by /u/aichimp-101

Is it possible to vectorize (perform on batch) random cropping per element?

Nevermind, solved.

EDIT: Got it working, there were two bugs. First, I had mistakenly initialized batch_size twice during all my editing, and so I was mismatching between batch_size depending on where in the code I was. The second bug, which I still haven’t entirely fixed, is that if the batch size does not evenly divide the input generator the code fails, even though I have it set with a take amount that IS divisible by the batch_size. I get the same error even if I set steps per epoch to 1 (so it should never be reaching the end of the batches). I can only assume it’s an error during graph generation where it’s trying to define that last partial batch even though it will never be trained over. Hmm.

EDIT EDIT: Carefully following the size of my dataset throughout my pipeline, I discovered the source of the second issue, which is actually just that I didn’t delete my cache files when I previously had a larger take. The last thing I would still like to do is fix the code such that it actually CAN handle variable-length batches so I don’t have to worry about making sure I don’t have partial batches. However, from what I can see, tf.unstack along variable-length dimensions is straight up not supported, so this will require refactoring my computation to use some other method, like loops maybe. To be honest, though, it’s not worth my time to do so right now when I can just use drop_remainder=True and drop the last incomplete batch. In my real application there will be a lot of data per epoch, so losing 16 or so random examples from each epoch is rather minor.

So, I am making a project where I randomly crop images. In my data pipeline, I was trying to write code such that I could crop batches of my data at once, as the docs suggested that vectorizing my operations would reduce scheduling overhead.

However, I have run into some issues. If I use tf.image.random_crop, the problem is that the same random crop will be used on every image in the batch. I, however, want different random crops for every image. Moreover, since where I randomly crop an image will affect my labels, I need to track every random crop performed per image and adjust the label for that image.

I was able to write code that seems like it would work by using unstack, doing my operation per element, then restacking, like so:

images = tf.unstack(img, num=None, axis=0, name='unstack')
xshifts = []
yshifts = []
newimages = []
for image in images:
    if not is_valid:
        x = np.random.randint(0, width - dx + 1)
        y = np.random.randint(0, height - dy + 1)
    else:
        x = 0
        y = 0
    newimages.append(image[y:(y+dy), x:(x+dx), :])
    print(image[y:(y+dy), x:(x+dx), :])
    xshifts.append((float(x) / img.shape[2]) * img_real_size)
    yshifts.append((float(y) / img.shape[1]) * img_real_size)
images = tf.stack(newimages, 0)

But oh no! Whenever I use this code in a map function, it doesn’t work because in unstack I set num=None, which requires it to infer how much to unstack from the actual batch size. But because tensorflow for reasons decided that batches should have size none when specifying things, the code fails because you can’t infer size from None. If I patch the code to put in num=batch_size, it changes my datasets output signature to be hard coded to batch_size, which seems like it shouldn’t be a problem, except this happens.

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape axis 0 must equal 2, got shape [1,448,448,3] [[{{node unstack}}]] 

Which is to say, it’s failing because instead of receiving the expected batch input with the appropriate batch_size (2 for testing), it’s receiving a single input image, and 2 does not equal 1. The documentation strongly implied to me that if I batch my dataset before mapping (which I do), then the map function should be receiving the entire batch and should therefore be vectorized. But is this not the case? I double checked my batch and dataset sizes to make sure that it isn’t just an error arising due to some smaller final batch.

To sum up: I want to crop my images uniquely per image and alter labels as I do so. I also want to do this per batch, not just per image. The code I wrote that does this requires me to unstack my batch. But unstacking my batch can’t have num=None. But tensorflow batches have shape none, so the input of my method has shape none at specification. But if I change unstack’s num argument to anything but None, it changes my output specification to that number (which isn’t None), and the output signature of my method must ALSO have shape none at specification. How can I get around this?

Or, if someone can figure out why my batched dataset, batched before my map function, is apparently feeding in single samples instead of full batches, that would also solve the mystery.
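
For what it’s worth, one fully batched way around unstack (a hedged sketch, not necessarily what was used here) is tf.image.crop_and_resize with one normalized box per image: per-image offsets are drawn in a single vectorized call, the op handles a dynamic batch dimension, and the offsets come back out so the labels can be shifted to match each crop.

import tensorflow as tf

def random_crop_batch(images, dy, dx):
    # images: [batch, height, width, channels]; batch may be dynamic (None).
    shape = tf.shape(images)
    batch, height, width = shape[0], shape[1], shape[2]

    # Independent top-left corners for every image in the batch.
    y0 = tf.random.uniform([batch], 0, height - dy + 1, dtype=tf.int32)
    x0 = tf.random.uniform([batch], 0, width - dx + 1, dtype=tf.int32)

    # crop_and_resize expects normalized [y1, x1, y2, x2] boxes, where a
    # coordinate of 1.0 maps to (dimension - 1) in pixel space.
    h1 = tf.cast(height - 1, tf.float32)
    w1 = tf.cast(width - 1, tf.float32)
    boxes = tf.stack([
        tf.cast(y0, tf.float32) / h1,
        tf.cast(x0, tf.float32) / w1,
        tf.cast(y0 + dy - 1, tf.float32) / h1,
        tf.cast(x0 + dx - 1, tf.float32) / w1,
    ], axis=1)

    # method="nearest" with an exact crop_size makes this an exact crop
    # (output is float32 regardless of the input dtype).
    crops = tf.image.crop_and_resize(
        images, boxes, box_indices=tf.range(batch),
        crop_size=[dy, dx], method="nearest")

    # Return the offsets so label coordinates can be shifted per image.
    return crops, y0, x0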

submitted by /u/Drinniol

Need help in installing tensorflow

I bought a new MacBook Air M1 and was installing TensorFlow on it. I downloaded Python 3.8 using xcode-select --install but got an error in between: “…arm64.whl” is not supported. Any help is appreciated.

submitted by /u/Anu_Rag9704

Meet the Researcher: Marco Aldinucci, Convergence of HPC and AI to Fight Against COVID

‘Meet the Researcher’ is a series in which we spotlight different researchers in academia who use NVIDIA technologies to accelerate their work. 

This month we spotlight Marco Aldinucci, Full Professor at the University of Torino, Italy, whose research focuses on parallel programming models, languages, and tools.

Since March 2021, Marco Aldinucci has been the Director of the brand new “HPC Key Technologies and Tools” national lab at the Italian National Interuniversity Consortium for Informatics (CINI), which affiliates researchers interested in HPC and cloud from 35 Italian universities. 

He is the recipient of the HPC Advisory Council University Award in 2011, the NVIDIA Research Award in 2013, and the IBM Faculty Award in 2015. He has participated in over 30 EU and national research projects on parallel computing, attracting over 6M€ of research funds to the University of Torino. He is also the founder of HPC4AI, the competence center on HPC-AI convergence that federates four labs in the two universities of Torino.

What are your research areas of focus?

I like to define myself as a pure computer scientist with a natural inclination for multi-disciplinarity. Parallel and High-Performance Computing is useful when applied to other scientific domains, such as chemistry, geology, physics, mathematics, and medicine. It is, therefore, crucial for me to work with domain experts, and to do so while maintaining my ability to delve into performance issues regardless of a particular application. To me, Artificial Intelligence is also a class of applications.

When did you know that you wanted to be a researcher and wanted to pursue this field?

I was a curious child, and I would say that discovering new things is simply the condition that makes me feel most satisfied. I got my MSc and Ph.D. at the University of Pisa, home of Italy’s first Computer Science department, established in the late 1960s. The research group on Parallel Computing was strong. As a Ph.D. student, I deeply appreciated their approach of distilling and abstracting computational paradigms that are independent of the specific domain and, therefore, somehow universal. That is mesmerizing.

What motivated you to pursue your recent research area of focus in supercomputing and the fight against COVID?

People remember the importance of research only in times of need, but sometimes it’s too late. When COVID arrived, all of us researchers felt the moral duty to be the front-runners in investing our energy and our time well beyond regular working hours. When CLAIRE (“Confederation of Laboratories for Artificial Intelligence Research in Europe”) proposed I lead a task force of volunteer scientists to help develop tools to fight against COVID, I immediately accepted. It was the task force on the HPC plus AI-based classification of interstitial pneumonia. I presented the results in my talk at GTC ’21; they are pretty interesting.

Click here to watch the presentation from GTC ’21, “The Universal Cloud-HPC Pipeline for the AI-Assisted Explainable Diagnosis of COVID-19 Pneumonia”.

The CLAIRE-COVID19 universal pipeline, designed to compare different training algorithms to define a baseline for such techniques and to allow the community to quantitatively measure AI’s progress in the diagnosis of COVID-19 and similar diseases. [source]

What problems or challenges does your research address?

I am specifically interested in what I like to call “the modernization of HPC applications,” which is the convergence of HPC and AI, but also all the methodologies needed to build portable HPC applications running on the compute continuum, from HPC to cloud to edge. The portability of applications and performances is a severe issue for traditional HPC programming models.   

In the long term, writing scalable parallel programs that are efficient, portable, and correct must be no more onerous than writing sequential programs. To date, parallel programming has not embraced much more than low-level libraries, which often require the application’s architectural redesign. In the hierarchy of abstractions, they are only slightly above toggling absolute binary in the machine’s front panel. This approach cannot effectively scale to support the mainstream of software development where human productivity, total cost, and time to the solution are equally, if not more, important aspects. Modern AI toolkits and their cloud interfaces represent a tremendous modernization opportunity because they contaminate the small world of MPI and the batch jobs with new concepts: modular design, the composition of services, segregation of effects and data, multi-tenancy, rapid prototyping, massive GPU exploitation, interactive interfaces. After the integration with AI, HPC will not be what it used to be.

What challenges did you face during the research process, and how did you overcome them?

Technology is rapidly evolving, and keeping up with the new paradigms and innovative products that appear almost daily is the real challenge. Having a solid grounding in computer science and math is essential to being an HPC and AI researcher in this ever-changing world. Technology evolves every day, but revolutions are rare events. I happen to say to my students: it doesn’t matter how many programming languages you know; the problem is how much effort is needed to learn the next.

How is your work impacting the community?

I have always imagined the HPC realm as organized around three pillars: 1) infrastructures, 2) enabling technologies for computing, and 3) applications. In Italy, we were strong on infrastructures and applications, but excellence in computing technologies was scattered across different universities like leopard spots.

For this reason, we recently started a new national laboratory called “HPC Key Technologies and Tools” (HPC-KTT), of which I am the founding Director. HPC-KTT co-affiliates hundreds of researchers from 35 Italian universities to reach the critical mass to impact international research with our methods and tools. In the first year of activity, we secured EU research projects in competitive calls for a total cost of 95M€ (ADMIRE, ACROSS, TEXTAROSSA, EUPEX, The European Pilot). We have just started; more information can be found in:

M. Aldinucci et al, “The Italian research on HPC key technologies across EuroHPC,” in ACM Computing Frontiers, Virtual Conference, Italy, 2021. doi:10.1145/3457388.3458508

What are some of your proudest breakthroughs?

I routinely use both NVIDIA hardware and software. Most of the important research results I recently achieved in multi-disciplinary teams were obtained thanks to NVIDIA technologies capable of accelerating machine learning tasks. Among recent results, I can mention a couple of important papers that appeared in The Lancet and Medical Image Analysis, but also HPC4AI (High-Performance Computing for Artificial Intelligence), a new data center I started at my university. HPC4AI runs an OpenStack cloud with almost 5000 cores, 100 GPUs (V100/T4), and six different storage systems. HPC4AI is the living laboratory where researchers and students of the University of Torino understand how to build performant data-centric applications across the entire HPC-cloud stack, from bare-metal configuration to algorithms to services.

What’s next for your research?

We are working on two new pieces of software: a next-generation workflow management system called StreamFlow, and CAPIO (Cross-Application Programmable I/O), a system for fast data transfer between parallel applications with support for parallel in-transit data filtering. We can use them separately, but together they express the most significant potential.

StreamFlow enables the design of workflows and pipelines portable across the cloud (on Kubernetes), HPC systems (via SLURM/PBS), or across them. It adopts an open standard interface (CWL) to describe the data dependencies among workflow steps while keeping deployment instructions separate, making it possible to re-deploy the same containerized code onto different platforms. Using StreamFlow, we run the CLAIRE COVID universal pipeline and QuantumESPRESSO almost everywhere, from a single NVIDIA DGX Station to the CINECA MARCONI100 supercomputer (11th in the TOP500 list, with NVIDIA GPUs and dual-rail Mellanox EDR InfiniBand), and across them. And they are quite different systems.

CAPIO, which is still under development, aims at efficiently (in parallel, in memory) moving data across different steps of the pipelines. The nice design feature of CAPIO is that it turns files into streams across applications without requiring code changes. It supports parallel and programmable in-transit data filtering. The essence of many AI pipelines is moving a lot of data around the system; we are embracing the file system interface to get compositionality and segregation. We do believe we will get performance as well.

Any advice for new researchers, especially to those who are inspired and motivated by your work? 

Contact us, we are hiring.

cuSOLVERMp v0.0.1 Now Available Through Early Access

Today, cuSOLVERMp version 0.0.1 is available at no charge for members of the NVIDIA Developer Program.

Download Now

What’s New

  • Support for LU solver, with and without pivoting.
  • The Early Access release targets IBM POWER9 (P9) and IBM Spectrum MPI.

About cuSOLVERMp

cuSOLVERMp provides a distributed-memory multi-node and multi-GPU solution for solving systems of linear equations at scale! In the future, it will also solve eigenvalue and singular value problems.

Future releases will be hosted in the HPC SDK. They will provide additional functionality and support for x86_64 + OpenMPI.

Learn more:

GTC 2021: S31754 Recent Developments in NVIDIA Math Libraries

GTC 2021: S31286 A Deep Dive into the Latest HPC Software

GTC 2021: CWES1098 Tensor Core-Accelerated Math Libraries for Dense and Sparse Linear Algebra in AI and HPC

Blog post coming soon!

Accelerating Eye Movement Research for Wellness and Accessibility

Eye movement has been studied widely across vision science, language, and usability since the 1970s. Beyond basic research, a better understanding of eye movement could be useful in a wide variety of applications, ranging across usability and user experience research, gaming, driving, and gaze-based interaction for accessibility to healthcare. However, progress has been limited because most prior research has focused on specialized hardware-based eye trackers that are expensive and do not easily scale.

In “Accelerating eye movement research via accurate and affordable smartphone eye tracking”, published in Nature Communications, and “Digital biomarker of mental fatigue”, published in npj Digital Medicine, we present accurate, smartphone-based, ML-powered eye tracking that has the potential to unlock new research into applications across the fields of vision, accessibility, healthcare, and wellness, while additionally providing orders-of-magnitude scaling across diverse populations in the world, all using the front-facing camera on a smartphone. We also discuss the potential use of this technology as a digital biomarker of mental fatigue, which can be useful for improved wellness.

Model Overview
The core of our gaze model was a multilayer feed-forward convolutional neural network (ConvNet) trained on the MIT GazeCapture dataset. A face detection algorithm selected the face region with associated eye corner landmarks, which were used to crop the images down to the eye region alone. These cropped frames were fed through two identical ConvNet towers with shared weights. Each convolutional layer was followed by an average pooling layer. Eye corner landmarks were combined with the output of the two towers through fully connected layers. Rectified Linear Units (ReLUs) were used for all layers except the final fully connected output layer (FC6), which had no activation.

Architecture of the unpersonalized gaze model. Eye regions, extracted from a front-facing camera image, serve as input into a convolutional neural network. Fully-connected (FC) layers combine the output with eye corner landmarks to infer gaze x- and y-locations on screen via a multi-regression output layer.
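
As a rough illustration of that description, the Keras sketch below wires up two shared-weight ConvNet towers over the eye crops, concatenates their output with the eye-corner landmarks through fully connected layers, and ends in a 2-unit regression head with no activation. Layer sizes, input resolutions, and the landmark dimensionality are illustrative assumptions, not the published hyperparameters.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_eye_tower():
    # Shared-weight tower: conv + average pooling blocks, as described above.
    inp = layers.Input(shape=(128, 128, 3))
    x = inp
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
        x = layers.AveragePooling2D()(x)
    return Model(inp, layers.Flatten()(x), name="eye_tower")

tower = build_eye_tower()                       # one instance => shared weights
left_eye = layers.Input(shape=(128, 128, 3), name="left_eye")
right_eye = layers.Input(shape=(128, 128, 3), name="right_eye")
landmarks = layers.Input(shape=(8,), name="eye_corner_landmarks")

merged = layers.Concatenate()([tower(left_eye), tower(right_eye), landmarks])
x = layers.Dense(128, activation="relu")(merged)
x = layers.Dense(16, activation="relu", name="penultimate")(x)
gaze_xy = layers.Dense(2, activation=None, name="gaze_xy")(x)  # no activation

model = Model([left_eye, right_eye, landmarks], gaze_xy)
model.compile(optimizer="adam", loss="mse")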

The unpersonalized gaze model accuracy was improved by fine-tuning and per-participant personalization. For the latter, a lightweight regression model was fitted to the model’s penultimate ReLU layer and participant-specific data.
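
The personalization step can be sketched in the same spirit: freeze the base model, treat the penultimate-layer activations as features, and fit a small regression head on a participant’s ~30 seconds of calibration data. Ridge regression is an assumption standing in for the paper’s “lightweight regression model”, and the layer name follows the sketch above.

from sklearn.linear_model import Ridge
from tensorflow.keras import Model

# Frozen feature extractor ending at the penultimate ReLU layer.
feature_extractor = Model(model.inputs, model.get_layer("penultimate").output)

def personalize(calibration_inputs, calibration_gaze_xy):
    # calibration_inputs: [left_eye, right_eye, landmarks] arrays from ~30 s
    # of calibration; calibration_gaze_xy: matching on-screen (x, y) targets.
    feats = feature_extractor.predict(calibration_inputs)
    return Ridge(alpha=1.0).fit(feats, calibration_gaze_xy)

def predict_personalized(head, inputs):
    return head.predict(feature_extractor.predict(inputs))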

Model Evaluation
To evaluate the model, we collected data from consenting study participants as they viewed dots that appeared at random locations on a blank screen. The model error was computed as the distance (in cm) between the stimulus location and model prediction. Results show that while the unpersonalized model has high error, personalization with ~30s of calibration data led to an over fourfold error reduction (from 1.92 to 0.46cm). At a viewing distance of 25-40 cm, this corresponds to 0.6-1° accuracy, a significant improvement over the 2.4-3° reported in previous work [1, 2].
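
A quick back-of-the-envelope check of that conversion from centimeters to visual angle (angle = atan(error / viewing distance)):

import math

error_cm = 0.46
for distance_cm in (25, 40):
    degrees = math.degrees(math.atan(error_cm / distance_cm))
    print(f"{distance_cm} cm viewing distance -> {degrees:.2f} deg")
# 25 cm -> ~1.05 deg, 40 cm -> ~0.66 deg, matching the 0.6-1° figure above.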

Additional experiments show that the smartphone eye tracker model’s accuracy is comparable to state-of-the-art wearable eye trackers both when the phone is placed on a device stand, as well as when users hold the phone freely in their hand in a near frontal headpose. In contrast to specialized eye tracking hardware with multiple infrared cameras close to each eye, running our gaze model using a smartphone’s single front-facing RGB camera is significantly more cost effective (~100x cheaper) and scalable.

Using this smartphone technology, we were able to replicate key findings from prior eye movement research in neuroscience and psychology, including standard oculomotor tasks (to understand basic visual functioning in the brain) and natural image understanding. For example, in a simple prosaccade task, which tests a person’s ability to quickly move their eyes towards a stimulus that appears on the screen, we found that the average saccade latency (time to move the eyes) matches prior work for basic visual health (210ms versus 200-250ms). In controlled visual search tasks, we were able to replicate key findings, such as the effect of target saliency and clutter on eye movements.

Example gaze scanpaths show the effect of the target’s saliency (i.e., color contrast) on visual search performance. Fewer fixations are required to find a target (left) with high saliency (different from the distractors), while more fixations are required to find a target (right) with low saliency (similar to the distractors).

For complex stimuli, such as natural images, we found that the gaze distribution (computed by aggregating gaze positions across all participants) from our smartphone eye tracker are similar to those obtained from bulky, expensive eye trackers that used highly controlled settings, such as laboratory chin rest systems. While the smartphone-based gaze heatmaps have a broader distribution (i.e., they appear more “blurred”) than hardware-based eye trackers, they are highly correlated both at the pixel level (r = 0.74) and object level (r = 0.90). These results suggest that this technology could be used to scale gaze analysis for complex stimuli such as natural and medical images (e.g., radiologists viewing MRI/PET scans).

Similar gaze distribution from our smartphone approach vs. a more expensive (100x) eye tracker (from the OSIE dataset).

We found that smartphone gaze could also help detect difficulty with reading comprehension. Participants reading passages spent significantly more time looking within the relevant excerpts when they answered correctly. However, as comprehension difficulty increased, they spent more time looking at the irrelevant excerpts in the passage before finding the relevant excerpt that contained the answer. The fraction of gaze time spent on the relevant excerpt was a good predictor of comprehension, and strongly negatively correlated with comprehension difficulty (r = −0.72).

Digital Biomarker of Mental Fatigue
Gaze detection is an important tool to detect alertness and wellbeing, and is studied widely in medicine, sleep research, and mission-critical settings such as medical surgeries, aviation safety, etc. However, existing fatigue tests are subjective and often time-consuming. In our recent paper published in npj Digital Medicine, we demonstrated that smartphone gaze is significantly impaired with mental fatigue, and can be used to track the onset and progression of fatigue.

A simple model predicts mental fatigue reliably using just a few minutes of gaze data from participants performing a task. We validated these findings in two different experiments — using a language-independent object-tracking task and a language-dependent proofreading task. As shown below, in the object-tracking task, participants’ gaze initially follows the object’s circular trajectory, but under fatigue, their gaze shows high errors and deviations. Given the pervasiveness of phones, these results suggest that smartphone-based gaze could provide a scalable, digital biomarker of mental fatigue.

Example gaze scanpaths for a participant with no fatigue (left) versus with mental fatigue (right) as they track an object following a circular trajectory.
The corresponding progression of fatigue scores (ground truth) and model prediction as a function of time on task.

Beyond wellness, smartphone gaze could also provide a digital phenotype for screening or monitoring health conditions such as autism spectrum disorder, dyslexia, concussion and more. This could enable timely and early interventions, especially for countries with limited access to healthcare services.

Another area that could benefit tremendously is accessibility. People with conditions such as ALS, locked-in syndrome and stroke have impaired speech and motor ability. Smartphone gaze could provide a powerful way to make daily tasks easier by using gaze for interaction, as recently demonstrated with Look to Speak.

Ethical Considerations
Gaze research needs careful consideration, including being mindful of the correct use of such technology — applications should obtain explicit approval and fully informed consent from users for the specific task at hand. In our work, all data was collected for research purposes with users’ explicit approval and consent. In addition, users were allowed to opt out at any point and request their data to be deleted. We continue to research additional ways to ensure ML fairness and improve the accuracy and robustness of gaze technology across demographics, in a responsible, privacy-preserving way.

Conclusion
Our findings of accurate and affordable ML-powered smartphone eye tracking offer the potential for orders-of-magnitude scaling of eye movement research across disciplines (e.g., neuroscience, psychology and human-computer interaction). They unlock potential new applications for societal good, such as gaze-based interaction for accessibility, and smartphone-based screening and monitoring tools for wellness and healthcare.

Acknowledgements
This work involved collaborative efforts from a multidisciplinary team of software engineers, researchers, and cross-functional contributors. We’d like to thank all the co-authors of the papers, including our team members, Junfeng He, Na Dai, Pingmei Xu, Venky Ramachandran; interns, Ethan Steinberg, Kantwon Rogers, Li Guo, and Vincent Tseng; collaborators, Tanzeem Choudhury; and UXRs: Mina Shojaeizadeh, Preeti Talwai, and Ran Tao. We’d also like to thank Tomer Shekel, Gaurav Nemade, and Reena Lee for their contributions to this project, and Vidhya Navalpakkam for her technical leadership in initiating and overseeing this body of work.