Categories
Offsites

World scale inverse reinforcement learning in Google Maps

Routing in Google Maps remains one of our most helpful and frequently used features. Determining the best route from A to B requires making complex trade-offs between factors including the estimated time of arrival (ETA), tolls, directness, surface conditions (e.g., paved, unpaved roads), and user preferences, which vary across transportation mode and local geography. Often, the most natural visibility we have into travelers’ preferences is by analyzing real-world travel patterns.

Learning preferences from observed sequential decision making behavior is a classic application of inverse reinforcement learning (IRL). Given a Markov decision process (MDP) — a formalization of the road network — and a set of demonstration trajectories (the traveled routes), the goal of IRL is to recover the users’ latent reward function. Although past research has created increasingly general IRL solutions, these have not been successfully scaled to world-sized MDPs. Scaling IRL algorithms is challenging because they typically require solving an RL subroutine at every update step. At first glance, even attempting to fit a world-scale MDP into memory to compute a single gradient step appears infeasible due to the large number of road segments and limited high bandwidth memory. When applying IRL to routing, one needs to consider all reasonable routes between each demonstration’s origin and destination. This implies that any attempt to break the world-scale MDP into smaller components cannot consider components smaller than a metropolitan area.

To this end, in “Massively Scalable Inverse Reinforcement Learning in Google Maps“, we share the result of a multi-year collaboration among Google Research, Maps, and Google DeepMind to surpass this IRL scalability limitation. We revisit classic algorithms in this space, and introduce advances in graph compression and parallelization, along with a new IRL algorithm called Receding Horizon Inverse Planning (RHIP) that provides fine-grained control over performance trade-offs. The final RHIP policy achieves a 16–24% relative improvement in global route match rate, i.e., the percentage of de-identified traveled routes that exactly match the suggested route in Google Maps. To the best of our knowledge, this represents the largest instance of IRL in a real world setting to date.

Google Maps improvements in route match rate relative to the existing baseline, when using the RHIP inverse reinforcement learning policy.

The benefits of IRL

A subtle but crucial detail about the routing problem is that it is goal conditioned, meaning that every destination state induces a slightly different MDP (specifically, the destination is a terminal, zero-reward state). IRL approaches are well suited for these types of problems because the learned reward function transfers across MDPs, and only the destination state is modified. This is in contrast to approaches that directly learn a policy, which typically require an extra factor of S parameters, where S is the number of MDP states.

Once the reward function is learned via IRL, we take advantage of a powerful inference-time trick. First, we evaluate the entire graph’s rewards once in an offline batch setting. This computation is performed entirely on servers without access to individual trips, and operates only over batches of road segments in the graph. Then, we save the results to an in-memory database and use a fast online graph search algorithm to find the highest reward path for routing requests between any origin and destination. This circumvents the need to perform online inference of a deeply parameterized model or policy, and vastly improves serving costs and latency.

Reward model deployment using batch inference and fast online planners.

Receding Horizon Inverse Planning

To scale IRL to the world MDP, we compress the graph and shard the global MDP using a sparse Mixture of Experts (MoE) based on geographic regions. We then apply classic IRL algorithms to solve the local MDPs, estimate the loss, and send gradients back to the MoE. The worldwide reward graph is computed by decompressing the final MoE reward model. To provide more control over performance characteristics, we introduce a new generalized IRL algorithm called Receding Horizon Inverse Planning (RHIP).

IRL reward model training using MoE parallelization, graph compression, and RHIP.

RHIP is inspired by people’s tendency to perform extensive local planning (“What am I doing for the next hour?”) and approximate long-term planning (“What will my life look like in 5 years?”). To take advantage of this insight, RHIP uses robust yet expensive stochastic policies in the local region surrounding the demonstration path, and switches to cheaper deterministic planners beyond some horizon. Adjusting the horizon H allows controlling computational costs, and often allows the discovery of the performance sweet spot. Interestingly, RHIP generalizes many classic IRL algorithms and provides the novel insight that they can be viewed along a stochastic vs. deterministic spectrum (specifically, for H=∞ it reduces to MaxEnt, for H=1 it reduces to BIRL, and for H=0 it reduces to MMP).

Given a demonstration from so to sd, (1) RHIP follows a robust yet expensive stochastic policy in the local region surrounding the demonstration (blue region). (2) Beyond some horizon H, RHIP switches to following a cheaper deterministic planner (red lines). Adjusting the horizon enables fine-grained control over performance and computational costs.

Routing wins

The RHIP policy provides a 15.9% and 24.1% lift in global route match rate for driving and two-wheelers (e.g., scooters, motorcycles, mopeds) relative to the well-tuned Maps baseline, respectively. We’re especially excited about the benefits to more sustainable transportation modes, where factors beyond journey time play a substantial role. By tuning RHIP’s horizon H, we’re able to achieve a policy that is both more accurate than all other IRL policies and 70% faster than MaxEnt.

Our 360M parameter reward model provides intuitive wins for Google Maps users in live A/B experiments. Examining road segments with a large absolute difference between the learned rewards and the baseline rewards can help improve certain Google Maps routes. For example:

Nottingham, UK. The preferred route (blue) was previously marked as private property due to the presence of a large gate, which indicated to our systems that the road may be closed at times and would not be ideal for drivers. As a result, Google Maps routed drivers through a longer, alternate detour instead (red). However, because real-world driving patterns showed that users regularly take the preferred route without an issue (as the gate is almost never closed), IRL now learns to route drivers along the preferred route by placing a large positive reward on this road segment.

Conclusion

Increasing performance via increased scale – both in terms of dataset size and model complexity – has proven to be a persistent trend in machine learning. Similar gains for inverse reinforcement learning problems have historically remained elusive, largely due to the challenges with handling practically sized MDPs. By introducing scalability advancements to classic IRL algorithms, we’re now able to train reward models on problems with hundreds of millions of states, demonstration trajectories, and model parameters, respectively. To the best of our knowledge, this is the largest instance of IRL in a real-world setting to date. See the paper to learn more about this work.

Acknowledgements

This work is a collaboration across multiple teams at Google. Contributors to the project include Matthew Abueg, Oliver Lange, Matt Deeds, Jason Trader, Denali Molitor, Markus Wulfmeier, Shawn O’Banion, Ryan Epp, Renaud Hartert, Rui Song, Thomas Sharp, Rémi Robert, Zoltan Szego, Beth Luan, Brit Larabee and Agnieszka Madurska.

We’d also like to extend our thanks to Arno Eigenwillig, Jacob Moorman, Jonathan Spencer, Remi Munos, Michael Bloesch and Arun Ahuja for valuable discussions and suggestions.

Categories
Misc

NVIDIA Lends Support to Washington’s Efforts to Ensure AI Safety

In an event at the White House today, NVIDIA announced support for voluntary commitments that the Biden Administration developed to ensure advanced AI systems are safe, secure and trustworthy. The news came the same day NVIDIA’s chief scientist, Bill Dally, testified before a U.S. Senate subcommittee seeking input on potential legislation covering generative AI. Separately, Read article >

Categories
Misc

Mobility Gets Amped: IAA Show Floor Energized by Surge in EV Reveals, Generative AI

Generative AI’s transformative effect on the auto industry took center stage last week at the International Motor Show Germany, known as IAA, in Munich. NVIDIA’s Danny Shapiro, VP of automotive marketing, explained in his IAA keynote how this driving force is accelerating innovation and streamlining processes — from advancing design, engineering and digital-twin deployment for Read article >

Categories
Misc

Generative AI and Accelerated Computing for Spear Phishing Detection

Spear phishing is the largest and most costly form of cyber threat, with an estimated 300,000 reported victims in 2021 representing $44 million in reported…

Spear phishing is the largest and most costly form of cyber threat, with an estimated 300,000 reported victims in 2021 representing $44 million in reported losses in the United States alone. Business e-mail compromises led to $2.4 billion in costs in 2021, according to the FBI Internet Crime Report. In the period from June 2016 to December 2021, costs related to phishing and spear phishing totaled $43 billion for businesses, according to IBM Security Cost of a Data Breach

Spear phishing e-mails are indistinguishable from a benign e-mail that a victim would receive. This is also why traditional classification of spear phishing e-mails is so difficult. The content difference between a scam and a legitimate e-mail can be minuscule. Often, the only difference between the two is the intent of the sender: is the invoice legitimate, or is it a scam? 

This post details a two-fold approach to improve spear phishing detection by boosting the signals of intent using NVIDIA Morpheus to run data processing and inferencing.

Generating e-mails with new phishing intent

The first step involves using generative AI to create large, varied corpora of e-mails with various intents associated with spear phishing and scams. As new threats emerge, the NVIDIA Morpheus team uses the NVIDIA NeMo framework to generate a new corpus of e-mails with such threats. Following the generation of new e-mails with the new type of phishing intent, the team trains a new language model to recognize the intent. In traditional phishing detection mechanisms, such models would require a significant number of human-labeled e-mails.

Diagram showing an overview of the spear phishing detection methodology. AI- generated e-mails with specific intents are used to train intent models that label incoming user e-mails. These labels are joined with past sender behavior (if any) and e-mail metadata to classify the e-mail as spear phishing or not.
Figure 1. Overview of the spear phishing detection methodology

Detecting sender intent

The first step targets the intent behind the e-mail. The next step targets the intent of the sender. To defend against spear phishing attacks that use spoofing, known senders, or longer cons that do not express their true intent immediately, we construct additional signals by building up behavioral sketches from senders or groups of senders. 

Building on the intent work described above, known senders’ past observed intents are recorded. For example, the first time a known sender asks for money can be a signal to alert the user. 

Syntax usage is also observed and recorded. The syntax of new e-mails is compared to the syntax history of the sender. A deviation from the observed syntax could indicate a possible spoofing attack. 

Finally, the temporal patterns of a sender’s e-mails are collected and cross-referenced when a new e-mail arrives to check for out-of-pattern behavior. Is the sender sending an e-mail for the first time at midnight on a Saturday? If so, that becomes a signal in the final prediction. These signals in aggregate are used to classify e-mails. They are also presented to the end user as an explanation for why an e-mail may be malicious.

Adapting to new attacks and improving protection

Existing machine learning (ML) methods rely nearly entirely on human-labeled data and cannot adapt to emerging threats quickly. The biggest benefit to detecting spear phishing e-mails using the approach presented here is how quickly the model can be adapted to new attacks. When a new attack emerges, generative AI is leveraged to create a training corpus for the attack. Intent models are trained to detect its presence in received e-mails. 

Using models built with NeMo generates thousands of high-quality, on-topic e-mails in just a few hours. The new intents are added to the existing spear phishing detector. The entire end-to-end workflow of creating new phishing attack e-mails and updating the existing models happens in less than 24 hours. Once the models are in place, ‌e-mail processing and inferencing become a Morpheus pipeline to provide near real-time protection against spear phishing threats.

Results

To illustrate the flexibility of this approach, a model was trained using only money, banking, and personal identifying information (PII) intents. Next, cryptocurrency-flavored phishing e-mails were generated using models built with NeMo. These e-mails were incorporated into the original training and validation subsets. 

The validation set, now containing the new crypto attacks, was then passed into the original model. Then a second model was trained incorporating the crypto attack intents. Figure 2 shows how the models compare in their detection. 

After training for the attack, the F1 score increased from 0.54 to 0.89 (Figure 3). This illustrates how quickly new attacks can be trained for and adapted to using NVIDIA Morpheus and NeMo.

Chart showing how the models compare in their detection.
Figure 2. Differences in detection between an untrained model and the model trained for a cryptocurrency-based spear phishing attack
Chart showing the F1-score difference between the models
Figure 3. F1-score difference between an untrained model and the model trained for a cryptocurrency-based spear phishing attack

Get started with NVIDIA Morpheus

Watch the video, Improve Spear Phishing Detection with Generative AI for more details. Learn more about how to use NVIDIA Morpheus to detect spear phishing e-mails faster and with greater accuracy using the NVIDIA AI workflow example. You can also apply to try NVIDIA Morpheus in LaunchPad and request a 90-day free trial to test drive NVIDIA Morpheus, part of the NVIDIA AI Enterprise software family.

Categories
Misc

Event: RecSys at Work: Best Practices and Insights

An illustration showing different scenes with recommender systems in action.On Sept. 27, join us to learn recommender systems best practices for building, training, and deploying at any scale.An illustration showing different scenes with recommender systems in action.

On Sept. 27, join us to learn recommender systems best practices for building, training, and deploying at any scale.

Categories
Misc

A Quantum Boost: cuQuantum With PennyLane Lets Simulations Ride Supercomputers

Ten miles in from Long Island’s Atlantic coast, Shinjae Yoo is revving his engine. The computational scientist and machine learning group lead at the U.S. Department of Energy’s Brookhaven National Laboratory is one of many researchers gearing up to run quantum computing simulations on a supercomputer for the first time, thanks to new software. Yoo’s Read article >

Categories
Misc

Selecting the Right Camera for the NVIDIA Jetson and Other Embedded Systems

The camera module is the most integral part of an AI-based embedded system. With so many camera module choices on the market, the selection process may seem…

The camera module is the most integral part of an AI-based embedded system. With so many camera module choices on the market, the selection process may seem overwhelming. This post breaks down the process to help make the right selection for an embedded application, including the NVIDIA Jetson.

Camera selection considerations

Camera module selection involves consideration of three key aspects: sensor, interface (connector), and optics. 

Sensor 

The two main types of electronic image sensors are the charge-coupled device (CCD) and the active-pixel sensor (CMOS). For a CCD sensor, pixel values can only be read on a per-row basis. Each row of pixels is shifted, one by one, into a readout register. For a CMOS sensor, each pixel can be read individually and in parallel. 

CMOS is less expensive and consumes less energy without sacrificing image quality, in most cases. It can also achieve higher frame rates due to the parallel readout of pixel values. However, there are some specific scenarios in which CCD sensors still prevail—for example, when long exposure is necessary and very low-noise images are required, such as in astronomy. 

Electronic shutter 

There are two options for the electronic shutter: global or rolling. A global shutter exposes each pixel to incoming light at the same time. A rolling shutter exposes the pixel rows in a certain order (top to bottom, for example) and can cause distortion (Figure 1).

Two images of a helicopter showing distortion of moving blades caused by rolling shutter.
Figure 1. Distortion of rotor blades caused by rolling shutter

The global shutter is not impacted by motion blur and distortion due to object movement. It is much easier to sync multiple cameras with a global shutter because there is a single point in time when exposure starts. However, sensors with a global shutter are much more expensive than those with a rolling shutter. 

Color or monochrome 

In most cases, a monochrome image sensor is sufficient for typical machine vision tasks like fault detection, presence monitoring, and recording measurements.

With a monochrome sensor, each pixel is usually described by eight bits. With a color sensor, each pixel has eight bits for the red channel, eight bits for the green channel, and eight bits for the blue channel. The color sensor requires processing three times the amount of data, resulting in a higher processing time and, consequently, a slower frame rate.  

Dynamic range 

Dynamic range is the ratio between the maximum and minimum signal that is acquired by the sensor. At the upper limit, pixels appear white for higher values of intensity (saturation), while pixels appear black at the lower limit and below. An HDR of at least 80db is needed for indoor application and up to 140db is needed for outdoor application. 

Resolution 

Resolution is a sensor’s ability to reproduce object details. It can be influenced by factors such as the type of lighting used, the sensor pixel size, and the capabilities of the optics. The smaller the object detail, the higher the required resolution. 

Pixel resolution translates to how many millimeters each pixel is equal to on the image. The higher the resolution, the sharper your image will be. The camera or sensor’s resolution should enable coverage of a feature’s area of at least two pixels. 

CMOS sensors with high resolutions tend to have low frame rates. While a sensor may achieve the resolution you need, it will not capture the quality images you need without achieving enough frames per second. It is important to evaluate the speed of the sensor. 

A general rule of thumb to determine the resolution needed for the use case is shown below and in Figure 2.  The multiplier (2) represents the typical desire to have a minimum two pixels on an object in order to successfully detect it.

Resolution = 2times frac{Field  of  View (FOV)}{Size  of  feature  of  interest}

Diagram showing the representation of a person and the working distance from an object as an example of minimum object feature size of interest in the field of view.
Figure 2. Sensor resolution required is determined by lens field of view and feature of interest size

For example, suppose you have an image of an injury around the eye of a boxer. 

  • Resolution= 2times frac{2000}{4}
  • FOV, mm = 2000mm 
  • Size of feature of interest (the eye), mm = 4mm

Based on the calculation, 1000 x 1000, a one-megapixel camera should be sufficient to detect the eye using a CV or AI algorithm. 

Note that a sensor is made up of multiple rows of pixels. These pixels are also called photosites. The number of photons collected by a pixel is directly proportional to the size of the pixel. Selecting a larger pixel may seem tempting but may not be the optimal choice in all the cases. 

Small pixel  Sensitive to noise (-)  Higher spatial resolution for same sensor size (+) 
Large pixel  Less sensitive to noise (+)  Less spatial resolution for same sensor size (-) 
Table 1.  Pros and cons of small and large pixel size

Back-illuminated sensors maximize the amount of light being captured and converted by each photodiode. In front-illuminated sensors, metal wiring above the photodiodes blocks off some photons, hence reducing the amount of light captured.

On the left, a diagram of a front-illuminated structure with substrate, photodiodes, metal wiring, and microlenses. On the right, a diagram of a back-illuminated structure with metal wiring, photodiodes, and microlenses.
Figure 3. Cross-section of a front-illuminated structure (left) and a back-illuminated structure (right)

Frame rate and shutter speed 

The frame rate refers to the number of frames (or images captured) per second (FPS). The frame rate should be determined based on the number of inspections required per second. This correlates with the shutter speed (or exposure time), which is the time that the camera sensor is exposed to capture the image. 

Theoretically, the maximum frame rate is equal to the inverse of the exposure time. But achievable FPS is lower because of latency introduced by frame readout, sensor resolution, and the data transfer rate of the interface including cabling. 

FPS can be increased by reducing the need for large exposure times by adding additional lighting, binning the pixels. 

CMOS sensors can achieve higher FPS, as the process of reading out each pixel can be done more quickly than with the charge transfer in a CCD sensor’s shift register. 

Interface

There are multiple ways to connect the camera module to an embedded system. Typically, for evaluation purposes, cameras with USB and Ethernet interfaces are used because custom driver development is not needed. 

Other important parameters for interface selection are transmission length, data rate, and operating conditions. Table 2 lists the most popular interfaces. Each option has its pros and cons. 

Features  USB 3.2  Ethernet (1 GbE)  MIPI CSI-2  GMSL2  FPDLINK III 
Bandwidth  10Gbps  1Gbps  DPHY 2.5 Gbps/lane CPHY 5.71 Gbps/lane  6Gbps  4.2Gbps 
Cable length supported  Up to 100m 
Plug-and-play  Supported  Supported  Not supported  Not supported  Not supported 
Development costs  Low  Low  Medium to high  Medium to high  Medium to high 
Operating environment  Indoor  Indoor  Indoor  Indoor and outdoor  Indoor and outdoor 
Table 2. Comparison of various camera interfaces

Optics 

The basic purpose of an optical lens is to collect the light scattered by an object and recreate an image of the object on a light-sensitive image sensor (CCD or CMOS). The following factors should be considered when selecting an optimized lens-focal length, sensor format, field of view, aperture, chief ray angle, resolving power, and distortion. 

Lenses are manufactured with a limited number of standard focal lengths. Common lens focal lengths include 6mm, 8mm, 12.5mm, 25mm, and 50mm. 

Once you choose a lens with a focal length closest to the focal length required by your imaging system, you need to adjust the working distance to get the object under inspection in focus. Lenses with short focal lengths (less than 12mm) produce images with a significant amount of distortion. 

If your application is sensitive to image distortion, try to increase the working distance and use a lens with a higher focal length. If you cannot change the working distance, you are somewhat limited in choosing an optimized lens. 

  Wide-angle lens  Normal lens  Telephoto lens 
Focal length  50mm >=70mm 
Use case  Nearby scenes  Same as human eye  Far-away scenes 
Table 3. Main types of camera lenses

To attach a lens to a camera requires some type of mounting system. Both mechanical stability (a loose lens will deliver an out-of-focus image) and the distance to the sensor must be defined. 

To ensure compatibility between different lenses and cameras, the following standard lens mounts are defined. 

  Most popular For industrial applications
Lens mount M12/S mount C-mount
Flange focal length Non-standard 17.526mm
Threads (per mm) 0.5  0.75 
Sensor size accommodated (inches) Up to ⅔ Up to 1
Table 4. Common lens mounts used in embedded space

NVIDIA camera module partners 

NVIDIA maintains a rich ecosystem of partnerships with highly competent camera module makers all over the world. See Jetson Partner Supported Cameras for details. These partners can help you design imaging systems for your application from concept to production for the NVIDIA Jetson

Graphic showing NVIDIA Jetson with camera modules for various use cases and industries.
Figure 4. NVIDIA Jetson in combination with camera modules can be used across industries for various needs

Summary

This post has explained the most important camera characteristics to consider when selecting a camera for an embedded application. Although the selection process may seem daunting, the first step is to understand your key constraints based on design, performance, environment, and cost. 

Once you understand the constraints, then focus on the characteristics most relevant to your use case. For example, if the camera will be deployed away from the compute or in a rugged environment, consider using the GMSL interface. If the camera will be used in low-light conditions, consider a camera module with larger pixel and sensor sizes. If the camera will be used in a motion application, consider using a camera with a global shutter. 

To learn more, watch Optimize Your Edge Application: Unveiling the Right Combination of Jetson Processors and Cameras. For detailed specs on AI performance, GPU, CPU, and more for both Xavier and Orin-based Jetson modules, visit Jetson Modules

Categories
Misc

One Small Step for Artists, One Giant Leap for Creative-Kind

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks and demonstrates how NVIDIA Studio technology improves creative workflows.  When it comes to converting 2D concepts into 3D masterpieces, self-taught visual development artist Alex Treviño has confidence in the potential of all Read article >

Categories
Misc

Webinar: NVIDIA RTX Caustics Branch of Unreal Engine

The interior of an empty office building with shadows.Explore how ray-traced caustics combined with NVIDIA RTX features can enhance the performance of your games.The interior of an empty office building with shadows.

Explore how ray-traced caustics combined with NVIDIA RTX features can enhance the performance of your games.

Categories
Offsites

Differentially private median and more

Differential privacy (DP) is a rigorous mathematical definition of privacy. DP algorithms are randomized to protect user data by ensuring that the probability of any particular output is nearly unchanged when a data point is added or removed. Therefore, the output of a DP algorithm does not disclose the presence of any one data point. There has been significant progress in both foundational research and adoption of differential privacy with contributions such as the Privacy Sandbox and Google Open Source Library.

ML and data analytics algorithms can often be described as performing multiple basic computation steps on the same dataset. When each such step is differentially private, so is the output, but with multiple steps the overall privacy guarantee deteriorates, a phenomenon known as the cost of composition. Composition theorems bound the increase in privacy loss with the number k of computations: In the general case, the privacy loss increases with the square root of k. This means that we need much stricter privacy guarantees for each step in order to meet our overall privacy guarantee goal. But in that case, we lose utility. One way to improve the privacy vs. utility trade-off is to identify when the use cases admit a tighter privacy analysis than what follows from composition theorems.

Good candidates for such improvement are when each step is applied to a disjoint part (slice) of the dataset. When the slices are selected in a data-independent way, each point affects only one of the k outputs and the privacy guarantees do not deteriorate with k. However, there are applications in which we need to select the slices adaptively (that is, in a way that depends on the output of prior steps). In these cases, a change of a single data point may cascade — changing multiple slices and thus increasing composition cost.

In “Õptimal Differentially Private Learning of Thresholds and Quasi-Concave Optimization”, presented at STOC 2023, we describe a new paradigm that allows for slices to be selected adaptively and yet avoids composition cost. We show that DP algorithms for multiple fundamental aggregation and learning tasks can be expressed in this Reorder-Slice-Compute (RSC) paradigm, gaining significant improvements in utility.

The Reorder-Slice-Compute (RSC) paradigm

An algorithm A falls in the RSC paradigm if it can be expressed in the following general form (see visualization below). The input is a sensitive set D of data points. The algorithm then performs a sequence of k steps as follows:

  1. Select an ordering over data points, a slice size m, and a DP algorithm M. The selection may depend on the output of A in prior steps (and hence is adaptive).
  2. Slice out the (approximately) top m data points according to the order from the dataset D, apply M to the slice, and output the result.
A visualization of three Reorder-Slice-Compute (RSC) steps.

If we analyze the overall privacy loss of an RSC algorithm using DP composition theorems, the privacy guarantee suffers from the expected composition cost, i.e., it deteriorates with the square root of the number of steps k. To eliminate this composition cost, we provide a novel analysis that removes the dependence on k altogether: the overall privacy guarantee is close to that of a single step! The idea behind our tighter analysis is a novel technique that limits the potential cascade of affected steps when a single data point is modified (details in the paper).

Tighter privacy analysis means better utility. The effectiveness of DP algorithms is often stated in terms of the smallest input size (number of data points) that suffices in order to release a correct result that meets the privacy requirements. We describe several problems with algorithms that can be expressed in the RSC paradigm and for which our tighter analysis improved utility.

Private interval point

We start with the following basic aggregation task. The input is a dataset D of n points from an ordered domain X (think of the domain as the natural numbers between 1 and |X|). The goal is to return a point y in X that is in the interval of D, that is between the minimum and the maximum points in D.

The solution to the interval point problem is trivial without the privacy requirement: simply return any point in the dataset D. But this solution is not privacy-preserving as it discloses the presence of a particular datapoint in the input. We can also see that if there is only one point in the dataset, a privacy-preserving solution is not possible, as it must return that point. We can therefore ask the following fundamental question: What is the smallest input size N for which we can solve the private interval point problem?

It is known that N must increase with the domain size |X| and that this dependence is at least the iterated log function log* |X| [1, 2]. On the other hand, the best prior DP algorithm required the input size to be at least (log* |X|)1.5. To close this gap, we designed an RSC algorithm that requires only an order of log* |X| points.

The iterated log function is extremely slow growing: It is the number of times we need to take a logarithm of a value before we reach a value that is equal to or smaller than 1. How did this function naturally come out in the analysis? Each step of the RSC algorithm remapped the domain to a logarithm of its prior size. Therefore there were log* |X| steps in total. The tighter RSC analysis eliminated a square root of the number of steps from the required input size.

Even though the interval point task seems very basic, it captures the essence of the difficulty of private solutions for common aggregation tasks. We next describe two of these tasks and express the required input size to these tasks in terms of N.

Private approximate median

One of these common aggregation tasks is approximate median: The input is a dataset D of n points from an ordered domain X. The goal is to return a point y that is between the ⅓ and ⅔ quantiles of D. That is, at least a third of the points in D are smaller or equal to y and at least a third of the points are larger or equal to y. Note that returning an exact median is not possible with differential privacy, since it discloses the presence of a datapoint. Hence we consider the relaxed requirement of an approximate median (shown below).

We can compute an approximate median by finding an interval point: We slice out the N smallest points and the N largest points and then compute an interval point of the remaining points. The latter must be an approximate median. This works when the dataset size is at least 3N.

An example of a data D over domain X, the set of interval points, and the set of approximate medians.

Private learning of axis-aligned rectangles

For the next task, the input is a set of n labeled data points, where each point x = (x1,….,xd) is a d-dimensional vector over a domain X. Displayed below, the goal is to learn values ai , bi for the axes i=1,…,d that define a d-dimensional rectangle, so that for each example x

  • If x is positively labeled (shown as red plus signs below) then it lies within the rectangle, that is, for all axes i, xi is in the interval [ai ,bi], and
  • If x is negatively labeled (shown as blue minus signs below) then it lies outside the rectangle, that is, for at least one axis i, xi is outside the interval [ai ,bi].
A set of 2-dimensional labeled points and a respective rectangle.

Any DP solution for this problem must be approximate in that the learned rectangle must be allowed to mislabel some data points, with some positively labeled points outside the rectangle or negatively labeled points inside it. This is because an exact solution could be very sensitive to the presence of a particular data point and would not be private. The goal is a DP solution that keeps this necessary number of mislabeled points small.

We first consider the one-dimensional case (d = 1). We are looking for an interval [a,b] that covers all positive points and none of the negative points. We show that we can do this with at most 2N mislabeled points. We focus on the positively labeled points. In the first RSC step we slice out the N smallest points and compute a private interval point as a. We then slice out the N largest points and compute a private interval point as b. The solution [a,b] correctly labels all negatively labeled points and mislabels at most 2N of the positively labeled points. Thus, at most ~2N points are mislabeled in total.

Illustration for d = 1, we slice out N left positive points and compute an interval point a, slice out N right positive points and compute an interval point b.

With d > 1, we iterate over the axes i = 1,….,d and apply the above for the ith coordinates of input points to obtain the values ai , bi. In each iteration, we perform two RSC steps and slice out 2N positively labeled points. In total, we slice out 2dN points and all remaining points were correctly labeled. That is, all negatively-labeled points are outside the final d-dimensional rectangle and all positively-labeled points, except perhaps ~2dN, lie inside the rectangle. Note that this algorithm uses the full flexibility of RSC in that the points are ordered differently by each axis. Since we perform d steps, the RSC analysis shaves off a factor of square root of d from the number of mislabeled points.

Training ML models with adaptive selection of training examples

The training efficiency or performance of ML models can sometimes be improved by selecting training examples in a way that depends on the current state of the model, e.g., self-paced curriculum learning or active learning.

The most common method for private training of ML models is DP-SGD, where noise is added to the gradient update from each minibatch of training examples. Privacy analysis with DP-SGD typically assumes that training examples are randomly partitioned into minibatches. But if we impose a data-dependent selection order on training examples, and further modify the selection criteria k times during training, then analysis through DP composition results in deterioration of the privacy guarantees of a magnitude equal to the square root of k.

Fortunately, example selection with DP-SGD can be naturally expressed in the RSC paradigm: each selection criteria reorders the training examples and each minibatch is a slice (for which we compute a noisy gradient). With RSC analysis, there is no privacy deterioration with k, which brings DP-SGD training with example selection into the practical domain.

Conclusion

The RSC paradigm was introduced in order to tackle an open problem that is primarily of theoretical significance, but turns out to be a versatile tool with the potential to enhance data efficiency in production environments.

Acknowledgments

The work described here was done jointly with Xin Lyu, Jelani Nelson, and Tamas Sarlos.