The responsible research and development of machine learning (ML) can play a pivotal role in helping to solve a wide variety of societal challenges. At Google, our research reflects our AI Principles, from helping to protect patients from medication errors and improving flood forecasting models, to presenting methods that tackle unfair bias in products, such as Google Translate, and providing resources for other researchers to do the same.
One broad category for applying ML responsibly is the task of classification — systems that sort data into labeled categories. At Google, such models are used throughout our products to enforce policies, ranging from the detection of hate speech to age-appropriate content filtering. While these classifiers serve vital functions, it is also essential that they are built in ways that minimize unfair biases for users.
Today, we are announcing the release of MinDiff, a new regularization technique available in the TF Model Remediation library for effectively and efficiently mitigating unfair biases when training ML models. In this post, we discuss the research behind this technique and explain how it addresses the practical constraints and requirements we’ve observed when incorporating it in Google’s products.
Unfair Biases in Classifiers
To illustrate how MinDiff can be used, consider an example of a product policy classifier that is tasked with identifying and removing text comments that could be considered toxic. One challenge is to make sure that the classifier is not unfairly biased against submissions from a particular group of users, which could result in incorrect removal of content from these groups.
The academic community has laid a solid theoretical foundation for ML fairness, offering a breadth of perspectives on what unfair bias means and on the tensions between different frameworks for evaluating fairness. One of the most common metrics is equality of opportunity, which, in our example, means measuring and seeking to minimize the difference in false positive rate (FPR) across groups. In the example above, this means that the classifier should not be more likely to incorrectly remove safe comments from one group than another. Similarly, the classifier’s false negative rate should be equal between groups. That is, the classifier should not miss toxic comments against one group more than it does for another.
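To make this concrete, below is a minimal sketch of how one might measure an equality-of-opportunity gap (the difference in false positive rates across groups) on a labeled evaluation set. The arrays and group names are purely illustrative.

```python
import numpy as np

def false_positive_rate(labels, preds):
    """FPR = fraction of truly non-toxic (label == 0) comments flagged as toxic."""
    negatives = labels == 0
    return float(np.mean(preds[negatives] == 1)) if negatives.any() else 0.0

# Toy evaluation data: true labels, model decisions, and a group id per comment.
labels = np.array([0, 0, 1, 0, 1, 0, 0, 1])
preds  = np.array([1, 0, 1, 0, 1, 0, 1, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])

fpr_by_group = {g: false_positive_rate(labels[groups == g], preds[groups == g])
                for g in np.unique(groups)}
fpr_gap = max(fpr_by_group.values()) - min(fpr_by_group.values())
print(fpr_by_group, fpr_gap)  # the smaller the gap, the closer to equality of opportunity
```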
When the end goal is to improve products, it’s important to be able to scale unfair bias mitigation to many models. However, this poses a number of challenges:
- Sparse demographic data: The original work on equality of opportunity proposed a post-processing approach to the problem, which consisted of assigning each user group a different classifier threshold at serving time to offset biases of the model. However, in practice this is often not possible for many reasons, such as privacy policies. For example, demographics are often collected by users self-identifying and opting in; while some users will choose to do this, others may opt out or delete their data. Even for in-process solutions (i.e., methods that change how a model is trained), one needs to assume that most data will not have associated demographics, and thus needs to make efficient use of the few examples for which demographics are known.
- Ease of Use: In order for any technique to be adopted broadly, it should be easy to incorporate into existing model architectures, and not be highly sensitive to hyperparameters. While an early approach to incorporating ML fairness principles into applications utilized adversarial learning, we found that it too frequently caused models to degenerate during training, which made it difficult for product teams to iterate and made new product teams wary.
- Quality: The method for removing unfair biases should also reduce the overall classification performance (e.g., accuracy) as little as possible. Because any decrease in accuracy caused by the mitigation approach could result in the moderation model allowing more toxic comments, striking the right balance is crucial.
MinDiff Framework
We iteratively developed the MinDiff framework over the past few years to meet these design requirements. Because demographic information is so rarely known, we utilize in-process approaches in which the model’s training objective is augmented with an objective specifically focused on removing biases. This new objective is then optimized over the small sample of data with known demographic information. To improve ease of use, we switched from adversarial training to a regularization framework, which penalizes statistical dependency between the model’s predictions and demographic information for non-harmful examples. This encourages the model to equalize error rates across groups, e.g., the rate at which non-harmful examples are misclassified as toxic.
There are several ways to encode this dependency between predictions and demographic information. Our initial MinDiff implementation minimized the correlation between the predictions and the demographic group, which essentially optimized for the average and variance of predictions to be equal across groups, even if the distributions still differ afterward. We have since improved MinDiff further by considering the maximum mean discrepancy (MMD) loss, which is closer to optimizing for the distribution of predictions to be independent of demographics. We have found that this approach is better able to both remove biases and maintain model accuracy.
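As a rough illustration of the idea (a simplified sketch, not the TF Model Remediation implementation), an MMD penalty with a Gaussian kernel can be computed between the prediction scores of two groups and added, with a tunable weight, to the primary training loss:

```python
import tensorflow as tf

def mmd_penalty(scores_a, scores_b, bandwidth=0.1):
    """Maximum mean discrepancy between two 1-D tensors of prediction scores,
    using a Gaussian kernel; a small value means similar score distributions."""
    def kernel(x, y):
        return tf.exp(-tf.square(x[:, None] - y[None, :]) / (2.0 * bandwidth ** 2))
    return (tf.reduce_mean(kernel(scores_a, scores_a))
            + tf.reduce_mean(kernel(scores_b, scores_b))
            - 2.0 * tf.reduce_mean(kernel(scores_a, scores_b)))

# During training, the penalty is computed only on the (small) slice of
# non-harmful examples with known demographic labels, e.g.:
# total_loss = primary_loss + min_diff_weight * mmd_penalty(scores_group_a, scores_group_b)
```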
MinDiff with MMD better closes the FPR gap with less decrease in accuracy (on an academic benchmark dataset).
To date we have launched modeling improvements across several classifiers at Google that moderate content quality. We went through multiple iterations to develop a robust, responsible, and scalable approach, solving research challenges and enabling broad adoption.
Gaps in classifier error rates are an important class of unfair bias to address, but not the only one that arises in ML applications. For ML researchers and practitioners, we hope this work can further advance research toward addressing even broader classes of unfair biases and the development of approaches that can be used in practical applications. In addition, we hope that the release of the MinDiff library and the associated demos and documentation, along with the tools and experience shared here, can help practitioners improve their models and products.
Acknowledgements
This research effort on ML Fairness in classification was jointly led with Jilin Chen, Shuo Chen, Ed H. Chi, Tulsee Doshi, and Hai Qian. Further, this work was pursued in collaboration with Jonathan Bischof, Qiuwen Chen, Pierre Kreitmann, and Christine Luu. The MinDiff infrastructure was also developed in collaboration with Nick Blumm, James Chen, Thomas Greenspan, Christina Greer, Lichan Hong, Manasi Joshi, Maciej Kula, Summer Misherghi, Dan Nanas, Sean O’Keefe, Mahesh Sathiamoorthy, Catherina Xu, and Zhe Zhao. (All names are listed in alphabetical order of last names.)
Creating art for digital video games takes a high degree of artistic creativity and technical knowledge, while also requiring game artists to quickly iterate on ideas and produce a high volume of assets, often in the face of tight deadlines. What if artists had a paintbrush that acted less like a tool and more like an assistant? A machine learning model acting as such a paintbrush could reduce the amount of time necessary to create high-quality art without sacrificing artistic choices, perhaps even enhancing creativity.
Today, we present Chimera Painter, a trained machine learning (ML) model that automatically creates a fully fleshed out rendering from a user-supplied creature outline. Employed as a demo application, Chimera Painter adds features and textures to a creature outline segmented with body part labels, such as “wings” or “claws”, when the user clicks the “transform” button. Below is an example using the demo with one of the preset creature outlines.
In this post, we describe some of the challenges in creating the ML model behind Chimera Painter and demonstrate how one might use the tool for the creation of video game-ready assets.
Prototyping for a New Type of Model
In developing an ML model to produce video-game ready creature images, we created a digital card game prototype around the concept of combining creatures into new hybrids that can then battle each other. In this game, a player would begin with cards of real-world animals (e.g., an axolotl or a whale) and could make them more powerful by combining them (making the dreaded Axolotl-Whale chimera). This provided a creative environment for demonstrating an image-generating model, as the number of possible chimeras necessitated a method for quickly designing large volumes of artistic assets that could be combined naturally, while still retaining identifiable visual characteristics of the original creatures.
Since our goal was to create high-quality creature card images guided by artist input, we experimented with generative adversarial networks (GANs), informed by artist feedback, to create creature images that would be appropriate for our fantasy card game prototype. GANs pair two convolutional neural networks against each other: a generator network to create new images and a discriminator network to determine if these images are samples from the training dataset (in this case, artist-created images) or not. We used a variant called a conditional GAN, where the generator takes a separate input to guide the image generation process. Interestingly, our approach was a departure from other GAN efforts, which typically focus on photorealism.
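For intuition, here is a heavily simplified Keras sketch of a conditional GAN in which the generator is conditioned on a body-part segmentation map and the discriminator scores (image, segmentation map) pairs. The layer sizes, image resolution, and class count are illustrative assumptions, not the Chimera Painter architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical shapes: 256x256 images, 10 body-part classes in the segmentation map.
NUM_PARTS, IMG_SIZE = 10, 256

def build_generator():
    """Conditional generator: maps a one-hot segmentation map to an RGB image."""
    seg = layers.Input((IMG_SIZE, IMG_SIZE, NUM_PARTS))
    x = layers.Conv2D(64, 4, 2, padding="same", activation="relu")(seg)
    x = layers.Conv2D(128, 4, 2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 4, 2, padding="same", activation="relu")(x)
    rgb = layers.Conv2DTranspose(3, 4, 2, padding="same", activation="tanh")(x)
    return tf.keras.Model(seg, rgb)

def build_discriminator():
    """Conditional discriminator: scores an (image, segmentation map) pair as real or fake."""
    img = layers.Input((IMG_SIZE, IMG_SIZE, 3))
    seg = layers.Input((IMG_SIZE, IMG_SIZE, NUM_PARTS))
    x = layers.Concatenate()([img, seg])
    x = layers.Conv2D(64, 4, 2, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 4, 2, padding="same", activation="relu")(x)
    score = layers.Dense(1)(layers.GlobalAveragePooling2D()(x))
    return tf.keras.Model([img, seg], score)
```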
To train the GANs, we created a dataset of full color images with single-species creature outlines adapted from 3D creature models. The creature outlines characterized the shape and size of each creature, and provided a segmentation map that identified individual body parts. After model training, the model was tasked with generating multi-species chimeras, based on outlines provided by artists. The best performing model was then incorporated into Chimera Painter. Below we show some sample assets generated using the model, including single-species creatures, as well as the more complex multi-species chimeras.
Generated card art integrated into the card game prototype showing basic creatures (bottom row) and chimeras from multiple creatures, including an Antlion-Porcupine, Axolotl-Whale, and a Crab-Antlion-Moth (top row). More info about the game itself is detailed in this Stadia Research presentation.
Learning to Generate Creatures with Structure
An issue with using GANs for generating creatures was the potential for loss of anatomical and spatial coherence when rendering subtle or low-contrast parts of images, despite these being of high perceptual importance to humans. Examples of this can include eyes, fingers, or even distinguishing between overlapping body parts with similar textures (see the affectionately named BoggleDog below).
GAN-generated image showing mismatched body parts.
Generating chimeras required a new non-photographic fantasy-styled dataset with unique characteristics, such as dramatic perspective, composition, and lighting. Existing repositories of illustrations were not appropriate to use as datasets for training an ML model, because they may be subject to licensing restrictions, have conflicting styles, or simply lack the variety needed for this task.
To solve this, we developed a new artist-led, semi-automated approach for creating an ML training dataset from 3D creature models, which allowed us to work at scale and rapidly iterate as needed. In this process, artists would create or obtain a set of 3D creature models, one for each creature type needed (such as hyenas or lions). Artists then produced two sets of textures that were overlaid on the 3D model using the Unreal Engine — one with the full color texture (left image, below) and the other with flat colors for each body part (e.g., head, ears, neck, etc), called a “segmentation map” (right image, below). This second set of body part segments was given to the model at training to ensure that the GAN learned about body part-specific structure, shapes, textures, and proportions for a variety of creatures.
Example dataset training image and its paired segmentation map.
The 3D creature models were all placed in a simple 3D scene, again using the Unreal Engine. A set of automated scripts would then take this 3D scene and interpolate between different poses, viewpoints, and zoom levels for each of the 3D creature models, creating the full color images and segmentation maps that formed the training dataset for the GAN. Using this approach, we generated 10,000+ image + segmentation map pairs per 3D creature model, saving the artists millions of hours of time compared to creating such data manually (at approximately 20 minutes per image).
Fine Tuning
The GAN had many different hyper-parameters that could be adjusted, leading to different qualities in the output images. In order to better understand which versions of the model were better than others, artists were provided samples for different creature types generated by these models and asked to cull them down to a few best examples. We gathered feedback about desired characteristics present in these examples, such as a feeling of depth, style with regard to creature textures, and realism of faces and eyes. This information was used both to train new versions of the model and, after the model had generated hundreds of thousands of creature images, to select the best image from each creature category (e.g., gazelle, lynx, gorilla, etc).
We tuned the GAN for this task by focusing on the perceptual loss. This loss function component (also used in Stadia’s Style Transfer ML) computes a difference between two images using extracted features from a separate convolutional neural network (CNN) that was previously trained on millions of photographs from the ImageNet dataset. The features are extracted from different layers of the CNN and a weight is applied to each, which affects their contribution to the final loss value. We discovered that these weights were critically important in determining what a final generated image would look like. Below are some examples from the GAN trained with different perceptual loss weights.
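A sketch of what such a weighted perceptual loss can look like, assuming a VGG16 feature extractor for concreteness (the post does not name the CNN used; the chosen layers and weights below are illustrative):

```python
import tensorflow as tf

# ImageNet-pretrained VGG16 as a frozen feature extractor (input preprocessing
# omitted for brevity). Tuning the per-layer weights changes which details of
# the image the GAN is pushed to preserve.
vgg = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
feature_layers = ["block2_conv2", "block3_conv3", "block4_conv3"]
layer_weights = [1.0, 0.5, 0.25]
feature_model = tf.keras.Model(
    vgg.input, [vgg.get_layer(name).output for name in feature_layers])
feature_model.trainable = False

def perceptual_loss(real_images, generated_images):
    """Weighted L2 distance between CNN features of real and generated images."""
    real_feats = feature_model(real_images)
    fake_feats = feature_model(generated_images)
    loss = 0.0
    for w, rf, ff in zip(layer_weights, real_feats, fake_feats):
        loss += w * tf.reduce_mean(tf.square(rf - ff))
    return loss
```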
Dino-Bat Chimeras generated using varying perceptual loss weights.
Some of the variation in the images above is due to the fact that the dataset includes multiple textures for each creature (for example, a reddish or grayish version of the bat). However, ignoring the coloration, many differences are directly tied to changes in perceptual loss values. In particular, we found that certain values brought out sharper facial features (e.g., bottom right vs. top right) or produced “smooth” versus “patterned” textures (top right vs. bottom left) that made generated creatures feel more real.
Here are some creatures generated from the GAN trained with different perceptual loss weights, showing off a small sample of the outputs and poses that the model can handle.
Creatures generated using different models.
A generated chimera (Dino-Bat-Hyena, to be exact) created using the conditional GAN. Output from the GAN (left) and the post-processed / composited card (right).
Chimera Painter
The trained GAN is now available in the Chimera Painter demo, allowing artists to work iteratively with the model, rather than drawing dozens of similar creatures from scratch. An artist can select a starting point and then adjust the shape, type, or placement of creature parts, enabling rapid exploration and the creation of a large volume of images. The demo also allows for uploading a creature outline created in an external program, like Photoshop. Simply download one of the preset creature outlines to get the colors needed for each creature part, use it as a template for drawing one outside of Chimera Painter, and then use the “Load” button on the demo to flesh out your creation from that outline.
It is our hope that these GAN models and the Chimera Painter demonstration tool might inspire others to think differently about their art pipeline. What can one create when using machine learning as a paintbrush?
Acknowledgments
This project was conducted in collaboration with many people. Thanks to Ryan Poplin, Lee Dotson, Trung Le, Monica Dinculescu, Marc Destefano, Aaron Cammarata, Maggie Oh, Richard Wu, Ji Hun Kim, Erin Hoffman-John, and Colin Boswell. Thanks to everyone who pitched in to give hours of art direction, technical feedback, and drawings of fantastic creatures.
As wearables and handheld devices decrease in size, haptics become an increasingly vital channel for feedback, be it through silent alerts or a subtle “click” sensation when pressing buttons on a touch screen. Haptic feedback, ubiquitous in nearly all wearable devices and mobile phones, is commonly enabled by a linear resonant actuator (LRA), a small linear motor that leverages resonance to provide a strong haptic signal in a small package. However, the touch and pressure sensing needed to activate the haptic feedback tend to depend on additional, separate hardware which increases the price, size and complexity of the system.
In “Haptics with Input: Back-EMF in Linear Resonant Actuators to Enable Touch, Pressure and Environmental Awareness”, presented at ACM UIST 2020, we demonstrate that widely available LRAs can sense a wide range of external information, such as touch, tap and pressure, in addition to being able to relay information about contact with the skin, objects and surfaces. We achieve this with off-the-shelf LRAs by multiplexing the actuation with short pulses of custom waveforms that are designed to enable sensing using the back-EMF voltage. We demonstrate the potential of this approach to enable expressive discrete buttons and vibrotactile interfaces and show how the approach could bring rich sensing opportunities to integrated haptics modules in mobile devices, increasing sensing capabilities with fewer components. Our technique is potentially compatible with many existing LRA drivers, as they already employ back-EMF sensing for autotuning of the vibration frequency.
Different off-the-shelf LRAs that work using this technique.
Back-EMF Principle in an LRA
Inside the LRA enclosure is a magnet attached to a small mass, both moving freely on a spring. The magnet moves in response to the excitation voltage introduced by the voice coil. The motion of the oscillating mass produces a counter-electromotive force, or back-EMF, which is a voltage proportional to the rate of change of magnetic flux. A greater oscillation speed creates a larger back-EMF voltage, while a stationary mass generates zero back-EMF voltage.
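To make the proportionality concrete, in the standard linear-motor model (our notation, not taken from the paper):

V_bemf(t) = -k_e * dx(t)/dt,

where x(t) is the displacement of the magnet-mass assembly and k_e is the actuator's electromechanical constant; a stationary mass (dx/dt = 0) therefore produces zero back-EMF, as noted above.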
Anatomy of the LRA.
Active Back-EMF for Sensing
Touching or making contact with the LRA during vibration changes the velocity of the interior mass, as energy is dissipated into the contact object. This works well with soft materials that deform under pressure, such as the human body. A finger, for example, absorbs different amounts of energy depending on the contact force as it flattens against the LRA. By driving the LRA with small amounts of energy, we can measure this phenomenon using the back-EMF voltage. Because leveraging the back-EMF behavior for sensing is an active process, the key insight that enabled this work was the design of a custom, off-resonance driver waveform that allows continuous sensing while minimizing vibrations, sound and power consumption.
Touch and pressure sensing on the LRA.
We measure back-EMF from the floating voltage between the two LRA leads, which requires disconnecting the motor driver briefly to avoid disturbances. While the driver is disconnected, the mass is still oscillating inside the LRA, producing an oscillating back-EMF voltage. Because commercial back-EMF sensing LRA drivers do not provide the raw data, we designed a custom circuit that is able to pick up and amplify the small back-EMF voltage. We also generated custom drive pulses that minimize vibrations and energy consumption.
Simplified schematic of the LRA driver and the back-EMF measurement circuit for active sensing.
Applications
The behavior of the LRAs used in mobile phones is the same, whether they are on a table, on a soft surface, or held in the hand. This may cause problems, as a vibrating phone could slide off a glass table or emit loud and unnecessary vibration sounds. Ideally, the LRA on a phone would automatically adjust based on its environment. We demonstrate our approach for sensing using the LRA back-EMF technique by wiring directly to a Pixel 4’s LRA, and then classifying whether the phone is held in hand, placed on a soft surface (foam), or placed on a table.
Sensing phone surroundings.
We also present a prototype that demonstrates how LRAs could be used as combined input/output devices in portable electronics. We attached two LRAs, one on the left and one on the right side of a phone. The buttons provide tap, touch, and pressure sensing. They are also programmed to provide haptic feedback once a touch is detected.
Pressure-sensitive side buttons.
There are a number of wearable tactile aid devices, such as sleeves, vests, and bracelets. To transmit tactile feedback to the skin with consistent force, the tactor has to apply the right pressure; it cannot be too loose or too tight. Currently, the typical way to do so is through manual adjustment, which can be inconsistent and lacks measurable feedback. We show how the LRA back-EMF technique can be used to continuously monitor the fit of a bracelet device and prompt the user if it’s too tight, too loose, or just right.
Fit sensing bracelet.
Evaluating an LRA as a Sensor
The LRA works well as a pressure sensor, because it has a quadratic response to the force magnitude during touch. Our method works for all five off-the-shelf LRA types that we evaluated. Because the typical current draw is only 4.27 mA, all-day sensing would only reduce the battery life of a Pixel 4 phone from 25 to 24 hours. The power consumption can be greatly reduced by using low-power amplifiers and employing active sensing only when needed, such as when the phone is active and interacting with the user.
Back-EMF voltage changes when pressure is applied with a finger.
The challenge with active sensing is to minimize vibrations, so they are not perceptible when touching and do not produce audible sound. We optimize the active sensing to produce only 2 dB of sound and 0.45 m/s² of peak-to-peak acceleration, which is just barely perceptible by a finger and is quiet, in contrast to the 8.49 m/s² used for regular haptic feedback.
Future Work and Conclusion
To see the work presented here in action, please see the video below.
In the future, we plan to explore other sensing techniques; measuring the current, for example, could be an alternative approach. Also, using machine learning could potentially improve the sensing and provide more accurate classification of the complex back-EMF patterns. Our method could be developed further to enable closed-loop feedback with the actuator and sensor, which would allow the actuator to provide the same force regardless of external conditions.
We believe that this work opens up new opportunities for leveraging existing, ubiquitous hardware to provide rich interactions and closed-loop feedback with haptic actuators.
Acknowledgments
This work was done by Artem Dementyev, Alex Olwal, and Richard Lyon. Thanks to Mathieu Le Goc and Thad Starner for feedback on the paper.
As natural language processing (NLP) models become more powerful and are deployed in more real-world contexts, understanding their behavior is becoming increasingly critical. While advances in modeling have brought unprecedented performance on many NLP tasks, many research questions remain about not only the behavior of these models under domain shift and adversarial settings, but also their tendencies to behave according to social biases or shallow heuristics.
For any new model, one might want to know in which cases a model performs poorly, why a model makes a particular prediction, or whether a model will behave consistently under varying inputs, such as changes to textual style or pronoun gender. But, despite the recent explosion of work on model understanding and evaluation, there is no “silver bullet” for analysis. Practitioners must often experiment with many techniques, looking at local explanations, aggregate metrics, and counterfactual variations of the input to build a better understanding of model behavior, with each of these techniques often requiring its own software package or bespoke tool. Our previously released What-If Tool was built to address this challenge by enabling black-box probing of classification and regression models, thus enabling researchers to more easily debug performance and analyze the fairness of machine learning models through interaction and visualization. But there was still a need for a toolkit that would address challenges specific to NLP models.
With these challenges in mind, we built and open-sourced the Language Interpretability Tool (LIT), an interactive platform for NLP model understanding. LIT builds upon the lessons learned from the What-If Tool with greatly expanded capabilities, which cover a wide range of NLP tasks including sequence generation, span labeling, classification and regression, along with customizable and extensible visualizations and model analysis.
LIT supports local explanations, including salience maps, attention, and rich visualizations of model predictions, as well as aggregate analysis including metrics, embedding spaces, and flexible slicing. It allows users to easily hop between visualizations to test local hypotheses and validate them over a dataset. LIT provides support for counterfactual generation, in which new data points can be added on the fly, and their effect on the model visualized immediately. Side-by-side comparison allows for two models, or two individual data points, to be visualized simultaneously. More details about LIT can be found in our system demonstration paper, which was presented at EMNLP 2020.
Exploring a sentiment classifier with LIT.
Customizability
In order to better address the broad range of users with different interests and priorities that we hope will use LIT, we’ve built the tool to be easily customizable and extensible from the start. Using LIT on a particular NLP model and dataset only requires writing a small bit of Python code. Custom components, such as task-specific metrics calculations or counterfactual generators, can be written in Python and added to a LIT instance through our provided APIs. Also, the front end itself can be customized, with new modules that integrate directly into the UI. For more on extending the tool, check out our documentation on GitHub.
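As a rough sketch of that Python setup, following the patterns in the LIT documentation (the toy dataset, the stand-in classifier, and some signature details are illustrative assumptions and may have evolved since this release):

```python
from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.api import dataset as lit_dataset
from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types

def my_classifier(sentence):
  # Stand-in for a real model: returns [p_negative, p_positive] from a toy heuristic.
  p_pos = 0.9 if "great" in sentence else 0.1
  return [1 - p_pos, p_pos]

class ToySentimentData(lit_dataset.Dataset):
  """Two hand-written examples, just to show the expected structure."""
  def __init__(self):
    self._examples = [{"sentence": "a great movie", "label": "1"},
                      {"sentence": "truly terrible", "label": "0"}]
  def spec(self):
    return {"sentence": lit_types.TextSegment(),
            "label": lit_types.CategoryLabel(vocab=["0", "1"])}

class ToySentimentModel(lit_model.Model):
  """Wraps the classifier so LIT knows its input and output fields."""
  def input_spec(self):
    return {"sentence": lit_types.TextSegment()}
  def output_spec(self):
    return {"probas": lit_types.MulticlassPreds(vocab=["0", "1"], parent="label")}
  def predict_minibatch(self, inputs):
    return [{"probas": my_classifier(ex["sentence"])} for ex in inputs]

server = dev_server.Server(models={"sst": ToySentimentModel()},
                           datasets={"toy": ToySentimentData()},
                           **server_flags.get_flags())
server.serve()
```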
Demos
To illustrate some of the capabilities of LIT, we have created a few demos using pre-trained models. The full list is available on the LIT website, and we describe two of them here:
- Sentiment analysis: In this example, a user can explore a BERT-based binary classifier that predicts if a sentence has positive or negative sentiment. The demo uses the Stanford Sentiment Treebank of sentences from movie reviews to demonstrate model behavior. One can examine local explanations using saliency maps provided by a variety of techniques (such as LIME and integrated gradients), and can test model behavior with perturbed (counterfactual) examples using techniques such as back-translation, word replacement, or adversarial attacks. These techniques can help pinpoint under what scenarios a model fails, and whether those failures are generalizable, which can then be used to inform how best to improve a model.
Analyzing token-based salience of an incorrect prediction. The word “laughable” seems to be incorrectly raising the positive sentiment score of this example.
- Masked word prediction: Masked language modeling is a “fill-in-the-blank” task, where the model predicts different words that could complete a sentence. For example, given the prompt, “I took my ___ for a walk”, the model might predict a high score for “dog.” In LIT one can explore this interactively by typing in sentences or choosing from a pre-loaded corpus, and then clicking specific tokens to see what a model like BERT understands about language, or about the world.
Interactively selecting a token to mask, and viewing a language model’s predictions.
LIT in Practice and Future Work
Although LIT is a new tool, we have already seen the value that it can provide for model understanding. Its visualizations can be used to find patterns in model behavior, such as outlying clusters in embedding space, or words with outsized importance to the predictions. Exploration in LIT can test for potential biases in models, as demonstrated in our case study of LIT exploring gender bias in a coreference model. This type of analysis can inform next steps in improving model performance, such as applying MinDiff to mitigate systemic bias. It can also be used as an easy and fast way to create an interactive demo for any NLP model.
Check out the tool either through our provided demos, or by bringing up a LIT server for your own models and datasets. The work on LIT has just started, and there are a number of new capabilities and refinements planned, including the addition of new interpretability techniques from cutting edge ML and NLP research. If there are other techniques that you’d like to see added to the tool, please let us know! Join our mailing list to stay up to date as LIT evolves. And as the code is open-source, we welcome feedback on and contributions to the tool.
Acknowledgments
LIT is a collaborative effort between the Google Research PAIR and Language teams. This post represents the work of the many contributors across Google, including Andy Coenen, Ann Yuan, Carey Radebaugh, Ellen Jiang, Emily Reif, Jasmijn Bastings, Kristen Olson, Leslie Lai, Mahima Pushkarna, Sebastian Gehrmann, and Tolga Bolukbasi. We would like to thank all those who contributed to the project, both inside and outside Google, and the teams that have piloted its use and provided valuable feedback.
Last year we launched Recorder, a new kind of recording app that made audio recording smarter and more useful by leveraging on-device machine learning (ML) to transcribe the recording, highlight audio events, and suggest appropriate tags for titles. Recorder makes editing, sharing and searching through transcripts easier. Yet because Recorder can transcribe very long recordings (up to 18 hours!), it can still be difficult for users to find specific sections, necessitating a new solution to quickly navigate such long transcripts.
To increase the navigability of content, we introduce Smart Scrolling, a new ML-based feature in Recorder that automatically marks important sections in the transcript, chooses the most representative keywords from each section, and then surfaces those keywords on the vertical scrollbar, like chapter headings. The user can then scroll through the keywords or tap on them to quickly navigate to the sections of interest. The models used are lightweight enough to be executed on-device without the need to upload the transcript, thus preserving user privacy.
Smart Scrolling feature UX
Under the Hood
The Smart Scrolling feature is composed of two distinct tasks. The first extracts representative keywords from each section and the second picks which sections in the text are the most informative and unique.
For each task, we utilize two different natural language processing (NLP) approaches: a distilled bidirectional transformer (BERT) model pre-trained on data sourced from a Wikipedia dataset, alongside a modified extractive term frequency–inverse document frequency (TF-IDF) model. By using the bidirectional transformer and the TF-IDF-based models in parallel for both the keyword extraction and important section identification tasks, alongside aggregation heuristics, we were able to harness the advantages of each approach and mitigate their respective drawbacks (more on this in the next section).
The bidirectional transformer is a neural network architecture that employs a self-attention mechanism to achieve context-aware processing of the input text in a non-sequential fashion. This enables parallel processing of the input text to identify contextual clues both before and after a given position in the transcript.
Bidirectional Transformer-based model architecture
The extractive TF-IDF approach rates terms based on their frequency in the text relative to their frequency across the training dataset (term frequency–inverse document frequency), which enables it to find uniquely representative terms in the text.
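As a toy illustration of extractive TF-IDF keyword scoring (using scikit-learn rather than the production model; the corpus and section text are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit IDF statistics on a background conversational corpus, then score the terms
# of a single transcript section; high-scoring terms are frequent in the section
# but rare in the corpus. (In this toy version, terms absent from the background
# vocabulary are simply ignored.)
background_corpus = [
    "notes from the weekly team meeting about hiring and planning",
    "lecture transcript introducing probability and statistics",
    "interview about a career in wildlife photography",
]
vectorizer = TfidfVectorizer(stop_words="english").fit(background_corpus)

section = "we should finish the hiring plan and schedule interview panels this week"
scores = vectorizer.transform([section]).toarray()[0]
terms = vectorizer.get_feature_names_out()
top_keywords = sorted(zip(terms, scores), key=lambda kv: kv[1], reverse=True)[:5]
print(top_keywords)
```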
Both models were trained on publicly available conversational datasets that were labeled and evaluated by independent raters. The conversational datasets were from the same domains as the expected product use cases, focusing on meetings, lectures, and interviews, thus ensuring the same word frequency distribution (Zipf’s law).
Extracting Representative Keywords
The TF-IDF-based model detects informative keywords by giving each word a score, which corresponds to how representative this keyword is within the text. The model does so, much like a standard TF-IDF model, by comparing the number of occurrences of a given word in the text to its frequency in the whole conversational dataset, but it also takes into account the specificity of the term, i.e., how broad or specific it is. The model then aggregates these features into a score using a pre-trained function curve. In parallel, the bidirectional transformer model, which was fine-tuned on the task of extracting keywords, provides a deep semantic understanding of the text, enabling it to extract precise context-aware keywords.
The TF-IDF approach is conservative in the sense that it is prone to finding uncommon keywords in the text (high bias), while the drawback for the bidirectional transformer model is the high variance of the possible keywords that can be extracted. But when used together, these two models complement each other, forming a balanced bias-variance tradeoff.
Once the keyword scores are retrieved from both models, we normalize and combine them by utilizing NLP heuristics (e.g., the weighted average), removing duplicates across sections, and eliminating stop words and verbs. The output of this process is an ordered list of suggested keywords for each of the sections.
Rating a Section’s Importance
The next task is to determine which sections should be highlighted as informative and unique. To solve this task, we again combine the two models mentioned above, which yield two distinct importance scores for each of the sections. We compute the first score by taking the TF-IDF scores of all the keywords in the section and weighting them by their respective number of appearances in the section, followed by a summation of these individual keyword scores. We compute the second score by running the section text through the bidirectional transformer model, which was also trained on the sections rating task. The scores from both models are normalized and then combined to yield the section score.
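A stripped-down sketch of how the two section scores might be combined (the normalization and weighting here are illustrative, not the production heuristics):

```python
import numpy as np

def section_importance(keyword_scores, keyword_counts, transformer_score, alpha=0.5):
    """Blend a TF-IDF-style section score (keyword scores weighted by how often
    each keyword appears in the section) with the transformer model's score,
    after squashing the former into [0, 1)."""
    tfidf_score = float(np.dot(keyword_scores, keyword_counts))
    tfidf_score = tfidf_score / (1.0 + tfidf_score)
    return alpha * tfidf_score + (1.0 - alpha) * transformer_score

# Example: three keywords with TF-IDF scores and in-section counts, plus a
# transformer score already normalized to [0, 1].
print(section_importance([0.8, 0.5, 0.3], [2, 1, 1], transformer_score=0.7))
```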
Smart Scrolling pipeline architecture
Some Challenges
A significant challenge in the development of Smart Scrolling was how to identify whether a section or keyword is important – what is of great importance to one person can be of less importance to another. The key was to highlight sections only when it is possible to extract helpful keywords from them.
To do this, we configured the solution to select the top-scored sections that also have highly rated keywords, with the number of sections highlighted proportional to the length of the recording. In the context of the Smart Scrolling feature, a keyword was more highly rated if it better represented the unique information of the section.
To train the model to understand these criteria, we needed to prepare a labeled training dataset tailored to this task. In collaboration with a team of skilled raters, we applied this labeling objective to a small batch of examples to establish an initial dataset, in order to evaluate the quality of the labels and instruct the raters in cases where there were deviations from what was intended. Once the labeling process was complete, we reviewed the labeled data manually and made corrections to the labels as necessary to align them with our definition of importance.
Using this limited labeled dataset, we ran automated model evaluations to establish initial metrics on model quality; these served as a quick, less accurate proxy, enabling us to rapidly assess model performance and apply changes to the architecture and heuristics. Once the solution metrics were satisfactory, we utilized a more accurate manual evaluation process over a closed set of carefully chosen examples that represented expected Recorder use cases. Using these examples, we tweaked the model heuristics parameters to reach the desired level of performance using a reliable model quality evaluation.
Runtime Improvements
After the initial release of Recorder, we conducted a series of user studies to learn how to improve the usability and performance of the Smart Scrolling feature. We found that many users expect the navigational keywords and highlighted sections to be available as soon as the recording is finished. Because the computation pipeline described above can take a considerable amount of time to compute on long recordings, we devised a partial processing solution that amortizes this computation over the whole duration of the recording. During recording, each section is processed as soon as it is captured, and then the intermediate results are stored in memory. When the recording is done, Recorder aggregates the intermediate results.
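Conceptually, the partial-processing scheme looks something like the sketch below (a simplification; the real pipeline runs the on-device models per section and keeps their intermediate outputs):

```python
class SmartScrollingPipeline:
    """Toy sketch of amortizing the work over the recording: each section is
    scored as soon as it is transcribed, so only a cheap aggregation step
    remains when the recording stops."""

    def __init__(self, score_section):
        self._score_section = score_section   # e.g., keyword + importance models
        self._intermediate = []

    def on_section_transcribed(self, section_text):
        # Runs during recording, once per captured section.
        self._intermediate.append(self._score_section(section_text))

    def on_recording_finished(self, num_highlights):
        # Runs once at the end; just ranks the already-computed section results.
        ranked = sorted(self._intermediate, key=lambda r: r["score"], reverse=True)
        return ranked[:num_highlights]
```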
When running on a Pixel 5, this approach reduced the average processing time of an hour long recording (~9K words) from 1 minute 40 seconds to only 9 seconds, while outputting the same results.
Summary
The goal of Recorder is to improve users’ ability to access their recorded content and navigate it with ease. We have already made substantial progress in this direction with the existing ML features that automatically suggest title words for recordings and enable users to search recordings for sounds and text. Smart Scrolling provides additional text navigation abilities that will further improve the utility of Recorder, enabling users to rapidly surface sections of interest, even for long recordings.
Acknowledgments
Bin Zhang, Sherry Lin, Isaac Blankensmith, Henry Liu, Vincent Peng, Guilherme Santos, Tiago Camolesi, Yitong Lin, James Lemieux, Thomas Hall, Kelly Tsai, Benny Schlesinger, Dror Ayalon, Amit Pitaru, Kelsie Van Deman, Console Chen, Allen Su, Cecile Basnage, Chorong Johnston, Shenaz Zack, Mike Tsao, Brian Chen, Abhinav Rastogi, Tracy Wu, Yvonne Yang.
While convolutional neural networks (CNNs) have been used in computer vision since the 1980s, they were not at the forefront until 2012 when AlexNet surpassed the performance of contemporary state-of-the-art image recognition methods by a large margin. Two factors helped enable this breakthrough: (i) the availability of training sets like ImageNet, and (ii) the use of commoditized GPU hardware, which provided significantly more compute for training. As such, since 2012, CNNs have become the go-to model for vision tasks.
The benefit of using CNNs was that they avoided the need for hand-designed visual features, instead learning to perform tasks directly from data “end to end”. However, while CNNs avoid hand-crafted feature-extraction, the architecture itself is designed specifically for images and can be computationally demanding. Looking forward to the next generation of scalable vision models, one might ask whether this domain-specific design is necessary, or if one could successfully leverage more domain agnostic and computationally efficient architectures to achieve state-of-the-art results.
As a first step in this direction, we present the Vision Transformer (ViT), a vision model based as closely as possible on the Transformer architecture originally designed for text-based tasks. ViT represents an input image as a sequence of image patches, similar to the sequence of word embeddings used when applying Transformers to text, and directly predicts class labels for the image. ViT demonstrates excellent performance when trained on sufficient data, outperforming a comparable state-of-the-art CNN with four times fewer computational resources. To foster additional research in this area, we have open-sourced both the code and models.
The Vision Transformer treats an input image as a sequence of patches, akin to a series of word embeddings generated by a natural language processing (NLP) Transformer.
The Vision Transformer
The original text Transformer takes as input a sequence of words, which it then uses for classification, translation, or other NLP tasks. For ViT, we make the fewest possible modifications to the Transformer design to make it operate directly on images instead of words, and observe how much about image structure the model can learn on its own.
ViT divides an image into a grid of square patches. Each patch is flattened into a single vector by concatenating the channels of all pixels in a patch and then linearly projecting it to the desired input dimension. Because Transformers are agnostic to the structure of the input elements, we add learnable position embeddings to each patch, which allow the model to learn about the structure of the images. A priori, ViT does not know about the relative location of patches in the image, or even that the image has a 2D structure — it must learn such relevant information from the training data and encode structural information in the position embeddings.
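A sketch of that input pipeline in TensorFlow, with illustrative sizes (a 224×224 image, 16×16 patches, model width 768) and the class token used by the full model omitted:

```python
import tensorflow as tf

IMG, P, D = 224, 16, 768                 # image size, patch size, model width
NUM_PATCHES = (IMG // P) ** 2            # 14 * 14 = 196 patches

patch_projection = tf.keras.layers.Dense(D)
position_embeddings = tf.Variable(tf.random.normal([1, NUM_PATCHES, D], stddev=0.02))

def embed_patches(images):               # images: [batch, 224, 224, 3]
    # Cut the image into non-overlapping 16x16 patches and flatten each one.
    patches = tf.image.extract_patches(
        images, sizes=[1, P, P, 1], strides=[1, P, P, 1],
        rates=[1, 1, 1, 1], padding="VALID")           # [batch, 14, 14, 16*16*3]
    patches = tf.reshape(patches, [-1, NUM_PATCHES, P * P * 3])
    # Linear projection to the Transformer width, plus learned position embeddings.
    return patch_projection(patches) + position_embeddings  # [batch, 196, 768]
```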
Scaling Up
We first train ViT on ImageNet, where it achieves a best score of 77.9% top-1 accuracy. While this is decent for a first attempt, it falls far short of the state of the art — the current best CNN trained on ImageNet with no extra data reaches 85.8%. Despite mitigation strategies (e.g., regularization), ViT overfits the ImageNet task due to its lack of inbuilt knowledge about images.
To investigate the impact of dataset size on model performance, we train ViT on ImageNet-21k (14M images, 21k classes) and JFT (300M images, 18k classes), and compare the results to a state-of-the-art CNN, Big Transfer (BiT), trained on the same datasets. As previously observed, ViT performs significantly worse than the CNN equivalent (BiT) when trained on ImageNet (1M images). However, on ImageNet-21k (14M images) performance is comparable, and on JFT (300M images), ViT now outperforms BiT.
Finally, we investigate the impact of the amount of computation involved in training the models. For this, we train several different ViT models and CNNs on JFT. These models span a range of model sizes and training durations. As a result, they require varying amounts of compute for training. We observe that, for a given amount of compute, ViT yields better performance than the equivalent CNNs.
Left: Performance of ViT when pre-trained on different datasets. Right: ViT yields a good performance/compute trade-off.
High-Performing Large-Scale Image Recognition
Our data suggest that (1) with sufficient training ViT can perform very well, and (2) ViT yields an excellent performance/compute trade-off at both smaller and larger compute scales. Therefore, to see if performance improvements carried over to even larger scales, we trained a 600M-parameter ViT model.
This large ViT model attains state-of-the-art performance on multiple popular benchmarks, including 88.55% top-1 accuracy on ImageNet and 99.50% on CIFAR-10. ViT also performs well on the cleaned-up version of the ImageNet evaluations set “ImageNet-Real”, attaining 90.72% top-1 accuracy. Finally, ViT works well on diverse tasks, even with few training data points. For example, on the VTAB-1k suite (19 tasks with 1,000 data points each), ViT attains 77.63%, significantly ahead of the single-model state of the art (SOTA) (76.3%), and even matching SOTA attained by an ensemble of multiple models (77.6%). Most importantly, these results are obtained using fewer compute resources compared to previous SOTA CNNs, e.g., 4x fewer than the pre-trained BiT models.
Vision Transformer matches or outperforms state-of-the-art CNNs on popular benchmarks. Left: Popular image classification tasks (ImageNet, including new validation labels ReaL, and CIFAR, Pets, and Flowers). Right: Average across 19 tasks in the VTAB classification suite.
Visualizations
To gain some intuition into what the model learns, we visualize some of its internal workings. First, we look at the position embeddings — parameters that the model learns to encode the relative location of patches — and find that ViT is able to reproduce an intuitive image structure. Each position embedding is most similar to others in the same row and column, indicating that the model has recovered the grid structure of the original images. Second, we examine the average spatial distance between one element attending to another for each transformer block. At higher layers (depths of 10-20) only global features are used (i.e., large attention distances), but the lower layers (depths 0-5) capture both global and local features, as indicated by a large range in the mean attention distance. By contrast, only local features are present in the lower layers of a CNN. These experiments indicate that ViT can learn features hard-coded into CNNs (such as awareness of grid structure), but is also free to learn more generic patterns, such as a mix of local and global features at lower layers, that can aid generalization.
Summary
While CNNs have revolutionized computer vision, our results indicate that models tailor-made for imaging tasks may be unnecessary, or even sub-optimal. With ever-increasing dataset sizes, and the continued development of unsupervised and semi-supervised methods, the development of new vision architectures that train more efficiently on these datasets becomes increasingly important. We believe ViT is a preliminary step towards generic, scalable architectures that can solve many vision tasks, or even tasks from many domains, and are excited for future developments.
A preprint of our work as well as code and models are publicly available.
Acknowledgements
We would like to thank our co-authors in Berlin, Zürich, and Amsterdam: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, and Jakob Uszkoreit. We would like to thank Andreas Steiner for crucial help with infrastructure and open-sourcing, Joan Puigcerver and Maxim Neumann for work on large-scale training infrastructure, and Dmitry Lepikhin, Aravindh Mahendran, Daniel Keysers, Mario Lučić, Noam Shazeer, and Colin Raffel for useful discussions. Finally, we thank Tom Small for creating the Visual Transformer animation in this post.
Using AutoML for Time Series Forecasting
Time series forecasting is an important research area for machine learning (ML), particularly where accurate forecasting is critical, including industries such as retail, supply chain, energy, and finance. For example, in the consumer goods domain, improving the accuracy of demand forecasting by 10-20% can reduce inventory by 5% and increase revenue by 2-3%. Current ML-based forecasting solutions are usually built by experts and require significant manual effort, including model construction, feature engineering and hyper-parameter tuning. However, such expertise may not be broadly available, which can limit the benefits of applying ML towards time series forecasting challenges.
To address this, automated machine learning (AutoML) is an approach that makes ML more widely accessible by automating the process of creating ML models, and has recently accelerated both ML research and the application of ML to real-world problems. For example, the initial work on neural architecture search enabled breakthroughs in computer vision, such as NasNet, AmoebaNet, and EfficientNet, and in natural language processing, such as Evolved Transformer. More recently, AutoML has also been applied to tabular data.
Today we introduce a scalable end-to-end AutoML solution for time series forecasting, which meets three key criteria:
- Fully automated: The solution takes in data as input, and produces a servable TensorFlow model as output with no human intervention.
- Generic: The solution works for most time series forecasting tasks and automatically searches for the best model configuration for each task.
- High-quality: The produced models have competitive quality compared to those manually crafted for specific tasks.
We demonstrate the success of this approach through participation in the M5 forecasting competition, where this AutoML solution achieved competitive performance against hand-crafted models with moderate compute cost.
Challenges in Time Series Forecasting
Time series forecasting presents several challenges to machine learning models. First, the uncertainty is often high since the goal is to predict the future based on historical data. Unlike other machine learning problems, the test set, for example, future product sales, might have a different distribution from the training and validation set, which are extracted from the historical data. Second, the time series data from the real world often suffers from missing data and high intermittency (i.e., when a high fraction of the time series has the value of zero). Some time series tasks may not have historical data available and suffer from the cold start problem, for example, when predicting the sales of a new product. Third, since we aim to build a fully automated generic solution, the same solution needs to apply to a variety of datasets, which can vary significantly in the domain (product sales, web traffic, etc), the granularity (daily, hourly, etc), the history length, the types of features (categorical, numerical, date time, etc), and so on.
An AutoML Solution
To tackle these challenges, we designed an end-to-end TensorFlow pipeline with a specialized search space for time series forecasting. It is based on an encoder-decoder architecture, in which an encoder transforms the historical information in a time series into a set of vectors, and a decoder generates the future predictions based on these vectors. Inspired by the state-of-the-art sequence models, such as Transformer and WaveNet, and best practices in time series forecasting, our search space included components such as attention, dilated convolution, gating, skip connections, and different feature transformations. The resulting AutoML solution searches for the best combination of these components as well as core hyperparameters.
To combat the uncertainty in predicting the future of a time series, an ensemble of the top models discovered in the search is used to make final predictions. The diversity in the top models made the predictions more robust to uncertainty and less prone to overfitting the historical data. To handle time series with missing data, we fill in the gaps with a trainable vector and let the model learn to adapt to the missing time steps. To address intermittency, we predict, for each future time step, not only the value, but also the probability that the value at this time step is non-zero, and combine the two predictions. Finally, we found that the automated search is able to adjust the architecture and hyperparameter choices for different datasets, which makes the AutoML solution generic and automates the modeling efforts.
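For the intermittency handling in particular, combining the two predictions can be as simple as the sketch below (the layer names and the combination rule are illustrative, not the production design):

```python
import tensorflow as tf

value_head = tf.keras.layers.Dense(1, name="value")                              # predicted value
nonzero_head = tf.keras.layers.Dense(1, activation="sigmoid", name="p_nonzero")  # P(value != 0)

def combined_forecast(decoder_state):
    """Point forecast for one future step: the predicted value scaled by the
    probability that the step is non-zero."""
    return nonzero_head(decoder_state) * value_head(decoder_state)
```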
Benchmarking in Forecasting Competitions
To benchmark our AutoML solution, we participated in the M5 forecasting competition, the latest in the M-competition series, which is one of the most important competitions in the forecasting community, with a long history spanning nearly 40 years. This most recent competition was hosted on Kaggle and used a dataset from Walmart product sales, the real-world nature of which makes the problem quite challenging.
We participated in the competition with our fully automated solution and achieved a rank of 138 out of 5558 participants (top 2.5%) on the final leaderboard, which is in the silver medal zone. Participants in the competition had almost four months to produce their models. While many of the competitive forecasting models required months of manual effort to create, our AutoML solution found the model in a short time with only a moderate compute cost (500 CPUs for 2 hours) and no human intervention.
We also benchmarked our AutoML forecasting solution on several other Kaggle datasets and found that on average it outperforms 92% of hand-crafted models, despite its limited resource use.
Evaluation of the AutoML Forecasting solution on other Kaggle datasets (Rossmann Store Sales, Web Traffic, Favorita Grocery Sales) besides M5.
This work demonstrates the strength of an end-to-end AutoML solution for time series forecasting, and we are excited about its potential impact on real-world applications.
Acknowledgements
This project was a joint effort of Google Brain team members Chen Liang, Da Huang, Yifeng Lu and Quoc V. Le. We also thank Junwei Yuan, Xingwei Yang, Dawei Jia, Chenyu Zhao, Tin-yun Ho, Meng Wang, Yaguang Li, Nicolas Loeff, Manish Kurse, Kyle Anderson and Nishant Patil for their collaboration.
Google at NeurIPS 2020
This week marks the beginning of the 34th annual Conference on Neural Information Processing Systems (NeurIPS 2020), the biggest machine learning conference of the year. Held virtually for the first time, this conference includes invited talks, demonstrations and presentations of some of the latest in machine learning research. As a Platinum Sponsor of NeurIPS 2020, Google will have a strong presence with more than 180 accepted papers, additionally contributing to and learning from the broader academic research community via talks, posters, workshops and tutorials.
If you are registered for NeurIPS 2020, we hope you’ll visit our virtual booth and chat with our researchers about the projects and opportunities at Google that go into solving the world’s most challenging research problems, and to see demonstrations of some of the exciting research we pursue, such as Transformers for image recognition, Tone Transfer, large-scale distributed RL, recreating historical streetscapes and much more. You can also learn more about our work being presented in the list below (Google affiliations highlighted in blue).
Organizing Committees
General Chair: Hugo Larochelle
Workshop Co-Chair: Sanmi Koyejo
Diversity and Inclusion Chairs include: Katherine Heller
Expo Chair: Pablo Samuel Castro
Senior Area Chairs include: Corinna Cortes, Fei Sha, Mohammad Ghavamzadeh, Sanjiv Kumar, Charles Sutton, Dale Schuurmans, David Duvenaud, Elad Hazan, Marco Cuturi, Peter Bartlett, Samy Bengio, Tong Zhang, Claudio Gentile, Kevin Murphy, Cordelia Schmid, Amir Globerson
Area Chairs include: Boqing Gong, Afshin Rostamizadeh, Alex Kulesza, Branislav Kveton, Craig Boutilier, Heinrich Jiang, Manzil Zaheer, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Rodolphe Jenatton, Mathieu Blondel, Aleksandra Faust, Alexey Dosovitskiy, Ashish Vaswani, Augustus Odena, Balaji Lakshminarayanan, Ben Poole, Colin Raffel, Danny Tarlow, David Ha, Denny Zhou, Dumitru Erhan, Dustin Tran, George Tucker, Honglak Lee, Ilya Tolstikhin, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Kevin Swersky, Matthew Johnson, Minmin Chen, Mohammad Norouzi, Moustapha Cisse, Naman Agarwal, Nicholas Carlini, Olivier Bachem, Tim Salimans, Vincent Dumoulin, Yann Dauphin, Andrew Dai, Izhak Shafran, Karthik Sridharan, Abhinav Gupta, Abhishek Kumar, Adam White, Aditya Menon, Kun Zhang, Ce Liu, Cristian Sminchisescu, Hossein Mobahi, Phillip Isola, Tomer Koren, Chelsea Finn, Amin Karbasi
NeurIPS 2020 Foundation Board includes: Michael Mozer, Samy Bengio, Corinna Cortes, Hugo Larochelle, John C. Platt, Fernando Pereira
Accepted Papers
Rankmax: An Adaptive Projection Alternative to the Softmax Function
Weiwei Kong*, Walid Krichene, Nicolas Mayoraz, Steffen Rendle, Li Zhang
Unsupervised Sound Separation Using Mixture Invariant Training
Scott Wisdom, Efthymios Tzinis*, Hakan Erdogan, Ron Weiss, Kevin Wilson, John Hershey
Learning to Select Best Forecast Tasks for Clinical Outcome Prediction
Yuan Xue, Nan Du, Anne Mottram, Martin Seneviratne, Andrew M. Dai
Interpretable Sequence Learning for Covid-19 Forecasting
Sercan O. Arık, Chun-Liang Li, Jinsung Yoon, Rajarishi Sinha, Arkady Epshteyn, Long T. Le, Vikas Menon, Shashank Singh, Leyou Zhang, Nate Yoder, Martin Nikoltchev, Yash Sonthalia, Hootan Nakhost, Elli Kanal, Tomas Pfister
Towards Learning Convolutions from Scratch
Behnam Neyshabur
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
Minhae Kwon, Saurabh Daptardar, Paul Schrater, Xaq Pitkow
Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans
CoinDICE: Off-Policy Confidence Interval Estimation
Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans
Unsupervised Data Augmentation for Consistency Training
Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le
VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain
Jinsung Yoon, Yao Zhang, James Jordon, Mihaela van der Schaar
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai, Guokun Lai, Yiming Yang, Quoc Le
Big Bird: Transformers for Longer Sequences
Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach
Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Zhaoran Wang, Mladen Kolar
Conservative Q-Learning for Offline Reinforcement Learning
Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
MOReL: Model-Based Offline Reinforcement Learning
Rahul Kidambi, Aravind Rajeswaran, Praneeth Netrapalli, Thorsten Joachims
Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Long Zhao, Ting Liu, Xi Peng, Dimitris Metaxas
Generative View Synthesis: From Single-view Semantics to Novel-view Images
Tewodros Habtegebrial, Varun Jampani, Orazio Gallo, Didier Stricker
PIE-NET: Parametric Inference of Point Cloud Edges
Xiaogang Wang, Yuelang Xu, Kai Xu, Andrea Tagliasacchi, Bin Zhou, Ali Mahdavi-Amiri, Hao Zhang
Enabling Certification of Verification-Agnostic Networks via Memory-Efficient Semidefinite Programming
Sumanth Dathathri, Krishnamurthy (Dj) Dvijotham, Alex Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow*, Percy Liang, Pushmeet Kohli
An Analysis of SVD for Deep Rotation Estimation
Jake Levinson, Carlos Esteves, Kefan Chen, Noah Snavely, Angjoo Kanazawa, Afshin Rostamizadeh, Ameesh Makadia
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces
Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow
Faster Differentially Private Samplers via Rényi Divergence Analysis of Discretized Langevin MCMC
Arun Ganesh*, Kunal Talwar*
DISK: Learning Local Features with Policy Gradient
Michał J. Tyszkiewicz, Pascal Fua, Eduard Trulls
Robust Large-margin Learning in Hyperbolic Space
Melanie Weber*, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar
Gamma-Models: Generative Temporal Difference Learning for Infinite-Horizon Prediction
Michael Janner, Igor Mordatch, Sergey Levine
Adversarially Robust Streaming Algorithms via Differential Privacy
Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer
Faster DBSCAN via Subsampled Similarity Queries
Heinrich Jiang, Jennifer Jang, Jakub Łącki
Exact Recovery of Mangled Clusters with Same-Cluster Queries
Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice
A Maximum-Entropy Approach to Off-Policy Evaluation in Average-Reward MDPs
Nevena Lazic, Dong Yin, Mehrdad Farajtabar, Nir Levine, Dilan Görür, Chris Harris, Dale Schuurmans
Fairness in Streaming Submodular Maximization: Algorithms and Hardness
Marwa El Halabi, Slobodan Mitrović, Ashkan Norouzi-Fard, Jakab Tardos, Jakub Tarnawski
Efficient Active Learning of Sparse Halfspaces with Arbitrary Bounded Noise
Chicheng Zhang, Jie Shen, Pranjal Awasthi
Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity
Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia
Synthetic Data Generators — Sequential and Private
Olivier Bousquet, Roi Livni, Shay Moran
Learning Discrete Distributions: User vs Item-level Privacy
Yuhan Liu, Ananda Theertha Suresh, Felix Xinnan X. Yu, Sanjiv Kumar, Michael Riley
Learning Differential Equations that are Easy to Solve
Jacob Kelly, Jesse Bettencourt, Matthew J. Johnson, David K. Duvenaud
An Optimal Elimination Algorithm for Learning a Best Arm
Avinatan Hassidim, Ron Kupfer, Yaron Singer
The Convex Relaxation Barrier, Revisited: Tightened Single-Neuron Relaxations for Neural Network Verification
Christian Tjandraatmadja, Ross Anderson, Joey Huchette, Will Ma, Krunal Kishor Patel*, Juan Pablo Vielma
Escaping the Gravitational Pull of Softmax
Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li*, Csaba Szepesvari, Dale Schuurmans
The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise
Ilias Diakonikolas, Daniel M. Kane, Pasin Manurangsi
PAC-Bayes Learning Bounds for Sample-Dependent Priors
Pranjal Awasthi, Satyen Kale, Stefani Karp, Mehryar Mohri
Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications
Sarah Perrin, Julien Perolat, Mathieu Lauriere, Matthieu Geist, Romuald Elie, Olivier Pietquin
What Do Neural Networks Learn When Trained With Random Labels?
Hartmut Maennel, Ibrahim M. Alabdulmohsin, Ilya O. Tolstikhin, Robert Baldock*, Olivier Bousquet, Sylvain Gelly, Daniel Keysers
Online Planning with Lookahead Policies
Yonathan Efroni, Mohammad Ghavamzadeh, Shie Mannor
Smoothly Bounding User Contributions in Differential Privacy
Alessandro Epasto, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni, Lijie Ren
Differentially Private Clustering: Tight Approximation Ratios
Badih Ghazi, Ravi Kumar, Pasin Manurangsi
Hitting the High Notes: Subset Selection for Maximizing Expected Order Statistics
Aranyak Mehta, Uri Nadav, Alexandros Psomas*, Aviad Rubinstein
Myersonian Regression
Allen Liu, Renato Leme, Jon Schneider
Assisted Learning: A Framework for Multi-Organization Learning
Xun Xian, Xinran Wang, Jie Ding, Reza Ghanadan
Adversarial Robustness via Robust Low Rank Representations
Pranjal Awasthi, Himanshu Jain, Ankit Singh Rawat, Aravindan Vijayaraghavan
Multi-Plane Program Induction with 3D Box Priors
Yikai Li, Jiayuan Mao, Xiuming Zhang, Bill Freeman, Josh Tenenbaum, Noah Snavely, Jiajun Wu
Privacy Amplification via Random Check-Ins
Borja Balle, Peter Kairouz, Brendan McMahan, Om Dipakbhai Thakkar, Abhradeep Thakurta
Rethinking Pre-training and Self-training
Barret Zoph, Golnaz Ghiasi, Tsung-Yi Lin, Yin Cui, Hanxiao Liu, Ekin Dogus Cubuk, Quoc Le
Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing
Arthur Delarue, Ross Anderson, Christian Tjandraatmadja
Online Agnostic Boosting via Regret Minimization
Nataly Brukhim, Xinyi Chen, Elad Hazan, Shay Moran*
From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering
Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré
Faithful Embeddings for Knowledge Base Queries
Haitian Sun, Andrew Arnold*, Tania Bedrax Weiss, Fernando Pereira, William W. Cohen
Contextual Reserve Price Optimization in Auctions via Mixed Integer Programming
Joey Huchette, Haihao Lu, Hossein Esfandiari, Vahab Mirrokni
An Operator View of Policy Gradient Methods
Dibya Ghosh, Marlos C. Machado, Nicolas Le Roux
Reinforcement Learning with Feedback Graphs
Christoph Dann, Yishay Mansour, Mehryar Mohri, Ayush Sekhari, Karthik Sridharan
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar
Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
Benjamin Eysenbach, Xinyang Geng, Sergey Levine, Ruslan Salakhutdinov
The Flajolet-Martin Sketch Itself Preserves Differential Privacy: Private Counting with Minimal Space
Adam Smith, Shuang Song, Abhradeep Thakurta
What is Being Transferred in Transfer Learning?
Behnam Neyshabur, Hanie Sedghi, Chiyuan Zhang
Latent Bandits Revisited
Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier
MetaSDF: Meta-Learning Signed Distance Functions
Vincent Sitzmann, Eric Chan, Richard Tucker, Noah Snavely, Gordon Wetzstein
Measuring Robustness to Natural Distribution Shifts in Image Classification
Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, Ludwig Schmidt
Robust Optimization for Fairness with Noisy Protected Groups
Serena Wang, Wenshuo Guo, Harikrishna Narasimhan, Andrew Cotter, Maya Gupta, Michael I. Jordan
Learning Discrete Energy-based Models via Auxiliary-variable Local Exploration
Hanjun Dai, Rishabh Singh, Bo Dai, Charles Sutton, Dale Schuurmans
Breaking the Communication-Privacy-Accuracy Trilemma
Wei-Ning Chen, Peter Kairouz, Ayfer Ozgur
Differentiable Meta-Learning of Bandit Policies
Craig Boutilier, Chih-wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer
Multi-Stage Influence Function
Hongge Chen*, Si Si, Yang Li, Ciprian Chelba, Sanjiv Kumar, Duane Boning, Cho-Jui Hsieh
Compositional Visual Generation with Energy Based Models
Yilun Du, Shuang Li, Igor Mordatch
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Chulhee Yun, Yin-Wen Chang, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
Curriculum By Smoothing
Samarth Sinha, Animesh Garg, Hugo Larochelle
Online Linear Optimization with Many Hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit
Prediction with Corrupted Expert Advice
Idan Amir, Idan Attias, Tomer Koren, Roi Livni, Yishay Mansour
Agnostic Learning with Multiple Objectives
Corinna Cortes, Mehryar Mohri, Javier Gonzalvo, Dmitry Storcheus
CoSE: Compositional Stroke Embeddings
Emre Aksan, Thomas Deselaers*, Andrea Tagliasacchi, Otmar Hilliges
Reparameterizing Mirror Descent as Gradient Descent
Ehsan Amid, Manfred K. Warmuth
Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition
Ben Adlam, Jeffrey Pennington
DisARM: An Antithetic Gradient Estimator for Binary Latent Variables
Zhe Dong, Andriy Mnih, George Tucker
Big Self-Supervised Models are Strong Semi-Supervised Learners
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton
JAX MD: A Framework for Differentiable Physics
Samuel S. Schoenholz, Ekin D. Cubuk
Gradient Surgery for Multi-Task Learning
Tianhe Yu, Saurabh Kumar, Abhishek Gupta, Sergey Levine, Karol Hausman, Chelsea Finn
LoopReg: Self-supervised Learning of Implicit Surface Correspondences, Pose and Shape for 3D Human Mesh Registration
Bharat Lal Bhatnagar, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll
ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA
Ilyes Khemakhem, Ricardo P. Monti, Diederik P. Kingma, Aapo Hyvärinen
Demystifying Orthogonal Monte Carlo and Beyond
Han Lin, Haoxian Chen, Tianyi Zhang, Clement Laroche, Krzysztof Choromanski
FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence
Kihyuk Sohn, David Berthelot, Chun-Liang Li, Zizhao Zhang, Nicholas Carlini, Ekin D. Cubuk, Alex Kurakin, Han Zhang, Colin Raffel
Compositional Generalization via Neural-Symbolic Stack Machines
Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou
Universally Quantized Neural Compression
Eirikur Agustsson, Lucas Theis
Self-Distillation Amplifies Regularization in Hilbert Space
Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett
ShapeFlow: Learnable Deformation Flows Among 3D Shapes
Chiyu “Max” Jiang, Jingwei Huang, Andrea Tagliasacchi, Leonidas Guibas
Entropic Optimal Transport between Unbalanced Gaussian Measures has a Closed Form
Hicham Janati, Boris Muzellec, Gabriel Peyré, Marco Cuturi
High-Fidelity Generative Image Compression
Fabian Mentzer*, George Toderici, Michael Tschannen*, Eirikur Agustsson
COT-GAN: Generating Sequential Data via Causal Optimal Transport
Tianlin Xu, Li K. Wenliang, Michael Munn, Beatrice Acciaio
When Do Neural Networks Outperform Kernel Methods?
Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari
Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding
Victor Veitch, Anisha Zaveri
Exemplar VAE: Linking Generative Models, Nearest Neighbor Retrieval, and Data Augmentation
Sajad Norouzi, David J. Fleet, Mohammad Norouzi
Mitigating Forgetting in Online Continual Learning via Instance-Aware Parameterization
Hung-Jen Chen, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun
Consistent Plug-in Classifiers for Complex Objectives and Constraints
Shiv Kumar Tavker, Harish Guruprasad Ramaswamy, Harikrishna Narasimhan
Online MAP Inference of Determinantal Point Processes
Aditya Bhaskara, Amin Karbasi, Silvio Lattanzi, Morteza Zadimoghaddam
Organizing Recurrent Network Dynamics by Task-computation to Enable Continual Learning
Lea Duncker, Laura Driscoll, Krishna V. Shenoy, Maneesh Sahani, David Sussillo
RL Unplugged: A Collection of Benchmarks for Offline Reinforcement Learning
Caglar Gulcehre, Ziyu Wang, Alexander Novikov, Thomas Paine, Sergio Gómez, Konrad Zolna, Rishabh Agarwal, Josh S. Merel, Daniel J. Mankowitz, Cosmin Paduraru, Gabriel Dulac-Arnold, Jerry Li, Mohammad Norouzi, Matthew Hoffman, Nicolas Heess, Nando de Freitas
Neural Execution Engines: Learning to Execute Subroutines
Yujun Yan*, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi
Spin-Weighted Spherical CNNs
Carlos Esteves, Ameesh Makadia, Kostas Daniilidis
An Efficient Nonconvex Reformulation of Stagewise Convex Optimization Problems
Rudy R. Bunel, Oliver Hinder, Srinadh Bhojanapalli, Krishnamurthy Dvijotham
Stochastic Optimization with Laggard Data Pipelines
Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar*, Cyril Zhang*
Regularizing Towards Permutation Invariance In Recurrent Models
Edo Cohen-Karlik, Avichai Ben David, Amir Globerson
Fast and Accurate k-means++ via Rejection Sampling
Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler*, Ola Svensson
Fairness Without Demographics Through Adversarially Reweighted Learning
Preethi Lahoti*, Alex Beutel, Jilin Chen, Kang Lee, Flavien Prost, Nithum Thain, Xuezhi Wang, Ed Chi
Gradient Estimation with Stochastic Softmax Tricks
Max Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J. Maddison
Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov
A Spectral Energy Distance for Parallel Speech Synthesis
Alexey A. Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner
Ode to an ODE
Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani
RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
Ekin Dogus Cubuk, Barret Zoph, Jon Shlens, Quoc Le
On Adaptive Attacks to Adversarial Example Defenses
Florian Tramer, Nicholas Carlini, Wieland Brendel, Aleksander Madry
Fair Performance Metric Elicitation
Gaurush Hiranandani, Harikrishna Narasimhan, Oluwasanmi O. Koyejo
Robust Pre-Training by Adversarial Contrastive Learning
Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang
Why are Adaptive Methods Good for Attention Models?
Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank Reddi, Sanjiv Kumar, Suvrit Sra
PyGlove: Symbolic Programming for Automated Machine Learning
Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Gabriel Bender, Hanxiao Liu, Adam Kraft, Chen Liang, Quoc Le
Fair Hierarchical Clustering
Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang
Fairness with Overlapping Groups; a Probabilistic Perspective
Forest Yang*, Moustapha Cisse, Sanmi Koyejo
Differentiable Top-k with Optimal Transport
Yujia Xie*, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
The Origins and Prevalence of Texture Bias in Convolutional Neural Networks
Katherine Hermann, Ting Chen, Simon Kornblith
Approximate Heavily-Constrained Learning with Lagrange Multiplier Models
Harikrishna Narasimhan, Andrew Cotter, Yichen Zhou, Serena Wang, Wenshuo Guo
Evaluating Attribution for Graph Neural Networks
Benjamin Sanchez-Lengeling, Jennifer Wei, Brian Lee, Emily Reif, Peter Wang, Wesley Wei Qian, Kevin McCloskey, Lucy Colwell, Alexander Wiltschko
Sliding Window Algorithms for k-Clustering Problems
Michele Borassi, Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, Morteza Zadimoghaddam
Meta-Learning Requires Meta-Augmentation
Janarthanan Rajendran*, Alex Irpan, Eric Jang
What Makes for Good Views for Contrastive Learning?
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, Phillip Isola
Supervised Contrastive Learning
Prannay Khosla*, Piotr Teterwak*, Chen Wang*, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan
Critic Regularized Regression
Ziyu Wang, Alexander Novikov, Konrad Zolna, Josh Merel, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas
Off-Policy Imitation Learning from Observations
Zhuangdi Zhu, Kaixiang Lin, Bo Dai, Jiayu Zhou
Effective Diversity in Population Based Reinforcement Learning
Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts
Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards
Yijie Guo, Jongwook Choi, Marcin Moczulski, Shengyu Feng, Samy Bengio, Mohammad Norouzi, Honglak Lee
Object-Centric Learning with Slot Attention
Francesco Locatello*, Dirk Weissenborn, Thomas Unterthiner, Aravindh Mahendran, Georg Heigold, Jakob Uszkoreit, Alexey Dosovitskiy, Thomas Kipf
On the Power of Louvain in the Stochastic Block Model
Vincent Cohen-Addad, Adrian Kosowski, Frederik Mallmann-Trenn, David Saulpic
Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks
David Bieber, Charles Sutton, Hugo Larochelle, Daniel Tarlow
SMYRF – Efficient Attention using Asymmetric Clustering
Giannis Daras, Nikita Kitaev, Augustus Odena, Alexandros G. Dimakis
Graph Contrastive Learning with Augmentations
Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen
WOR and p’s: Sketches for ℓp-Sampling Without Replacement
Edith Cohen, Rasmus Pagh, David P. Woodruff
Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains
Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ramamoorthi, Jonathan Barron, Ren Ng
Model Selection in Contextual Stochastic Bandit Problems
Aldo Pacchiano, My Phan, Yasin Abbasi Yadkori, Anup Rao, Julian Zimmert, Tor Lattimore, Csaba Szepesvari
Adapting to Misspecification in Contextual Bandits
Dylan J. Foster, Claudio Gentile, Mehryar Mohri, Julian Zimmert
Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning
Nino Vieillard, Tadashi Kozuno, Bruno Scherrer, Olivier Pietquin, Rémi Munos, Matthieu Geist
Learning with Differentiable Perturbed Optimizers
Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
Munchausen Reinforcement Learning
Nino Vieillard, Olivier Pietquin, Matthieu Geist
Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment
Ben Usman, Avneesh Sud, Nick Dufour, Kate Saenko
Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling
Tong Che, Ruixiang Zhang, Jascha Sohl-Dickstein, Hugo Larochelle, Liam Paull, Yuan Cao, Yoshua Bengio
Sample Complexity of Uniform Convergence for Multicalibration
Eliran Shabat, Lee Cohen, Yishay Mansour
Implicit Regularization and Convergence for Weight Normalization
Xiaoxia Wu, Edgar Dobriban, Tongzheng Ren, Shanshan Wu, Zhiyuan Li, Suriya Gunasekar, Rachel Ward, Qiang Liu
Most ReLU Networks Suffer from ℓ² Adversarial Perturbations
Amit Daniely, Hadas Shacham
Geometric Exploration for Online Control
Orestis Plevrakis, Elad Hazan
PLLay: Efficient Topological Layer Based on Persistent Landscapes
Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Sik Kim, Frederic Chazal, Larry Wasserman
Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness
Jeremiah Zhe Liu*, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan
Bayesian Deep Ensembles via the Neural Tangent Kernel
Bobby He, Balaji Lakshminarayanan, Yee Whye Teh
Hyperparameter Ensembles for Robustness and Uncertainty Quantification
Florian Wenzel, Jasper Snoek, Dustin Tran, Rodolphe Jenatton
Conic Descent and its Application to Memory-efficient Optimization Over Positive Semidefinite Matrices
John Duchi, Oliver Hinder, Andrew Naber, Yinyu Ye
On the Training Dynamics of Deep Networks with L₂ Regularization
Aitor Lewkowycz, Guy Gur-Ari
The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
Wei Hu*, Lechao Xiao, Ben Adlam, Jeffrey Pennington
Adaptive Probing Policies for Shortest Path Routing
Aditya Bhaskara, Sreenivas Gollapudi, Kostas Kollias, Kamesh Munagala
Optimal Approximation — Smoothness Tradeoffs for Soft-Max Functions
Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Emmanouil Zampetakis
An Unsupervised Information-Theoretic Perceptual Quality Metric
Sangnie Bhardwaj, Ian Fischer, Johannes Ballé, Troy Chinen
Learning Graph Structure With A Finite-State Automaton Layer
Daniel Johnson, Hugo Larochelle, Daniel Tarlow
Estimating Training Data Influence by Tracing Gradient Descent
Garima Pruthi, Frederick Liu, Satyen Kale, Mukund Sundararajan
Tutorials
Designing Learning Dynamics
Organizers: Marta Garnelo, David Balduzzi, Wojciech Czarnecki
Where Neuroscience meets AI (And What’s in Store for the Future)
Organizers: Jane Wang, Kevin Miller, Adam Marblestone
Offline Reinforcement Learning: From Algorithm Design to Practical Applications
Organizers: Sergey Levine, Aviral Kumar
Practical Uncertainty Estimation and Out-of-Distribution Robustness in Deep Learning
Organizers: Dustin Tran, Balaji Lakshminarayanan, Jasper Snoek
Abstraction & Reasoning in AI systems: Modern Perspectives
Organizers: Francois Chollet, Melanie Mitchell, Christian Szegedy
Policy Optimization in Reinforcement Learning
Organizers: Sham M Kakade, Martha White, Nicolas Le Roux
Federated Learning and Analytics: Industry Meets Academia
Organizers: Brendan McMahan, Virginia Smith, Peter Kairouz
Deep Implicit Layers: Neural ODEs, Equilibrium Models, and Differentiable Optimization
Organizers: David Duvenaud, J. Zico Kolter, Matthew Johnson
Beyond Accuracy: Grounding Evaluation Metrics for Human-Machine Learning Systems
Organizers: Praveen Chandar, Fernando Diaz, Brian St. Thomas
Workshops
Black in AI Workshop @ NeurIPS 2020 (Diamond Sponsor)
Mentorship Roundtables: Natasha Jaques
LatinX in AI Workshop @ NeurIPS 2020 (Platinum Sponsor)
Organizers include: Pablo Samuel Castro
Invited Speaker: Fernanda Viégas
Mentorship Roundtables: Tomas Izo
Queer in AI Workshop @ NeurIPS 2020 (Platinum Sponsor)
Organizers include: Raphael Gontijo Lopes
Women in Machine Learning (Platinum Sponsor)
Organizers include: Xinyi Chen, Jessica Schrouff
Invited Speaker: Fernanda Viégas
Sponsor Talk: Jessica Schrouff
Mentorship Roundtables: Hanie Sedghi, Marc Bellemare, Katherine Heller, Rianne van den Berg, Natalie Schluter, Colin Raffel, Azalia Mirhoseini, Emily Denton, Jesse Engel, Anusha Ramesh, Matt Johnson, Jeff Dean, Laurent Dinh, Samy Bengio, Yasaman Bahri, Corinna Cortes, Nicolas Le Roux, Hugo Larochelle, Sergio Guadarrama, Natasha Jaques, Pablo Samuel Castro, Elaine Le, Cory Silvear
Muslims in ML
Organizers include: Mohammad Norouzi
Resistance AI Workshop
Organizers include: Elliot Creager, Raphael Gontijo Lopes
Privacy Preserving Machine Learning — PriML and PPML Joint Edition
Organizers include: Adria Gascon, Mariana Raykova
OPT2020: Optimization for Machine Learning
Organizers include: Courtney Paquette
Machine Learning for Health (ML4H): Advancing Healthcare for All
Organizers include: Subhrajit Roy
Human in the Loop Dialogue Systems
Organizers include: Rahul Goel
Invited Speaker: Ankur Parikh
Self-Supervised Learning for Speech and Audio Processing
Organizers include: Tara Sainath
Invited Speaker: Bhuvana Ramabhadran
3rd Robot Learning Workshop
Organizers include: Alex Bewley, Vincent Vanhoucke
Invited Speaker: Pete Florence
Workshop on Deep Learning and Inverse Problems
Invited Speaker: Peyman Milanfar
Crowd Science Workshop: Remoteness, Fairness, and Mechanisms as Challenges of Data Supply by Humans for Automation
Invited Speakers: Lora Aroyo, Praveen Paritosh
Workshop on Fair AI in Finance
Invited Speakers: Berk Ustun, Madeleine Clare Elish
Object Representations for Learning and Reasoning
Panel Moderator: Klaus Greff
Deep Reinforcement Learning
Organizers include: Chelsea Finn
Invited Speaker: Marc Bellemare
Algorithmic Fairness Through the Lens of Causality and Interpretability
Organizers include: Awa Dieng, Jessica Schrouff, Fernando Diaz
Machine Learning for the Developing World (ML4D)
Steering Committee Member: Ernest Mwebaze
Machine Learning for Engineering Modeling, Simulation and Design
Organizers include: Stephan Hoyer
Machine Learning for Creativity and Design
Organizers include: Adam Roberts, Daphne Ippolito
Invited Speaker: Jesse Engel
Cooperative AI
Invited Speaker: Natasha Jaques
International Workshop on Scalability, Privacy, and Security in Federated Learning (SpicyFL 2020)
Invited Speaker: Brendan McMahan
Machine Learning for Molecules
Organizers include: Jennifer Wei
Invited Speaker: Benjamin Sanchez-Lengeling
Navigating the Broader Impacts of AI Research
Panelists include: Nyalleng Moorosi, Colin Raffel, Natalie Schluter, Ben Zevenbergen
Beyond BackPropagation: Novel Ideas for Training Neural Architectures
Organizers include: Yanping Huang
Differentiable Computer Vision, Graphics, and Physics in Machine Learning
Invited Speaker: Andrea Tagliasacchi
AI for Earth Sciences
Invited Speaker: Milind Tambe
Machine Learning for Mobile Health
Organizers include: Katherine Heller, Marianne Njifon
Shared Visual Representations in Human and Machine Intelligence (SVRHM)
Invited Speaker: Gamaleldin Elsayed
The Challenges of Real World Reinforcement Learning
Organizers include: Gabriel Dulac-Arnold
Invited Speaker: Chelsea Finn
Workshop on Computer Assisted Programming (CAP)
Organizers include: Charles Sutton, Augustus Odena
Self-Supervised Learning — Theory and Practice
Organizers include: Barret Zoph
Invited Speaker: Quoc V. Le
Offline Reinforcement Learning
Organizers include: Rishabh Agarwal, George Tucker
Machine Learning for Systems
Organizers include: Anna Goldie, Azalia Mirhoseini, Martin Maas
Invited Speaker: Ed Chi
Deep Learning Through Information Geometry
Organizers include: Alexander Alemi
Expo
Drifting Efficiently Through the Stratosphere Using Deep Reinforcement Learning
Organizers include: Sal Candido
Accelerating Eye Movement Research via Smartphone Gaze
Organizers include: Vidhya Navalpakkam
Mining and Learning with Graphs at Scale
Organizers include: Bryan Perozzi, Vahab Mirrokni, Jonathan Halcrow, Jakub Lacki
*Work performed while at Google