Categories
Offsites

Multi-Task Robotic Reinforcement Learning at Scale

For general-purpose robots to be most useful, they would need to be able to perform a range of tasks, such as cleaning, maintenance and delivery. But training even a single task (e.g., grasping) using offline reinforcement learning (RL), a trial-and-error learning method in which the agent trains on previously collected data, can take thousands of robot-hours, in addition to the significant engineering needed to enable autonomous operation of a large-scale robotic system. Thus, the computational costs of building general-purpose everyday robots using current robot learning methods become prohibitive as the number of tasks grows.

Multi-task data collection across multiple robots where different robots collect data for different tasks.

In other large-scale machine learning domains, such as natural language processing and computer vision, a number of strategies have been applied to amortize the effort of learning over multiple skills. For example, pre-training on large natural language datasets can enable few- or zero-shot learning of multiple tasks, such as question answering and sentiment analysis. However, because robots collect their own data, robotic skill learning presents a unique set of opportunities and challenges. Automating this process is a large engineering endeavour, and effectively reusing past robotic data collected by different robots remains an open problem.

Today we present two new advances for robotic RL at scale, MT-Opt, a new multi-task RL system for automated data collection and multi-task RL training, and Actionable Models, which leverages the acquired data for goal-conditioned RL. MT-Opt introduces a scalable data-collection mechanism that is used to collect over 800,000 episodes of various tasks on real robots and demonstrates a successful application of multi-task RL that yields ~3x average improvement over baseline. Additionally, it enables robots to master new tasks quickly through use of its extensive multi-task dataset (new task fine-tuning in <1 day of data collection). Actionable Models enables learning in the absence of specific tasks and rewards by training an implicit model of the world that is also an actionable robotic policy. This drastically increases the number of tasks the robot can perform (via visual goal specification) and enables more efficient learning of downstream tasks.

Large-Scale Multi-Task Data Collection System
The cornerstone of both MT-Opt and Actionable Models is the volume and quality of the training data. To collect diverse, multi-task data at scale, users need a way to specify tasks, decide which tasks to collect data for, and finally manage and balance the resulting dataset. To that end, we create a scalable and intuitive multi-task success detector using data from all of the chosen tasks. The multi-task success detector is trained with supervised learning to detect the outcome of a given task, and it allows users to quickly define new tasks and their rewards. While this success detector is being used to collect data, it is periodically updated to accommodate distribution shifts caused by various real-world factors, such as varying lighting conditions, changing background surroundings, and novel states that the robots discover.

In addition, we simultaneously collect data for multiple distinct tasks across multiple robots, using solutions to easier tasks to bootstrap learning of more complex tasks. This allows a policy to be trained for the harder tasks and improves the data collected for them. As a result, the amount of per-task data and the number of successful episodes for each task grow over time. To further improve performance, we focus data collection on underperforming tasks rather than collecting data uniformly across tasks.
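The post does not say exactly how underperforming tasks are prioritized. As a minimal sketch, assuming a simple heuristic that weights each task by its current failure rate (our assumption, not necessarily MT-Opt's rule), choosing the task for the next collection episode could look like this; the task names are hypothetical.

```python
import random


def collection_weights(success_rates, floor=0.05):
    """Return per-task sampling probabilities that favour underperforming tasks.

    Hypothetical heuristic: weight each task by its failure rate
    (1 - success_rate), with a small floor so that tasks that are already
    solved still receive some fresh data.
    """
    raw = {task: max(1.0 - rate, floor) for task, rate in success_rates.items()}
    total = sum(raw.values())
    return {task: weight / total for task, weight in raw.items()}


def pick_task(success_rates, rng):
    """Sample the task for the next collection episode using the weights above."""
    weights = collection_weights(success_rates)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=1)[0]


# Hypothetical task names and current success rates.
rates = {"lift_any": 0.9, "place_in_bowl": 0.3, "cover_with_towel": 0.05}
print(collection_weights(rates))  # cover_with_towel gets the largest share
print(pick_task(rates, random.Random(0)))
```

The floor is a design choice: without it, a task that reaches a 100% success rate would never be collected again, even though the success detector and the policy keep changing.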

This system collected 9,600 robot-hours of data (from 57 continuous data collection days on seven robots). However, while this data collection strategy was effective at collecting data for a large number of tasks, the success rate and data volume were imbalanced between tasks.

Learning with MT-Opt
We address the data collection imbalance by transferring data across tasks and re-balancing the per-task data. The robots generate episodes that are labelled as success or failure for each task and are then copied and shared across other tasks. The balanced batch of episodes is then sent to our multi-task RL training pipeline to train the MT-Opt policy.

Data sharing and task re-balancing strategy used by MT-Opt. The robots generate episodes which then get labelled as success or failure for the current task and are then shared across other tasks.
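The sharing and re-balancing step can be sketched as follows. This is a simplified illustration, not MT-Opt's implementation: `detector` stands in for the multi-task success detector through a hypothetical `(episode, task) -> bool` interface, and the even split between successes and failures per task is our assumption.

```python
import random
from collections import defaultdict


def share_and_rebalance(episodes, detector, tasks, per_task, rng):
    """Relabel every episode for every task, then draw a balanced batch.

    Returns a list of (episode, task, reward) with `per_task` samples per task,
    split evenly between successes and failures when both exist.
    """
    pools = defaultdict(lambda: {True: [], False: []})
    for ep in episodes:
        for task in tasks:
            # Share the episode with every task; the detector decides the label.
            pools[task][bool(detector(ep, task))].append(ep)

    batch = []
    for task in tasks:
        pos, neg = pools[task][True], pools[task][False]
        if pos and neg:
            n_pos = per_task // 2
        elif pos:
            n_pos = per_task
        else:
            n_pos = 0
        n_neg = per_task - n_pos if neg else 0
        # Sample with replacement so that rare tasks and rare successes are upsampled.
        if n_pos:
            batch += [(ep, task, 1.0) for ep in rng.choices(pos, k=n_pos)]
        if n_neg:
            batch += [(ep, task, 0.0) for ep in rng.choices(neg, k=n_neg)]
    rng.shuffle(batch)
    return batch


# Toy usage: integers stand in for recorded episodes; the detector is hypothetical.
episodes = list(range(10))


def toy_detector(ep, task):
    return ep % 2 == 0 if task == "lift_any" else ep == 3


batch = share_and_rebalance(episodes, toy_detector, ["lift_any", "rare_task"], 8, random.Random(0))
print(len(batch))  # 16: eight samples for each of the two tasks
```

Even though only one episode succeeds at `rare_task`, it fills half of that task's share of the batch, which is the intent of re-balancing.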

MT-Opt uses Q-learning, a popular RL method that learns a function that estimates the future sum of rewards, called the Q-function. The learned policy then picks the action that maximizes this learned Q-function. For multi-task policy training, we specify the task as an extra input to a large Q-learning network (inspired by our previous work on large-scale single-task learning with QT-Opt) and then train all of the tasks simultaneously with offline RL using the entire multi-task dataset. In this way, MT-Opt is able to train on a wide variety of skills that include picking specific objects, placing them into various fixtures, aligning items on a rack, rearranging and covering objects with towels, etc.
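To illustrate what "the task as an extra input" means, here is a minimal tabular sketch of offline multi-task Q-learning. It assumes discrete states and actions and uses a dictionary instead of the large neural network (and continuous-action optimization) that QT-Opt and MT-Opt actually use; only the idea of conditioning one Q-function on the task ID carries over.

```python
from collections import defaultdict


def train_multitask_q(dataset, actions, gamma=0.9, lr=0.5, epochs=50):
    """Offline tabular Q-learning with the task ID as an extra input.

    dataset: list of (task, state, action, reward, next_state, done) tuples.
    Returns a dict mapping (task, state, action) -> estimated future return.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for task, s, a, r, s_next, done in dataset:
            # Bellman target: reward plus the discounted best next-state value.
            best_next = 0.0 if done else max(q[(task, s_next, b)] for b in actions)
            target = r + gamma * best_next
            q[(task, s, a)] += lr * (target - q[(task, s, a)])
    return q


def greedy_action(q, task, state, actions):
    """The policy picks the action that maximizes the learned Q-function."""
    return max(actions, key=lambda a: q[(task, state, a)])


ACTIONS = ["left", "right"]
# Toy offline dataset: the same state, but the rewarded action depends on the task.
data = [
    ("A", 0, "right", 1.0, 1, True), ("A", 0, "left", 0.0, 1, True),
    ("B", 0, "left", 1.0, 1, True), ("B", 0, "right", 0.0, 1, True),
]
q = train_multitask_q(data, ACTIONS)
print(greedy_action(q, "A", 0, ACTIONS), greedy_action(q, "B", 0, ACTIONS))  # right left
```

Because a single table (or network) is shared across tasks, one dataset trains all of them at once, and the same state can map to different actions depending on the task input.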

Compared to single-task baselines, MT-Opt performs similarly on the tasks that have the most data and significantly improves performance on underrepresented tasks. For example, on a generic lifting task, which has the most supporting data, MT-Opt achieved an 89% success rate (compared to 88% for QT-Opt), and it achieved a 50% average success rate across rare tasks, compared to 1% with a single-task QT-Opt baseline and 18% with a naïve multi-task QT-Opt baseline. MT-Opt not only enables zero-shot generalization to new but similar tasks; it can also be fine-tuned quickly (in about 1 day of data collection on seven robots) to new, previously unseen tasks. For example, when applied to an unseen towel-covering task, which wasn’t present in the original dataset, the system achieved a zero-shot success rate of 92% for towel-picking and 79% for object-covering.

Example tasks that MT-Opt is able to learn, such as instance and indiscriminate grasping, chasing, placing, aligning and rearranging.

Towel-covering task that was not present in the original dataset. We fine-tune MT-Opt on this novel task in 1 day to achieve a high (>90%) success rate.

Learning with Actionable Models
While supplying a rigid definition of tasks facilitates autonomous data collection for MT-Opt, it limits the number of learnable behaviors to a fixed set. To enable learning a wider range of tasks from the same data, we use goal-conditioned learning, i.e., learning to reach given goal configurations of a scene in front of the robot, which we specify with goal images. In contrast to explicit model-based methods that learn predictive models of future world observations, or approaches that employ online data collection, this approach learns goal-conditioned policies via offline model-free RL.

To learn to reach any goal state, we perform hindsight relabeling of all trajectories and sub-sequences in our collected dataset and train a goal-conditioned Q-function in a fully offline manner (in contrast to learning online using a fixed set of success examples as in recursive classification). One challenge in this setting is the distributional shift caused by learning only from “positive” hindsight-relabeled examples. We address this by employing a conservative strategy that minimizes the Q-values of unseen actions using artificial negative actions. Furthermore, to enable reaching temporally extended goals, we introduce a technique for chaining goals across multiple episodes.

Actionable Models relabel sub-sequences with all intermediate goals and regularize Q-values with artificial negative actions.
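The relabeling and negative-action idea can be sketched for a single trajectory with a discrete action space. This is a simplified illustration under our own assumptions rather than the Actionable Models implementation: every later state s_j becomes a goal for every earlier step i, the transition that reaches the goal in one step gets reward 1 while the others would bootstrap from the next state, and for each relabeled sample a few untaken actions are drawn as artificial negatives whose Q-value target is 0.

```python
import random


def relabel_with_goals(states, actions, action_space, num_negatives, rng):
    """Hindsight-relabel every sub-sequence of one trajectory.

    states: [s_0, ..., s_T]; actions: [a_0, ..., a_{T-1}].
    Returns (positives, negatives):
      positives: (state, action, next_state, goal, reward, done) for every
        start i and every later goal s_j (j > i).
      negatives: (state, action, goal) with an action that was not taken;
        a conservative loss would push their Q-values towards 0.
    """
    assert len(states) == len(actions) + 1
    positives, negatives = [], []
    for i in range(len(actions)):
        others = [a for a in action_space if a != actions[i]]
        for j in range(i + 1, len(states)):
            reached = j == i + 1  # this action leads directly to the goal
            positives.append(
                (states[i], actions[i], states[i + 1], states[j], 1.0 if reached else 0.0, reached)
            )
            for a_neg in rng.sample(others, min(num_negatives, len(others))):
                negatives.append((states[i], a_neg, states[j]))
    return positives, negatives


states = ["s0", "s1", "s2", "s3"]
actions = ["grasp", "lift", "place"]
positives, negatives = relabel_with_goals(
    states, actions, ["grasp", "lift", "place", "push"], 2, random.Random(0)
)
print(len(positives), len(negatives))  # 6 12
```

A trajectory with T transitions yields T(T+1)/2 relabeled samples, which is why an unlabeled dataset can supply a very large number of goal-reaching tasks.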

Training with Actionable Models allows the system to learn a large repertoire of visually indicated skills, such as object grasping, container placing and object rearrangement. The model is also able to generalize to novel objects and visual objectives not seen in the training data, which demonstrates its ability to learn general functional knowledge about the world. We also show that downstream reinforcement learning tasks can be learned more efficiently, either by fine-tuning a pre-trained goal-conditioned model or by using a goal-reaching auxiliary objective during training.

Example tasks (specified by goal-images) that our Actionable Model is able to learn.

Conclusion
The results of both MT-Opt and Actionable Models indicate that it is possible to collect and then learn many distinct tasks from large, diverse real-robot datasets within a single model, effectively amortizing the cost of learning across many skills. We see this as an important step towards general robot learning systems that can be further scaled up to perform many useful services and serve as a starting point for learning downstream tasks.

This post is based on two papers, “MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale” and “Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills,” with additional information and videos on the project websites for MT-Opt and Actionable Models.

Acknowledgements
This research was conducted by Dmitry Kalashnikov, Jake Varley, Yevgen Chebotar, Ben Swanson, Rico Jonschkowski, Chelsea Finn, Sergey Levine, Yao Lu, Alex Irpan, Ben Eysenbach, Ryan Julian and Ted Xiao. We’d like to give special thanks to Josh Weaver, Noah Brown, Khem Holden, Linda Luu and Brandon Kinman for their robot operation support; Anthony Brohan for help with distributed learning and testing infrastructure; Tom Small for help with videos and project media; Julian Ibarz, Kanishka Rao, Vikas Sindhwani and Vincent Vanhoucke for their support; Tuna Toksoz and Garrett Peake for improving the bin reset mechanisms; Satoshi Kataoka, Michael Ahn, and Ken Oslund for help with the underlying control stack, and the rest of the Robotics at Google team for their overall support and encouragement. All the above contributions were incredibly enabling for this research.

Categories
Misc

NASA and NVIDIA Collaborate to Accelerate Scientific Data Science Use Cases, Part 2

Over the past couple of years, NVIDIA and NASA have been working closely on accelerating data science workflows using RAPIDS, and integrating these GPU-accelerated libraries with scientific use cases. This is the second post in a series that discusses the results from an air pollution monitoring use case conducted during the COVID-19 pandemic and shares code snippets to port existing CPU workflows to RAPIDS on NVIDIA GPUs. In the first post of this series, we covered Accelerated Simulation of Air Pollution.

Monitoring the Decline of Air Pollution Across the Globe During the COVID-19 Pandemic

Another air quality application leveraging XGBoost and RAPIDS is the live monitoring of air quality through the combination of surface monitoring data and near real-time model data produced by the NASA GEOS-CF model. This approach is particularly useful to detect and quantify air pollution anomalies, i.e., patterns in air quality observations that cannot be explained by the model. The most prominent (and extreme) example of this is the decline of air pollution in the wake of the COVID-19 pandemic. As a result of the stay-at-home orders, traffic emissions of air pollutants such as nitrogen dioxide (NO2) decreased significantly, as apparent from both satellite observations and surface monitoring data. However, exactly quantifying the impact of COVID-19 restrictions on surface air quality solely based on these atmospheric observations is very difficult given that many other factors impact surface air pollution, including weather, chemistry, or wildfires.

Video 1. Daily nitrogen dioxide (NO2) measurements and percentage difference from a baseline model. Credits: Christoph Keller and NASA’s Scientific Visualization Studio.

The study conducted by Christoph and his colleagues fuses millions of observations, taken at 4,778 monitoring sites in 47 countries, with co-located model output produced by GEOS-CF. The sample code in the covid_no2 repo demonstrates the application for 10 selected cities (New York, Washington DC, San Francisco, Los Angeles, Beijing, Wuhan, London, Paris, Madrid, and Milan). The air quality observations for the years 2018 through 2020 at these cities were obtained from the OpenAQ database (https://openaq.org/#/) and the European Environment Agency (EEA) (https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm) and pre-processed into a single file for convenience.

Similarly, we preprocessed the GEOS-CF model output by subsampling the gridded native output (available in netCDF format at https://portal.nccs.nasa.gov/datashare/gmao/geos-cf/v1/das/) to the observation locations and saved the corresponding data as a table in text format that can be read in the same way as the observation data.

The model data contains not only model predicted NO2 concentrations but also a number of ancillary model variables, such as information about the local weather and atmospheric composition (as taken from GEOS-CF) or calendar information.

The model data is then combined with the surface observation data to build an XGBoost bias-correction model that relates the model NO2 prediction to the observations. This accounts for the fact that the model prediction can be systematically different from the observations, e.g., because the model output represents the average over a 25×25 km² domain while the surface observation is typically much more local in nature.

To train the XGBoost model, the model data is merged with the observations, and the model bias is calculated from the merged data set to provide the label for the training (see the sample code in the repository mentioned above for the full example).
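The merge-and-label step can be sketched with the Python standard library. This is not the covid_no2 repository's code (which uses its own file layout and pandas/cuDF): the column names `site`, `time`, `no2_model`, `no2` and `t10m` are hypothetical, and defining the bias as model NO2 minus observed NO2 is our assumption. Under that convention, an XGBoost regressor trained to predict `bias` from the model features would give a bias-corrected estimate of `no2_model` minus the predicted bias.

```python
import csv
import io


def read_table(text):
    """Parse a CSV table into a list of dicts, converting numeric fields to float."""
    return [
        {k: (v if k in ("site", "time") else float(v)) for k, v in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]


def merge_and_label(model_rows, obs_rows):
    """Inner-join model and observation rows on (site, time) and add the label.

    The label `bias` is defined here (an assumption) as model NO2 minus observed NO2.
    """
    obs = {(r["site"], r["time"]): r["no2"] for r in obs_rows}
    merged = []
    for r in model_rows:
        key = (r["site"], r["time"])
        if key in obs:  # keep only site/time pairs that have an observation
            row = dict(r, no2_obs=obs[key])
            row["bias"] = row["no2_model"] - row["no2_obs"]
            merged.append(row)
    return merged


# Hypothetical sample data; t10m stands in for an ancillary model variable.
SAMPLE_MODEL = """site,time,no2_model,t10m
NYC-1,2019-01-01T00,30.0,271.5
NYC-1,2019-01-01T01,28.0,271.0
"""
SAMPLE_OBS = """site,time,no2
NYC-1,2019-01-01T00,24.0
"""

merged = merge_and_label(read_table(SAMPLE_MODEL), read_table(SAMPLE_OBS))
print(merged)  # one row, with bias 6.0; the 01:00 model row has no observation
```

The ancillary columns are carried through unchanged so that they can serve as predictors for the bias model.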

Using the trained model, we can also calculate the SHAP values on the GPU to analyze the factors that contribute most to the bias correction.

As shown in the figure below, the SHAP values for New York indicate that the most important predictors of the NO2 model bias (relative to the actual observation) are NO2 itself, followed by the hour of the day, wind speed (V10M) and the height of the planetary boundary layer (ZPBL). (Note: to output the SHAP values in the example code, the input argument shap needs to be set to 1, and the gpu argument needs to be set as well to run the accelerated code on the GPU.)

Figure 2: Distribution of the 20 most important SHAP values for the bias correction model for New York City.

After training the XGBoost model on the 2018 and 2019 data (using 8-fold cross-validation), we extend the NO2 bias correction to the model data produced for year 2020, resulting in a time series of the expected NO2 concentrations at a given observation site if there had been no mobility restrictions due to the pandemic (the NO2 ‘baseline’). The difference between the actual observations and these bias-corrected model predictions offers an estimate of the impact of COVID-19 restrictions on NO2 concentrations. The figure below shows the difference between observations and model predictions at New York City from Jan 2019 to Jan 2021. The solid green line shows the best estimate, defined as the 21-day rolling average across all four observation sites available for New York City, and the dark and light shaded areas show two uncertainty estimates derived from the time-averaged and hourly model-observation samples, respectively.

Throughout 2019, the bias-corrected model mean estimate is in close agreement with the observations. Coinciding with the outbreak of the pandemic, the observed NO2 over New York City declines by up to 40% and only gradually recovers to the expected value by the end of the year.
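The anomaly curve described above can be sketched as follows, assuming daily observed values and bias-corrected baseline values per site on aligned days. Averaging percentage differences across sites and using a centered 21-day window are our assumptions; the paper's exact aggregation may differ.

```python
from statistics import mean


def daily_anomaly_percent(obs_by_site, baseline_by_site):
    """Per-day mean across sites of 100 * (observed - baseline) / baseline.

    obs_by_site / baseline_by_site: {site: [daily values]} with aligned days.
    Negative values mean less NO2 than the 'business-as-usual' baseline.
    """
    sites = list(obs_by_site)
    n_days = len(obs_by_site[sites[0]])
    return [
        mean(100.0 * (obs_by_site[s][d] - baseline_by_site[s][d]) / baseline_by_site[s][d]
             for s in sites)
        for d in range(n_days)
    ]


def rolling_mean(values, window=21):
    """Centered rolling mean; shorter windows are used at the series edges."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


# Hypothetical values for two sites over three days.
obs = {"site_1": [60.0, 60.0, 90.0], "site_2": [80.0, 80.0, 100.0]}
base = {"site_1": [100.0] * 3, "site_2": [100.0] * 3}
daily = daily_anomaly_percent(obs, base)  # [-30.0, -30.0, -5.0]
print(rolling_mean(daily, window=21))
```

Smoothing over 21 days suppresses day-to-day weather variability, so the curve reflects the slower change caused by the restrictions.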

Figure 3: Difference between observed NO2 concentration and model-predicted values for New York City. Negative anomalies indicate a reduction in observed NO2 concentration relative to the expected ‘business-as-usual’ scenario. Dark and light-shaded areas indicate low and high uncertainty estimates.

Conducting this analysis at 4,778 locations across the world enables us to identify regional patterns in air quality anomalies, and our study shows that these patterns tend to be related to differences in timing and intensity of COVID-19 restrictions.

The approach described here is useful not only for analyzing the impact of COVID-19 on air pollution but also, more generally, for monitoring air pollution across the world (both the observations and the model data are available in near real time). Given the growing number of available air quality observations, fast data processing becomes ever more critical for such an application. The sample code available at https://github.com/GEOS-CF/covid_no2 demonstrates that conducting the analysis on a V100 GPU using cuDF offers an overall speed-up of up to 5x for each city compared to a 20-core Intel Xeon E5-2689 CPU.

References:

Keller, C. A., Evans, M. J., Knowland, K. E., Hasenkopf, C. A., Modekurty, S., Lucchesi, R. A., Oda, T., Franca, B. B., Mandarino, F. C., Díaz Suárez, M. V., Ryan, R. G., Fakes, L. H., and Pawson, S.: Global impact of COVID-19 restrictions on the surface concentrations of nitrogen dioxide and ozone, Atmos. Chem. Phys., 21, 3555–3592, https://doi.org/10.5194/acp-21-3555-2021, 2021.

NASA Model Reveals How Much COVID-related Pollution Levels Deviated from the Norm

A Data Science Series (Part 1): NASA and NVIDIA Collaborate to Accelerate Scientific Data Science Use Cases

Categories
Misc

Machine Learning with ML.NET – NLP with BERT

submitted by /u/RubiksCodeNMZ
Categories
Misc

Using feature columns on 3D data: How to avoid having 750+ input layers?

Hello,

I’m trying to use TensorFlow to predict the outcome of a sports contest. What I have for every sample is the context of the competition (weather, type of stadium, …) and the competition history of every competing team.

Here is an overview of the data of every sample:

Context: CompetitionData
Teams History: [[CompetitionData], [CompetitionData]] (for every team, the past competition data and its result: win/lose/ranking)

I’m going to try to develop a learning-to-rank system that, given the context and every team’s history, predicts the final ranking.

I think that feature columns are useful in this case, as they can ease the processing of the competition data. However, I can’t find a way to reuse the feature column code across all competition data dimensions. Ideally, I would reuse the DenseFeatures layers across all competition data, but this doesn’t seem to work: TF requires the data fed to DenseFeatures layers to be a dict, and each entry needs to be passed one by one through an input layer to be fed in correctly.

I have also tried this:

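```python
import tensorflow as tf
from tensorflow.keras.layers import Input

NB_RACE_HISTORY = 10      # past competitions kept per player
MAX_NUMBER_PLAYERS = 15   # players per competition

# info_columns, info_columns2, info_columns3 and age are feature columns
# defined elsewhere (not shown in the post).
history_input = tf.keras.layers.DenseFeatures([info_columns, info_columns2, info_columns3, age])

statics_hist = {
    'rapport': Input((1,), dtype=tf.dtypes.int32, name='rapport'),
    'weight': Input((1,), dtype=tf.dtypes.int32, name='weight'),
    'age': Input((1,), name='age'),
    'first': Input((1,), dtype=tf.dtypes.int32, name='first'),
    'stadium': Input((1,), dtype=tf.dtypes.string, name='stadium'),
}
# Apply the same DenseFeatures layer once per (player, past competition) slot.
test = [tf.stack([history_input(statics_hist) for _ in range(NB_RACE_HISTORY)], axis=1)
        for __ in range(MAX_NUMBER_PLAYERS)]
test = tf.stack(test, axis=1)
```
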
But since I have 15 players, each with a history of 10 competitions with several columns each, this gives me 750+ input layers, which can’t be the right way to go.

I have thought about flattening the data beforehand, but then I would lose the ability to run an LSTM through a player’s history, which is important for modeling his current performance.
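One possible direction, offered as an assumption rather than a confirmed answer to the question, is to do the per-competition processing before the model and pack every player's history into one fixed-shape array of shape (players, history, features). Converted with numpy, such an array could feed a single `tf.keras.Input(shape=(15, 10, F))`, and `tf.keras.layers.TimeDistributed(tf.keras.layers.LSTM(...))` could then run an LSTM over each player's history. The helper below is hypothetical and assumes numeric features only; string features such as the stadium would need to be encoded beforehand.

```python
def pack_histories(teams, max_players, max_history, feature_names, pad=0.0):
    """Pack per-player competition histories into one fixed-shape nested list.

    teams: one list per player of competition dicts, most recent last.
    Returns (data, mask): data has shape [max_players][max_history][n_features],
    mask has shape [max_players][max_history] with 1 for real competitions.
    """
    n_features = len(feature_names)
    data, mask = [], []
    for p in range(max_players):
        history = teams[p][-max_history:] if p < len(teams) else []
        rows = [[float(c[f]) for f in feature_names] for c in history]
        n_pad = max_history - len(rows)
        # Left-pad so that the most recent competition is the last time step.
        data.append([[pad] * n_features for _ in range(n_pad)] + rows)
        mask.append([0] * n_pad + [1] * len(rows))
    return data, mask


# Hypothetical example: two players, up to 3 players and 2 past competitions.
teams = [
    [{"weight": 60, "age": 5}],
    [{"weight": 58, "age": 4}, {"weight": 59, "age": 5}],
]
data, mask = pack_histories(teams, max_players=3, max_history=2, feature_names=["weight", "age"])
print(data[0], mask[0])  # [[0.0, 0.0], [60.0, 5.0]] [0, 1]
```

With this layout the model has one input tensor regardless of how many players or past competitions there are, and the mask records which time steps are padding.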

I’m not really sure what the right approach is; could anyone point me in the right direction?

submitted by /u/Wats0ns

Categories
Misc

Infrared Image is better than visible light image for machine navigation in such environments

submitted by /u/Z_future1

Categories
Misc

AI and 5G to Fuel Next Wave of IoT Services, Says GTC Panel of Telecom Experts

The rollout of 5G for edge AI services promises to fuel a magic carpet ride into the future for everything from autonomous vehicles, to supply chains and education. That was a key takeaway from a panel of five 5G experts speaking at NVIDIA’s GPU Technology Conference this week. With speed boosts up to 10x that …

The post AI and 5G to Fuel Next Wave of IoT Services, Says GTC Panel of Telecom Experts appeared first on The Official NVIDIA Blog.

Categories
Misc

Deep Learning Classifies Largest-Ever Catalog of Distant Galaxies

University of Pennsylvania researchers have used convolutional neural networks to catalog the morphology of 27 million galaxies, giving astronomers a massive dataset for studying the evolution of the universe.

“Galaxy morphology is one of the key aspects of galaxy evolution,” said study author Helena Domínguez Sánchez, former postdoc at Penn. “The shape and structure of galaxies has a lot of information about the way they were formed, and knowing their morphologies gives us clues as to the likely pathways for the formation of the galaxies.”

While past research projects have focused on classifying images of bright, nearby galaxies, the team focused their neural network on fainter, more distant galaxies captured by the Dark Energy Survey, an international project to image an eighth of the sky.

The further away a galaxy is from the Milky Way, the longer it takes for light to reach our corner of the universe. So images from the Dark Energy Survey, which contains more images of distant galaxies than previous studies, “show us what galaxies looked like more than 6 billion years ago,” said Mariangela Bernardi, professor in the Department of Physics and Astronomy at Penn. 

While the researchers already had a CNN that could categorize galaxies as spiral or elliptical, the model had been trained on nearby galaxies captured in the Sloan Digital Sky Survey. To teach the neural network to process the more distant, more pixelated images from the Dark Energy Survey, the team collected a labeled dataset of 20,000 galaxies from both astronomical surveys, where the morphological classifications were already known.

They then created a synthetic dataset that simulated how the images would look if they depicted galaxies that were further away.

Simulated spiral and elliptical galaxy images illustrate how fainter and more distant galaxies would look in the Dark Energy Survey dataset.

Once trained on a combination of simulated and real galaxy images, the CNN was applied to the massive Dark Energy Survey dataset, cataloging 27 million galaxies as either early-type or late-type galaxies, and as face-on or edge-on images. 

The team used NVIDIA GPUs on Amazon Web Services for training and inference of their neural network. They found the model was 97 percent accurate at classifying the morphology of even faint galaxies too difficult to categorize by eye. 

The resulting collection is the largest multi-band catalog of automated galaxy morphologies to date.

“We pushed the limits by three orders of magnitude, to objects that are 1,000 times fainter than the original ones,” said lead author Jesús Vega-Ferrero. “That is why we were able to include so many more galaxies in the catalog.”

The researchers are next combining the morphological classification predictions with additional factors including the age, mass, distance, star-formation rate, and chemical composition of the galaxies to enable a better understanding of the relationship between galaxy morphology and star formation.

Find the full study in Monthly Notices of the Royal Astronomical Society. A preprint of the paper is available on arXiv.

Categories
Misc

I Made This Using TF Lite On a RPI 4

submitted by /u/NathanielF478
Categories
Misc

I published a tutorial where I build a preprocessing pipeline for audio data

I often get a ton of questions from programmers and data scientists about audio data preprocessing:

– How can I extract spectrograms?

– How can I normalise the signal?

– What if I have files of different lengths?

To answer these questions and more, I published a tutorial where you can learn how to build an audio preprocessing pipeline for AI applications. The pipeline batch-preprocesses audio files, applying the Short-Time Fourier Transform, zero-padding and normalisation all in one go!
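The steps listed above can be sketched in plain Python. This is an illustration, not the tutorial's code (which is in the linked video): it assumes right-side zero-padding to a fixed length before the STFT, a Hann window, a naive DFT instead of an FFT library, and min-max normalisation to [0, 1].

```python
import cmath
import math


def zero_pad(signal, length):
    """Right-pad with zeros (or truncate) so every file has the same length."""
    return list(signal[:length]) + [0.0] * max(0, length - len(signal))


def stft_magnitude(signal, frame_size, hop):
    """Naive STFT: Hann-windowed frames, DFT magnitudes of bins 0..frame_size//2."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(frame_size // 2 + 1)
        ])
    return frames


def min_max_normalise(frames, lo=0.0, hi=1.0):
    """Scale all spectrogram values into [lo, hi] (assumes at least one frame)."""
    flat = [v for row in frames for v in row]
    vmin, vmax = min(flat), max(flat)
    span = (vmax - vmin) or 1.0  # avoid division by zero on silent input
    return [[lo + (v - vmin) / span * (hi - lo) for v in row] for row in frames]


def preprocess(batch, length, frame_size=256, hop=128):
    """Pad every file, compute its magnitude spectrogram, then normalise it."""
    return [min_max_normalise(stft_magnitude(zero_pad(x, length), frame_size, hop))
            for x in batch]


# Toy input: a sine wave that completes 8 cycles per 64-sample frame.
sig = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = preprocess([sig], 256, frame_size=64, hop=32)[0]
print(len(spec), len(spec[0]))  # 7 33
```

Padding before the STFT guarantees that every file yields the same number of frames, so the spectrograms can be stacked into one batch for a neural network.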

This video is a new installment of the series “Generating sound with neural nets”, where you can learn to generate sound using Variational AutoEncoders.

Here’s the video:

https://www.youtube.com/watch?v=O04v3cgHNeM&list=PL-wATfeyAMNpEyENTc-tVH5tfLGKtSWPp&index=12

submitted by /u/diabulusInMusica

Categories
Misc

EV Technology Goes into Hyperdrive with Mercedes-Benz EQS

Mercedes-Benz is calling on its long heritage of luxury to accelerate electric vehicle technology with the new EQS sedan. The premium automaker lifted the wraps off the long-awaited flagship EV during a digital event today. The focal point of the revolutionary vehicle is the MBUX Hyperscreen, a truly intuitive and personalized AI cockpit, powered by …

The post EV Technology Goes into Hyperdrive with Mercedes-Benz EQS appeared first on The Official NVIDIA Blog.