Categories
Offsites

Speeding Up Reinforcement Learning with a New Physics Simulation Engine

Reinforcement learning (RL) is a popular method for teaching robots to navigate and manipulate the physical world, which itself can be simplified and expressed as interactions between rigid bodies1 (i.e., solid physical objects that do not deform when a force is applied to them). In order to facilitate the collection of training data in a practical amount of time, RL usually leverages simulation, where approximations of any number of complex objects are composed of many rigid bodies connected by joints and powered by actuators. But this poses a challenge: it frequently takes millions to billions of simulation frames for an RL agent to become proficient at even simple tasks, such as walking, using tools, or assembling toy blocks.

While progress has been made to improve training efficiency by recycling simulation frames, some RL tools instead sidestep this problem by distributing the generation of simulation frames across many simulators. These distributed simulation platforms yield impressive results that train very quickly, but they must run on compute clusters with thousands of CPUs or GPUs which are inaccessible to most researchers.

In “Brax – A Differentiable Physics Engine for Large Scale Rigid Body Simulation”, we present a new physics simulation engine that matches the performance of a large compute cluster with just a single TPU or GPU. The engine is designed to both efficiently run thousands of parallel physics simulations alongside a machine learning (ML) algorithm on a single accelerator and scale millions of simulations seamlessly across pods of interconnected accelerators. We’ve open sourced the engine along with reference RL algorithms and simulation environments that are all accessible via Colab. Using this new platform, we demonstrate 100-1000x faster training compared to a traditional workstation setup.

Three typical RL workflows. The left shows a typical workstation flow: on a single machine, with the environment on CPU, training takes hours or days. The middle shows a typical distributed simulation flow: training takes minutes by farming simulation out to thousands of machines. The right shows the Brax flow: learning and large-batch simulation occur side by side on a single TPU/GPU chip.

Physics Simulation Engine Design Opportunities
Rigid-body physics is used in video games, robotics, molecular dynamics, biomechanics, graphics and animation, and other domains. In order to accurately model such systems, simulators integrate forces from gravity, motor actuation, joint constraints, object collisions, and other sources to simulate the motion of a physical system across time.

Simulation of three spherical bodies, a wall, two joints, and one actuator. For each simulation timestep, forces and torques are integrated together to update the positions, rotations, and velocities of each physical body.

Taking a closer look at how most physics simulation engines are designed today, there are a few large opportunities to improve efficiency. As we noted above, a typical robotics learning pipeline places a single learner in a tight feedback loop with many parallel simulations, but upon analyzing this architecture, one finds that:

  1. This layout imposes an enormous latency bottleneck. Because the data must travel over the network within a datacenter, the learner must wait for 10,000+ nanoseconds to fetch experience from the simulator. Were this experience instead already on the same device as the learner’s neural network, latency would drop to <1 nanosecond.
  2. The computation necessary for training the agent (one simulation step, followed by one update of the agent’s neural network) is overshadowed by the computation spent packaging the data (i.e., marshalling data within the engine, then into a wire format such as protobuf, then into TCP buffers, and then undoing all these steps on the learner side).
  3. The computations happening within each simulator are remarkably similar, but not exactly the same.

Brax Design
In response to these observations, Brax is designed so that its physics calculations are exactly the same across each of its thousands of parallel environments by ensuring that the simulation is free of branches (i.e., simulation “if” logic that diverges as a result of the environment state). An example of a branch in a physics engine is the application of a contact force between a ball and a wall: different code paths will execute depending on whether the ball is touching the wall. That is, if the ball contacts the wall, separate code for simulating the ball’s bounce off the wall will execute. Brax employs a mix of the following three strategies to avoid branching:

  • Replace the discrete branching logic with a continuous function, such as approximating the ball-wall contact force using a signed distance function. This approach yields the greatest efficiency gains.
  • Evaluate the branch during JAX’s just-in-time compile. Many branches based on static properties of the environment, such as whether it’s even possible for two objects to collide, can be evaluated before simulation time.
  • Run both sides of the branch during simulation and then select only the required results. Because this executes some code that isn’t ultimately used, it wastes operations compared to the above. The sketch after this list illustrates the first and third strategies.
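
As a rough illustration, here is a minimal JAX sketch of the first and third strategies using a penalty-style contact model; the function and parameter names are illustrative, not Brax’s actual internals:

    import jax
    import jax.numpy as jnp

    def contact_force(ball_pos, wall_x, radius, stiffness):
        # Signed distance from the ball surface to the wall plane; negative
        # values mean penetration (assumes the ball sits on the +x side).
        dist = ball_pos[0] - wall_x - radius
        # jnp.where evaluates both branches and selects one, so the
        # computation stays branch-free and identical in every environment.
        fx = jnp.where(dist < 0.0, -stiffness * dist, 0.0)
        return jnp.array([fx, 0.0, 0.0])

    # vmap runs the same branch-free computation in lockstep across a whole
    # batch of parallel environments on one accelerator.
    batched_force = jax.vmap(contact_force, in_axes=(0, None, None, None))
    forces = batched_force(
        jnp.full((1024, 3), 1.2),  # 1,024 environments; balls slightly overlap the wall
        1.0,                       # wall plane at x = 1.0
        0.5,                       # ball radius
        100.0,                     # penalty stiffness
    )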

Once the calculations are guaranteed to be exactly uniform, the entire training architecture can be reduced in complexity to be executed on a single TPU or GPU. Doing so removes the computational overhead and latency of cross-machine communication. In practice, these changes lower the cost of training by 100x-1000x for comparable workloads.

Brax Environments
Environments are tiny packaged worlds that define a task for an RL agent to learn. An environment contains not only the means to simulate a world, but also functions that define how the agent observes that world and what the goal in that world is.

A few standard benchmark environments have emerged in recent years for testing new RL algorithms and for evaluating the impact of those algorithms using metrics commonly understood by research scientists. Brax includes four such ready-to-use environments that come from the popular OpenAI Gym: Ant, HalfCheetah, Humanoid, and Reacher.

From left to right: Ant, HalfCheetah, Humanoid, and Reacher are popular baseline environments for RL research.

Brax also includes three novel environments: dexterous manipulation of an object (a popular challenge in robotics), generalized locomotion (an agent that goes to a target placed anywhere around it), and a simulation of an industrial robot arm.

Left: Grasp, a claw hand that learns dexterous manipulation. Middle: Fetch, a toy, box-like dog that learns a general goal-based locomotion policy. Right: Simulation of UR5e, an industrial robot arm.
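
To give a feel for the workflow, here is a sketch of how these bundled environments can be driven, based on the pattern used in the Colab notebooks; exact function names and signatures may differ across Brax versions:

    import jax
    import jax.numpy as jnp
    from brax import envs  # the open-sourced brax package

    # Create a bundled environment by name, then jit-compile reset/step so
    # the entire rollout runs on the accelerator.
    env = envs.create(env_name='ant')
    reset_fn = jax.jit(env.reset)
    step_fn = jax.jit(env.step)

    state = reset_fn(jax.random.PRNGKey(0))
    for _ in range(100):
        action = jnp.zeros(env.action_size)  # a trained policy's output would go here
        state = step_fn(state, action)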

Performance Benchmarks
The first step for analyzing Brax’s performance is to measure the speed at which it can simulate large batches of environments, because this is the critical bottleneck to overcome in order for the learner to consume enough experience to learn quickly.

The two graphs below show how many physics steps (updates to the state of the environment) Brax can produce as it is tasked with simulating more and more environments in parallel. The graph on the left shows that Brax scales the number of steps per second linearly with the number of parallel environments, only hitting memory bandwidth bottlenecks at 10,000 environments, which is not only enough for training single agents, but also suitable for training entire populations of agents. The graph on the right shows two things: first, that Brax performs well not only on TPUs, but also on high-end GPUs (see the V100 and P100 curves), and second, that by leveraging JAX’s device parallelism primitives, Brax scales seamlessly across multiple devices, reaching hundreds of millions of physics steps per second (see the TPUv3 8×8 curve, which represents 64 TPUv3 chips directly connected to each other over a high-speed interconnect).

Left: Scaling of the simulation steps per second for each Brax environment on a 4×2 TPU v3. Right: Scaling of the simulation steps per second for several accelerators on the Ant environment.
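
For readers curious about the multi-device scaling mentioned above, the pattern looks roughly like the following; this is an illustrative stand-in rather than Brax’s actual training loop, and step_batch is a placeholder for a real batched physics step:

    import jax
    import jax.numpy as jnp

    def step_batch(states, actions):
        # Placeholder for one batched, branch-free physics step.
        return states + 0.001 * actions

    # pmap replicates the batched step across every local accelerator core,
    # multiplying throughput with no cross-machine data plumbing.
    parallel_step = jax.pmap(step_batch)

    n_dev = jax.local_device_count()
    states = jnp.zeros((n_dev, 1024, 8))   # [devices, envs per device, state dim]
    actions = jnp.ones((n_dev, 1024, 8))
    states = parallel_step(states, actions)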

Another way to analyze Brax’s performance is to measure its impact on the time it takes to run a reinforcement learning experiment on a single workstation. Here we compare Brax training the popular Ant benchmark environment to its OpenAI counterpart, powered by the MuJoCo physics engine.

In the graph below, the blue line represents a standard workstation setup, where a learner runs on the GPU and the simulator runs on the CPU. We see that the time it takes to train an ant to run with reasonable proficiency (a score of 4000 on the y-axis) drops from about 3 hours for the blue line to about 10 seconds using Brax on accelerator hardware. It’s interesting to note that even on CPU alone (the grey line), Brax performs more than an order of magnitude faster, benefiting from the learner and simulator both sitting in the same process.

Brax’s optimized PPO versus a standard GPU-backed PPO learning the MuJoCo-Ant-v2 environment, evaluated for 10 million steps. Note the x-axis is log-wallclock-time in seconds. Shaded region indicates lowest and highest performing seeds over 5 replicas, and solid line indicates mean.

Physics Fidelity
Designing a simulator that matches the behavior of the real world is a known hard problem that this work does not address. Nevertheless, it is useful to compare Brax to a reference simulator to ensure it produces comparably valid output. In this case, we again compare Brax to MuJoCo, which is well regarded for its simulation quality. We expect to see that, all else being equal, a policy has a similar reward trajectory whether trained in MuJoCo or Brax.

MuJoCo-Ant-v2 vs. Brax Ant, showing the number of environment steps plotted against the average episode score achieved for the environment. Both environments were trained with the same standard implementation of SAC. Shaded region indicates lowest and highest performing seeds over five runs, and solid line indicates the mean.

Because the reward rises at about the same rate for both simulators, these curves show that both engines compute physics of a comparable level of complexity or difficulty to solve. And because both curves top out at about the same reward, we have confidence that the same general physical limits apply to agents operating to the best of their ability in either simulation.

We can also measure Brax’s ability to conserve linear momentum, angular momentum, and energy.

Linear momentum (left), angular momentum (middle), and energy (right) non-conservation scaling for Brax as well as several other physics engines. The y-axis indicates drift from the expected calculation (higher is smaller drift, which is better), and the x-axis indicates the amount of time being simulated.

This measure of physics simulation quality was first proposed by the authors of MuJoCo as a way to understand how the simulation drifts off course as it is tasked with computing larger and larger time steps. Here, Brax performs similarly to its neighbors.

Conclusion
We invite researchers to perform a more qualitative measure of Brax’s physics fidelity by training their own policies in the Brax Training Colab. The learned trajectories are recognizably similar to those seen in OpenAI Gym.

Our work makes fast, scalable RL and robotics research much more accessible — what was formerly only possible via large compute clusters can now be run on workstations, or for free via hosted Google Colaboratory. Our GitHub repository includes not only the Brax simulation engine, but also a host of reference RL algorithms for fast training. We can’t wait to see what kind of new research Brax enables.

Acknowledgements
We’d like to thank our paper co-authors: Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. We also thank Erwin Coumans for advice on building physics engines, Blake Hechtman and James Bradbury for providing optimization help with JAX and XLA, and Luke Metz and Shane Gu for their advice. We’d also like to thank Vijay Sundaram, Wright Bagwell, Matt Leffler, Gavin Dodd, Brad Mckee, and Logan Olson for helping to incubate this project.


1 Due to the complexity of the real world, there is also ongoing research exploring the physics of deformable bodies.

Categories
Misc

NVIDIA Deep Learning Institute Public Workshop Summer Series Extended

NVIDIA DLI Summer Workshops: conducted live in a virtual classroom environment with expert guidance from NVIDIA-certified instructors.

The NVIDIA Deep Learning Institute (DLI) extended its popular public workshop summer series through September 2021. These workshops are conducted live in a virtual classroom environment with expert guidance from NVIDIA-certified instructors. Participants have access to fully configured GPU-accelerated servers in the cloud to perform hands-on exercises. 

To register, visit our website. Space is limited so we encourage you to sign up early.

Here is our current public workshop schedule:

August

Fundamentals of Deep Learning
Wed, August 25, 9:00 a.m. to 5:00 p.m. PDT (NALA)

Fundamentals of Accelerated Computing with CUDA Python
Thu, August 26, 9:00 a.m. to 5:00 p.m. CEST (EMEA)

September

Fundamentals of Deep Learning
Thu, September 16, 9:00 a.m. to 5:00 p.m. CEST (EMEA)

Deep Learning for Industrial Inspection
Tue, September 21, 9:00 a.m. to 5:00 p.m. CEST (EMEA)
Tue, September 21, 9:00 a.m. to 5:00 p.m. PDT (NALA)

Applications of AI for Anomaly Detection
Wed, September 22, 9:00 a.m. to 5:00 p.m. CEST (EMEA)
Wed, September 22, 9:00 a.m. to 5:00 p.m. PDT (NALA)

Building Transformer-Based Natural Language Processing Applications 
Thu, September 23, 9:00 a.m. to 5:00 p.m. PDT (NALA)

Visit the DLI website for details on each course and the full schedule of upcoming instructor-led workshops, which is regularly updated with new training opportunities.

For more information, e-mail nvdli@nvidia.com.

Categories
Misc

A Sparkle in Their AIs: Students Worldwide Rev Robots with Jetson Nano

Every teacher has a story about the moment a light switched on for one of their students. David Tseng recalls a high school senior in Taipei excited at a summer camp to see a robot respond instantly when she updated her software. After class, she had a lot of questions and later built an AI-powered …

Categories
Misc

Deploying and Accelerating AI at the Edge with the NVIDIA EGX Platform

Edge computing has been around for a long time, but has recently become a hot topic because of the convergence of three major trends – IoT, 5G, and AI. 

IoT devices are becoming smarter and more capable, increasing the breadth of applications that can be deployed on them and the environments they can be deployed in. Simultaneously, recent advancements in 5G capabilities give confidence that this technology will soon be able to connect IoT devices wirelessly anywhere they are deployed. In fact, analysts predict that there will be over 1 billion 5G connected devices by 2023. Lastly, AI successfully moved from research projects into practical applications, changing the landscape for retailers, factories, hospitals, and many more. 

So what does the convergence of these trends mean? An explosion in the number of IoT devices deployed. 

Experts estimate there are over 30 billion IoT devices installed today, and Arm predicts that by 2035, there will be over 1 trillion devices. With that many IoT devices deployed, the amount of data collected has skyrocketed, putting strain on current cloud infrastructures. Organizations soon found themselves in a position where the AI applications they deployed needed large amounts of data to generate compelling insights, but the latency for their cloud infrastructure to process data and send insights back to the edge was unsustainable. So they turned to edge computing. 

By putting processing power at the location where sensors collect data, organizations reduce the latency for applications to deliver insights. In some situations, such as autonomous machines in factories, the latency reduction is a critical safety component. 

That is where NVIDIA comes in. The NVIDIA Edge AI solution offers a complete end-to-end AI platform for deploying AI at the edge. It starts with NVIDIA-Certified Systems. 

NVIDIA-Certified Systems combine the computing power of NVIDIA GPUs with secure, high-bandwidth, low-latency networking solutions from NVIDIA. Because these systems are validated for performance, functionality, scalability, and security, IT teams can be confident that AI workloads deployed from the NGC catalog, NVIDIA’s GPU-optimized hub of HPC and AI software, run at full performance. These servers are backed by enterprise-grade support, including direct access to NVIDIA experts, minimizing system downtime and maximizing user productivity. 

To build and accelerate applications running on NVIDIA-Certified Systems, NVIDIA offers an extensive toolkit of SDKs, application frameworks, and other tools designed to help developers build AI applications for every industry. These include pretrained models, training scripts, optimized framework containers, inference engines, and more. With these tools, organizations get a head start on building unique AI applications regardless of workload or industry. 

Once organizations have the hardware to accelerate AI and an AI application to deploy, the next step is to ensure that there is infrastructure in place to manage and scale the application. Without a platform to manage AI at the edge, organizations face the difficult and costly task of manually updating systems at edge locations every time a new software update is released. 

NVIDIA Fleet Command is a cloud service that securely deploys, manages, and scales AI applications across distributed edge infrastructure. Purpose-built for AI, Fleet Command is a turnkey solution for AI lifecycle management, offering streamlined deployments, layered security, and detailed monitoring capabilities — so organizations can go from zero to AI in minutes.

The complete edge AI solution gives organizations the tools needed to build an end-to-end edge deployment. KION Group, the number one global supply chain solutions provider, uses NVIDIA solutions to fulfill orders faster and more efficiently. 

To learn more about NVIDIA edge AI solutions, check out Deploying and Accelerating AI at the Edge With the NVIDIA EGX Platform.

Categories
Misc

A Grand Slam: GFN Thursday Scores 1,000th PC Game, Streaming Instantly on GeForce NOW

This GFN Thursday marks a new millennium for GeForce NOW. By adding 13 games this week, our cloud game-streaming service now offers members instant access to 1,000 PC games. That’s 1,000 games that members can stream instantly to underpowered PCs, Macs, Chromebooks, SHIELD TVs, Android devices, iPhones and iPads. Devices that otherwise wouldn’t dream of …

Categories
Offsites

Starting to think about AI Fairness

The topic of AI fairness metrics is as important to society as it is confusing. It is confusing for a number of reasons: terminological proliferation, an abundance of formulae, and, not least, the impression that everyone else seems to know what they’re talking about. This text hopes to counteract some of that confusion by starting from a common-sense approach of contrasting two basic positions: on the one hand, the assumption that dataset features may be taken as reflecting the underlying concepts ML practitioners are interested in; on the other, that there inevitably is a gap between concept and measurement, a gap that may be bigger or smaller depending on what is being measured. In contrasting these fundamental views, we bring together concepts from ML, legal science, and political philosophy.

Categories
Misc

ValueError: Dimensions must be equal

This is the error message I got.

ValueError: Dimensions must be equal but are 9 and 4 for '{{node mean_absolute_percentage_error/sub}} = Sub[T=DT_FLOAT](IteratorGetNext:1, sequential_5/dense_11/Sigmoid)' with input shapes: [32,9,4,4], [32,4,9,4].

But my output and my y have the same shape.

https://preview.redd.it/whz4e8oy78b71.png?width=2212&format=png&auto=webp&s=82f480c5e51a307de44dd219aae721d3810ce9b6
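
For reference, the two reported shapes differ only by a swap of axes 1 and 2 ([32,9,4,4] vs. [32,4,9,4]), which usually means the labels and the model output are laid out in different axis orders. A hypothetical fix, assuming the label tensor is the misaligned one:

    import tensorflow as tf

    y_true = tf.ones([32, 9, 4, 4])  # shape reported for the labels
    y_pred = tf.ones([32, 4, 9, 4])  # shape reported for the model output

    # Swapping axes 1 and 2 of one tensor makes the shapes agree, so the
    # element-wise subtraction inside the loss no longer fails.
    y_true_aligned = tf.transpose(y_true, perm=[0, 2, 1, 3])
    loss = tf.keras.losses.MAPE(y_true_aligned, y_pred)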

submitted by /u/Striking-Warning9533

Categories
Misc

Uses for image_dataset_from_directory?

I have my file structure set up so that each class of images has its own directory.

I’m a bit confused about how to use image_dataset_from_directory to separate the data into train and val. If I set the subset parameter to train, will the subset parameter tell it the fraction to use for train, and likewise if I set the subset parameter to validation?

Thanks!
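
For reference, validation_split sets the held-out fraction, while subset only selects which side of that split a given call returns. A minimal sketch, assuming a TF 2.x install; the directory path and numbers are placeholders:

    import tensorflow as tf

    # validation_split sets the held-out fraction; subset only selects which
    # side of the split this call returns. Use the same seed in both calls so
    # the two subsets are consistent and non-overlapping.
    train_ds = tf.keras.preprocessing.image_dataset_from_directory(
        "images/",                # one sub-directory per class
        validation_split=0.2,     # hold out 20% of the files for validation
        subset="training",
        seed=123,
        image_size=(256, 256),
        batch_size=32,
    )
    val_ds = tf.keras.preprocessing.image_dataset_from_directory(
        "images/",
        validation_split=0.2,
        subset="validation",
        seed=123,
        image_size=(256, 256),
        batch_size=32,
    )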

submitted by /u/llub888

Categories
Misc

Build a Fully Interactive Dashboard in a Few Lines of Python

Work continues on improving the UX and capabilities of our GPU cross-filter dashboard library, cuxfilter. Here is a quick recap of its latest features. 

First, it is as easy as ever to access cuxfilter. Just run a standard RAPIDS install as shown on the getting started page. Additionally, you can try it online at Paperspace. One of the powerful benefits of a full RAPIDS installation is that you can work on your data and visualize it within a single Jupyter notebook or lab instance.

 Figure 1: Using cuxfilter to build a full interactive dashboard in a few lines of Python.

Here is a list of some of the major feature highlights:

  • High-density scatter, line, heatmap, and graph charts through Datashader, as well as choropleth maps from Deck.gl and bar and line charts from bokeh.
  • A fully responsive, customizable layout with a widget side panel.
  • Themes, such as the dark one shown above.
  • A preview feature using await d.preview() that generates a .png image of the full dashboard inline with your notebook.
  • The ability to export the selected data in an active dashboard by using the d.export() call.
  • The ability to deploy as a standalone application (outside of a notebook), as explained in our documentation for deploying multiuser dashboards.
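
Put together, a minimal dashboard can look like the following sketch, which follows the pattern from the tutorial notebook; the file path and column names are placeholders, and chart/theme names may vary slightly by cuxfilter version:

    import cudf
    import cuxfilter

    # Load data on the GPU and wrap it for cross-filtering.
    df = cudf.read_csv('data.csv')  # placeholder file
    cux_df = cuxfilter.DataFrame.from_dataframe(df)

    # Two linked charts: a selection in one filters the other.
    charts = [
        cuxfilter.charts.bar('col_a'),
        cuxfilter.charts.scatter(x='x', y='y'),
    ]

    d = cux_df.dashboard(charts, theme=cuxfilter.themes.dark)
    d.show()             # serve the dashboard
    # await d.preview()  # or render a .png preview inline in a notebook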

You can try all of these features and more in our tutorial notebook, and follow along in our tutorial video. The screenshot shown in Figure 2 is one of the dashboards created there, a compelling example of how to combine RAPIDS libraries to quickly create powerful cross-filterable dashboards in just a few lines of Python.

Figure 2: Screenshot of the double-graph dashboard from the gist below.

Going forward, we continue to improve cuxfilter and use it to collaborate with the enormous Python viz community, including the bokeh, holoviews, panel, and datashader projects. We encourage you to try it out, and as always, if you have any issues or feature requests, let us know on our GitHub. Happy cross-filtering!

Resources: