Thrust 1.11.0 is a major release providing bug fixes and performance enhancements. It includes a new sort algorithm that makes thrust::sort up to 2x faster for certain key types and hardware. The new thrust::shuffle algorithm has been tweaked to improve the randomness of the output.
CUB 1.11.0 is a major release providing bug fixes and performance enhancements. It includes a new DeviceRadixSort backend that improves performance by up to 2x on supported keys and hardware.
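As background on the technique behind the name DeviceRadixSort: a radix sort orders keys one digit at a time using a histogram, a prefix sum, and a stable scatter. The following is a minimal CPU sketch in plain Python, meant only to illustrate that idea. It is not CUB's GPU implementation, and the function name and the 8-bit digit width are assumptions chosen for illustration.

```python
def radix_sort_u32(keys, bits_per_pass=8):
    """Sort non-negative integers below 2**32 with an LSD radix sort."""
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, 32, bits_per_pass):
        # Histogram: count how many keys fall into each digit bucket.
        counts = [0] * (mask + 1)
        for k in keys:
            counts[(k >> shift) & mask] += 1

        # Exclusive prefix sum turns counts into starting offsets.
        offsets = [0] * (mask + 1)
        total = 0
        for digit, count in enumerate(counts):
            offsets[digit] = total
            total += count

        # Stable scatter: keys with equal digits keep their relative order.
        out = [0] * len(keys)
        for k in keys:
            digit = (k >> shift) & mask
            out[offsets[digit]] = k
            offsets[digit] += 1
        keys = out
    return keys
```

Each pass is stable, which is why processing digits from least to most significant yields a fully sorted result after the final pass.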
The CMake package and build system for both libraries continue to improve with add_subdirectory support, installation rules, status messages, and other features that make these libraries easier to use from CMake projects.
About Thrust and CUB
Thrust provides STL-like templated interfaces to several algorithms and data structures designed for high performance heterogeneous parallel computing. Thrust abstractions are agnostic of any particular parallel framework. CUB is a library of collective primitives and utilities. CUB is specific to CUDA C++ and its interfaces explicitly accommodate CUDA-specific features.
Thrust and CUB are complementary and are often used together.
Determined AI’s application available in the NVIDIA NGC catalog, a GPU-optimized hub for AI applications, provides an open-source platform that enables deep learning engineers to focus on building models and not managing infrastructure.
As AI becomes universal, enterprise leaders are looking to empower their AI teams with fully integrated and automated development environments.
Determined AI is a member of the NVIDIA Inception AI and startup incubator.
Users can train models faster using state-of-the-art distributed training, without changing their model code. With built-in state-of-the-art hyperparameter tuning, deep learning engineers working on use cases such as computer vision or natural language processing (NLP) can find high-quality models up to 100x faster than with conventional tools.
Determined includes built-in experiment tracking, a lightweight model registry, and smart GPU scheduling, allowing deep learning engineers to get models from idea to production dramatically more quickly and at lower cost.
The product is packaged as user-managed software, delivered either as Helm charts for deployment to Kubernetes or as a set of Docker containers, for on-premises or cloud-based instances. Every GPU node runs an agent, and a central control node schedules workloads and coordinates work between the agents. Users submit deep learning workloads, and the system automatically handles everything from job scheduling and resource provisioning to distributed training.
Jobs can be submitted by developers directly or programmatically via APIs, as well as via easy integrations with other ML workflow systems like Kubeflow Pipelines and Airflow. The system tracks metadata, model checkpoints, and metrics, and allows models to easily be exported to downstream serving systems like Seldon and TensorFlow Serving. Users can interact with the system’s web UI to monitor job or cluster status, and debug and interact with live or historical experiments.
This article covers an interview with Ola Stalmach, co-creator of House of Crickets, a 2020 DXR Spotlight Contest winner.
During the cold of winter, you may find the video for House of Crickets (below) comforting. Ola Stalmach and Jakub Lesniak have used real-time ray tracing techniques to create a warm, inviting summertime space that looks real enough to get you checking how much a rental might cost.
NVIDIA: What is the development team size for House of Crickets?
Ola: It’s pretty small, only me and my partner (Jakub Lesniak). I’m working mostly on the artistic side of projects like interior design, creating moods and environments. Kuba helps me to improve the technical side of it and make the Unreal magic work in a hundred percent.
NVIDIA: Your demonstration looked so real, I wanted to move into that space! Is the visualization based on a real location?
Ola: I am so happy that you like it. Yes, the visualization is based on a real location. This is actually a project of my small apartment settled in attached houses near Krakow in Poland. The whole point was to make something beautiful in a normal, very standard house with a tiny garden.
NVIDIA: What real-time ray tracing features did you build into House of Crickets, and why?
Ola: I’ve used all available features (the less visible is Raytraced Ambient Occlusion). Since the iteration of ray tracing in Unreal Engine 4.25 it finally started working for me, especially Raytraced Indirect Illumination.
This technology is the future of graphics, so it is good to start getting to know it if you plan to visualize amazing worlds in the future. What I really love to see in architecture visualisations are great accurate reflections and shadows, which were really hard to achieve for me with previous real-time technology.
NVIDIA: A lot of developers starting out with DXR struggle with performance because they try to make everything reflective. Do you have any advice on materials to use when building an environment that will be ray traced?
Ola: There is that balance between roughness, noise reduction algorithm and number of reflection samples. I would try to avoid middle-reflective surfaces, high values of reflection samples and bounces. It is all about tweaking and playing with materials and sacrificing something in the way.
NVIDIA: How long did it take to develop this demo?
Ola: I think I spent a total 1 month, maybe more just learning what is possible. It was done in my free time for my small Studio OS.
NVIDIA: What’s next for you with real-time ray tracing?
Ola: I already use ray tracing (or the hybrid with lightbaking) in at least 80% of my projects. The plans are not only for architecture designs, but also to use it in Virtual Production, which I’ve gotten deeply into in the past two years. I’m looking forward to new improvements in both Engine and Hardware. Ray tracing will allow me to focus only on the artistic side of my creations and work with just intuition taken from the real World.
For several years, NVIDIA’s research teams have been working to leverage GPU technology to accelerate reinforcement learning (RL). As a result of this promising research, NVIDIA is pleased to announce a preview release of Isaac Gym – NVIDIA’s physics simulation environment for reinforcement learning research. RL-based training is now more accessible as tasks that once required thousands of CPU cores can now instead be trained using a single GPU.
RL has become one of the most promising research areas in machine learning and has demonstrated great potential for solving complex problems. RL-based systems have achieved superhuman performance in very challenging tasks, ranging from classic strategy games such as Go and Chess, to real-time computer games like StarCraft and DOTA.
Isaac Gym and NVIDIA GPUs, a reinforcement learning supercomputer
Until now, most RL robotics researchers were forced to use clusters of CPU cores for the physically accurate simulations needed to train RL algorithms. In one of the more well-known projects, the OpenAI team used almost 30,000 CPU cores (920 computers with 32 cores each) to train their robot in the Rubik’s Cube task.
In a similar task, Learning Dexterous In-Hand Manipulation, OpenAI used a cluster of 384 systems with 6,144 CPU cores, plus 8 Volta V100 GPUs, and required close to 30 hours of training to achieve its best results. This in-hand cube object orientation task is a challenging dexterous manipulation task, with complex physics and dynamics, many contacts, and a high-dimensional continuous control space.
Isaac Gym includes an example of this cube manipulation task so that researchers can recreate the OpenAI experiment. The example supports training both recurrent and feed-forward neural networks, as well as domain randomization of physics properties, which helps with sim-to-real transfer. With Isaac Gym, researchers can achieve the same level of success as OpenAI’s supercomputer on a single A100 GPU in about 10 hours!
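The figures quoted above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in the text. The training times are approximate ("close to 30 hours" and "about 10 hours"), so the resulting ratio is a rough estimate, and the variable names are chosen here for illustration.

```python
# Rubik's Cube task: 920 computers with 32 cores each.
rubiks_machines, cores_per_machine = 920, 32
rubiks_cores = rubiks_machines * cores_per_machine    # "almost 30,000"

# In-hand manipulation task: 384 systems sharing 6,144 CPU cores.
inhand_systems, inhand_cores = 384, 6144
cores_per_system = inhand_cores // inhand_systems

# Approximate wall-clock training times quoted in the text.
cluster_hours, a100_hours = 30, 10
wall_clock_ratio = cluster_hours / a100_hours

print(f"Rubik's Cube cluster: {rubiks_cores:,} CPU cores")
print(f"In-hand cluster: {cores_per_system} cores per system")
print(f"Cluster vs. single A100 training time: about {wall_clock_ratio:.0f}x")
```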
End to End GPU RL
Isaac Gym achieves these results by leveraging NVIDIA’s PhysX GPU-accelerated simulation engine, allowing it to gather the experience data required for robotics RL.
In addition to fast physics simulations, Isaac Gym also enables observation and reward calculations to take place on the GPU, thereby avoiding significant performance bottlenecks. In particular, costly data transfers between the GPU and the CPU are eliminated.
Implemented this way, Isaac Gym enables a complete end-to-end GPU RL pipeline.
Isaac Gym
Isaac Gym provides a basic API for creating and populating a scene with robots and objects, supporting loading data from URDF and MJCF file formats. Each environment is duplicated as many times as needed, and can be simulated simultaneously without interaction with other environments.
Isaac Gym provides a PyTorch tensor-based API for accessing the results of physics simulation. RL observation and reward calculations can therefore be written in Python and handed to the PyTorch JIT runtime system, which dynamically compiles that code into CUDA code that runs on the GPU.
Observation tensors can be used as inputs to a policy inference network, and the resulting action tensors can be fed directly back into the physics system. Rollouts of observation, reward, and action buffers can stay on the GPU for the entire learning process, eliminating the need to copy data back to the CPU.
This set-up permits tens of thousands of simultaneous environments on a single GPU, allowing researchers to easily run experiments locally on their desktops that previously required an entire data center.
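The observe, act, step, and reward loop described above can be sketched with a toy example. This is a plain-Python stand-in written under stated assumptions: it does not use the Isaac Gym API, the "physics" is a one-dimensional point that moves by the commanded action, and the policy is a fixed proportional controller. In Isaac Gym the per-environment lists below would be PyTorch tensors resident on the GPU.

```python
import random

def run_rollout(num_envs=4, horizon=5, seed=0):
    """Step many independent toy environments in lockstep and record a rollout."""
    rng = random.Random(seed)
    # One state entry per environment; environments never interact.
    positions = [rng.uniform(-1.0, 1.0) for _ in range(num_envs)]
    target = 0.0
    rollout = []  # one (observations, actions, rewards) tuple per step

    for _ in range(horizon):
        observations = list(positions)                           # observe
        actions = [0.5 * (target - o) for o in observations]     # toy policy
        positions = [p + a for p, a in zip(positions, actions)]  # "physics" step
        rewards = [-abs(p - target) for p in positions]          # reward
        rollout.append((observations, actions, rewards))
    return rollout

rollout = run_rollout(num_envs=3, horizon=4)
print(f"{len(rollout)} steps recorded for {len(rollout[0][0])} environments")
```

Every buffer has one entry per environment, so the same loop body serves any number of environments. That batched layout is what lets the real system keep the whole pipeline on the GPU.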
Isaac Gym also includes a basic Proximal Policy Optimization (PPO) implementation and a straightforward RL task system, but users may substitute alternative task systems or RL algorithms as desired. While the included examples use PyTorch, users should also be able to integrate with TensorFlow-based RL systems with some further customization.
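For readers unfamiliar with PPO, its core is the clipped surrogate objective, which limits how far a single update can move the policy. The sketch below shows that objective in plain Python. It follows the standard published formulation and is not taken from the PPO implementation bundled with Isaac Gym; the function name and the default clipping value of 0.2 are assumptions for illustration.

```python
def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Keep the probability ratio within [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum makes the objective a pessimistic bound.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)
```

With a positive advantage, the objective stops rewarding ratio increases beyond 1 + eps. With a negative advantage, it stops rewarding decreases below 1 - eps.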
Some additional features of Isaac Gym include:
Support for a variety of environment sensors – position, velocity, force, torque, etc.
Runtime domain randomization of physics parameters
Jacobian / inverse kinematics support
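Domain randomization of physics parameters, listed above, means giving each simulated environment slightly different physical properties so that a trained policy does not overfit to one exact simulation. The following is a minimal sketch of the sampling step only. The parameter names and ranges are hypothetical, and nothing here calls the Isaac Gym API.

```python
import random

# Hypothetical parameter ranges, chosen only for illustration.
PARAM_RANGES = {
    "friction": (0.5, 1.5),
    "mass_scale": (0.8, 1.2),
    "motor_strength": (0.9, 1.1),
}

def sample_physics_params(num_envs, seed=0):
    """Draw one independent set of physics parameters per environment."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in PARAM_RANGES.items()}
        for _ in range(num_envs)
    ]

for env_id, params in enumerate(sample_physics_params(num_envs=3)):
    print(env_id, {name: round(value, 3) for name, value in params.items()})
```

Seeding the generator keeps an experiment reproducible while still varying the parameters across environments.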
Research Results
NVIDIA’s research team has been applying Isaac Gym to a wide variety of projects. You can take a sneak-peek at some of these below, but stay tuned to https://developer.nvidia.com/blog/ for more details on these projects.
Get Started Today
Are you a researcher or academic interested in RL for robotics applications? Please download and try Isaac Gym.
Future Plans
The core functionality of Isaac Gym will be made available as part of the NVIDIA Omniverse Platform and NVIDIA’s Isaac Sim, a robotics simulation platform built on Omniverse. Until then, we are making this standalone preview release available to researchers and academics to show the possibilities of end-to-end GPU-based RL and help accelerate your work in this arena.
To help AI research like this make the leap from academia to commercial or government deployment, NVIDIA recently announced the Applied Research Accelerator Program. The program supports applied research on NVIDIA platforms for GPU-accelerated application deployments.
Our GTC Fall 2020 virtual event featured a record-breaking number of sessions, podcasts, demos, research posters, and more. We are now opening access to all the great content shared at the conference through the new NVIDIA On-Demand catalog. Learn more about breakthrough NVIDIA technologies and dive into our expansive selection of graphics and simulation sessions.
Ampere Architecture
The launch of our new Ampere architecture was a long-anticipated event this year. Designed for the age of elastic computing, it delivers the next giant leap by providing unmatched acceleration at every scale, enabling innovators to do their life’s work. Learn about the architecture and its benefits with these sessions:
NVIDIA Ampere for Professional Workflows: Learn about these new GPUs for professional visual computing and how they provide the power of the next generation of RTX from the desktop to the data center.
Rendering at the Speed of Light on NVIDIA Ampere GPUs: Explore hardware improvements over the previous generation, best practices for application developers, and tooling improvements that will help you write high-performance graphics code.
Discover new pipelines and tools that are emerging in the graphics industry, and learn how professionals are using the newest technologies to enhance content creation.
Rendering Games With Millions of Ray-Traced Lights: Hear from NVIDIA experts to learn about the latest research in the area of many-light sampling, plus implications of many-light rendering on game content creation pipelines.
Bringing Ray-Traced Visualization to Collaborative Workflows: Omniverse XR: See how augmented reality is integrated within the Omniverse rendering pipeline, and how Omniverse AR is applied across a range of use cases. Plus, get an inside look at the different strategies created in Omniverse Kit to bring its ray tracing engine to virtual reality.
What’s New in OptiX 7.2: Learn strategies to achieve optimal ray tracing performance with OptiX 7.2.
Learn about the advanced tools that help create high-quality immersive environments, and see why virtual and augmented reality are among the most anticipated forms of content to arrive over 5G networks.
From asset creation to RTX acceleration, check out the innovative techniques that are transforming the future of graphic workflows across all industries.
Virtual Production with Cine Tracer: Hear how live action cinematographer Matt Workman has created a previsualization and virtual production app for the film industry using Unreal Engine.
Unreal Engine + RTX from a Filmmaker’s Perspective: See how today’s real-time tools with RTX power can enable indie filmmakers to tell high-concept stories without huge budgets, resources, and a big rendering farm.
Real-Time and Production Ray Tracing with V-Ray: Learn about advances in real-time ray tracing and production rendering for V-Ray workflows, including RTX acceleration and the use of CUDA, DXR, and OptiX.
Now available to all NVIDIA developers, NVIDIA On-Demand is a catalog of technical sessions, podcasts, past keynotes, demos, research posters, and more from NVIDIA GPU Technology Conferences across the globe, as well as leading industry events. It lets developers learn on their own time, at their own pace, anywhere.
Through NVIDIA On-Demand, developers can also create playlists of their favorite content, which they can share digitally.
To access NVIDIA On-Demand, log in to your developer account. If you’re not already a member of the NVIDIA Developer Program, join for free here. In addition to on-demand content, NVIDIA Developer Program members have access to all the tools and training needed to build on NVIDIA’s technology platforms.
Watch part 1 of our popular NVIDIA On-Demand session, Rendering Games With Millions of Ray-Traced Lights, where NVIDIA’s Chris Wyman provides an overview of reservoir spatiotemporal importance resampling (ReSTIR).
In this video, NVIDIA’s Chris Wyman provides an overview of reservoir spatiotemporal importance resampling (ReSTIR). This algorithm makes it possible to effectively sample from millions of area lights with just a few rays per pixel and produce results that can be easily denoised.
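The "reservoir" in ReSTIR refers to weighted reservoir sampling, which picks one candidate from a stream with probability proportional to its weight while storing only a single sample and a running weight sum. The sketch below shows just that streaming selection step in plain Python. The full algorithm also reuses reservoirs across neighboring pixels and previous frames, which is not shown, and the class name, function name, and per-light weights are hypothetical stand-ins.

```python
import random

class Reservoir:
    """Holds one sample chosen from a stream of weighted candidates."""

    def __init__(self):
        self.sample = None
        self.weight_sum = 0.0
        self.count = 0

    def update(self, candidate, weight, rng):
        self.count += 1
        self.weight_sum += weight
        # Replace the stored sample with probability weight / weight_sum.
        if weight > 0.0 and rng.random() * self.weight_sum < weight:
            self.sample = candidate

def pick_light(light_weights, rng):
    """Stream over all lights once and return the filled reservoir."""
    reservoir = Reservoir()
    for index, weight in enumerate(light_weights):
        reservoir.update(index, weight, rng)
    return reservoir

rng = random.Random(0)
picks = [pick_light([1.0, 3.0], rng).sample for _ in range(4000)]
print("fraction of picks for the brighter light:", picks.count(1) / len(picks))
```

Because memory use is constant no matter how many candidates stream through, the same selection step remains practical when the candidate list contains millions of lights.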
He also explains why now is the right time for game developers to move on from traditional lighting techniques to real-time ray tracing. The incredible visuals on display in the Marbles at Night demo show what’s possible with real-time ray tracing today. “If there’s one thing you take from this talk, I hope it is that ray tracing enables new lighting that exceeds the limits imposed by rasterization,” says Wyman.
This is a segment from a two-part video available on NVIDIA On-Demand, entitled “Rendering Games With Millions of Ray-Traced Lights”. We encourage you to check out the remainder of the talk, in which NVIDIA’s Alexey Panteleev describes light resampling in practice and details the roadmap for RTXDI, our new direct illumination SDK.
Researchers, developers, and engineers from all over the world are gathering virtually this year for the 2020 Conference on Neural Information Processing Systems (NeurIPS). NVIDIA Research will present its work through spotlights and posters.
NVIDIA’s accepted papers at this year’s online NeurIPS feature a range of groundbreaking research in the field of neural information processing systems.
At the conference, NVIDIA announced the upcoming availability of the Omniverse Kaolin App, which allows high-fidelity rendering and interactive visualization of 3D data and training results.
Explore the work NVIDIA is bringing to the NeurIPS community. You can find the full event overview of NVIDIA’s activities here.
Data augmentation technique enables AI model to emulate artwork from a small dataset from the Metropolitan Museum of Art — and opens up new potential applications in fields like healthcare.