Categories
Misc

Popular Open Source Thrust and CUB Libraries Updated

Thrust 1.11.0 is a major release providing bug fixes and performance enhancements. It includes a new sort algorithm that makes thrust::sort up to 2x faster for certain key types and hardware. The new thrust::shuffle algorithm has also been tweaked to improve the randomness of its output.

CUB 1.11.0 is a major release providing bug fixes and performance enhancements. It includes a new DeviceRadixSort backend that improves performance by up to 2x on supported keys and hardware. 

The CMake package and build system for both libraries continue to improve with add_subdirectory support, installation rules, status messages, and other features that make these libraries easier to use from CMake projects. 

About Thrust and CUB

Thrust provides STL-like templated interfaces to several algorithms and data structures designed for high performance heterogeneous parallel computing. Thrust abstractions are agnostic of any particular parallel framework.  CUB is a library of collective primitives and utilities. CUB is specific to CUDA C++ and its interfaces explicitly accommodate CUDA-specific features.

Thrust and CUB are complementary and are often used together. 

Determined AI Deep Learning Application now on the NGC Catalog

As AI becomes universal, enterprise leaders are looking to empower their AI teams with fully integrated and automated development environments.

Determined AI’s application, available in the NVIDIA NGC catalog (a GPU-optimized hub for AI applications), is an open-source platform that lets deep learning engineers focus on building models rather than managing infrastructure.

Determined AI is a member of the NVIDIA Inception AI and startup incubator.

Users can train models faster using state-of-the-art distributed training, without changing their model code. With built-in state-of-the-art hyperparameter tuning, deep learning engineers working on use cases such as computer vision or natural language processing (NLP) can find high-quality models up to 100x faster than with conventional tools.

Determined includes built-in experiment tracking, a lightweight model registry, and smart GPU scheduling, allowing deep learning engineers to get models from idea to production dramatically more quickly and at lower cost.

The product is packaged as user-managed software, delivered either via Helm charts for deployment to Kubernetes or as a set of Docker containers, for both on-premises and cloud-based instances. Every GPU node runs an agent, and a central control node schedules workloads and coordinates work between the agents. Users submit deep learning workloads to the system, and the automated system handles everything from job scheduling and resource provisioning to distributed training.

Jobs can be submitted by developers directly or programmatically via APIs, as well as via easy integrations with other ML workflow systems like Kubeflow Pipelines and Airflow. The system tracks metadata, model checkpoints, and metrics, and allows models to easily be exported to downstream serving systems like Seldon and TensorFlow Serving. Users can interact with the system’s web UI to monitor job or cluster status, and debug and interact with live or historical experiments. 

To get started, download the Helm chart from the NGC catalog.  

Building a Dream Home with Real-Time Ray Tracing

This article covers an interview with Ola Stalmach, co-creator of House of Crickets, a 2020 DXR Spotlight Contest winner.

During the cold of winter, you may find the video for House of Crickets (below) comforting. Ola Stalmach and Jakub Lesniak have used real-time ray tracing techniques to create a warm, inviting summertime space that looks real enough to get you checking how much a rental might cost. 

NVIDIA: What is the development team size for House of Crickets?

Ola: It’s pretty small, only me and my partner (Jakub Lesniak). I work mostly on the artistic side of projects, like interior design and creating moods and environments. Kuba helps me improve the technical side and make the Unreal magic work one hundred percent.

NVIDIA: Your demonstration looked so real, I wanted to move into that space! Is the visualization based on a real location?

Ola: I am so happy that you like it. Yes, the visualization is based on a real location. This is actually a project of my small apartment settled in attached houses near Krakow in Poland. The whole point was to make something beautiful in a normal, very standard house with a tiny garden.

NVIDIA: What real-time ray tracing features did you build into House of Crickets, and why?

Ola: I’ve used all available features (the least visible is Raytraced Ambient Occlusion). Since the iteration of ray tracing in Unreal Engine 4.25, it has finally started working for me, especially Raytraced Indirect Illumination.

This technology is the future of graphics, so it is good to start getting to know it if you plan to visualize amazing worlds in the future. What I really love to see in architecture visualisations are great accurate reflections and shadows, which were really hard to achieve for me with previous real-time technology.

NVIDIA: A lot of developers starting out with DXR struggle with performance because they try to make everything reflective. Do you have any advice on materials to use when building an environment that will be ray traced?

Ola: There is that balance between roughness, noise reduction algorithm and number of reflection samples. I would try to avoid middle-reflective surfaces, high values of reflection samples and bounces. It is all about tweaking and playing with materials and sacrificing something in the way.

NVIDIA: How long did it take to develop this demo?

Ola: I think I spent a total of one month, maybe more, just learning what is possible. It was done in my free time for my small Studio OS.

NVIDIA: What’s next for you with real-time ray tracing?

Ola: I already use ray tracing  (or the hybrid with lightbaking) in at least 80% of my projects. The plans are not only for architecture designs, but also to use it in Virtual Production, which I’ve gotten deeply into in the past two years. I’m looking forward to new improvements in both Engine and Hardware. Ray tracing will allow me to focus only on the artistic side of my creations and work with just intuition taken from the real World.

To learn more about Ola’s work, check out her Twitter, Facebook, ArtStation, and Instagram pages.

Learn about NVIDIA’s tools for Game Developers here.

Introducing NVIDIA Isaac Gym: End-to-End Reinforcement Learning for Robotics

Announcing a preview release of Isaac Gym – NVIDIA’s physics simulation environment for reinforcement learning research.

For several years, NVIDIA’s research teams have been working to leverage GPU technology to accelerate reinforcement learning (RL). As a result of this promising research, NVIDIA is pleased to announce a preview release of Isaac Gym – NVIDIA’s physics simulation environment for reinforcement learning research. RL-based training is now more accessible as tasks that once required thousands of CPU cores can now instead be trained using a single GPU.

A cube manipulation task trained by Isaac Gym on a single A100 and rendered in Omniverse

RL has become one of the most promising research areas in machine learning and has demonstrated great potential for solving complex problems. RL-based systems have achieved superhuman performance in very challenging tasks, ranging from classic strategy games such as Go and Chess, to real-time computer games like StarCraft and DOTA.

RL-based approaches also hold promise for robotics applications, such as solving a Rubik’s Cube or learning locomotion by imitating animals.

Isaac Gym and NVIDIA GPUs, a reinforcement learning supercomputer 

Until now, most RL robotics researchers were forced to use clusters of CPU cores for the physically accurate simulations needed to train RL algorithms. In one of the more well-known projects, the OpenAI team used almost 30,000 CPU cores (920 computers with 32 cores each) to train their robot in the Rubik’s Cube task. 

In a similar task, Learning Dexterous In-Hand Manipulation, OpenAI used a cluster of 384 systems with 6,144 CPU cores plus 8 Volta V100 GPUs, and required close to 30 hours of training to achieve its best results. This in-hand cube orientation task is a challenging dexterous manipulation problem, with complex physics and dynamics, many contacts, and a high-dimensional continuous control space.

Isaac Gym includes an example of this cube manipulation task for researchers to recreate the OpenAI experiment. The example supports training both recurrent and feed-forward neural networks, as well as domain randomization of physics properties that help with sim-to-real transfer. With Isaac Gym, researchers can achieve the same level of success as OpenAI’s supercomputer — on a single A100 GPU — in about 10 hours! 
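The hardware figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the ratio compares wall-clock training time only, not hardware cost or final policy quality.

```python
# Figures quoted in the text above; variable names are illustrative only.
rubiks_machines = 920
cores_per_machine = 32
rubiks_cores = rubiks_machines * cores_per_machine  # the "almost 30,000" CPU cores

cluster_hours = 30  # "close to 30 hours" on 6,144 CPU cores plus 8 V100 GPUs
a100_hours = 10     # "about 10 hours" on a single A100 GPU

# Rough wall-clock ratio between the two in-hand manipulation training runs.
speedup = cluster_hours / a100_hours

print(f"Rubik's Cube cluster: {rubiks_cores} CPU cores")
print(f"Approximate wall-clock speedup on one A100: {speedup:.0f}x")
```

Both training times are approximate in the source, so the resulting ratio should be read as a rough indication rather than a benchmark.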

End-to-End GPU RL

Isaac Gym achieves these results by leveraging NVIDIA’s PhysX GPU-accelerated simulation engine, allowing it to gather the experience data required for robotics RL.

In addition to fast physics simulations, Isaac Gym also enables observation and reward calculations to take place on the GPU, thereby avoiding significant performance bottlenecks. In particular, costly data transfers between the GPU and the CPU are eliminated.

Implemented this way, Isaac Gym enables a complete end-to-end GPU RL pipeline.

Isaac Gym

Isaac Gym provides a basic API for creating and populating a scene with robots and objects, supporting loading data from URDF and MJCF file formats.  Each environment is duplicated as many times as needed, and can be simulated simultaneously without interaction with other environments.

Isaac Gym provides a PyTorch tensor-based API to access the results of physics simulation work. RL observation and reward calculations can therefore be built using the PyTorch JIT runtime system, which dynamically compiles the Python code for these calculations into CUDA code that runs on the GPU.

Observation tensors can be used as inputs to a policy inference network, and the resulting action tensors can be fed directly back into the physics system. Rollouts of observation, reward, and action buffers can stay on the GPU for the entire learning process, eliminating the need to copy data back to the CPU.

This set-up permits tens of thousands of simultaneous environments on a single GPU, allowing researchers to easily run experiments locally on their desktops that previously required an entire data center.
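As a rough illustration of the tensor-based approach described above, the following sketch computes one reward per environment in a single batched call compiled with torch.jit.script. It does not use the Isaac Gym API: the function name, the reward formula, the tensor shapes, and the penalty value are all assumptions made for illustration, and random tensors stand in for the state a simulator would expose.

```python
import torch


@torch.jit.script
def compute_reward(rot_error: torch.Tensor,
                   actions: torch.Tensor,
                   action_penalty: float) -> torch.Tensor:
    # Hypothetical reward: larger when the orientation error is small,
    # reduced by a penalty on the squared magnitude of the actions.
    closeness = 1.0 / (torch.abs(rot_error) + 0.1)
    effort = torch.sum(actions * actions, dim=-1)
    return closeness - action_penalty * effort


# Run on the GPU when one is available; the same code also runs on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
num_envs, num_actions = 4096, 20  # assumed sizes, for illustration only

# Stand-ins for per-environment tensors; one row per simulated environment.
rot_error = torch.rand(num_envs, device=device)
actions = torch.zeros(num_envs, num_actions, device=device)

# One call evaluates every environment; the result stays on the same device.
rewards = compute_reward(rot_error, actions, 0.0002)
print(rewards.shape, rewards.device)
```

Because the inputs and the result live on the same device, a training loop built this way never needs to move per-step data to the CPU, which is the property the text attributes to the end-to-end pipeline.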

Isaac Gym also includes a basic Proximal Policy Optimization (PPO) implementation and a straightforward RL task system, but users may substitute alternative task systems or RL algorithms as desired. While the included examples use PyTorch, users should also be able to integrate with TensorFlow-based RL systems with some further customization.

Some additional features of Isaac Gym include:

  • Support for a variety of environment sensors – position, velocity, force, torque, etc.
  • Runtime domain randomization of physics parameters
  • Jacobian / inverse kinematics support

Research Results    

NVIDIA’s research team has been applying Isaac Gym to a wide variety of projects. You can take a sneak peek at some of these below, and stay tuned to https://developer.nvidia.com/blog/ for more details on these projects.

Get Started Today

Are you a researcher or academic interested in RL for robotics applications? Please download and try Isaac Gym.

Future Plans

The core functionality of Isaac Gym will be made available as part of the NVIDIA Omniverse Platform and NVIDIA’s Isaac Sim, a robotics simulation platform built on Omniverse. Until then, we are making this standalone preview release available to researchers and academics to show the possibilities of end-to-end GPU-based RL and to help accelerate your work in this area.

NVIDIA Boosts Academic AI Research

To help AI research like this make the leap from academia to commercial or government deployment, NVIDIA recently announced the Applied Research Accelerator Program. The program supports applied research on NVIDIA platforms for GPU-accelerated application deployments.

Chalk and Awe: Studio Crafts Creative Battle Between Stick Figures with Real-Time Rendering

It’s time to bring krisp graphics to stick figure drawings. Creative studio SoKrispyMedia, started by content creators Sam Wickert and Eric Leigh, develops short videos blended with high-quality visual effects. Since publishing one of their early works eight years ago on YouTube, Chalk Warfare 1, the team has regularly put out short films that showcase …

The post Chalk and Awe: Studio Crafts Creative Battle Between Stick Figures with Real-Time Rendering appeared first on The Official NVIDIA Blog.

Big Wheels Keep on Learnin’: Einride’s AI Trucks Advance Capabilities with NVIDIA DRIVE AGX Orin

Swedish startup Einride has rejigged the big rig for highways around the world. The autonomous truck maker launched the next generation of its cab-less autonomous truck, known as the Pod, with new, advanced functionality and pricing. The AI vehicles, which will be commercially available worldwide, will be powered by the latest in high-performance, energy-efficient compute …

The post Big Wheels Keep on Learnin’: Einride’s AI Trucks Advance Capabilities with NVIDIA DRIVE AGX Orin appeared first on The Official NVIDIA Blog.

NVIDIA Boosts Academic AI Research for Business Innovation

Academic researchers are developing AI to solve challenging problems with everything from agricultural robotics to autonomous flying machines. To help AI research like this make the leap from academia to commercial or government deployment, NVIDIA today announced the Applied Research Accelerator Program. The program supports applied research on NVIDIA platforms for GPU-accelerated application deployments. The …

The post NVIDIA Boosts Academic AI Research for Business Innovation appeared first on The Official NVIDIA Blog.

NVIDIA Research Achieves AI Training Breakthrough Using Limited Datasets

NVIDIA Research’s latest AI model is a prodigy among generative adversarial networks. Using a fraction of the study material needed by a typical GAN, it can learn skills as complex as emulating renowned painters and recreating images of cancer tissue. By applying a breakthrough neural network training technique to the popular NVIDIA StyleGAN2 model, NVIDIA …

The post NVIDIA Research Achieves AI Training Breakthrough Using Limited Datasets appeared first on The Official NVIDIA Blog.

Faster Physics: How AI and NVIDIA A100 GPUs Automate Particle Physics

What are the fundamental laws that govern our universe? How did the matter in the universe today get there? What exactly is dark matter? The questions may be eternal, but no human scientist has an eternity to answer them. Now, thanks to NVIDIA technology and cutting-edge AI, the more than 1,000 collaborators from 26 countries …

The post Faster Physics: How AI and NVIDIA A100 GPUs Automate Particle Physics appeared first on The Official NVIDIA Blog.