In this new course, learn about creating software-defined, cloud-native, DPU-accelerated services with zero-trust protection to meet the increasing performance and security demands of modern data centers.
Ken Jee, a data science professional, shares insights on leveraging university resources, benefits of content creation, and useful learning methods for AI topics.
Ken Jee is a data scientist and YouTube content creator who has quickly become known for creating engaging and easy-to-follow videos. Jee has helped countless people learn about data science, machine learning, and AI and is the initiator of the popular #66daysofdata movement.
Currently, Jee works as the Head of Data Science at Scouts Consulting Group. In this post, he discusses his work as a data scientist and offers advice for anyone looking to enter the field. We explore the importance of university education, the relevancy of math for data scientists, creating visibility within the industry, and the value of an open mind when it comes to new technologies.
This post is a transcription of highlights from my conversation with Jee on my podcast; at the end of this article, you’ll find a link to the entire discussion. While Jee’s answers have been edited for brevity and conciseness, their intent is preserved.
Why did you start making data science videos on YouTube?
I started making data science videos on YouTube because I didn’t see the resources that I was looking for when I was trying to learn data science.
I also saw making videos as the best way to improve my communication skills. Creating content has given me a competitive advantage because it has attracted employers to me rather than going out to get them. I usually refer to this as the concept of content gravity. The more content that I create, the more pull I have on employers and opportunities coming to me.
I love working on interesting data projects and creating easy-to-digest content that can help others learn and grow. I believe that data science skills are valuable and shareable and that data-driven content has a great potential to go viral. Companies should encourage their employees to have side hustles and be public about them, as it looks good for the company.
I see a future where everyone uses social media to share their work and ideas and where this is accepted and expected in most roles. In some of my previous job roles, I’ve been referred to as “the guy who makes YouTube videos.” My external efforts outside of work have aided my internal visibility within companies.
How did you become interested in data science?
I became interested in data science because I wanted to improve my golfing skills. I started to explore how data could help me analyze my performance and find ways to improve. I soon discovered that I had a unique advantage: the ability to analyze data and create data-driven actions to improve my golfing abilities. This led me to explore other performance-improvement methods supported by data and intelligence.
How essential is mathematics in data science?
I believe that mathematics is less important when breaking into the data science field. What’s important is getting your hands dirty and coding. I recommend that people get their hands dirty by building projects and coding, as this will help them intuitively find where the math is valuable and important.
I also recommend reviewing calculus, linear algebra, and discrete math, but only once you have a reason to do so and understand how they are relevant to data science. As you continue to progress within the field, you will gradually learn where math skills are important and relevant. And once you see the value that they bring, you will be more motivated to learn them.
Is self-directed learning more important than a formal degree when entering the data science field?
One of the primary reasons I encourage people to investigate unconventional learning methods, as opposed to attending a university, is that many students underutilize the resources available at institutions. I used all of my office hours with professors and asked questions of PhDs who knew their subjects deeply, but I discovered very few students did the same.
In my opinion, having a degree is only useful if you put in the effort and make the most of the available opportunities. I recommend taking advantage of other options available at university, such as side projects. Doing so can help students get the most out of their education and give them an edge in the job market. However, I warn that simply getting a degree does not guarantee a successful career.
Editor’s Note: Jee contributes to the data science learning platform 365DataScience, educating learners on starting a successful data science career. He holds a master’s degree in computer science, another in business, marketing, and management, and a bachelor’s degree in economics.
Obtaining a master’s degree in an advanced subject such as data science is not always the best way to stand out. An impressive portfolio or unique work or volunteer experience can be more valuable.
A master’s degree is undoubtedly a viable resource if you can invest the time and money, but it’s also important to weigh the opportunity cost of returning to school to land a job. Essentially, you must determine whether attending grad school will provide a good return on investment for the role you want within AI.
How do you learn?
I learn best by struggling through something on my own at my own pace, rereading the same thing over and over again until I understand it. In grad school, I fell in love with reading, and the majority of my knowledge came from textbooks.
I recommend looking at things from different angles to get a diverse understanding of a topic. One of the most important keys to accelerating learning is finding a medium that explains the topic in a way that makes sense to you; this could be reading a blog post, watching a video, or listening to a podcast.
Although my primary method of obtaining knowledge in grad school was through books, I admit that my learning of data science concepts and topics today involves videos and YouTube tutorials. Specifically, I want to mention the popular data science YouTube channel StatQuest with Josh Starmer.
What are the best skills to differentiate yourself as a data scientist?
Data scientists have to learn coding, math, and business in order to be successful. I differentiated myself from the competition with my unique combination of skills. My business knowledge and ability to meet the strategic requirement for coding and data science made me a highly desirable candidate. My resume and portfolio stood out from the competition. Additionally, my communication skills and business knowledge gave me a distinct edge in job interviews.
How did you become the head of data science at your current company?
I discovered very early on that I didn’t fit well into corporate bureaucracy. My focus was on creating value, getting noticed for adding that value, and finding satisfaction in my work. My title has progressed from data scientist to head of data science, and I am now responsible for all data-related work, having taken on the role of Director of Data Science.
This change reflects the increased responsibilities I have taken on within my current company, from being solely responsible for all data science activities to managing teams of data scientists. If you are looking for a job, I recommend creating your own opportunities by reaching out to potential employers.
You may be surprised at how open they are to hiring you if they see that you are willing and able to do the work. I advise data science practitioners to find a position that doesn’t yet exist or make one for themselves. This way, you can skip the line and get to where you want to be without waiting for opportunities.
What is your advice to entry-level data scientists?
Entry-level data scientists should share their work and journey with others. People are hesitant to produce content because they are afraid of being judged, but this is not usually the case. People are more likely to be positive and supportive. I also recommend learning to code first, as this is a valuable skill for data scientists. However, I recognize that everyone learns differently, so this is not a one-size-fits-all approach.
Summary from the author
Jee’s journey within data science is unique, but the steps that led to his success are replicable and adaptable to your data science career. My discussion with him revealed the importance of using digital content to communicate your expertise and presence within the data science field, which can sometimes be filled with noise. His advice to data science practitioners is to focus on creating value and making sure that you’re learning continuously to keep up with the rapidly changing field. So whatever your goals are for your data science career, don’t forget to enjoy the journey and document it along the way!
You can watch or listen to the entire conversation with Ken Jee on YouTube or Spotify.
Learn how NVIDIA and Azure together enable global on-demand access to the latest GPUs and developer solutions to build, deploy, and scale AI-powered services.
Engineers are using the NVIDIA Omniverse 3D simulation platform as part of a proof of concept that promises to become a model for putting green energy to work around the world. Dubbed Gigastack, the pilot project — led by a consortium that includes Phillips 66 and Denmark-based renewable energy company Ørsted — will create low-emission…
Announcing our first Omniverse developer contest for building an Omniverse Extension. Show us how you’re extending Omniverse to transform 3D workflows and virtual worldbuilding.
Developers across industries are building 3D tools and applications to help teams create virtual worlds in art, design, manufacturing, and more. NVIDIA Omniverse, an extensible platform for full-fidelity design, simulation, and the development of USD-based workflows, has an ever-growing ecosystem of developers building Python-based extensions. We’ve launched contests in the past for building breathtaking 3D simulations using the Omniverse Create app.
Today, we’re announcing our first NVIDIA Omniverse contest specifically for developers, engineers, technical artists, hobbyists, and researchers to develop Python tools for 3D worlds. The contest runs from July 11 to August 19, 2022. The overall winner will be awarded an NVIDIA RTX A6000, and the runners-up in each category will win a GeForce RTX 3090 Ti.
The challenge? Build an Omniverse Extension using Omniverse Kit and the developer-centric Omniverse application Omniverse Code. Contestants can create Python extensions in one of the following categories for the Extend the Omniverse contest:
Layout and scene authoring tools
Omni.ui with Omniverse Kit
Scene modifier and manipulator tools
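Whichever category you choose, every entry starts from the same foundation: a Kit extension is a small Python class that Omniverse Code loads and unloads. A minimal skeleton might look like the following (the class and extension names are placeholder examples, not part of the contest templates):

```python
import omni.ext


class ContestExampleExtension(omni.ext.IExt):
    """Minimal Kit extension skeleton; names here are hypothetical placeholders."""

    def on_startup(self, ext_id):
        # Called when the extension is enabled in Omniverse Code.
        print(f"[contest.example.extension] startup: {ext_id}")

    def on_shutdown(self):
        # Called when the extension is disabled; release any resources here.
        print("[contest.example.extension] shutdown")
```

Omniverse Code hot-reloads extensions as their source files change, which makes iterating on a tool like this quick.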
Layout and scene authoring tools
The demand for 3D content and environments is growing exponentially. Layout and scene authoring tools help scale workflows for world-building, leveraging rules-based algorithms and AI to generate assets procedurally.
Instead of tediously placing every component by hand, creators can paint in broader strokes and automatically generate physical objects like books, lamps, or fences to populate a scene. With the ability to iterate layout and scenes more freely, creators can accelerate their workflows and free up time to focus on creativity.
Universal Scene Description (USD) is at the foundation of layout and scene authoring tools contestants can develop in Omniverse. The powerful, easily extensible scene description handles incredibly large 3D datasets without skipping a beat—enabling creating, editing, querying, rendering, and collaboration in 3D worlds.
Omni.ui with Omniverse Kit
Well-crafted user interfaces provide a superior experience for artists and developers alike. They can boost productivity and enable nontechnical and technical users to harness the power of complex algorithms.
Building custom user interfaces has never been simpler than with Omni.ui, Omniverse’s UI toolkit for building beautiful, flexible graphical user interfaces. Omni.ui was designed using modern asynchronous technologies and UI design patterns to be reactive and responsive.
Using Omniverse Kit, you can deeply customize the final look of applications with widgets for creating visual components, receiving user input, and creating data models. With a style-sheet architecture akin to CSS, you can change the look of your widgets or create a new color scheme for an entire app.
Existing widgets can be combined and new ones can be defined to build the interface that you’ve always wanted. These extensions can range from floating panels in the navigation bar to markup tools in Omniverse View and Showroom. You can also create data models, views, and delegates to build robust and flexible interfaces.
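As a rough illustration of this declarative style, a minimal omni.ui window might be assembled like this inside a running Kit app (the window title, labels, and button behavior are arbitrary examples rather than part of any shipped extension):

```python
import omni.ui as ui

# A floating window; widgets are declared inside its frame.
window = ui.Window("Scene Tools", width=300, height=150)
with window.frame:
    with ui.VStack(spacing=5):
        ui.Label("Number of objects to scatter")
        count_model = ui.IntField().model  # data model backing the field
        ui.Button(
            "Scatter",
            clicked_fn=lambda: print(f"Scatter {count_model.get_value_as_int()} objects"),
        )
```

The same widgets can later be restyled through the style-sheet mechanism described above without changing this layout code.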
Scene modifier and manipulator tools
Scene modifier and manipulator tools offer new ways for artists to interact with their scenes. Whether it’s changing the geometry of an object, the lighting of a scene, or creating animations, these tools enable artists to modify and manipulate scenes with limited manual work.
Using omni.ui.scene, Omniverse’s low-code module for building UIs in 3D space, you can develop 3D widgets and manipulators to create and move shapes in a 3D projected scene with Python. Many primitive objects are available, including text, image, rectangle, arc, line, curve, and mesh, with more regularly being added.
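As a hedged sketch of how such a 3D widget might be declared (the primitive arguments and layout here are illustrative and should be checked against the omni.ui.scene documentation):

```python
import omni.ui as ui
from omni.ui import scene as sc

window = ui.Window("Manipulator Sketch", width=400, height=400)
with window.frame:
    # SceneView hosts 3D-projected widgets drawn with omni.ui.scene primitives.
    scene_view = sc.SceneView()
    with scene_view.scene:
        # A line along the X axis and a labeled wireframe rectangle offset on Y.
        sc.Line([0, 0, 0], [1, 0, 0])
        with sc.Transform(transform=sc.Matrix44.get_translation_matrix(0, 1, 0)):
            sc.Rectangle(0.5, 0.25, wireframe=True)
            sc.Label("handle")
```

Attaching gestures to items like these is what turns them into interactive manipulators.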
We can’t wait to see what extensions you’ll create to contribute to the ecosystem of extensions that are expanding what’s possible in the Omniverse. Read more about the contest, or watch the video below for a step-by-step guide on how to enter. You can also visit the GitHub contest page for sample code and other resources to get started.
Don’t miss these upcoming events:
Join the Omniverse community on Discord on July 13, 2022, for the Getting Started – #ExtendOmniverse Developer Contest livestream.
Join us at SIGGRAPH for hands-on developer labs where you can learn how to build extensions in Omniverse.
Learn more in the Omniverse Resource Center, which details how developers can build custom applications and extensions for the platform.
A breakthrough in the simulation and learning of contact-rich interactions provides tools and methods to accelerate robotic assembly and simulation research.
NVIDIA robotics and simulation researchers presented Factory: Fast Contact for Robotic Assembly at the 2022 Robotics: Science and Systems (RSS) conference. This work is a novel breakthrough in the simulation and learning of contact-rich interactions, which are ubiquitous in robotics research. Its aim is to greatly accelerate research and development in robotic assembly, as well as serve as a powerful tool for contact-rich simulation of any kind.
Robotic assembly: What, why, and challenges
Assembly is essential across the automotive, aerospace, electronics, and medical industries. Examples include tightening nuts and bolts, soldering, peg insertion, and cable routing.
However, robotic assembly remains one of the oldest and most challenging tasks in robotics. It has been exceptionally difficult to automate because of physical complexity, part variability, and demanding accuracy and reliability requirements.
In industry, robotic assembly methods may achieve high precision, accuracy, and reliability but often require expensive equipment and custom fixtures that can be time-consuming to set up and maintain (preprogrammed trajectories and careful tuning, for example). Tasks that involve robustness to variation (part types, appearance, and locations) and complex manipulation are frequently done using manual labor.
Research methods may achieve lower cost, higher adaptivity, and improved robustness but are often less reliable and slower.
Simulation: A tool for solving the challenges in robotic assembly
Simulation has been used for decades to verify, validate, and optimize robot designs and algorithms in robotics. This includes ensuring the safety of deploying these algorithms. It has also been used to generate large-scale datasets for deep learning, perform system identification, and develop planning and control methods.
In reinforcement learning (RL) research, we have recently seen how simulation results can be transferred to a real system. The importance of accurate physics simulation for robotics development cannot be overemphasized.
Physics-based simulators like MuJoCo and NVIDIA Isaac Gym have been used to train virtual agents to perform manipulation and locomotion tasks, such as solving a Rubik’s Cube or walking on uneven terrain using ANYmal. The policies have successfully transferred to real-world robots.
However, the power of a fast and accurate simulator has not substantially impacted robotic assembly. Developing such simulators for complex bodies with different variations and motions is a difficult task.
For example, a simple nut-and-bolt assembly requires more than pure helical motion. There are finite clearances between the threads of the nut and bolt, which allow the nut to move with six degrees of freedom. Even humans require some care to ensure that the nut is properly aligned with the bolt at the start and does not get stuck during tightening.
However, simulating the task with traditional methods may require meshes with tens of thousands of triangles. Detecting collisions between these meshes, generating contact points and normals, and solving non-penetration constraints are major computational challenges.
Despite the fact that there is an abundance of threaded fasteners in the world, no existing robotics simulator is able to simulate even a single nut-and-bolt assembly in real time at the same rate as the underlying physical dynamics.
In Factory, the researchers developed methods to overcome the challenges in robotic assembly and other contact-rich interactions.
What is Factory?
Factory (Fast Contact for Robotic Assembly) is a set of physics simulation methods and robot learning tools for achieving real-time and faster simulation of a wide range of contact-rich interactions. One of the Factory applications is robotic assembly.
Factory offers the following central contributions:
A set of methods for fast, accurate physical simulation of contact-rich interactions through a novel GPU-based synthesis of signed distance function (SDF)-based collisions, contact reduction, and a Gauss-Seidel solver.
A robot learning suite consisting of:
60 high-quality assets, including a Franka robot and all rigid-body assemblies from the NIST Assembly Task Board 1, the established benchmark for robotic assembly
Three Isaac Gym-style learning environments for robotic assembly
Seven classical robot controllers
Proof-of-concept reinforcement learning policies for robots performing contact-rich tasks (a simulated Franka Robot solving the most contact-rich task on the NIST board, nut-and-bolt assembly)
The physics simulation methods in the Factory paper have been integrated into the PhysX physics engine used by Isaac Gym. The asset suite and reinforcement learning policies are available with the latest version of Isaac Gym and the Isaac Gym Environments GitHub repo. The simulation methods are also available in the Omniverse Isaac Sim simulator, with reinforcement learning examples coming later this summer.
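For readers who want to experiment, environment creation through IsaacGymEnvs might look roughly like the following. This assumes the Factory tasks are registered like the repository’s other tasks and that your installed version exposes isaacgymenvs.make; the task name and arguments are illustrative, so check the repository’s train.py and task configs for the canonical entry points:

```python
import isaacgym  # noqa: F401  (must be imported before torch)
import isaacgymenvs
import torch

# Hypothetical programmatic creation of a Factory nut-and-bolt environment.
envs = isaacgymenvs.make(
    seed=0,
    task="FactoryTaskNutBoltScrew",
    num_envs=128,
    sim_device="cuda:0",
    rl_device="cuda:0",
)
obs = envs.reset()
actions = torch.zeros((envs.num_envs, envs.num_actions), device="cuda:0")
obs, reward, done, info = envs.step(actions)
```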
Simulation methods and results
Using fast GPU-based implementations of SDF collisions for objects, contact reduction algorithms for pruning the contacts those collisions generate, and custom numerical solvers, the researchers were able to simulate not only a single M16 nut and bolt in real time, but 1,024 nut-and-bolt assemblies in parallel environments, also in real time. This is essentially 20,000x faster than the prior state of the art.
The researchers demonstrated the simulator’s performance in a wide range of challenging scenes, including the following:
512 bowls falling into a pile in the same environment
A pile of nuts fed into a feeder mechanism vibrating at 60 Hz
A Franka robot executing a hand-scripted trajectory to grasp and tighten a nut onto a bolt, with 128 instances of this environment executing in real time
Robot learning tools
The most established benchmark for robotic assembly is the NIST assembly task board, the focus of an annual robotics competition since 2017. The NIST Task Board 1 consists of 38 unique parts. However, the CAD models provided are not ideal for physics simulations due to a lack of real-world clearances, interferences between parts, hand-derived measurements, and so on. Realistic models are hard to find.
Factory uses 60 high-quality, simulation-ready part models, each with an Onshape CAD model, one or more OBJ meshes, a URDF description, and estimated material properties that conform to international standards (ISO 724, ISO 965, and ISO 286) or which are based on models sourced from manufacturers. These models include all parts on the NIST assembly Task Board 1 with dimensional variations that span real-world tolerance bands. Clearance between parts ranges from 0 to a maximum of 2.66 mm, with many parts within the 0.1-0.5 mm range.
Factory provides three robotic assembly scenes for Isaac Gym that can be used for developing planning and control algorithms, collecting simulated sensor data for supervised learning, and training RL agents. Each scene contains a Franka robot and disassembled assemblies from the NIST Task Board 1.
The assets can be randomized in types and locations across all environments. All scenes have been tested with up to 128 simultaneous environments on an NVIDIA RTX 3090 GPU. The scenes are shown below:
The seven robot controllers available in the learning environments include a joint-space inverse differential kinematics (IK) motion controller, a joint-space inverse dynamics (ID) controller, a task-space impedance controller, an operational space motion controller, an open-loop force controller, a closed-loop proportional force controller, and a hybrid force-motion controller.
The researchers intend for the models, environments, and controllers to grow continuously with contributions from themselves and the community.
Proof-of-concept RL policies
Factory employs GPU-accelerated on-policy RL to solve the most contact-rich task on NIST Task Board 1: assembling a nut onto a bolt. Like many assembly tasks, such a procedure is long-horizon and challenging to learn end-to-end. The problem was separated into three phases:
Pick: The robot grasps the nut with a parallel-jaw gripper from a random location on a work surface.
Place: The robot transports the nut to the top of a bolt fixed to the surface.
Screw: The robot brings the nut into contact with the bolt, engages the mating threads, and tightens the nut until it contacts the base of the bolt head.
The training was done on a single GPU. Large randomizations were applied to the initial positions and orientations of the objects, and a batch of 3-4 policies was trained simultaneously using proximal policy optimization (PPO). Each batch takes 1-1.5 hours to train, and each subpolicy is trained across 128 environments with a maximum of 1,024 policy updates for rapid experimentation. The success rate was 98.4% at test time.
Finally, to evaluate the potential for sim-to-real transfer (transferring the policy learned in simulation to real-world robotics systems), the researchers compared the contact forces generated during these interactions in simulation to contact forces measured in the real world by humans performing the same task with a wrench. For more information, see the R-PAL Daily Interactive Manipulation (DIM) dataset.
The figure below shows that the histogram of contact forces from the simulated Fasten Nut task lies within the histogram from the real Fasten Nut data, showing strong consistency with the real-world values.
Conclusion and future directions
Although Factory was developed with robotic assembly as a motivating application, there are no limitations on using the methods for entirely different tasks within robotics, such as grasping complex non-convex shapes in home environments, locomotion on uneven outdoor terrain, and non-prehensile manipulation of aggregates of objects.
The future direction of this work is to realize full end-to-end simulation for complex physical interactions, including techniques for efficiently transferring the trained policies to real-world robotic systems. This can potentially minimize cost and risk, improve safety, and achieve efficient behaviors.
One day, every advanced industrial manufacturing robot might be trained in simulation using such techniques for seamless transfer to the real world.
Towards this end, NVIDIA developers are working to refine the physics simulation methods used by the Factory research so that they can be used within Omniverse Isaac Sim. Limited functionality is already present, and will become more robust over time.
Posted by Danijar Hafner, Student Researcher, Google Research
Research into how artificial agents can make decisions has evolved rapidly through advances in deep reinforcement learning. Compared to generative ML models like GPT-3 and Imagen, artificial agents can directly influence their environment through actions, such as moving a robot arm based on camera inputs or clicking a button in a web browser. While artificial agents have the potential to be increasingly helpful to people, current methods are held back by the need to receive detailed feedback in the form of frequently provided rewards to learn successful strategies. For example, despite large computational budgets, even powerful programs such as AlphaGo are limited to a few hundred moves until receiving their next reward.
In contrast, complex tasks like making a meal require decision making at all levels, from planning the menu, navigating to the store to pick up groceries, and following the recipe in the kitchen to properly executing the fine motor skills needed at each step along the way based on high-dimensional sensory inputs. Hierarchical reinforcement learning (HRL) promises to automatically break down such complex tasks into manageable subgoals, enabling artificial agents to solve tasks more autonomously from fewer rewards, also known as sparse rewards. However, research progress on HRL has proven to be challenging; current methods rely on manually specified goal spaces or subtasks, and no general solution exists.
To spur progress on this research challenge and in collaboration with the University of California, Berkeley, we present the Director agent, which learns practical, general, and interpretable hierarchical behaviors from raw pixels. Director trains a manager policy to propose subgoals within the latent space of a learned world model and trains a worker policy to achieve these goals. Despite operating on latent representations, we can decode Director’s internal subgoals into images to inspect and interpret its decisions. We evaluate Director across several benchmarks, showing that it learns diverse hierarchical strategies and enables solving tasks with very sparse rewards where previous approaches fail, such as exploring 3D mazes with quadruped robots directly from first-person pixel inputs.
Director learns to solve complex long-horizon tasks by automatically breaking them down into subgoals. Each panel shows the environment interaction on the left and the decoded internal goals on the right.
How Director Works
Director learns a world model from pixels that enables efficient planning in a latent space. The world model maps images to model states and then predicts future model states given potential actions. From predicted trajectories of model states, Director optimizes two policies: The manager chooses a new goal every fixed number of steps, and the worker learns to achieve the goals through low-level actions. However, choosing goals directly in the high-dimensional continuous representation space of the world model would be a challenging control problem for the manager. Instead, we learn a goal autoencoder to compress the model states into smaller discrete codes. The manager then selects discrete codes and the goal autoencoder turns them into model states before passing them as goals to the worker.
Left: The goal autoencoder (blue) compresses the world model (green) state (st) into discrete codes (z). Right: The manager policy (orange) selects a code that the goal decoder (blue) turns into a feature space goal (g). The worker policy (red) learns to achieve the goal from future trajectories (s1, …, s4) predicted by the world model.
All components of Director are optimized concurrently, so the manager learns to select goals that are achievable by the worker. The manager learns to select goals to maximize both the task reward and an exploration bonus, leading the agent to explore and steer towards remote parts of the environment. We found that preferring model states where the goal autoencoder incurs high prediction error is a simple and effective exploration bonus. Unlike prior methods, such as Feudal Networks, our worker receives no task reward and learns purely from maximizing the feature space similarity between the current model state and the goal. This means the worker has no knowledge of the task and instead concentrates all its capacity on achieving goals.
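To make that split concrete, here is a small, self-contained sketch of the two reward signals. The exact functional forms (cosine similarity for the worker’s feature-space reward and squared reconstruction error for the exploration bonus) are illustrative assumptions, not the published implementation:

```python
import numpy as np


def worker_reward(state_feat: np.ndarray, goal_feat: np.ndarray) -> float:
    """Feature-space similarity between the current model state and the goal.

    Cosine similarity is used here as an illustrative choice; the paper's exact
    similarity measure may differ.
    """
    denom = float(np.linalg.norm(state_feat) * np.linalg.norm(goal_feat)) + 1e-8
    return float(state_feat @ goal_feat) / denom


def exploration_bonus(state_feat: np.ndarray, reconstructed: np.ndarray) -> float:
    """Manager bonus: the goal autoencoder's reconstruction error on this state.

    States the autoencoder cannot yet represent well receive a larger bonus,
    steering the manager toward novel parts of the environment.
    """
    return float(np.mean((state_feat - reconstructed) ** 2))


# Toy usage with random feature vectors.
rng = np.random.default_rng(0)
s, g = rng.normal(size=32), rng.normal(size=32)
print(worker_reward(s, g), exploration_bonus(s, s + 0.1 * rng.normal(size=32)))
```

In the full agent, these per-step signals are computed on imagined trajectories from the world model rather than on real environment steps.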
Benchmark Results
Whereas prior work in HRL often resorted to custom evaluation protocols — such as assuming diverse practice goals, access to the agents’ global position on a 2D map, or ground-truth distance rewards — Director operates in the end-to-end RL setting. To test the ability to explore and solve long-horizon tasks, we propose the challenging Egocentric Ant Maze benchmark. This suite of tasks requires finding and reaching goals in 3D mazes by controlling the joints of a quadruped robot, given only proprioceptive and first-person camera inputs. The sparse reward is given when the robot reaches the goal, so the agents have to autonomously explore in the absence of task rewards throughout most of their learning.
The Egocentric Ant Maze benchmark measures the ability of agents to explore in a temporally-abstract manner to find the sparse reward at the end of the maze.
We evaluate Director against two state-of-the-art algorithms that are also based on world models: Plan2Explore, which maximizes both task reward and an exploration bonus based on ensemble disagreement, and Dreamer, which simply maximizes the task reward. Both baselines learn non-hierarchical policies from imagined trajectories of the world model. We find that Plan2Explore results in noisy movements that flip the robot onto its back, preventing it from reaching the goal. Dreamer reaches the goal in the smallest maze but fails to explore the larger mazes. In these larger mazes, Director is the only method to find and reliably reach the goal.
To study the ability of agents to discover very sparse rewards in isolation and separately from the challenge of representation learning of 3D environments, we propose the Visual Pin Pad suite. In these tasks, the agent controls a black square, moving it around to step on differently colored pads. At the bottom of the screen, the history of previously activated pads is shown, removing the need for long-term memory. The task is to discover the correct sequence for activating all the pads, at which point the agent receives the sparse reward. Again, Director outperforms previous methods by a large margin.
The Visual Pin Pad benchmark allows researchers to evaluate agents under very sparse rewards and without confounding challenges such as perceiving 3D scenes or long-term memory.
In addition to solving tasks with sparse rewards, we study Director’s performance on a wide range of tasks common in the literature that typically require no long-term exploration. Our experiment includes 12 tasks that cover Atari games, Control Suite tasks, DMLab maze environments, and the research platform Crafter. We find that Director succeeds across all these tasks with the same hyperparameters, demonstrating the robustness of the hierarchy learning process. Additionally, providing the task reward to the worker enables Director to learn precise movements for the task, fully matching or exceeding the performance of the state-of-the-art Dreamer algorithm.
Director solves a wide range of standard tasks with dense rewards with the same hyperparameters, demonstrating the robustness of the hierarchy learning process.
Goal Visualizations
While Director uses latent model states as goals, the learned world model allows us to decode these goals into images for human interpretation. We visualize the internal goals of Director for multiple environments to gain insights into its decision making and find that Director learns diverse strategies for breaking down long-horizon tasks. For example, on the Walker and Humanoid tasks, the manager requests a forward leaning pose and shifting floor patterns, with the worker filling in the details of how the legs need to move. In the Egocentric Ant Maze, the manager steers the ant robot by requesting a sequence of different wall colors. In the 2D research platform Crafter, the manager requests resource collection and tools via the inventory display at the bottom of the screen, and in DMLab mazes, the manager encourages the worker via the teleport animation that occurs right after collecting the desired object.
Left: In Egocentric Ant Maze XL, the manager directs the worker through the maze by targeting walls of different colors. Right: In Visual Pin Pad Six, the manager specifies subgoals via the history display at the bottom and by highlighting different pads.
Left: In Walker, the manager requests a forward leaning pose with both feet off the ground and a shifting floor pattern, with the worker filling in the details of leg movement. Right: In the challenging Humanoid task, Director learns to stand up and walk reliably from pixels and without early episode terminations.
Left: In Crafter, the manager requests resource collection via the inventory display at the bottom of the screen. Right: In DMLab Goals Small, the manager requests the teleport animation that occurs when receiving a reward as a way to communicate the task to the worker.
Future Directions
We see Director as a step forward in HRL research and are preparing its code to be released in the future. Director is a practical, interpretable, and generally applicable algorithm that provides an effective starting point for the future development of hierarchical artificial agents by the research community, such as allowing goals to only correspond to subsets of the full representation vectors, dynamically learning the duration of the goals, and building hierarchical agents with three or more levels of temporal abstraction. We are optimistic that future algorithmic advances in HRL will unlock new levels of performance and autonomy of intelligent agents.
A one-of-a-kind electric race car revved to life before it was manufactured — or even prototyped — thanks to GPU-powered extended reality technology. At the Automotive Innovation Forum in May, NVIDIA worked with Autodesk VRED to showcase a photorealistic Porsche electric sports car in augmented reality, with multiple attendees collaborating in the same immersive environment.
Learn how NVIDIA researchers use AI to design better arithmetic circuits that power our AI chips.
As Moore’s law slows down, it becomes increasingly important to develop other techniques that improve the performance of a chip at the same technology process node. Our approach uses AI to design smaller, faster, and more efficient circuits to deliver more performance with each chip generation.
Vast arrays of arithmetic circuits have powered NVIDIA GPUs to achieve unprecedented acceleration for AI, high-performance computing, and computer graphics. Thus, improving the design of these arithmetic circuits would be critical in improving the performance and efficiency of GPUs.
What if AI could learn to design these circuits? In PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning, we demonstrate that not only can AI learn to design these circuits from scratch, but AI-designed circuits are also smaller and faster than those designed by state-of-the-art electronic design automation (EDA) tools. The latest NVIDIA Hopper GPU architecture has nearly 13,000 instances of AI-designed circuits.
The circuit in Figure 1 corresponds to the (31.4 µm², 0.186 ns) point on the PrefixRL curve in Figure 5.
The circuit design game
Arithmetic circuits in computer chips are constructed using a network of logic gates (like NAND, NOR, and XOR) and wires. The desirable circuit should have the following characteristics:
Small: A lower area so that more circuits can fit on a chip.
Fast: A lower delay to improve the performance of the chip.
Power-efficient: Lower power consumption of the chip.
In our paper, we focus on circuit area and delay. We find that power consumption is well-correlated with area for our circuits of interest. Circuit area and delay are often competing properties, so we want to find the Pareto frontier of designs that effectively trades off these properties. Put simply, we desire the minimum area circuit at every delay.
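As a quick illustration of what “the minimum area circuit at every delay” means, the following sketch filters a set of (area, delay) design points down to the non-dominated ones; the numbers are made up for the example:

```python
def pareto_frontier(designs):
    """Return the (area, delay) points not dominated by any other design.

    A design is dominated if another design has area and delay that are both
    no worse and at least one that is strictly better.
    """
    frontier = []
    for a, d in designs:
        dominated = any(
            (a2 <= a and d2 <= d) and (a2 < a or d2 < d)
            for a2, d2 in designs
        )
        if not dominated:
            frontier.append((a, d))
    return sorted(frontier)


# Toy set of candidate adder designs: (area in um^2, delay in ns).
print(pareto_frontier([(31.4, 0.186), (33.0, 0.186), (30.0, 0.250), (35.0, 0.180)]))
```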
In PrefixRL, we focus on a popular class of arithmetic circuits called (parallel) prefix circuits. Various important circuits in the GPU such as adders, incrementors, and encoders are prefix circuits that can be defined at a higher level as prefix graphs.
In this work, we specifically ask the question: can an AI agent design good prefix graphs? The state space of all prefix graphs is large, O((2^n)^n), and cannot be explored using brute force methods.
A prefix graph is converted into a circuit with wires and logic gates using a circuit generator. These generated circuits are then further optimized by a physical synthesis tool using physical synthesis optimizations such as gate sizing, duplication, and buffer insertion.
The final circuit properties (delay, area, and power) do not directly translate from the original prefix graph properties, such as level and node count, due to these physical synthesis optimizations. This is why the AI agent learns to design prefix graphs but optimizes for the properties of the final circuit generated from the prefix graph.
We pose arithmetic circuit design as a reinforcement learning (RL) task, where we train an agent to optimize the area and delay properties of arithmetic circuits. For prefix circuits, we design an environment where the RL agent can add or remove a node from the prefix graph, after which the following steps happen:
The prefix graph is legalized to always maintain a correct prefix sum computation.
A circuit is generated from the legalized prefix graph.
The circuit undergoes physical synthesis optimizations using a physical synthesis tool.
The area and delay properties of the circuit are measured.
During an episode, the RL agent builds up the prefix graph step-by-step by adding or removing nodes. At each step, the agent receives the improvement in the corresponding circuit area and delay as rewards.
State and action representation and the deep reinforcement learning model
We use the Q-learning algorithm to train the circuit designer agent. We use a grid representation for prefix graphs where each element in the grid uniquely maps to a prefix node. This grid representation is used at both the input and output of the Q-network. Each element in the input grid represents whether a node is present or absent. Each element in the output grid represents the Q-values for adding or removing a node.
We use a fully convolutional neural network architecture for the agent as the input and output of the Q-learning agent are grid representations. The agent separately predicts the Q values for the area and delay properties because the rewards for area and delay are separately observable during training.
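The following is a rough sketch of such an architecture, not the authors’ exact network; the layer widths and the four-channel output layout are assumptions made for illustration:

```python
import torch
import torch.nn as nn


class PrefixQNet(nn.Module):
    """Fully convolutional Q-network sketch for PrefixRL-style grid states.

    Input:  (batch, 1, N, N) grid, 1 where a prefix node is present, else 0.
    Output: (batch, 4, N, N) Q-values: {add, remove} x {area, delay} per node.
    Layer widths are illustrative, not the paper's architecture.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Separate heads so area and delay Q-values are predicted independently.
        self.area_head = nn.Conv2d(channels, 2, kernel_size=1)   # add / remove
        self.delay_head = nn.Conv2d(channels, 2, kernel_size=1)  # add / remove

    def forward(self, grid: torch.Tensor) -> torch.Tensor:
        h = self.body(grid)
        return torch.cat([self.area_head(h), self.delay_head(h)], dim=1)


# Toy usage on a 16x16 node grid.
q_values = PrefixQNet()(torch.zeros(1, 1, 16, 16))
print(q_values.shape)  # torch.Size([1, 4, 16, 16])
```

Because the network is fully convolutional, the same weights apply at every grid position, matching the grid state and action representation described above.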
Distributed training with Raptor
PrefixRL is a computationally demanding task: physical simulation required 256 CPUs for each GPU and training the 64b case took over 32,000 GPU hours.
We developed Raptor, an in-house distributed reinforcement learning platform that takes special advantage of NVIDIA hardware for this kind of industrial reinforcement learning (Figure 4).
Raptor has several features that enhance scalability and training speed such as job scheduling, custom networking, and GPU-aware data structures. In the context of PrefixRL, Raptor makes the distribution of work across a mix of CPUs, GPUs, and Spot instances possible.
Networking in this reinforcement learning application is diverse and benefits from the following.
Raptor’s ability to switch to NCCL for point-to-point transfers, moving model parameters directly from the learner GPU to an inference GPU.
Redis for asynchronous and smaller messages such as rewards or statistics.
A JIT-compiled RPC to handle high volume and low latency requests such as uploading experience data.
Finally, Raptor provides GPU-aware data structures, such as a replay buffer with a multithreaded server that receives experience from multiple workers, batches data in parallel, and prefetches it onto the GPU.
Figure 4 shows that our framework powers concurrent training and data collection, and takes advantage of NCCL to efficiently send actors the latest parameters.
Reward computation
We use a tradeoff weight w from [0,1] to combine the area and delay objectives. We train various agents with various weights to obtain a Pareto frontier of designs that balance the tradeoff between area and delay.
The physical synthesis optimizations in the RL environment can generate various solutions that trade off area against delay. We should drive the physical synthesis tool with the same tradeoff weight for which a particular agent is trained.
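For intuition, a scalarized per-step reward consistent with this description might be computed as follows; the improvement-based form follows the environment description earlier, and the exact scaling is an assumption:

```python
def step_reward(prev_area: float, prev_delay: float,
                new_area: float, new_delay: float, w: float) -> float:
    """Reward as the weighted improvement in circuit area and delay.

    w in [0, 1] trades off area against delay; the same w should be used to
    drive the physical synthesis tool while training that agent.
    """
    area_improvement = prev_area - new_area
    delay_improvement = prev_delay - new_delay
    return w * area_improvement + (1.0 - w) * delay_improvement


# Example: an action that shrinks area slightly but increases delay.
print(step_reward(prev_area=32.0, prev_delay=0.190,
                  new_area=31.4, new_delay=0.195, w=0.5))
```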
Performing physical synthesis optimizations in the loop for reward computation has several advantages.
The RL agent learns to directly optimize the final circuit properties for a target technology node and library.
The RL agent can optimize the properties of the target arithmetic circuit and its surrounding logic jointly by including the surrounding logic during physical synthesis.
However, performing physical synthesis is a slow process (~35 seconds for 64b adders), which can greatly slow RL training and exploration.
We decouple reward calculation from the state update, as the agent only needs the current prefix graph state to take actions, not circuit synthesis results or previous rewards. Thanks to Raptor, we can offload the lengthy reward calculation onto a pool of CPU workers that perform physical synthesis in parallel, while actor agents step through the environment without needing to wait.
When rewards are returned by the CPU workers, the transitions can then be inserted into the replay buffer. Synthesis rewards are cached to avoid redundant computation whenever a state is reencountered.
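A heavily simplified sketch of this decoupling, using a standard Python process pool and a cache keyed by the prefix-graph state (Raptor itself is an in-house platform; this only illustrates the idea, and the synthesize stub stands in for the real synthesis run):

```python
import time
from concurrent.futures import ProcessPoolExecutor, wait


def synthesize(state_key: str) -> tuple:
    """Stand-in for the slow physical synthesis run (~35 s for 64b adders)."""
    time.sleep(0.1)                      # pretend this is expensive
    return 30.0 + len(state_key), 0.2    # dummy (area, delay) result


reward_cache = {}   # state key -> synthesized (area, delay)
pending = {}        # state keys currently being synthesized


def request_reward(state_key: str, pool: ProcessPoolExecutor) -> None:
    """Submit synthesis for a state unless it is cached or already in flight."""
    if state_key not in reward_cache and state_key not in pending:
        pending[state_key] = pool.submit(synthesize, state_key)


def collect_rewards() -> None:
    """Move finished synthesis results into the cache; actors never block on this."""
    for key, fut in list(pending.items()):
        if fut.done():
            reward_cache[key] = fut.result()
            del pending[key]


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        for key in ["graph-a", "graph-b", "graph-a"]:  # repeated state is served from cache
            request_reward(key, pool)
        wait(list(pending.values()))  # demo only; training actors keep stepping instead
        collect_rewards()
        print(reward_cache)
```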
Results
The RL agents learn to design circuits tabula rasa, purely through learning with feedback from synthesized circuit properties. Figure 5 shows the latest results*: 64b adder circuits designed by PrefixRL Pareto-dominate adder circuits from a state-of-the-art EDA tool in area and delay.
The best PrefixRL adder achieved a 25% lower area than the EDA tool adder at the same delay. These prefix graphs that map to Pareto optimal adder circuits after physical synthesis optimizations have irregular structures.
Conclusion
To the best of our knowledge, this is the first method using a deep reinforcement learning agent to design arithmetic circuits. We hope that this method can be a blueprint for applying AI to real-world circuit design problems: constructing action spaces, state representations, RL agent models, optimizing for multiple competing objectives, and overcoming slow reward computation processes such as physical synthesis.
NVIDIA’s latest corporate responsibility report shares our efforts in empowering employees and putting to work our technologies for the benefit of humanity. Amid ongoing global economic concerns and pandemic challenges, this year’s report highlights our ability to attract and retain talent that come here to do their life’s work while tackling some of the world’s…