I show how NVIDIA Isaac Sim and NVIDIA Isaac ROS GEMs can be used with the Nav2 navigation stack for robotic navigation.
NVIDIA Isaac ROS GEMs are ROS packages that optimize AI-based robotics applications to run on NVIDIA GPUs and the Jetson platform. There is a growing interest in integrating these packages with the Nav2 project to help autonomous robots successfully navigate around dynamic environments.
This work is done entirely in simulation and can be used as a starting point for transferring robotic capabilities from simulation to the real world (Sim2Real).
In this post, I focus on a real-world problem where robots are damaged due to collisions with forklift tines in warehouses. A forklift is an industrial truck used to move heavy objects over short distances. It has extensions called tines (or forks), which slide under and lift objects.
Commonly used robot sensors, such as lidar, can detect the body of a forklift but not the tines, which sit close to the ground. This scenario calls for additional sensors that can detect the tines. In this project, you mount two RGB cameras on the robot in simulation and use the images from these cameras to calculate disparity with the Isaac ROS stereo GEM.
From the disparity, the stereo GEM generates a point cloud that encodes the 3D positions of all objects in the cameras’ field of view. This information is used to update the navigation node so that the robot’s path changes whenever a collision could occur.
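The geometry behind this step can be sketched in a few lines. The following is a minimal NumPy illustration of back-projecting a disparity image into 3D points using the standard pinhole stereo model (depth = focal length × baseline / disparity); the function name and parameters are illustrative, not the Isaac ROS stereo GEM’s actual API.

```python
import numpy as np

def disparity_to_point_cloud(disparity, fx, cx, cy, baseline):
    """Back-project a disparity image into an (N, 3) point cloud.

    disparity: (H, W) array of pixel disparities between left/right images.
    fx: focal length in pixels; (cx, cy): principal point; baseline: distance
    between the two cameras in meters. Names are illustrative only.
    """
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0                          # zero disparity = unknown depth
    z = np.zeros_like(disparity, dtype=float)
    z[valid] = fx * baseline / disparity[valid]    # depth from disparity
    x = (u - cx) * z / fx                          # back-project pixel coordinates
    y = (v - cy) * z / fx
    # Stack only the valid pixels into an (N, 3) array of 3D points
    return np.stack([x[valid], y[valid], z[valid]], axis=1)
```

For example, with a 500-pixel focal length and a 0.1 m baseline, a pixel with a disparity of 10 pixels sits 5 m from the cameras.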
Figure 2. Reproducing the real-world problem in NVIDIA Isaac Sim: robots colliding with forklift tines
Figure 3 shows the fundamental workflow of the project.
For more information, see the NVIDIA-AI-IOT/Nav2-with-Isaac-ROS-GEMs GitHub repo.
NVIDIA Isaac Sim setup
You use a warehouse environment in NVIDIA Isaac Sim that includes the Carter robot and forklifts. Following the ROS2 Navigation example, generate an occupancy map, which the Nav2 stack uses to avoid static obstacles such as shelves. Dynamic or moving obstacles, including forklifts and trolleys, are added to the environment after the occupancy map is created. This mimics the real world, where the environment changes without the robot’s knowledge.
It is important to note the offset between the Carter robot’s left and right stereo cameras in NVIDIA Isaac Sim for the NVIDIA Isaac ROS Stereo GEM to generate disparity correctly. Ensure that the ROS2 bridge is enabled in NVIDIA Isaac Sim before starting simulation so that ROS2 messages can be communicated outside NVIDIA Isaac Sim.
NVIDIA Isaac ROS stereo GEM and Nav2
The Nav2 stack uses global and local costmaps to steer the robot clear of obstacles. The local costmap is updated based on new, moving obstacles in the environment and can take laser scans and point clouds as input from robotic sensors.
Because laser scans from lidar fail to pick up forklift tines in real scenarios, you can address this problem by passing point clouds computed from stereo images to Nav2. These point clouds are generated with the NVIDIA Isaac ROS stereo GEM.
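A sketch of how such a point cloud might be wired into the Nav2 local costmap is shown below. This is a minimal configuration fragment following the standard nav2_costmap_2d parameter layout; the topic name and height limits are assumptions for illustration, not values from this project.

```yaml
local_costmap:
  local_costmap:
    ros__parameters:
      plugins: ["obstacle_layer", "inflation_layer"]
      obstacle_layer:
        plugin: "nav2_costmap_2d::ObstacleLayer"
        observation_sources: stereo_points
        stereo_points:
          topic: /points2            # point cloud from the stereo GEM (assumed topic name)
          data_type: "PointCloud2"
          marking: true              # add obstacles seen in the cloud
          clearing: true             # clear freed space via raytracing
          min_obstacle_height: 0.0   # include points near the floor, where tines sit
          max_obstacle_height: 2.0
```

The low `min_obstacle_height` matters here: tines hover just above the floor, so a source that discards near-ground points would miss them.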
On the right side of Figure 4, the light blue region under the tines shows that the Nav2 local costmap has been updated to represent an obstacle there, which the robot can now avoid. The average rate of images from NVIDIA Isaac Sim is 20 FPS and that of the point cloud from the stereo GEM is 16 FPS.
The stereo GEM generates a disparity image and then a point cloud for all objects that are seen in the left and right images from the robot’s cameras. Using the Isaac ROS segmentation GEM, this disparity can be filtered to generate a point cloud that includes only points belonging to objects of interest, for instance, forklift tines.
The next section explains this filtering in more detail.
Disparity filtering using the NVIDIA Isaac ROS segmentation GEM
Here’s how deep learning models trained on synthetically generated data can be used with NVIDIA Isaac ROS inference GEMs. The goal is the same: help robots avoid forklift tines in simulation using the GEMs along with the Nav2 stack.
However, instead of generating a point cloud for all objects in the robot’s cameras’ field of view, you filter and generate a focused point cloud only for forklift tines.
I used a segmentation model trained on images of forklift tines. The NVIDIA Isaac ROS segmentation GEM takes RGB images from the robot in simulation and generates the corresponding segmentation images using the given model.
Figure 6. Colored segmentation image generated at 39 FPS by the trained model on images from the robot’s cameras: (top left) the robot’s left camera view; (top right) the segmentation mask; (bottom) the filtered point cloud for forklift tines.
Each pixel in the raw segmentation image holds the class label of the object at that location in the image. Given the label of interest (for instance, if 2 represents forklift tines), set every pixel with a different label to invalid in the corresponding disparity image generated by the stereo GEM. The resulting point cloud excludes those points, which helps reduce noise in the point cloud.
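The filtering step above amounts to a per-pixel mask. Here is a minimal NumPy sketch of the idea; the function name and the invalid-value sentinel are illustrative, not the GEM’s actual interface, and `label_of_interest=2` follows the example above.

```python
import numpy as np

def filter_disparity(disparity, seg_labels, label_of_interest=2, invalid=0.0):
    """Keep disparity only where the segmentation label matches the target class.

    disparity: (H, W) float array from the stereo pipeline.
    seg_labels: (H, W) integer array of per-pixel class labels.
    Pixels of any other class are marked invalid, so they are dropped
    when the filtered disparity is converted into a point cloud.
    """
    filtered = disparity.copy()
    filtered[seg_labels != label_of_interest] = invalid
    return filtered
```

Because invalid-disparity pixels produce no 3D points, everything except the tines disappears from the resulting point cloud.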
Unlike the point cloud in Figure 4, this one contains only points belonging to the forklift tines.
ROS domain IDs
As the NVIDIA Isaac ROS GEMs run within a container and NVIDIA Isaac Sim runs on the host, you must make sure that ROS topics can be communicated between the host and container.
For this, set the ROS domain ID of all processes to the same number. All ROS2 nodes using the same domain ID can communicate; those using different domain IDs cannot. For more information, see The ROS_DOMAIN_ID.
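In practice, this means exporting the same `ROS_DOMAIN_ID` on the host and inside the container. The value 42 below is an arbitrary example; any ID works as long as both sides match.

```shell
# On the host, before launching NVIDIA Isaac Sim (42 is an assumed example value):
export ROS_DOMAIN_ID=42

# When starting the Isaac ROS container, forward the same ID, for example:
#   docker run -e ROS_DOMAIN_ID=42 ...
echo "ROS_DOMAIN_ID=$ROS_DOMAIN_ID"
```

Nodes on mismatched domain IDs silently fail to discover each other, so this is worth checking first when topics do not appear across the host/container boundary.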
The workflow explained in this project avoids obstacles that lidar and cameras can detect. For obstacles that are too small or occluded, other sensors should be explored.
The approach is sensitive to the disparity calculation and the quality of the resulting point cloud. Because calculating disparity is a challenging task, noisy point clouds can cause Nav2 to update the costmap incorrectly.
Disparity filtering is dependent on the performance of the segmentation model. A model that cannot produce accurate segmentation masks results in poorly filtered disparity and point clouds.