Categories
Misc

Optimizing Your Data Center Network

This post covers how network professionals can update their data center network infrastructure and protocol stack.

Data center networks can be optimized in two ways: by adopting newer networking technologies or by improving operational efficiency through NetDevOps. In this post, we identify and evaluate technologies that you can apply to your network architecture to optimize your network.

We address five updates that you should consider for improving your data center:

  • Replace layer 2 VLANs with VXLAN.
  • Use Address Resolution Protocol (ARP) suppression to reduce broadcast propagation.
  • Replace multi-chassis link aggregation group (MLAG) with EVPN multihoming.
  • Handle traffic balancing with equal-cost multi-path (ECMP) and unequal-cost multi-path (UCMP) routing.
  • Mitigate traffic polarization with adaptive routing.

Replace VLANs with VXLANs

VXLAN is an overlay technology that uses encapsulation to allow layer 2 overlay VLANs to span across layer 3 networks. Layer 2 networks have some inherent disadvantages:

  • Because they rely on the spanning tree protocol (STP), redundancy and multipathing are limited to what spanning tree allows.
  • They can only operate within one subnet, and redundancy is normally limited to two devices because of MLAG.
  • Any path-level redundancy requires the Link Aggregation Control Protocol (LACP), the standard redundancy technology for ports.

VXLAN overcomes these deficiencies and allows the network operator to build on an optimized layer 3 routed fabric. A layer 2 overlay can still be accomplished, but it no longer requires spanning tree for control plane convergence because EVPN serves as the control plane.

EVPN exchanges MAC information through a BGP address family, instead of relying on the inefficiencies of broadcast flood and learn. Plus, VXLAN uses a 24-bit ID that can define up to 16 million virtual networks, whereas VLAN only has a 12-bit ID and is limited to 4094 virtual networks.
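The difference in scale is simple arithmetic, as the short Python check below illustrates (two reserved VLAN IDs are why 4094, not 4096, VLANs are usable):

    # VLAN IDs are 12 bits; VXLAN network identifiers (VNIs) are 24 bits.
    usable_vlans = 2 ** 12 - 2   # 4094 (IDs 0 and 4095 are reserved)
    vxlan_vnis = 2 ** 24         # 16,777,216 possible virtual networks
    print(usable_vlans, vxlan_vnis)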

Use ARP suppression to reduce broadcast propagation

Broadcast traffic in data centers with VXLAN can be further optimized with ARP suppression. ARP suppression helps reduce traffic by using EVPN to proxy responses to ARP requests directly to clients from the ToR virtual tunnel end point (VTEP).

  • Without ARP suppression, all ARP requests are broadcast throughout the entire VXLAN fabric, sent to every VTEP that has a VNI for the network.
  • With ARP suppression enabled, MAC addresses learned over EVPN are passed down to the ARP control plane.

The leaf switch, which acts as the VTEP, responds directly back to the ARP requester through a proxy ARP reply.

Because the IP-to-MAC mappings are already communicated through the VXLAN control plane using EVPN type 2 messages, enabling ARP suppression speeds up address resolution in the overlay. It also reduces the amount of broadcast traffic in the fabric, because ARP requests no longer need to be flooded to every VTEP in the VXLAN infrastructure.
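As a toy illustration of the idea (not a vendor implementation), the VTEP can be thought of as holding a cache of IP-to-MAC bindings learned from EVPN type 2 routes and answering locally whenever the cache has a hit; the addresses below are made up:

    # Toy model of ARP suppression at a VTEP (illustrative only; real switches
    # program this cache into hardware from EVPN type 2 routes).
    evpn_arp_cache = {
        "10.1.1.10": "00:11:22:33:44:55",
        "10.1.1.20": "00:11:22:33:44:66",
    }

    def handle_arp_request(target_ip):
        mac = evpn_arp_cache.get(target_ip)
        if mac:
            # Known binding: proxy-reply locally, no flooding to remote VTEPs.
            return f"proxy ARP reply: {target_ip} is-at {mac}"
        # Unknown binding: fall back to flooding the request across the VNI.
        return "flood ARP request to all VTEPs in the VNI"

    print(handle_arp_request("10.1.1.10"))
    print(handle_arp_request("10.1.1.99"))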

Replace MLAG with EVPN multihoming

Sometimes MLAG is still required in VXLAN environments for redundant host connectivity. EVPN multihoming is an opportunity to move off proprietary MLAG solutions that do not scale beyond one level of device redundancy.

As I mentioned earlier, VXLAN helps remove the need for the back-to-back peer links between leaf switches that MLAG requires. EVPN multihoming goes one step further and eliminates any need for MLAG in server-to-leaf connectivity.

Multihoming uses EVPN messages to communicate host connectivity, and it dynamically builds L2 adjacency to servers using host connectivity information. Where MLAG requires LAG IDs, multihoming uses Ethernet segment IDs. Interfaces are mapped to segments that act like logical connections to the same end host.
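A rough mental model of that mapping is a table keyed by Ethernet segment ID, where interfaces on different leaf switches that reach the same end host share one segment; the sketch below is purely illustrative, and the ESI values and interface names are made up:

    # Toy view of EVPN multihoming state (illustrative only): interfaces on
    # different leafs that connect to the same server share an Ethernet
    # segment identifier (ESI), much as MLAG members would share a LAG ID.
    ethernet_segments = {
        "00:11:aa:bb:cc:dd:ee:ff:00:01": [("leaf1", "swp1"), ("leaf2", "swp1")],
        "00:11:aa:bb:cc:dd:ee:ff:00:02": [("leaf3", "swp5"), ("leaf4", "swp5")],
    }

    def redundant_peers(leaf, port):
        """Return the other leaf/port pairs attached to the same end host."""
        for members in ethernet_segments.values():
            if (leaf, port) in members:
                return [m for m in members if m != (leaf, port)]
        return []

    print(redundant_peers("leaf1", "swp1"))  # [('leaf2', 'swp1')]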

Additionally, moving to multihoming improves network vendor interoperability by using a standards-based form of redundancy in the switch. Because multihoming uses BGP, an open standard protocol, any vendor implementing multihoming according to the RFC specification can be part of the Ethernet segment.

Handle traffic balancing with ECMP and UCMP

ECMP is a standard function in most layer 3 routing protocols, in which equal-cost routes are balanced across all available next-hop uplinks. Layer 2 control plane technologies like spanning tree can only achieve equivalent balancing by relying on external technologies like LACP.

ECMP is a native functionality in layer 3 routing, which enables you to get more efficiency out of your network devices.

There are cases where ECMP may lead to inefficient forwarding, specifically in a full layer 3 design where point-to-point L3 links are used everywhere in the fabric, even to the host. In this case, you may want to balance traffic on a metric other than the number of links. Unequal-cost multi-path (UCMP) can be useful here: it uses BGP attributes, such as the link bandwidth extended community, to weight traffic across next hops so the distribution better matches your application placement.
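The core idea behind both schemes is a per-flow hash over next-hop buckets; the toy Python sketch below only illustrates that idea (real switches do this in the forwarding ASIC, and the hash and bucket scheme here are assumptions):

    import hashlib

    def pick_next_hop(flow_5tuple, next_hops, weights=None):
        """Toy ECMP/UCMP next-hop selection (illustrative, not a switch algorithm)."""
        weights = weights or [1] * len(next_hops)
        # Expand next hops into buckets proportional to their weight, then hash
        # the 5-tuple so every packet of a flow takes the same path.
        buckets = [nh for nh, w in zip(next_hops, weights) for _ in range(w)]
        digest = hashlib.sha256(repr(flow_5tuple).encode()).hexdigest()
        return buckets[int(digest, 16) % len(buckets)]

    flow = ("10.0.0.1", "10.0.1.9", 6, 49152, 443)

    # ECMP: four equal-cost uplinks share flows evenly.
    print(pick_next_hop(flow, ["spine1", "spine2", "spine3", "spine4"]))

    # UCMP: weight spine1 twice as heavily, for example to reflect more
    # downstream capacity behind it.
    print(pick_next_hop(flow, ["spine1", "spine2"], weights=[2, 1]))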

Mitigate traffic polarization with adaptive routing

Adaptive routing is an existing InfiniBand technology adopted by Ethernet switching. Adaptive routing monitors link bandwidth, link utilization, switch buffers, and ECN/PFC to understand when traffic on a specific path has become congested and would benefit from being dynamically rerouted through a less congested path.

When these metrics cross their thresholds, the switch can redirect traffic from one egress interface to another egress interface in the ECMP group. This helps use all links on the switch evenly, without polarization creating inefficient traffic flows.
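Conceptually, the decision looks something like the toy sketch below (illustrative only; real implementations run in the switch ASIC on live telemetry, and the thresholds and statistics here are made up):

    # Toy sketch of an adaptive-routing decision over an ECMP group.
    ecmp_group = {
        "uplink1": {"utilization": 0.92, "buffer_fill": 0.80},  # congested
        "uplink2": {"utilization": 0.35, "buffer_fill": 0.10},
        "uplink3": {"utilization": 0.40, "buffer_fill": 0.15},
    }

    CONGESTION_THRESHOLD = 0.8

    def choose_egress(current, group):
        stats = group[current]
        if max(stats["utilization"], stats["buffer_fill"]) < CONGESTION_THRESHOLD:
            return current  # current path is healthy; keep flows where they are
        # Redirect to the least-utilized member of the same ECMP group.
        return min(group, key=lambda k: group[k]["utilization"])

    print(choose_egress("uplink1", ecmp_group))  # -> uplink2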

The goal of adaptive routing is to take any manual tuning intervention out of the hands of a network admin and let the infrastructure handle the optimizations for aggregate flow balancing.

Conclusion

In this post, we covered some concepts available in data center networking that can help you optimize a network infrastructure by focusing on the protocol stack and data plane. These optimizations provide better network virtualization, help reduce unnecessary control traffic on the infrastructure, and balance traffic across existing layer 1 links to fully use all the bandwidth available.

Categories
Misc

TensorFlow Releases TensorFlow v2.9 With New Features

TensorFlow has announced the release of version 2.9, just three months after the release of version 2.8. The key highlights of this release are oneDNN performance optimizations and DTensor, a new model distribution API that enables a smooth migration from data parallelism to model parallelism.

OneDNN

The oneDNN performance library was added to TensorFlow to improve performance on Intel CPUs. Experimental support for oneDNN has been available since TensorFlow 2.5, delivering a four-fold increase in speed. In the Linux x86 packages, and on CPUs with neural-network-focused hardware features such as AVX512 VNNI, AVX512 BF16, and AMX (found on Intel Cascade Lake and newer CPUs), oneDNN optimizations are now turned on by default.
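Because the optimizations can produce slightly different (though still valid) numerical results, TensorFlow documents an environment variable for opting in or out. A minimal sketch, assuming the TF_ENABLE_ONEDNN_OPTS variable described in the TensorFlow release notes:

    import os

    # Set before importing TensorFlow: "1" enables oneDNN optimizations,
    # "0" disables them (useful if you need bit-exact reproducibility).
    os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

    import tensorflow as tf
    print(tf.__version__)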

DTensor

DTensor is a new API for distributing models and is one of the most notable features of this release. DTensor allows shifting from data parallelism to single program multiple data (SPMD) based model parallelism, including spatial partitioning. Developers now have tools to train models whose inputs are too large for a single device. Because DTensor is a device-agnostic API, the same model code can be used on CPU, GPU, or TPU. DTensor also does away with a central coordinator; instead, each task manages its own local devices. Model scaling can be accomplished without affecting startup time.
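As a minimal sketch of the API (based on the tf.experimental.dtensor module and its tutorials; the mesh size, logical CPU setup, and tensor shapes are illustrative assumptions):

    import tensorflow as tf
    from tensorflow.experimental import dtensor

    # Split one physical CPU into two logical devices so the example runs anywhere.
    tf.config.set_logical_device_configuration(
        tf.config.list_physical_devices("CPU")[0],
        [tf.config.LogicalDeviceConfiguration()] * 2,
    )

    # A 1-D mesh named "batch" over the two logical CPU devices.
    mesh = dtensor.create_mesh([("batch", 2)], devices=["CPU:0", "CPU:1"])

    # Shard the first (batch) dimension across the mesh, replicate the second.
    layout = dtensor.Layout(["batch", dtensor.UNSHARDED], mesh)

    # Create a distributed tensor with that layout.
    x = dtensor.call_with_layout(tf.zeros, layout, shape=(4, 8))
    print(dtensor.fetch_layout(x))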

TF Blog: https://blog.tensorflow.org/2022/05/whats-new-in-tensorflow-29.html

submitted by /u/No_Coffee_4638

Categories
Misc

Adding new block/inputs to non-sequential network

I am designing a progressive GAN and I have been stuck on an issue for a couple of days now. I have successfully made my generator grow, but increasing the size of my discriminator is not that easy. In my discriminator, I decided to try implementing an ADA layer (like in the generator in StyleGAN3). However, I have so far been unsuccessful in connecting the old layers with an input from a new block. The main problem is the non-sequential nature of the discriminator, as I need to give multiple inputs for the multiplication and addition layers. I will give the code to construct my discriminator; however, I believe my code to add a block to the discriminator is wholly non-functioning, so that will not be included.

    import tensorflow as tf

    def construct_disc(label_dim=50):
        # Kernel init
        init = tf.keras.initializers.HeUniform(seed=1)

        # Create discriminator inputs
        im_in = tf.keras.layers.Input(shape=(8, 8, 3))
        lab_in = tf.keras.layers.Input(shape=(label_dim,))

        # Style vector describer
        D = tf.keras.layers.Dense(3 * 3 * 3, activation=tf.keras.layers.LeakyReLU(alpha=0.2), kernel_initializer=init)(lab_in)
        D = tf.keras.layers.Dense(3 * 3 * 3, activation=tf.keras.layers.LeakyReLU(alpha=0.2), kernel_initializer=init)(D)
        D = tf.keras.layers.Dense(3 * 3 * 3, activation=tf.keras.layers.LeakyReLU(alpha=0.2), kernel_initializer=init)(D)

        # Conv block begins
        G = tf.keras.layers.Conv2D(128, 1, padding='same', activation=tf.keras.layers.LeakyReLU(alpha=0.2), kernel_initializer=init)(im_in)
        G = tf.keras.layers.Conv2D(128, 3, padding='same', activation=tf.keras.layers.LeakyReLU(alpha=0.2), kernel_initializer=init)(G)

        # Dense style interpreter (this is part of the block)
        W = tf.keras.layers.Dense(1)(D)
        W = W[:, :, tf.newaxis, tf.newaxis]
        B = tf.keras.layers.Dense(1)(D)
        B = B[:, :, tf.newaxis, tf.newaxis]
        G = tf.math.multiply(W, G)
        G = tf.add(G, B)
        G = tf.keras.layers.Conv2D(128, 3, padding='same', activation=tf.keras.layers.LeakyReLU(alpha=0.2), kernel_initializer=init)(G)
        # Block ends here ^

        G = tf.keras.layers.AveragePooling2D(2)(G)
        G = tf.keras.layers.Conv2D(128, 3, padding='same', activation=tf.keras.layers.LeakyReLU(alpha=0.2), kernel_initializer=init)(G)
        G = tf.keras.layers.Flatten()(G)
        # Note: from_logits=True in the loss together with a sigmoid output
        # applies the sigmoid twice; use a linear output or from_logits=False.
        out = tf.keras.layers.Dense(1, activation='sigmoid', kernel_initializer=init)(G)

        model = tf.keras.Model([im_in, lab_in], out)

        # Compile model
        opt = tf.keras.optimizers.Adam(learning_rate=0.0002, beta_1=0.5)
        model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer=opt)
        return model

submitted by /u/Yo1up

Categories
Misc

NVIDIA Experts Explore Robotics, GNNs, and NLP Advancements at the WeAreDevelopers World Congress

Join expert speakers and the developer community at WeAreDevelopers World Congress June 14-15, to exchange ideas, share knowledge, and facilitate networking.

Join NVIDIA speakers in person at the WeAreDevelopers World Congress, held June 14-15 in Berlin, Germany. Over 2 days, the event brings together the developer world with over 200 speakers and 5,000 developers gracing the show floor. 

Speakers and thought leaders from companies like GitHub, Google, and Stripe will discuss a multitude of developer topics. From programming languages and frameworks to DevOps and containers to machine learning and smart devices, there is a session for every developer.

Take a look at the WeAreDevelopers World Congress schedule.

Learn from NVIDIA experts

Below is a preview of what NVIDIA experts will be speaking about at the conference.

Graph Neural Networks: What’s Behind the Hype?

Stage 5 | Wednesday, June 15 | 9:50 – 10:20 a.m.

Ekaterina Sirazitdinova, data scientist at NVIDIA

Graph neural networks (GNNs) are AI models designed to derive insights from unstructured data described by graphs. Across different segments and industries, GNNs find applications such as molecular analysis, drug discovery, stock market prediction, thermodynamics analysis, and even modeling of the human brain. Unlike conventional CNNs, GNNs address the challenge of working with data in irregular domains.

In this talk, Ekaterina will provide an introductory overview of the theory behind GNNs, take a closer look at the types of problems that GNNs are well suited for, and discuss several approaches to model unstructured problems as classification or regression at various levels.

Enhancing AI-based Robotics with Simulation Workflows

Stage 5 | Wednesday, June 15 | 11:50 a.m. – 12:20 p.m.

Teresa Conceicao, senior solutions architect at NVIDIA

With an ever-growing AI and data-centric world, robots will become more intelligent, flexible, and robust. However, with these new paradigms, some challenges arise. AI robots need data, training, and testing—and lots of it. Acquiring real data can be costly and laborious, training in the real world is not scalable, and finding real-world scenarios for difficult use cases can be near impossible, while testing cycles are time consuming and slow down development iterations.

In this session, Teresa will go over how simulation can play a key role in enabling AI-based robotics, and how to get started enhancing AI workflows with the NVIDIA Omniverse and NVIDIA Isaac platforms.

Natural Language Processing: Changing the Way We Tell Machines What We Want Them To Do

Available on-demand during the conference

Adam Grzywaczewski, senior deep learning data scientist at NVIDIA

Software development processes involve describing the desired functionality using a formal programming language, such as C++ or Python. Recent progress in NLP brings us closer to a point where formal programming languages are no longer necessary and can be replaced by a plain text description of a program's behavior. An early example of this capability can be seen in GitHub Copilot.

In this talk, Adam will discuss the most recent breakthroughs in NLP and the reasons for their success. He’ll focus on how advances in AI have changed the software development process for NLP models to learn new, previously unseen tasks. He’ll conclude by sharing how tools such as NVIDIA Megatron-LM can be used to build SOTA NLP models like GPT-3 and deploy them to production.

WeAreDevelopers job board

Browse over 2,900 jobs across Europe, covering every type of developer role. Frontend, backend, full-stack, software, JavaScript, PHP, C#… whatever your skill set, they have a job listing for it. 

See the WeAreDevelopers job board. >>

Hands-on labs, tutorials, and resources from NVIDIA

Join the NVIDIA Developer Program for free and exclusive access to SDKs, technical documentation, and help from peers and domain experts. NVIDIA offers tools and training to accelerate AI, HPC, and graphics applications.

Discover the NVIDIA Deep Learning Institute for resources covering diverse learning needs—from learning materials to self-paced and live training to educator programs—giving individuals, teams, organizations, educators, and students tools to advance AI, accelerated computing, data science, graphics, and simulation knowledge.

Connect with NVIDIA at the WeAreDevelopers World Congress 2022. >>

Categories
Offsites

Image-Text Pre-training with Contrastive Captioners

Oftentimes, machine learning (ML) model developers begin their design using a generic backbone model that is trained at scale and with capabilities transferable to a wide range of downstream tasks. In natural language processing, a number of popular backbone models, including BERT, T5, GPT-3 (sometimes also referred to as “foundation models”), are pre-trained on web-scale data and have demonstrated generic multi-tasking capabilities through zero-shot, few-shot or transfer learning. Compared with training over-specialized individual models, pre-training backbone models for a large number of downstream tasks can amortize the training costs, allowing one to overcome resource limitations when building large scale models.

In computer vision, pioneering work has shown the effectiveness of single-encoder models pre-trained for image classification to capture generic visual representations that are effective for other downstream tasks. More recently, contrastive dual-encoder (CLIP, ALIGN, Florence) and generative encoder-decoder (SimVLM) approaches trained using web-scale noisy image-text pairs have been explored. Dual-encoder models exhibit remarkable zero-shot image classification capabilities but are less effective for joint vision-language understanding. On the other hand, encoder-decoder methods are good at image captioning and visual question answering but cannot perform retrieval-style tasks.

In “CoCa: Contrastive Captioners are Image-Text Foundation Models”, we present a unified vision backbone model called Contrastive Captioner (CoCa). Our model is a novel encoder-decoder approach that simultaneously produces aligned unimodal image and text embeddings and joint multimodal representations, making it flexible enough to be directly applicable for all types of downstream tasks. Specifically, CoCa achieves state-of-the-art results on a series of vision and vision-language tasks spanning vision recognition, cross-modal alignment, and multimodal understanding. Furthermore, it learns highly generic representations so that it can perform as well or better than fully fine-tuned models with zero-shot learning or frozen encoders.

Overview of Contrastive Captioners (CoCa) compared to single-encoder, dual-encoder and encoder-decoder models.

Method
We propose CoCa, a unified training framework that combines contrastive loss and captioning loss on a single training data stream consisting of image annotations and noisy image-text pairs, effectively merging single-encoder, dual-encoder and encoder-decoder paradigms.

To this end, we present a novel encoder-decoder architecture where the encoder is a vision transformer (ViT), and the text decoder transformer is decoupled into two parts, a unimodal text decoder and a multimodal text decoder. We skip cross-attention in unimodal decoder layers to encode text-only representations for contrastive loss, and cascade multimodal decoder layers with cross-attention to image encoder outputs to learn multimodal image-text representations for captioning loss. This design maximizes the model’s flexibility and universality in accommodating a wide spectrum of tasks, and at the same time, it can be efficiently trained with a single forward and backward propagation for both training objectives, resulting in minimal computational overhead. Thus, the model can be trained end-to-end from scratch with training costs comparable to a naïve encoder-decoder model.

Illustration of forward propagation used by CoCa for both contrastive and captioning losses.
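To make the two objectives concrete, here is a minimal TensorFlow sketch (not the authors' implementation) of how a contrastive loss over pooled image and text embeddings can be combined with an autoregressive captioning loss; the tensor names, shapes, temperature, and equal loss weighting are assumptions for illustration:

    import tensorflow as tf

    def coca_style_losses(image_emb, text_emb, caption_logits, caption_tokens,
                          temperature=0.07):
        """Illustrative combination of contrastive and captioning objectives.

        image_emb:      [batch, dim] pooled image embeddings (vision encoder)
        text_emb:       [batch, dim] pooled text embeddings (unimodal decoder)
        caption_logits: [batch, seq_len, vocab] outputs of the multimodal decoder
        caption_tokens: [batch, seq_len] target caption token ids
        """
        # Contrastive loss: symmetric cross-entropy over the image-text similarity matrix.
        image_emb = tf.math.l2_normalize(image_emb, axis=-1)
        text_emb = tf.math.l2_normalize(text_emb, axis=-1)
        logits = tf.matmul(image_emb, text_emb, transpose_b=True) / temperature
        labels = tf.range(tf.shape(logits)[0])
        loss_i2t = tf.keras.losses.sparse_categorical_crossentropy(
            labels, logits, from_logits=True)
        loss_t2i = tf.keras.losses.sparse_categorical_crossentropy(
            labels, tf.transpose(logits), from_logits=True)
        contrastive = tf.reduce_mean(loss_i2t + loss_t2i) / 2.0

        # Captioning loss: standard autoregressive cross-entropy on caption tokens.
        captioning = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                caption_tokens, caption_logits, from_logits=True))

        # Both objectives are optimized in one forward/backward pass; equal
        # weighting here is a placeholder.
        return contrastive + captioning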

Benchmark Results
The CoCa model can be directly fine-tuned on many tasks with minimal adaptation. By doing so, our model achieves a series of state-of-the-art results on popular vision and multimodal benchmarks, including (1) visual recognition: ImageNet, Kinetics-400/600/700, and MiT; (2) cross-modal alignment: MS-COCO, Flickr30K, and MSR-VTT; and (3) multimodal understanding: VQA, SNLI-VE, NLVR2, and NoCaps.

Comparison of CoCa with other image-text backbone models (without task-specific customization) and multiple state-of-the-art task-specialized models.

It is noteworthy that CoCa attains these results as a single model adapted for all tasks while often lighter than prior top-performing specialized models. For example, CoCa obtains 91.0% ImageNet top-1 accuracy while using less than half the parameters of prior state-of-the-art models. In addition, CoCa also obtains strong generative capability of high-quality image captions.

Image classification scaling performance comparing fine-tuned ImageNet top-1 accuracy versus model size.
Text captions generated by CoCa with NoCaps images as input.

Zero-Shot Performance
Besides achieving excellent performance with fine-tuning, CoCa also outperforms previous state-of-the-art models on zero-shot learning tasks, including image classification and cross-modal retrieval. CoCa obtains 86.3% zero-shot accuracy on ImageNet while also robustly outperforming prior models on challenging variant benchmarks, such as ImageNet-A, ImageNet-R, ImageNet-V2, and ImageNet-Sketch. As shown in the figure below, CoCa obtains better zero-shot accuracy with smaller model sizes compared to prior methods.

Image classification scaling performance comparing zero-shot ImageNet top-1 accuracy versus model size.
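For readers unfamiliar with the zero-shot protocol, classification with an aligned image-text model amounts to picking the class whose text-prompt embedding is most similar to the image embedding; the sketch below is a generic CLIP-style illustration (not CoCa's code), with made-up embedding sizes:

    import tensorflow as tf

    def zero_shot_classify(image_emb, class_text_embs):
        """Pick the class whose text embedding best matches the image embedding."""
        image_emb = tf.math.l2_normalize(image_emb, axis=-1)
        class_text_embs = tf.math.l2_normalize(class_text_embs, axis=-1)
        similarities = tf.linalg.matvec(class_text_embs, image_emb)
        return int(tf.argmax(similarities))

    # Hypothetical embeddings; a real model would produce these from an image
    # and from prompts such as "a photo of a {class name}".
    image = tf.random.normal([512])
    prompts = tf.random.normal([1000, 512])  # 1,000 candidate classes
    print(zero_shot_classify(image, prompts))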

Frozen Encoder Representation
One particularly exciting observation is that CoCa achieves results comparable to the best fine-tuned models using only a frozen visual encoder, in which features extracted after model training are used to train a classifier, rather than the more computationally intensive effort of fine-tuning a model. On ImageNet, a frozen CoCa encoder with a learned classification head obtains 90.6% top-1 accuracy, which is better than the fully fine-tuned performance of existing backbone models (90.1%). We also find this setup to work extremely well for video recognition. We feed sampled video frames into the CoCa frozen image encoder individually, and fuse output features by attentional pooling before applying a learned classifier. This simple approach using a CoCa frozen image encoder achieves video action recognition top-1 accuracy of 88.0% on Kinetics-400 dataset and demonstrates that CoCa learns a highly generic visual representation with the combined training objectives.

Comparison of Frozen CoCa visual encoder with (multiple) best-performing fine-tuned models.

Conclusion
We present Contrastive Captioner (CoCa), a novel pre-training paradigm for image-text backbone models. This simple method is widely applicable to many types of vision and vision-language downstream tasks, and obtains state-of-the-art performance with minimal or even no task-specific adaptations.

Acknowledgements
We would like to thank our co-authors Vijay Vasudevan, Legg Yeung, Mojtaba Seyedhosseini, and Yonghui Wu who have been involved in all aspects of the project. We also would like to thank Yi-Ting Chen, Kaifeng Chen, Ye Xia, Zhen Li, Chao Jia, Yinfei Yang, Zhengdong Zhang, Wei Han, Yuan Cao, Tao Zhu, Futang Peng, Soham Ghosh, Zihang Dai, Xin Li, Anelia Angelova, Jason Baldridge, Izhak Shafran, Shengyang Dai, Abhijit Ogale, Zhifeng Chen, Claire Cui, Paul Natsev, Tom Duerig for helpful discussions, Andrew Dai for help with contrastive models, Christopher Fifty and Bowen Zhang for help with video models, Yuanzhong Xu for help with model scaling, Lucas Beyer for help with data preparation, Andy Zeng for help with MSR-VTT evaluation, Hieu Pham and Simon Kornblith for help with zero-shot evaluations, Erica Moreira and Victor Gomes for help with resource coordination, Liangliang Cao for proofreading, Tom Small for creating the animations used in this blogpost, and others in the Google Brain team for support throughout this project.

Categories
Misc

Improve Perception Performance for ROS 2 Applications with NVIDIA Isaac Transport for ROS

Announcing NVIDIA Isaac Transport for ROS (NITROS) pipelines that use new ROS Humble features developed jointly with Open Robotics.

Working in collaboration since October 2021, NVIDIA and Open Robotics are introducing two important changes, now available in the Humble ROS 2 release for improved performance on compute platforms that offer hardware accelerators. 

The new ROS 2 Humble hardware-acceleration features are called type adaptation and type negotiation. NVIDIA will release a software package implementing type adaptation and type negotiation in the next NVIDIA Isaac ROS release (late June 2022).

These simple but powerful additions to the framework will significantly increase performance for developers seeking to incorporate AI/machine learning and computer vision functionality into their ROS-based applications. 

“As ROS developers add more autonomy to their robot applications, the on-robot computers are becoming much more powerful. We have been working to evolve the ROS framework to make sure that it can take advantage of high-performance hardware resources in these edge computers,” said Brian Gerkey, CEO of Open Robotics.

“Working closely with the NVIDIA robotics team, we are excited to share new features (type adaptation and negotiation) in the Humble release that will benefit the entire ROS community’s efforts to embrace hardware acceleration.”

Eliminating overhead of hardware acceleration

Type adaptation 

It is common for hardware accelerators to require a different data format to deliver optimal performance. Type adaptation (REP-2007) can now be used so that ROS nodes work in the format best suited to the hardware. Processing pipelines can eliminate memory copies between the CPU and the accelerator by using the adapted type. Unnecessary memory copies consume CPU compute, waste power, and slow down performance, especially as the size of the images increases.

Type negotiation

Another new innovation is type negotiation (REP-2009). Different ROS nodes in a processing pipeline can advertise their supported types, so that formats yielding ideal performance are chosen. The ROS framework performs this negotiation process and maintains compatibility with legacy nodes that don’t support negotiation.

Accelerating processing pipelines using type adaptation and negotiation makes hardware accelerator zero-copy possible. This reduces software/CPU overhead and unlocks the potential of the underlying hardware. As roboticists migrate to more powerful compute platforms like NVIDIA Jetson Orin, they can expect to realize more of the performance gains enabled by the hardware.

These changes are done completely inside of ROS 2, which ensures compatibility with existing tools, workflows, and codebases.

Figure 1. Comparing hardware accelerated pipelines with and without type adaptation and negotiation

Type adaptation and negotiation have shown promising results. A benchmark consisting of a graph of ROS nodes, with minimal compute in each node, was run on ROS 2 Foxy and ROS 2 Humble so that we could observe the underlying framework performance. We ran this benchmark on Jetson AGX Xavier and the new Jetson AGX Orin. We observed a 3x improvement on Xavier and an impressive 7x improvement on Orin.

Figure 2. Type adaptation framework benchmark performance comparing ROS 2 Foxy and ROS 2 Humble on Jetson AGX Xavier and Jetson AGX Orin

Introducing NVIDIA Isaac Transport for ROS

The NVIDIA implementation of type adaptation and negotiation is called NITROS. NITROS pipelines are ROS processing pipelines made up of Isaac ROS hardware-accelerated modules (also known as GEMs). These pipelines will be available in the Isaac ROS Developer Preview (DP) scheduled for late June 2022. The first release of NITROS will include three pipelines, and more are planned for later in the year.

The NITROS pipelines in the DP release, with the ROS 2 nodes in each pipeline:

  • AprilTag Detection Pipeline: ArgusCameraMono (Raw Image) – Rectify (Rectified Image) – AprilTag (AprilTag Detection)
  • Stereo Disparity Pipeline: ArgusCameraStereo (Raw Image) – Rectify (Rectified Image) – ESSDisparity (DNN Inference) – PointCloud (Point Cloud Output)
  • Image Segmentation Pipeline: ArgusCameraMono (Raw Image) – Rectify (Rectified Image) – DNNImageEncode (DNN Pre-Processed Tensors) – Triton (DNN Inference) – UNetDecode (Segmentation Image)

Table 1. NITROS pipelines in the DP release

Powerful new GEMs aid robotics perception

In addition to the NITROS accelerated pipelines, the Isaac ROS DP release contains two new DNN-based GEMs designed to help roboticists with common perception tasks. 

The first GEM, ESS, is a DNN for stereo camera disparity prediction. The network provides vision-based continuous depth perception for robotics applications.

The other GEM, Bi3D, is a DNN for vision-based obstacle prediction. The DNN, based on groundbreaking work from NVIDIA Research, is enhanced to detect free space and obstacles simultaneously. The network predicts whether an obstacle is within one of four programmable proximity fields from a stereo camera.

Bi3D is optimized to run on NVIDIA DLA hardware. By leveraging the DLA, it preserves both GPU and CPU compute resources.

Both Bi3D and ESS are pretrained for robotics applications using synthetic and real data and are intended for commercial use. These two new Isaac ROS GEMs join stereo_image_proc, a classic computer vision stereo depth disparity routine previously released, to offer three diverse, independent functions for stereo camera depth perception.

Figure 3. Comparison of results from a synthetic camera image (top) and an RGB stereo camera image capture with no active projection (bottom). From left to right: Bi3D DNN prediction for four proximity fields with ground free space; ESS DNN prediction for continuous depth; and the classic CV stereo disparity function
The available Isaac ROS GEM packages and what each provides:

  • Image Pipeline: Camera Image Processing
  • NVBlox: 3D Scene Reconstruction
  • Visual SLAM: VSLAM and Stereo Odometry
  • AprilTags: AprilTag Detection and Pose Estimation
  • Pose Estimation: 3D Object Pose Estimation
  • Image Segmentation: Semantic Image Segmentation
  • Object Detection: DNN for Object Detection using DetectNet
  • DNN Inference: DNN Node for using Triton/TensorRT
  • Argus Camera: CSI/GMSL Camera Support

Table 2. Available Isaac ROS GEM packages

Getting started

ROS developers interested in integrating NVIDIA AI Perception to their products should get started today with Isaac ROS.

Categories
Misc

Dots and boxes game

Hey, can you please help me? I saw that you made a dots and boxes game for your school project. I actually have to make this game too, and I have the same issue as you. Could you please show me how you did it? Thank you in advance.

submitted by /u/Successful-Increase6

Categories
Misc

Build 3D Virtual Worlds at Human Scale with the NVIDIA Omniverse XR App

Users can now produce 3D virtual worlds at human scale with the new Omniverse XR App, available in beta from the NVIDIA Omniverse launcher.

At Computex this week, NVIDIA announced new releases and expansions to the collaborative graphics platform's growing ecosystem, including the beta release of Omniverse XR. With this Omniverse app, creators and developers can drop into 3D scenes in VR and AR with full RTX ray tracing. Users can also review, manipulate, and annotate high-poly production assets without preprocessing.

Effortless, ray-traced VR

With the Omniverse XR App, users can load Universal Scene Description (USD) production assets directly into the editor, jump directly into VR or AR, and view or interact with them in ray-traced VR without preprocessing.

Unlike today’s interactive VR engines, Omniverse XR loads USD scenes directly for editing. This removes the need for pregenerated levels of detail since ray tracing is less sensitive to geometry complexity. 

The scene below was modeled with 70 million polygons; with instancing, the total scene contains 18 billion polygons.

Figure 1. Production-quality USD-based assets load directly into a scene in Omniverse XR.

Realism out of the box

Fully ray-traced VR makes a sizable contribution to realism. The human eye notices when contact shadows don't reflect a scene's geometry or when reflections don't move with body motion.

With Omniverse XR, engineers, designers, and researchers are able to see soft shadows, translucency, and real reflections out of the box, making the experience feel truly immersive and realistic. 

Figure 2. View a complete 3D scene in VR with the Omniverse XR App.

Navigate and manipulate in VR

Omniverse XR users can teleport and manipulate scene objects in VR. Currently, Oculus and HTC Vive controllers are supported. Since the Omniverse Kit uses real-time ray tracing, ray casting can be done with ease, leading to higher precision in close-range interactions.

Figure 3. Achieve precise object manipulation in Omniverse XR.

Continuous foveated rendering

Omniverse XR uses foveated rendering. This technique samples a region of an HMD screen at a higher shading rate and shrinks the number of pixels rendered to 30% of the image plane. These are pixels that a user won’t see in full resolution. 

The foveation technique that comes with Omniverse XR is built specifically for real-time ray tracing. This removes the boundaries between resolution regions, avoiding the need to reshape an image plane when the fovea is off center.

Interior kitchen scene with sunlight and shadow.
Figure 4. Example of foveated rendering off (left) and on (right).

One-step AR streaming 

With Omniverse XR, you can also experience tablet AR streaming, with the NVIDIA CloudXR streaming platform. This mode opens a virtual camera on the tablet, linked to the Omniverse XR session. 

To use Tablet AR mode, content creators and developers can download the Omniverse Streaming Client for iOS, available from the App Store for iPads running iOS 14.5 and later, or with an APK and code examples available for Android tablets.

Figure 5. Experience tablet AR streaming.

Explore Omniverse XR

The Omniverse XR App is now available in beta from the Omniverse Launcher. To learn more, check out the tutorial video or attend the Q&A livestream on May 25 at 11 a.m. PDT / 8 p.m. CET.

Additional resources 

Learn more by diving into the Omniverse Resource Center, which details how developers can build custom applications and extensions for the platform. 

For additional support, explore the Omniverse forums and Medium channel, tutorials on Twitter and YouTube, and join our Discord server to chat with the community.

Categories
Misc

Master of Arts: NVIDIA RTX GPUs Accelerate Creative Ecosystems, Delivering Unmatched AI and Ray-Tracing Performance

The future of content creation was on full display during the virtual NVIDIA keynote at COMPUTEX 2022, as the NVIDIA Studio platform expands with new Studio laptops and RTX-powered AI apps — all backed by the May Studio Driver released today.

The post Master of Arts: NVIDIA RTX GPUs Accelerate Creative Ecosystems, Delivering Unmatched AI and Ray-Tracing Performance appeared first on NVIDIA Blog.

Categories
Misc

NVIDIA Partners Announce Wave of New Jetson AGX Orin Servers and Appliances at COMPUTEX

More than 30 leading technology partners worldwide announced this week the first wave of NVIDIA Jetson AGX Orin-powered production systems at COMPUTEX in Taipei. New products are coming from a dozen Taiwan-based camera, sensor and hardware providers for use in edge AI, AIoT, robotics and embedded applications. Available worldwide since GTC in March, the NVIDIA… Read article >

The post NVIDIA Partners Announce Wave of New Jetson AGX Orin Servers and Appliances at COMPUTEX appeared first on NVIDIA Blog.