
A Devotion to Emotion: Hume AI’s Alan Cowen on the Intersection of AI and Empathy

Can machines experience emotions? They might, according to Hume AI, an AI research lab and technology company that aims to “ensure artificial intelligence is built to serve human goals and emotional well-being.” So how can AI genuinely understand how we are feeling, and respond appropriately? On this episode of NVIDIA’s AI Podcast, host Noah Kravitz explores these questions with Hume AI’s Alan Cowen.



Deciphering the Future: HPE Switches on AI Supercomputer in France

Recalling the French linguist who deciphered the Rosetta Stone two centuries ago, Hewlett Packard Enterprise today switched on a tool to unravel its customers’ knottiest problems. The Champollion AI supercomputer takes its name from Jean-François Champollion (1790-1832), who decoded the hieroglyphics that opened a door to the study of ancient Egypt’s culture. Like its namesake, the mega-system resides in France.



Making Robotics Easier with BenchBot and NVIDIA Isaac Sim

We built BenchBot to allow roboticists to spend more time researching the exciting and interesting problems in robotics. This post tells BenchBot’s story.

Working in robotics means exciting and interesting problems, but also days lost to humbling ones like sensor calibration, building transform trees, managing distributed systems, and debugging bizarre failures in brittle systems.

We built the BenchBot platform at QUT’s Centre for Robotics (QCR) to enable roboticists to focus their time on researching the exciting and interesting problems present in robotics.

We also recently upgraded to the new NVIDIA Omniverse-powered NVIDIA Isaac Sim, which has brought a raft of significant improvements to the BenchBot platform. Whether robotics is your hobby, academic pursuit, or job, BenchBot with NVIDIA Isaac Sim enables you to jump into the wonderful world of robotics with only a few lines of Python. In this post, we share how we created BenchBot, what it enables, where we plan to take it in the future, and where you can take it in your own work. Our goal is to give you the tools to start working on your own robotics projects and research by presenting ideas about what you can do with BenchBot. We also share what we learned when integrating with the new NVIDIA Isaac Sim.

Figure 1. BenchBot gives users access to 25 photorealistic environments out of the box, with variations in lighting and objects present (see the BEAR dataset)

This post also supplies context for our Robotic Vision Scene Understanding (RVSU) challenge, currently in its third iteration. The RVSU challenge is a chance to get hands-on in trying to solve a fundamental problem for domestic robots: how can they understand what is in their environment, and where it is. By competing, you can win a share of prizes including NVIDIA A6000 GPUs and $2,500 USD in cash.

The story behind BenchBot

BenchBot addressed a need in our semantic scene understanding research. We’d hosted an object detection challenge and produced novel evaluation metrics, but needed to expand this work into the robotics domain, which raised several questions:

  • What does it mean to understand a scene?
  • How can the level of understanding be evaluated?
  • What role does agency play in understanding a scene?
  • Can understanding in simulation transfer to the real world?
  • What’s required of a simulation for understanding to transfer to the real world?

We made the BenchBot platform to enable you to focus on these big questions, without becoming lost in the sea of challenges typically thrown up by robotic systems. BenchBot consists of many moving parts that abstract these operational complexities away (Figure 2).

Figure 2. System overview of the BenchBot platform

Here are some of the key components and features of the BenchBot architecture:

  • You create solutions to robotics problems by writing a Python script that calls the BenchBot API.
  • You can easily understand how well your solution performed a given robotics task using customisable evaluation tools.
  • The supervisor brokers communications between the high-level Python API and low-level interfaces of typical robotic systems.
  • The supervisor is backend-agnostic. The robot can be real or simulated; it just needs to be running ROS.
  • All configurations live in a modular add-on system, allowing you to easily extend the system with your own tasks, robots, environments, evaluation methods, examples, and more.

All our code is open source under an MIT license. For more information, see BenchBot: Evaluating Robotics Research in Photorealistic 3D Simulation and on Real Robots (PDF).

A lot of moving parts isn’t necessarily a good thing if they complicate the user experience, so designing the user experience was also a central focus in developing BenchBot.

There are three basic commands for controlling the system:

benchbot_install --help
benchbot_run --help
benchbot_submit --help

The following command helps build powerful evaluation workflows across multiple environments:

benchbot_batch --help

Here’s a simple Python skeleton for interacting with the sensorimotor capabilities of a robot:

from benchbot_api import Agent, BenchBot

class MyAgent(Agent):
    def is_done(self, action_result):
        ...

    def pick_action(self, observations, action_list):
        ...

    def save_result(self, filename, empty_results, results_format_fns):
        ...

BenchBot(agent=MyAgent()).run()
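
For example, a minimal agent that simply wanders by picking random actions might look like the following sketch. It assumes only the Agent interface shown above, and assumes pick_action returns an action name plus a dictionary of arguments; the exact observations and actions available depend on the task you run.

import random

from benchbot_api import Agent, BenchBot

class RandomAgent(Agent):
    # Illustrative only: wander by choosing random actions until the task ends.

    def is_done(self, action_result):
        # For this sketch, never stop early; rely on the task to end the run
        return False

    def pick_action(self, observations, action_list):
        # Ignore observations and pick any currently available action; real
        # actions may require arguments in the second element of the tuple
        return random.choice(action_list), {}

    def save_result(self, filename, empty_results, results_format_fns):
        # Nothing to save for a random agent
        pass

BenchBot(agent=RandomAgent()).run()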

With a simple Python API, world-class photorealistic simulation, and only a handful of commands needed to manage the entire system, we were ready to apply BenchBot to our first big output: the RVSU challenge.

RVSU challenge

The RVSU challenge prompts researchers to develop robotic vision systems that understand both the semantic and geometric aspects of the surrounding environment. The challenge consists of six tasks, featuring multiple difficulty levels of object-based semantic simultaneous localization and mapping (semantic SLAM) and scene change detection (SCD).

The challenge also focuses on a core requirement for household robots: they need to understand what objects are in their environment, and where they are. This problem in itself is the first challenge captured in our semantic SLAM tasks, where a robot must explore an environment, find all objects of interest, and add them to a 3D map.

The SCD task takes this a step further, asking a robot to report changes to the objects in the environment at a different point in time. My colleague David Hall presented an excellent overview of the challenge in the following video.

Video 1. Scene Understanding Challenges

Bringing the RVSU challenge to life with NVIDIA Isaac Sim

Recently, we upgraded BenchBot from using the old Unreal Engine-based NVIDIA Isaac Sim to the new Omniverse-powered NVIDIA Isaac Sim. This brought a number of key benefits to BenchBot, leaving us excited about where we can go with Omniverse-powered simulations in the future. The areas in which we saw significant benefits included the following:

  • Quality: NVIDIA RTX rendering produced beautiful photorealistic simulations, all with the same assets that we were using before.
  • Performance: We accessed powerful dynamic lighting effects, with intricately mapped reflections, all produced in real-time for live simulation with realistic physics.
  • Customizability: The Python APIs for Omniverse and NVIDIA Isaac Sim give complete control of the simulator, allowing us to restart simulations, swap out environments, and move robots programmatically.
  • Simplicity: We replaced an entire library of C++ interfaces with a single Python file.

The qcr/benchbot_sim_omni repository captures our learnings in transitioning to the new NVIDIA Isaac Sim, and also works as a standalone package outside the BenchBot ecosystem. The package is a customizable HTTP API for loading environments, placing robots, and controlling simulations. It serves as a great starting point for programmatically running simulations with NVIDIA Isaac Sim.
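
As a rough illustration of how such an HTTP interface can be driven from Python, the snippet below uses the requests library. The endpoint names and payload fields here are hypothetical placeholders for illustration only; the actual routes are documented in the qcr/benchbot_sim_omni repository.

import requests

SIM = "http://localhost:10001"   # hypothetical address of the simulation API

# Hypothetical endpoint names; check the repository for the real routes
requests.post(f"{SIM}/open_environment", json={"environment": "/path/to/env.usd"})
requests.post(f"{SIM}/place_robot", json={"robot": "/path/to/robot.usd",
                                          "pose": [0, 0, 0, 1, 0, 0, 0]})
requests.post(f"{SIM}/start")
state = requests.get(f"{SIM}/robot_pose").json()
print(state)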

We welcome pull requests and suggestions for how to expand the capabilities of this package. We also hope it can offer some useful examples for starting your own projects with NVIDIA Isaac Sim, such as the following examples.

Opening environments in NVIDIA Isaac Sim

Opening environments first requires a running simulator instance. A new instance is created by instantiating the SimulationApp class, with the open_usd option letting you pick an environment to open initially:

from omni.isaac.kit import SimulationApp

inst = SimulationApp({
    "renderer": "RayTracedLighting",
    "headless": False,
    "open_usd": MAP_USD_PATH,
})

It’s worth noting that only one simulation instance can run per Python script, and NVIDIA Isaac Sim components must be imported after initializing the instance.
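
A minimal sketch of that ordering constraint looks like this: create the SimulationApp first, then import the NVIDIA Isaac Sim components.

from omni.isaac.kit import SimulationApp

# Create the simulator first; this boots the Omniverse app
inst = SimulationApp({"headless": True})

# Only after the instance exists is it safe to import NVIDIA Isaac Sim components
from omni.isaac.core import SimulationContext
from omni.isaac.core.utils.stage import open_stage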

Select a different stage at runtime by using helpers in the NVIDIA Isaac Sim API:

from omni.isaac.core.utils.stage import open_stage, update_stage

open_stage(usd_path=MAP_USD_PATH)
update_stage()

Placing a robot in the environment

Before starting a simulation, load and place a robot in the environment. Do this with the Robot class and the following lines of code:

from omni.isaac.core.robots import Robot
from omni.isaac.core.utils.stage import add_reference_to_stage, update_stage

add_reference_to_stage(usd_path=ROBOT_USD_PATH, prim_path=ROBOT_PRIM_PATH)
robot = Robot(prim_path=ROBOT_PRIM_PATH, name=ROBOT_NAME)
robot.set_world_pose(position=NP_XYZ * 100, orientation=NP_QUATERNION)
update_stage()

Controlling the simulation

Simulations in NVIDIA Isaac Sim are controlled by the SimulationContext class:

from omni.isaac.core import SimulationContext
sim = SimulationContext()
sim.play()

Then, the step method gives fine-grained control over the simulation, which runs at 60 Hz. We used this control to manage our sensor publishing, transform trees, and state detection logic.
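
As a rough sketch of what that control loop can look like (continuing from the SimulationContext created above; the render argument reflects our reading of the API and may differ across Isaac Sim versions):

# Advance the simulation tick by tick; at 60 Hz, 60 steps is roughly one
# simulated second. Sensor publishing, transform trees, and state detection
# logic can hook in between steps.
for _ in range(60):
    sim.step(render=True)   # render=True is our assumption of the signature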

Another useful code example we stumbled upon was using the dynamic_control module to get the robot’s ground truth pose during a simulation:

from omni.isaac.dynamic_control import _dynamic_control
dc = _dynamic_control.acquire_dynamic_control_interface()
robot_dc = dc.get_articulation_root_body(dc.get_object(ROBOT_PRIM_PATH))
gt_pose = dc.get_rigid_body_pose(robot_dc)
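
The returned pose is a dynamic_control transform object. Based on our reading of the module, position and orientation are exposed roughly as follows; treat the exact field names as an assumption and check the dynamic_control documentation for your version:

# gt_pose holds the rigid body's world transform
position = gt_pose.p   # translation, e.g. (x, y, z)
rotation = gt_pose.r   # orientation quaternion, e.g. (x, y, z, w)
print(position, rotation)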

Results

Hopefully these code examples are helpful to you in getting started with NVIDIA Isaac Sim. With not much more than these, we’ve had some impressive results:

  • A remarkable improvement in our photorealistic simulations
  • Powerful real-time lighting effects
  • Full customization through basic Python code

Figures 3, 4, and 5 show some of our favorite visual improvements from making the transition to Omniverse.

Figure 3. Surfaces give more realistic reflections in the new NVIDIA Isaac Sim (bottom row) than in the previous version (top row)
Figure 4. Shadows and texture rendering feel more realistic in the new NVIDIA Isaac Sim (bottom row) than they previously did (top row)
Figure 5. The new NVIDIA Isaac Sim leads to environments that feel more immersive (right), with previous environments (left) feeling “flat” in comparison

Taking it further: BenchBot in other domains

Although semantic scene understanding is the focal point of our research and the origin of BenchBot, its applications aren’t limited solely to this domain. BenchBot is built on a rich add-on architecture that allows modular additions and adaptations of the system to different problem domains.

The visual learning and understanding research program at QCR has started using this flexibility to apply BenchBot and its Omniverse-powered simulations to a range of interesting problems. Figure 6 shows a few areas where we’re looking at employing BenchBot:

Figure 6. Future uses for BenchBot, clockwise from top-right: semantic mapping with quadrics, synthetic datasets with object-level segmentation, object-level NeRFs from noisy data, and understanding scenes with NeRFs

We’ve made BenchBot with a heavy focus on being able to fit it to your research problems. As much as we’re enjoying applying it to our research problems, we’re excited to see where others take it. Creating your own add-ons is documented in the add-ons repository, and we’d love to add some third-party add-ons to the official add-ons organization.

Conclusion

We hope this in-depth review has been insightful, and helps you step into robotics to work on the problems that excite us roboticists.

We welcome entries for the RVSU challenge, whether your interest in semantic scene understanding is casual or formal, academic or industrial, competitive or iterative. We think you’ll find competing in the challenge with the BenchBot system an enriching experience. You can register for the challenge, and submit entries through the EvalAI challenge page.

If you’re looking for where to go next with BenchBot and Omniverse, the add-ons repository and the RVSU challenge are good places to start.

At QCR, we’re excited to see where robotics is heading. With tools like BenchBot and the new Omniverse-powered NVIDIA Isaac Sim, there’s never been a better time to jump in and start playing with robotics.


TypeError: list indices must be integers or slices, not str

I am trying to do this tutorial (https://analyticsindiamag.com/a-hands-on-guide-to-automatic-music-generation-using-rnn/). I have fixed every error up until the actual creation of the first tensor. I run this code:

filenames = glob.glob(str(pathlib.Path('/content/maestro-v2.0.0/')/'**/*.mid*'))
all_notes = []
for f in filenames[:5]:
    notes = midi_to_df(f)
    all_notes.append(notes)
notes_df = pd.concat(all_notes)
key_order = ['pitch', 'step', 'duration']
train_notes = np.stack([all_notes[key] for key in key_order], axis=1)
notes_ds = tf.data.Dataset.from_tensor_slices(train_notes)
notes_ds.element_spec

Then get this error:

Traceback (most recent call last)
<ipython-input-70-eeb27cd23560> in <module>()
     13 notes_df = pd.concat(all_notes)
     14 key_order = ['pitch', 'step', 'duration']
---> 15 train_notes = np.array([all_notes[key] for key in key_order], axis=1)
     16 notes_ds = tf.data.Dataset.from_tensor_slices(train_notes)
     17 notes_ds.element_spec

<ipython-input-70-eeb27cd23560> in <listcomp>(.0)
     13 notes_df = pd.concat(all_notes)
     14 key_order = ['pitch', 'step', 'duration']
---> 15 train_notes = np.array([all_notes[key] for key in key_order], axis=1)
     16 notes_ds = tf.data.Dataset.from_tensor_slices(train_notes)
     17 notes_ds.element_spec

TypeError: list indices must be integers or slices, not str

Any help would be appreciated. I’ve spent the last two weeks going through every tutorial I can find, and it feels like not a single one works without an error every five lines that takes me two hours to research how to fix. Some are easy, but all of them lead to something I can’t figure out, and I give up. I am determined to get one of these to work. And this one seems so simple, but of course I’ve reached a point I can’t figure out. Please help me out here. Thank you.

submitted by /u/GoeticGlossolalia


Advancing Autonomous Valet Functionality with Parking Sign Assist

Autonomous parking involves complex perception algorithms. We present an AI-based parking sign assist system that relies on live perception and can be fused with map-based systems.

Here’s the latest video in the NVIDIA DRIVE Labs series. These videos take an engineering-focused look at individual autonomous vehicle challenges and how the NVIDIA DRIVE team is mastering them. Catch up on more NVIDIA DRIVE posts.

Autonomous parking involves an array of complex perception and decision-making algorithms and traditionally relies on high-definition (HD) maps to retrieve parking information.

However, map coverage and poor or outdated localization information can limit such systems. Adding to this complexity, the system must understand and interpret parking rules that vary from region to region.

In this DRIVE Labs post, we show how AI-based live perception can help scale autonomous parking to regions across the globe.

Video 1. Advancing Autonomous Valet Functionality with Parking Sign Assist

Autonomous parking system overview

Understanding and interpreting parking rules can be more nuanced than it appears.

Parking rules can override one another within their effective region. For example, “No Stopping” can override “No Parking.”

In addition, signs that are not parking-related can imply parking rules. For example, in Germany, parking is not allowed within 15 meters of any bus stop sign. In the U.S., parking is illegal within 30 feet before a stop sign.

Finally, besides explicit clues like a physical sign, there are implicit signs that carry parking information. For example, in many areas, an intersection indicates the end of the previous active parking rule.

An advanced algorithm-based parking sign assist (PSA) system is critical for autonomous vehicles to understand the complexity of parking rules and react accordingly.

Traditional PSA systems rely on input from HD maps alone. However, the NVIDIA DRIVE AV software stack leverages state-of-the-art deep neural networks (DNNs) and computer vision algorithms to improve the coverage and robustness of autonomous parking in real-world scenarios. These techniques can detect, track, and classify a wide variety of parking traffic signs and road intersections in real time.

  • The WaitNet DNN detects traffic signs and intersections.
  • The wait perception stack tracks individual signs and intersections to provide 3D positions through triangulation.
  • The SignNet DNN identifies traffic sign types.

The results from the modules are then fed into the PSA system, which uses the data to determine whether the car is in a parking strip, what the restrictions are, and whether the car is allowed to stop or park within the region.

Parking sign assist overview

After the PSA system receives the detected parking signs and road intersections, it abstracts each object into a Start Parking Sign or an End Parking Sign. This level of abstraction allows the system to scale worldwide.

A Start Parking Sign marks a potential start of a new parking strip and an End Parking Sign may close one or more existing parking strips. Figures 1 and 2 show how parking strips are formed.

Figure 1. Forming parking strips

Figure 1 shows how signs and road intersections are abstracted to form parking strips. The diagram shows that a single sign can generate multiple virtual signs. For example, the sign in the middle serves as the “end” sign for the leftmost sign and as the “start” sign for the rightmost sign.

Figure 2. A single bus stop sign defines a complete no-parking area, consisting of one virtual start sign and one virtual end sign

In addition to forming a parking strip, the PSA system uses the semantic meaning of signs to classify a parking strip into no-parking, no-stopping, parking-allowed, or unknown states. This information can then be provided to the driver or to any autonomous parking system.
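
To make the abstraction concrete, here is a purely illustrative Python sketch of how a parking strip and its state could be represented. This is not NVIDIA’s implementation; it is just one way to picture the Start/End sign bookkeeping described above.

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class StripState(Enum):
    PARKING_ALLOWED = "parking-allowed"
    NO_PARKING = "no-parking"
    NO_STOPPING = "no-stopping"
    UNKNOWN = "unknown"

@dataclass
class ParkingStrip:
    start_position: float                 # along-track position of the Start Parking Sign
    end_position: Optional[float] = None  # set when a matching End Parking Sign is seen
    state: StripState = StripState.UNKNOWN

    @property
    def is_open(self) -> bool:
        # A strip stays open until an End Parking Sign (or an implicit one,
        # such as an intersection) closes it
        return self.end_position is None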

Figure 3. The high-level workings of the PSA system

Figure 3 shows the main function workflow of the PSA system. In Frame A, the “Parking Area Start” sign is detected and a new parking strip is created. After the car drives a while, a “Parking Area End” sign is detected, which matches the start sign of that parking strip.

Finally, the PSA system stores all active parking strips in its memory and signals the driver the current parking status based on traffic rules implied by the parking strip in effect.

Conclusion

The PSA system achieves complex decision-making with remarkable accuracy, running in just a few milliseconds on NVIDIA DRIVE AGX. It is also compatible with any perception-only autonomous vehicle stack that uses live camera sensor input.

Our current SignNet DNN supports more than 20 parking signs in Europe, including bus stop signs, no parking signs, and no stopping signs, with coverage continuing to expand. We are also adding optical character recognition (OCR) and natural language processing (NLP) modules into the system to handle complex information carried by written texts on the signs.

To learn more about the software functionality that we are building, see the rest of the NVIDIA DRIVE Lab video series.


NVIDIA Announces Financial Results for First Quarter Fiscal 2023

NVIDIA today reported record revenue for the first quarter ended May 1, 2022, of $8.29 billion, up 46% from a year ago and up 8% from the previous quarter, with record revenue in Data Center and Gaming…


Deep Learning with Label Differential Privacy

Over the last several years, there has been an increased focus on developing differential privacy (DP) machine learning (ML) algorithms. DP has been the basis of several practical deployments in industry — and has even been employed by the U.S. Census — because it enables the understanding of system and algorithm privacy guarantees. The underlying assumption of DP is that changing a single user’s contribution to an algorithm should not significantly change its output distribution.

In the standard supervised learning setting, a model is trained to make a prediction of the label for each input given a training set of example pairs {[input1,label1], …, [inputn, labeln]}. In the case of deep learning, previous work introduced a DP training framework, DP-SGD, that was integrated into TensorFlow and PyTorch. DP-SGD protects the privacy of each example pair [input, label] by adding noise to the stochastic gradient descent (SGD) training algorithm. Yet despite extensive efforts, in most cases, the accuracy of models trained with DP-SGD remains significantly lower than that of non-private models.

DP algorithms include a privacy budget, ε, which quantifies the worst-case privacy loss for each user. Specifically, ε reflects how much the probability of any particular output of a DP algorithm can change if one replaces any example of the training set with an arbitrarily different one. So, a smaller ε corresponds to better privacy, as the algorithm is more indifferent to changes of a single example. However, since smaller ε tends to hurt model utility more, it is not uncommon to consider ε up to 8 in deep learning applications. Notably, for the widely used multiclass image classification dataset, CIFAR-10, the highest reported accuracy (without pre-training) for DP models with ε = 3 is 69.3%, a result that relies on handcrafted visual features. In contrast, non-private scenarios (ε = ∞) with learned features have shown to achieve >95% accuracy while using modern neural network architectures. This performance gap remains a roadblock for many real-world applications to adopt DP. Moreover, despite recent advances, DP-SGD often comes with increased computation and memory overhead due to slower convergence and the need to compute the norm of the per-example gradient.

In “Deep Learning with Label Differential Privacy”, presented at NeurIPS 2021, we consider a more relaxed, but important, special case called label differential privacy (LabelDP), where we assume the inputs (input1, …, inputn) are public, and only the privacy of the training labels (label1, …, labeln) needs to be protected. With this relaxed guarantee, we can design novel algorithms that utilize a prior understanding of the labels to improve the model utility. We demonstrate that LabelDP achieves 20% higher accuracy than DP-SGD on the CIFAR-10 dataset. Our results across multiple tasks confirm that LabelDP could significantly narrow the performance gap between private models and their non-private counterparts, mitigating the challenges in real world applications. We also present a multi-stage algorithm for training deep neural networks with LabelDP. Finally, we are excited to release the code for this multi-stage training algorithm.

LabelDP
The notion of LabelDP has been studied in the Probably Approximately Correct (PAC) learning setting, and captures several practical scenarios. Examples include: (i) computational advertising, where impressions are known to the advertiser and thus considered non-sensitive, but conversions reveal user interest and are thus private; (ii) recommendation systems, where the choices are known to a streaming service provider, but the user ratings are considered sensitive; and (iii) user surveys and analytics, where demographic information (e.g., age, gender) is non-sensitive, but income is sensitive.

We make several key observations in this scenario. (i) When only the labels need to be protected, much simpler algorithms can be applied for data preprocessing to achieve LabelDP without any modifications to the existing deep learning training pipeline. For example, the classic Randomized Response (RR) algorithm, designed to eliminate evasive answer biases in survey aggregation, achieves LabelDP by simply flipping the label to a random one with a probability that depends on ε. (ii) Conditioned on the (public) input, we can compute a prior probability distribution, which provides a prior belief of the likelihood of the class labels for the given input. With a novel variant of RR, RR-with-prior, we can incorporate prior information to reduce the label noise while maintaining the same privacy guarantee as classical RR.

The figure below illustrates how RR-with-prior works. Assume a model is built to classify an input image into 10 categories. Consider a training example with the label “airplane”. To guarantee LabelDP, classical RR returns a random label sampled according to a given distribution (see the top-right panel of the figure below). The smaller the targeted privacy budget ε is, the larger the probability of sampling an incorrect label has to be. Now assume we have a prior probability showing that the given input is “likely an object that flies” (lower left panel). With the prior, RR-with-prior will discard all labels with small prior and only sample from the remaining labels. By dropping these unlikely labels, the probability of returning the correct label is significantly increased, while maintaining the same privacy budget ε (lower right panel).

Randomized response: If no prior information is given (top-left), all classes are sampled with equal probability. The probability of sampling the true class (P[airplane] ≈ 0.5) is higher if the privacy budget is higher (top-right). RR-with-prior: Assuming a prior distribution (bottom-left), unlikely classes are “suppressed” from the sampling distribution (bottom-right). So the probability of sampling the true class (P[airplane] ≈ 0.9) is increased under the same privacy budget.
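
As a minimal NumPy sketch of this idea, assuming the top-k pruning variant described above (this is an illustration only, not the released implementation, and it omits the formal privacy accounting):

import numpy as np

def rr_with_prior(true_label, prior, epsilon, top_k):
    # Keep only the top_k classes under the prior ("suppress" unlikely classes)
    candidates = np.argsort(prior)[-top_k:]
    if true_label not in candidates:
        # True label was pruned: fall back to a uniform choice among candidates
        return np.random.choice(candidates)
    # Classical randomized response restricted to the candidate set
    p_true = np.exp(epsilon) / (np.exp(epsilon) + top_k - 1)
    if np.random.rand() < p_true:
        return true_label
    return np.random.choice(candidates[candidates != true_label])

# Toy usage: a prior that says the input is "likely an object that flies"
prior = np.array([0.45, 0.02, 0.40, 0.02, 0.02, 0.02, 0.02, 0.02, 0.02, 0.01])
noisy_label = rr_with_prior(true_label=0, prior=prior, epsilon=3.0, top_k=2)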

A Multi-stage Training Algorithm
Based on the RR-with-prior observations, we present a multi-stage algorithm for training deep neural networks with LabelDP. First, the training set is randomly partitioned into multiple disjoint subsets. An initial model is trained on the first subset using classical RR. Then, at each later stage, one additional subset is used to train the model; its labels are produced using RR-with-prior, with the priors based on the predictions of the model trained so far.

An illustration of the multi-stage training algorithm. The training set is partitioned into t disjoint subsets. An initial model is trained on the first subset using classical RR. Then the trained model is used to provide prior predictions in the RR-with-prior step and in the training of the later stages.

Results
We benchmark the multi-stage training algorithm’s empirical performance on multiple datasets, domains, and architectures. On the CIFAR-10 multi-class classification task for the same privacy budget ε, the multi-stage training algorithm (blue in the figure below) guaranteeing LabelDP achieves 20% higher accuracy than DP-SGD. We emphasize that LabelDP protects only the labels while DP-SGD protects both the inputs and labels, so this is not a strictly fair comparison. Nonetheless, this result demonstrates that for specific application scenarios where only the labels need to be protected, LabelDP could lead to significant improvements in the model utility while narrowing the performance gap between private models and public baselines.

Comparison of the model utility (test accuracy) of different algorithms under different privacy budgets.

In some domains, prior knowledge is naturally available or can be built using publicly available data only. For example, many machine learning systems have historical models which could be evaluated on new data to provide label priors. In domains where unsupervised or self-supervised learning algorithms work well, priors could also be built from models pre-trained on unlabeled (therefore public with respect to LabelDP) data. Specifically, we demonstrate two self-supervised learning algorithms in our CIFAR-10 evaluation (orange and green traces in the figure above). We use self-supervised learning models to compute representations for the training examples and run k-means clustering on the representations. Then, we spend a small amount of privacy budget (ε ≤ 0.05) to query a histogram of the label distribution of each cluster and use that as the label prior for the points in each cluster. This prior significantly boosts the model utility in the low privacy budget regime (ε < 1).

Similar observations hold across multiple datasets such as MNIST, Fashion-MNIST and non-vision domains, such as the MovieLens-1M movie rating task. Please see our paper for the full report on the empirical results.

The empirical results suggest that protecting the privacy of the labels can be significantly easier than protecting the privacy of both the inputs and labels. This can also be mathematically proven under specific settings. In particular, we can show that for convex stochastic optimization, the sample complexity of algorithms privatizing the labels is much smaller than that of algorithms privatizing both labels and inputs. In other words, to achieve the same level of model utility under the same privacy budget, LabelDP requires fewer training examples.

Conclusion
We demonstrated that both empirical and theoretical results suggest that LabelDP is a promising relaxation of the full DP guarantee. In applications where the privacy of the inputs does not need to be protected, LabelDP could reduce the performance gap between a private model and the non-private baseline. For future work, we plan to design better LabelDP algorithms for other tasks beyond multi-class classification. We hope that the release of the multi-stage training algorithm code provides researchers with a useful resource for DP research.

Acknowledgements
This work was carried out in collaboration with Badih Ghazi, Noah Golowich, and Ravi Kumar. We also thank Sami Torbey for valuable feedback on our work.


Training a State-of-the-Art ImageNet-1K Visual Transformer Model using NVIDIA DGX SuperPOD

This post shows how VOLO, a SOTA vision transformer model, is trained on the NVIDIA DGX SuperPOD, using the largest VOLO-D5 variant as the example.

Recent work has demonstrated that large transformer models can achieve or advance the SOTA in computer vision tasks such as semantic segmentation and object detection. However, unlike convolutional models, which can reach this level with only standard public datasets, these transformer models typically require proprietary datasets that are orders of magnitude larger.

VOLO model architecture

The recent VOLO (Vision Outlooker) project from SEA AI Lab, Singapore, showed an efficient and scalable vision transformer model architecture that greatly closes this gap using only the ImageNet-1K dataset.

VOLO introduces a novel outlook attention and presents a simple and general architecture, termed Vision Outlooker. Unlike self-attention, which focuses on global dependency modeling at a coarse level, the outlook attention efficiently encodes finer-level features and contexts into tokens. This is shown to be critically beneficial to recognition performance but largely ignored by self-attention.

Experiments show that the VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark, without using any extra training data.

Figure 1. Top-1 accuracy of VOLO models at different model sizes

In addition, the pretrained VOLO transfers well to downstream tasks, such as semantic segmentation.

Settings             LV-ViT              CaiT                NFNet-F6      NFNet-F5      VOLO-D5
Test Resolution      448×448             448×448             576×576       544×544       448×448/512×512
Model Size           140M                356M                438M          377M          296M
Computations         157B                330B                377B          290B          304B/412B
Architecture         Vision Transformer  Vision Transformer  Convolutions  Convolutions  VOLO
Extra Augmentations  Token Labeling      Knowledge Distill   SAM           SAM+augmult   Token Labeling
ImageNet Top-1 Acc.  86.4                86.5                86.5          86.8          87.0/87.1
Table 1. Overview of the compared ViT and CNN baseline models

Though VOLO models demonstrate outstanding computational efficiency, training them to SOTA performance is not trivial.

In this post, we present the techniques and experience that we gained training the VOLO models on the NVIDIA DGX SuperPOD based on the NVIDIA ML software stack and Infiniband clustering technologies.

Training methods

Training VOLO models requires considering training strategy, infrastructure, and configuration planning.  In this section, we discuss some of the techniques applied in this solution.

Training strategy

In theory, training every candidate model on original-quality ImageNet samples all the way through and performing a fine-grained neural network (NN) architecture search would make for the most thorough investigation. However, this requires a large share of the computing resource budget.

In the scope of this project, we adopted a coarse-grained training approach that does not visit as many NN architecture possibilities as the fine-grained approach, but shows EIOFS with less time and a lower resource budget. In this alternative strategy, we first trained the potential neural network candidates using lower-resolution image samples and then performed fine-tuning using high-resolution images.

This approach has proved efficient in earlier work, cutting down computational cost with only a marginal loss in model performance.

Infrastructure

In practice, we used two types of clusters for this training:

  • One for base model pretraining, which is an NVIDIA DGX A100 based DGX POD that consists of 5x NVIDIA DGX A100 systems clustered using the NVIDIA Mellanox HDR Infiniband network.
  • One for fine-tuning, which is an NVIDIA DGX SuperPOD that consists of DGX A100 systems with the NVIDIA Mellanox HDR Infiniband network. 
Figure 2. NVIDIA technology-based software stack used in this project

Software infrastructure also played an important role in this procedure. Figure 2 shows that, in addition to the underlying standard deep learning optimization CUDA libraries such as cuDNN and cuBLAS, we leveraged NCCL, enroot, PyXis, APEX, and DALI extensively to achieve sub-linear scalability of the training performance.

The DGX A100 POD cluster is mainly used for base model pretraining with lower-resolution image samples. This is because base model pretraining is less memory-bound and can take advantage of the compute power of the NVIDIA A100 GPU.

In comparison, the fine-tuning was performed on an NVIDIA DGX-2-based NVIDIA DGX SuperPOD because the fine-tuning process uses bigger images, which requires more memory per unit of compute.

Training configurations

Table 2 summarizes the model and optimizer settings we used for training the VOLO-D1 through VOLO-D5 models.

                       D1      D2    D3    D4    D5
MLP Ratio              3       3     3     3     4
Optimizer              AdamW (all models)
LR Scaling             LR = LRbase x Batch_Size/1024, where LRbase = 8.0e-4
Weight Decay           5e-2 (all models)
LRbase                 1.6e-2  1e-3  1e-3  1e-3  1e-4
Stochastic Depth Rate  0.1     0.2   0.5   0.5   0.75
Crop Ratio             0.96    0.96  0.96  1.15  1.15

Table 2. Model settings (for all models, the batch size is set to 1024)

We evaluated our proposed VOLO models on the ImageNet dataset. During training, no extra training data was used. Our code was based on PyTorch, the Token Labeling toolbox, and PyTorch Image Models (timm). We used the LV-ViT-S model with Token Labeling as our baseline.

Setup notes

  • We used the AdamW optimizer with a linear learning rate (LR) scaling strategy, LR = LRbase x Batch_Size/1024, and a weight decay rate of 5×10−2, as suggested by previous work; the LRbase values for all VOLO models are given in Table 2. (A small worked example of the scaling rule follows this list.)
  • Stochastic Depth is used.
  • We trained our models on the ImageNet dataset for 300 epochs.
  • For data augmentation methods, we used CutOut, RandAug, and the Token Labeling objective with MixToken.
  • We did not use MixUp or CutMix as they conflict with MixToken.
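
As a quick illustration of the linear LR scaling rule, using the VOLO-D5 LRbase from Table 2 and the global batch size of 1024:

# Linear LR scaling: LR = LR_base * batch_size / 1024
lr_base = 1.0e-4        # LR_base for VOLO-D5 (Table 2)
batch_size = 1024       # global batch size used for all models
lr = lr_base * batch_size / 1024
print(lr)               # 0.0001 -- at a 1024 batch, LR equals LR_base;
                        # doubling the batch to 2048 would double the LR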

Pretraining

In this section, we use VOLO-D5 as an example to demonstrate how the model is trained.

Figure 3 shows that the training throughput for VOLO-D5 on a single DGX A100 is about 500 images/sec. We estimate that one full pretraining cycle of 300 epochs on ImageNet-1K, roughly 1 million images, takes about 170 hours, or about one week. The quick check below illustrates the arithmetic.
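
This is just the post's own numbers multiplied out:

# Back-of-the-envelope estimate of the VOLO-D5 pretraining time
images_per_epoch = 1_000_000   # ImageNet-1K, roughly, as cited above
epochs = 300
throughput = 500               # images/sec on a single DGX A100 (Figure 3)

hours = images_per_epoch * epochs / throughput / 3600
print(round(hours))            # ~167 hours, i.e. about one week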

To speed this up, we built a simple parameter-server cluster of five DGX A100 nodes and achieved roughly 2,100 images/sec, which cuts the pretraining time down to ~52 hours.

Figure 3. Training throughput of the D1~D5 models on a single DGX A100 across one full epoch

The VOLO-D5 model pretraining can be started on a single node using the following code example:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet \
  --model volo_d5 --img-size 224 \
  -b 44 --lr 1.0e-4 --drop-path 0.75 --apex-amp \
  --token-label --token-label-size 14 --token-label-data /path/to/token_label_data

For the multinode, multi-GPU (MNMG) training case, the training cluster details must be provided as part of the command-line input. First, we set the CPU, memory, and IB bindings according to the node and cluster architecture. The cluster for the pretraining phase was a DGX A100 POD, which has four NUMA domains per CPU socket and one IB port per A100 GPU, so we bind each rank to all CPU cores in the NUMA node nearest its GPU.

  • For memory binding, we bind each rank to the nearest NUMA node.
  • For IB binding, we bind one IB card per GPU, or as close to that setup as possible.

Because VOLO model training is PyTorch-based and simply leverages the default PyTorch distributed training approach, our MNMG training is based on a simple parameter-server architecture that fits the fat-tree network topology of the NVIDIA DGX SuperPOD.

To simplify the scheduling, the first node in the list of allocated nodes is always used as both parameter server and worker node, and all other nodes are worker nodes. To avoid the potential storage I/O overhead, the dataset, all code, intermediate/milestone checkpoints, and results are kept on a single high-performance DDN-based distributed storage backend. They are mounted to all the worker nodes through a 100G NVIDIA Mellanox EDR Infiniband network.

To accelerate data preprocessing and pipeline the data loading, NVIDIA DALI is configured with one dedicated data loader per GPU process.
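
For reference, a minimal per-GPU DALI pipeline could look something like the following sketch. This is not the exact pipeline used in this work: the paths, image size, and normalization constants are placeholders, and device_id should be set to each process's local GPU rank.

from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIGenericIterator

@pipeline_def(batch_size=64, num_threads=4, device_id=0)  # device_id = local rank
def imagenet_pipeline(data_dir):
    jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True,
                                    name="Reader")
    images = fn.decoders.image(jpegs, device="mixed")       # JPEG decode on the GPU
    images = fn.resize(images, resize_x=224, resize_y=224)
    images = fn.crop_mirror_normalize(images, dtype=types.FLOAT,
                                      output_layout="CHW",
                                      mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
                                      std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images, labels

pipe = imagenet_pipeline(data_dir="/path/to/imagenet/train")
pipe.build()
loader = DALIGenericIterator(pipe, ["data", "label"], reader_name="Reader")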

Figure 4. Pretraining phase training throughput speed up against the number of A100 and V100 GPUs

Fine-tuning

Running VOLO-D5 model fine-tuning on a single node is quite straightforward using the following code example:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 /path/to/imagenet \
  --model volo_d5 --img-size 512 \
  -b 4 --lr 2.3e-5 --drop-path 0.5 --apex-amp --epochs 30 \
  --weight-decay 1.0e-8 --warmup-epochs 5 --ground-truth \
  --token-label --token-label-size 24 --token-label-data /path/to/token_label_data \
  --finetune /path/to/pretrained_224_volo_d5/

As mentioned earlier, because the image size for fine-tuning is much larger than the one used in the pretraining phase, the batch size must be cut down accordingly to fit the workload into GPU memory. This makes scaling the training out to a larger number of GPUs in parallel mandatory.

Figure 5. Fine-tuning phase training throughput speed up against the number of A100 and V100 GPUs

Most of the fine-tuning configurations are similar to the pretraining phase.

Conclusion

In this post, we showed the main techniques and procedures for training the SOTA large-scale Visual Transformer models, such as VOLO_D5, on a large-scale AI supercomputer, such as NVIDIA DGX A100 based DGX SuperPOD. The trained VOLO_D5 model achieved the best Top-1 accuracy in the image classification model ranking without using any additional data beyond the ImageNet-1k dataset.

The code for this work, including the Docker image for running the experiments and the Slurm scheduler scripts, is open source in the sail-sg/volo GitHub repo, to allow future work to build on VOLO_D5 for more extensive study. For more information, see VOLO: Vision Outlooker for Visual Recognition.

In the future, we are looking to scale this work further towards training more intelligent, self-supervised, larger-scale models with larger public datasets and more modern infrastructure, for example, NVIDIA DGX SuperPOD with NVIDIA H100 GPUs.


An Introduction to Edge Computing: Common Questions and Resources for Success

During a recent webinar, participants outlined common edge computing questions and challenges. This post provides NVIDIA resources to help beginners on their journey.

With the convergence of IoT and AI, organizations are evaluating new ways of computing to keep up with larger data loads and more complicated use cases. For many, edge computing provides the right environment to successfully operate AI applications that ingest data from distributed IoT devices.

But many organizations are still grappling with understanding edge computing. Partners and customers often ask about edge computing, reasons for its popularity in the AI space, and use cases compared to cloud computing. 

NVIDIA recently hosted an Edge Computing 101: An Introduction to the Edge webinar. The event provided an introduction to edge computing, outlined different types of edge, the benefits of edge computing, when to use it and why, and more. 

During the webinar, we surveyed the audience to understand their biggest questions about edge computing and how we could help. 

Below we provide answers to those questions, along with resources that could help you along your edge computing journey.  

What stage are you in, on your edge computing journey? 

About 51% of the audience answered that they are in the “learning” phase of their journey. At face value, this is not surprising, given that the webinar was an introductory session: most attendees are at the learning phase, as opposed to implementing or scaling. This was also corroborated by the fact that many of the tools in the edge market are still new, meaning many vendors also have much to gain from learning more.

To help in the learning journey, refer to Considerations for Deploying AI at the Edge. This overview covers the major decision points for choosing the right components for an edge solution, security tips for edge deployments, and how to evaluate where edge computing fits into your existing environment.

What is the top benefit you hope to gain by deploying applications at the edge? 

There are many benefits to deploying AI applications in edge computing environments, including real-time insights, reduced bandwidth, data privacy, and improved efficiency. For the participants in the session, 42% responded that latency (or real-time insights) was the top benefit they were hoping to gain from deploying applications at the edge.

Figure 1. Benefits of edge AI include reduced latency and bandwidth requirements, improved data sovereignty, and increased automation

Improving latency is a major benefit of edge computing since the processing power for an application sits physically closer to where data is collected. For many use cases, the low latency provided by edge computing is essential for success. 

For example, an autonomous forklift operating in a manufacturing environment has to be able to react instantaneously to its dynamic environment. It needs to be able to turn around tight corners, lift and deliver heavy loads, and stop in time to avoid colliding with moving workers in the facility. If the forklift is unable to make decisions with ultra-low latency, there is no guarantee it will operate effectively. For safety reasons, organizations must know that AI applications powering that autonomous forklift are able to return insights fast enough to keep the environment safe. 

Learn more about latency and the other benefits of edge AI.

What is your biggest challenge designing an edge computing solution?

There are challenges associated with implementing any new technology. This audience gave an even spread of answers across the choices given, which is not surprising given the early nature of the edge computing market. Many organizations are still investigating how edge computing will work for them, and they are experiencing a variety of different challenges. 

The following sections cover six common challenges for this audience, along with resources that can help.

1. Unsure what components are needed

The three major components needed for any edge deployment are an application, infrastructure (including tools to manage applications remotely), and security protocols. 

Edge Computing 201: How to Build an Edge Solution will dive deep into each of these topics. The webinar will also provide specifics for what is needed to build an edge deployment, repurpose existing technology to be optimized for an edge deployment, and best practices for getting started.  

2. Implementation challenges

Many organizations are starting to implement edge AI, so it is important to understand the process and challenges involved. There are five main steps to implementing any edge AI solution:

  1. Identify a use case or challenge to be solved.
  2. Determine what data and application requirements exist.
  3. Evaluate existing edge infrastructure and what pieces must be added.
  4. Test solution and then roll out at scale.
  5. Share success with other groups to promote additional use cases.

Understanding these five steps is key to overcoming challenges that arise during implementation. 

Steps to Get Started With Edge AI dives into each of these steps, outlining best practices and pitfalls to avoid along the way. 

Figure 2. The five steps to get started with an edge AI project

3. Tuning an application for edge use cases

The most important aspects of an edge application are flexibility and performance. Organizations need to be able to deploy an application to edge sites that have specific requirements and sometimes different tools than other sites. They need an application that can handle volatile situations. Additionally, ensuring an application can provide the performance needed for ultra-low latency situations is critical to success. 

Cloud-native technology fulfills both of those requirements, and has many other added benefits. 

4. Scaling a solution across multiple sites

Seamlessly scaling one deployment to multiple (sometimes thousands of) deployments can be easy with the right technology. Tools to manage application deployments across distributed edge sites are critical for any organization looking to scale edge AI across their entire organization. Some examples of such tools are Red Hat OpenShift, VMware Tanzu, and NVIDIA Fleet Command.

Fleet Command is turnkey, secure, and can scale to thousands of devices. Check out the demo to learn more. 

5. Security of edge environments

Edge computing environments are very different from cloud computing environments, and have different security considerations. For instance, physical security of data and hardware is a consideration for edge sites that is not generally a consideration when deploying in the cloud. It is essential to find the right protocols to provide multilayer security for edge deployments that protects the entire workflow from cloud to edge. 

Check out Edge Computing: Considerations for Security Architects to learn more about how to secure edge environments. 

6. Justify the cost of an edge solution 

Justifying the cost of any technology boils down to understanding all of the cost factors and the value of the solution. For an edge computing solution, there are three main cost factors: infrastructure costs, application costs, and management costs. The value of edge computing varies by use case and depends heavily on the ROI of the AI application deployed.

Learn more about the costs associated with an edge deployment with Building an Edge Strategy: Cost Factors.

What is the next step in your edge computing journey? 

After the session, 49% responded that “learning more about edge AI use cases” was the next step in their edge computing journey. Many leading edge computing applications use computer vision to perceive objects in an environment, from pedestrians in a crosswalk to objects on a shelf at a retail store. Organizations rely on edge computing for computer vision because of the ultra-fast performance that edge computing delivers. This ensures objects are detected instantaneously. 

The NVIDIA AI for Smart Spaces Ebook covers several major vision AI use cases, all of which could be used in edge computing deployments. 

If you’re ready to get started working with edge computing solutions, check out NVIDIA LaunchPad. With LaunchPad, organizations can get immediate, short-term access to the necessary hardware and software stacks for an entire end-to-end flow deploying, managing, and validating an application at the edge. Hands-on labs walk users through the same workflow on the same technology that can be deployed in production, ensuring more confident software and infrastructure decisions can be made. With this free trial, organizations can see for themselves the types of use cases and applications that will work best in their environment to meet their goals. 

The edge computing industry is exciting and new. There are many emerging technologies that have a clear path to changing the way that organizations deploy and operate AI throughout their entire business. As organizations continue to adopt AI, infrastructure choices will continue to be paramount to innovative use cases. 

You can deep dive into how to assemble the components of an edge computing solution, including application, infrastructure, and security protocols, in the Edge Computing 201 webinar: How to Build an Edge Solution.


how to get coco evaluation metrics on a different dataset than train and test after doing inference on it? With tensorflow object detection api

I’ve trained a model with the TensorFlow Object Detection API on a dataset split into train and test. After that, I retrieved the COCO evaluation metrics. But now I’d like to do inference on a different dataset, a validation dataset, and after that I want to get the COCO metrics, but I really don’t know how to do it; there’s not much information about this. If you could help me out, I’d appreciate it.

Thank you

submitted by /u/Emergency_Egg_9497