
Latest Releases and Resources: March 3-9

Register for the Game Developers Conference; join DRIVE Developer Days; get DLI training at GTC; learn how Metropolis can grow your vision AI business; meet the Shell.AI Hackathon winners.

Our weekly roundup covers the most recent software updates, learning resources, events, and notable news. 



Events

NVIDIA at GDC: Advancing Innovations for Virtual Worlds in Game Development

At the Game Developers Conference, attendees will experience how the latest NVIDIA-powered solutions are enabling developers to create more realistic, immersive virtual worlds for players.

Register online: NVIDIA at GDC


Learning resources

Accelerate Autonomous Vehicle Development with DRIVE Developer Days at GTC

NVIDIA DRIVE Developer Days are March 22-23 and feature deep-dive sessions on safe and robust autonomous vehicle development. This special event showcases the latest innovations in autonomous driving and software-defined vehicle architectures. Special sessions led by the NVIDIA engineering team highlight the newest DRIVE solutions. Attendees will learn how to apply these technologies to their own autonomous vehicle development and have the opportunity to chat with engineers.

This virtual content is available to all GTC attendees and will be available on demand after the event.

Register online: DRIVE Developer Days

Hands-On DLI Training Labs Available at GTC 

Choose from 24 training labs taught by technical experts covering HPC, networking, deep learning, data science, conversational AI, computer vision, and more.

Register online: DLI Training Labs at GTC

Learn How Metropolis Can Boost Your Go-to-Market Efforts

Tune in to this meetup replay to find out how the Metropolis program can grow your vision AI business and enhance your go-to-market efforts.

Learn how:

  • Metropolis Validation Labs optimize your applications and accelerate deployments.
  • NVIDIA Fleet Command simplifies provisioning and management of edge deployments, accelerating the time to scale from POC to production.
  • NVIDIA LaunchPad provides easy access to GPU instances for faster POCs and customer trials.

Get started: Learn How Metropolis Can Boost Your Go-to-Market Efforts


News

Meet the Winners of the Shell.AI Hackathon for Renewable and Sustainable Energy

Tackling climate change is an urgent challenge worldwide, and developing and delivering renewable energy sources is critical. To explore new opportunities for low-carbon energy, Shell collaborated with NVIDIA, OpenACC, and Microsoft, with support from SINE and NIRMAAN, to launch the AI Solar Power Prediction Challenge.

In this multi-stage hackathon, participants used historical sky-camera images and weather data to predict the cloud coverage cast over solar farms (Stage 1) and to predict Global Horizontal Irradiance for any two-hour interval of the day (Stage 2). More than 2,000 participants across 50+ countries rose to the challenge, and over 6,200 submissions were received. The selected winners provided inspired solutions that were accurate, innovative, and scalable, putting their passion to purpose to advance renewable and sustainable energy.

Meet the winners: Shell.AI Hackathon for Renewable and Sustainable Energy


GFN Thursday Marches Forward With 21 Games Coming to GeForce NOW This Month

A new month means a whole new set of games coming to GeForce NOW. Members can look forward to 21 titles joining the GeForce NOW library in March, including day-and-date releases like Shadow Warrior 3 with support for NVIDIA DLSS.

Read more on the NVIDIA Blog: GFN Thursday Marches Forward With 21 Games Coming to GeForce NOW This Month


Am I doing this wrong? I need to know so I don't waste an hour or more



What could cause a KeyError: ‘[…] not in index’ while using tensorflow.compat.v1 to train and build a standard DNN model?

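The post includes no code, but a hedged guess: the "not in index" message usually comes from pandas rather than TensorFlow itself, typically in the input pipeline feeding tf.compat.v1, when the code selects DataFrame columns that don't exist. A minimal sketch reproducing it (the column names here are hypothetical):

import pandas as pd

# Feature frame containing only one of the two columns the training code wants
df = pd.DataFrame({"feature_a": [1.0, 2.0], "label": [0, 1]})

# Raises KeyError: "['feature_b'] not in index" -- the same message shape as in
# the question -- because 'feature_b' is not a column of df
X = df[["feature_a", "feature_b"]]

Checking the requested feature names against df.columns before slicing is a quick way to confirm or rule this out.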


Tensorflow Lite on RPI

I am trying to run a TensorFlow Lite model on my RPi (3B+), following this blog post that I found on this sub:

https://blog.paperspace.com/tensorflow-lite-raspberry-pi/

The article is fairly old, so I have the latest versions of Python and tflite-runtime compared to the versions mentioned in the blog.

The blog used MobileNetV1, while I used my own model built on MobileNetV2.

I copied the code for now and changed the directories to match my local setup.

The code runs without errors or warnings, but I always get the first item in my labels file, with an accuracy of 0.0%.

What could be the issue here?
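
One hedged guess, since no code is shown: a constant prediction with 0.0% confidence often means the input preprocessing no longer matches the model. Keras MobileNetV2 expects inputs scaled to [-1, 1], while MobileNetV1 example code may normalize differently. A debugging sketch along those lines (file paths and input size are placeholders):

import numpy as np
from PIL import Image
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# First check what the converted model actually expects: a float32 model takes
# normalized floats, while a fully quantized model takes raw uint8 pixels
print(inp["shape"], inp["dtype"])

img = Image.open("test.jpg").resize((224, 224))  # placeholder image
x = np.asarray(img, dtype=np.float32)[None, ...]
x = x / 127.5 - 1.0  # Keras MobileNetV2 scaling to [-1, 1]

interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
probs = interpreter.get_tensor(out["index"])[0]
print(int(np.argmax(probs)))  # compare against the lines of your labels file

It is also worth confirming that the labels file ordering matches the class indices used at training time; an off-by-one there likewise produces near-zero accuracy.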



Obtaining information from text

Hello, I am a newbie. I did some research before asking, but it left me confused about what to use for my case. A simple definition of my problem: I have an input of at most 5-6 sentences, and from these sentences I have to extract the values of certain terms. For example, the input might be:

- "… Pg = 250 kN …"

- "… dead load is 250 kN …"

- "… dead load on the system is given as 250 kN."

And this is not the only term to extract; I have several more terms, each of which can be written in more than one way. I have read about named entity recognition, rule-based matchers, bidirectional LSTMs, etc., but now I am lost and don't know which method to use. I need someone to point me in the right direction; after that, I can work on the topic and solve the problem. Any help is much appreciated.
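
Since the question already mentions rule-based matching: with a small, known set of terms that each have a few surface forms, a regex baseline is often enough before reaching for NER or a bidirectional LSTM. A minimal sketch (the term list and patterns below are illustrative, not a complete solution):

import re

# Map each target term to a pattern covering its known surface forms;
# the numeric value is captured as the first group, followed by the unit
TERM_PATTERNS = {
    "dead_load_kN": r"(?:Pg|dead load)\D{0,40}?(\d+(?:\.\d+)?)\s*kN",
}

def extract(text: str) -> dict:
    results = {}
    for term, pattern in TERM_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            results[term] = float(match.group(1))
    return results

print(extract("... dead load on the system is given as 250 kN."))
# {'dead_load_kN': 250.0}

If the phrasings become too varied for patterns like these, spaCy's rule-based Matcher is a middle ground, and a trained NER model is the step after that.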



Implementing ONNX models in Rails


Beyond Be-leaf: Immersive 3D Experience Transports Audiences to Natural Worlds With Augmented Reality

Imagine walking through the bustling streets of London's Piccadilly Circus when, suddenly, you're in a tropical rainforest, surrounded by vibrant flowers and dancing butterflies. That's what audiences will see in the virtual world of The Green Planet AR Experience, an interactive augmented reality experience that blends the physical and digital worlds to connect people with nature.

Read more on the NVIDIA Blog: Beyond Be-leaf: Immersive 3D Experience Transports Audiences to Natural Worlds With Augmented Reality


Scaling Quantum Circuit Simulation with NVIDIA cuTensorNet

We present benchmarks and usage of cuTensorNet, a cuQuantum library providing high-performance tensor network computations for quantum circuit simulation.

Quantum computing aspires to deliver more powerful computation in less time for problems that cannot currently be addressed with classical computing. NVIDIA recently announced the cuQuantum SDK, a high-performance library for accelerating the development of quantum information science. cuQuantum was recently used to break the world record for the MaxCut quantum algorithm simulation, running on the DGX SuperPOD with 8x more qubits than prior work.

The initial target application for cuQuantum is acceleration of quantum circuit simulations, and it consists of two major libraries: 

  • cuStateVec: Accelerates state vector simulations.
  • cuTensorNet: Accelerates tensor network simulations.

In this post, we provide an overview of both libraries, with a more detailed discussion of cuTensorNet.

Why use cuStateVec?

The cuStateVec library from the cuQuantum SDK provides a high-performance solution for state vector-based simulation, with optimized GPU kernels for most use cases that arise in simulators. While the state vector method is great for running deep quantum circuits, its memory footprint grows exponentially with the number of qubits (a state vector for N qubits stores 2^N amplitudes), so circuits with large numbers of qubits are impossible to simulate even on today's largest supercomputers.

Why use cuTensorNet?

As an alternative, the tensor network method represents the quantum state of N qubits as a series of tensor contractions. This enables quantum circuit simulators to handle circuits with many qubits by trading the memory required by the algorithm for computation. Depending on circuit topology and depth, even this can become prohibitively expensive; the main challenge, then, is to compute these tensor contractions efficiently.
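
To make the tensor view concrete, here is a toy sketch in plain NumPy (an illustration of the idea, not the cuTensorNet API): the state of N qubits is a rank-N tensor with one mode per qubit, and applying a gate contracts the gate tensor against the corresponding mode.

import numpy as np

n = 3
state = np.zeros((2,) * n, dtype=complex)
state[(0,) * n] = 1.0  # start in |000>

# A Hadamard gate on qubit 0 is an einsum contraction over that qubit's mode
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
state = np.einsum('ab,bcd->acd', H, state)

print(state.reshape(-1))  # amplitudes of the 8 computational basis states

Simulating a full circuit this way amounts to one large network of such contractions, and the cost depends heavily on the order in which they are performed.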

Figure 1. Schematic diagram of the software stack for quantum circuit simulation: cuTensorNet accepts a quantum circuit expressed as a tensor network and offers C and Python APIs, with optimized pathfinding and contraction on NVIDIA GPU backends

The cuTensorNet library from the cuQuantum SDK provides a high-performance solution for these types of tensor network computations. 

The cuTensorNet library offers both C and Python APIs to provide access to high-performance tensor network computations for accelerating quantum circuit simulation. The APIs are flexible, enabling you to control, explore, and investigate each of the algorithmic techniques implemented.  

cuTensorNet algorithmic description 

In this section, we discuss the different algorithms and techniques used in cuTensorNet. The library includes two main components: the pathfinder and the execution engine.

The pathfinder finds a low-cost contraction path in a short elapsed time, and the execution step computes that path on the GPU using efficient kernels. The two components are independent of each other and interoperate with any external library that provides similar functionality.

Pathfinder

At a high level, the approach taken in cuTensorNet is hyper-optimization around a graph partitioning-based pathfinder. For more information, see Hyper-optimized tensor network contraction.

The role of the pathfinder is to find a contraction path that minimizes the cost of contracting the tensor network. Many algorithmic advancements and optimizations were developed to make this step fast, and it will become even faster.

Finding an optimal contraction path depends strongly on the size of the network: the larger the network, the more techniques and computational effort are needed to find the optimal contraction path.
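
A toy example of why the path matters so much (a matrix chain is the simplest tensor network): for the shapes below, the two possible pairwise orders differ by roughly 50x in multiplication count.

import numpy as np

m, k, n = 1000, 1000, 10
A = np.random.rand(m, k)
B = np.random.rand(k, k)
C = np.random.rand(k, n)

# Path (A @ B) @ C costs m*k*k + m*k*n ~ 1.01e9 multiplications;
# path A @ (B @ C) costs k*k*n + m*k*n ~ 2.0e7, about 50x fewer
D_slow = (A @ B) @ C
D_fast = A @ (B @ C)
assert np.allclose(D_slow, D_fast)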

The cuTensorNet pathfinder consists of three algorithmic modules (Figure 2). 

Figure 2. cuTensorNet flowchart for the pathfinding and contraction execution submodules: a tensor network is first simplified, then run through a hyper-optimization loop to find the best path, which is sent to the execution module for planning and contraction

  • Simplification: A preprocessing technique that finds all sets of obvious, straightforward contractions in the tensor network, removes them, and replaces each set with its resulting tensor. The result is a smaller network that is easier to process in the following modules.
  • Path computation: The heart of the pathfinder component. It is based on a graph-partitioning step, followed by a second step that applies reconfiguration adjustments and slicing. Graph partitioning is called recursively to split the network and form a contraction path (for example, a pairwise contraction tree).
  • Hyper-optimizer: A loop over the path computation module in which each iteration forms a contraction path. For each iteration, the hyper-optimizer creates a different configuration of parameters for the path computation while keeping track of the best path found. You can change or fix any of these configuration parameters as you like; all of them can be set with cutensornetContractionOptimizerConfigSetAttribute. For more information, see the cuTensorNet documentation.

The path generated by the first step might not be close to optimal, so a reconfiguration adjustment is usually performed. Reconfiguration chooses several small subtrees within the overall contraction tree and attempts to improve their contraction cost, decreasing the overall cost where possible.

Another feature of the path computation module is the slicing technique. The primary goal of slicing is to fit the network contraction into the available device memory. Slicing accomplishes this by excluding certain tensor modes from the contraction and explicitly unrolling their extents. This generates many similar contraction trees, or slices, each corresponding to one combination of values of the excluded modes.

The contraction path, or tree, does not change: only some modes are excluded, and the computation of each slice is independent of the others. Consequently, slicing is one of the best techniques for creating independent work for different devices.
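
A toy illustration of slicing (the idea, not cuTensorNet's implementation): excluding a mode turns one large contraction into independent per-value contractions whose results are summed, so each slice needs less memory and can run on a different device.

import numpy as np

A = np.random.rand(4, 8)  # modes (i, k)
B = np.random.rand(8, 5)  # modes (k, j)

full = np.einsum('ik,kj->ij', A, B)

# Slice over mode k: each iteration is an independent piece of work
sliced = np.zeros((4, 5))
for k in range(8):
    sliced += np.einsum('i,j->ij', A[:, k], B[k, :])

assert np.allclose(full, sliced)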

Practical experience indicates that finding an optimal contraction path can be sensitive to the choice of configuration parameters of each of the techniques used here. To increase the probability of finding the best contraction path, we encapsulate this module inside a hyper-optimizer. 
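
A toy illustration of the hyper-optimizer idea, using the opt_einsum library mentioned later in this post (cuTensorNet's own loop varies graph-partitioning parameters, but the principle of trying several configurations and keeping the cheapest path is the same):

import numpy as np
import opt_einsum as oe

# Same network structure as the contract example later in this post,
# with smaller extents to keep the toy cheap
eq = 'mhkn,ukh,xuy->mxny'
shapes = [(9, 6, 6, 9), (9, 6, 6), (6, 9, 6)]
views = [np.empty(s) for s in shapes]

# Try several pathfinder configurations; keep the path with the lowest cost
best_path, best_info = None, None
for optimizer in ('greedy', 'optimal', 'dp'):
    path, info = oe.contract_path(eq, *views, optimize=optimizer)
    if best_info is None or info.opt_cost < best_info.opt_cost:
        best_path, best_info = path, info

print(best_info.opt_cost)  # FLOP estimate of the cheapest path found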

Pathfinding performance

There are two relevant metrics when considering the performance of a pathfinder: the quality of the path found, and the time taken to find it. The former is plotted in Figure 3, measured by the cost of the resulting contraction in FLOPs. The circuits used for benchmarking are random quantum circuits from Google Quantum AI's 2019 quantum supremacy paper, at depths 12, 14, and 20.

Figure 3. cuTensorNet pathfinding quality compared to similar packages, measured in FLOPs for the resulting contraction, on random quantum circuits of depth 12, 14, and 20

cuTensorNet performs well compared to the opt_einsum library in finding an optimal path, and slightly better than Cotengra for these circuits.

cuTensorNet also finds a high-quality path quickly. The time taken by cuTensorNet to find a contraction path, compared to Cotengra, is plotted in Figure 4 for Sycamore quantum circuit problems of different depths. For the most complex problem, with over 3,000 tensors in the network, cuTensorNet still finds its optimal path in just 40 seconds.

Figure 4. Time to solution for cuTensorNet pathfinding compared to Cotengra on the Sycamore quantum circuit problems; for the largest instances, cuTensorNet finds an optimal path nearly 20x faster

Execution

The execution component relies on the cuTENSOR library as the backend for efficient execution on the GPU. It consists of the following phases: 

  • Planning: The decision engine of the execution component. It analyzes the contraction path and decides the best way to execute it on the GPU using minimal workspace memory. It also chooses the best kernels to use for each of the pairwise contractions.
  • Computation: This phase computes all the pairwise contractions using the cuTENSOR library.
  • Autotuning: (Optional) Different kernels, based on different heuristics, are tried for each pairwise contraction, and the best one is chosen.

Execution performance

Figure 5 shows the speedup of contraction execution for cuTensorNet compared to CuPy, for several different circuits. Depending on the circuit, cuTensorNet offers around an 8-20x speedup for the contraction execution.

Figure 5. Contraction speedup of cuTensorNet vs. CuPy on a single A100 GPU, for several key quantum algorithms (roughly 8-20x in the contraction phase)

cuTensorNet example 

cuTensorNet provides both C and Python APIs that allow you to compute tensor network contractions efficiently without requiring any expertise on how to find the best contraction path or how to execute it on GPUs.  

High-level Python APIs 

cuTensorNet offers high-level Python APIs that are interoperable with NumPy and CuPy ndarrays and PyTorch tensors. For example, the einsum expression of a tensor network can be passed in a single call to the contract function; cuTensorNet performs all the required steps and returns the contracted network as the result.

import cupy as cp
import cuquantum 
 
# Compute D_{m,x,n,y} = A_{m,h,k,n} B_{u,k,h} C_{x,u,y} 
# Create an array of extents (shapes) for each tensor 
extentA = (96, 64, 64, 96) 
extentB = (96, 64, 64) 
extentC = (64, 96, 64) 
extentD = (96, 64, 96, 64) 
 
# Generate input tensor data directly on GPU 
A_d = cp.random.random(extentA, dtype=cp.float32) 
B_d = cp.random.random(extentB, dtype=cp.float32) 
C_d = cp.random.random(extentC, dtype=cp.float32) 
 
# Set the pathfinder options 
options = cuquantum.OptimizerOptions() 
options.slicing.disable_slicing = 1  # disable slicing 
options.samples = 100                # number of hyper-optimizer samples 
 
# Run the contraction on a CUDA stream 
stream = cp.cuda.Stream() 
D_d, info = cuquantum.contract( 
    'mhkn,ukh,xuy->mxny', A_d, B_d, C_d, 
    optimize=options, stream=stream, return_info=True) 
stream.synchronize() 
 
# Check the optimizer info 
print(f"{info[1].opt_cost/1e9} GFLOPs")

From this code example, you can see that all cuTensorNet operations are encapsulated in the single contract API. The output for this example is 14.495514624 GFLOPs: the estimated number of floating-point operations for the contraction path found by the pathfinder. To perform the same steps manually, you can also use the cuquantum.Network object.
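
For example, here is a sketch of the stateful workflow (hedged: the method names and keyword arguments follow the cuQuantum Python documentation at the time of writing; it reuses A_d, B_d, C_d, options, and stream from the previous example):

from cuquantum import Network

# Same contraction as above, with pathfinding, autotuning, and execution
# exposed as separate steps
with Network('mhkn,ukh,xuy->mxny', A_d, B_d, C_d) as net:
    path, info = net.contract_path(optimize=options)  # pathfinding only
    net.autotune(iterations=5)                        # optional kernel autotuning
    D_d = net.contract(stream=stream)                 # contraction execution

Splitting the steps this way lets you reuse one contraction path across repeated contractions of the same network with different data.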

Low-level APIs  

As previously discussed, the C and Python APIs are designed in a straightforward, expressive fashion. You call the pathfinder function to get an optimized path, followed by a call that performs the contraction on the GPU using that path.

For advanced users, the cuTensorNet library API is designed to grant access to all algorithmic choices available to enable research in this field. For example, you can control how many hyper-optimizer samples the pathfinder can try to find the best contraction path.

There are dozens of parameters that you can modify or control. They are accessible through helper functions, which lets the simple top-level API remain unchanged. You can also supply your own contraction path. For more information about the lower-level options and examples of how to use them, see cuquantum.Network.

Summary

The cuTensorNet library of the NVIDIA cuQuantum SDK aims to accelerate tensor network computation on GPUs. In this post, we showed its speedup over state-of-the-art tensor network libraries on key quantum algorithms.

Development is ongoing to improve cuTensorNet and to extend it with new algorithmic advancements as well as multi-node, multi-GPU execution.

The goal of the cuTensorNet library is to provide a useful tool for groundbreaking developments in quantum computing. Have feedback or suggestions on how we can improve the cuQuantum libraries? Send email to cuquantum-feedback@nvidia.com.


3 Methods for Speeding up AI Model Development with TAO Toolkit

How do you shorten your AI application’s TTM? Here are 3 methods to eliminate framework complexity and cut your training time in half using TAO Toolkit.

AI applications are powered by models. Deep learning models are built on mathematical algorithms and trained using data and human expertise. These models can accurately predict outcomes based on input data such as images, text, or speech.

Building, training, and optimizing models are critical, time-intensive tasks. Domain expertise and countless hours of computation are needed to develop production-quality models. This is at odds with how quickly enterprises must operationalize their AI initiatives and reduce their time to market (TTM).

Fine-tuning pretrained models without AI expertise

Fortunately, there is a solution: pretrained models. A pretrained model is one that has already been trained on representative datasets, with its weights and biases fine-tuned; through transfer learning, it can be adapted to a new task. Unlike traditional AI algorithms that require significant time and resources to train, AI solutions built with pretrained models are delivered as fully operational, ready-to-use AI engines for a variety of use cases.

In most cases, an out-of-the-box pretrained model may not fit your use case or deliver the accuracy and performance you need. In these instances, you will have to modify or customize the pretrained model to fit your use-case needs.

Customizing pretrained models for different use cases

So how do you customize a pretrained model without spending too much time and effort? You can use NVIDIA TAO, an AI-model-adaptation framework, to simplify your development workflow. The TAO Toolkit is a CLI- and Jupyter notebook-based solution of NVIDIA TAO that makes it very easy to fine-tune pretrained models with your own data. No AI expertise is required. 

The TAO Toolkit is highly extensible and helps you adapt your model to new environments, augment your data, or add new classes.

Below are three examples highlighted in the NVIDIA whitepaper that explores proven methodologies for speeding up your AI workflow.

  • Adapting to different camera types: Say you want to deploy a solution to infrared or thermal cameras. You can start from the PeopleNet model, which has already been trained on millions of images, and fine-tune it with only 2,500 images to achieve an mAP of almost 80%.
  • Augmenting a limited dataset: Data collection is time-consuming. With offline or online data augmentation, you can modify your dataset, adding variation and randomness that helps the model generalize. This improves model accuracy on data the model has never seen before.
  • Adding new classes: Imagine you have been asked to create an application that detects whether people are wearing helmets while riding their bicycles. With the TAO Toolkit, you can take a model that detects people and add a new "helmet" class to it, then fine-tune it with a dataset that contains classes for both people and helmets.

Put it into practice

When you eliminate AI framework complexity, you can focus on what matters: shortening your AI application's TTM. The TAO Toolkit makes it incredibly easy for you to train, adapt, and optimize pretrained models, without the need for large training datasets or AI expertise.

Visit the GitHub page to get started with the use cases described in the whitepaper. Learn more about the TAO Toolkit.