Categories
Offsites

Hybrid Quantum Algorithms for Quantum Monte Carlo

Quantum chemistry challenges run on quantum computers sit at the intersection of computational difficulty and practical importance, and they have long been a focus for Google Quantum AI. We’ve experimentally simulated simple models of chemical bonding, high-temperature superconductivity, nanowires, and even exotic phases of matter such as time crystals on our Sycamore quantum processors. We’ve also developed algorithms suitable for the error-corrected quantum computers we aim to build, including the world’s most efficient algorithm for large-scale quantum computations of chemistry (in the usual way of formulating the problem) and a pioneering approach that allows us to solve the same problem at an extremely high spatial resolution by encoding the position of the electrons differently.

Despite these successes, it is still more effective to use classical algorithms for studying quantum chemistry than the noisy quantum processors we have available today. However, when the laws of quantum mechanics are translated into programs that a classical computer can run, we often find that the amount of time or memory required scales very poorly with the size of the physical system to simulate.

Today, in collaboration with Dr. Joonho Lee and Professor David Reichman at Columbia, we present the Nature publication “Unbiasing Fermionic Quantum Monte Carlo with a Quantum Computer”, where we propose and experimentally validate a new way of combining classical and quantum computation to study chemistry, which can replace a computationally expensive subroutine in a powerful classical algorithm with a “cheaper”, noisy calculation on a small quantum computer. To evaluate the performance of this hybrid quantum-classical approach, we applied this idea to perform the largest quantum computation of chemistry to date, using 16 qubits to study the forces experienced by two carbon atoms in a diamond crystal. Not only was this experiment four qubits larger than our earlier chemistry calculations on Sycamore, but we were also able to use a more comprehensive description of the physics that fully incorporated the interactions between electrons.

Google’s Sycamore quantum processor. Photo Credit: Rocco Ceselin.

A New Way of Combining Quantum and Classical
Our starting point was to use a family of Monte Carlo techniques (projector Monte Carlo, more on that below) to give us a useful description of the lowest energy state of a quantum mechanical system (like the two carbon atoms in a crystal mentioned above). However, even just storing a good description of a quantum state (the “wavefunction”) on a classical computer can be prohibitively expensive, let alone calculating one.

Projector Monte Carlo methods provide a way around this difficulty. Instead of writing down a full description of the state, we design a set of rules for generating a large number of oversimplified descriptions of the state (for example, lists of where each electron might be in space) whose average is a good approximation to the real ground state. The “projector” in projector Monte Carlo refers to how we design these rules — by continuously trying to filter out the incorrect answers using a mathematical process called projection, similar to how a silhouette is a projection of a three-dimensional object onto a two-dimensional surface.
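
For intuition about the projection step, here is a toy sketch (not the algorithm used in the paper): a small random Hermitian matrix stands in for the Hamiltonian, and the stochastic sampling over many simplified "walker" states that makes projector Monte Carlo scalable is omitted. Repeatedly applying an operator like (1 - τH) suppresses the high-energy components of almost any starting state and leaves the ground state behind.

import numpy as np

rng = np.random.default_rng(0)

# A small random Hermitian matrix stands in for the Hamiltonian of a real system.
n = 16
H = rng.standard_normal((n, n))
H = 0.5 * (H + H.T)

# Start from a random guess and repeatedly apply the projector (1 - tau * H),
# which filters out high-energy components relative to the ground state.
tau = 0.01
psi = rng.standard_normal(n)
for _ in range(5000):
    psi -= tau * (H @ psi)
    psi /= np.linalg.norm(psi)

print("projected energy:", psi @ H @ psi)
print("exact ground-state energy:", np.linalg.eigvalsh(H)[0])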

Unfortunately, when it comes to chemistry or materials science, this idea isn’t enough to find the ground state on its own. Electrons belong to a class of particles known as fermions, which have a surprising quantum mechanical quirk to their behavior. When two identical fermions swap places, the quantum mechanical wavefunction (the mathematical description that tells us everything there is to know about them) picks up a minus sign. This minus sign gives rise to the famous Pauli exclusion principle (the fact that two fermions cannot occupy the same state). It can also cause projector Monte Carlo calculations to become inefficient or even break down completely. The usual resolution to this fermion sign problem involves tweaking the Monte Carlo algorithm to include some information from an approximation to the ground state. By using an approximation (even a crude one) to the lowest energy state as a guide, it is usually possible to avoid breakdowns and even obtain accurate estimates of the properties of the true ground state.

Top: An illustration of how the fermion sign problem appears in some cases. Instead of following the blue curve, our estimates of the energy follow the red curve and become unstable. Bottom: An example of the improvements we might see when we try to fix the sign problem. By using a quantum computer, we hope to improve the initial guess that guides our calculation and obtain a more accurate answer.

For the most challenging problems (such as modeling the breaking of chemical bonds), the computational cost of using an accurate enough initial guess on a classical computer can be too steep to afford, which led our collaborator Dr. Joonho Lee to ask if a quantum computer could help. We had already demonstrated in previous experiments that we can use our quantum computer to approximate the ground state of a quantum system. In these earlier experiments we aimed to measure quantities (such as the energy of the state) that are directly linked to physical properties (like the rate of a chemical reaction). In this new hybrid algorithm, we instead needed to make a very different kind of measurement: quantifying how far the states generated by the Monte Carlo algorithm on our classical computer are from those prepared on the quantum computer. Using some recently developed techniques, we were even able to do all of the measurements on the quantum computer before we ran the Monte Carlo algorithm, separating the quantum computer’s job from the classical computer’s.
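
Concretely, the overlaps between the quantum-prepared trial state and the classically generated walker states feed into the kind of overlap-weighted ("mixed") estimator that trial-wavefunction Monte Carlo methods use to compute the energy. The sketch below is schematic only: it uses dense vectors and matrices instead of the determinant formulas and measured overlaps used in practice, and the function and argument names are hypothetical.

import numpy as np

def mixed_energy_estimate(H, psi_trial, walkers, weights):
    # Mixed estimator: E = sum_i w_i <psi_T|H|phi_i> / sum_i w_i <psi_T|phi_i>.
    # Schematic only: in the hybrid approach the overlaps <psi_T|phi_i> come
    # from measurements on the quantum processor, not dense linear algebra.
    num = sum(w * (psi_trial.conj() @ (H @ phi)) for w, phi in zip(weights, walkers))
    den = sum(w * (psi_trial.conj() @ phi) for w, phi in zip(weights, walkers))
    return (num / den).real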

A diagram of our calculation. The quantum processor (right) measures information that guides the classical calculation (left). The crosses indicate the qubits, with the ones used for the largest experiment shaded green. The direction of the arrows indicates that the quantum processor doesn’t need any feedback from the classical calculation. The red bars represent the parts of the classical calculation that are filtered out by the data from the quantum computer in order to avoid the fermion sign problem and get a good estimate of properties like the energy of the ground state.

This division of labor between the classical and the quantum computer helped us make good use of both resources. Using our Sycamore quantum processor, we prepared a kind of approximation to the ground state that would be difficult to scale up classically. With a few hours of time on the quantum device, we extracted all of the data we needed to run the Monte Carlo algorithm on the classical computer. Even though the data was noisy (like all present-day quantum computations), it had enough signal that it was able to guide the classical computer towards a very accurate reconstruction of the true ground state (shown in the figure below). In fact, we showed that even when we used a low-resolution approximation to the ground state on the quantum computer (just a few qubits encoding the position of the electrons), the classical computer could efficiently solve a much higher resolution version (with more realism about where the electrons can be).

Top left: a diagram showing the sixteen qubits we used for our largest experiment. Bottom left: an illustration of the carbon atoms in a diamond crystal. Our calculation focused on two atoms (the two that are highlighted in translucent yellow). Right: A plot showing how the error in the total energy (closer to zero is better) changes as we adjust the lattice constant (the spacing between the two carbon atoms). Many properties we might care about, such as the structure of the crystal, can be determined by understanding how the energy varies as we move the atoms around. The calculations we performed using the quantum computer (red points) are comparable in accuracy to two state-of-the-art classical methods (yellow and green triangles) and are extremely close to the numbers we would have gotten if we had a perfect quantum computer rather than a noisy one (black points). The fact that these red and black points are so close tells us that the error in our calculation comes from using an approximate ground state on the quantum computer that was too simple, not from being overwhelmed by noise on the device.

Using our new hybrid quantum algorithm, we performed the largest ever quantum computation of chemistry or materials science. We used sixteen qubits to calculate the energy of two carbon atoms in a diamond crystal. Not only was this experiment four qubits larger than our first chemistry calculations on Sycamore, but we also obtained more accurate results and used a better model of the underlying physics. By guiding a powerful classical Monte Carlo calculation using data from our quantum computer, we performed these calculations in a way that was naturally robust to noise.

We’re optimistic about the promise of this new research direction and excited to tackle the challenge of scaling these kinds of calculations up towards the boundary of what we can do with classical computing, and even to the hard-to-study corners of the universe. We know the road ahead of us is long, but we’re excited to have another tool in our growing toolbox.

Acknowledgements
I’d like to thank my co-authors on the manuscript, Bryan O’Gorman, Nicholas Rubin, David Reichman, Ryan Babbush, and especially Joonho Lee for their many contributions, as well as Charles Neill and Pedram Roushan for their help executing the experiment. I’d also like to thank the larger Google Quantum AI team, who designed, built, programmed, and calibrated the Sycamore processor.

Categories
Misc

What are these errors? I’m new to TensorFlow, so can someone help me out?

submitted by /u/RAIDAIN
Categories
Misc

Please help quick (TensorFlow object recognition)

I wanted to do a group project on TensorFlow object detection for university, but our teacher wanted us to use our own images instead of datasets. The project is a road safety app that can detect and label road signs, oncoming vehicles, and pedestrians. Is this going to be too much to handle? Is it possible to make an app that can detect these things using custom images, or is it impossible?

(Side note: I can also reduce the project to just detection and recognition of road signs and vehicles.)

submitted by /u/Tooxnottox

Categories
Offsites

Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion

People interact with the world through multiple sensory streams (e.g., we see objects, hear sounds, read words, feel textures and taste flavors), combining information and forming associations between senses. As real-world data consists of various signals that co-occur, such as video frames and audio tracks, web images and their captions, and instructional videos and speech transcripts, it is natural to apply a similar logic when building and designing multimodal machine learning (ML) models.

Effective multimodal models have wide applications, such as multilingual image retrieval, future action prediction, and vision-language navigation, and are important for several reasons: robustness, which is the ability to perform even when one or more modalities is missing or corrupted, and complementarity between modalities, which is the idea that some information may be present only in one modality (e.g., audio stream) and not in the other (e.g., video frames). While the dominant paradigm for multimodal fusion, called late fusion, consists of using separate models to encode each modality and then simply combining their output representations at the final step, the question of how to effectively and efficiently combine information from different modalities is still understudied.

In “Attention Bottlenecks for Multimodal Fusion”, published at NeurIPS 2021, we introduce a novel transformer-based model for multimodal fusion in video called Multimodal Bottleneck Transformer (MBT). Our model restricts cross-modal attention flow between latent units in two ways: (1) through tight fusion bottlenecks that force the model to collect and condense the most relevant inputs in each modality (sharing only necessary information with other modalities), and (2) by confining cross-modal attention to later layers of the model, allowing early layers to specialize to information from individual modalities. We demonstrate that this approach achieves state-of-the-art results on video classification tasks, with a 50% reduction in FLOPs compared to a vanilla multimodal transformer model. We have also released our code as a tool for researchers to leverage as they expand on multimodal fusion work.

A Vanilla Multimodal Transformer Model
Transformer models consistently obtain state-of-the-art results in ML tasks, including video (ViViT) and audio classification (AST). Both ViViT and AST are built on the Vision Transformer (ViT); in contrast to standard convolutional approaches that process images pixel-by-pixel, ViT treats an image as a sequence of patch tokens (i.e., tokens from a smaller part, or patch, of an image that is made up of multiple pixels). These models then perform self-attention operations across all pairs of patch tokens. However, using transformers for multimodal fusion is challenging because of their high computational cost, with complexity scaling quadratically with input sequence length.
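
For readers unfamiliar with this tokenization, "a sequence of patch tokens" just means cutting the image into non-overlapping patches and linearly projecting each flattened patch to a vector. Below is a minimal PyTorch sketch with hypothetical sizes; the real ViT/ViViT implementations also add positional embeddings, a classification token, and more.

import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)   # (batch, channels, height, width)
patch = 16

# Cut the image into non-overlapping 16x16 patches and flatten each one.
patches = image.unfold(2, patch, patch).unfold(3, patch, patch)           # (1, 3, 14, 14, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch**2)  # (1, 196, 768)

# Linearly project every flattened patch to an embedding: 196 patch tokens.
tokens = nn.Linear(3 * patch**2, 256)(patches)                            # (1, 196, 256)

# Self-attention then operates on all pairs of these tokens, so its cost
# grows quadratically with the number of tokens.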

Because transformers effectively process variable-length sequences, the simplest way to extend a unimodal transformer, such as ViT, to the multimodal case is to feed the model a sequence of both visual and auditory tokens, with minimal changes to the transformer architecture. We call this a vanilla multimodal transformer model, which allows free attention flow (called vanilla cross-attention) between different spatial and temporal regions in an image, and across frequency and time in audio inputs, represented by spectrograms. However, while easy to implement by concatenating audio and video input tokens, vanilla cross-attention at all layers of the transformer model is unnecessary: audio and visual inputs contain dense, fine-grained information, much of which is redundant for the task, so attending to everything only increases complexity.
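
In code, such a vanilla multimodal layer amounts to concatenating the two token sequences and running ordinary self-attention over the result. The toy sketch below uses PyTorch's stock attention module and hypothetical token counts rather than the actual ViViT/AST implementations.

import torch
import torch.nn as nn

dim = 256
video_tokens = torch.randn(1, 196, dim)   # hypothetical patch tokens from video frames
audio_tokens = torch.randn(1, 128, dim)   # hypothetical patch tokens from a spectrogram

# Vanilla cross-attention: concatenate both modalities and let every token
# attend to every other token. Per-layer cost grows with (196 + 128) ** 2.
tokens = torch.cat([video_tokens, audio_tokens], dim=1)
attention = nn.MultiheadAttention(embed_dim=dim, num_heads=8, batch_first=True)
fused, _ = attention(tokens, tokens, tokens)
print(fused.shape)  # torch.Size([1, 324, 256])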

Restricting Attention Flow
The issue of growing complexity for long sequences in multimodal models can be mitigated by reducing the attention flow. We restrict attention flow using two methods: specifying the fusion layer and adding attention bottlenecks.

  • Fusion layer (early, mid or late fusion): In multimodal models, the layer where cross-modal interactions are introduced is called the fusion layer. The two extreme versions are early fusion (where all layers in the transformer are cross-modal) and late fusion (where all layers are unimodal and no cross-modal information is exchanged in the transformer encoder). Specifying a fusion layer in between leads to mid fusion. This technique builds on a common paradigm in multimodal learning, which is to restrict cross-modal flow to later layers of the network, allowing early layers to specialize in learning and extracting unimodal patterns.
  • Attention bottlenecks: We also introduce a small set of latent units that form an attention bottleneck (shown below in purple), which force the model, within a given layer, to collate and condense information from each modality before sharing it with the other, while still allowing free attention flow within a modality. We demonstrate that this bottlenecked version (MBT) outperforms or matches its unrestricted counterpart at lower computational cost (a toy sketch of one such layer appears after the figure caption below).
The different attention configurations in our model. Unlike late fusion (top left), where no cross-modal information is exchanged in the transformer encoder, we investigate two pathways for the exchange of cross-modal information. Early and mid fusion (top middle, top right) is done via standard pairwise self attention across all hidden units in a layer. For mid fusion, cross-modal attention is applied only to later layers in the model. Bottleneck fusion (bottom left) restricts attention flow within a layer through tight latent units called attention bottlenecks. Bottleneck mid fusion (bottom right) applies both forms of restriction in conjunction for optimal performance.
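
The sketch below shows one bottleneck-fusion layer. It is an illustrative simplification rather than the released MBT code: each modality attends only over its own tokens plus a handful of shared bottleneck tokens, and the per-modality bottleneck updates are averaged, so those few tokens are the only channel through which information crosses modalities.

import torch
import torch.nn as nn

def bottleneck_fusion_layer(video, audio, bottleneck, attn_video, attn_audio):
    # Each modality attends only over [its own tokens, bottleneck tokens].
    # Illustrative simplification of MBT, not the paper's exact parameterization.
    nb = bottleneck.shape[1]

    v_in = torch.cat([video, bottleneck], dim=1)
    v_out, _ = attn_video(v_in, v_in, v_in)
    video_new, bottleneck_from_video = v_out[:, :-nb], v_out[:, -nb:]

    a_in = torch.cat([audio, bottleneck], dim=1)
    a_out, _ = attn_audio(a_in, a_in, a_in)
    audio_new, bottleneck_from_audio = a_out[:, :-nb], a_out[:, -nb:]

    # Average the two bottleneck updates so the bottleneck carries a condensed
    # summary of both modalities into the next layer.
    bottleneck_new = 0.5 * (bottleneck_from_video + bottleneck_from_audio)
    return video_new, audio_new, bottleneck_new

dim = 256
attn_video = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
attn_audio = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
video, audio = torch.randn(1, 196, dim), torch.randn(1, 128, dim)
bottleneck = torch.randn(1, 4, dim)  # a handful of latent bottleneck tokens
video, audio, bottleneck = bottleneck_fusion_layer(video, audio, bottleneck, attn_video, attn_audio)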

Bottlenecks and Computation Cost
We apply MBT to the task of sound classification using the AudioSet dataset and investigate its performance for two approaches: (1) vanilla cross-attention, and (2) bottleneck fusion. For both approaches, mid fusion (shown by the middle values of the x-axis below) outperforms both early (fusion layer = 0) and late fusion (fusion layer = 12). This suggests that the model benefits from restricting cross-modal connections to later layers, allowing earlier layers to specialize in learning unimodal features; however, it still benefits from multiple layers of cross-modal information flow. We find that adding attention bottlenecks (bottleneck fusion) outperforms or maintains performance with vanilla cross-attention for all fusion layers, with more prominent improvements at lower fusion layers.

The impact of using attention bottlenecks for fusion on mAP performance (left) and compute (right) at different fusion layers on AudioSet. Attention bottlenecks (red) improve performance over vanilla cross-attention (blue) at lower computational cost. Mid fusion, which is in fusion layers 4-10, outperforms both early (fusion layer = 0) and late (fusion layer = 12) fusion, with best performance at fusion layer 8.

We compare the amount of computation, measured in GFLOPs, for both vanilla cross-attention and bottleneck fusion. Using a small number of attention bottlenecks (four bottleneck tokens used in our experiments) adds negligible extra computation over a late fusion model, with computation remaining largely constant with varying fusion layers. This is in contrast to vanilla cross-attention, which has a non-negligible computational cost for every layer it is applied to. We note that for early fusion, bottleneck fusion outperforms vanilla cross-attention by over 2 mean average precision points (mAP) on audiovisual sound classification, with less than half the computational cost.
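
To see why the bottleneck cost stays nearly flat as the fusion layer moves, it helps to count how many token pairs each layer attends over. This is only a back-of-the-envelope proxy with hypothetical token counts, not the paper's FLOP measurements.

def attention_pairs(a, v, n_layers=12, fusion_layer=8, n_bottleneck=4):
    # Rough count of attended token pairs per forward pass for each fusion style;
    # 'a' and 'v' are hypothetical audio and video token counts.
    unimodal = a**2 + v**2                                      # each modality attends only to itself
    joint = (a + v)**2                                          # free attention over the concatenation
    bottleneck = (a + n_bottleneck)**2 + (v + n_bottleneck)**2  # own tokens plus a few bottleneck tokens
    return {
        "late fusion": n_layers * unimodal,
        "vanilla cross-attention": fusion_layer * unimodal + (n_layers - fusion_layer) * joint,
        "bottleneck fusion": fusion_layer * unimodal + (n_layers - fusion_layer) * bottleneck,
    }

print(attention_pairs(a=400, v=1500))  # bottleneck fusion stays close to the late-fusion cost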

Results on Sound Classification and Action Recognition
MBT outperforms previous research on popular video classification tasks — sound classification (AudioSet and VGGSound) and action recognition (Kinetics and Epic-Kitchens). For multiple datasets, late fusion and MBT with mid fusion (both fusing audio and vision) outperform the best single modality baseline, and MBT with mid fusion outperforms late fusion.

Across multiple datasets, fusing audio and vision outperforms the best single modality baseline, and MBT with mid fusion outperforms late fusion. For each dataset we report the widely used primary metric, i.e., AudioSet: mAP; Epic-Kitchens: Top-1 action accuracy; VGGSound, Moments-in-Time, and Kinetics: Top-1 classification accuracy.

Visualization of Attention Heatmaps
To understand the behavior of MBT, we visualize the attention computed by our network following the attention rollout technique. We compute heat maps of the attention from the output classification tokens to the image input space for a vanilla cross-attention model and MBT on the AudioSet test set. For each video clip, we show the original middle frame on the left with the ground truth labels overlaid at the bottom. We demonstrate that the attention is particularly focused on regions in the images that contain motion and create sound, e.g., the fingertips on the piano, the sewing machine, and the face of the dog. The fusion bottlenecks in MBT further force the attention to be localized to smaller regions of the images, e.g., the mouth of the dog in the top left and the woman singing in the middle right. This provides some evidence that the tight bottlenecks force MBT to focus only on the image patches that are relevant for an audio classification task and that benefit from mid fusion with audio.

Summary
We introduce MBT, a new transformer-based architecture for multimodal fusion, and explore various fusion approaches using cross-attention between bottleneck tokens. We demonstrate that restricting cross-modal attention via a small set of fusion bottlenecks achieves state-of-the-art results on a number of video classification benchmarks while also reducing computational costs compared to vanilla cross-attention models.

Acknowledgements
This research was conducted by Arsha Nagrani, Anurag Arnab, Shan Yang, Aren Jansen, Cordelia Schmid and Chen Sun. The blog post was written by Arsha Nagrani, Anurag Arnab and Chen Sun. Animations were created by Tom Small.

Categories
Misc

You can use almost any GPU for Deep Learning!

submitted by /u/limapedro

Categories
Misc

Expanding Hybrid-Cloud Support in Virtualized Data Centers with New NVIDIA AI Enterprise Integrations

Get the latest on NVIDIA AI Enterprise on VMware Cloud Foundation, VMware Cloud Director, and new curated labs with VMware Tanzu and Domino Data.

The new year has been off to a great start with NVIDIA AI Enterprise 1.1 providing production support for container orchestration and Kubernetes cluster management using VMware vSphere with Tanzu 7.0 update 3c, delivering AI/ML workloads to every business in VMs, containers, or Kubernetes.

New LaunchPad labs

New NVIDIA AI Enterprise labs for IT admins and MLOps are available on NVIDIA LaunchPad:

  • VMware vSphere with Tanzu
  • Domino Enterprise MLOps platform

NVIDIA AI Enterprise with VMware vSphere with Tanzu

Enterprises can get started quickly with NVIDIA AI Enterprise running on VMware vSphere with Tanzu through the free LaunchPad program that provides immediate, short-term access to NVIDIA AI running on private accelerated compute infrastructure.

A newly added curated lab gives you hands-on experience using VMware Tanzu Kubernetes Grid service to manage a containerized workload using the frameworks provided in the NVIDIA AI Enterprise software suite. In this lab, you can configure, optimize, and orchestrate resources for AI and data science workloads with VMware Tanzu. You get experience using the NGC registry, NVIDIA operators, Kubernetes, and server virtualization, all by running vSphere with Tanzu on NVIDIA-Certified Systems.

NVIDIA AI Enterprise with the Domino Enterprise MLOps platform

MLOps administrators also have something to get excited about, with the addition of another new curated lab available soon on NVIDIA LaunchPad.

NVIDIA AI Enterprise provides validation for the Domino Data Lab Enterprise MLOps Platform with VMware vSphere. Enterprises can run through the lab and get hands-on experience with scaling data science workloads on the Domino MLOps platform. Data scientists and AI researchers will be able to launch Domino Workspaces on demand, with container images configured with the latest data science tools and frameworks included in NVIDIA AI Enterprise and accelerated by NVIDIA GPUs.

The Domino MLOps Platform enables automatic storing and versioning of code, data, and results. When IT administrators try this lab, they use familiar management tools provided by VMware vSphere with Tanzu. Deployed on NVIDIA-Certified Systems, IT now has the confidence of enterprise-grade security, manageability, and support. 

New AI Enterprise integrations with VMware

Service providers, telcos, and hybrid cloud enterprises using VMware now have access to NVIDIA AI on the following products:

  • VMware Cloud Foundation
  • VMware Cloud Director
Architecture diagram of the AI Enterprise layer stacked on VMware vSphere with Tanzu or VMware Cloud Foundation with Tanzu, over the accelerated mainstream servers (DPU or GPU)

Enterprise AI together with VMware Cloud Foundation

NVIDIA and VMware have expanded support for NVIDIA AI Enterprise to VMware’s flagship hybrid-cloud platform, VMware Cloud Foundation 4.4. Further democratizing AI for every enterprise, the AI-Ready enterprise platform combines the benefits of the full-stack VMware Cloud Foundation environment with the NVIDIA software suite running on GPU-accelerated mainstream servers that are NVIDIA-Certified. This provides you with the tools and frameworks you need for successfully developing and deploying AI, while giving IT administrators full control of infrastructure resources.

The integration of VMware Cloud Foundation with Tanzu and NVIDIA AI Enterprise enables enterprises to extend their software-defined private cloud platform to support a flexible and easily scalable AI-ready infrastructure. Administrators can deploy, configure, and manage IT infrastructure in an automated fashion, using powerful tools like vSphere Distributed Resource Scheduler for initial placement and VMware vMotion for migration of VMs running NVIDIA GPUs.

IT admins benefit from this integration, but so do AI practitioners, who can now easily consume admin-assigned resources for their AI and data analytics workloads and move them from development to deployment and scaling quickly.

Addressing the growing demand for AI in the cloud with VMware Cloud Director

As cloud providers are seeing increased demand from customers for modern applications that require accelerated compute, VMware Cloud Director 10.3.2 has added support for NVIDIA AI Enterprise. These service providers can now leverage vSphere support to run AI/ML workloads on NVIDIA Ampere-based GPUs, with capabilities like vMotion and multi-tenant GPU services, combined with the most fundamental AI tools and frameworks, all managed through VMware Cloud Director.

NVIDIA AI Enterprise with VMware Cloud Director enables industries like cloud service providers and telcos to offer new services to their customers. One example is intelligent video analytics (IVA), which uses GPU-accelerated computer vision to provide real-time insights for boosting public safety, reducing theft, and improving customer experiences.

Start your AI journey

Get started with NVIDIA AI Enterprise on LaunchPad for free. Apply now for immediate access to the software suite running on NVIDIA-Certified Systems.

Come join us at GTC, March 21-24 to hear and see NVIDIA, VMware, and our partners share more details about these new and exciting integrations. Register today.

Categories
Misc

What happens during validation in keras?

Hello, I am training a model for semantic segmentation using Dice loss and the Dice coefficient as a metric.

model.compile(...., loss=diceLoss, metrics=[dice_coef]) 

I noticed that in an epoch, the loss and dice_coef are equal. However, val_loss is never equal to val_dice_coef. What is the reason behind that? I feel like something is being done (turned on/off) during validation.

I hope any of this makes sense. Thanks!

Edit: I am using the segmentation-models library, with a U-Net with pretrained weights. I tried looking into the source code, but I couldn’t figure out what’s happening.

submitted by /u/potato-question-mark

Categories
Misc

Delivering One-Click VR Streaming Using Innoactive Portal and NVIDIA CloudXR

Building one-click delivery of PC-based VR applications to standalone devices by streaming from the cloud using the NVIDIA CloudXR SDK and Innoactive Portal.

Given how prevalent streaming technologies are today, it is easy to forget that on-demand, globally available applications used to be considered a radical idea. But as the popularity of AR and VR grows, the race is on to push streaming even further. With the demand for realistic content and consumer-grade hardware comes a critical challenge: How to build applications that look good on any device without compromising on the quality and performance requirements of immersive experiences?

To help solve this problem, the team at Innoactive has integrated the NVIDIA CloudXR streaming solution into their VR application deployment platform, Innoactive Portal. Customers like SAP, Deloitte, Linde Engineering, and Volkswagen Group are using the platform to deliver immersive training seamlessly to users whenever and wherever they are.

Portal is a user-centric, enterprise-grade deployment platform for managing and accessing VR content with a single click from any device. What makes Portal unique is the ability to deploy content across platforms and devices including PC VR, standalone VR, and PC 3D applications.

Using both the server and client components of the NVIDIA CloudXR SDK, along with improvements in cloud computing frameworks, Portal makes it possible to build an application for PC VR and stream it to all-in-one headsets or other low-compute devices. All end users do is select the application from their library to stream it.

Overcoming the Distribution Bottlenecks of XR

Woman in VR headset interacting with the Innoactive Portal VR launcher
Figure 1. Innoactive Portal one-click user interface for launching to VR.

While there are several other enterprise VR platforms out there, Portal makes the process painless for both creators and end users by alleviating three of the most critical bottlenecks in distributing XR content:

  • The platform’s secure cloud infrastructure provides customers with centralized control of their content and users, ensuring access for only those who need it.
  • The one-click streaming architecture makes content easy to use for users who may not be familiar with navigating in VR.
  • The NVIDIA CloudXR support for more complex distribution models, like that of Portal, results in several cost-saving benefits, like the ability to build an application once and deploy it to any device, anywhere.

It’s been technically possible to use NVIDIA CloudXR to stream a VR application for some time, but Innoactive wanted to provide this functionality to casual and even first-time VR users. They faced a set of challenges to make this happen:

  • When users want to start a cloud streamed VR experience, they need a suitable NVIDIA GPU-powered machine available to them on-demand.
  • This server running NVIDIA CloudXR not only requires good performance to perform the rendering, it also must be available to users worldwide.
  • At the same time, it is important to be as resource-aware as possible. To reach a broad audience, all of this must happen with minimal complexity for the user.

To deliver this zero-friction experience, Innoactive needed to build on top of both the client and server side of the NVIDIA CloudXR SDK.

Building a scalable server architecture for NVIDIA CloudXR

On the server, it is important to ensure that an NVIDIA GPU-powered machine is available on-demand to each user, no matter how many users are trying to access machines, or where or when those users require access.

Relying on cloud providers like Amazon Web Services (AWS) and Microsoft Azure for the compute resources is a logical solution. However, Portal was built with the guiding principle of being an open platform that supports all types of VR content as well as all types of deployment environments.

This cloud-provider agnosticism means that a service is needed that can support cloud-rendering machines on either cloud service. In turn, this provides the most flexibility to customers who have already decided on one provider or the other upfront. It also enables Portal to provide maximum coverage and performance to end users by switching to another data center with better latency or performance, independently of the respective provider.

For this reason, Innoactive relies on building custom virtual machine (VM) images with preinstalled software components for any cloud provider. They use a tool called Packer and enable the resulting VMs to be scaled using the respective cloud provider’s VM scale set functionality.

By building for today but planning for tomorrow, Innoactive ensures that this process can be replicated in the future and adapted to work with new cloud providers. Services are now capable of dynamically allocating rendering resources whenever and wherever needed.

Workflow starting with machine configuration through the packer, VM images, EC2 Auto Scaling group, and the Innoactive Portal
Figure 2. Custom VM image build pipeline for cloud-agnostic NVIDIA CloudXR servers

When going to production with cloud-rendered VR, the cost of remote rendering resources must stay as low as possible to provide a positive business case. At the same time, ensuring a great user experience requires a high level of performance, which in turn can quickly become expensive. Thus, the Portal team was tasked to find the optimal cost and performance trade-off.

One huge cost driver apart from the size of a VM is its uptime. The target is to pay for what is needed and no more. However, waiting too long to boot up a machine can make the process feel clumsy or untimely.

Due to the on-demand design of Portal, reducing user wait time is key to getting them into a cloud-rendered experience as fast as possible. Portal detects when users start engaging with it and prepares a VM in anticipation of them launching a VR application. This dramatically reduces the load time for when they launch an application, from a matter of minutes to a matter of seconds.

After the cloud-rendered session is complete, Portal’s cloud-rendering orchestration service automatically removes the machine using the respective cloud provider’s endpoints for scaling. This again ensures that only uptime is paid for, not dead time. Portal can adapt its rendering capabilities depending on the use case. For example, it can ensure that the GPU rendering pool is sufficiently large on days that are planned for VR training at a corporation while minimizing the pool size in times of low activity.
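
As a rough illustration of the scale-up and scale-down idea (Innoactive's actual orchestration service is proprietary; the Auto Scaling group name and capacity policy below are hypothetical), the pool size can be adjusted through the cloud provider's scaling endpoint, such as the AWS Auto Scaling group shown in Figure 2:

import boto3

autoscaling = boto3.client("autoscaling")
ASG_NAME = "cloudxr-rendering-pool"  # hypothetical Auto Scaling group name

def on_user_engaged(active_sessions: int) -> None:
    # Warm up one spare VM as soon as a user starts browsing content,
    # so launch time drops from minutes to seconds.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=active_sessions + 1,
    )

def on_session_finished(active_sessions: int) -> None:
    # Shrink the pool after sessions end so that only uptime is paid for.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME,
        DesiredCapacity=max(active_sessions, 0),
    )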

To optimize the cost-performance ratio, you should understand customer and regional differences. Keeping latency at a minimum during a session is one of the most important factors to guaranteeing a great experience. Generally, the closer a server is to the end users, the lower the latency and lag is between user input and the streamed application.

To ensure that users are allocated the nearest server, multiple algorithms are used, one example being IP lookups to narrow down user geolocations. In the future, relying on the 5G network and edge computing will further improve performance.

Diagram of Innoactive Portal’s cloud rendering architecture
Figure 3. Portal cloud-rendering architecture.

With the previously mentioned architecture and optimizations, Portal supports users who intend to render a VR application remotely using NVIDIA CloudXR. Portal takes care of allocating rendering resources in the cloud, synchronizing the desired VR content to the remote machine, and executing the NVIDIA CloudXR Server so that users can connect. All users do is select the desired VR content.

While providing value on its own, having a preconfigured, ready-to-roll remote rendering machine is just half of the equation. It still requires users to boot up their streaming client and configure it manually by entering the server IP address, which is not something novices are comfortable with.

Building an NVIDIA CloudXR client into Portal

By integrating the NVIDIA CloudXR SDK within the client application, Portal takes away these manual steps to provide a smooth user experience. To keep users updated throughout every step of the process, Portal uses WebSockets built on top of ASP.NET Core SignalR. The client application’s user interface is also built using Unity, to enable fast iteration while still delivering a great user experience.

This comes with the challenge of combining the NVIDIA CloudXR SDK, which is written in C++ using the Android NDK, and an interactive Unity scene into one Portal client app. The result is a single, modular Android application featuring multiple activities that can be easily loaded to a standalone VR headset like Meta Quest 2, Vive Focus 3, or Pico Neo 3.

The user interface guides users up to the point when a remote rendering server is available. It then automatically connects to it through the NVIDIA CloudXR client and ensures that the VR application is correctly started on the server, guaranteeing a frictionless entry into the streamed experience.

The business-logic layer, written in native Android (Java), contains logic to communicate back and forth with Innoactive’s backend services and handles the entire application state. It updates the user interface layer using a proprietary protocol shipped with Unity called UnitySendMessage and initializes the native NVIDIA CloudXR client library when required.

The native NVIDIA CloudXR activity is extended to use the Java Native Interface (JNI) to communicate lifecycle events back to the business layer. The example code later in this post shows this. To properly isolate the native Unity and NVIDIA CloudXR activities from each other, both are designed to run in a dedicated Android process.

Diagram of the application layer architecture for the CloudXR client
Figure 4. Application-layer architecture of the NVIDIA CloudXR client.

The following code examples show the interfacing between Android code and NVIDIA CloudXR. First, the Android code example (Java):

import android.app.Activity;

public class MainActivity extends Activity implements ServerAvailableInterface {
    // load native NVIDIA CloudXR client library (*.aar)
    static {
        System.loadLibrary("CloudXRClient");
    }
    static native void initCloudXR(String jcmdline);

    @Override
    public void onServerReady(String serverIpAddress) {
      // initializes the native CloudXR client after server ip is known
      String arguments = String.format("-s %s", serverIpAddress);
      initCloudXR(arguments);
    }

    public void onConnected() {
      // called by native CloudXR activity when it is successfully connected
      System.out.println("Connected to CloudXR server");
    }
}

Second, the NVIDIA CloudXR code example (C++):

#include <jni.h>            // JNI types used below (JNIEnv, jmethodID)
#include "CloudXRClient.h"  // NVIDIA CloudXR client header (assumed name; defines cxrClientState, cxrStateReason)

// ...

// called whenever the CloudXR client state changes
void Client::UpdateClientState(cxrClientState state, cxrStateReason reason) {
    switch (state) {
        // ...
        case cxrClientState_StreamingSessionInProgress:
            OnCloudXRConnected();
            break;
        // ...
    }
}

// Fired when the CloudXR client has established a connection to the server
void Client::OnCloudXRConnected() {
    jmethodID jOnConnected = env->GetMethodID(env->GetObjectClass(mainActivity), "onConnected", "()V");
    env->CallVoidMethod(mainActivity, jOnConnected);
}

// ...

The final client application provides an intuitive interface with a minimal set of required user interactions to select the desired VR content. This interface is accessible from a second screen device independently of VR, often a laptop or phone.

With the Portal’s real-time communication, all connected clients, regardless of device, display the same status updates as a cloud-rendered session is loading. Users then receive a push notification after their content is ready in VR. When done with a session, users can directly proceed with the next streaming session without having to quit the client application.

Bundling Portal’s services together with the extended NVIDIA CloudXR client provides a globally scalable solution for users to access their VR content library from any device with just one click. Portal with NVIDIA CloudXR has changed how customers are able to deploy their PC-based VR. Where customers previously needed a separate platform for each headset type, Portal now handles that complexity.

The easy-to-use, one-click frontend supports them in scaling VR-based use cases like training employees who are often new to VR. On a more technical level, customers can plan flexibly for headset deployment. They are no longer limited to PC-based VR by the heavy computational power needed to provide high-quality experiences.

Now, the power of cloud machines makes high-end rendering available on standalone devices through the Portal with NVIDIA CloudXR.

Additional resources

For more information about how Innoactive Portal has helped customers deploy VR training at scale, check out their case studies with SAP and Volkswagen Group.

For more information about NVIDIA CloudXR, see the developer page. Check out GTC 2022 this March 21-24 for the latest from NVIDIA on XR and the era of AI.

Categories
Misc

Tensorflow-gpu makes everything slow

Hello,

I hope it is okay to post this question here. I have installed tensorflow-gpu through the Anaconda Navigator because I have an RTX 3090 that I would like to use. However, when using the environment where I have tensorflow-gpu installed, everything is super slow. Just executing the model without training takes forever. Even something as simple as:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
model.compile(optimizer="rmsprop", loss="binary_crossentropy")

Does anyone have a clue what might be the issue? Thank you in advance!

submitted by /u/0stkreutz

Categories
Misc

Jetson Project of the Month: Creating Intelligent Music with the Neurorack Deep AI-based Synthesizer

This Jetson Project of the Month enhances synthesizer-based music by applying deep generative models to a classic Eurorack machine.

Are you a fan of synthesizer-driven bands like Depeche Mode, Erasure, or Kraftwerk? Did you ever think of how cool it would be to create your own music with a synthesizer at home? And what if that process could be enhanced with the help of NVIDIA Jetson Nano?  

The latest Jetson Project of the Month has found a way to do just that, bringing together a Eurorack synthesizer with a Jetson Nano to create the Neurorack. This musical audio synthesizer is the first to combine the power of deep generative models and the compactness of a Eurorack machine.

“The goal of this project is to design the next generation of musical instruments, providing a new tool for musicians while enhancing the musician’s creativity. It proposes a novel approach to think [about] and compose music,” noted the app developers, who are members of Artificial Creative Intelligence and Data Science (ACIDS) group, based at the IRCAM Laboratory in Paris, France. “We deeply think that AI can be used to achieve this quest.”

The real-time capabilities of the Neurorack rely on Jetson Nano’s processing power and Ninon Devis’ research into crafting trained models that are lightweight in both computation and memory footprint.

“Our original dream was to find a way to miniaturize deep models and allow them inside embedded audio hardware and synthesizers. As we are passionate about all forms of synthesizers, and especially Eurorack, we thought that it would make sense to go directly for this format as it was more fun! The Jetson Nano was our go-to choice right at the onset … It allowed us to rely on deep models without losing audio quality, while maintaining real-time constraints,” said Devis.

Watch a demo of the project in action here:

The developers had several key design considerations as they approached this project, including:

  • Musicality: the generative model chosen can produce sounds that are impossible to synthesize without using samples.
  • Controllability: the interface that they picked is handy and easy to manipulate.
  • Real-time: the hardware behaves like a traditional synthesizer and is equally reactive.
  • Ability to stand alone: it can be played without a computer.

As the developers note in their NVIDIA Developer Forum post about this project: “The model is based on a modified Neural-Source Filter architecture, which allows real-time descriptor-based synthesis of percussive sounds.”

Neurorack uses PyTorch deep audio synthesis models (see Figure 1) to produce sounds that would typically require samples; it is easy to manipulate and doesn’t require a separate computer.

A diagram showing how the Neurorack is architected with its own hardware and NVIDIA Jetson Nano.
Figure 1: The diagram shows the overall structure of the module and the relations between the hardware and software (green) components.

The hardware features four control voltage (CV) inputs and two gates (along with a screen, rotary, and button for handling the menus), which all communicate with specific Python libraries. The behavior of these controls (and the module itself) is highly dependent on the type of deep model embedded. For this first version of the Neurorack, the developers implemented a descriptor-based impact sounds generator, described in their GitHub documentation.
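
To give a flavor of what a descriptor-conditioned source-filter model looks like, here is a toy PyTorch sketch: a sine-plus-noise excitation is shaped by a small convolutional "filter" network conditioned on descriptor values. It only illustrates the general idea; the Neurorack's actual Neural-Source Filter variant is described in the team's paper and repository.

import torch
import torch.nn as nn

class ToySourceFilter(nn.Module):
    # Toy source-filter sketch: a sine+noise excitation shaped by a small
    # convolutional "filter" network conditioned on descriptors. Illustrative
    # only; the Neurorack's model differs in architecture and training.
    def __init__(self, n_descriptors=4, channels=32):
        super().__init__()
        self.cond = nn.Linear(n_descriptors, channels)
        self.filter = nn.Sequential(
            nn.Conv1d(1 + channels, channels, kernel_size=3, padding=2, dilation=2),
            nn.Tanh(),
            nn.Conv1d(channels, 1, kernel_size=1),
        )

    def forward(self, f0_hz, descriptors, n_samples, sr=16000):
        t = torch.arange(n_samples) / sr
        source = torch.sin(2 * torch.pi * f0_hz * t) + 0.1 * torch.randn(n_samples)
        cond = self.cond(descriptors)[:, None].expand(-1, n_samples)  # (channels, T)
        x = torch.cat([source[None, :], cond], dim=0)[None]           # (1, 1+channels, T)
        return self.filter(x).squeeze()                               # (T,) waveform

synth = ToySourceFilter()
audio = synth(f0_hz=110.0, descriptors=torch.rand(4), n_samples=16000)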

The Eurorack hardware and software were developed with equal contributions from Ninon Devis, Philippe Esling, and Martin Vert on the ACIDS team. According to their website, ACIDS is “hell-bent on trying to model musical creativity by developing innovative artificial intelligence models.”

The project code and hardware design are free, open-source, and available in their GitHub repository.

The team hopes to make the project accessible to musicians and people interested in AI/embedded computing as well.

“We hope that this project will raise the interest of both communities! Right now reproducing the project is slightly technical, but we will be working on simplifying the deployment and hopefully finding other weird people like us,” Devis said. “We strongly believe that one of the key aspects in developing machine learning models for music will lead to the empowerment of creative expression, even for nonexperts.”

More detail on the science behind this project is available on their website and in their academic paper.

Two of the team members, Devis and Esling, have formed a band using the instruments they developed. They are currently working on a full-length live act, which will feature the Neurorack, and they plan to perform at the next SynthFest in France this April.

Sign up now for Jetson Developer Day taking place at NVIDIA GTC on March 21. This full-day event led by world-renowned experts in robotics, edge AI, and deep learning, will give you a unique deep dive into building next-generation AI-powered applications and autonomous machines.