Categories
Misc

How to create .proto files for TF GRPC Serve Predict endpoint with gRPC?

I’ve deployed my own model with TF Serving in Docker. I’d like to consume that from a C# app via gRPC. So I guess I should somehow create the .proto files from which to generate the C# classes. But how would I know the exact gRPC contract in order to create the .proto files?

submitted by /u/Vasilkosturski
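Not an authoritative answer, but a pointer that may help: the gRPC contract for TF Serving is already published as .proto files in the tensorflow/serving repository, under tensorflow_serving/apis/ (prediction_service.proto, predict.proto, model.proto), together with the TensorFlow core protos they import (tensor.proto, tensor_shape.proto, types.proto, resource_handle.proto). Copying those files into the C# project and running the gRPC codegen over them is usually easier than reverse-engineering the contract. As a rough sketch, the Predict service boils down to:

```proto
// Sketch only -- copy the real files from tensorflow_serving/apis/
// rather than hand-writing them; they carry the authoritative contract.
syntax = "proto3";
package tensorflow.serving;

import "tensorflow_serving/apis/predict.proto";  // PredictRequest / PredictResponse

service PredictionService {
  // The endpoint a generated C# client would call.
  rpc Predict(PredictRequest) returns (PredictResponse);
}
```

The generated client is then pointed at TF Serving's gRPC port (8500 by default).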


Inside the DPU: Talk Describes an Engine Powering Data Center Networks

The tech world this week gets its first look under the hood of the NVIDIA BlueField data processing unit. The chip invented the category of the DPU last year, and it’s already being embraced by cloud services, supercomputers and many OEMs and software partners. Idan Burstein, a principal architect leading our Israel-based BlueField design team, Read article >

The post Inside the DPU: Talk Describes an Engine Powering Data Center Networks appeared first on The Official NVIDIA Blog.


Make History This GFN Thursday: ‘HUMANKIND’ Arrives on GeForce NOW

This GFN Thursday brings in the highly anticipated magnum opus from SEGA and Amplitude Studios, HUMANKIND, as well as exciting rewards to redeem for members playing Eternal Return. There’s also updates on the newest Fortnite Season 7 game mode, “Impostors,” streaming on GeForce NOW. Plus, there are nine games in total coming to the cloud Read article >

The post Make History This GFN Thursday: ‘HUMANKIND’ Arrives on GeForce NOW appeared first on The Official NVIDIA Blog.


NVIDIA at INTERSPEECH 2021

NVIDIA researchers are presenting five papers on our groundbreaking research in speech recognition and synthesis at INTERSPEECH 2021.

Researchers from around the world working on speech applications are gathering this month for INTERSPEECH, a conference focused on the latest research and technologies in speech processing. NVIDIA researchers will present five papers on groundbreaking research in speech recognition and speech synthesis.

Conversational AI research is fueling innovations in speech processing that help computers communicate more like humans and add value to organizations.

Accepted papers from NVIDIA at this year’s INTERSPEECH feature the newest speech technology advancements, from free, fully formatted speech datasets to new model architectures that deliver state-of-the-art performance.

Here are a couple of featured projects:

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction 
Authors: Stanislav Beliaev, Boris Ginsburg
From the abstract: This model has only 13.2M parameters, almost 2x less than the present state-of-the-art text-to-speech models. The non-autoregressive architecture allows for fast training and inference. The small model size and fast inference make TalkNet an attractive candidate for embedded speech synthesis.

This talk will be live on Thursday, September 2, 2021 at 4:45 pm CET, 7:45 am PST

Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices
Authors: Gonçalo Mordido, Matthijs Van Keirsbilck, Alexander Keller
From the abstract: For command recognition on Google Speech Commands v1, we improve the state-of-the-art accuracy from 97.21% to 97.41% at the same network size. For speech recognition on Librispeech, we halve the number of weights to be trained while only sacrificing about 1% of the floating-point baseline’s word error rate.

This talk will be live on Friday, September 3, 2021 at 4 pm CET, 7 am PST

View the full schedule of NVIDIA activities >>>


NVIDIA Announces Financial Results for Second Quarter Fiscal 2022

NVIDIA today reported record revenue for the second quarter ended August 1, 2021, of $6.51 billion, up 68 percent from a year earlier and up 15 percent from the previous quarter, with record revenue from the company’s Gaming, Data Center and Professional Visualization platforms.


TensorFlow 2.0 Computer Vision Cookbook eBook

submitted by /u/insanetech_

Best resource for TF basics?

Hi. I’ve gone through a good deal of Geron’s HOML, but I feel like I’m lacking in the ‘basics’ of TensorFlow. Are there any good ‘pre-HOML’ resources that you can recommend to build a firmer foundation? Thanks!

submitted by /u/disdainty

How to install cuda and cudnn in ubuntu 20.04

Hi there, I’m having problems installing CUDA and cuDNN on Ubuntu 20.04. My machine has a 54000h CPU and a GTX 1650 GPU.

submitted by /u/WorryNo7966

Machine Learning Frameworks Interoperability. Part 2: Data Loading and Data Transfer Bottlenecks

Introduction

Efficient pipeline design is crucial for data scientists. When composing complex end-to-end workflows, you may choose from a wide variety of building blocks, each of them specialized for a dedicated task. Unfortunately, repeatedly converting between data formats is an error-prone and performance-degrading endeavor. Let’s change that!

In this post series, we discuss different aspects of efficient framework interoperability:

  • In the first post, we discussed the pros and cons of distinct memory layouts, as well as memory pools for asynchronous memory allocation, to enable zero-copy functionality.
  • In this post, we highlight bottlenecks occurring during data loading/transfers and how to mitigate them using Remote Direct Memory Access (RDMA) technology.
  • In the third post, we dive into the implementation of an end-to-end pipeline demonstrating the discussed techniques for optimal data transfer across data science frameworks.

To learn more on framework interoperability, check out our presentation at NVIDIA’s GTC 2021 Conference.

Data loading and data transfer bottlenecks

Data loading bottleneck

Thus far, we have worked on the assumption that the data is already loaded in memory and that a single GPU is used. This section highlights a few bottlenecks that might occur when loading your dataset from storage into device memory, or when transferring data between two GPUs in either a single-node or a multi-node setting. We then discuss how to overcome them.

In a traditional workflow (Figure 1), when a dataset is loaded from storage into GPU memory, the data is copied from disk to GPU memory through the CPU and the PCIe bus. Loading the data requires at least two copies: the first when transferring the data from storage to host memory (CPU RAM), and the second when transferring it from host memory to device memory (GPU VRAM).

A disk drive, a CPU, a GPU, and the system memory connected through a PCI Express switch. Data flows through all the elements.
Figure 1: Data movement between the storage, CPU memory, and GPU memory in a traditional setting.

Alternatively, in a GPU-based workflow that leverages NVIDIA Magnum IO GPUDirect Storage technology (see Figure 2), the data can flow directly from storage to GPU memory over the PCIe bus, without involving either the CPU or the host memory. Since the data is copied only once, the overall execution time decreases. Keeping the CPU and host memory out of this task also leaves those resources available for other CPU-based jobs in your pipeline.

A disk drive, a CPU, a GPU, and the system memory connected through a PCI Express switch. Data flows from the disk to the GPU.
Figure 2: Data movement between the storage and the GPU memory when GPUDirect Storage technology is enabled.
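As a purely illustrative sketch (no real GPU or GPUDirect APIs involved), the difference between the two load paths comes down to how many buffer-to-buffer copies the data makes; here, `list()` stands in for a DMA copy:

```python
# Toy model of the two load paths from the text above; each function
# returns the data "on the GPU" plus the number of copies it made.

def load_traditional(storage):
    host_ram = list(storage)    # copy 1: storage -> host memory (bounce buffer)
    gpu_vram = list(host_ram)   # copy 2: host memory -> GPU memory
    return gpu_vram, 2

def load_with_gds(storage):
    gpu_vram = list(storage)    # single direct DMA copy: storage -> GPU memory
    return gpu_vram, 1

data, copies = load_traditional([0.5, 1.5, 2.5])
print(copies)   # the traditional path makes two copies
_, copies = load_with_gds([0.5, 1.5, 2.5])
print(copies)   # the GPUDirect Storage path makes one
```

The payload arrives identical either way; only the number of intermediate buffers, and hence the transfer time, differs.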

Intra-node data transfer bottleneck

Some workloads require data exchange between two or more GPUs located in the same node (server). In a scenario where NVIDIA GPUDirect Peer to Peer technology is unavailable, the data from the source GPU is first copied to host-pinned shared memory through the CPU and the PCIe bus. Then, the data is copied from the host-pinned shared memory to the target GPU, again through the CPU and the PCIe bus. Note that the data is copied twice before reaching its destination, and both the CPU and host memory are involved in the process. Figure 3 depicts this data movement.

A picture of two GPUs, a CPU, a PCIe bus and some system memory in the same node, and an animation of the data movement between the source GPU to a buffer in the system memory, and from there to the target GPU.
Figure 3: Data movement between two GPUs in the same node when NVIDIA GPUDirect P2P is unavailable.

When GPUDirect Peer to Peer technology is available, copying data from a source GPU to another GPU in the same node no longer requires temporarily staging the data in host memory. If both GPUs are attached to the same PCIe bus, GPUDirect P2P allows each GPU to access the other’s memory without involving the CPU, halving the number of copy operations needed to perform the same task. Figure 4 depicts this behavior.

A picture of two GPUs, a CPU, a PCIe bus and some system memory in the same node, and an animation of the data movement between the source GPU to the target GPU, without temporarily staging the data in the host memory.
Figure 4: Data movement between two GPUs in the same node when NVIDIA GPUDirect P2P is enabled.
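The staged-versus-direct distinction can be sketched the same way, again in plain Python with a hypothetical `p2p_available` flag standing in for the runtime peer-access check:

```python
def intra_node_transfer(src_gpu_buf, p2p_available):
    """Return (data_on_target_gpu, hops) for a same-node GPU-to-GPU copy."""
    if p2p_available:
        # GPUDirect P2P: the target GPU accesses peer memory over PCIe directly.
        return list(src_gpu_buf), ["src GPU -> dst GPU"]
    # Fallback: stage through host-pinned shared memory via the CPU.
    pinned = list(src_gpu_buf)  # hop 1: src GPU -> host-pinned buffer
    dst = list(pinned)          # hop 2: host-pinned buffer -> dst GPU
    return dst, ["src GPU -> pinned", "pinned -> dst GPU"]

_, hops = intra_node_transfer([7, 8], p2p_available=False)
print(len(hops))   # two copies without P2P
_, hops = intra_node_transfer([7, 8], p2p_available=True)
print(len(hops))   # one copy with P2P
```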

Inter-node data transfer bottleneck

In a multi-node environment where NVIDIA GPUDirect Remote Direct Memory Access technology is unavailable, transferring data between two GPUs in different nodes requires five copy operations:

  • The first copy occurs when transferring the data from the source GPU to a buffer of host-pinned memory in the source node.
  • Then, that data is copied to the NIC’s driver buffer of the source node.
  • In a third step, the data is transferred through the network to the NIC’s driver buffer of the target node.
  • A fourth copy happens when copying the data from the target node’s NIC driver buffer to a buffer of host-pinned memory in the target node.
  • The last step requires copying the data to the target GPU using the PCIe bus.

That makes a total of five copy operations. Quite a journey, isn’t it? Figure 5 depicts the process described above.

A picture of two nodes connected through a network. Each node has two GPUs, a CPU, a PCIe bus and some system memory. The data movement between the source GPU and the target GPU is represented by animation, depicting five data copies during that process.
Figure 5: Data movement between two GPUs in different nodes when NVIDIA GPUDirect RDMA is not available.
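The five hops above, and the single hop the next paragraph describes, can be written out as an illustrative path table (a toy model, not real networking code):

```python
# Buffers the payload visits without RDMA, mirroring the five-step list above.
NO_RDMA_PATH = [
    "source GPU memory",
    "source host-pinned buffer",
    "source NIC driver buffer",
    "target NIC driver buffer",
    "target host-pinned buffer",
    "target GPU memory",
]

# With GPUDirect RDMA, the NIC reads and writes GPU memory directly on both ends.
RDMA_PATH = ["source GPU memory", "target GPU memory"]

def copy_count(path):
    # One copy per hop between adjacent buffers.
    return len(path) - 1

print(copy_count(NO_RDMA_PATH))  # five copies without RDMA
print(copy_count(RDMA_PATH))     # one copy with RDMA
```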

With GPUDirect RDMA enabled, the number of data copies is reduced to just one: no more intermediate copies in shared pinned memory. The data is copied directly from the source GPU to the target GPU in a single transfer, saving four unnecessary copy operations compared to the traditional setting. Figure 6 depicts this scenario.

A picture of two nodes connected using a network. Each node has two GPUs, a CPU, a PCIe bus and some system memory. The data is copied only once while being transferred from the source GPU to the target GPU.
Figure 6: Data movement between two GPUs in different nodes when NVIDIA GPUDirect RDMA is enabled.

Conclusion

In this second post, you learned how to exploit NVIDIA GPUDirect functionality to further accelerate the data loading and data distribution stages of your pipeline.

In the third part of our trilogy, we will dive into the implementation details of a medical data science pipeline for the outlier detection of heartbeats in a continuously measured electrocardiogram (ECG) stream.


Big Computer on Campus: Universities Graduate to AI Super Systems

This back-to-school season, many universities are powering on brand new AI supercomputers. Researchers and students working in fields from basic science to liberal arts can’t wait to log on. “They would like to use it right now,” said James Wilgenbusch, director of research computing at the University of Minnesota, speaking of Agate, an accelerated supercomputer Read article >

The post Big Computer on Campus: Universities Graduate to AI Super Systems appeared first on The Official NVIDIA Blog.