Categories
Misc

Machine Learning Frameworks Interoperability. Part 2: Data Loading and Data Transfer Bottlenecks

Introduction

Efficient pipeline design is crucial for data scientists. When composing complex end-to-end workflows, you may choose from a wide variety of building blocks, each of them specialized for a dedicated task. Unfortunately, repeatedly converting between data formats is an error-prone and performance-degrading endeavor. Let’s change that!

In this post series, we discuss different aspects of efficient framework interoperability:

  • In the first post, we discussed the pros and cons of distinct memory layouts, as well as memory pools for asynchronous memory allocation, to enable zero-copy functionality.
  • In this post, we highlight bottlenecks occurring during data loading/transfers and how to mitigate them using Remote Direct Memory Access (RDMA) technology.
  • In the third post, we dive into the implementation of an end-to-end pipeline demonstrating the discussed techniques for optimal data transfer across data science frameworks.

To learn more on framework interoperability, check out our presentation at NVIDIA’s GTC 2021 Conference.

Data loading and data transfer bottlenecks

Data loading bottleneck

Thus far, we have worked on the assumption that the data is already loaded in memory and that a single GPU is used. This section highlights a few bottlenecks that might occur when loading your dataset from storage to device memory, or when transferring data between two GPUs, in either a single-node or a multi-node setting. We then discuss how to overcome them.

In a traditional workflow (Figure 1), when a dataset is loaded from storage to GPU memory, the data is copied from disk to GPU memory using the CPU and the PCIe bus. Loading the data requires at least two copies: the first when transferring the data from storage to host memory (CPU RAM), and the second when transferring the data from host memory to device memory (GPU VRAM).

A disk drive, a CPU, a GPU, and the system memory connected through a PCI Express switch. Data flows through all the elements.
Figure 1: Data movement between the storage, CPU memory, and GPU memory in a traditional setting.
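To make the two-copy pattern concrete, here is a minimal sketch in Python using NumPy and CuPy. The file name is hypothetical and serves only to illustrate the staging through host memory; it is not part of the original pipeline.

```python
import numpy as np
import cupy as cp

# Copy 1: storage -> host memory (CPU RAM).
# "dataset.npy" is a hypothetical file used only for illustration.
host_array = np.load("dataset.npy")

# Copy 2: host memory -> device memory (GPU VRAM) over PCIe.
device_array = cp.asarray(host_array)
```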

Alternatively, in a GPU-based workflow that leverages NVIDIA Magnum IO GPUDirect Storage technology (see Figure 2), the data can flow directly from storage to GPU memory over the PCIe bus, without involving either the CPU or the host memory. Since the data is copied only once, the overall execution time decreases. Keeping the CPU and host memory out of this task also leaves those resources available for other CPU-based jobs in your pipeline.

A disk drive, a CPU, a GPU, and the system memory connected through a PCI Express switch. Data flows from the disk to the GPU.
Figure 2: Data movement between the storage and the GPU memory when GPUDirect Storage technology is enabled.
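As a rough sketch of how this can look in Python, the RAPIDS kvikio library exposes cuFile (GPUDirect Storage) bindings. This is an assumption about your environment, not part of the original post; the file name is hypothetical, and kvikio should fall back to a compatibility path when GDS is not available.

```python
import cupy as cp
import kvikio

# Destination buffer allocated directly in GPU memory.
gpu_buf = cp.empty(1 << 20, dtype=cp.uint8)

# "dataset.bin" is a hypothetical file; read() moves the bytes from
# storage into GPU memory, using GPUDirect Storage when available.
f = kvikio.CuFile("dataset.bin", "r")
f.read(gpu_buf)
f.close()
```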

Intra-node data transfer bottleneck

Some workloads require data exchange between two or more GPUs located in the same node (server). When NVIDIA GPUDirect Peer to Peer technology is unavailable, the data from the source GPU is first copied to host-pinned shared memory through the CPU and the PCIe bus, and then copied from host-pinned shared memory to the target GPU, again through the CPU and the PCIe bus. The data is therefore copied twice before reaching its destination, and the CPU and host memory are both involved in the process. Figure 3 depicts this data movement.

A picture of two GPUs, a CPU, a PCIe bus and some system memory in the same node, and an animation of the data movement between the source GPU to a buffer in the system memory, and from there to the target GPU.
Figure 3: Data movement between two GPUs in the same node when NVIDIA GPUDirect P2P is unavailable.
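A minimal CuPy sketch of this staged transfer follows. The device IDs, buffer size, and use of a pinned staging buffer are illustrative assumptions, not a prescription from the original post.

```python
import numpy as np
import cupy as cp

nbytes = 1 << 20  # 1 MiB, arbitrary example size

with cp.cuda.Device(0):
    src = cp.arange(nbytes, dtype=cp.uint8)      # source buffer on GPU 0

# Host-pinned (page-locked) staging buffer, viewed as a NumPy array.
pinned = cp.cuda.alloc_pinned_memory(nbytes)
staging = np.frombuffer(pinned, dtype=np.uint8, count=nbytes)

# Copy 1: source GPU -> host-pinned memory.
src.get(out=staging)

# Copy 2: host-pinned memory -> target GPU.
with cp.cuda.Device(1):
    dst = cp.asarray(staging)
```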

When GPUDirect Peer to Peer technology is available, copying data from a source GPU to another GPU in the same node no longer requires temporarily staging the data in host memory. If both GPUs are attached to the same PCIe bus, GPUDirect P2P lets them access each other's memory without involving the CPU. This halves the number of copy operations needed to perform the same task. Figure 4 depicts this behavior.

A picture of two GPUs, a CPU, a PCIe bus and some system memory in the same node, and an animation of the data movement between the source GPU to the target GPU, without temporarily staging the data in the host memory.
Figure 4: Data movement between two GPUs in the same node when NVIDIA GPUDirect P2P is enabled.
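Below is a minimal CuPy sketch of the direct device-to-device path, assuming peer access is supported between two hypothetical devices 0 and 1; the details are an illustration, not the post's reference implementation.

```python
import cupy as cp

with cp.cuda.Device(0):
    src = cp.arange(1 << 20, dtype=cp.uint8)   # source buffer on GPU 0

with cp.cuda.Device(1):
    # Enable peer access from GPU 1 to GPU 0 (raises if already enabled).
    if cp.cuda.runtime.deviceCanAccessPeer(1, 0):
        cp.cuda.runtime.deviceEnablePeerAccess(0)

    dst = cp.empty_like(src)                   # destination buffer on GPU 1
    # One device-to-device copy; no host staging involved.
    dst.data.copy_from_device(src.data, src.nbytes)
```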

Inter-node data transfer bottleneck

In a multi-node environment where NVIDIA GPUDirect Remote Direct Memory Access technology is unavailable, transferring data between two GPUs in different nodes requires five copy operations:

  • The first copy occurs when transferring the data from the source GPU to a buffer of host-pinned memory in the source node.
  • Then, that data is copied to the NIC’s driver buffer of the source node.
  • In a third step, the data is transferred through the network to the NIC’s driver buffer of the target node.
  • A fourth copy happens when copying the data from the target node’s NIC driver buffer to a buffer of host-pinned memory in the target node.
  • The last step requires copying the data to the target GPU using the PCIe bus.

That makes a total of five copy operations. Quite a journey, isn’t it? Figure 5 depicts the process just described.

A picture of two nodes connected through a network. Each node has two GPUs, a CPU, a PCIe bus and some system memory. The data movement between the source GPU and the target GPU is represented by animation, depicting five data copies during that process.
Figure 5: Data movement between two GPUs in different nodes when NVIDIA GPUDirect RDMA is not available.
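As an illustration, a non-CUDA-aware MPI setup in Python (mpi4py) forces exactly this kind of host staging in user code on both sides. Ranks, sizes, and tags below are arbitrary assumptions for the sketch.

```python
# Hypothetical two-rank job: rank 0 sends a GPU buffer to rank 1.
from mpi4py import MPI
import numpy as np
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 20
if rank == 0:
    gpu_buf = cp.arange(n, dtype=cp.float32)
    host_buf = cp.asnumpy(gpu_buf)           # GPU -> host copy
    comm.Send(host_buf, dest=1, tag=0)       # host -> NIC -> network
else:
    host_buf = np.empty(n, dtype=np.float32)
    comm.Recv(host_buf, source=0, tag=0)     # network -> NIC -> host
    gpu_buf = cp.asarray(host_buf)           # host -> GPU copy
```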

With GPUDirect RDMA enabled, the number of data copies is reduced to just one. No more intermediate data copies in shared pinned memory: the data is copied directly from the source GPU to the target GPU in a single operation. That saves four unnecessary copy operations compared to a traditional setting. Figure 6 depicts this scenario.

A picture of two nodes connected using a network. Each node has two GPUs, a CPU, a PCIe bus and some system memory. The data is copied once to while being transferred from the source GPU to the target GPU.
Figure 6: Data movement between two GPUs in different nodes when NVIDIA GPUDirect RDMA is enabled.
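With a CUDA-aware MPI build that supports GPUDirect RDMA, mpi4py can accept CuPy arrays directly, so the staging copies disappear from user code. This sketch assumes such a build is available; it mirrors the previous example with the host buffers removed.

```python
# Assumes a CUDA-aware MPI build with GPUDirect RDMA support.
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n = 1 << 20
if rank == 0:
    gpu_buf = cp.arange(n, dtype=cp.float32)
    comm.Send(gpu_buf, dest=1, tag=0)    # GPU buffer handed directly to MPI
else:
    gpu_buf = cp.empty(n, dtype=cp.float32)
    comm.Recv(gpu_buf, source=0, tag=0)  # received straight into GPU memory
```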

Conclusion

In this second post, you learned how to exploit NVIDIA GPUDirect functionality to further accelerate the data loading and data distribution stages of your pipeline.

In the third part of our trilogy, we will dive into the implementation details of a medical data science pipeline for the outlier detection of heartbeats in a continuously measured electrocardiogram (ECG) stream.

Categories
Misc

Big Computer on Campus: Universities Graduate to AI Super Systems

This back-to-school season, many universities are powering on brand new AI supercomputers. Researchers and students working in fields from basic science to liberal arts can’t wait to log on. “They would like to use it right now,” said James Wilgenbusch, director of research computing at the University of Minnesota, speaking of Agate, an accelerated supercomputer…

Categories
Misc

NVIDIA Announces New Omniverse Educational Programs

NVIDIA Omniverse is key to bringing advanced architectural design concepts combining AI, simulation and collaborative workflows to the next generation of students.

NVIDIA announced at SIGGRAPH several additions to its Deep Learning Institute (DLI) curriculum, including an introductory course to Pixar’s Universal Scene Description (USD) and a teaching kit for educators, looking to incorporate hands-on technical training into graphics, architectural design, and digital media production coursework. 

The teaching kit will be based on NVIDIA Omniverse, an open platform for virtual collaboration and real-time simulation for quick and easy creation of photorealistic, physically accurate designs.

NVIDIA is collaborating with Professor Don Greenberg from Cornell University

Professor Don Greenberg will be joining NVIDIA to codevelop Omniverse-based lecture materials, hands-on exercises, and strategies for early-stage design, which can be integrated into class curricula. Greenberg is Cornell University’s Jacob Gould Schurman Professor of Computer Graphics in the College of Architecture, Art, and Planning’s Department of Architecture and Director of the Cornell Program of Computer Graphics.

Greenberg’s expertise is extensive: he has authored hundreds of articles and educated thousands of computer graphics students, architects, and structural engineers. He won the Steven A. Coons Award in 1987, and two of his students (Rob Cook and Michael Cohen) have also won it. Among Greenberg’s students, six have won the SIGGRAPH Achievement Award, and 16 have won Academy Awards for Scientific and Technical Achievements.

“Omniverse, combined with the computational power of new NVIDIA graphics boards, holds the incredible potential to improve early design strategies. This is the time in the design cycle when the most important design decisions are made. By combining the sketched ideas with rapid feedback from parametric studies, such as energy performance, structural integrity, life cycle costing, or the overall appearance or view options from the nonexistent simulated building in context, better and more comprehensive solutions can be obtained from sketch to reality,” said Greenberg. “I am really excited to start this collaboration with NVIDIA and make these next-generation design tools available to other schools and the profession.”

New DLI Graphics and Omniverse Teaching Kit

Designed for educators, the new Graphics & Omniverse Teaching Kit is being developed in consultation with top digital media, film, game development, animation, and visual effects schools as part of the NVIDIA Studio Education Partner Program.

Comprehensive and modular, the kit will include lecture materials, quiz problem sets, hands-on exercises, NVIDIA GPU Cloud resources, and more. It is ideal for college and university educators looking to bring graphics and NVIDIA Omniverse into their classrooms, enabling the next generation of creatives. Students will have the opportunity to receive certificates of competency to support career growth. 

The kit will also incorporate teaching materials from The Graphics Codex, an in-depth computer rendering resource covering ray tracing, materials, GPU programming, and human perception. The Graphics Codex was written by Morgan McGuire, the Chief Scientist at Roblox and Computer Science professor at the University of Waterloo and McGill University. Also included is the new Ray Tracing Gems II ebook, authored by Adam Marrs, which covers Next Generation Real-Time Rendering with DXR, Vulkan, and OptiX. Marrs is a senior graphics engineer in the Game Engines and Core Technology group at NVIDIA, where he works on real-time rendering for games and film.

More content is being added, and early-access applications for educators are now available.

New DLI Training for Everyone 

For those wanting to start using Omniverse to create their own masterpieces but not sure where to start, NVIDIA launched a new DLI training course called Getting Started with Universal Scene Description for Collaborative 3D Workflows.

This self-paced, introductory training course familiarizes users with Universal Scene Description (USD), a framework created by Pixar for the interchange of 3D computer graphics data. USD focuses on collaboration, nondestructive editing, and multiple views and opinions on graphics data. 

The course covers the history and purpose of USD, an overview of scene composition using Python, and includes a series of hands-on exercises consisting of training videos and live scripted examples. Developers will master important concepts such as layer composition, references, and variants. 
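For a taste of what those exercises involve, here is a minimal sketch using Pixar’s USD Python API (pxr) that touches prim definition, references, and variants. The file names and variant names are made up for illustration and are not part of the course materials.

```python
from pxr import Usd, UsdGeom

# Create a new stage and define a simple prim hierarchy.
stage = Usd.Stage.CreateNew("scene.usda")
world = UsdGeom.Xform.Define(stage, "/World")
sphere = UsdGeom.Sphere.Define(stage, "/World/Sphere")

# Reference another (hypothetical) layer into this stage.
ref_prim = stage.DefinePrim("/World/Asset")
ref_prim.GetReferences().AddReference("asset.usda")

# Add a variant set with two named variants and select one.
vset = ref_prim.GetVariantSets().AddVariantSet("lod")
vset.AddVariant("high")
vset.AddVariant("low")
vset.SetVariantSelection("high")

stage.GetRootLayer().Save()
```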

This course can be taken anytime, anywhere, with just a computer and an Internet connection—enroll here.

Also released at SIGGRAPH is Masterclass by the Masters – Using Omniverse for Artistic 3D Workflows, a new video series focused on how Omniverse can be customized for artistic and creative workflows. Viewers will experience a collection of vignettes made by experts and creative masters, plus live-edit, multi-app workflows using Omniverse Connectors, as well as scene composition and rendering in Omniverse Create.

DLI’s top-rated Fundamentals of Deep Learning will be available at a special offer to all SIGGRAPH attendees throughout August and September. Use the code DLI_SIGGRAPH21 to receive 25% off this workshop. This instructor-led course takes participants through the workings of deep learning with hands-on exercises in computer vision and natural language processing. Those who complete the course will earn a certificate demonstrating their subject matter competency and supporting career growth. 

These courses are just a fraction of what is offered through NVIDIA DLI. Since 2017, DLI has trained more than 300,000 developers through an extensive catalog of training offerings. 

For other NVIDIA announcements and events at this year’s SIGGRAPH, including even more training opportunities, visit the SIGGRAPH page.

Categories
Misc

Investing in Developer Communities Across Africa: NVIDIA AI Emerging Chapters and Python Ghana

Developers across Africa honed their skills in recent online trainings made possible by the NVIDIA AI Emerging Chapters and Python Ghana collaboration.

This is a guest-submitted post by Michael Young, co-founder and executive board member at Python Ghana, Python Software Foundation Fellow, and PyCon Africa Executive.

“The rate of technological change is the defining characteristic of our generation. Its impact on work, labor, how people live, our social and political interactions, have all been and are being transformed by the digital revolution.” – Tony Blair, former UK prime minister.

Although Africa accounts for around 17% of the world’s population, it only contributes about 3% of the global GDP. It is a known fact that one of the major drivers directly influencing economic growth is the ability to efficiently use technology. 

In Africa, several constraints impede adequate utilization of technology, such as the unavailability of accessible Internet, expensive and unaffordable data bundles from local providers, and a lack of educational reform to bridge the skills gap between academia and industry. 

Despite poor incentives by governmental bodies to improve technological literacy rates, Africa is still being touted as the next significant growth market by global economic institutions. This could be attributed to its resilience to thrive despite challenging odds.

Recent global events such as the COVID-19 pandemic have fundamentally challenged traditional structures and business models. Hafez Ghanem, the World Bank Vice President for Africa, said, “The COVID-19 pandemic is testing the limits of societies and economies across the world, and African countries are likely to be hit particularly hard.” As a result, intelligent, data-driven solutions will be crucial to enable sustainability.

Emerging technologies like AI, IoT, blockchain technology, and big data are fast-tracking the digital transformation in numerous sectors across the continent. A few multinational companies are conscientiously investing in Africa, providing resources and opportunities to emerging markets with the objective to improve the social adaptation of technology in these regions. As individuals in underserved communities make full use of these resources, more talent becomes available, which consequently attracts and increases investments, accelerating growth.

NVIDIA is taking active steps toward positively contributing to the growing trend of technology in emerging markets with NVIDIA AI Emerging Chapters. The initiative aims to provide communities with the knowledge and resources for aspiring developers to build and scale AI expertise, nurture emerging technologies, and drive innovation. 

Working with Python Ghana, NVIDIA gathered members from across the country to take part in online live training sessions with experts. The training gave participants access to world-class best practices and knowledge to facilitate their development as AI engineers.

Many participants shared feedback on the sessions:

“Taking this course has been an eye opener to me. While undertaking my undergraduate degree as an electrical engineer, I was passionate to find ways faults could be detected in the transformers and other electrical devices. Honestly, after taking this course, I have better understanding and insight. The labs were helpful and quite detailed. I am actually considering opting for a PhD in electrical engineering so I can fully explore the applications of AI to predict faults and downtimes in electrical machines.” — Martha Teye, Applications of AI for Predictive Maintenance 

“I’m so excited today! I completed an expert-led, hands-on workshop on Applications of AI for Anomaly Detection as part of the NVIDIA GTC21 conference. Thanks to the instructor Kevin McFall who was amazing and made tough concepts very easy to grasp. Now, I feel very confident in my ability to build strong anomaly detection models, just bring me the data (labeled or not.) A big thanks to Python Ghana and NVIDIA for giving me the opportunity to participate in this prestigious training. I got an assessment score of 100% and earned a certificate of competency as well!” — Aseda Addai-Deseh, Applications of AI for Anomaly Detection

“The workshop gave me the opportunity to add on to my experience with data manipulation and Machine Learning tools, specifically pandas, numpy, and scikit learn libraries and packages, as I experienced new, and evidently more powerful tools, available on the RAPIDS platform. It was an insightful and exhilarating experience. I love and appreciate data science even more now.” — Kate Abena Cobbinah, Fundamentals of Accelerated Data Science

“The program was very enlightening and very practical. Questions were answered almost immediately. The trainers also knew their stuff and were able to give good insights when they answered questions.” — Ahmad Bilesanmi, Introduction to Data Science

“I’m grateful for a great learning experience at the Deep Learning Institute at this year’s NVIDIA GTC. A great exposition on GPU-accelerated data science. We were exposed to our usual pandas, numpy, scikit-learn etc. on steroids. I learned, first-hand, the utility of their GPU-equivalents in CuDF, CuPy, and CuML. Imagine loading 58,479,894 rows of data in 2 seconds with CuDF compared to 30 seconds with pandas.” — Kwodwo Graham, Fundamentals of Accelerated Data Science with RAPIDS

Categories
Misc

What is a Machine Learning Model?

When you shop for a car, the first question is what model — a Honda Civic for low-cost commuting, a Chevy Corvette for looking good and moving fast, or maybe a Ford F-150 to tote heavy loads. For the journey to AI, the most transformational technology of our time, the engine you need is a…

Categories
Misc

Tensorflow Lite supported Machine Learning Models

Hi

I am relatively new to TensorFlow Lite. I would like to ask whether TensorFlow Lite supports deep learning models only, or whether we can also deploy other models such as LDA and SVM through TensorFlow Lite.

submitted by /u/usmanqadir91

Categories
Misc

Differentiation in JAX with Simple Examples

submitted by /u/yasserius

Categories
Misc

Validating Tensorflow lite micro research project

Hi everyone, I’m currently preparing for a research project that encapsulates TinyML (TensorFlow Lite Micro) and communication protocols. Due to the lack of researchers in TinyML in my area, I would much appreciate it if you knowledgeable folks in embedded devices, TensorFlow, and distributed learning could help me validate my research idea ;). Feel free to pm me if you want more information.

submitted by /u/dieselVeasel

Categories
Misc

object detection: object belonging to two classes

Say an object belongs to two classes (a human might place it in either of the two). Can that be handled by labeling it with two overlapping bounding boxes, one for each class?

Does the ssd model as implemented in the object detection api handle the two boxes independently during training? From my understanding of ssd, it makes an independent membership-prediction for each of the N+1 classes (including a “no object” prediction) so I think it should work out, but I might have misunderstood.

submitted by /u/Meriipu

Categories
Misc

How to vectorize an audio sample

Hi, I can’t figure out how to vectorize audio. I want to create a sound detector with an autoencoder.

A word (command) is the input, and the output is the same command sampled at a higher definition; this is just for avoiding errors from transmission or degraded audio.

thanks

submitted by /u/paulred70