Categories
Misc

Try NVIDIA Game Development SDKs in the Interactive RTX Technology Showcase

Executable and Project Files Available to Download For Game Developers and Digital Artists

Finding ways to improve performance and visual fidelity in your games and applications is challenging. To help during the game development process, NVIDIA has packaged and released a suite of SDKs through our branch of Unreal Engine for all developers, from independent to AAA, to harness the power of RTX. 

Today, NVIDIA released RTX Technology Showcase – an interactive demo built from NVIDIA’s RTX Unreal Engine 4.26 branch (NvRTX). RTX Technology Showcase project files are also available for further guidance and discovery of the benefits that ray tracing and AI bring to your projects. The application lets you toggle ray-traced reflections, ray-traced translucency, DLSS, RTX Direct Illumination, and RTX Global Illumination to visualize the difference in real time. The ray tracing SDKs are available through NvRTX, while DLSS is available as a UE 4.26 plugin.

  • RTX Direct Illumination lets artists add millions of dynamic lights to game environments in real time, without worrying about performance or resource constraints.
  • RTX Global Illumination provides scalable solutions to compute multi-bounce indirect lighting without bake times, light leaks, or expensive per-frame costs.
  • NVIDIA Real-Time Denoiser is a spatio-temporal, API-agnostic denoising library designed to work with low ray-per-pixel signals.
  • NVIDIA DLSS (Deep Learning Super Sampling) taps into the power of a deep learning neural network to boost frame rates and generate beautiful, sharp images for your games.

Learn more and download the application and project files at developer.nvidia.com/rtx-technology-showcase. 

Check out our game development track at GTC 21 here.

See a full list of our game development announcements at GTC in this blog post. 

Categories
Misc

How to load data efficiently so memory can be utilized

Currently I’m trying to load some images for training purposes. Here is what I’m currently doing:

import numpy as np
from PIL import Image

sats = [np.array(Image.open(cdir + "/x/" + name).convert('RGB'), dtype="float32") for name in names]
masks = [np.array(Image.open(cdir + "/y/" + name), dtype="float32") for name in names]

But this takes up almost all the memory in Colab when running on the full dataset. So my question is: is there a better API I can use that will load the data partially, so I don’t run out of memory?

Thanks.

submitted by /u/maifee
[visit reddit] [comments]
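The partial loading asked about above is what a lazy input pipeline such as tf.data provides: samples are read and decoded batch by batch instead of all up front. A minimal sketch, assuming TensorFlow, the cdir and names variables from the post, and images of uniform size so they can be batched:

import tensorflow as tf

def load_pair(name):
    # Read and decode one image/mask pair on demand instead of
    # holding every decoded array in memory at once.
    sat = tf.io.decode_image(tf.io.read_file(tf.strings.join([cdir + "/x/", name])),
                             channels=3, expand_animations=False)
    mask = tf.io.decode_image(tf.io.read_file(tf.strings.join([cdir + "/y/", name])),
                              expand_animations=False)
    return tf.cast(sat, tf.float32), tf.cast(mask, tf.float32)

dataset = (tf.data.Dataset.from_tensor_slices(names)
           .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(16)
           .prefetch(tf.data.AUTOTUNE))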

Categories
Misc

Accelerating Inference with NVIDIA Triton Inference Server and NVIDIA DALI

When you are working on optimizing inference scenarios for the best performance, you may underestimate the effect of data preprocessing. These are the operations required before forwarding an input sample through the model. This post highlights the impact of the data preprocessing on inference performance and how you can easily speed it up on the GPU, using NVIDIA DALI and NVIDIA Triton Inference Server.

Does preprocessing matter?

Regardless of a particular model that you want to run inference on, some degree of data preprocessing is required. In computer vision applications, the input operations usually include decoding, resizing, and normalizing to a standardized format accepted by the neural network. Speech recognition models, on the other hand, may require calculating certain features, like a spectrogram, and some raw audio sample processing, such as pre-emphasis and dithering.
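To make one of those audio operations concrete, pre-emphasis is just a first-order high-pass filter. A minimal NumPy sketch (the 0.97 coefficient is a common convention, not a value from this post):

import numpy as np

def pre_emphasis(signal, coeff=0.97):
    # y[t] = x[t] - coeff * x[t-1]; boosts high frequencies before
    # features such as spectrograms are computed.
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])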

Often, the preprocessing routines you use for inference are similar to those used in the model’s training input pipeline. Implementing both with the same tools saves you boilerplate and code repetition. Ensuring that the preprocessing operations used for inference are defined exactly as they were when the model was trained is key to achieving high accuracy.

The more complicated the preprocessing pipeline for a given model, the bigger the fraction of the total inference time it takes. This means that accelerating only the network processing time does not yield a proportional improvement in overall inference latency; for example, if preprocessing accounts for 4 ms of a 10 ms request, even an infinitely fast network still leaves 4 ms of latency. This is especially true if you use the CPU to prepare data before feeding it to the model.

What is NVIDIA DALI?

DALI is a data loading and preprocessing library for building highly optimized, custom data processing pipelines for deep learning applications. The set of operations available in DALI includes, but is not limited to, data loading, decoding of multiple image, video, and audio formats, and a wide range of processing operators.

Figure 1. NVIDIA DALI workflow: Input Data → Decode → GPU-Accelerated Augmentations → Preprocessed Data → Training/Inference (MXNet, PaddlePaddle, PyTorch, TensorFlow).

What makes DALI performant is that it offloads most of the preprocessing computation to the GPU. It processes whole batches of data at a time rather than running operators sample by sample, exploiting the parallel nature of GPU computation. Because of that, DALI is successfully used to accelerate the training of many deep learning models in production.

Another advantage of DALI is its portability. After a pipeline is defined, it can be used with most of the popular deep learning frameworks: TensorFlow, PyTorch, MXNet, and PaddlePaddle. However, DALI’s utility is not limited to training. After you have trained your model with DALI as the preprocessing library, you can use the corresponding data processing pipeline for inference.
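For example, an already-defined DALI pipeline can be hooked into PyTorch in a few lines. A minimal sketch, where pipe is assumed to be a DALI pipeline instance with a single output mapped to the name "data", and num_samples is the number of samples in the dataset:

from nvidia.dali.plugin.pytorch import DALIGenericIterator

pipe.build()  # no-op if the pipeline is already built

# Wrap the DALI pipeline as a PyTorch-style iterator.
train_iter = DALIGenericIterator([pipe], output_map=["data"], size=num_samples)
for batch in train_iter:
    images = batch[0]["data"]  # a torch.Tensor, already resident on the GPU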

Triton model ensembles

Triton Inference Server greatly simplifies the deployment of AI models at scale in production. It is open-source software that supports multiple backends for running neural network inference. However, running inference may require a more complex pipeline that also includes preprocessing and postprocessing stages. Not all stages are always present, but whenever your use case involves more than just computing the output of a neural network, Triton Server comes with a convenient solution that simplifies building such pipelines.

Among many useful scheduling mechanisms that the Triton Server platform provides is the ensemble scheduler, which is responsible for pipelining models participating in the inference process while ensuring efficiency and optimizing throughput.

Figure 2. Triton model ensemble scheme.

Using ensembles in Triton Server is easy and requires only a single additional configuration file describing the pipeline. This file defines a special ensemble model that acts as a facade encapsulating the whole inference process. With such a setup, you can send requests directly to the ensemble model, which hides the complexity of the pipeline. This can also reduce communication overhead, as the preprocessed data already resides on the GPU used to run the inference.

Introducing DALI backend

Here’s a real-life example: an image classification system. A picture is captured on an edge device and sent to a frontend service. The frontend delegates inference to Triton Server, which runs an image classification network, for example Inception v3. Typically, such networks require a decoded, normalized, and resized image as input. Running those operations inside the client service (Figure 3) is time-consuming. On top of that, decoded images increase network traffic, as they are bigger than encoded images (Table 1). Still, this solution might be tempting, as it is easy to implement with popular libraries like OpenCV.

Resolution | Decoded image | Preprocessed image for Inception v3 | Encoded image
720p       | 3.1 MB        | 1 MB                                | 500 kB
1080p      | 6.2 MB        | 1 MB                                | 700 kB
Table 1. Size comparison of the images for a given resolution. The encoded image size depends highly on its content and compression type.
Figure 3. Triton Server inference with client preprocessing.

On the other hand, a significantly more performant scenario is the one that implements the preprocessing pipeline as a Triton Server backend (Figure 4). In this case, you can take advantage of the GPUs that are already used by the server. The role of the frontend service is now reduced to handling requests from edge devices and sending encoded images directly to Triton Server. This simplifies the architecture of the cloud system as all computationally intensive tasks are moved to the Triton Server, which can be easily scaled later.

The preprocessing part could be implemented as a custom backend, but that is quite complicated and low-level. It could also be written in one of the frameworks supported by Triton Server. However, if you have already trained your network using a DALI input pipeline, you would probably like to reuse this code for inference.

Figure 4. Triton Server inference with server-side preprocessing using a custom or framework backend.

This is where the DALI backend comes in handy. Although DALI was initially designed to remove the preprocessing bottleneck during training, some of its features are just as useful for inference. In the image classification example, you put together a model ensemble, where the first step decodes, resizes, and normalizes the images using DALI GPU operators and sends the input data straight to the inference step (Figure 5).

Figure 5. Triton Server inference with server preprocessing using the DALI backend.

Image classification using Inception v3

How do you obtain a DALI model and put it into the model repository? Look at an example from the DALI backend repository.

Inception v3 is an example of an image classification neural network. All three of the preprocessing operations needed by this model (JPEG decoding, resizing, and normalizing) are good candidates for GPU parallelization. In DALI, they are GPU-powered.

The DALI model is going to be a part of the model ensemble. The following example shows the model repository directory structure, containing a DALI preprocessing model, TensorFlow Inception v3 model, and the model ensemble:

model_repository
 ├── dali
 │   ├── 1
 │   │   └── model.dali
 │   └── config.pbtxt
 ├── ensemble_dali_inception
 │   ├── 1
 │   └── config.pbtxt
 └── inception_graphdef
     ├── 1
     │   └── model.graphdef
     └── config.pbtxt 

The configuration file config.pbtxt for the preprocessing model looks like that of any other Triton Server model:

name: "dali"
backend: "dali"
max_batch_size: 256
input [
  {
    name: "DALI_INPUT_0"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "DALI_OUTPUT_0"
    data_type: TYPE_FP32
    dims: [ 299, 299, 3 ]
  }
]
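
The ensemble model’s own config.pbtxt is what wires the DALI output into the Inception input. The following is a sketch based on the directory layout above; the graphdef tensor names and the output dimensions are assumptions for illustration, not values taken from this post:

name: "ensemble_dali_inception"
platform: "ensemble"
max_batch_size: 256
input [
  {
    name: "INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT"
    data_type: TYPE_FP32
    dims: [ 1001 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "dali"
      model_version: -1
      input_map { key: "DALI_INPUT_0" value: "INPUT" }
      output_map { key: "DALI_OUTPUT_0" value: "preprocessed_image" }
    },
    {
      model_name: "inception_graphdef"
      model_version: -1
      input_map { key: "input" value: "preprocessed_image" }
      output_map { key: "InceptionV3/Predictions/Softmax" value: "OUTPUT" }
    }
  ]
}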

The remaining question is where the model.dali file comes from. This file contains a serialized DALI pipeline, which you get by calling the serialize method on a DALI pipeline instance. The following code example is the DALI preprocessing pipeline for the Inception example:

import nvidia.dali as dali
import nvidia.dali.fn as fn
import nvidia.dali.types as types

@dali.pipeline_def(batch_size=256, num_threads=4, device_id=0)
def inception_pipeline():
    # Placeholder filled in by Triton Server; its name must match the
    # input name declared in config.pbtxt.
    images = fn.external_source(device="cpu", name="DALI_INPUT_0")
    # device="mixed" decodes JPEGs on the GPU, using the hardware
    # decoder where available.
    images = fn.decoders.image(images, device="mixed", output_type=types.RGB)
    images = fn.resize(images, resize_x=299, resize_y=299)
    # Normalize with ImageNet mean/std scaled to the 0-255 range.
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images

pipe = inception_pipeline()
pipe.serialize(filename="model_repository/dali/1/model.dali")

The code is rather straightforward. Note the fn.external_source operator: it is a placeholder later used by Triton Server to provide data to the pipeline. You must also remember to give fn.external_source the same name as the input declared in the DALI config.pbtxt file. For more information about building DALI pipelines, see the NVIDIA DALI Documentation.
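
With the ensemble in place, a client only needs to send the raw encoded bytes. A minimal sketch using the Python tritonclient package; the server address, the test.jpg path, and the "INPUT"/"OUTPUT" tensor names follow the hypothetical ensemble configuration sketched earlier:

import numpy as np
import tritonclient.http as triton_http

client = triton_http.InferenceServerClient(url="localhost:8000")

# Send raw encoded JPEG bytes; decoding happens server-side in DALI.
with open("test.jpg", "rb") as f:
    data = np.frombuffer(f.read(), dtype=np.uint8)

# Shape [1, N]: a batch of one variable-length byte stream.
inp = triton_http.InferInput("INPUT", [1, data.size], "UINT8")
inp.set_data_from_numpy(data.reshape(1, -1))

result = client.infer(model_name="ensemble_dali_inception", inputs=[inp])
print(result.as_numpy("OUTPUT").argmax())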

Performance results

So how does the performance look? Figure 6 compares two Inception setup scenarios:

  • Client preprocessing: Samples are decoded, resized, and normalized in parallel using OpenCV.
  • Server preprocessing: The Python client script sends encoded images to the server, where the whole DALI preprocessing happens.
Figure 6. Throughput (inferences per second) vs. latency (milliseconds) for the client-preprocessing and server-preprocessing scenarios with batch sizes 1, 4, 8, and 32; the further up and to the left, the better. The performance results were collected on a DGX A100 machine.

Using DALI gives you significant leverage over client-side preprocessing. Figure 6 shows better results in terms of both overall latency and throughput. This is possible because DALI takes advantage of the GPU’s full compute capabilities, such as the hardware JPEG decoder. Another important factor is communication overhead: an example 1280×720 JPEG image used in the inference is about 306 kB, whereas the same image after preprocessing yields a tensor of about 1048 kB. This means that sending preprocessed data can cause roughly 3x the network traffic.

Naturally, the results differ according to your specific use case and your infrastructure. However, using DALI for preprocessing data in your inference scenario is worth a try.

How to get the DALI backend?

Starting from tritonserver:20.11-py3, the DALI backend is included in the Triton Server Docker container. Just download the latest version and you’re good to go. Moreover, DALI, Triton Server, and the DALI backend for Triton Server are all open-source projects, so you can build the most up-to-date versions from source.

We look forward to your feedback!

If you have any problems or questions, do not hesitate to submit an issue on the triton-inference-server/dali_backend GitHub repository. You can also consult the NVIDIA DALI Documentation and the main NVIDIA/DALI repository.

Categories
Misc

From Scientific Analysis to Artistic Renderings, NVIDIA Omniverse Accelerates HPC Visualization with New ParaView Connector

Whether helping the world understand our most immediate threats, like COVID-19, or seeing the future of landing humans on Mars, researchers are increasingly leaning on scientific visualization to analyze, understand and extract scientific insights. With large-scale simulations generating tens or even hundreds of terabytes of data, and with team members dispersed around the globe, researchers Read article >

The post From Scientific Analysis to Artistic Renderings, NVIDIA Omniverse Accelerates HPC Visualization with New ParaView Connector appeared first on The Official NVIDIA Blog.

Categories
Misc

New Data Science Client and WSL2 for Data Science Development on Workstations

Data science development faces many challenges in the areas of:

  • Exploration and model development
  • Training and evaluation
  • Model scoring and inference

Some estimates suggest that 70%-90% of the time is spent on experimentation, much of which will run fast and efficiently on GPU-enabled mobile and desktop workstations. Running on a Linux mobile workstation, for example, presents another set of challenges, including installing and configuring a data science stack, stack updates, driver installation and updates, support for needed Office productivity apps, and no easy or intuitive way to access helpful tools and software to accelerate development.

New Data Science Client and WSL2 to the rescue!

In a GTC live session, Dima Rekesh, Karan Jhavar, and I will discuss a new Data Science Client (DSC) and support for Windows Subsystem for Linux 2 (WSL2) that address the previously stated challenges. This not only makes it more practical to run countless experiments locally before training models at scale, but also removes the complexities of a local data science stack while maintaining compatibility with popular Microsoft Office applications.

For data scientists who want or need unlimited experimentation for creativity and better models overall, the NVIDIA DSC is designed to make developers productive faster by providing simple access to common tools and frameworks (e.g., Jupyter Notebooks and RAPIDS), making data science development on workstations easier and more productive.

If you’d like to learn more, we encourage you to register for the NVIDIA GTC Conference and attend the LIVE session:

Session S32147: Data Science Stack: Jumpstart Data Science Workflows on NVIDIA-Powered Data Science Workstations

Wed., Apr 14 – 11:00 AM-11:40 AM PDT

Note: For those not familiar with the NVIDIA Data Science Stack, it provides a complete system for the software you use every day, pre-installed and tuned for NVIDIA GPUs. Included on the pre-installed Ubuntu 20.04 Linux OS are Python 3.8, pandas, NumPy, SciPy, Numba, scikit-learn, TensorFlow, PyTorch, Keras, RAPIDS (cuDF, cuML, cuGraph), CuPy, and many more. The GPU-accelerated Python software speeds up machine learning tasks by 10x-30x. Examples include common ML algorithms such as K-means, logistic and linear regression, KNN, random forest classifiers, and XGBoost classifiers using NVIDIA RAPIDS. cuML is fully GPU-accelerated and accepts CSV spreadsheet data or Parquet file formats.
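To illustrate how close the GPU path is to the familiar pandas/scikit-learn workflow, here is a minimal sketch using cuDF and cuML (data.csv is a placeholder for any numeric dataset):

import cudf
from cuml.cluster import KMeans

# Load a CSV straight into GPU memory and cluster it on the GPU.
df = cudf.read_csv("data.csv")
kmeans = KMeans(n_clusters=8)
kmeans.fit(df)
print(kmeans.labels_)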

More about the Data Science Client (DSC)

The NVIDIA Data Science Client (DSC) is currently a beta release and runs on your desktop as a status-bar icon. It is optimized to use few system resources, and it monitors and updates itself, your NVIDIA driver, the CUDA SDK (including cuDNN), and all the Data Science Stack software described above. A GA version of the DSC is expected in late 2021.

The DSC is a desktop complement to the command-line-oriented data science stack. It is minimalist and unobtrusive, designed for ease of use and reproducibility. The DSC also provides one-click access to common tools such as VS Code and Spyder, but places emphasis on Jupyter as the main development environment, supporting a curated set of dockerized kernels, the majority of which are available as NGC assets.

The DSC also manages the latest set of NVIDIA GPU Cloud (NGC) containers. You can quickly launch NGC containers for RAPIDS, PyTorch, or TensorFlow into a locally running Jupyter notebook server as a tab in your Chrome browser in milliseconds. The DSC and the NVIDIA Data Science Stack (DSS) run the same software you run in a VM in the cloud, giving you confidence that Python source code developed on your NVIDIA GPU desktop or mobile workstation will run everywhere with predictable results.

Learn more details about the Data Science Client (DSC) and how to download it.

Windows Subsystem for Linux 2 (WSL2) support

This is available now as a public preview running on pre-release versions of Windows 10. WSL2 is a technology that allows Windows desktop users to run a Linux shell. NVIDIA has enabled CUDA to run at full performance in the WSL2 shell and is testing RAPIDS and the entire suite of Data Science Stack software with WSL2.

WSL2 means that data science Python software, including Jupyter notebooks, and Office productivity tools (Excel, Outlook, PowerPoint, etc.) all run in a single booted Windows 10 image. There is no longer a need to dual-boot.

Data science workstations in action

NVIDIA knows of many data science workloads that run exceptionally well on mobile workstations built on the NVIDIA Data Science Stack. Some of these environments and workloads will be demonstrated in the following GTC21 sessions:

  • Machine Learning with PyTorch Lightning and Grid.ai from Your GPU Enabled Workstation [S32153]
  • From Laptops to SuperPODs: Seamless Scale for Model Development [S32160]
  • Eliminating Reproducibility and Portability Issues for Data Science Workflows, from Laptop to Cloud and Back [S32169]
  • Collaborative Debugging and Visualizing of Machine Learning Models on NVIDIA Workstations [S32156]

We have also seen many new and innovative deep learning workloads such as Heartex Label Studio that run well on mobile workstations.

We encourage you to attend our live GTC session:

Session S32147: Data Science Stack: Jumpstart Data Science Workflows on NVIDIA-Powered Data Science Workstations

Wed., Apr 14 – 11:00 AM-11:40 AM PDT

See you at GTC21!

Categories
Misc

Secure AI Data Centers at Scale: Next-Gen DGX SuperPOD Opens Era of Cloud-Native Supercomputing

As businesses extend the power of AI and data science to every developer, IT needs to deliver seamless, scalable access to supercomputing with cloud-like simplicity and security. At GTC21, we introduced the latest NVIDIA DGX SuperPOD, which gives business, IT and their users a platform for securing and scaling AI across the enterprise, with the Read article >

The post Secure AI Data Centers at Scale: Next-Gen DGX SuperPOD Opens Era of Cloud-Native Supercomputing appeared first on The Official NVIDIA Blog.

Categories
Misc

XAI Explained at GTC: Wells Fargo Examines Explainable AI for Modeling Lending Risk

Applying for a home mortgage can resemble a part-time job. But whether consumers are seeking out a home loan, car loan or credit card, there’s an incredible amount of work going on behind the scenes in a bank’s decision — especially if it has to say no. To comply with an alphabet soup of financial Read article >

The post XAI Explained at GTC: Wells Fargo Examines Explainable AI for Modeling Lending Risk appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA, BMW Blend Reality, Virtual Worlds to Demonstrate Factory of the Future

The factories of the future will have a soul — a “digital twin” that blends man and machine in stunning new ways. In a demo blending reality and virtual reality, robotics and AI, to manage one of BMW’s automotive factories, NVIDIA CEO Jensen Huang Monday rolled out a stunning vision of the future of manufacturing. Read article >

The post NVIDIA, BMW Blend Reality, Virtual Worlds to Demonstrate Factory of the Future appeared first on The Official NVIDIA Blog.

Categories
Misc

GTC Showcases New Era of Design and Collaboration

Breakthroughs in 3D model visualization, such as real-time ray-traced rendering and immersive virtual reality, are making architecture and design workflows faster, better and safer. At GTC this week, NVIDIA announced the newest advances for the AEC industry with the latest NVIDIA Ampere architecture-based enterprise desktop RTX GPUs, along with an expanded range of mobile laptop GPUs. AEC professionals will also want to learn more about NVIDIA Omniverse Enterprise, an open platform Read article >

The post GTC Showcases New Era of Design and Collaboration appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA Advances Extended Reality, Unlocks New Possibilities for Companies Across Industries

NVIDIA technology has been behind some of the world’s most stunning virtual reality experiences. Each new generation of GPUs has raised the bar for VR environments, producing interactive experiences with photorealistic details to bring new levels of productivity, collaboration and fun. And with each GTC, we’ve introduced new technologies and software development kits that help Read article >

The post NVIDIA Advances Extended Reality, Unlocks New Possibilities for Companies Across Industries appeared first on The Official NVIDIA Blog.