Categories
Misc

Insider’s Guide to GTC: Computer Vision, NLP, Recommenders, and Robotics

Great sessions on custom computer vision models, expressive TTS, localized NLP, scalable recommenders, and commercial and healthcare robotics apps.

Looking for different topic areas? Keep an eye out for our other posts!

Join us at GTC, March 21-24, to explore the latest technology and research across AI, computer vision, data science, robotics, and more! 

There are over 900 options to choose from, so our NVIDIA experts put together some can’t-miss sessions to help get you started:


Computer Vision / Video Analytics

Creating the Future: Creating the World’s Largest Synthetic Object Recognition Dataset for Industry (SORDI)
Jimmy Nassif, CTO, idealworks
Marc Kamradt, Head of TechOffice MUNICH, BMW Group

BMW builds a car every 56 seconds. How do they increase quality? They use robots and complement real data with synthetic. Learn how BMW, Microsoft, and NVIDIA are accelerating production and quality by recognizing parts, obstacles, and people through artificial intelligence-based computer vision.   

How To Develop and Optimize Edge AI apps with NVIDIA DeepStream
Carlos Garcia-Sierra, DeepStream Product Manager, NVIDIA
Jitendra Kumar, Senior System Software Engineer, NVIDIA

This talk covers the best practices for developing and optimizing the performance of edge AI applications using DeepStream SDK. Deep dive into a multisensor, multimodel design and learn how to reduce development time and maximize performance using AI at the edge.

AI Models Made Simple with NVIDIA TAO
Chintan Shah, Senior Product Manager, NVIDIA
Akhil Docca, Senior Product Marketing Manager, NVIDIA

A primary challenge confronting enterprises is that the demand for AI models far outpaces the number of data scientists available to build them. Developers need to easily customize models and bring their AI to market faster. This session demonstrates how NVIDIA TAO solves this problem with power and ease of use. Get a preview at GTC of the new capabilities of the TAO Toolkit, including Bring Your Own Model Weights, REST APIs, TensorBoard visualization, new pretrained models, and more.


Conversational AI / NLP

Conversational AI Demystified
Sirisha Rella, Product Marketing Manager, NVIDIA

It’s easier than ever to develop AI speech applications like virtual assistants and real-time transcription. Today’s advanced tools and technologies make it easy to fine-tune and build scalable, responsive applications. This popular session shows users how to build and deploy their first end-to-end conversational AI pipeline, using NVIDIA Riva as an example.

Expressive Neural Text-to-Speech
Andrew Breen, Senior Manager Text-to-Speech Research, Amazon

Text-to-speech (TTS) research expert Andrew Breen will give a high-level overview of recent developments in neural TTS, including adopted approaches, technical challenges, and future direction. Breen was awarded the IEE J. Langham Thomson premium in 1993, and has received business awards from BT, MCI, and Nuance. He invented the Laureate TTS system at BT Labs and founded Nuance’s TTS organization.

Building Large-scale, Localized Language Models: From Data Preparation to Training and Deployment to Production
Miguel Martinez, Senior Deep Learning Solution Architect, NVIDIA
Meriem Bendris, Senior Deep Learning Data Scientist, NVIDIA

Natural Language Processing (NLP) breakthroughs in large-scale language models have boosted the capability to solve problems with zero-shot translation and supervised fine-tuning. However, applying NLP models to localized languages remains limited by data preparation, training, and deployment challenges. This session highlights scaling challenges and solutions to show how to optimize NLP models using NVIDIA NeMo Megatron—a framework for training large NLP models in other languages.


Recommenders / Personalization

Building and Deploying Recommender Systems Quickly and Easily with NVIDIA Merlin
Even Oldridge, Senior Manager, Merlin Recommender Systems Team, NVIDIA

Merlin expert and Twitter influencer Even Oldridge will demonstrate how to optimize recommendation models for maximum performance and scale. Oldridge has eight years of recommender system experience, along with a PhD in computer vision.

Building AI-based Recommender System Leveraging the Power of Deep Learning and GPU
Khalifeh AlJadda, Senior Director of Data Science, The Home Depot

Tackle AI-based recommendation system challenges and uncover best practices for delivering personalized experiences that differentiate you from competitors. Hear from Khalifeh AlJadda, an expert in implementing large-scale, distributed, machine-learning algorithms in search and recommendation engines. AlJadda leads the Recommendation Data Science, Search Data Science, and Visual AI teams at The Home Depot. With a PhD in computer science, he previously led the design and implementation of CareerBuilder’s language-agnostic semantic search engine.

Multi-Objective Optimization to Boost Exploration in Recommender Systems
Serdar Kadioglu, Vice President AI | Adjunct Assistant Professor, Fidelity Investments | Brown University

How can one use combinatorial optimization to formalize item universe selection in new applications with limited or no datasets? Serdar Kadioglu will provide insights on how to apply techniques like unsupervised clustering and latent text embeddings to create a multilevel framework for your business. Kadioglu previously led the Advanced Constraint Technology R&D team at Oracle and worked at Adobe. An adjunct professor of computer science at Brown University, Kadioglu pursues algorithmic research at the intersection of AI and discrete optimization, with an interest in building robust and scalable products.


Robotics

Delivering AI Robotics at Scale: A Behind-the-Scenes Look
Mostafa Rohaninejad, Founding Researcher, Covariant.ai

Bringing practical AI robotics into the physical world, such as on a factory floor, is hard. Covariant is working to solve this problem. Mostafa Rohaninejad is part of the core team that built the full AI stack at Covariant from the ground up. In his session, he will share both the technical challenges and the exciting commercial possibilities of AI Robotics.

Leveraging Embedded Computing to Unlock Autonomy in Human Environments
Andrea Thomaz, Co-Founder and CEO, Diligent Robotics

During the COVID-19 pandemic, hospitals faced high nurse turnover, record burnout, and crisis-level labor shortages. Hospitals must alleviate this staffing crisis. Enter Diligent Robotics, and their robot Moxi, which completes routine tasks to assist nursing staff. Andrea Thomaz will share the unique challenges in achieving robot autonomy in a busy hospital, like maneuvering around objects or navigating to a patient room, all while integrating multiple camera streams that feed into embedded GPUs.

Building Autonomy Off-Road from the Ground Up with Jetson
Nick Peretti, CV/ML Engineer, Scythe Robotics

Autonomy is critical when it comes to outdoor and off-road robotics, but environmental and task-specific demands require a different approach than indoor or on-road environments. Nick Peretti will share the NVIDIA Jetson-centered approach that Scythe Robotics uses to run its full sense-plan-act software suite with their autonomous commercial mowers. He will highlight tools and approaches that have enabled Scythe to move quickly to field units, and lessons learned along the way.

Categories
Offsites

Optimizing Airline Tail Assignments for Cleaner Skies

Airlines around the world are exploring several tactics to meet aggressive CO2 commitments set by the International Civil Aviation Organization (ICAO). This effort has been emphasized in Europe, where aviation accounts for 13.9% of the transportation industry’s carbon emissions. The largest push comes from the European Green Deal, which aims to decrease carbon emissions from transportation by 90% by 2050. The Lufthansa Group has gone even further, committing to a 50% reduction in emissions compared to 2019 by the year 2030 and to reach net-zero emissions by 2050.

One unexpected approach that airlines can use to lower carbon emissions is to optimize their tail assignment, i.e., how to assign aircraft (identified by the aircraft registration painted on their tails) to legs in a way that minimizes the total operating cost, of which fuel is a major contributor. More fuel needed to operate the aircraft means higher operating costs and more carbon released into the atmosphere. For example, a typical long-haul flight (longer than ~4,100km or ~2,500mi) emits about a ton of CO2.

The amount of fuel needed to fly between origin and destination can vary widely — e.g., larger aircraft weigh more and therefore require more fuel, while modern and younger aircraft tend to be more fuel-efficient because they use newer technology. The mass of the fuel itself is also significant: aircraft are less fuel-efficient early in their flights, when their fuel tanks are full, than later, when the volume of fuel is reduced. Another important factor for the tail assignment is the number of passengers on board; as the number of bookings changes, a smaller or larger aircraft might be required. Other factors can affect fuel consumption, both negatively (e.g., headwinds or the age of the engines) and positively (e.g., tailwinds, sharklets, skin).

During the past year, Google’s Operations Research team has been working with the Lufthansa Group to optimize their tail assignment to reduce carbon emissions and the cost of operating their flights. As part of this collaboration, we developed and launched a mathematical tail assignment solver that has been fully integrated to optimize the fleet schedule for SWISS International Air Lines (a Lufthansa Group subsidiary), which we estimate will result in significant reductions in carbon emissions. This solver is the first step of a multi-phase project that started at SWISS.

A Mathematical Model for Tail Assignment
We structure the task of tail assignment optimization as a network flow problem, which is essentially a directed graph characterized by a set of nodes and a set of arcs, with additional constraints related to the problem at hand. Nodes may have either a supply or a demand for a commodity, while arcs have a flow capacity and a cost per unit of flow. The goal is to determine flows for every arc that minimize the total flow cost of each commodity, while maintaining flow balance in the network.

We decided to use a flow network because it is the most common way of modeling this problem in literature, and the commodities, arcs, and nodes of the flow network have a simple one-to-one correspondence to tails, legs, and airports in the real-life problem. In this case, the arcs of the network correspond to each leg of the flight schedule, and each individual tail is a single instance of a commodity that “flows” along the network. Each leg and tail pair in the network has an associated assignment cost, and the model’s objective is to pick valid leg and tail pairs such that these assignment costs are minimized.

A simple example of the tail assignment problem. There are four legs in this schedule and four possible tails that one can assign to those legs. Each tail and leg pair has an associated operational cost. For example, for Leg 1, it costs $50 to assign Tail 1 to it but $100 to assign Tail 2. The optimal solution, with the minimum cost, is to assign Tail 4 to Legs 3 and 2 and Tail 1 to Legs 1 and 4.
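To make the toy example concrete, here is a minimal sketch of the assignment step using OR-Tools’ linear solver wrapper (pywraplp). It is deliberately simplified: the cost matrix is hypothetical apart from the two Leg 1 costs mentioned above, and the real solver also enforces flow-balance, turnaround, and maintenance constraints that are omitted here.

    from ortools.linear_solver import pywraplp

    # Hypothetical operating costs: costs[t][l] = cost of assigning Tail t+1 to Leg l+1.
    # Only the two Leg 1 values ($50 for Tail 1, $100 for Tail 2) come from the example above.
    costs = [
        [50, 65, 90, 45],    # Tail 1
        [100, 70, 80, 95],   # Tail 2
        [85, 60, 75, 90],    # Tail 3
        [70, 40, 55, 85],    # Tail 4
    ]
    num_tails, num_legs = len(costs), len(costs[0])

    solver = pywraplp.Solver.CreateSolver("SCIP")

    # x[t][l] == 1 if tail t is assigned to leg l.
    x = [[solver.BoolVar(f"x_{t}_{l}") for l in range(num_legs)] for t in range(num_tails)]

    # Every leg must be flown by exactly one tail (a tail may fly several legs).
    for l in range(num_legs):
        solver.Add(sum(x[t][l] for t in range(num_tails)) == 1)

    # Minimize the total assignment cost.
    solver.Minimize(sum(costs[t][l] * x[t][l]
                        for t in range(num_tails) for l in range(num_legs)))

    if solver.Solve() == pywraplp.Solver.OPTIMAL:
        print("Total cost:", solver.Objective().Value())
        for t in range(num_tails):
            legs = [l + 1 for l in range(num_legs) if x[t][l].solution_value() > 0.5]
            if legs:
                print(f"Tail {t + 1} flies legs {legs}")

The production model described next replaces this toy formulation with a network flow over the full schedule, where the airline-specific rules below become additional constraints on the flows.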

Aside from the standard network flow constraints, the model takes into account additional airline-specific constraints so that the solution is tailored to Lufthansa Group airlines. For example, aircraft turnaround times — i.e., the amount of time an aircraft spends on the ground between two consecutive flights — are airline-specific and can vary for a number of reasons. Catering might be loaded at an airline’s hub, reducing the turnaround time needed at outstations, or a route could have a higher volume of vacation travelers who often take longer to board and disembark than business travelers. Another constraint is that each aircraft must be on the ground for a nightly check at a specified airport’s maintenance hub to receive mandated maintenance work or cleaning. Furthermore, each airline has their own maintenance schedule, which can require aircraft to undergo routine maintenance checks every few nights, in part to help maintain the aircraft’s fuel efficiency.

Preliminary Results & Next Steps
After using our solver to optimize their fleet schedule in Europe, SWISS estimates an annual savings of over 3.5 million Swiss francs and a 6,500-ton reduction in CO2 emitted. We expect these savings will multiply when the model is rolled out to the rest of the airlines in the Lufthansa Group, and again when traffic returns to pre-COVID levels. Future work will include ensuring the model scales to larger sets of data, and adding crew and passenger assignment to the optimization system to improve flight schedules for both passengers and flight crew.

If you are interested in experimenting with your own network flow models, check out OR-Tools, our open source software suite that can be used to build optimization solutions similar to the solver presented in this post. Refer to the OR-Tools documentation for more information.

Acknowledgements
Thanks to Jon Orwant for collaborating extensively on this blog post and for establishing the partnership with Lufthansa and SWISS, along with Alejandra Estanislao. Thanks also to the Operations Research team and the folks at SWISS; this work would not have been possible without their hard work and contributions.

Categories
Misc

Do I Need to Update My Data Center Network?

Is your network getting long in the tooth and are you thinking about an upgrade? This blog will cover three areas to consider when updating your data center network.

Normally, data center networks are updated when new applications or servers are installed in the infrastructure. But independent of new server and application infrastructure forcing an update, there are other areas to consider. Three questions to ask when assessing if you need to update your network are:

  • How are server speeds dictating network design?
  • Are your network features out of date?
  • Is your operational workflow inefficient?

How are server speeds dictating networking design?

Network device selection typically starts with understanding how the server Network Interface Cards (NICs) are configured. In the past, server NICs at 10 gigabits/second (10G) were considered the norm. But over the past 5 years we’ve seen a real growth in server computing power. In the accelerated computing world we tend to see 25 to 100G network speeds as the new norm for servers, with the latest servers able to use even 200G NICs.

With higher NIC speeds, the top-of-rack (leaf) switches need to be upgraded. And if you upgrade the leaf switches without updating legacy core (spine) switches, oversubscription ratios move unfavorably, introducing excess congestion and unpredictable latency. If you’re upgrading the leaf switches, plan to upgrade the spine switches as well; maintaining the same oversubscription ratio should be the goal.
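For a rough sense of the math (illustrative numbers only): a leaf switch with 48 x 25G server-facing ports carries up to 1,200G of downlink traffic, so 6 x 100G uplinks (600G) put it at a 2:1 oversubscription ratio. Move those same 48 servers to 100G NICs without touching the uplinks and the ratio jumps to 8:1, which is why leaf and spine upgrades need to be planned together.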

Are your network features out of date?

In addition to the hardware, it may be time to upgrade your network operating system (NOS), especially if you’re using legacy network features and protocols that are hindering your network through inefficient management and packet forwarding. Legacy networks were often built using layer 2 infrastructure everywhere. The entire network would be on a single broadcast domain, and solutions like MLAG would provide one layer of redundancy for every link.

Updating the network to leverage solutions like Layer 3 to the host (also known as host-based networking) or VXLAN+EVPN overlays helps alleviate issues caused by inefficient broadcast domains. Host-based networking and overlay networks enable better traffic management, easier provisioning, and more granular security. Any time is a good time to update a network that relies on aging, unmodernized technologies.

Is your operational workflow inefficient?

Beyond the infrastructure hardware and software, the operational workflow is another area that benefits from modern updates. In classic networks, admins log in to each switch, router, or firewall individually. Configurations are applied uniquely to each node and backed up as plain text files stored locally on the network admin’s computer. These workflows tend to be error prone, leading to typos and inefficiencies such as lost backups and slow maintenance windows. Configuration errors can open security gaps that a cyber adversary could exploit.

Modern networks leverage more advanced tooling and technologies that solve most of those problems.

Updating your networking can add benefits, including: 

  • Infrastructure as code, so your configurations are centralized. 
  • Automation, which allows managing and updating multiple nodes at the same time. 
  • Continuous integration that systematically validates configuration and design prior to deployment. 
  • Network simulation through digital twins. This helps predict how the network will behave and tie together all the elements from network DevOps to automation to continuous integration.

These optimized workflows reduce maintenance window times and errors in deployments, reducing the risk of security gaps while saving time and effort.

Conclusion

There are many reasons to update a network. It’s important to take stock of your network’s hardware, software and operational efficiencies and look at the big picture. With this information you can determine if updating your network will result in faster throughput, more productivity, and lower ownership costs.

Categories
Misc

Layer "triplet_snn" expects 3 input(s), but it received 1 input tensors. Inputs received: [<tf.Tensor: shape=(1, 224, 224, 3)

I just need a little help with my project. I’ve already written the code, but I am facing an error in it. I am using a few-shot learning technique, a triplet neural network. The triplet neural network (TNN) is a horizontal concatenation of three identical Convolutional Neural Networks (with shared parameters) that are trained with triplets of inputs: an anchor instance, a positive instance (of the same class as the anchor), and a negative instance (of a different class from the anchor). The network is then trained to learn a triplet loss embedding function. To compute the triplet loss, three training examples are required. Each triplet is formed by intentionally selecting training examples such that each triplet has:

  • a reference image, called the anchor image
  • an image having the same label as the anchor, called the positive image
  • an image having a different label than the anchor, called the negative image

The TNN learns to create k-dimensional feature vector representation of images in such a manner that similar images lie closer in an embedding space of k-dimensions.

import random

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

size = 224  # input image size (matches the (1, 224, 224, 3) shape in the error)
embedding_dimension = 128

from tensorflow.keras.applications.vgg16 import VGG16

pre_trained_vgg16 = tf.keras.applications.VGG16(input_shape=(size, size, 3), include_top=False, weights="imagenet", input_tensor=None, pooling=None, classes=1000, classifier_activation=None)

pre_trained_vgg16.save('vgg16.h5')

pre_trained_vgg16 = tf.keras.models.load_model('vgg16.h5')

def build_embedding_network(embedding_dimension):
    embedding_network = pre_trained_vgg16
    embedding_network.trainable = False

    x = embedding_network.output
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    x = tf.keras.layers.Dense(2 * embedding_dimension, activation='sigmoid')(x)
    x = tf.keras.layers.Dense(embedding_dimension, activation='sigmoid')(x)

    embedding_network = tf.keras.Model(embedding_network.input, x, name="embedding_network")

    return embedding_network

def build_metric_network(single_embedding_dim):
    input1 = tf.keras.layers.Input((single_embedding_dim,), name="input1")
    input2 = tf.keras.layers.Input((single_embedding_dim,), name="input2")

    embedded_distance = tf.keras.layers.Subtract(name='subtract_embeddings')([input1, input2])
    embedded_distance = tf.keras.layers.Lambda(
        lambda x: K.sqrt(K.sum(K.square(x), axis=-1, keepdims=True)),
        name='euclidean_distance')(embedded_distance)

    metric_network = tf.keras.Model(inputs=[input1, input2],
                                    outputs=[embedded_distance],
                                    name="metric_network")

    return metric_network

class TripletLossLayer(tf.keras.layers.Layer):

    def __init__(self, margin, **kwargs):
        self.margin = margin
        super(TripletLossLayer, self).__init__(**kwargs)

    def triplet_loss(self, inputs):
        ap_dist, an_dist = inputs

        # square the distances
        ap_dist2 = K.square(ap_dist)
        an_dist2 = K.square(an_dist)

        return K.sum(K.maximum(ap_dist2 - an_dist2 + self.margin, 0))

    def call(self, inputs):
        loss = self.triplet_loss(inputs)
        self.add_loss(loss)
        return loss

    def get_config(self):
        config = super().get_config().copy()
        config.update({'margin': self.margin})
        return config

def build_triplet_snn(input_shape, embedding_network, metric_network, margin=0.1):

    # Define the tensors for the three input images
    anchor_input = tf.keras.layers.Input(input_shape, name="anchor_input")
    positive_input = tf.keras.layers.Input(input_shape, name="positive_input")
    negative_input = tf.keras.layers.Input(input_shape, name="negative_input")

    # Generate the embeddings (feature vectors) for the three images
    embedding_a = embedding_network(anchor_input)
    embedding_p = embedding_network(positive_input)
    embedding_n = embedding_network(negative_input)

    ap_dist = metric_network([embedding_a, embedding_p])
    an_dist = metric_network([embedding_a, embedding_n])

    # Triplet loss layer
    loss_layer = TripletLossLayer(margin=margin, name='TripletLossLayer')([ap_dist, an_dist])

    # Compute the concatenated pairs
    all_concatenated = tf.keras.layers.Concatenate(axis=-1, name="All-Embeddings")(
        [embedding_a, embedding_p, embedding_n])

    # Connect the inputs with the outputs
    triplet_snn = tf.keras.Model(inputs=[anchor_input, positive_input, negative_input],
                                 outputs=[loss_layer, all_concatenated],
                                 name="triplet_snn")

    # Return the model
    return triplet_snn

embedding_network = build_embedding_network(embedding_dimension)
metric_network = build_metric_network(embedding_dimension)

triplet_snn = build_triplet_snn(input_shape=(size, size, 3),
                                embedding_network=embedding_network,
                                metric_network=metric_network,
                                margin=0.1)

learning_rate = 0.0001

epochs = 5

class TripletDataGenerator(tf.keras.utils.Sequence):

    def __init__(self, triplet_dataset, shuffle=False):
        self.triplet_dataset = triplet_dataset
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        return len(self.triplet_dataset)

    def __getitem__(self, index):
        return self.triplet_dataset[index][0]
        # return np.array(self.triplet_dataset[index][0]).reshape(1, 224, 224, 3)

    def on_epoch_end(self):
        if self.shuffle == True:
            random.shuffle(self.triplet_dataset)

data_gen = TripletDataGenerator(triplet_dataset)

filepath = r'C:\Users\Y540\Desktop\Retinal Disease\TrainedSNN\temp\weights.{epoch}'

save_model_weights_at_every_epoch = tf.keras.callbacks.ModelCheckpoint(
    filepath, monitor="loss", verbose=1, save_best_only=False,
    save_weights_only=True, mode="auto", save_freq="epoch")

optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
triplet_snn.compile(loss=None, optimizer=optimizer, run_eagerly=True)

%%time

history = triplet_snn.fit(data_gen, epochs=epochs, verbose=1, callbacks=[save_model_weights_at_every_epoch])

submitted by /u/Intelligent_Term6689

Categories
Misc

Do you use GPU cloud platform?

Hi guys!

We are a blockchain-based GPU cloud platform, and we are planning a program to support developers with GPUs. If you use a GPU cloud platform, we would like to know the most important reason you chose it. Compared to other GPU cloud platforms, does yours offer any special functionality?

It would be great to get some insight from people who know this space and are willing to share their thoughts! We will invite 3 of you to get a free GPU for 72 hours. THX


submitted by /u/May-Feng

Categories
Misc

Object Detection API performance on AMD G Embedded CPUs

This is my first post here, so hello everyone!
I am currently using SSD MobileNet V2 FPNLite 320×320 as my model, but I am just prototyping, so it could change. Basically, I need a relatively low-power device that will run my model in real time. Raspberry Pi is out of stock everywhere, and I found a possible alternative: second-hand thin clients. Most of the cheap ones have AMD G-series embedded CPUs (G-T56N or G-T48E), and I couldn’t find anything about them related in any way to machine learning. Will they have enough power to run object detection in real time? How do they compare to RPi 4 or 3 in performance? I am obviously fine with the bigger form factor and power consumption.

Any help will be appreciated!

submitted by /u/Own-Combination-4238

Categories
Misc

NVIDIA Announces Investor Day for Financial Community

NVIDIA will present the following virtual event for the financial community: NVIDIA Investor Day Tuesday, March 22, 2022, at 10 a.m. …

Categories
Offsites

Robust Graph Neural Networks

Graph Neural Networks (GNNs) are powerful tools for leveraging graph-structured data in machine learning. Graphs are flexible data structures that can model many different kinds of relationships and have been used in diverse applications like traffic prediction, rumor and fake news detection, modeling disease spread, and understanding why molecules smell.

Graphs can model the relationships between many different types of data, including web pages (left), social connections (center), or molecules (right).

As is standard in machine learning (ML), GNNs assume that training samples are selected uniformly at random (i.e., are an independent and identically distributed or “IID” sample). This is easy to do with standard academic datasets, which are specifically created for research analysis and therefore have every node already labeled. However, in many real world scenarios, data comes without labels, and labeling data can be an onerous process involving skilled human raters, which makes it difficult to label all nodes. In addition, biased training data is a common issue because the act of selecting nodes for labeling is usually not IID. For example, sometimes fixed heuristics are used to select a subset of data (which shares some characteristics) for labeling, and other times, human analysts individually choose data items for labeling using complex domain knowledge.

Localized training data is a typical non-IID bias exhibited in graph-structured data. This is shown on the left figure by taking an orange node and expanding to those around it. Instead, an IID training sample of nodes for labeling would be uniformly distributed, as illustrated by the sampling process on the right.

To quantify the amount of bias present in a training set, one can use methods that measure how large the shift is between two different probability distributions, where the size of the shift can be thought of as the amount of bias. As the shift grows in size, machine learning models have more difficulty generalizing from the biased training set. This situation can meaningfully hurt generalizability — on academic datasets, we’ve observed domain shifts causing a performance drop of 15-20% (as measured by the F1 score).

In “Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data”, presented at NeurIPS 2021, we introduce a solution for using GNNs on biased data. Called Shift-Robust GNN (SR-GNN), this approach is designed to account for distributional differences between biased training data and a graph’s true inference distribution. SR-GNN adapts GNN models to the presence of distributional shift between the nodes labeled for training and the rest of the dataset. We illustrate the effectiveness of SR-GNN in a variety of experiments with biased training datasets on common GNN benchmark datasets for semi-supervised learning and show that SR-GNN outperforms other GNN baselines in accuracy, reducing the negative effects of biased training data by 30–40%.

The Impact of Distribution Shifts on Performance
To demonstrate how distribution shift affects GNN performance, we first generate a number of biased training sets for known academic datasets. Then in order to understand the effect, we plot the generalization (test accuracy) versus a measure of distribution shift (the Central Moment Discrepancy1, CMD). For example, consider the well known PubMed citation dataset, which can be thought of as a graph where the nodes are medical research papers and the edges represent citations between them. When we generate biased training data for PubMed, the plot looks like this:

The effect of distribution shift on the PubMed dataset. Performance (F1) is shown on the y-axis vs. the distribution shift, Central Moment Discrepancy (CMD), on the x-axis, for 100 biased training set samples. As the distribution shift increases, the model’s accuracy falls.

Here one can observe a strong negative correlation between the distribution shift in the dataset and the classification accuracy: as CMD increases, the performance (F1) decreases. That is, GNNs can have difficulty generalizing as their training data looks less like the test dataset.

To address this, we propose a shift-robust regularizer (similar in idea to domain-invariant learning) to minimize the distribution shift between training data and an IID sample from unlabeled data. To do this, we measure the domain shift (e.g., via CMD) in real time as the model is training and apply a direct penalty based on this that forces the model to ignore as much of the training bias as possible. This forces the feature encoders that the model learns for the training data to also work effectively for any unlabeled data, which might come from a different distribution.
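As a rough illustration of the idea (not the exact implementation from the paper), a CMD-style penalty between the hidden representations of the labeled nodes and an unlabeled IID sample can be written in a few lines of TensorFlow. The range-normalization terms of the original CMD definition are dropped for brevity, and lam is a hypothetical weighting hyperparameter:

    import tensorflow as tf

    def cmd_penalty(z_train, z_iid, n_moments=5):
        """Simplified Central Moment Discrepancy between two batches of hidden representations."""
        mean_train = tf.reduce_mean(z_train, axis=0)
        mean_iid = tf.reduce_mean(z_iid, axis=0)
        discrepancy = tf.norm(mean_train - mean_iid)

        # Compare higher-order central moments, feature by feature.
        centered_train = z_train - mean_train
        centered_iid = z_iid - mean_iid
        for k in range(2, n_moments + 1):
            moment_train = tf.reduce_mean(centered_train ** k, axis=0)
            moment_iid = tf.reduce_mean(centered_iid ** k, axis=0)
            discrepancy += tf.norm(moment_train - moment_iid)
        return discrepancy

    # Inside a training step, the penalty is simply added to the task loss:
    # loss = classification_loss + lam * cmd_penalty(z_labeled, z_unlabeled_iid)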

The figure below shows what this looks like when compared to a traditional GNN model. We still have the same inputs (the node features X and the adjacency matrix A), and the same number of layers. However, the final embedding Zk from layer (k) of the GNN is compared against embeddings from unlabeled data points to verify that the model is correctly encoding them.

SR-GNN adds two kinds of regularizations to deep GNN models. First, a domain shift regularization (λ term) minimizes the distance between hidden representations of the labeled (Zk) and unlabeled (ZIID) data. Second, the instance weight (β) of the examples can be changed to further approximate the true distribution.

We write this regularization as an additional term in the formula for the model’s loss based on the distance between the training data’s representations and the true data’s distribution (full formulas available in the paper).

In our experiments, we compare our method with a number of standard graph neural network models to measure their performance on node classification tasks. We demonstrate that adding the SR-GNN regularization gives a 30–40% improvement on classification tasks with biased training data labels.

A comparison of SR-GNN using node classification with biased training data on the PubMed dataset. SR-GNN outperforms seven baselines, including DGI, GCN, GAT, SGC and APPNP.

Shift-Robust Regularization for Linear GNNs via Instance Re-weighting
Moreover, it’s worth noting that there’s another class of GNN models (e.g., APPNP, SimpleGCN, etc) that are based on linear operations to speed up their graph convolutions. We also examined how to make these models more reliable in the presence of biased training data. While the same regularization mechanism can not be directly applied due to their different architecture, we can “correct” the training bias by re-weighting the training instances according to their distance from an approximated true distribution. This allows correcting the distribution of the biased training data without passing gradients through the model.
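The paper has its own procedure for estimating these instance weights (see the paper for details); as a generic illustration of the re-weighting idea, a standard density-ratio approach trains a small domain classifier to tell labeled nodes apart from an unlabeled IID sample and turns its probabilities into per-example weights. The sketch below assumes hypothetical NumPy arrays of hidden representations, z_train and z_iid:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def estimate_instance_weights(z_train, z_iid):
        # Label which sample each representation came from: 0 = biased training set, 1 = IID sample.
        features = np.vstack([z_train, z_iid])
        domain = np.concatenate([np.zeros(len(z_train)), np.ones(len(z_iid))])

        clf = LogisticRegression(max_iter=1000).fit(features, domain)
        p_iid = clf.predict_proba(z_train)[:, 1]

        # Density ratio p_true / p_train, rescaled by sample sizes and clipped so that
        # a handful of extreme weights cannot dominate the re-weighted loss.
        weights = (p_iid / (1.0 - p_iid)) * (len(z_train) / len(z_iid))
        return np.clip(weights, 0.1, 10.0)

    # The per-example training loss is then multiplied by these weights.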

Finally, the two regularizations — for both deep and linear GNNs — can be combined into a generalized regularization for the loss, which combines both domain regularization and instance reweighting (details, including the loss formulas, available in the paper).

Conclusion
Biased training data is common in real world scenarios and can arise due to a variety of reasons, including difficulties of labeling a large amount of data, the various heuristics or inconsistent techniques that are used to choose nodes for labeling, delayed label assignment, and others. We presented a general framework (SR-GNN) that can reduce the influence of biased training data and can be applied to various types of GNNs, including both deeper GNNs and more recent linearized (shallow) versions of these models.

Acknowledgements
Qi Zhu is a PhD Student at UIUC. Thanks to our collaborators Natalia Ponomareva (Google Research) and Jiawei Han (UIUC). Thanks to Tom Small and Anton Tsitsulin for visualizations.


1We note that many measures of distribution shift have been proposed in the literature. Here we use CMD (as it is quick to calculate and generally shows good performance in the domain adaptation literature), but the concept generalizes to any measure of distribution distances/domain shift. 

Categories
Misc

Can’t use GPU after upgrading TensorFlow from v2.3.2 to v2.8

Hi,

I recently upgraded TensorFlow on my local machine from v2.3.2 to v2.8 to use new features, and now TensorFlow is unable to access the GPU. Below is a screenshot of the command prompt while the new TensorFlow executes a GPU command. What should I do to rectify this?

https://preview.redd.it/hgp8dfbm55m81.png?width=1920&format=png&auto=webp&s=0d4d14f7061597d2770a4ae52fe6cb20d409aa13

submitted by /u/Better-Ad8608

Categories
Misc

Deploy AI Workloads at Scale with Bottlerocket and NVIDIA-Powered Amazon EC2 Instances

AWS and NVIDIA collaborated on Bottlerocket, a container-optimized OS, to support all NVIDIA-powered Amazon EC2 instances, including the P4d, P3, G4dn, and G5 instances.

Deploying AI-powered services like voice-based assistants, e-commerce product recommendations, and contact-center automation into production at scale is challenging. Delivering the best end-user experience while reducing operational costs requires accounting for multiple factors. These include composition and performance of underlying infrastructure, flexibility to scale resources based on user-demand, cluster management overhead, and security. 

To address the challenges of deploying AI at scale, Enterprise IT teams have adopted Kubernetes (K8s) for container orchestration and NVIDIA accelerated computing to meet the performance needs of production AI deployments. In addition, there’s a growing focus on the role of the operating system (OS) for production infrastructure. The host OS of the production environment has a direct impact on the security, resource utilization, and time it takes to provision and scale additional resources. This influences the user experience, security, and cost of deployments as user demand increases.

Bottlerocket: a Linux-based, container-optimized OS

Bottlerocket is a minimal, Linux based open-source OS developed by AWS that is purpose built for running containers. With a strong emphasis on security, it only includes essential software for running containers. 

This reduces the attack surface and impact of vulnerabilities, requiring less effort to meet node compliance requirements. In addition, the minimal host footprint of Bottlerocket helps improve node resource usage and boot times. 

Updates to Bottlerocket are applied in a single step and can be rolled back if necessary. This results in lower error rates and improved uptime for container applications. Updates can also be automated using container orchestration services such as Amazon Elastic Kubernetes Service (EKS) and Amazon Elastic Container Service (ECS).

Use Bottlerocket with Amazon EC2 instances powered by NVIDIA GPUs

AWS and NVIDIA have collaborated to enable Bottlerocket to support all NVIDIA-powered Amazon EC2 instances including P4d, P3, G4dn, and G5. This support combines the computational power of NVIDIA-powered GPU instances with the benefits of a container-optimized OS for deploying AI models on K8s clusters at scale. 

The result is enhanced security and faster boot times, especially when running AI workloads scaling additional GPU-based instances in real time. 

Figure 1: Containerized GPU-optimized applications can be deployed on K8s clusters using Bottlerocket support for NVIDIA-powered Amazon EC2 instances.

Support for NVIDIA GPUs is delivered in the form of the Bottlerocket GPU-optimized AMI. This includes NVIDIA drivers, a K8s GPU device-plugin, and containerd runtime built into the base image. 

The AMI provides everything to provision and register self-managed nodes, with NVIDIA-powered GPU instances and Bottlerocket OS to an Amazon EKS cluster.

In addition, you can also leverage NVIDIA optimized software from the NVIDIA NGC Catalog on AWS Marketplace—a hub for pretrained models, scripts, Helm charts, and a wide array of AI and HPC software. 

For AI inference deployments on AWS, you can leverage the NVIDIA Triton Inference Server. Use the open-source inference serving software to deploy trained AI models from many frameworks including TensorFlow, TensorRT, PyTorch, ONNX, XGBoost, and Python on any GPU or CPU infrastructure.

Learn more about the Bottlerocket support for NVIDIA GPUs from AWS.