submitted by /u/Kagermanov
cuStateVec is a library for acceleration of state vector-based quantum circuit simulation. We discuss APIs, integrations, and benchmarks.
Quantum computing aspires to deliver more powerful computation and faster results for certain types of classically intractable problems. Quantum circuit simulation is essential to understanding quantum computation and the development of quantum algorithms. In a quantum circuit, the quantum device is composed of N qubits, and computations are performed by applying a sequence of quantum gates and measurements to the qubits.
Mathematically, the quantum state of the N-qubit system can be described as a complex 2^N-dimensional vector. The most intuitive method to simulate a quantum circuit on a classical computer, known as state vector simulation, stores this vector with its 2^N complex values directly in memory. The circuit is executed by multiplying the vector by a series of matrices that correspond to the gate sequence that makes up the circuit.
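To make the idea concrete, here is a minimal pure-Python sketch of state vector simulation. It is not cuStateVec code; the function names are hypothetical, and it assumes qubit 0 maps to the least significant bit of the state index. It prepares a 3-qubit GHZ state by applying a Hadamard gate followed by a chain of CNOT gates.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate (nested lists) to qubit `target` of a state vector."""
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if i & stride == 0:  # i has the target bit cleared; j is its partner
            j = i | stride
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def apply_cnot(state, control, target):
    """Swap amplitude pairs that differ in `target` wherever `control` is 1."""
    new_state = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new_state[i], new_state[j] = state[j], state[i]
    return new_state


def ghz_state(n_qubits):
    """Start in |0...0>, apply H on qubit 0, then CNOTs along the line."""
    state = [0j] * (1 << n_qubits)
    state[0] = 1 + 0j
    h = 1 / math.sqrt(2)
    state = apply_single_qubit_gate(state, [[h, h], [h, -h]], 0)
    for q in range(n_qubits - 1):
        state = apply_cnot(state, q, q + 1)
    return state


state = ghz_state(3)
print(state[0], state[-1])  # amplitudes of |000> and |111>
```

Each gate touches every amplitude once, so the cost of one gate grows with the length of the vector, which is why the exponential size matters so much.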
However, because the dimension of the state vector grows exponentially with the number of qubits, the memory required for a full description of the state limits this method to circuits with 30–50 qubits. Alternative methods based on tensor networks can simulate significantly more qubits but are generally limited in the depth and complexity of circuits that they can effectively simulate.
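The memory limit follows from simple arithmetic, sketched below. The helper name is hypothetical, and the 16 bytes per amplitude assumes double-precision complex numbers; single precision would halve every figure.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2^N complex amplitudes."""
    return (1 << n_qubits) * bytes_per_amplitude


for n in (30, 32, 40, 50):
    gib = state_vector_bytes(n) / 2**30
    print(f"{n} qubits: {gib:,.0f} GiB")
```

Every additional qubit doubles the requirement, so 30 qubits need 16 GiB in double precision while 50 qubits need roughly 16 million GiB.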
The NVIDIA cuQuantum SDK features libraries for state vector and tensor network methods. In this post, we focus on state vector simulation and the cuStateVec library. For more information about the library for tensor network methods, see Scaling Quantum Circuit Simulation with NVIDIA cuTensorNet.
The cuStateVec library provides single-GPU primitives to accelerate state vector simulations. As the state vector method is fundamental in simulating quantum circuits, most quantum computing frameworks and libraries include their own state vector simulator. To enable easy integration with these existing simulators, cuStateVec provides an API set to cover common use cases:
A qubit can exist in a superposition of two states, |0> and |1>. When a measurement is performed, one of the values is probabilistically selected and observed, and the other value collapses. The cuStateVec measurement API simulates qubit measurement and supports use cases of the measurement on the Z-basis product and batched single-qubit measurements.
Quantum circuits have quantum logic gates to modify and prepare quantum states to observe a desirable result. Quantum logic gates are expressed as unitary matrices. The cuStateVec gate application API provides features to apply quantum logic gates for some matrix types, including the following:
In quantum mechanics, an expectation value is calculated for an operator and a quantum state. For quantum circuits, we also calculate the expectation value for a given circuit and quantum state. cuStateVec has an API to calculate the expectation value with a small memory footprint.
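As an illustration of what an expectation value means for a state vector, the sketch below computes the expectation of the Pauli-Z operator on one qubit. This is plain Python, not the cuStateVec API; the function name is hypothetical and qubit 0 is assumed to be the least significant bit of the index.

```python
import math


def expectation_z(state, qubit):
    """Return <psi|Z_qubit|psi>: +1 weight where the bit is 0, -1 where it is 1."""
    total = 0.0
    for index, amplitude in enumerate(state):
        sign = -1.0 if (index >> qubit) & 1 else 1.0
        total += sign * abs(amplitude) ** 2
    return total


h = 1 / math.sqrt(2)
print(expectation_z([1, 0], 0))   # state |0>
print(expectation_z([0, 1], 0))   # state |1>
print(expectation_z([h, h], 0))   # equal superposition
```

The computation only accumulates a running sum over the amplitudes, so it needs no extra copy of the state vector.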
The state vector simulation keeps the quantum state numerically in the state vector. By calculating the probability for each state vector element, you can efficiently simulate measurements of multiple qubits multiple times without collapsing the quantum state. The cuStateVec sampler API executes sampling on GPU with a small memory footprint.
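The sampling idea can be sketched in a few lines of standard-library Python. This is an illustration only, not the cuStateVec sampler API; `sample_bitstrings` is a hypothetical name, and the state vector length is assumed to be a power of two.

```python
import math
import random


def sample_bitstrings(state, n_shots, seed=None):
    """Draw measurement outcomes with probability |amplitude|^2 each."""
    rng = random.Random(seed)
    n_qubits = (len(state) - 1).bit_length()
    probabilities = [abs(a) ** 2 for a in state]
    outcomes = rng.choices(range(len(state)), weights=probabilities, k=n_shots)
    return [format(i, f"0{n_qubits}b") for i in outcomes]


# 3-qubit GHZ state: only |000> and |111> have nonzero amplitude.
h = 1 / math.sqrt(2)
ghz = [h, 0, 0, 0, 0, 0, 0, h]
shots = sample_bitstrings(ghz, n_shots=10, seed=1)
print(shots)
```

Because the state vector itself is never modified, the same state can be sampled as many times as needed.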
The state vector is placed on a GPU to accelerate simulations by the GPU. To analyze a simulation result on a CPU, copy the resulting state vector to the CPU. cuStateVec provides the accessor API to do this on behalf of users. During the copy, the ordering of state vector elements can be rearranged so that you can reorder qubits into the desired qubit ordering.
For more information, see the cuStateVec documentation.
The first project to announce integration of the NVIDIA cuStateVec Library was Google’s qsim, an optimized simulator for their quantum computing framework, Cirq. The Google Quantum AI team extended qsim with a new cuStateVec-based GPU simulation backend to complement their CPU and CUDA simulator engines.
To enable cuStateVec through Cirq, compile qsim from the source and install the bindings for Cirq provided by the qsimcirq Python package.
# Prerequisite:
# Download cuQuantum Beta2 from https://developer.nvidia.com/cuquantum-downloads
# Extract cuQuantum Beta2 archive and set the path to CUQUANTUM_ROOT
$ tar -xf cuquantum-linux-x86_64-0.1.0.30-archive.tar.xz
$ export CUQUANTUM_ROOT=`pwd`/cuquantum-linux-x86_64-0.1.0.30-archive
$ ln -sf $CUQUANTUM_ROOT/lib $CUQUANTUM_ROOT/lib64
# Clone qsim repository from github and checkout v0.11.1 branch
$ git clone https://github.com/quantumlib/qsim.git
$ cd qsim
$ git checkout v0.11.1
# Build and install qsimcirq with cuStateVec
$ pip install .
# Install cirq
$ pip install cirq
In this example, we run a circuit that creates a Greenberger-Horne-Zeilinger (GHZ) state and samples experimental outcomes. The following Python script gets the amplitudes in |0…00> and |1…11> by calling three different simulators:
For the Cirq and qsim CPU-based simulators, we enable two sockets of a 64-core EPYC 7742 CPU. For the cuStateVec-accelerated simulation, we use a single A100 GPU.
import cirq
import qsimcirq

n_qubits = 32
qubits = cirq.LineQubit.range(n_qubits)
circuit = cirq.Circuit()
circuit.append(cirq.H(qubits[0]))
circuit.append(cirq.CNOT(qubits[idx], qubits[idx + 1])
               for idx in range(n_qubits - 1))

# Cirq
s = cirq.sim.Simulator()
result = s.compute_amplitudes(circuit, [0, 2**n_qubits-1])
print(f'cirq.sim : {result}')

# qsim(CPU)
options = qsimcirq.QSimOptions(max_fused_gate_size=4, cpu_threads=512)
s = qsimcirq.QSimSimulator(options)
result = s.compute_amplitudes(circuit, [0, 2**n_qubits-1])
print(f'qsim(CPU) : {result}')

# qsim(cuStateVec)
options = qsimcirq.QSimOptions(use_gpu=True, max_fused_gate_size=4, gpu_mode=1)
s = qsimcirq.QSimSimulator(options)
result = s.compute_amplitudes(circuit, [0, 2**n_qubits-1])
print(f'cuStateVec: {result}')
The following console output shows that the CPU version of qsim was 5.1x faster than Cirq’s simulator, owing to optimizations with CPU SIMD instructions and OpenMP. The cuStateVec version accelerates the simulation further: it is about 30x faster than Cirq’s simulator and 5.9x faster than qsim’s CPU version.
cirq.sim : [0.70710677+0.j 0.70710677+0.j], 87.51 s
qsim(CPU) : [(0.7071067690849304+0j), (0.7071067690849304+0j)], 17.04 s
cuStateVec: [(0.7071067690849304+0j), (0.7071067690849304+0j)], 2.88 s
Preliminary performance results on gate applications of some popular circuits are shown in the following figures. Simulations are accelerated for all qubit counts, and the acceleration grows with the number of qubits, reaching a factor of roughly 10-20x for the largest circuits. This performance opens opportunities to explore development and evaluation of larger quantum circuits.
State vector simulations are also well suited for execution on multiple GPUs. Most gate applications are perfectly parallel operations and can be accelerated by splitting the state vector and distributing it across several GPUs.
Beyond approximately 30 qubits, a multi-GPU simulation is inevitable. This is because a state vector is not able to fit in a single GPU’s memory due to its exponential increase in size with additional qubits.
When multiple GPUs work together on a simulation, each GPU can apply a gate to its part of the state vector in parallel. In most cases, each GPU only needs local data for the update of the state vector and each GPU can apply the gate independently.
However, depending on which of the simulated qubits a gate acts on, the GPUs might sometimes require parts of the state vector stored in a different GPU to perform the update. In this case, the GPUs must exchange large parts of the state vector. These parts are typically hundreds of megabytes or several gigabytes in size. Therefore, multi-GPU state vector simulations are sensitive to the bandwidth of the GPU interconnect.
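The distinction between local and exchanged updates can be sketched as follows. This assumes a common partitioning scheme that the post does not spell out: the state vector is split evenly across a power-of-two number of GPUs, with the most significant index bits selecting the GPU. The function names are hypothetical, and 8 bytes per amplitude assumes single-precision complex values.

```python
def gate_needs_exchange(target_qubit, n_qubits, n_gpus):
    """True if a gate on `target_qubit` pairs amplitudes held by different GPUs."""
    if n_gpus < 1 or n_gpus & (n_gpus - 1):
        raise ValueError("n_gpus must be a power of two")
    n_global = n_gpus.bit_length() - 1  # index bits that select the GPU
    n_local = n_qubits - n_global       # index bits stored within one GPU
    return target_qubit >= n_local


def slice_bytes(n_qubits, n_gpus, bytes_per_amplitude=8):
    """Size of the state vector slice held by each GPU."""
    return (1 << n_qubits) // n_gpus * bytes_per_amplitude


# 32 qubits on 8 GPUs: qubits 0-28 are local, qubits 29-31 need data exchange.
for qubit in (0, 28, 29, 31):
    print(qubit, gate_needs_exchange(qubit, n_qubits=32, n_gpus=8))
print(slice_bytes(32, 8) / 2**30, "GiB per GPU")
```

Under these assumptions each GPU holds a 4 GiB slice, so a gate on one of the top three qubits moves data on the order of gigabytes between GPUs.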
The DGX A100 is a perfect match for these requirements, with eight NVIDIA A100 GPUs providing a GPU-to-GPU direct bandwidth of 600GB/s using NVLink. We chose three common quantum computing algorithms with 30-32 qubits to benchmark Cirq/qsim with cuStateVec on the DGX A100:
All benchmarks show good strong scaling, with speedups of 4.5–7x on eight GPUs compared to a single-GPU run.
In comparison to the simulation time on two 64-core CPUs, the DGX-A100 delivers impressive overall speed-ups between 50–90x.
The cuStateVec library in the NVIDIA cuQuantum SDK aims to accelerate state vector simulators of quantum circuits on GPUs. qsim, Google’s simulator for Cirq, is one of the first simulators to adopt the library, giving Cirq users the library’s GPU acceleration for their existing programs. Integrations with more quantum circuit frameworks will follow, including IBM’s Qiskit software.
We are also scaling up. Preliminary results for cuStateVec-based multi-GPU simulations show a 50–90x speedup on key quantum algorithms compared to two 64-core CPUs. We hope that cuStateVec becomes a valuable tool for breaking new ground in quantum computing.
Have feedback and suggestions on how we can improve the cuQuantum libraries? Send an email to cuquantum-feedback@nvidia.com.
The current Beta 2 version of cuQuantum is available for download. Documentation can be found here, and examples are on our GitHub.
Download the cuQuantum Appliance, our container for multi-GPU cuStateVec.
For more information, see the following resources:
You can learn more about NVIDIA cuQuantum and other advances through GTC sessions and posts:
Great sessions on custom computer vision models, expressive TTS, localized NLP, scalable recommenders, and commercial and healthcare robotics apps.
Looking for different topic areas? Keep an eye out for our other posts!
Join us at GTC, March 21-24, to explore the latest technology and research across AI, computer vision, data science, robotics, and more!
With over 900 options to choose from, our NVIDIA experts put together some can’t-miss sessions to help get you started:
Creating the Future: Creating the World’s Largest Synthetic Object Recognition Dataset for Industry (SORDI)
Jimmy Nassif, CTO, idealworks
Marc Kamradt, Head of TechOffice MUNICH, BMW Group
BMW builds a car every 56 seconds. How do they increase quality? They use robots and complement real data with synthetic. Learn how BMW, Microsoft, and NVIDIA are accelerating production and quality by recognizing parts, obstacles, and people through artificial intelligence-based computer vision.
How To Develop and Optimize Edge AI apps with NVIDIA DeepStream
Carlos Garcia-Sierra, DeepStream Product Manager, NVIDIA
Jitendra Kumar, Senior System Software Engineer, NVIDIA
This talk covers the best practices for developing and optimizing the performance of edge AI applications using DeepStream SDK. Deep dive into a multisensor, multimodel design and learn how to reduce development time and maximize performance using AI at the edge.
AI Models Made Simple with NVIDIA TAO
Chintan Shah, Senior Product Manager, NVIDIA
Akhil Docca, Senior Product Marketing Manager, NVIDIA
A primary challenge confronting enterprises is that the demand for creating AI models far outpaces the number of data scientists available. Developers need to easily customize models and bring their AI to market faster. This session will demonstrate the power and ease of NVIDIA TAO, which solves this problem. Get a preview at GTC of the new capabilities of TAO Toolkit, including Bring Your Own Model Weights, REST APIs, TensorBoard visualization, new pretrained models, and more.
Conversational AI Demystified
Sirisha Rella, Product Marketing Manager, NVIDIA
It’s easier than ever to develop AI speech applications like virtual assistants and real-time transcription. Today’s advanced tools and technologies make it easy to fine-tune and build scalable, responsive applications. This popular session shows users how to build and deploy their first end-to-end conversational AI pipeline using NVIDIA Riva, as an example.
Expressive Neural Text-to-Speech
Andrew Breen, Senior Manager Text-to-Speech Research, Amazon
Text-to-speech (TTS) research expert Andrew Breen will give a high-level overview of recent developments in neural TTS, including adopted approaches, technical challenges, and future direction. Breen was awarded the IEE J. Langham Thomson premium in 1993, and has received business awards from BT, MCI, and Nuance. He invented the Laureate TTS system at BT Labs and founded Nuance’s TTS organization.
Building Large-scale, Localized Language Models: From Data Preparation to Training and Deployment to Production
Miguel Martinez, Senior Deep Learning Solution Architect, NVIDIA
Meriem Bendris, Senior Deep Learning Data Scientist, NVIDIA
Natural Language Processing (NLP) breakthroughs in large-scale language models have boosted the capability to solve problems with zero-shot translation and supervised fine-tuning. However, executing NLP models on localized languages remains limited due to data preparation, training, and deployment challenges. This session highlights scaling challenges and solutions to show how to optimize NLP models using NVIDIA NeMo Megatron—a framework for training large NLP models in other languages.
Building and Deploying Recommender Systems Quickly and Easily with NVIDIA Merlin
Even Oldridge, Senior Manager, Merlin Recommender Systems Team, NVIDIA
Merlin expert and Twitter influencer Even Oldridge will demonstrate how to optimize recommendation models for maximum performance and scale. Oldridge has 8 years of recommender system experience, along with a PhD in computer vision.
Building AI-based Recommender System Leveraging the Power of Deep Learning and GPU
Khalifeh AlJadda, Senior Director of Data Science, The Home Depot
Tackle AI-based recommendation system challenges and uncover best practices for delivering personalized experiences that differentiate you from competitors. Hear from Khalifeh AlJadda, an expert in implementing large-scale, distributed, machine-learning algorithms in search and recommendation engines. AlJadda leads the Recommendation Data Science, Search Data Science, and Visual AI teams at The Home Depot. With a PhD in computer science, he previously led the design and implementation of CareerBuilder’s language-agnostic semantic search engine.
Multi-Objective Optimization to Boost Exploration in Recommender Systems
Serdar Kadioglu, Vice President AI | Adjunct Assistant Professor, Fidelity Investments | Brown University
How can one use combinatorial optimization to formalize item universe selection in new applications with limited or no datasets? Serdar Kadioglu will provide insights on how to apply techniques like unsupervised clustering and latent text embeddings to create a multilevel framework for your business. Kadioglu previously led the Advanced Constraint Technology R&D team at Oracle and worked at Adobe. As an adjunct professor of computer science at Brown University, Kadioglu conducts algorithmic research at the intersection of AI and discrete optimization, with an interest in building robust and scalable products.
Delivering AI Robotics at Scale: A Behind-the-Scenes Look
Mostafa Rohaninejad, Founding Researcher, Covariant.ai
Bringing practical AI robotics into the physical world, such as on a factory floor, is hard. Covariant is working to solve this problem. Mostafa Rohaninejad is part of the core team that built the full AI stack at Covariant from the ground up. In his session, he will share both the technical challenges and the exciting commercial possibilities of AI Robotics.
Leveraging Embedded Computing to Unlock Autonomy in Human Environments
Andrea Thomaz, Co-Founder and CEO, Diligent Robotics
During the COVID-19 pandemic, hospitals faced high nurse turnover, record burnout, and crisis-level labor shortages. Hospitals must alleviate this staffing crisis. Enter Diligent Robotics, and their robot Moxi, which completes routine tasks to assist nursing staff. Andrea Thomaz will share the unique challenges in achieving robot autonomy in a busy hospital, like maneuvering around objects or navigating to a patient room, all while integrating multiple camera streams that feed into embedded GPUs.
Building Autonomy Off-Road from the Ground Up with Jetson
Nick Peretti, CV/ML Engineer, Scythe Robotics
Autonomy is critical when it comes to outdoor and off-road robotics, but environmental and task-specific demands require a different approach than indoor or on-road environments. Nick Peretti will share the NVIDIA Jetson-centered approach that Scythe Robotics uses to run its full sense-plan-act software suite with their autonomous commercial mowers. He will highlight tools and approaches that have enabled Scythe to move quickly to field units, and lessons learned along the way.
Airlines around the world are exploring several tactics to meet aggressive CO2 commitments set by the International Civil Aviation Organization (ICAO). This effort has been emphasized in Europe, where aviation accounts for 13.9% of the transportation industry’s carbon emissions. The largest push comes from the European Green Deal, which aims to decrease carbon emissions from transportation by 90% by 2051. The Lufthansa Group has gone even further, committing to a 50% reduction in emissions compared to 2019 by the year 2030 and to reach net-zero emissions by 2050.
One unexpected approach that airlines can use to lower carbon emissions is through optimizing their tail assignment, i.e., how to assign aircraft (identified by the aircraft registration painted on their tails) to legs in a way that minimizes the total operating cost, of which fuel is a major contributor. More fuel needed to operate the aircraft means higher operating costs and more carbon ejected into the atmosphere. For example, a typical long-haul flight (longer than ~4,100km or ~2,500mi) emits about a ton of CO2.
The amount of fuel needed to fly between origin and destination can vary widely. For example, larger aircraft weigh more and therefore require more fuel, while modern and younger aircraft tend to be more fuel-efficient because they use newer technology. The mass of the fuel itself is also significant. Aircraft are less fuel-efficient early in their flights when their fuel tanks are full than later when the volume of fuel is reduced. Another important factor for the tail assignment is the number of passengers on board; as the number of bookings changes, a smaller or larger aircraft might be required. Other factors can affect fuel consumption, either negatively (e.g., headwinds or the age of the engines) or positively (e.g., tailwinds, sharklets, skin).
During the past year, Google’s Operations Research team has been working with the Lufthansa Group to optimize their tail assignment to reduce carbon emissions and the cost of operating their flights. As part of this collaboration, we developed and launched a mathematical tail assignment solver that has been fully integrated to optimize the fleet schedule for SWISS International Air Lines (a Lufthansa Group subsidiary), which we estimate will result in significant reductions in carbon emissions. This solver is the first step of a multi-phase project that started at SWISS.
A Mathematical Model for Tail Assignment
We structure the task of tail assignment optimization as a network flow problem, which is essentially a directed graph characterized by a set of nodes and a set of arcs, with additional constraints related to the problem at hand. Nodes may have either a supply or a demand for a commodity, while arcs have a flow capacity and a cost per unit of flow. The goal is to determine flows for every arc that minimize the total flow cost of each commodity, while maintaining flow balance in the network.
We decided to use a flow network because it is the most common way of modeling this problem in literature, and the commodities, arcs, and nodes of the flow network have a simple one-to-one correspondence to tails, legs, and airports in the real-life problem. In this case, the arcs of the network correspond to each leg of the flight schedule, and each individual tail is a single instance of a commodity that “flows” along the network. Each leg and tail pair in the network has an associated assignment cost, and the model’s objective is to pick valid leg and tail pairs such that these assignment costs are minimized.
Aside from the standard network flow constraints, the model takes into account additional airline-specific constraints so that the solution is tailored to Lufthansa Group airlines. For example, aircraft turnaround times — i.e., the amount of time an aircraft spends on the ground between two consecutive flights — are airline-specific and can vary for a number of reasons. Catering might be loaded at an airline’s hub, reducing the turnaround time needed at outstations, or a route could have a higher volume of vacation travelers who often take longer to board and disembark than business travelers. Another constraint is that each aircraft must be on the ground for a nightly check at a specified airport’s maintenance hub to receive mandated maintenance work or cleaning. Furthermore, each airline has their own maintenance schedule, which can require aircraft to undergo routine maintenance checks every few nights, in part to help maintain the aircraft’s fuel efficiency.
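To illustrate the shape of the problem, the sketch below solves a toy tail assignment by brute-force enumeration. It is not the network flow solver described in this post: the legs, tails, costs, and the single turnaround constraint are all hypothetical, and enumeration only works for tiny instances.

```python
from itertools import product

# Hypothetical toy schedule: (leg id, origin, destination, departure hour, arrival hour).
LEGS = [
    ("L1", "ZRH", "LHR", 7, 9),
    ("L2", "LHR", "ZRH", 10, 12),
    ("L3", "ZRH", "FRA", 8, 9),
    ("L4", "FRA", "ZRH", 11, 12),
]
TAILS = {"T1": "ZRH", "T2": "ZRH"}  # tail -> starting airport
# Hypothetical assignment cost for each (tail, leg) pair, e.g. fuel burn.
COST = {
    ("T1", "L1"): 10, ("T1", "L2"): 10, ("T1", "L3"): 6, ("T1", "L4"): 6,
    ("T2", "L1"): 8, ("T2", "L2"): 8, ("T2", "L3"): 7, ("T2", "L4"): 7,
}
MIN_TURNAROUND = 1  # hours on the ground between two consecutive legs


def route_is_valid(start_airport, legs):
    """Check that the legs, flown in departure order, form a connected chain."""
    airport, ready_at = start_airport, 0
    for _, origin, dest, dep, arr in sorted(legs, key=lambda leg: leg[3]):
        if origin != airport or dep < ready_at:
            return False
        airport, ready_at = dest, arr + MIN_TURNAROUND
    return True


def solve():
    """Try every leg-to-tail assignment and keep the cheapest valid one."""
    best_cost, best_plan = None, None
    tail_ids = list(TAILS)
    for choice in product(tail_ids, repeat=len(LEGS)):
        routes = {tail: [] for tail in tail_ids}
        for leg, tail in zip(LEGS, choice):
            routes[tail].append(leg)
        if not all(route_is_valid(TAILS[t], routes[t]) for t in tail_ids):
            continue
        cost = sum(COST[(tail, leg[0])] for leg, tail in zip(LEGS, choice))
        if best_cost is None or cost < best_cost:
            best_cost = cost
            best_plan = {leg[0]: tail for leg, tail in zip(LEGS, choice)}
    return best_cost, best_plan


print(solve())
```

In this toy instance the cheaper tail for each rotation wins, but the connectivity and turnaround checks are what rule out most assignments, which mirrors the role of the additional constraints in the real model.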
Preliminary Results & Next Steps
After using our solver to optimize their fleet schedule in Europe, SWISS Airlines estimates an annual savings of over 3.5 million Swiss Francs and a 6500 ton reduction in CO2 emitted. We expect these savings will multiply when the model is rolled out to the rest of the airlines in the Lufthansa Group and again when traffic returns to pre-COVID levels. Future work will include ensuring this model is usable with larger sets of data, and adding crew and passenger assignment to the optimization system to improve the flight schedules for both passengers and flight crew.
If you are interested in experimenting with your own network flow models, check out OR-Tools, our open source software suite that can be used to build optimization solutions similar to the solver presented in this post. Refer to OR-Tools related documentation for more information.
Acknowledgements
Thanks to Jon Orwant for collaborating extensively on this blog post and for establishing the partnership with Lufthansa and SWISS, along with Alejandra Estanislao. Thanks also to the Operations Research Team and to the folks at SWISS; this work would not have been possible without their hard work and contributions.
Is your network getting long in the tooth and are you thinking about an upgrade? This blog will cover three areas to consider when updating your data center network.
Normally, data center networks are updated when new applications or servers are installed in the infrastructure. But independent of new server and application infrastructure forcing an update, there are other areas to consider. Three questions to ask when assessing if you need to update your network are:
Network device selection typically starts with understanding how the server Network Interface Cards (NICs) are configured. In the past, server NICs at 10 gigabits/second (10G) were considered the norm. But over the past 5 years we’ve seen a real growth in server computing power. In the accelerated computing world we tend to see 25 to 100G network speeds as the new norm for servers, with the latest servers able to use even 200G NICs.
With higher NIC speeds, the top-of-rack (leaf) switch needs to be upgraded. Failure to update your legacy core (spine) switches will cause oversubscription ratios to move unfavorably, introducing excess congestion and unpredictable latency. If you’re upgrading the leaf switches, you’ll need to upgrade the spine switches as well. Maintaining the same oversubscription ratio should be the goal.
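The effect on the oversubscription ratio is easy to check with a little arithmetic. The sketch below uses hypothetical port counts and speeds for a single leaf switch; the function name is also an assumption made for illustration.

```python
def oversubscription_ratio(n_downlinks, downlink_gbps, n_uplinks, uplink_gbps):
    """Ratio of server-facing bandwidth to spine-facing bandwidth on a leaf."""
    return (n_downlinks * downlink_gbps) / (n_uplinks * uplink_gbps)


# Hypothetical leaf switch with 48 server ports and 4 uplinks to the spine.
legacy = oversubscription_ratio(48, 10, 4, 40)      # 10G servers, 40G uplinks
leaf_only = oversubscription_ratio(48, 25, 4, 40)   # servers move to 25G
both = oversubscription_ratio(48, 25, 4, 100)       # uplinks move to 100G too

print(f"legacy {legacy}:1, leaf-only upgrade {leaf_only}:1, "
      f"leaf and spine upgrade {both}:1")
```

With these numbers, upgrading only the server side raises the ratio from 3:1 to 7.5:1, while upgrading the uplinks as well restores the original 3:1.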
In addition to the hardware, it may be time to upgrade your network operating system (NOS), especially if you’re using legacy network features and protocols that are hindering your network through inefficient management and packet forwarding. Legacy networks were often built using layer 2 infrastructure everywhere. The entire network would be on a single broadcast domain, and solutions like MLAG would provide one layer of redundancy for every link.
Updating the network to leverage solutions like Layer 3 to the host (also known as host-based networking) or VXLAN+EVPN overlays helps alleviate issues caused by inefficient broadcast domains. Host-based networking and overlay networks enable better traffic management, easier provisioning, and more granular security. Any time is a good time to update your network if it relies on older technologies that haven’t been modernized.
Beyond the infrastructure hardware and software, an area of optimization that benefits from modern updates is the operational workflow. Classic networks tend to have network admins log in to each switch, router, or firewall individually. Configurations are applied to each node individually, backed up as plain text files, and stored locally on the network admin’s computer. These workflows tend to be error prone and can lead to typos or inefficiencies such as lost backups and slow maintenance windows. Configuration errors can open security gaps that a cyber adversary could exploit.
Modern networks leverage more advanced tooling and technologies that solve most of those problems.
Updating your networking can add benefits, including:
These optimized workflows reduce maintenance window times and errors in deployments, reducing the risk of security gaps while saving time and effort.
There are many reasons to update a network. It’s important to take stock of your network’s hardware, software and operational efficiencies and look at the big picture. With this information you can determine if updating your network will result in faster throughput, more productivity, and lower ownership costs.
I need a little help with my project. I’ve already written the code, but I am getting an error. I am using a few-shot learning technique, the triplet neural network (TNN). A TNN is a horizontal concatenation of three identical convolutional neural networks (with shared parameters) that are trained on triplets of inputs. An input triplet consists of an anchor instance, a positive instance (same class as the anchor), and a negative instance (different class from the anchor). The network is then trained to learn an embedding function using a triplet loss. Computing the triplet loss requires three training examples, so each triplet is formed by deliberately selecting training examples such that it has:
• a reference image, called the anchor image
• an image with the same label as the anchor, called the positive image
• an image with a different label than the anchor, called the negative image
The TNN learns to create a k-dimensional feature vector representation of images such that similar images lie closer together in a k-dimensional embedding space.
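For reference, the loss used in the code further down can be checked by hand on small vectors. This is a minimal standard-library sketch with made-up 2-dimensional embeddings; it follows the same formula as the post's loss layer (squared distances plus a margin, clipped at zero) but is not the Keras implementation.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """max(d(a, p)^2 - d(a, n)^2 + margin, 0) for a single triplet."""
    ap_dist = euclidean_distance(anchor, positive)
    an_dist = euclidean_distance(anchor, negative)
    return max(ap_dist ** 2 - an_dist ** 2 + margin, 0.0)


anchor, positive = [0.0, 0.0], [0.0, 1.0]
print(triplet_loss(anchor, positive, negative=[3.0, 4.0]))  # negative is far away
print(triplet_loss(anchor, positive, negative=[0.5, 0.0]))  # negative is too close
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin (in squared distance), and positive otherwise.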
import random

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.applications.vgg16 import VGG16

embedding_dimension = 128

# `size` (the input image side length) is assumed to be defined earlier
pre_trained_vgg16 = tf.keras.applications.VGG16(
    input_shape=(size, size, 3), include_top=False, weights="imagenet",
    input_tensor=None, pooling=None, classes=1000, classifier_activation=None)
pre_trained_vgg16.save('vgg16.h5')
pre_trained_vgg16 = tf.keras.models.load_model('vgg16.h5')
def build_embedding_network(embedding_dimension):
    # Use the frozen, pre-trained VGG16 as the feature extractor
    embedding_network = pre_trained_vgg16
    embedding_network.trainable = False
    x = embedding_network.output
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(0.2)(x)
    x = tf.keras.layers.Dense(2 * embedding_dimension, activation='sigmoid')(x)
    x = tf.keras.layers.Dense(embedding_dimension, activation='sigmoid')(x)
    embedding_network = tf.keras.Model(embedding_network.input, x,
                                       name="embedding_network")
    return embedding_network
def build_metric_network(single_embedding_dim):
    # K is tensorflow.keras.backend
    input1 = tf.keras.layers.Input((single_embedding_dim,), name="input1")
    input2 = tf.keras.layers.Input((single_embedding_dim,), name="input2")
    embedded_distance = tf.keras.layers.Subtract(
        name='subtract_embeddings')([input1, input2])
    embedded_distance = tf.keras.layers.Lambda(
        lambda x: K.sqrt(K.sum(K.square(x), axis=-1, keepdims=True)),
        name='euclidean_distance')(embedded_distance)
    metric_network = tf.keras.Model(inputs=[input1, input2],
                                    outputs=[embedded_distance],
                                    name="metric_network")
    return metric_network
class TripletLossLayer(tf.keras.layers.Layer):
    def __init__(self, margin, **kwargs):
        self.margin = margin
        super(TripletLossLayer, self).__init__(**kwargs)

    def triplet_loss(self, inputs):
        ap_dist, an_dist = inputs
        # square
        ap_dist2 = K.square(ap_dist)
        an_dist2 = K.square(an_dist)
        return K.sum(K.maximum(ap_dist2 - an_dist2 + self.margin, 0))

    def call(self, inputs):
        loss = self.triplet_loss(inputs)
        self.add_loss(loss)
        return loss

    def get_config(self):
        config = super().get_config().copy()
        config.update({'margin': self.margin})
        return config
def build_triplet_snn(input_shape, embedding_network, metric_network, margin=0.1):
    # Define the tensors for the three input images
    anchor_input = tf.keras.layers.Input(input_shape, name="anchor_input")
    positive_input = tf.keras.layers.Input(input_shape, name="positive_input")
    negative_input = tf.keras.layers.Input(input_shape, name="negative_input")
    # Generate the embeddings (feature vectors) for the three images
    embedding_a = embedding_network(anchor_input)
    embedding_p = embedding_network(positive_input)
    embedding_n = embedding_network(negative_input)
    ap_dist = metric_network([embedding_a, embedding_p])
    an_dist = metric_network([embedding_a, embedding_n])
    # Triplet loss layer
    loss_layer = TripletLossLayer(margin=margin,
                                  name='TripletLossLayer')([ap_dist, an_dist])
    # Compute the concatenated pairs
    all_concatenated = tf.keras.layers.Concatenate(
        axis=-1, name="All-Embeddings")([embedding_a, embedding_p, embedding_n])
    # Connect the inputs with the outputs
    triplet_snn = tf.keras.Model(
        inputs=[anchor_input, positive_input, negative_input],
        outputs=[loss_layer, all_concatenated],
        name="triplet_snn")
    # Return the model
    return triplet_snn
embedding_network = build_embedding_network(embedding_dimension)
metric_network = build_metric_network(embedding_dimension)
triplet_snn = build_triplet_snn(input_shape=(size, size, 3),
                                embedding_network=embedding_network,
                                metric_network=metric_network,
                                margin=0.1)
learning_rate = 0.0001
epochs = 5
# `triplet_dataset` is assumed to be a list of triplets built earlier
class TripletDataGenerator(tf.keras.utils.Sequence):
    def __init__(self, triplet_dataset, shuffle=False):
        self.triplet_dataset = triplet_dataset
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        return len(self.triplet_dataset)

    def __getitem__(self, index):
        # return triplet_dataset[index][0]
        return np.array(self.triplet_dataset[index][0]).reshape(1, 224, 224, 3)

    def on_epoch_end(self):
        if self.shuffle:
            random.shuffle(self.triplet_dataset)

data_gen = TripletDataGenerator(triplet_dataset)
# Raw string, so the backslashes in the Windows path are not read as escape sequences
filepath = r'C:\Users\Y540\Desktop\Retinal Disease\TrainedSNN\temp\weights.{epoch}'
save_model_weights_at_every_epoch = tf.keras.callbacks.ModelCheckpoint(
    filepath, monitor="loss", verbose=1, save_best_only=False,
    save_weights_only=True, mode="auto", save_freq="epoch")
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
triplet_snn.compile(loss=None, optimizer=optimizer, run_eagerly=True)
%%time
history = triplet_snn.fit(data_gen, epochs=epochs, verbose=1,
                          callbacks=[save_model_weights_at_every_epoch])
submitted by /u/Intelligent_Term6689
Hi guys!
We are a GPU cloud platform based on blockchain, and we plan to support developers who need GPUs. If you use a GPU cloud platform, what is the most important reason you chose it? Compared to other GPU cloud platforms, does yours have any special features?
It would be great to get some insight from people who know this area and are willing to share their comments! We will invite 3 of you to get a free GPU for 72 hours. Thanks!
submitted by /u/May-Feng
This is my first post here, so hello everyone!
I am currently using SSD MobileNet V2 FPNLite 320×320 as my model, but I am just prototyping, so that could change. Basically, I need a relatively low-power device that will run my model in real time. Raspberry Pi is out of stock everywhere, and I found a possible alternative: second-hand thin clients. Most of the cheap ones have AMD G Embedded CPUs (G-T56N or G-T48E), and I couldn’t find anything that relates them to machine learning in any way. Will they have enough power to run object detection in real time? How do they compare to RPi 4 or RPi 3 performance? I am obviously fine with a bigger form factor and higher power consumption.
Any help will be appreciated!
submitted by /u/Own-Combination-4238
NVIDIA will present the following virtual event for the financial community: NVIDIA Investor Day Tuesday, March 22, 2022, at 10 a.m. …
Graph Neural Networks (GNNs) are powerful tools for leveraging graph-structured data in machine learning. Graphs are flexible data structures that can model many different kinds of relationships and have been used in diverse applications like traffic prediction, rumor and fake news detection, modeling disease spread, and understanding why molecules smell.
Graphs can model the relationships between many different types of data, including web pages (left), social connections (center), or molecules (right).
As is standard in machine learning (ML), GNNs assume that training samples are selected uniformly at random (i.e., are an independent and identically distributed or “IID” sample). This is easy to do with standard academic datasets, which are specifically created for research analysis and therefore have every node already labeled. However, in many real world scenarios, data comes without labels, and labeling data can be an onerous process involving skilled human raters, which makes it difficult to label all nodes. In addition, biased training data is a common issue because the act of selecting nodes for labeling is usually not IID. For example, sometimes fixed heuristics are used to select a subset of data (which shares some characteristics) for labeling, and other times, human analysts individually choose data items for labeling using complex domain knowledge.
To quantify the amount of bias present in a training set, one can use methods that measure how large the shift is between two different probability distributions, where the size of the shift can be thought of as the amount of bias. As the shift grows in size, machine learning models have more difficulty generalizing from the biased training set. This can meaningfully hurt generalizability: on academic datasets, we’ve observed domain shifts causing a performance drop of 15–20% (as measured by the F1 score).
In “Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data”, presented at NeurIPS 2021, we introduce a solution for using GNNs on biased data. Called Shift-Robust GNN (SR-GNN), this approach is designed to account for distributional differences between biased training data and a graph’s true inference distribution. SR-GNN adapts GNN models to the presence of distributional shift between the nodes labeled for training and the rest of the dataset. We illustrate the effectiveness of SR-GNN in a variety of experiments with biased training datasets on common GNN benchmark datasets for semi-supervised learning and show that SR-GNN outperforms other GNN baselines in accuracy, reducing the negative effects of biased training data by 30–40%.
The Impact of Distribution Shifts on Performance
To demonstrate how distribution shift affects GNN performance, we first generate a number of biased training sets for known academic datasets. Then, to understand the effect, we plot the generalization (test accuracy) versus a measure of distribution shift (the Central Moment Discrepancy,¹ or CMD). For example, consider the well-known PubMed citation dataset, which can be thought of as a graph where the nodes are medical research papers and the edges represent citations between them. When we generate biased training data for PubMed, the plot looks like this:
The effect of distribution shift on the PubMed dataset. Performance (F1) is shown on the y-axis vs. the distribution shift, Central Moment Discrepancy (CMD), on the x-axis, for 100 biased training set samples. As the distribution shift increases, the model’s accuracy falls.
Here one can observe a strong negative correlation between the distribution shift in the dataset and the classification accuracy: as CMD increases, the performance (F1) decreases. That is, GNNs can have difficulty generalizing as their training data looks less like the test dataset.
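As a rough illustration of the shift measure used above, the following is a minimal pure-Python sketch of Central Moment Discrepancy between two sets of feature vectors. It is not the code used in our experiments. The function names, the default of five moments, and the assumption that every feature lies in the interval [a, b] are illustrative choices.

```python
import math

def _mean(rows):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def _central_moment(rows, mean, k):
    """Per-dimension k-th central moment around the given mean."""
    n = len(rows)
    return [sum((x - m) ** k for x in col) / n
            for col, m in zip(zip(*rows), mean)]

def _l2(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))

def cmd(x, y, k_max=5, a=0.0, b=1.0):
    """Central Moment Discrepancy between samples x and y.

    Assumes every feature lies in [a, b]; k_max is the highest
    central moment that is compared (an illustrative default).
    """
    span = abs(b - a)
    mean_x, mean_y = _mean(x), _mean(y)
    # First-order term: distance between the sample means.
    total = _l2(mean_x, mean_y) / span
    # Higher-order terms: distances between central moments.
    for k in range(2, k_max + 1):
        total += _l2(_central_moment(x, mean_x, k),
                     _central_moment(y, mean_y, k)) / span ** k
    return total
```

Calling `cmd(train_features, iid_features)` on two lists of feature vectors returns 0 when both samples have identical moments, and larger values indicate a larger shift, which is the regime in which accuracy drops.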
To address this, we propose a shift-robust regularizer (similar in idea to domain-invariant learning) to minimize the distribution shift between training data and an IID sample from unlabeled data. To do this, we measure the domain shift (e.g., via CMD) in real time as the model is training and apply a direct penalty based on this that forces the model to ignore as much of the training bias as possible. This forces the feature encoders that the model learns for the training data to also work effectively for any unlabeled data, which might come from a different distribution.
The figure below shows what this looks like when compared to a traditional GNN model. We still have the same inputs (the node features X and the adjacency matrix A) and the same number of layers. However, the final embedding Z_k from layer k of the GNN is compared against embeddings from unlabeled data points to verify that the model is correctly encoding them.
We write this regularization as an additional term in the formula for the model’s loss based on the distance between the training data’s representations and the true data’s distribution (full formulas available in the paper).
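The structure of such a loss can be sketched in a few lines of Python. This is a simplified illustration and not the formula from the paper: the weighting factor `lam`, the function names, and the stand-in distance (an L2 distance between mean embeddings, used here in place of a full measure such as CMD) are all assumptions.

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true class over labeled nodes."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def mean_distance(z_a, z_b):
    """Stand-in shift measure: L2 distance between per-dimension means."""
    n_a, n_b = len(z_a), len(z_b)
    mean_a = [sum(col) / n_a for col in zip(*z_a)]
    mean_b = [sum(col) / n_b for col in zip(*z_b)]
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(mean_a, mean_b)))

def shift_regularized_loss(probs, labels, z_train, z_iid,
                           lam=1.0, distance=mean_distance):
    """Classification loss plus a penalty on the embedding shift.

    probs   : predicted class probabilities for the labeled training nodes
    labels  : integer class labels for those nodes
    z_train : final-layer embeddings of the labeled training nodes
    z_iid   : final-layer embeddings of an IID sample of unlabeled nodes
    lam     : hypothetical weight on the shift penalty
    """
    return cross_entropy(probs, labels) + lam * distance(z_train, z_iid)
```

Passing a different callable as `distance` swaps in another shift measure without changing the rest of the loss.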
In our experiments, we compare our method against a number of standard graph neural network models to measure their performance on node classification tasks. We demonstrate that adding the SR-GNN regularization gives a 30–40% improvement on classification tasks with biased training data labels.
A comparison of SR-GNN using node classification with biased training data on the PubMed dataset. SR-GNN outperforms seven baselines, including DGI, GCN, GAT, SGC and APPNP.
Shift-Robust Regularization for Linear GNNs via Instance Re-weighting
Moreover, it’s worth noting that there’s another class of GNN models (e.g., APPNP, SimpleGCN, etc.) that are based on linear operations to speed up their graph convolutions. We also examined how to make these models more reliable in the presence of biased training data. While the same regularization mechanism cannot be directly applied due to their different architecture, we can “correct” the training bias by re-weighting the training instances according to their distance from an approximated true distribution. This allows correcting the distribution of the biased training data without passing gradients through the model.
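To make the idea of instance re-weighting concrete, here is a small Python sketch. It does not reproduce the weighting scheme from the paper. As a stand-in, it scores each training instance with a Gaussian kernel on its distance to the mean of an IID sample and then uses the resulting weights in a weighted cross-entropy. The function names and the `bandwidth` parameter are hypothetical.

```python
import math

def instance_weights(z_train, z_iid, bandwidth=1.0):
    """Heuristic weights: higher for training points near the IID-sample mean.

    Stand-in for a proper density-ratio estimate; weights are normalized
    so that they average to 1 over the training set.
    """
    n = len(z_iid)
    center = [sum(col) / n for col in zip(*z_iid)]
    raw = [
        math.exp(-sum((a - c) ** 2 for a, c in zip(z, center))
                 / (2 * bandwidth ** 2))
        for z in z_train
    ]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def weighted_cross_entropy(probs, labels, weights):
    """Cross-entropy in which each labeled instance carries its own weight."""
    return -sum(w * math.log(p[y])
                for p, y, w in zip(probs, labels, weights)) / len(labels)
```

Because the weights are computed from the features alone and then held fixed, no gradient has to flow through the weighting step.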
Finally, the two regularizations, one for deep GNNs and one for linear GNNs, can be combined into a generalized regularization for the loss that includes both domain regularization and instance reweighting (details, including the loss formulas, available in the paper).
Conclusion
Biased training data is common in real world scenarios and can arise due to a variety of reasons, including difficulties of labeling a large amount of data, the various heuristics or inconsistent techniques that are used to choose nodes for labeling, delayed label assignment, and others. We presented a general framework (SR-GNN) that can reduce the influence of biased training data and can be applied to various types of GNNs, including both deeper GNNs and more recent linearized (shallow) versions of these models.
Acknowledgements
Qi Zhu is a PhD Student at UIUC. Thanks to our collaborators Natalia Ponomareva (Google Research) and Jiawei Han (UIUC). Thanks to Tom Small and Anton Tsitsulin for visualizations.
¹ We note that many measures of distribution shift have been proposed in the literature. Here we use CMD (as it is quick to calculate and generally shows good performance in the domain adaptation literature), but the concept generalizes to any measure of distribution distances/domain shift.