Categories
Misc

GAN for vector images

I am trying to figure out how to generate images using vector graphics instead of raster images like normal. I can not find any resources that seem to be handling a similar goal.

I have built a system that follows the Pix2Pix tutorial, but there is not a nice way to create derivatives. I have tried a brute force method (subtract before image from after image divided by parameters) and a more clever method using triangle areas, but the images never stop looking like random messes.

I tried using TensorFlow agents to do RI learning, but once again just end up with random messes.

Is there maybe a paper or resource out there that I am missing because I do not know the right search terms?

submitted by /u/DrOchensati
[visit reddit] [comments]

Categories
Misc

Researchers Demo ‘Almost-Unlimited Size’ Brain Simulations Using GPUs

To improve brain simulation technology, a team of researchers from the University of Sussex developed a GPU-accelerated approach that can generate brain simulation models of almost-unlimited size.

To improve brain simulation technology, a team of researchers from the University of Sussex developed a GPU-accelerated approach that can generate brain simulation models of almost-unlimited size. 

Researchers Dr. James Knight and Thomas Nowotny from the University of Sussex’s School of Engineering and Informatics detailed the work in a paper published in Nature Computational Science journal. 

Using a GPU-accelerated system composed of an NVIDIA TITAN RTX GPU, the team created a cutting-edge model of a Macaque’s visual cortex with 4.13 x 106 neurons and 24.2 x 109 synaptic weights, a simulation that could previously only be done on a supercomputer. 

The neural network-based simulator uses the large amount of computational power of the GPU to procedurally generate connectivity and synaptic weights as spikes are triggered, without having to store connectivity data in memory, the researchers explained. 

“Large-scale simulations of spiking neural network models are an important tool for improving our understanding of the dynamics and ultimately the function of brains. However, even small mammals such as mice have on the order of 1 × 1012 synaptic connections meaning that simulations require several terabytes of data – an unrealistic memory requirement for a single desktop machine,” the researchers explained. 

Dr James Knight and Prof Thomas Nowotny of the University of Sussex School of Engineering and Informatics.

According to the team, the initialization of the model took six minutes, and the simulation of each biological second took 7.7 min in the ground state, and 8.4 min in the resting state – 35% less time than a previous supercomputer simulation. 

On the software side, the team used the CUDA-based GPU enhanced Neuronal Networks (GeNN) package. GeNN can also be used through external interfaces such as SpineML and SpineCreator, a Python interface (PyGeNN), and a Brian interface via Brian2GeNN.

Results of full-scale multi-area model simulation in ground and resting states

“This research is a game-changer for computational Neuroscience and AI researchers who can now simulate brain circuits on their local workstations, but it also allows people outside academia to turn their gaming PC into a supercomputer and run large neural networks.”

A pre-print of the paper is available on bioRxiv under open-access terms. The Nature Computational Science paper can be found here.

Read more>

Categories
Offsites

TracIn — A Simple Method to Estimate Training Data Influence

The quality of a machine learning (ML) model’s training data can have a significant impact on its performance. One measure of data quality is the notion of influence, i.e., the degree to which a given training example affects the model and its predictive performance. And while influence is a well-known concept to ML researchers, the complexity behind deep learning models, coupled with their growing size, features and datasets, have made the quantification of influence difficult.

A few methods have been proposed recently to quantify influence. Some rely on changes in accuracy when retraining with one or several data points dropped, and some use established statistical methods, e.g., influence functions that estimate the impact of perturbing input points or representer methods that decompose a prediction into an importance weighted combination of training examples. Still other approaches require use of additional estimators, such as data valuation using reinforcement learning. Though these approaches are theoretically sound, their use in products has been limited by the resources needed to run them at scale or the additional burdens they place on training.

In “Estimating Training Data Influence by Tracing Gradient Descent”, published as a spotlight paper at NeurIPS 2020, we proposed TracIn, a simple scalable approach to tackle this challenge. The idea behind TracIn is straightforward — trace the training process to capture changes in prediction as individual training examples are visited. TracIn is effective in finding mislabeled examples and outliers from a variety of datasets, and is useful in explaining predictions in terms of training examples (as opposed to features) by assigning an influence score to each training example.

The Ideas Underlying TracIn
Deep learning algorithms are typically trained using an algorithm called stochastic gradient descent (SGD), or a variant of it. SGD operates by making multiple passes over the data and making modifications to the model parameters that locally reduce the loss (i.e., the model’s objective) with each pass. An example of this is demonstrated for an image classification task in the figure below, where the model’s task is to predict the subject of the test image on the left (“zucchini”). As the model progresses through training, it is exposed to various training examples that affect the loss on the test image, where the loss is a function both of the prediction score and the actual label — the higher the prediction score for zucchini, the lower the loss.

Estimating training data influence of the images on the right by tracing the loss change of the zucchini in the seatbelt image during training.

Suppose that the test example is known at training time and that the training process visited each training example one at a time. During the training, visiting a specific training example would change the model’s parameters, and that change would then modify the prediction/loss on the test example. If one could trace the training example through the process, then the change in loss or prediction on the test example could be attributed to the training example in question, where the influence of a training example would be the cumulative attribution across visits to the training example.

There are two types of relevant training examples. Those that reduce loss, like the images of zucchinis above, are called proponents, while those that increase loss, like the images of seatbelts, are called opponents. In the example above, the image labeled “sunglasses” is also a proponent, because it has a seatbelt in the image, but is labeled as “sunglasses,” driving the model to better distinguish between zucchini and seatbelts.

In practice, the test example is unknown at training time, a limitation that can be overcome by using the checkpoints output by the learning algorithm as a sketch of the training process. Another challenge is that the learning algorithm typically visits several points at once, not individually, which requires a method to disentangle the relative contributions of each training example. This can be done by applying pointwise loss gradients. Together, these two strategies capture the TracIn method, which can be reduced to the simple form of the dot product of loss gradients of the test and training examples, weighted by the learning rate, and summed across checkpoints.

The simple expression for TracIn influence. The dot product of loss gradients of training example (z) and test example (z’) is weighted by learning rate (ηi) at different checkpoints and summed up.

Alternatively, one could instead examine the influence on the prediction score, which would be useful if the test example has no label. This form simply requires the substitution of the loss gradient at the test example with the prediction gradient.

Computing Top Influence Examples
We illustrate the utility of TracIn by first calculating the loss gradient vector for some training data and a test example for a specific classification — an image of a chameleon — and then leveraging a standard k-nearest neighbors library to retrieve the top proponents and opponents. The top opponents indicate the chameleon’s ability to blend in! For comparison, we also show the k nearest neighbors with embeddings from the penultimate layer. Proponents are images that are not only similar, but also belong to the same class, and opponents are similar images but in a different class. Note that there isn’t an explicit enforcement on whether proponents or opponents belong to the same class.

Top row: Top proponents and opponents of influence vectors. Bottom row: Most similar and dissimilar examples of embedding vectors from the penultimate layer.

Clustering
The simplistic breakdown of the loss of the test example into training example influences given by TracIn also suggests that the loss (or prediction) from any gradient descent based neural model can be expressed as a sum of similarities in the space of gradients. Recent work has demonstrated that this functional form is similar to that of a kernel, implying that this gradient similarity described here can be applied to other similarity tasks, like clustering.

In this case, TracIn can be used as a similarity function within a clustering algorithm. To bound the similarity metric so that it can be converted to a distance measure (1 – similarity), we normalize the gradient vectors to have unit norm. Below, we apply TracIn clustering on images of zucchini to obtain finer clusters.

Finer clusters within Zucchini images using TracIn similarity. Each row is a cluster with zucchini in similar forms within the cluster: cross-sectionally sliced zucchini (top), piles of zucchinis (middle), and zucchinis on pizzas (bottom).

Identifying Outliers with Self-Influence
Finally, we can also use TracIn to identify outliers that exhibit a high self-influence, i.e., the influence of a training point on its own prediction. This happens either when the example is mislabeled or rare, both of which make it difficult for the model to generalize over the example. Below are some examples with high self-influence.

Mislabeled examples. Assigned labels are striked out, correct labels are at bottom.

Left: A rare oscilloscope example with just the oscillations, and no instrument in the image gets high self-influence. Right: Other common oscilloscope images have the scope with knobs and wires. These have a low self-influence.

Applications
Having no requirement other than being trained using SGD (or related variants), TracIn is task-independent and applicable to a variety of models. For example, we have used TracIn to study training data for a deep learning model used to parse queries to the Google Assistant, queries of the kind “set my alarm for 7AM”. We were intrigued to see that the top opponent for the query “disable my alarm” with an alarm active on the device, was “disable my timer”, also with an alarm active on the device. This suggests that Assistant users often interchange the words “timer” and “alarm”. TracIn helped us interpret the Assistant data.

More examples can be found in the paper, including a regression task on structured data and a number of text classification tasks.

Conclusion
TracIn is a simple, easy-to-implement, scalable way to compute the influence of training data examples on individual predictions or to find rare and mislabeled training examples. For implementation references of the method, you can find a link to code examples for images from the github linked in the paper.

Acknowledgements
The NeurIPS paper was jointly co-authored with Satyen Kale and Mukund Sundararajan (corresponding author). A special thanks to Binbin Xiong for providing various conceptual and implementation insights. We also thank Qiqi Yan and Salem Haykal for numerous discussions. Images throughout this post sourced from Getty Images.

Categories
Misc

Image Captioning with Visual Attention: TF, TPU, BLEU, BEAM

Anyone who is interested in deep learning image captioning has probably come across the Show, Attend and Tell paper. And anyone who is interested in implementing the architecture in TensorFlow has probably come across TensorFlow’s tutorial. @ratthachat provided a great notebook that extends TensorFlow’s tutorial with additions such as TPU training, Efficientnet encoder, and Glove embeddings. When I was interested in image captioning for my own custom dataset his tutorial was the best starting point I could find online. While working on my own dataset I needed to customize his notebook to add the features listed below. After doing so, I felt many others could benefit from the extensions so I am deciding to share it. Hope you all find it helpful.

  • Bleu Score metrics
  • Decoders
  • 1. Pure Sampling
  • 2. Top K Sampling
  • 3. Greedy Search
  • 4. Beam Search
  • Scheduled Sampling from https://arxiv.org/pdf/1506.03099.pdf
  • Early Stopping based off of validation bleu score

https://www.kaggle.com/kagglethomas88/flickr-image-captioning-tpu-tf2-glove-extended

submitted by /u/International_Fix_94
[visit reddit] [comments]

Categories
Misc

Latest from Stanford researchers: Embodied Intelligence via Learning and Evolution!

Latest from Stanford researchers: Embodied Intelligence via Learning and Evolution! submitted by /u/MLtinkerer
[visit reddit] [comments]
Categories
Misc

The Truck Stops Here: How AI Is Creating a New Kind of Commercial Vehicle

For many, the term “autonomous vehicles” conjures up images of self-driving cars. Autonomy, however, is transforming much more than personal transportation. Autonomous trucks are commercial vehicles that use AI to automate everything from shipping yard operations to long-haul deliveries. Due to industry pressures from rising delivery demand and driver shortages, as well as straightforward operational Read article >

The post The Truck Stops Here: How AI Is Creating a New Kind of Commercial Vehicle appeared first on The Official NVIDIA Blog.

Categories
Misc

Inception Spotlight: DarwinAI Achieves 96% Screening Accuracy for COVID-19 with Diverse CT Dataset

NVIDIA Inception partner DarwinAI developed a new AI model to detect COVID-19 in CT scans with 96% accuracy across a wide and diverse number of scenarios. The model, COVID-Net CT-2, was built using a number of large and diverse datasets created over several months with the University of Waterloo and is publicly available on GitHub.

Last year, the startup launched an open source neural network for COVID-19 detection called COVID-Net. Their new model builds upon the initiative with a more robust model, trained on the largest quantity and diversity of multinational patient cases in research literature. Their academic study detailing the construction and validation of the model can be found here.

DarwinAI used an NVIDIA RTX 6000 for training their neural network and the NVIDIA Jetson Nano embedded AI platform to run their inference workloads.

“By building COVID-NET CT-2 from such rich and voluminous data, we’ve been able to achieve a new level of screening accuracy, with COVID-19 sensitivity and positive predictive value exceeding 96% across a wide and diverse number of scenarios,” said Sheldon Fernandez, CEO of Darwin-AI. 

“Our XAI platform was instrumental in the construction of the original COVID-Net model. For this version, we have engaged two senior radiologists in Canada to validate the way in which COVID-Net CT-2 makes its decisions. Much to our delight, both confirmed the decision-making process of COVID-Net CT-2 is consistent with their own expert interpretations,” Fernandez added.  “In addition to illustrating the emerging cooperation between our respective domains, their validation exemplifies the importance of our XAI technology in constructing transparent and trustworthy AI.” 

The largest of the datasets used to develop the new AI model consists of over 4,500 patients across 15 countries with 200,000 CT slices.

Both DarwinAI and NVIDIA are enabling researchers to build neural networks to fight COVID-19 with the help of open source AI and publicly available pre-trained models. Medical imaging AI models for detecting COVID-19 in X-rays and CTs can be accessed through DarwinAI’s COVID-Net Initiative and the NVIDIA COVID-19 NGC Catalog.

Learn more about how AI, accelerated computing, and GPU technology are contributing to the worldwide battle against the novel coronavirus in the COVID-19 Research Hub.

Categories
Offsites

Machine Learning for Computer Architecture

One of the key contributors to recent machine learning (ML) advancements is the development of custom accelerators, such as Google TPUs and Edge TPUs, which significantly increase available compute power unlocking various capabilities such as AlphaGo, RankBrain, WaveNets, and Conversational Agents. This increase can lead to improved performance in neural network training and inference, enabling new possibilities in a broad range of applications, such as vision, language, understanding, and self-driving cars.

To sustain these advances, the hardware accelerator ecosystem must continue to innovate in architecture design and acclimate to rapidly evolving ML models and applications. This requires the evaluation of many different accelerator design points, each of which may not only improve the compute power, but also unravel a new capability. These design points are generally parameterized by a variety of hardware and software factors (e.g., memory capacity, number of compute units at different levels, parallelism, interconnection networks, pipelining, software mapping, etc.). This is a daunting optimization task, due to the fact that the search space is exponentially large1 while the objective function (e.g., lower latency and/or higher energy efficiency) is computationally expensive to evaluate through simulations or synthesis, making identification of feasible accelerator configurations challenging .

In “Apollo: Transferable Architecture Exploration”, we present the progress of our research on ML-driven design of custom accelerators. While recent work has demonstrated promising results in leveraging ML to improve the low-level floorplanning process (in which the hardware components are spatially laid out and connected in silicon), in this work we focus on blending ML into the high-level system specification and architectural design stage, a pivotal contributing factor to the overall performance of the chip in which the design elements that control the high-level functionality are established. Our research shows how ML algorithms can facilitate architecture exploration and suggest high-performing architectures across a range of deep neural networks, with domains spanning image classification, object detection, OCR and semantic segmentation.

Architecture Search Space and Workloads
The objective in architecture exploration is to discover a set of feasible accelerator parameters for a set of workloads, such that a desired objective function (e.g., the weighted average of runtime) is minimized under an optional set of user-defined constraints. However, the manifold of architecture search generally contains many points for which there is no feasible mapping from software to hardware. Some of these design points are known a priori and can be bypassed by formulating them as optimization constraints by the user (e.g., in the case of an area budget2 constraint, the total memory size must not pass over a predefined limit). However, due to the interplay of the architecture and compiler and the complexity of the search space, some of the constraints may not be properly formulated into the optimization, and so the compiler may not find a feasible software mapping for the target hardware. These infeasible points are not easy to formulate in the optimization problem, and are generally unknown until the whole compiler pass is performed. As such, one of main challenges for architecture exploration is to effectively sidestep the infeasible points for efficient exploration of the search space with a minimum number of cycle-accurate architecture simulations.

The following figure shows the overall architecture search space of a target ML accelerator. The accelerator contains a 2D array of processing elements (PE), each of which performs a set of arithmetic computations in a single instruction multiple data (SIMD) manner. The main architectural components of each PE are processing cores that include multiple compute lanes for SIMD operations. Each PE has shared memory (PE Memory) across all their compute cores, which is mainly used to store model activations, partial results, and outputs, while individual cores feature memory that is mainly used for storing model parameters. Each core has multiple compute lanes with multi-way multiply-accumulate (MAC) units. The results of model computations at each cycle are either stored back in the PE memory for further computation or are offloaded back into the DRAM.

Overview of the template-based ML accelerator used for architecture exploration.

Optimization Strategies
In this study, we explored four optimization strategies in the context of architecture exploration:

  1. Random:Samples the architecture search space uniformly at random.
  2. Vizier: Uses Bayesian optimization for the exploration in the search space in which the evaluation of the objective function is expensive (e.g. hardware simulation which can take hours to complete). Using a collection of sampled points from the search space, the Bayesian optimization forms a surrogate function, usually represented by a Gaussian process, that approximates the manifold of the search space. Guided by the value of the surrogate function, the Bayesian optimization algorithm decides, in an exploration and exploitation trade-off, whether to sample more from the promising regions in the manifold (exploitation) or sample more from the unseen regions in the search space (exploration). Then, the optimization algorithm uses these newly sampled points and further updates the surrogate function to better model the target search space. Vizier uses expected improvement as its core acquisition function.
  3. Evolutionary: Performs evolutionary search using a population of k individuals, where the genome of each individual corresponds to a sequence of discretized accelerator configurations. New individuals are generated by selecting for each individual two parents from the population using tournament selecting, recombining their genomes with some crossover rate, and mutating the recombined genome with some probability.
  4. Population-based black-box optimization (P3BO): Uses an ensemble of optimization methods, including evolutionary and model-based, which has been shown to increase sample-efficiency and robustness. The sampled data are exchanged between optimization methods in the ensemble, and optimizers are weighted by their performance history to generate new configurations. In our study, we use a variant of P3BO in which the hyper-parameters of the optimizers are dynamically updated using evolutionary search.

Accelerator Search Space Embeddings
To better visualize the effectiveness of each optimization strategy in navigating the accelerator search space, we use t-distributed stochastic neighbor embedding (t-SNE) to map the explored configurations into a two-dimensional space across the optimization horizon. The objective (reward) for all the experiments is defined as the throughput (inference/second) per accelerator area. In the figures below, the x and y axes indicate the t-SNE components (embedding 1 and embedding 2) of the embedding space. The star and circular markers show the infeasible (zero reward) and feasible design points, respectively, with the size of the feasible points corresponding to their reward.

As expected, the random strategy searches the space in a uniformly distributed way and eventually finds very few feasible points in the design space.

Visualization presenting the t-SNE of the explored design points (~4K) by random optimization strategy (max reward = 0.96). The maximum reward points (red cross markers) are highlighted at the last frame of the animation.

Compared to the random sampling approach, the Vizier default optimization strategy strikes a good balance between exploring the search space and finding the design points with higher rewards (1.14 vs. 0.96). However, this approach tends to get stuck in infeasible regions and, while it does find a few points with the maximum reward (indicated by the red cross markers), it finds few feasible points during the last iterations of exploration.

As above, with the Vizier (default) optimization strategy (max reward = 1.14). The maximum reward points (red cross markers) are highlighted at the last frame of the animation.

The evolutionary optimization strategy, on the other hand, finds feasible solutions very early in the optimization and assemble clusters of feasible points around them. As such, this approach mostly navigates the feasible regions (the green circles) and efficiently sidesteps the infeasible points. In addition, the evolutionary search is able to find more design options with maximum reward (the red crosses). This diversity in the solutions with high reward provides flexibility to the designer in exploring various architectures with different design trade-offs.

As above, with the evolutionary optimization strategy (max reward = 1.10). The maximum reward points (red cross markers) are highlighted at the last frame of the animation.

Finally, the population-based optimization method (P3BO) explores the design space in a more targeted way (regions with high reward points) in order to find optimal solutions. The P3BO strategy finds design points with the highest reward in search spaces with tighter constraints (e.g., a larger number of infeasible points), showing its effectiveness in navigating search spaces with large numbers of infeasible points.

As above, with the P3BO optimization strategy (max reward = 1.13). The maximum reward points (red cross markers) are highlighted at the last frame of the animation.

Architecture Exploration under Different Design Constraints
We also studied the benefits of each optimization strategy under different area budget constraints, 6.8 mm2, 5.8 mm2 and 4.8 mm2. The following violin plots show the full distribution of the maximum achievable reward at the end of optimization (after ten runs each with 4K trials) across the studied optimization strategies. The wider sections represent a higher probability of observing feasible architecture configurations at a particular given reward. This implies that we favor the optimization algorithm that yields increased width at the points with higher reward (higher performance).

The two top-performing optimization strategies for architecture exploration are evolutionary and P3BO, both in terms of delivering solutions with high reward and robustness across multiple runs. Looking into different design constraints, we observe that as one tightens the area budget constraint, the P3BO optimization strategy yields more high performing solutions. For example, when the area budget constraint is set to 5.8 mm2, P3BO finds design points with a reward (throughput / accelerator area) of 1.25 outperforming all the other optimization strategies. The same trend is observed when the area budget constraint is set to 4.8 mm2, a slightly better reward is found with more robustness (less variability) across multiple runs.

Violin plot showing the full distribution of the maximum achievable reward in ten runs across the optimization strategies after 4K trial evaluations under an area budget of 6.8 mm2. The P3BO and Evolutionary algorithm yield larger numbers of high-performing designs (wider sections). The x and y axes indicate the studied optimization algorithms and the geometric mean of speedup (reward) over the baseline accelerator, respectively.
As above, under an area budget of 5.8 mm2.
As above, under an area budget of 4.8 mm2.

Conclusion
While Apollo presents the first step towards better understanding of accelerator design space and building more efficient hardware, inventing accelerators with new capabilities is still an uncharted territory and a new frontier. We believe that this research is an exciting path forward to further explore ML-driven techniques for architecture design and co-optimization (e.g., compiler, mapping, and scheduling) across the computing stack to invent efficient accelerators with new capabilities for the next generation of applications.

Acknowledgments
This work was performed by Amir Yazdanbakhsh, Christof Angermueller, and Berkin Akin . We would like to also thank Milad Hashemi, Kevin Swersky, James Laudon, Herman Schmit, Cliff Young, Yanqi Zhou, Albin Jones, Satrajit Chatterjee, Ravi Narayanaswami, Ray (I-Jui) Sung, Suyog Gupta, Kiran Seshadri, Suvinay Subramanian, Matthew Denton, and the Vizier team for their help and support.


1 In our target accelerator, the total number of design points is around 5 x 108

2 The chip area is approximately the sum of total hardware components on the chip, including on-chip storage, processing engines, controllers, I/O pins, and etc.  

Categories
Misc

GFN Thursday — 30 Games Coming in February, 13 Available Today

Scientifically speaking, today is the best day of the week, because today is GFN Thursday. And that means more of the best PC games streaming right from the cloud across all of your devices. This is a special GFN Thursday, too — not just because it’s the first Thursday of the month, which means learning Read article >

The post GFN Thursday — 30 Games Coming in February, 13 Available Today appeared first on The Official NVIDIA Blog.

Categories
Misc

Achievement Unlocked: Celebrating Year One of GeForce NOW

It’s a celebration, gamers! One year ago to the day we launched GeForce NOW, our cloud gaming service that transforms ordinary hardware into an extraordinarily powerful GeForce gaming PC. It’s the always-on gaming rig that never needs upgrading or patching and can instantly play your library of games. We’ve been blown away by the passion Read article >

The post Achievement Unlocked: Celebrating Year One of GeForce NOW appeared first on The Official NVIDIA Blog.