DataBloom - Part 555

The ants and the pheromones

Post date February 8, 2021

TLDR; this is the last edition of The Morning Paper for now. Plus: one strand of research you won’t want to miss!

I was listening to a BBC Radio 4 podcast recently (More or Less: Behind the Stats – Ants and Algorithms) in which the host, Tim Harford, interviews David Sumpter about his recent book, ‘The ten equations that rule the world.’ One of those equations, the ‘reward equation’, models how ants communicate using pheromones and how our own brains keep track of rewards using dopamine.

About 4 and a half minutes into the podcast Tim asks a fascinating question: the reward equation includes a decay or ‘forgetting’ parameter, so what happens if you disrupt established solutions for long enough that their hold is broken? For example, the complete disruption to our established routines that Covid has caused over the last year? The answer for ants, if you disrupt all of the pheromone trails around their nest, is that they converge on a new solution in the environment, but it won’t necessarily look the same as the one they had before the disruption. (If you’re interested in the amazing problem-solving skills of ants and how we can learn from them in computer science, I covered ‘Ant algorithms for discrete optimization’ in a previous edition of The Morning Paper). It’s highly likely that the same thing will happen to us when we can eventually return to normal – the patterns that we establish won’t necessarily be the same as the ones we had before the series of lockdowns began.

The lockdowns (as I write this, we’re in another strict lockdown in England, with no end date given) have certainly disrupted my own routines. I’ve lost the time and space that I depended on for studying and writing The Morning Paper – the one-hour each way train journey on my morning commute, and more crucially with two older children both full-time studying from home, the time and space within the home for the many hours of concentrated work required. I don’t think my love of learning will ever leave me though, and at the same time I’ve been branching out and studying other things: philosophy, ethics, physics, a little maths, a little biology,… I’m really enjoying that. My love of computer science remains of course, but when we finally get to lay down our new pheromone trails and establish a new normal, I’m not sure I’m going to want to focus on computer science to the exclusion of all else. It’s been an intense six-and-a-half years doing largely that while writing the blog. For the time being then, I’m putting The Morning Paper back on pause.

Before I wrap up though, I can’t resist pointing you in the direction of one incredibly exciting research project from the Hydro team at Berkeley’s RISELab. Joe Hellerstein recently posted a whole bunch of links and resources in this Twitter thread:

I’m super excited about the new chapter emerging in our research on a programmable cloud. This is what comes after serverless, people.

In this thread, a few recent talks/papers on the vision. First off — 10 minute pitch from CIDR is here. https://t.co/fEMboOGF7Q

— Joe Hellerstein (@joe_hellerstein) January 28, 2021

The “PACT” paper is here: New Directions in Cloud Programming, Cheung et al., CIDR 2021.


Suggestions on building machine transliteration models

Post date February 7, 2021

I’m trying to build an English-to-Assamese transliteration model. I tried character-level NMT with attention, but I’m not satisfied with the results, given that Assamese makes heavy use of prefixes and suffixes. I’m currently exploring WFSTs. Has anybody worked on something similar?

submitted by /u/ckraybpytao

C++ vs Python

If I redo all my Python code using the TF C++ API, Cython, and, if necessary, C++, would it actually be any faster if the most time-consuming part is training the models? Does TF’s Python API already execute the code in a similar way to how it would with the C++ API?

submitted by /u/BestUCanIsGoodEnough

tensorflow MIA

Hey guys,

I’ve been trying to install tensorflow on my computer in a venv. When I do pip list I am met with a list of modules, one of which is tensorflow 2.4.1, meaning that it should have installed correctly(?).

However, when I do python3 and import tensorflow, it results in an error saying tensorflow.python doesn’t exist. Any ideas?

submitted by /u/Real_MichaelCera

From Google researchers: State of the art in Video Stabilization!

Post date February 6, 2021

submitted by /u/MLtinkerer

COMPUTER VISION OBJECT DETECTION

Post date February 6, 2021

Hi everyone!

I have a project that I need help with. It involves detecting objects in a video to an accuracy of a few pixels (stable background). If anyone has any expertise, please message me. I would love to get some help from this community. Thank you all 🙏🏻

submitted by /u/nomeaningg

Neural Networks Generate New Dwight Schrute Quotes

Post date February 6, 2021

submitted by /u/Snoo28889

GAN for vector images

I am trying to figure out how to generate images using vector graphics instead of raster images like normal. I can not find any resources that seem to be handling a similar goal.

I have built a system that follows the Pix2Pix tutorial, but there is not a nice way to create derivatives. I have tried a brute force method (subtract before image from after image divided by parameters) and a more clever method using triangle areas, but the images never stop looking like random messes.

I tried using TensorFlow Agents to do RL (reinforcement learning), but once again just ended up with random messes.

Is there maybe a paper or resource out there that I am missing because I do not know the right search terms?

submitted by /u/DrOchensati

Researchers Demo ‘Almost-Unlimited Size’ Brain Simulations Using GPUs

Post date February 5, 2021

To improve brain simulation technology, a team of researchers from the University of Sussex developed a GPU-accelerated approach that can generate brain simulation models of almost-unlimited size.

Researchers Dr. James Knight and Prof. Thomas Nowotny from the University of Sussex’s School of Engineering and Informatics detailed the work in a paper published in the journal Nature Computational Science.

Using a GPU-accelerated system composed of an NVIDIA TITAN RTX GPU, the team created a cutting-edge model of a macaque’s visual cortex with 4.13 × 10⁶ neurons and 24.2 × 10⁹ synaptic weights, a simulation that could previously only be done on a supercomputer.

The neural network-based simulator uses the large amount of computational power of the GPU to procedurally generate connectivity and synaptic weights as spikes are triggered, without having to store connectivity data in memory, the researchers explained.
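As a rough illustration of procedural connectivity (not GeNN's actual implementation), the sketch below regenerates a neuron's outgoing synapses from a seeded random number generator each time it spikes, so no connectivity table is ever stored. The function names, the seeding scheme, the fan-out, and the weight distribution are all hypothetical choices made for this example.

```python
import random


def outgoing_synapses(pre_id, n_neurons, fan_out, base_seed=1234):
    """Regenerate the targets and weights of one presynaptic neuron on demand."""
    # Seeding from (base_seed, pre_id) makes the draw reproducible, so the
    # same neuron always yields the same synapses without storing them.
    rng = random.Random(base_seed * 1_000_003 + pre_id)
    targets = [rng.randrange(n_neurons) for _ in range(fan_out)]
    weights = [rng.gauss(0.1, 0.02) for _ in range(fan_out)]
    return targets, weights


def deliver_spike(pre_id, input_current, fan_out=5):
    """Add the spike's effect to each postsynaptic neuron's input current."""
    targets, weights = outgoing_synapses(pre_id, len(input_current), fan_out)
    for post_id, weight in zip(targets, weights):
        input_current[post_id] += weight


currents = [0.0] * 100
deliver_spike(pre_id=7, input_current=currents)
```

The trade-off is compute for memory: every spike repeats the random draws, which suits hardware with abundant arithmetic throughput and limited memory.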

“Large-scale simulations of spiking neural network models are an important tool for improving our understanding of the dynamics and ultimately the function of brains. However, even small mammals such as mice have on the order of 1 × 10¹² synaptic connections, meaning that simulations require several terabytes of data – an unrealistic memory requirement for a single desktop machine,” the researchers explained.

*Dr James Knight and Prof Thomas Nowotny of the University of Sussex School of Engineering and Informatics.*

According to the team, the initialization of the model took six minutes, and the simulation of each biological second took 7.7 min in the ground state, and 8.4 min in the resting state – 35% less time than a previous supercomputer simulation.

On the software side, the team used the CUDA-based GPU enhanced Neuronal Networks (GeNN) package. GeNN can also be used through external interfaces such as SpineML and SpineCreator, a Python interface (PyGeNN), and a Brian interface via Brian2GeNN.

*Results of full-scale multi-area model simulation in ground and resting states*

“This research is a game-changer for computational Neuroscience and AI researchers who can now simulate brain circuits on their local workstations, but it also allows people outside academia to turn their gaming PC into a supercomputer and run large neural networks.”

A pre-print of the paper is available on bioRxiv under open-access terms. The Nature Computational Science paper can be found here.

TracIn — A Simple Method to Estimate Training Data Influence

Post date February 5, 2021

Posted by Frederick Liu and Garima Pruthi, Software Engineers, Google Research

The quality of a machine learning (ML) model’s training data can have a significant impact on its performance. One measure of data quality is the notion of influence, i.e., the degree to which a given training example affects the model and its predictive performance. And while influence is a well-known concept to ML researchers, the complexity behind deep learning models, coupled with their growing size, features and datasets, has made the quantification of influence difficult.

A few methods have been proposed recently to quantify influence. Some rely on changes in accuracy when retraining with one or several data points dropped, and some use established statistical methods, e.g., influence functions that estimate the impact of perturbing input points or representer methods that decompose a prediction into an importance weighted combination of training examples. Still other approaches require use of additional estimators, such as data valuation using reinforcement learning. Though these approaches are theoretically sound, their use in products has been limited by the resources needed to run them at scale or the additional burdens they place on training.

In “Estimating Training Data Influence by Tracing Gradient Descent”, published as a spotlight paper at NeurIPS 2020, we proposed TracIn, a simple scalable approach to tackle this challenge. The idea behind TracIn is straightforward — trace the training process to capture changes in prediction as individual training examples are visited. TracIn is effective in finding mislabeled examples and outliers from a variety of datasets, and is useful in explaining predictions in terms of training examples (as opposed to features) by assigning an influence score to each training example.

The Ideas Underlying TracIn
Deep learning models are typically trained using an algorithm called stochastic gradient descent (SGD), or a variant of it. SGD operates by making multiple passes over the data and making modifications to the model parameters that locally reduce the loss (i.e., the model’s objective) with each pass. An example of this is demonstrated for an image classification task in the figure below, where the model’s task is to predict the subject of the test image on the left (“zucchini”). As the model progresses through training, it is exposed to various training examples that affect the loss on the test image, where the loss is a function both of the prediction score and the actual label: the higher the prediction score for zucchini, the lower the loss.

Estimating training data influence of the images on the right by tracing the loss change of the zucchini in the seatbelt image during training.

Suppose that the test example is known at training time and that the training process visited each training example one at a time. During the training, visiting a specific training example would change the model’s parameters, and that change would then modify the prediction/loss on the test example. If one could trace the training example through the process, then the change in loss or prediction on the test example could be attributed to the training example in question, where the influence of a training example would be the cumulative attribution across visits to the training example.
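The idealized attribution described above can be written compactly. In this sketch the symbol names are assumptions chosen for illustration: w_t denotes the parameters before training step t, z_t the training example visited at step t, z' the test example, and ℓ the loss.

```latex
\mathrm{TracInIdeal}(z, z') \;=\; \sum_{t \,:\, z_t = z} \bigl[\, \ell(w_t, z') - \ell(w_{t+1}, z') \,\bigr]
```

Each term is the drop in test loss caused by one visit to z, so a positive total means that visiting z reduced the loss on z'.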

There are two types of relevant training examples. Those that reduce loss, like the images of zucchinis above, are called proponents, while those that increase loss, like the images of seatbelts, are called opponents. In the example above, the image labeled “sunglasses” is also a proponent, because it has a seatbelt in the image, but is labeled as “sunglasses,” driving the model to better distinguish between zucchini and seatbelts.

In practice, the test example is unknown at training time, a limitation that can be overcome by using the checkpoints output by the learning algorithm as a sketch of the training process. Another challenge is that the learning algorithm typically visits several points at once, not individually, which requires a method to disentangle the relative contributions of each training example. This can be done by applying pointwise loss gradients. Together, these two strategies capture the TracIn method, which can be reduced to the simple form of the dot product of loss gradients of the test and training examples, weighted by the learning rate, and summed across checkpoints.

The simple expression for TracIn influence. The dot product of loss gradients of training example (z) and test example (z’) is weighted by learning rate (η_i) at different checkpoints and summed up.
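Written out from the description in the caption, the expression takes roughly the following form. The notation here is an assumption made for this sketch: w_{t_i} denotes the parameters saved at the i-th of k checkpoints and ℓ the loss.

```latex
\mathrm{TracIn}(z, z') \;=\; \sum_{i=1}^{k} \eta_i \, \nabla \ell(w_{t_i}, z) \cdot \nabla \ell(w_{t_i}, z')
```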

Alternatively, one could instead examine the influence on the prediction score, which would be useful if the test example has no label. This form simply requires the substitution of the loss gradient at the test example with the prediction gradient.
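The checkpoint-based score can be sketched in a few lines. This is a minimal, dependency-free illustration that assumes a logistic-regression model with log loss. The helper names (`dot`, `loss_gradient`, `tracin`) and the checkpoint values are hypothetical and are not taken from the authors' code.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def loss_gradient(w, example):
    """Gradient of the log loss of a logistic-regression model at one example."""
    x, y = example
    p = 1.0 / (1.0 + math.exp(-dot(w, x)))  # predicted probability of class 1
    return [(p - y) * xi for xi in x]


def tracin(checkpoints, train_example, test_example):
    """Sum over checkpoints of lr * <grad loss(train), grad loss(test)>."""
    score = 0.0
    for w, lr in checkpoints:
        g_train = loss_gradient(w, train_example)
        g_test = loss_gradient(w, test_example)
        score += lr * dot(g_train, g_test)
    return score


# Hypothetical checkpoints: (weights, learning rate in effect at that checkpoint).
checkpoints = [([0.0, 0.0], 0.1), ([0.5, -0.2], 0.1), ([0.9, -0.4], 0.05)]
test_example = ([1.0, 0.5], 1)
same_class = ([0.9, 0.6], 1)   # similar input, same label
other_class = ([0.9, 0.6], 0)  # similar input, different label

print(tracin(checkpoints, same_class, test_example))   # positive: a proponent
print(tracin(checkpoints, other_class, test_example))  # negative: an opponent
```

For the prediction-score variant, `g_test` would be replaced by the gradient of the prediction score at the test example. The rest of the loop stays the same.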

Computing Top Influence Examples
We illustrate the utility of TracIn by first calculating the loss gradient vector for some training data and a test example for a specific classification — an image of a chameleon — and then leveraging a standard k-nearest neighbors library to retrieve the top proponents and opponents. The top opponents indicate the chameleon’s ability to blend in! For comparison, we also show the k nearest neighbors with embeddings from the penultimate layer. Proponents are images that are not only similar, but also belong to the same class, and opponents are similar images but in a different class. Note that there isn’t an explicit enforcement on whether proponents or opponents belong to the same class.
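The retrieval step can be illustrated with a brute-force stand-in for a nearest-neighbors library. The sketch assumes each training example has already been reduced to a single gradient vector with any learning-rate weighting folded in. The function name and the toy vectors are hypothetical.

```python
def top_influences(train_vectors, test_vector, k):
    """Return (proponent indices, opponent indices) ranked by dot product."""
    scores = [sum(a * b for a, b in zip(v, test_vector)) for v in train_vectors]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Proponents have the largest positive scores, opponents the most negative.
    proponents = [i for i in order[:k] if scores[i] > 0]
    opponents = [i for i in reversed(order[-k:]) if scores[i] < 0]
    return proponents, opponents


train_vectors = [[0.9, 0.1], [-0.8, 0.2], [0.2, 0.1], [-0.1, -0.3]]
test_vector = [1.0, 0.0]
print(top_influences(train_vectors, test_vector, k=2))  # ([0, 2], [1, 3])
```

A full scan like this costs time linear in the size of the training set for every query, which is why an approximate nearest-neighbors index becomes attractive at scale.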

Top row: Top proponents and opponents of influence vectors. Bottom row: Most similar and dissimilar examples of embedding vectors from the penultimate layer.

Clustering
The simple breakdown of the loss of the test example into training example influences given by TracIn also suggests that the loss (or prediction) from any neural model trained by gradient descent can be expressed as a sum of similarities in the space of gradients. Recent work has demonstrated that this functional form is similar to that of a kernel, implying that the gradient similarity described here can be applied to other similarity tasks, like clustering.

In this case, TracIn can be used as a similarity function within a clustering algorithm. To bound the similarity metric so that it can be converted to a distance measure (1 – similarity), we normalize the gradient vectors to have unit norm. Below, we apply TracIn clustering on images of zucchini to obtain finer clusters.
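A minimal sketch of the distance computation follows. It assumes one non-zero gradient vector per image and a fixed set of cluster centers. The assignment step is a simple nearest-center rule chosen for illustration, not necessarily the clustering algorithm used in the post.

```python
import math


def unit(v):
    """Scale a non-zero vector to unit norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def tracin_distance(g_a, g_b):
    """1 - cosine similarity of two gradient vectors; lies in [0, 2]."""
    return 1.0 - sum(a * b for a, b in zip(unit(g_a), unit(g_b)))


def assign_clusters(gradients, centers):
    """Assign each gradient vector to the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: tracin_distance(g, centers[c]))
        for g in gradients
    ]


gradients = [[1.0, 0.1], [2.0, 0.3], [0.1, 1.0], [0.2, 3.0]]
print(assign_clusters(gradients, centers=[[1.0, 0.0], [0.0, 1.0]]))  # [0, 0, 1, 1]
```

Normalizing first keeps the similarity within [-1, 1], so the distance is bounded regardless of gradient magnitude.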

Finer clusters within Zucchini images using TracIn similarity. Each row is a cluster with zucchini in similar forms within the cluster: cross-sectionally sliced zucchini (top), piles of zucchinis (middle), and zucchinis on pizzas (bottom).

Identifying Outliers with Self-Influence
Finally, we can also use TracIn to identify outliers that exhibit a high self-influence, i.e., the influence of a training point on its own prediction. This happens either when the example is mislabeled or rare, both of which make it difficult for the model to generalize over the example. Below are some examples with high self-influence.
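Setting the test example equal to the training example turns each dot product into a squared gradient norm, so self-influence can be sketched as below. The per-checkpoint gradients and learning rates are made-up numbers for illustration only.

```python
def self_influence(gradients_per_checkpoint, learning_rates):
    """Sum over checkpoints of lr * squared norm of the example's own loss gradient."""
    return sum(
        lr * sum(g * g for g in grad)
        for grad, lr in zip(gradients_per_checkpoint, learning_rates)
    )


# Hypothetical per-checkpoint loss gradients for three training examples.
examples = {
    "typical": [[0.2, 0.1], [0.1, 0.0]],
    "rare": [[0.9, 0.8], [0.7, 0.6]],
    "mislabeled": [[1.5, 1.2], [1.4, 1.1]],
}
learning_rates = [0.1, 0.1]

ranked = sorted(
    examples,
    key=lambda name: self_influence(examples[name], learning_rates),
    reverse=True,
)
print(ranked)  # ['mislabeled', 'rare', 'typical']
```

Reviewing the top of such a ranking by hand is one way to surface candidate label errors.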

Mislabeled examples. Assigned labels are struck out; correct labels are at the bottom.

Left: A rare oscilloscope example with just the oscillations, and no instrument in the image, gets high self-influence. Right: Other common oscilloscope images have the scope with knobs and wires. These have a low self-influence.

Applications
Having no requirement other than being trained using SGD (or related variants), TracIn is task-independent and applicable to a variety of models. For example, we have used TracIn to study training data for a deep learning model used to parse queries to the Google Assistant, queries of the kind “set my alarm for 7AM”. We were intrigued to see that the top opponent for the query “disable my alarm”, with an alarm active on the device, was “disable my timer”, also with an alarm active on the device. This suggests that Assistant users often interchange the words “timer” and “alarm”. TracIn helped us interpret the Assistant data.

More examples can be found in the paper, including a regression task on structured data and a number of text classification tasks.

Conclusion
TracIn is a simple, easy-to-implement, scalable way to compute the influence of training data examples on individual predictions or to find rare and mislabeled training examples. For implementation references, the GitHub repository linked in the paper includes code examples for images.

Acknowledgements
The NeurIPS paper was co-authored with Satyen Kale and Mukund Sundararajan (corresponding author). A special thanks to Binbin Xiong for providing various conceptual and implementation insights. We also thank Qiqi Yan and Salem Haykal for numerous discussions. Images throughout this post are sourced from Getty Images.