Categories
Misc

Cyber Security Analysis – Beginner’s Guide to Processing Security Logs in Python

This is the ninth and final installment of the series of articles on the RAPIDS ecosystem. The series explores and discusses various aspects of RAPIDS that allow its users to solve ETL (Extract, Transform, Load) problems, build ML (Machine Learning) and DL (Deep Learning) models, explore expansive graphs, process signals and system logs, or use SQL via BlazingSQL to process data.

Today’s interconnected world makes us more vulnerable to cyber attacks: ever-present IoT devices record and listen to what we do, spam and phishing emails threaten us every day, and network attacks that steal data can have serious consequences. These systems produce terabytes of logs full of information that can help detect attacks and protect vulnerable systems. Estimating conservatively, a medium-sized company with hundreds to thousands of interconnected devices can produce upwards of 100GB of log files per day, and the rate of logged events can reach tens of thousands per second.

CLX (pronounced “clicks”) is the part of the RAPIDS ecosystem that accelerates the processing and analysis of cyber logs. As part of RAPIDS, it builds on top of cuDF, the RAPIDS DataFrame library, and further extends the capabilities of the RAPIDS ML library cuML by tapping into the latest advances in natural language processing (NLP) to organize unstructured data and build classification models.

 The previous posts in the series showcased other areas:

In this post, we introduce CLX. To help you get familiar with it, we also published a cheat sheet, CLX-cheatsheet, that can be downloaded here, and an interactive notebook that showcases all of CLX’s current functionality here.

Cybersecurity

With the advent of personal computers, adversarial games shifted from pure reconnaissance missions and traditional warfare to disrupting an enemy’s computer systems. Organizations like the National Security Agency (NSA) employ scientists of various backgrounds who work daily to keep our national networks safe, so that an adversary cannot access the power grid or our banking system. At this level the stakes are exceedingly high, and so is the sophistication of the defense mechanisms that prevent such attacks.

Personal computers and business networks are a different matter: while an attack on a single computer or network might not cripple or otherwise threaten the well-being of citizens, it can have profound effects on a person’s or a business’s finances and future opportunities.

Many of these attacks leave a trace, a breadcrumb trail of information that can help a business detect an attack and defend itself against it. After all, any attack that compromises a company’s ability to conduct business as usual leads to lost productivity. Worse yet, if an attacker gains access to and steals intellectual property, it may cripple or completely ruin the business.

However, how can a business that generates upwards of 100GB of logs per day keep up with this flood of data?

cyBERT

Historically, the approach was to parse logs using regular expressions (regex). And while we’re big fans of regex per se, this approach becomes impractical when a business needs to maintain thousands of different patterns, one for every type of log it collects. And that’s where it all begins: to detect attacks on our network we need data, and to get data we need to parse logs. Without data we cannot train any machine learning model. There is no way around it.
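For comparison, here is roughly what the traditional approach looks like for just one format: a hand-written regular expression for the Apache Common Log Format. This is a minimal sketch; the pattern, the field names, and the helper parse_apache_line are our own illustration, and every additional log format would need another pattern like it.

```python
import re

# Hand-written pattern for the Apache Common Log Format:
# host ident authuser [timestamp] "request" status bytes
APACHE_CLF = re.compile(
    r'(?P<remote_host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_apache_line(line):
    """Return a dict of named fields, or None if the line doesn't fit this one format."""
    match = APACHE_CLF.match(line)
    return match.groupdict() if match else None


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_apache_line(line))
```

Any deviation (an extra field, a different timestamp layout, a combined log with referrer and user agent) makes the match fail and returns None, which is exactly the maintenance burden described above.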

BERT (Bidirectional Encoder Representations from Transformers) is a deep neural network introduced by Google to build a better understanding of natural language. Unlike previous approaches to NLP problems that relied on recurrent network architectures (like LSTM, Long Short-Term Memory), BERT is a non-recurrent, Transformer-based network that learns the context of a word from the words on both sides of it. Thus, BERT produces context-dependent embeddings (numerical representations): the word “watching” gets a different embedding in ‘She is watching TV’ than in ‘She is watching her kids grow’.
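The key difference from a recurrent network is that each token’s representation is computed from all tokens in the sentence at once. The sketch below is a single, unmasked scaled dot-product self-attention layer in NumPy with random weights and toy embeddings. It is not BERT itself (which stacks many such layers with multiple heads, learned weights, and feed-forward sublayers); it only shows why the output for an earlier word changes when a later word changes.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Unmasked scaled dot-product self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all positions
    return weights @ v                              # every output mixes every token


rng = np.random.default_rng(0)
d_model = 8
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# Toy embeddings for two 4-token sentences that differ only in the last token
# (think "she is watching TV" vs. "she is watching kids").
sentence_a = rng.normal(size=(4, d_model))
sentence_b = sentence_a.copy()
sentence_b[3] = rng.normal(size=d_model)

out_a = self_attention(sentence_a, w_q, w_k, w_v)
out_b = self_attention(sentence_b, w_q, w_k, w_v)

# Token 2 ("watching") ends up with a different representation because it
# attends to token 3, which comes after it.
print(np.abs(out_a[2] - out_b[2]).max())
```

A left-to-right recurrent model reading the same two sentences would produce identical states for the first three tokens, since it has not yet seen the word that differs.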

In the context of cybersecurity logs, such embeddings can help distinguish between IP addresses, network endpoints, ports, and free-form comments or messages. In fact, by using BERT embeddings one can train a model to detect such entities. Enter cyBERT!
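Treating parsing as entity detection means the model assigns a label and a confidence to each token, and those per-token predictions then have to be turned back into fields. The sketch below shows only that bookkeeping step under simplifying assumptions: whole-word tokens (real BERT models use subword pieces), made-up label names and scores, and a hypothetical helper tokens_to_fields that joins same-label tokens and averages their confidences. It is not CLX’s implementation.

```python
from collections import defaultdict


def tokens_to_fields(tokens, labels, confidences, ignore_label="O"):
    """Group per-token predictions into {field: text} and {field: mean confidence}.

    Tokens that share a label are joined with spaces in order of appearance;
    tokens labeled `ignore_label` are dropped.
    """
    pieces = defaultdict(list)
    scores = defaultdict(list)
    for token, label, conf in zip(tokens, labels, confidences):
        if label == ignore_label:
            continue
        pieces[label].append(token)
        scores[label].append(conf)
    fields = {label: " ".join(parts) for label, parts in pieces.items()}
    field_conf = {label: sum(s) / len(s) for label, s in scores.items()}
    return fields, field_conf


# Hypothetical model output for one Apache log line (labels and scores are made up).
tokens = ["127.0.0.1", "-", "-", "GET", "/index.html", "200"]
labels = ["remote_host", "O", "O", "request_method", "request_url", "status"]
confidences = [0.99, 0.50, 0.51, 0.97, 0.95, 0.98]

fields, field_conf = tokens_to_fields(tokens, labels, confidences)
print(fields)
print(field_conf)
```

Applied row by row, the two dictionaries map naturally onto one table of parsed values and one table of confidences with matching columns.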

cyBERT is an automatic tool to parse logs and extract relevant information. To get started, we just need to load the model we intend to use:

from clx.analytics.cybert import Cybert

cybert = Cybert()
cybert.load_model(
    'pytorch_model.bin',  # trained model weights
    'config.json',        # model configuration
)

The pytorch_model.bin file is a PyTorch model trained to recognize entities in Apache web server logs; it can be downloaded from the models.huggingface.co/bert/raykallen/cybert_apache_parser S3 bucket, which also contains the config.json file.

Once the model is loaded, cyBERT uses NVIDIA GPUs to parse the logs at high speed, extracting the useful information into a structured representation. The API makes this really simple:

import cudf

# The CSV is expected to hold one raw log line per row in a column named "raw"
logs_df = cudf.read_csv('apache_log.csv')
parsed_df, confidence_df = cybert.inference(logs_df["raw"])

The first DataFrame returned, parsed_df, contains all the fields parsed from the logs, while the second, confidence_df, shows how confident the cyBERT model is about each extracted piece of information.

As the output shows, the model is quite confident, and a glance at the data confirms that the extracted information matches the column names.
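A natural next step is to keep only the values the model is sure about. The sketch below uses pandas as a CPU stand-in with made-up data, and assumes the parsed and confidence frames share the same index and column names; the 0.9 threshold and the helper mask_low_confidence are hypothetical. cuDF offers a largely pandas-compatible API, so a similar call may carry over to GPU DataFrames, but that is an assumption worth checking against the cuDF documentation.

```python
import pandas as pd


def mask_low_confidence(parsed_df, confidence_df, threshold=0.9):
    """Replace parsed values whose confidence is below `threshold` with NaN.

    Assumes both frames share the same index and column names.
    """
    return parsed_df.where(confidence_df >= threshold)


# Toy stand-ins for the two frames returned by inference (values are made up).
parsed = pd.DataFrame({"remote_host": ["10.0.0.1", "10.0.0.2"], "status": ["200", "404"]})
confidence = pd.DataFrame({"remote_host": [0.99, 0.62], "status": [0.97, 0.95]})

print(mask_low_confidence(parsed, confidence))
```

Masking rather than dropping rows keeps the confident fields of a partially parsed line available for downstream analysis.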

Want to try other functionality of CLX or simply run through the above examples? Go to the CLX cheatsheet here!

NVIDIA Deep Learning Institute Announces Public Workshop Summer Schedule

Training for Success

Continuing its popular public workshop series, the NVIDIA Deep Learning Institute (DLI) released the schedule for June, July, and August of 2021. These workshops are conducted live in a virtual classroom environment with expert guidance from NVIDIA-certified instructors. Participants have access to fully configured GPU-accelerated servers in the cloud to perform hands-on exercises. 

Learning new and advanced software development skills is vital to staying ahead in a competitive job market. DLI offers a comprehensive learning experience on a wide range of important topics in AI, data science, and accelerated computing. Gain hands-on experience with the most widely used, industry-standard software, tools, and frameworks. Successful completion of the course and assessment earns an NVIDIA certificate of competency.

Among the workshops scheduled are two new courses: Accelerating Data Engineering Pipelines and Building Conversational AI Applications.

To register, visit our website. Space is limited so we encourage you to sign up early.

Here is our current public workshop schedule:

June

Fundamentals of Accelerated Data Science
Tue, June 22, 9:00 a.m. to 5:00 p.m. CEST (EMEA)
Wed, June 23, 9:00 a.m. to 5:00 p.m. PDT (NALA)

Building Intelligent Recommender Systems
Wed, June 23, 9:00 a.m. to 5:00 p.m. CEST (EMEA)
Thu, June 24, 9:00 a.m. to 5:00 p.m. PDT (NALA)

July

Accelerating Data Engineering Pipelines
Tue, July 6, 9:00 a.m. to 5:00 p.m. CEST (EMEA)
Tue, July 13, 9:00 a.m. to 5:00 p.m. PDT (NALA)

Building Transformer-Based Natural Language Processing Applications 
Wed, July 7, 9:00 a.m. to 5:00 p.m. CEST (EMEA)

Fundamentals of Accelerated Computing with CUDA Python
Wed, July 14, 9:00 a.m. to 5:00 p.m. PDT (NALA)

August

Building Conversational AI Applications
Tue, August 24, 9:00 a.m. to 5:00 p.m. PDT (NALA)
Tue, August 31, 9:00 a.m. to 5:00 p.m. CEST (EMEA)

Fundamentals of Deep Learning
Wed, August 25, 9:00 a.m. to 5:00 p.m. PDT (NALA)

Fundamentals of Accelerated Computing with CUDA Python
Thu, August 26, 9:00 a.m. to 5:00 p.m. CEST (EMEA)

Visit the DLI website for details on each course and the full schedule of upcoming instructor-led workshops, which is regularly updated with new training opportunities.

For more information, email nvdli@nvidia.com.

Unreal Engine 5 Early Access Available Now with DirectX Raytracing, NVIDIA DLSS, and NVIDIA Reflex Support

NVIDIA RTX, Unreal Engine 5 Define Future of Game Development and Content Creation

Today, Unreal Engine 5 (UE5) is available in Early Access, delivering the next-generation engine from Epic Games that will further propel the industry forward.

Unreal Engine is used by more than 11 million creators, making it one of the most popular game engines in the world, and one that continuously pushes the boundaries of what’s possible with real-time technology.  UE5 represents a generational leap in both workflows and visual fidelity, extending the engine’s support for DirectX Raytracing, NVIDIA DLSS, and NVIDIA Reflex, and adding new features such as Nanite and Lumen that make it faster and easier for games to implement photorealistic visuals, large open worlds and advanced animation and physics.

In 2018, NVIDIA launched our RTX technology alongside a stunning Star Wars demo called Reflections, which was built on Epic Games’ Unreal Engine 4. This laid out a vision of a new era of computer graphics for video games that featured photorealistic, ray-traced lighting, AI-powered effects and complex worlds with massive amounts of geometry and high-resolution textures.

Enabling this vision were RTX GPUs with dedicated cores for ray tracing and AI, as well as new hardware capabilities for increased geometric detail and texture streaming. The RTX games that have been released over the last three years, including Fortnite, Metro Exodus and Cyberpunk 2077, have stepped us closer to this vision.

Dynamic Open Worlds Full of Geometric Detail

With the introduction of Nanite and Lumen in UE5, developers can create games that contain massive amounts of geometric detail with fully dynamic global illumination.

Nanite enables film-quality source art consisting of millions or billions of polygons to be directly imported into Unreal Engine — all while maintaining a real-time frame rate and without sacrificing fidelity. Nanite intelligently streams and processes only the detail you can perceive, largely removing poly count and draw call constraints, and eliminating time-consuming work like baking details to normal maps and manually authoring levels of detail. This allows users to focus less on tedious tasks and more on creativity.

With Lumen, developers can create more dynamic scenes where indirect lighting adapts on the fly, such as changing the sun angle with the time of day, turning on a flashlight or opening an exterior door. Lumen removes the need for authoring lightmap UVs, waiting for lightmaps to bake or placing reflection captures, which results in crucial time savings in the development process.

UE5 is making it easier to develop expansive open worlds and provides developers with the GPU-accelerated tools to better animate characters and build audio pipelines.

“We’ve utilized RTX GPUs extensively throughout the development of Unreal Engine 5 and all of the respective sample content released today,” said Nick Penwarden, Vice President of Engineering at Epic Games. “Thanks to a tight integration with NVIDIA’s tools and technologies our team is able to more easily optimize and stabilize UE5 for everyone.”

NVIDIA DLSS and Reflex in Unreal Engine 5

NVIDIA is working with Epic Games to enable a broad suite of technologies in UE5, starting with two of our most popular features — Deep Learning Super Sampling (DLSS) and NVIDIA Reflex.

DLSS taps into the power of a deep learning neural network to boost frame rates and generate beautiful, sharp images. Reflex aligns CPU work to complete just in time for the GPU to start processing, minimizing latency and improving system responsiveness.

DLSS source code and NVIDIA Reflex are available now in Unreal Engine 5, and we will be releasing the DLSS plugin for UE5 in the coming weeks.

Learn more about DLSS, and experience it by downloading the NVIDIA RTX Technology Showcase.

Get more information about NVIDIA Reflex and its impact in competitive games by watching this on-demand GTC session.

Take Unreal Engine 5 for a Test Drive on RTX GPUs Today

Developers can download Unreal Engine 5 Early Access from Epic Games here. Grab an NVIDIA RTX GPU and install the latest NVIDIA graphics driver on your system for the best experience.

With the processing power of RTX GPUs and the next-generation Unreal Engine 5, there’s no limit to what you can create.

Does Volatility Harvesting Really Work?

Installing TensorFlow on Windows with Cuda

I’ve been trying to start the MIT deep learning lab 1. I know it comes with a notebook you can run in Google Colab, but I’d like to be able to play with it locally and, hopefully, experiment more easily. I had a ton of problems getting this to work, but I finally did it. Here are some notes.

  1. Install Python 3.8 from the Python website, not from the Windows app marketplace. I found that the Windows app version seems to be sandboxed and cannot fully find things in PATH.
  2. I grabbed TensorFlow nightly, as I could not get the default release to work at the time.
  3. I used CUDA 11.0 update 2 (this is the one I have working; other CUDA 11 versions might work). The installer was cuda_11.0.3_451.82_win10.exe.
  4. I first used cuDNN cudnn-11.0-windows-x64-v8.0.4.30.zip, but use cudnn-11.2-windows-x64-v8.1.0.77.zip instead. The older one gave me a version error later, when running a NN, but not during install.
  5. Check that the cuDNN bin folder is on your PATH: Settings -> System -> About -> Advanced system settings, then Advanced tab -> Environment Variables. For comparison, my PATH includes:
       C:\tools\cuda\bin (this should be the only one you need to add manually)
       C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin
       C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\extras\CUPTI\lib64
       C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include
       C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\libnvvp
       C:\Program Files\NVIDIA Corporation\Nsight Compute 2020.1.2
  6. Even if you do all of this, it still won't work, but pay attention to the errors! There is a DLL in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin that it cannot find.
  7. This DLL is in fact named xyz_10.dll in that folder, but Python wants to load xyz_11.dll. I renamed it to xyz_11.dll and everything works. Note: xyz stands in for the real name, which I have forgotten, but there was only one such file.
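Since most of the trouble above comes down to what is on PATH, a quick script can list the CUDA-related entries and whether each folder actually exists. This is a minimal, stdlib-only sketch; cuda_path_entries is a hypothetical helper, and it simply matches "cuda" in the path (case-insensitive), so entries like the Nsight Compute folder are not listed.

```python
import os


def cuda_path_entries(path_value=None, sep=os.pathsep):
    """Return PATH entries whose path contains 'cuda' (case-insensitive).

    Defaults to the current process's PATH; pass a string to inspect another value.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [p.strip() for p in path_value.split(sep) if p.strip()]
    return [p for p in entries if "cuda" in p.lower()]


if __name__ == "__main__":
    for entry in cuda_path_entries():
        status = "exists" if os.path.isdir(entry) else "MISSING"
        print(f"{status:8} {entry}")
```

Once the entries look right, running `tf.config.list_physical_devices('GPU')` in a Python session should list the GPU if TensorFlow managed to load the CUDA libraries.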

Also, I am running this on a non-AVX CPU (X58-era W3690) with a GTX 1060 6GB. Hope this helps; I found a lot of similar posts on various sites that were abandoned or had no answers, and I almost gave up. Reposted this because I forgot to join before submitting and it was marked as spam.

submitted by /u/robotStefan

Having trouble using customized generator in fit validation

I’m writing a custom generator, DataGenerator, for model.fit(). It works well if I don’t specify the validation_data parameter of model.fit():

```python
model.fit(training_generator.data_generator(), steps_per_epoch=training_generator.steps, epochs=EPOCHS, verbose=VERBOSE)
```

If I specify the validation_data parameter as follows:

```python
model.fit(training_generator.data_generator(), steps_per_epoch=training_generator.steps, validation_data=validation_generator.data_generator(), validation_steps=validation_generator.steps, epochs=EPOCHS, verbose=VERBOSE)
```

I get the warning `W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled`.

```
Train for 375 steps, validate for 93 steps
Epoch 1/200
375/375 [==============================] - 1s 3ms/step - loss: 1.3730 - accuracy: 0.6803 - val_loss: 0.8896 - val_accuracy: 0.8324
Epoch 2/200
372/375 [============================>.] - ETA: 0s - loss: 0.7859 - accuracy: 0.8338
2021-05-25 13:37:10.626497: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
375/375 [==============================] - 1s 2ms/step - loss: 0.7854 - accuracy: 0.8338 - val_loss: 0.6558 - val_accuracy: 0.8585
Epoch 3/200
373/375 [============================>.] - ETA: 0s - loss: 0.6376 - accuracy: 0.8529
2021-05-25 13:37:11.257041: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
375/375 [==============================] - 1s 2ms/step - loss: 0.6371 - accuracy: 0.8530 - val_loss: 0.5558 - val_accuracy: 0.8727
```

Is there anything wrong with my generator?

The full code is:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Network and training parameters
EPOCHS = 200
BATCH_SIZE = 128
VERBOSE = 1
CLASSES_NUM = 10  # Number of outputs = number of digits
VALIDATION_SPLIT = 0.2  # How much TRAIN is reserved for VALIDATION


class DataGenerator:
    def __init__(self, x, y, classes_num, batch_size=32, shuffle=True):
        'Initialization'
        self.x = x
        self.y = y
        self.batch_size = batch_size
        self.classes_num = classes_num
        self.shuffle = shuffle
        self.steps = int(np.floor(len(self.x) / self.batch_size))

    def __iter__(self):
        indexes = np.arange(len(self.x))
        if self.shuffle == True:
            np.random.shuffle(indexes)
        for start in range(0, len(self.x), self.batch_size):
            end = min(start + self.batch_size, len(self.x))
            idxes = indexes[start:end]
            batch_x = [self.x[idx] for idx in idxes]
            batch_y = [self.y[idx] for idx in idxes]
            yield np.array(batch_x), tf.keras.utils.to_categorical(batch_y, num_classes=self.classes_num)

    def data_generator(self):
        while True:
            yield from self.__iter__()


# Loading MNIST dataset
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# You can verify that the split between train and test is 60,000 and 10,000, respectively.
assert x_train.shape == (60000, 28, 28)
assert x_test.shape == (10000, 28, 28)
assert y_train.shape == (60000,)
assert y_test.shape == (10000,)

# X_train is 60000 rows of 28x28 values --> reshaped in 60000 x 784
RESHAPED = 784
x_train = x_train.reshape(x_train.shape[0], RESHAPED)
x_test = x_test.reshape(x_test.shape[0], RESHAPED)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalize inputs to be within [0, 1].
x_train /= 255
x_test /= 255

total = len(x_train) // BATCH_SIZE
pivot = int((1 - VALIDATION_SPLIT) * len(y_train))
x_train, x_validation = x_train[:pivot], x_train[pivot:]
y_train, y_validation = y_train[:pivot], y_train[pivot:]

training_generator = DataGenerator(x_train, y_train, CLASSES_NUM, BATCH_SIZE)
validation_generator = DataGenerator(x_validation, y_validation, CLASSES_NUM, BATCH_SIZE)

# Build the model
model = tf.keras.models.Sequential()
model.add(keras.layers.Dense(CLASSES_NUM, input_shape=(RESHAPED,), name='dense_layer', activation='softmax'))

# Summary of the model
model.summary()

# Compile the model
model.compile(optimizer='SGD', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(training_generator.data_generator(), steps_per_epoch=training_generator.steps,
          validation_data=validation_generator.data_generator(),
          validation_steps=validation_generator.steps, epochs=EPOCHS, verbose=VERBOSE)

y_test = tf.keras.utils.to_categorical(y_test, CLASSES_NUM)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

# Make prediction
predictions = model.predict(x_test)
```
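One detail in the generator worth checking, whether or not it is related to the warning: steps uses floor, but __iter__ also yields the final partial batch. With VALIDATION_SPLIT = 0.2 of 60,000 samples, training gets 48,000 and validation 12,000, which matches the "Train for 375 steps, validate for 93 steps" line in the log. The pure-Python sketch below mirrors the batching logic of __iter__ (batch_bounds is a hypothetical helper, no TensorFlow needed) to compare the two counts.

```python
import math


def batch_bounds(n, batch_size):
    """Mirror DataGenerator.__iter__: (start, end) pairs, including a final partial batch."""
    return [(start, min(start + batch_size, n)) for start in range(0, n, batch_size)]


BATCH_SIZE = 128
for name, n in [("train", 48000), ("validation", 12000)]:
    steps = math.floor(n / BATCH_SIZE)          # what DataGenerator.steps reports
    batches = len(batch_bounds(n, BATCH_SIZE))  # what one pass of __iter__ yields
    print(f"{name}: steps={steps}, batches per pass={batches}")
```

For training the counts agree (375 and 375), but for validation steps is 93 while one pass yields 94 batches. Assuming Keras keeps drawing from the same infinite generator object across epochs, each epoch's 93 validation batches would then start where the previous epoch stopped, so the validation window drifts through the data. Computing steps with math.ceil, or skipping the partial batch in __iter__, would make the two counts match.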

submitted by /u/Ynjxsjmh

Understanding Contextual Facial Expressions Across the Globe

It might seem reasonable to assume that people’s facial expressions are universal — so, for example, whether a person is from Brazil, India or Canada, their smile upon seeing close friends or their expression of awe at a fireworks display would look essentially the same. But is that really true? Is the association between these facial expressions and their relevant context across geographies indeed universal? What can similarities — or differences — between the situations where someone grins or frowns tell us about how people may be connected across different cultures?

Scientists seeking to answer these questions and to uncover the extent to which people are connected across cultures and geography often use survey-based studies that can rely heavily on local language, norms, and values. However, such studies are not scalable, and often end up with small sample sizes and inconsistent findings.

In contrast to survey-based studies, studying patterns of facial movement provides a more direct understanding of expressive behavior. But analyzing how facial expressions are actually used in everyday life would require researchers to go through millions of hours of real-world footage, which is too time-consuming to do manually. In addition, facial expressions and the contexts in which they are exhibited are complicated, requiring large sample sizes in order to make statistically sound conclusions. While existing studies have produced diverging answers to the question of the universality of facial expressions in given contexts, applying machine learning (ML) in order to appropriately scale the research has the potential to provide clarity.

In “Sixteen facial expressions occur in similar contexts worldwide”, published in Nature, we present research undertaken in collaboration with UC Berkeley to conduct the first large-scale worldwide analysis of how facial expressions are actually used in everyday life, leveraging deep neural networks (DNNs) to drastically scale up expression analysis in a responsible and thoughtful way. Using a dataset of six million publicly available videos across 144 countries, we analyze the contexts in which people use a variety of facial expressions and demonstrate that rich nuances in facial behavior — including subtle expressions — are used in similar social situations around the world.

A Deep Neural Network Measuring Facial Expression
Facial expressions are not static. If one were to examine a person’s expression instant by instant, what might at first appear to be “anger” may instead end up being “awe”, “surprise” or “confusion”. The interpretation depends on the dynamics of a person’s face as their expression presents itself. The challenge in building a neural network to understand facial expressions, then, is that it must interpret the expression within its temporal context. Training such a system requires a large and diverse, cross-cultural dataset of videos with fully annotated expressions.

To build the dataset, skilled raters manually searched through a broad collection of publicly available videos to identify those likely to contain clips covering all of our pre-selected expression categories. To ensure that the videos matched the region they were assumed to represent, preference in video selection was given to those that included the geographic location of origin. The faces in the videos were then found using a deep convolutional neural network (CNN), similar to the Google Cloud Face Detection API, that follows faces over the course of the clip using a method based on traditional optical flow. Using an interface similar to Google Crowdsource, annotators then labeled any of 28 distinct facial expression categories that were present at any point during the clip. Because the goal was to sample how an average person would perceive an expression, the annotators were not coached or trained, nor were they provided examples or definitions of the target expressions. Below, we discuss additional experiments that evaluate whether the model trained on these annotations was biased.

Raters were presented videos with a single face highlighted for their attention. They observed the subject throughout the duration of the clip and annotated the facial expressions they exhibited. (source video)

The face detection algorithm established a sequence of locations of each face throughout the video. We then used a pre-trained Inception network to extract features representing the most salient aspects of facial expressions from the faces. The features were then fed into a long short-term memory (LSTM) network, a type of recurrent neural network that is able to model how a facial expression might evolve over time due to its ability to remember salient information from the past.
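The feature-then-LSTM pipeline can be sketched with a plain NumPy LSTM. Everything here is an assumption for illustration: random vectors stand in for the per-frame Inception features (the pre-trained network is not reproduced), the sizes are arbitrary, the weights are random rather than trained, and the output head uses an independent sigmoid per expression on the assumption that a clip can show several of the 16 expressions at once.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(features, w, u, b):
    """Run one LSTM layer over per-frame features of shape (num_frames, feat_dim).

    w: (feat_dim, 4*hidden), u: (hidden, 4*hidden), b: (4*hidden,); gate order i, f, g, o.
    Returns the hidden state after every frame, shape (num_frames, hidden).
    """
    hidden = u.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in features:
        z = x_t @ w + h @ u + b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g      # cell state carries information from earlier frames
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)


rng = np.random.default_rng(0)
num_frames, feat_dim, hidden, num_expressions = 30, 64, 32, 16

# Stand-in for per-frame CNN features of one tracked face (random here).
frame_features = rng.normal(size=(num_frames, feat_dim))
w = rng.normal(scale=0.1, size=(feat_dim, 4 * hidden))
u = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
b = np.zeros(4 * hidden)
w_out = rng.normal(scale=0.1, size=(hidden, num_expressions))

states = lstm_forward(frame_features, w, u, b)
# Independent sigmoid per expression: a score in (0, 1) for each frame and expression.
expression_scores = sigmoid(states @ w_out)
print(expression_scores.shape)
```

Because the state at frame t depends only on frames 0..t, the score for a frame reflects how the expression has unfolded up to that point, which is the "remember salient information from the past" property described above.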

In order to ensure that the model was making consistent predictions across a range of demographic groups, we evaluated the model’s fairness on an existing dataset that was constructed using similar facial expression labels, targeting a subset of 16 expressions on which it exhibited the best performance.

The model’s performance was consistent across all of the demographic groups represented in the evaluation dataset, which provides supporting evidence that the model trained to annotate facial expressions is not measurably biased. The model’s annotations of those 16 facial expressions across 1,500 images can be explored here.

We modeled the selected face in each video by using a CNN to extract features from the face at each frame, which were then fed into an LSTM network to model the changes in the expression over time. (source video)

Measuring the Contexts Captured in Videos
To understand the context of facial expressions across millions of videos, we used DNNs that could capture the fine-grained content and automatically recognize the context. The first DNN modeled a combination of text features (title and description) associated with a video along with the actual visual content (video-topic model). In addition, we used a DNN that only relied on text features without any visual information (text-topic model). These models predict thousands of labels describing the videos. In our experiments these models were able to identify hundreds of unique contexts (e.g., wedding, sporting event, or fireworks) showcasing the diversity of the data we used for the analysis.

The Covariation Between Expressions and Contexts Around the World
In our first experiment, we analyzed 3 million public videos captured on mobile phones. We chose to focus on mobile uploads because they are more likely to contain natural expressions. We correlated the facial expressions that occurred in the videos to the context annotations derived from the video-topic model. We found 16 kinds of facial expressions had distinct associations with everyday social contexts that were consistent across the world. For instance, the expressions that people associate with amusement occurred more often in videos with practical jokes; expressions that people associate with awe, in videos with fireworks; and triumph, with sporting events. These results have strong implications for discussions about the relative importance of psychologically relevant context in facial expression, compared to other factors, such as those unique to an individual, culture, or society.

Our second experiment analyzed a separate set of 3 million videos, but this time we annotated the contexts with the text-topic model. The results verified that the findings in the first experiment were not driven by subtle influences of facial expressions in the video on the annotations of the video-topic model. In other words, we used this experiment to verify our conclusions from the first experiment, given the possibility that the video-topic model could implicitly be factoring in facial expressions when computing its content labels.

We correlated the expression and context annotations across all of the videos within each region. Each expression was found to have specific associations with different contexts that were preserved across 12 world regions. For example, here, in red, we can see that expressions people associate with awe were found more often in the context of fireworks, pets, and toys than in other contexts.

In both experiments, the correlations between expressions and contexts appeared to be well-preserved across cultures. To quantify exactly how similar the associations between expressions and contexts were across the 12 different world regions we studied, we computed second-order correlations between each pair of regions. These correlations identify the relationships between different expressions and contexts in each region and then compare them with other regions. We found that 70% of the context–expression associations found in each region are shared across the modern world.
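One plausible reading of the second-order comparison: compute, per region, the matrix of correlations between every expression and every context, then correlate two regions’ matrices entry by entry. The sketch below does this on synthetic data; the Pearson correlation, the flattening step, the sizes, and the simulated regions are all our assumptions, and the paper’s exact procedure may differ.

```python
import numpy as np


def expression_context_corr(expressions, contexts):
    """Pearson correlation between every expression column and every context column.

    expressions: (num_videos, num_expressions); contexts: (num_videos, num_contexts).
    Returns an array of shape (num_expressions, num_contexts).
    """
    ez = (expressions - expressions.mean(0)) / expressions.std(0)
    cz = (contexts - contexts.mean(0)) / contexts.std(0)
    return ez.T @ cz / len(expressions)


def second_order_corr(corr_a, corr_b):
    """Correlate two regions' expression-context correlation matrices entry by entry."""
    return np.corrcoef(corr_a.ravel(), corr_b.ravel())[0, 1]


def simulate_region(rng, mixing, num_videos=2000):
    """Synthetic region: expressions depend linearly on contexts via `mixing`, plus noise."""
    contexts = rng.normal(size=(num_videos, mixing.shape[0]))
    expressions = contexts @ mixing + rng.normal(size=(num_videos, mixing.shape[1]))
    return expression_context_corr(expressions, contexts)


rng = np.random.default_rng(0)
shared = rng.normal(size=(10, 16))   # 10 contexts -> 16 expressions
other = rng.normal(size=(10, 16))

corr_a = simulate_region(rng, shared)  # two regions with the same underlying associations
corr_b = simulate_region(rng, shared)
corr_c = simulate_region(rng, other)   # a region with unrelated associations

print(second_order_corr(corr_a, corr_b), second_order_corr(corr_a, corr_c))
```

Regions generated from the same associations score close to 1, while the unrelated region scores near 0; a real analysis would also need a noise ceiling or significance test to interpret intermediate values.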

Finally, we asked how many of the 16 kinds of facial expression we measured had distinct associations with different contexts that were preserved around the world. To do so, we applied a method called canonical correlations analysis, which showed that all 16 facial expressions had distinct associations that were preserved across the world.
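Canonical correlation analysis finds pairs of linear combinations, one of each variable set, that are maximally correlated. A compact way to compute the canonical correlations is the QR/SVD formulation sketched below. The synthetic expression and context matrices are made up, and how the paper set up the two variable sets and assessed significance is not reproduced here.

```python
import numpy as np


def canonical_correlations(x, y):
    """Canonical correlations between column sets x (n, p) and y (n, q).

    After centering, the canonical correlations are the singular values of
    Qx^T Qy, where Qx and Qy are orthonormal bases of the column spaces.
    Assumes n > p, n > q and full column rank.
    """
    qx, _ = np.linalg.qr(x - x.mean(axis=0))
    qy, _ = np.linalg.qr(y - y.mean(axis=0))
    return np.linalg.svd(qx.T @ qy, compute_uv=False)  # sorted in descending order


rng = np.random.default_rng(0)
n = 1000
expr = rng.normal(size=(n, 16))  # synthetic scores for 16 expressions
# Synthetic contexts that depend on only 4 directions of the expression space.
ctx = expr[:, :4] @ rng.normal(size=(4, 10)) + rng.normal(size=(n, 10))

print(np.round(canonical_correlations(expr, ctx), 3))
```

The number of canonical correlations that stand well above chance indicates how many independent dimensions of association the two sets share; in the synthetic example only a few should stand out, since the contexts were built from four expression directions.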

Conclusions
We were able to examine the contexts in which facial expressions occur in everyday life across cultures at an unprecedented scale. Machine learning allowed us to analyze millions of videos across the world and discover evidence supporting hypotheses that facial expressions are preserved to a degree in similar contexts across cultures.

Our results also leave room for cultural differences. Although the correlations between facial expressions and contexts were 70% consistent around the world, they were up to 30% variable across regions. Neighboring world regions generally had more similar associations between facial expressions and contexts than distant world regions, indicating that the geographic spread of human culture may also play a role in the meanings of facial expressions.

This work shows that we can use machine learning to better understand ourselves and identify common communication elements across cultures. Tools such as DNNs give us the opportunity to provide vast amounts of diverse data in service of scientific discovery, enabling more confidence in the statistical conclusions. We hope our work provides a template for using the tools of machine learning in a responsible way and sparks more innovative research in other scientific domains.

Acknowledgements
Special thanks to our co-authors Dacher Keltner from UC Berkeley, along with Florian Schroff, Brendan Jou, and Hartwig Adam from Google Research. We are also grateful for additional support at Google provided by Laura Rapin, Reena Jana, Will Carter, Unni Nair, Christine Robson, Jen Gennai, Sourish Chaudhuri, Greg Corrado, Brian Eoff, Andrew Smart, Raine Serrano, Blaise Aguera y Arcas, Jay Yagnik, and Carson Mcneil.

On-Demand Technical Sessions: Develop and Deploy AI Solutions in the Cloud Using NVIDIA NGC

At GTC ’21, experts presented a variety of technical talks to help people new to AI, or just those looking for tools to speed up their AI development using the various components of the NGC catalog, including:

  • AI containers optimized to speed up AI/ML training and inference
  • Pretrained models that provide an advanced starting point to build custom models
  • Industry-specific AI SDKs that transform applications into AI-powered ones
  • Helm charts to provide consistent and faster deployments
  • Collections that bring together all the software needed for various use cases

Watch these on-demand sessions to learn how to build solutions in the cloud with NVIDIA AI software from NGC.

Building a Text-to-Speech Service that Sounds Like You

This session shows how to build a TTS model for expressive speech using pretrained models. The model is fine-tuned with speech samples and customized for variability in speech by performing style transfer from other speakers. The provided tools let developers create a model for their own voice and style so that the TTS service sounds like them!

Analyzing Traffic Video Streams at Scale

This session demonstrates how to use the Transfer Learning Toolkit and pretrained models to build computer vision models and run inference on over 1,000 live video feeds on a single AWS instance powered by NVIDIA A100 GPUs.

Deploy Compute and Software Resources to Run AI/ML Applications in Azure Machine Learning with Just Two Commands

This session shows how to build a taxi fare prediction application using RAPIDS. It also shows how to automatically set up a Dask cluster with multiple Azure virtual machines to support large datasets, mount data into the Dask scheduler and workers, deploy GPU-optimized AI software from the NGC catalog to train models, and then make taxi fare predictions.

Build and Deploy AI Applications Faster on Azure Machine Learning

This session demonstrates the basics of Azure Machine Learning (AzureML) Platform, the benefits of using the NGC catalog, and how to leverage the NGC-AzureML Quick Launch Toolkit to build an end-to-end AI application in AzureML.

If you’re building an AI solution from scratch or just want to replicate the use cases shown in the above sessions, start with the NGC catalog.

Generating High-Quality Labels for Speech Recognition with Label Studio and NVIDIA NeMo

You can save time and produce a more accurate result when processing audio data with automatic speech recognition (ASR) models from NVIDIA NeMo and Label Studio.

NVIDIA NeMo provides reusable neural modules that make it easy to create new neural network architectures, including prebuilt modules and ready-to-use models for ASR. With NeMo, you can get audio transcriptions from pretrained speech recognition models. Add Label Studio and its open-source data labeling capabilities to the mix, and you can improve the transcription quality even further.

Solution

Figure 1. ASR workflow with Label Studio and NeMo to annotate and correct transcripts.

Follow the steps in this post to set up NVIDIA NeMo ASR with Label Studio to produce high-quality audio transcripts.

  1. Connect the NVIDIA NeMo model to transcribe audio files in Label Studio automatically.
  2. Set up the audio transcription project.
  3. Validate and export revised audio transcripts from Label Studio.
  4. Fine-tune a NeMo ASR model with the revised audio transcripts from Label Studio.

Prerequisites

Before you start, make sure that you have the following resources:

  • Audio data files. This audio might be recordings of customer service calls, phone orders, sales conversations, or other recorded audio with people talking. The audio files must be in one of the following file formats:
    • WAV
    • AIFF
    • MP3
    • AU
    • FLAC
  • Label Studio installed. Install Label Studio using your preferred method on your local machine or a cloud server. For more information, see Quickstart in the Label Studio documentation.
  • NeMo toolkit installed
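Before importing, it can help to confirm that every file uses one of the formats listed above. The following Python sketch checks file extensions only and does not inspect file contents; the function name `partition_audio_files` and the `.aif` alias for AIFF are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions for the formats listed above. ".aif" as an AIFF alias is an assumption.
SUPPORTED_EXTENSIONS = {".wav", ".aiff", ".aif", ".mp3", ".au", ".flac"}


def partition_audio_files(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split paths into (supported, unsupported) by file extension, ignoring case.

    Only the extension is checked; the file contents are not inspected.
    """
    supported: List[str] = []
    unsupported: List[str] = []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

For example, `partition_audio_files(str(p) for p in Path("recordings").rglob("*") if p.is_file())` sorts every file under a hypothetical `recordings` directory, so you can convert or drop the unsupported ones before import.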

Free audio data

If you don’t have any audio data in mind, you can use an example dataset or a historical audio dataset:

There are a number of other ASR datasets that you can use. For more information, see Datasets — Introduction. You can also use public domain audio recording collections on the Library of Congress website, such as the Sports Byline collection of interviews with American baseball players.

After you identify the audio to transcribe, you can start processing it.

Install Label Studio ML backend

After you install Label Studio, install the Label Studio machine learning backend. From the command line, run the following command:

git clone https://github.com/heartexlabs/label-studio-ml-backend 

Set up the environment:

cd label-studio-ml-backend

# Install label-studio-ml and its dependencies
pip install -U -e .

# Install the nemo example dependencies
pip install -r label_studio_ml/examples/requirements.txt

Connect the NVIDIA NeMo model to transcribe audio files in Label Studio automatically

To prelabel the data with predictions from a pretrained ASR model, set up the NeMo toolkit as a machine learning backend in Label Studio. The Label Studio machine learning backend lets you use a pretrained model to prelabel your data.

Label Studio includes an example that uses the pretrained QuartzNet15x5 model, developed with NeMo and available from the NGC cloud, but you can set up a different model if another one is a better fit for your data. For more information, see the list of ASR models available from NeMo.

From the command line, set up NeMo as a machine learning backend and start a new Label Studio project with the model.

  1. Install the NeMo toolkit in a Docker container or using pip.
  2. Download a NeMo ASR model. The provided Label Studio example script downloads the pretrained QuartzNet model from the NGC cloud. To use a different model, download that model from NGC.
  3. From the command line, initialize a new Label Studio machine learning backend from the NeMo example script.
    label-studio-ml init my_model --from label_studio_ml/examples/nemo/asr.py
  4. Start the machine learning backend. By default, the model starts on localhost with port 9090.
    label-studio-ml start my_model
  5. Start Label Studio with the model.
    label-studio start my_project --ml-backends http://localhost:9090
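Once the backend is running, each prelabel it sends to Label Studio is a JSON prediction. As a rough illustration, the Python sketch below builds one prediction for a transcript in the shape Label Studio uses for a TextArea result. The tag names `transcription` and `audio` are assumptions and must match the `name` and `toName` attributes in your project's labeling configuration; `make_asr_prediction` is a hypothetical helper, not part of the Label Studio example script.

```python
from typing import Any, Dict

# Assumed tag names from an ASR labeling config such as:
#   <Audio name="audio" value="$audio"/>
#   <TextArea name="transcription" toName="audio"/>
FROM_NAME = "transcription"  # name of the TextArea control tag
TO_NAME = "audio"            # name of the Audio object tag it annotates


def make_asr_prediction(transcript: str, score: float = 1.0) -> Dict[str, Any]:
    """Wrap a model transcript as one Label Studio prediction for a TextArea."""
    return {
        "result": [
            {
                "from_name": FROM_NAME,
                "to_name": TO_NAME,
                "type": "textarea",
                "value": {"text": [transcript]},
            }
        ],
        "score": score,  # model confidence; 1.0 is a placeholder default
    }
```

If you already have transcripts from an earlier run, predictions in this shape can also be attached to tasks under a `predictions` key at import time instead of being generated through the backend.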

Set up the audio transcription project

After you start Label Studio, import your audio data and set up the right template to configure labeling. The audio transcription template is the best one for automated speech recognition and makes it easy to annotate the audio data.

Open Label Studio, import your data, and select the template.

  1. Choose Import and import your audio data as plain text or JSON files referencing valid URLs for the audio files hosted in online storage such as Amazon S3. For more information, see Get data into Label Studio.
Figure 2. Process of importing data into Label Studio.

  2. From the Tasks list, choose Settings.
  3. On the Labeling Interface tab, browse the templates and select the Automatic Speech Recognition template.
  4. Choose Save.
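For the JSON import option in step 1, each task is an object whose `data` field points to one audio URL. This Python sketch assumes the labeling configuration reads the URL from a variable named `audio` (as in `$audio`); the helper names and the bucket URLs are hypothetical, and the URLs must be reachable by Label Studio.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def build_audio_tasks(urls: Iterable[str], data_key: str = "audio") -> List[Dict[str, Dict[str, str]]]:
    """Wrap each audio URL in a Label Studio task: {"data": {data_key: url}}."""
    # data_key must match the variable your labeling config references,
    # for example value="$audio" on the Audio tag (an assumption here).
    return [{"data": {data_key: url}} for url in urls]


def write_tasks(tasks: List[Dict], path: str) -> None:
    """Save tasks as a JSON array that can be uploaded with Import."""
    Path(path).write_text(json.dumps(tasks, indent=2), encoding="utf-8")


# Hypothetical bucket and object names, for illustration only.
urls = [
    "https://example-bucket.s3.amazonaws.com/calls/call-001.wav",
    "https://example-bucket.s3.amazonaws.com/calls/call-002.wav",
]
tasks = build_audio_tasks(urls)
```

Calling `write_tasks(tasks, "tasks.json")` produces a file you can upload with Import.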

Validate and export the model predictions

As an annotator, review and validate the audio tasks in the labeling interface. If necessary, correct the transcript predicted by the NeMo speech model.

  1. From the list of tasks in Label Studio, choose Label.
  2. For each audio sample, listen to the audio and review the transcription produced by the NeMo model as part of the prelabeling process.
  3. If any words in the transcript are incorrect, update them.
  4. Save the changes to the transcript. Choose Submit to submit the transcript and review the next audio sample.
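To get a rough sense of how much correction the pretrained model needed, you can compare its prediction with the submitted transcript using word error rate (WER): word substitutions, deletions, and insertions divided by the number of reference words. The following is a minimal Python sketch of that standard word-level edit distance; it assumes whitespace tokenization and leaves text normalization (case, punctuation) to you.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev is the previous row of the edit-distance table: the cost of turning
    # the first i-1 reference words into each prefix of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Here the corrected transcript is the reference and the NeMo prediction is the hypothesis, so a WER of 0.0 means the annotator changed nothing.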

Next, export the completed audio transcripts from Label Studio in the format that NeMo expects, as described in the NeMo ASR collection section of the NVIDIA NeMo documentation.

To export the completed transcripts, do the following:

  1. From the list of tasks in Label Studio, choose Export.
  2. Select the audio transcript JSON format called ASR_MANIFEST.

For more information about the available export formats in Label Studio, see Export results from Label Studio.
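A NeMo ASR manifest is a JSON Lines file: one JSON object per line with the keys `audio_filepath`, `duration` (in seconds), and `text`. A quick check of the exported file before training can catch missing fields. The Python sketch below is illustrative; `load_manifest` is a hypothetical helper, not a NeMo or Label Studio function.

```python
import json
from typing import Any, Dict, Iterable, List

# Keys every NeMo ASR manifest entry is expected to carry.
REQUIRED_KEYS = ("audio_filepath", "duration", "text")


def load_manifest(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSON Lines manifest text and check each entry's required keys.

    Accepts any iterable of lines, such as an open file object.
    """
    entries = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        entry = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"line {lineno}: missing keys {missing}")
        if not isinstance(entry["duration"], (int, float)) or entry["duration"] <= 0:
            raise ValueError(f"line {lineno}: duration must be a positive number of seconds")
        entries.append(entry)
    return entries
```

With a file on disk, pass the open handle directly, for example `with open("manifest.json", encoding="utf-8") as f: entries = load_manifest(f)` (the file name is hypothetical).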

Use high-quality transcripts to fine-tune your ML model

When you’re done processing the audio and adjusting the transcribed text, you’re left with audio transcripts that you can use to retrain the ASR models included in NeMo. Label Studio produces annotations that are fully compatible with NeMo training.

To update the QuartzNet model checkpoint, you can fine-tune it with a few lines of code, train the model from scratch, or use PyTorch Lightning. Examples are also available in the NeMo Jupyter notebook. For more information, see Transfer Learning in the ASR with NeMo Jupyter notebook.
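Fine-tuning usually needs separate training and validation manifests. As a sketch under assumptions, the Python code below splits the exported manifest reproducibly and builds dataset config dictionaries whose keys (`manifest_filepath`, `sample_rate`, `labels`, `batch_size`, `shuffle`) follow those used in NeMo's QuartzNet configuration. The 16 kHz sample rate, the file names, and the helper names are assumptions; the commented lines show how the configs could be passed to a NeMo model but are not executed here.

```python
import random
from typing import Any, Dict, List, Sequence, Tuple


def split_manifest(lines: Sequence[str], val_fraction: float = 0.1, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle non-empty manifest lines reproducibly and split into (train, validation)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    entries = [line for line in lines if line.strip()]  # copy; input is not mutated
    random.Random(seed).shuffle(entries)
    n_val = max(1, int(len(entries) * val_fraction)) if entries else 0
    return entries[n_val:], entries[:n_val]


def make_data_config(manifest_path: str, labels: List[str], batch_size: int = 16,
                     shuffle: bool = True, sample_rate: int = 16000) -> Dict[str, Any]:
    """Dataset config dict in the shape used by NeMo CTC ASR configs (assumed keys)."""
    return {
        "manifest_filepath": manifest_path,
        "sample_rate": sample_rate,
        "labels": labels,
        "batch_size": batch_size,
        "shuffle": shuffle,
    }

# Hypothetical usage with NeMo installed (not executed here):
#   import nemo.collections.asr as nemo_asr
#   model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
#   labels = list(model.decoder.vocabulary)  # assumed: reuse the pretrained vocabulary
#   model.setup_training_data(train_data_config=make_data_config("train_manifest.json", labels))
#   model.setup_validation_data(val_data_config=make_data_config("val_manifest.json", labels, shuffle=False))
#   # then train with a PyTorch Lightning Trainer, e.g. trainer.fit(model)
```

Fixing the seed keeps the split stable across runs, so validation results stay comparable as you add newly corrected transcripts.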

By using Label Studio and NeMo together, you avoid transcribing each audio file from scratch. NeMo gives you an accurate first-pass transcript right away, and Label Studio helps you correct it into a high-quality label. Try it today!