DataBloom - Part 410

A Fast WordPiece Tokenization System

Post author By
Post date December 10, 2021
No Comments on A Fast WordPiece Tokenization System

Posted by Xinying Song, Staff Software Engineer and Denny Zhou, Senior Staff Research Scientist, Google Research

Tokenization is a fundamental pre-processing step for most natural language processing (NLP) applications. It involves splitting text into smaller units called tokens (e.g., words or word segments) in order to turn an unstructured input string into a sequence of discrete elements that is suitable for a machine learning (ML) model. ln deep learning–based models (e.g., BERT), each token is mapped to an embedding vector to be fed into the model.

Tokenization in a typical deep learning model, like BERT.

A fundamental tokenization approach is to break text into words. However, using this approach, words that are not included in the vocabulary are treated as “unknown”. Modern NLP models address this issue by tokenizing text into subword units, which often retain linguistic meaning (e.g., morphemes). So, even though a word may be unknown to the model, individual subword tokens may retain enough information for the model to infer the meaning to some extent. One such subword tokenization technique that is commonly used and can be applied to many other NLP models is called WordPiece. Given text, WordPiece first pre-tokenizes the text into words (by splitting on punctuation and whitespaces) and then tokenizes each word into subword units, called wordpieces.

The WordPiece tokenization process with an example sentence.

In “Fast WordPiece Tokenization”, presented at EMNLP 2021, we developed an improved end-to-end WordPiece tokenization system that speeds up the tokenization process, reducing the overall model latency and saving computing resources. In comparison to traditional algorithms that have been used for decades, this approach reduces the complexity of the computation by an order of magnitude, resulting in significantly improved performance, up to 8x faster than standard approaches. The system has been applied successfully in a number of systems at Google and has been publicly released in TensorFlow Text.

Single-Word WordPiece Tokenization
WordPiece uses a greedy longest-match-first strategy to tokenize a single word — i.e., it iteratively picks the longest prefix of the remaining text that matches a word in the model’s vocabulary. This approach is known as maximum matching or MaxMatch, and has also been used for Chinese word segmentation since the 1980s. Yet despite its wide use in NLP for decades, it is still relatively computation intensive, with the commonly adopted MaxMatch approaches’ computation being quadratic with respect to the input word length (n). This is because two pointers are needed to scan over the input: one to mark a start position, and the other to search for the longest substring matching a vocabulary token at that position.

We propose an alternative to the MaxMatch algorithm for WordPiece tokenization, called LinMaxMatch, which has a tokenization time that is strictly linear with respect to n. First, we organize the vocabulary tokens in a trie (also called a prefix tree), where each trie edge is labeled by a character, and a tree path from the root to some node represents a prefix of some token in the vocabulary. In the figure below, nodes are depicted as circles and tree edges are black solid arrows. Given a trie, a vocabulary token can be located to match an input text by traversing from the root and following the trie edges to match the input character by character; this process is referred to as trie matching.

The figure below shows the trie created from the vocabulary consisting of “a”, “abcd”, “##b”, “##bc”, and “##z”. An input text “abcd” can be matched to a vocabulary token by walking from the root (upper left) and following the trie edges with labels “a”, “b”, “c”, “d” one by one. (The leading “##” symbols are special characters used in WordPiece tokenization that are described in more detail below.)

Trie diagram of the vocabulary [“a”, “abcd”, “##b”, “##bc”, “##z”]. Circles and arrows represent nodes and edges along the trie, respectively.

Second, inspired by the Aho-Corasick algorithm, a classical string-searching algorithm invented in 1975, we introduce a method that breaks out of a trie branch that fails to match the given input and skips directly to an alternative branch to continue matching. As in standard trie matching, during tokenization, we follow the trie edges to match the input characters one by one. When trie matching cannot match an input character for a given node, a standard algorithm would backtrack to the last character where a token was matched and then restart the trie matching procedure from there, which results in repetitive and wasteful iterations. Instead of backtracking, our method triggers a failure transition, which is done in two steps: (1) it collects the precomputed tokens stored at that node, which we call failure pops; and (2) it then follows the precomputed failure link to a new node from which the trie matching process continues.

For example, given a model with the vocabulary described above (“a”, “abcd”, “##b”, “##bc”, and “##z”), WordPiece tokenization distinguishes subword tokens matching at the start of the input word from the subword tokens starting in the middle (the latter being marked with two leading hashes “##”). Hence, for input text “abcz”, the expected tokenization output is [“a”, “##bc”, “##z”], where “a” matches at the beginning of the input while “##bc” and “##z” match in the middle. For this example, the figure below shows that, after successfully matching three characters ‘a’, ‘b’, ‘c’, trie matching cannot match the next character ‘z’ because “abcz” is not in the vocabulary. In this situation, LinMaxMatch conducts a failure transition by outputting the first recognized token (using the failure pop token “a”) and following the failure link to a new node to continue the matching process (in this case, node with “##bc” as the failure pop tokens).The process then repeats from the new node.

Trie structure for the same vocabulary as shown in the example above, now illustrating the approach taken by our new Fast WordPiece Tokenizer algorithm. Failure pops are bracketed and shown in purple. Failure links between nodes are indicated with dashed red line arrows.

Since at least n operations are required to read the entire input, the LinMaxMatch algorithm is asymptotically optimal for the MaxMatch problem.

End-to-End WordPiece Tokenization
Whereas the existing systems pre-tokenize the input text (splitting it into words by punctuation and whitespace characters) and then call WordPiece tokenization on each resulting word, we propose an end-to-end WordPiece tokenizer that combines pre-tokenization and WordPiece into a single, linear-time pass. It uses the LinMaxMatch trie matching and failure transitions as much as possible and only checks for punctuation and whitespace characters among the relatively few input characters that are not handled by the loop. It is more efficient as it traverses the input only once, performs fewer punctuation / whitespace checks, and skips the creation of intermediate words.

End-to-End WordPiece Tokenization.

Benchmark Results
We benchmark our method against two widely-adopted WordPiece tokenization implementations, HuggingFace Tokenizers, from the HuggingFace Transformer library, one of the most popular open-source NLP tools, and TensorFlow Text, the official library of text utilities for TensorFlow. We use the WordPiece vocabulary released with the BERT-Base, Multilingual Cased model.

We compared our algorithms with HuggingFace and TensorFlow Text on a large corpus (several million words) and found that the way the strings are split into tokens is identical to other implementations for both single-word and end-to-end tokenization.

To generate the test data, we sample 1,000 sentences from the multilingual Wikipedia dataset, covering 82 languages. On average, each word has four characters, and each sentence has 82 characters or 17 words. We found this dataset large enough because a much larger dataset (consisting of hundreds of thousands of sentences) generated similar results.

We compare the average runtime when tokenizing a single word or general text (end-to-end) for each system. Fast WordPiece tokenizer is 8.2x faster than HuggingFace and 5.1x faster than TensorFlow Text, on average, for general text end-to-end tokenization.

Average runtime of each system. Note that for better visualization, single-word tokenization and end-to-end tokenization are shown in different scales.

We also examine how the runtime grows with respect to the input length for single-word tokenization. Because of its linear-time complexity, the runtime of LinMaxMatch increases at most linearly with the input length, which is much slower than other quadratic-time approaches.

The average runtime of each system with respect to the input length for single-word tokenization.

Conclusion
We proposed LinMaxMatch for single-word WordPiece tokenization, which solves the decades-old MaxMatch problem in the asymptotically-optimal time with respect to the input length. LinMaxMatch extends the Aho-Corasick Algorithm, and the idea can be applied to more string search and transducer challenges. We also proposed an End-to-End WordPiece algorithm that combines pre-tokenization and WordPiece tokenization into a single, linear-time pass for even higher efficiency.

Acknowledgements
We gratefully acknowledge the key contributions and useful advices from other team members and colleagues, including Abbas Bazzi, Alexander Frömmgen, Alex Salcianu, Andrew Hilton, Bradley Green, Ed Chi, Chen Chen, Dave Dopson, Eric Lehman, Fangtao Li, Gabriel Schubiner, Gang Li, Greg Billock, Hong Wang, Jacob Devlin, Jayant Madhavan, JD Chen, Jifan Zhu, Jing Li, John Blitzer, Kirill Borozdin, Kristina Toutanova, Majid Hadian-Jazi, Mark Omernick, Max Gubin, Michael Fields, Michael Kwong, Namrata Godbole, Nathan Lintz, Pandu Nayak, Pew Putthividhya, Pranav Khaitan, Robby Neale, Ryan Doherty, Sameer Panwar, Sundeep Tirumalareddy, Terry Huang, Thomas Strohmann, Tim Herrmann, Tom Small, Tomer Shani, Wenwei Yu, Xiaoxue Zang, Xin Li, Yang Guo, Yang Song, Yiming Xiao, Yuan Shen, and many more.

Misc

Using Neural Nano-Optics Researchers Create a Camera the Size of a Salt Grain

Post author By
Post date December 9, 2021
No Comments on Using Neural Nano-Optics Researchers Create a Camera the Size of a Salt Grain

A picture of a finger with a half milimeter metasurface next to a blown-up version of the metasurface showing the optical cylinders spread out in a circular pattern. The groundbreaking technology uses an optical metasurface and machine-learning algorithms to produce high-quality color images with a wide field of view. A picture of a finger with a half milimeter metasurface next to a blown-up version of the metasurface showing the optical cylinders spread out in a circular pattern.

A team of researchers from Princeton and the University of Washington created a new camera that captures stunning images and measures in at only a half-millimeter—the size of a coarse grain of salt.

The new study, published in Nature Communications, outlines the use of optical metasurfaces with machine learning to produce high-quality color imagery, with a wide field of view. The device could be used across industries ranging from robotics to most notably the medical field, to help with disease diagnosis.

Optical metasurfaces rely on a new method of light manipulation, using cylindrical posts set on a small, square surface. The posts, which vary in geometry, work like antennas that can capture incoming photons (waves of electromagnetic radiation). These waves are then sent as signals from the metasurface to a computer to interpret and produce an image.

A graphic showing how a traditional camera with 6 lenses and a sensor works versus neural non-optics that illustrates a metasurface, sensor area, and software processing phase to create an image. — *Figure 1. Comparison of a conventional lens versus neural nano-optics.*

Tiny cameras offer vast potential for medical use, from brain imaging to minimally invasive endoscopies. But, up to this point, the technology has captured blurry, distorted images, with a limited field of view.

The researchers employed neural nano-optics—which combines optical metasurfaces with neural feature-based image reconstruction—to interpret the data and produce higher-quality images.

“To recover images from the measured data, we propose a feature-based deconvolution method that employs a differentiable inverse filter together with neural networks for feature extraction and refinement,” said senior author Felix Heide, an assistant professor of computer science at Princeton.

The team tested machine-learning algorithms with a simulator, comparing different configurations of the antennas. With 1.6 million cylinders on the optical surface and complex light interactions, the simulations demand massive amounts of memory.

The algorithm was tested and trained using the cuDNN-accelerated TensorFlow deep learning framework with an NVIDIA P100 GPU. The result is a new system capable of producing images comparable to a modern camera lens, from a device 500,000 times smaller. According to the study, the new camera is also 10 times better at filtering out errors than previous techniques.

Figure 2. A video comparing an image taken by the nano-camera (right) and the previous best (left). Courtesy of Princeton Computational Imaging Lab.

“It’s been a challenge to design and configure these little microstructures to do what you want,” Ethan Tseng, a computer science Ph.D. student at Princeton and study co-lead said in a press release. “For this specific task of capturing large field of view RGB images, it’s challenging because there are millions of these little microstructures, and it’s not clear how to design them in an optimal way.”

The team is now working to add more computational abilities to the camera and envisions features such as object detection and sensors for medicine or robotics in future iterations. Beyond that, they see a use case where ultra-compact imagers turn surfaces into sensors.

“We could turn individual surfaces into cameras that have ultra-high resolution. So you wouldn’t need three cameras on the back of your phone anymore, but the whole back of your phone would become one giant camera. We can think of completely different ways to build devices in the future,” said Heide.

The raw capture data and code used to design and evaluate the neural nano-optic is publicly accessible. More information is also available on the Neural Nano-Optics for High-quality Thin Lens Imaging webpage.

Read the published research in Nature Communications. >>
Read the press release. >>

Feature image courtesy of Princeton/Ethan Tseng, et al.

Offsites

More Efficient In-Context Learning with GLaM

Post author By
Post date December 9, 2021
No Comments on More Efficient In-Context Learning with GLaM

Posted by Andrew M Dai and Nan Du, Research Scientists, Google Research, Brain Team

Large language models (e.g., GPT-3) have many significant capabilities, such as performing few-shot learning across a wide array of tasks, including reading comprehension and question answering with very few or no training examples. While these models can perform better by simply using more parameters, training and serving these large models can be very computationally intensive. Is it possible to train and use these models more efficiently?

In pursuit of that question, today we introduce the Generalist Language Model (GLaM), a trillion weight model that can be trained and served efficiently (in terms of computation and energy use) thanks to sparsity, and achieves competitive performance on multiple few-shot learning tasks. GLaM’s performance compares favorably to a dense language model, GPT-3 (175B) with significantly improved learning efficiency across 29 public NLP benchmarks in seven categories, spanning language completion, open-domain question answering, and natural language inference tasks.

Dataset
To build GLaM, we began by building a high-quality 1.6 trillion token dataset containing language usage representative of a wide range of downstream use-cases for the model. Web pages constitute the vast quantity of data in this unlabelled corpus, but their quality ranges from professional writing to low-quality comment and forum pages. We then developed a text quality filter that was trained on a collection of text from Wikipedia and books (both of which are generally higher quality sources) to determine the quality of the content for a webpage. Finally, we applied this filter to generate the final subset of webpages and combined this with books and Wikipedia to create the final training dataset.

Model and Architecture
GLaM is a mixture of experts (MoE) model, a type of model that can be thought of as having different submodels (or experts) that are each specialized for different inputs. The experts in each layer are controlled by a gating network that activates experts based on the input data. For each token (generally a word or part of a word), the gating network selects the two most appropriate experts to process the data. The full version of GLaM has 1.2T total parameters across 64 experts per MoE layer with 32 MoE layers in total, but only activates a subnetwork of 97B (8% of 1.2T) parameters per token prediction during inference.

The architecture of GLaM where each input token is dynamically routed to two selected expert networks out of 64 for prediction.

Similar to the GShard MoE Transformer, we replace the single feedforward network (the simplest layer of an artificial neural network, “Feedforward or FFN” in the blue boxes) of every other transformer layer with a MoE layer. This MoE layer has multiple experts, each a feedforward network with identical architecture but different weight parameters. Even though this MoE layer has many more parameters, the experts are sparsely activated, meaning that for a given input token, only two experts are used, giving the model more capacity while limiting computation. During training, each MoE layer’s gating network is trained to use its input to activate the best two experts for each token, which are then used for inference. For a MoE layer of E experts, this essentially provides a collection of E×(E-1) different feedforward network combinations (instead of one as in the classic Transformer architecture), leading to more computational flexibility.

The final learned representation of a token will be the weighted combination of the outputs from the two experts. This allows different experts to activate on different types of inputs. To enable scaling to larger models, each expert within the GLaM architecture can span multiple computational devices. We use the GSPMD compiler backend to solve the challenges in scaling the experts and train several variants (based on expert size and number of experts) of this architecture to understand the scaling effects of sparsely activated language models.

Evaluation
We use a zero-shot and one-shot setting where the tasks are never seen during training. The benchmarks for evaluation include (1) cloze and completion tasks [1,2,3]; (2) Open-domain question answering [4,5,6]; (3) Winograd-style tasks [7,8]; (4) commonsense reasoning [9,10,11]; (5) in-context reading comprehension [12,13,14,15,16]; (6) the SuperGLUE tasks; and (7) natural language inference [17]. In total, there are eight natural language generation tasks (NLG) where the generated phrases are evaluated against the ground truth targets via Exact Match (EM) accuracy and F1 measure, and 21 language understanding tasks (NLU) where the prediction from several options is chosen via conditional log-likelihood. Some tasks have variants and SuperGLUE consists of multiple tasks. Both EM accuracy and F1 are scaled from 0 to 100 across all our results and averaged for the NLG score below. The NLU score is an average of accuracy and F1 scores.

Results
GLaM reduces to a basic dense Transformer-based language model architecture when each MoE layer only has one expert. In all experiments, we adopt the notation of (base dense model size) / (number of experts per MoE layer) to describe the GLaM model. For example, 1B/64E represents the architecture of a 1B parameter dense model with every other layer replaced by a 64 expert MoE layer. In the following sections, we explore GLaM’s performance and scaling properties, including baseline dense models trained on the same datasets. Compared with the recently announced Megatron-Turing model, GLaM is on-par on the seven respective tasks if using a 5% margin, while using 5x less computation during inference.

Below, we show the 1.2T-parameter sparsely activated model (GLaM) achieved higher results on average and on more tasks than the 175B-parameter dense GPT-3 model while using less computation during inference.

Average score for GLaM and GPT-3 on NLG (left) and NLU (right) tasks (higher is better).

Below we show a summary of the performance on 29 benchmarks compared to the dense model (GPT-3, 175B). GLaM exceeds or is on-par with the performance of the dense model on almost 80% of zero-shot tasks and almost 90% of one-shot tasks.

Evaluation	Higher (>+5%)	On-par (within 5%)	Lower (<-5%)
Zero-shot	13	11	5
One-shot	14	10	5

Moreover, while the full version of GLaM has 1.2T total parameters, it only activates a subnetwork of 97B parameters (8% of 1.2T) per token during inference.

	GLaM (64B/64E)	GPT-3 (175B)
Total Parameters	1.162T	0.175T
Activated Parameters	0.097T	0.175T

Scaling Behavior
GLaM has two ways to scale: 1) scale the number of experts per layer, where each expert is hosted within one computation device, or 2) scale the size of each expert to go beyond the limit of a single device. To evaluate the scaling properties, we compare the respective dense model (FFN layers instead of MoE layers) of similar FLOPS per token at inference time.

Average zero-shot and one-shot performance by increasing the size of each expert. The FLOPS per token prediction at inference time increases as the expert size grows.

As shown above, performance across tasks scales with the size of the experts. GLaM sparsely activated models also perform better than dense models for similar FLOPs during inference for generation tasks. For understanding tasks, we observed that they perform similarly at smaller scales, but sparsely activated models outperform at larger scales.

Data Efficiency
Training large language models is computationally intensive, so efficiency improvements are useful to reduce energy consumption.

Below we show the computation costs for the full version of GLaM.

Computation cost in GFLOPS both for inference, per token (left) and for training (right).

These compute costs show that GLaM uses more computation during training since it trains on more tokens, but uses significantly less computation during inference. We show comparisons using different numbers of tokens to train below.

We also evaluated the learning curves of our models compared to the dense baseline.

Average zero-shot and one-shot performance of sparsely-activated and dense models on eight generative tasks as more tokens are processed in training.

Average zero-shot and one-shot performance of sparsely-activated and dense models on 21 understanding tasks as more tokens are processed in training.

The results above show that sparsely activated models need to train with significantly less data than dense models to reach similar zero-shot and one-shot performance, and if the same amount of data is used, sparsely activated models perform significantly better.

Finally, we assessed the energy efficiency of GLaM.

Comparison of power consumption during training.

While GLaM uses more computation during training, thanks to the more efficient software implementation powered by GSPMD and the advantage of TPUv4, it uses less power to train than other models.

Conclusions
Our large-scale sparsely activated language model, GLaM, achieves competitive results on zero-shot and one-shot learning and is a more efficient model than prior monolithic dense counterparts. We also show quantitatively that a high-quality dataset is essential for large language models. We hope that our work will spark more research into compute-efficient language models.

Acknowledgements
We wish to thank Claire Cui, Zhifeng Chen, Yonghui Wu, Quoc Le, Macduff Hughes, Fernando Pereira, Zoubin Ghahramani‎ and Jeff Dean for their support and invaluable input. Special thanks to our collaborators: Yanping Huang, Simon Tong, Yanqi Zhou, Yuanzhong Xu, Dmitry Lepikhin, Orhan Firat, Maxim Krikun, Tao Wang, Noam Shazeer, Barret Zoph, Liam Fedus, Maarten Bosma, Kun Zhang, Emma Wang, David Patterson, Zongwei Zhou, Naveen Kumar, Adams Yu, Laurent Shafey, Jonathan Shen, Ben Lee, Anmol Gulati, David So, Marie Pellat, Kevin Robinson, Kathy Meier-Hellstern‎, Aakanksha Chowdhery, Sharan Narang, Erica Moreira and Eric Ni for helpful discussions and inspirations; and the larger Google Research team. We would also like to thank Tom Small for the animated figure used in this post.

Misc

Pre-processing layers in keras: What they are and how to use them

Post author By
Post date December 9, 2021
No Comments on Pre-processing layers in keras: What they are and how to use them

For keras, the last two releases have brought important new functionality, in terms of both low-level infrastructure and workflow enhancements. This post focuses on an outstanding example of the latter category: a new family of layers designed to help with pre-processing, data-augmentation, and feature-engineering tasks.

Offsites

Pre-processing layers in keras: What they are and how to use them

Post author By
Post date December 9, 2021
No Comments on Pre-processing layers in keras: What they are and how to use them

Misc

NVIDIA Partners with iD Tech on New AI and Machine Learning Course

Post author By
Post date December 9, 2021
No Comments on NVIDIA Partners with iD Tech on New AI and Machine Learning Course

NVIDIA has partnered with iD Tech to create the Artificial Intelligence and Machine Learning certification program, a boot-camp style course for teens.

NVIDIA has partnered with iD Tech, a leading STEM educator for kids and teens, to create a brand new certification program based on the NVIDIA Jetson Nano^TM. The Artificial Intelligence and Machine Learning certification program is a 10-week long, bootcamp-style virtual course for students ages 13-19 who are looking to boost their Python skills and dive into AI.

NVIDIA instructor stands next to student who is working on laptop. — Image 1. Students working with NVIDIA instructors.

Drawing from iD Tech’s in-house Python curriculum and the NVIDIA Jetson AI Fundamentals course, students will learn key concepts in machine learning, computer vision and neural networks. Throughout the program, participants will create projects and build their personal portfolio to demonstrate their mastery of the topics. Upon successful completion of the course, students will submit their work for evaluation and a chance to earn a Jetson AI Specialist certificate from NVIDIA’s Deep Learning Institute.

Image 2. This image shows an example of the NVIDIA DLI Certificate for Jetson AI Specialist Designation. — Image 2. Students who complete the course will be eligible for a “Jetson AI Specialist” certificate.

The NVIDIA Foundation is providing scholarships for 30 female students to attend the course. Plus, for every 10 students who participate, an additional student will attend on full scholarship.

The inaugural class will kick off in February 2022. Learn more about the course syllabus, enrollment and scholarship opportunities.

Misc

Forging New Pathways: Boys & Girls Clubs Teens Take AI From Idea to Application

Post author By
Post date December 9, 2021
No Comments on Forging New Pathways: Boys & Girls Clubs Teens Take AI From Idea to Application

Building AI projects to aid Alzheimer’s patients and monitor pedestrian safety might not be the typical way teens spend their summer. But that’s what a dozen teens with the Boys & Girls Clubs of Hudson County, in densely packed northeastern New Jersey, did as part of the AI Pathways Institute program. They spent three weeks Read article >

The post Forging New Pathways: Boys & Girls Clubs Teens Take AI From Idea to Application appeared first on The Official NVIDIA Blog.

Misc

Insertion Sort Explained–A Data Scientists Algorithm Guide

Post author By
Post date December 9, 2021
No Comments on Insertion Sort Explained–A Data Scientists Algorithm Guide

Learn a step-by-step breakdown of sorting algorithms. A fundamental tools used in data science.

Algorithms are commonplace in the world of data science and machine learning. Algorithms power social media applications, Google search results, banking systems and plenty more. Therefore, it’s paramount that Data Scientists and machine-learning practitioners have an intuition for analyzing, designing, and implementing algorithms.

Efficient algorithms have saved companies millions of dollars and reduced memory and energy consumption when applied to large-scale computational tasks. This article introduces a straightforward algorithm, Insertion Sort.

Although knowing how to implement algorithms is essential, this article also includes details of the insertion algorithm that Data Scientists should consider when selecting for utilization.Therefore, this article mentions factors such as algorithm complexity, performance, analysis, explanation, and utilization.

Why?

It’s important to remember why Data Scientists should study data structures and algorithms before going into explanation and implementation.

Data Science and ML libraries and packages abstract the complexity of commonly used algorithms. Furthermore, algorithms that take 100s of lines to code and some logical deduction are reduced to simple method invocations due to abstraction. This doesn’t relinquish the requirement for Data Scientists to study algorithm development and data structures.

When given a collection of pre-built algorithms to use, determining which algorithm is best for the situation requires understanding the fundamental algorithms in terms of parameters, performances, restrictions, and robustness. Data Scientists can learn all of this information after analyzing and, in some cases, re-implementing algorithms.

The selection of correct problem-specific algorithms and the capacity to troubleshoot algorithms are two of the most significant advantages of algorithm understanding.

K-Means, BIRCH and Mean Shift are all commonly used clustering algorithms, and by no means are Data Scientists possessing the knowledge to implement these algorithms from scratch. Still, there is a necessity that Data Scientists understand the properties of each algorithm and their suitability to specific datasets.

For example, centroid based algorithms are favorable for high-density datasets where clusters can be clearly defined. In contrast, density-based algorithms such as DBSCAN(Density-based spatial clustering of application with Noise) are preferred when dealing with a noisy dataset.

In the context of sorting algorithms, Data Scientists come across data lakes and databases where traversing through elements to identify relationships is more efficient if the containing data is sorted.

Identifying library subroutines suitable for the dataset requires an understanding of various sorting algorithms preferred data structure types. Quicksort algorithms are favorable when working with arrays, but if data is presented as linked-list, then merge sort is more performant, especially in the case of a large dataset. Still, both use the divide and conquer strategy to sort data.

Background

What’s a sorting algorithm?

The Sorting Problem is a well-known programming problem faced by Data Scientists and other software engineers. The primary purpose of the sorting problem is to arrange a set of objects in ascending or descending order. Sorting algorithms are sequential instructions executed to reorder elements within a list efficiently or array into the desired ordering.

What’s the purpose of sorting?

In the data realm, the structured organization of elements within a dataset enables the efficient traversing and quick lookup of specific elements or groups. At a macro level, applications built with efficient algorithms translate to simplicity introduced into our lives, such as navigation systems and search engines.

What’s insertion sort?

Insertion sort algorithm involves the sorted list created based on an iterative comparison of each element in the list with its adjacent element.

An index pointing at the current element indicates the position of the sort. At the beginning of the sort (index=0), the current value is compared to the adjacent value to the left. If the value is greater than the current value, no modifications are made to the list; this is also the case if the adjacent value and the current value are the same numbers.

However, if the adjacent value to the left of the current value is lesser, then the adjacent value position is moved to the left, and only stops moving to the left if the value to the left of it is lesser.

The diagram illustrates the procedures taken in the insertion algorithm on an unsorted list. The list in the diagram below is sorted in ascending order (lowest to highest).

An image depicting the step by step process of the insertion sort algorithm. — *Figure 1: Insertion Sort in Action GIF (This file is licensed under the Creative Commons).*

Algorithm Steps and Implementation (Python and JavaScript)

Steps

To order a list of elements in ascending order, the Insertion Sort algorithm requires the following operations:

Begin with a list of unsorted elements.
Iterate through the list of unsorted elements, from the first item to last.
The current element is compared to the elements in all preceding positions to the left in each step.
If the current element is less than any of the previously listed elements, it is moved one position to the left.

Implementation in Python

Implementation in JavaScript

Performance and Complexity

In the realm of computer science, ‘Big O notation is a strategy for measuring algorithm complexity. We won’t get too technical with Big O notation here. Still, it’s worth noting that computer scientists use this mathematical symbol to quantify algorithms according to their time and space requirements.

The Big O notation is a function that is defined in terms of the input. The letter ‘n’ often represents the size of the input to the function. Simply kept, n represents the number of elements in a list. In different scenarios, practitioners care about the worst-case, best-case, or average complexity of a function.

The worst-case (and average-case) complexity of the insertion sort algorithm is O(n²). Meaning that, in the worst case, the time taken to sort a list is proportional to the square of the number of elements in the list.

The best-case time complexity of insertion sort algorithm is O(n) time complexity. Meaning that the time taken to sort a list is proportional to the number of elements in the list; this is the case when the list is already in the correct order. There’s only one iteration in this case since the inner loop operation is trivial when the list is already in order.

Insertion sort is frequently used to arrange small lists. On the other hand, Insertion sort isn’t the most efficient method for handling large lists with numerous elements. Notably, the insertion sort algorithm is preferred when working with a linked list. And although the algorithm can be applied to data structured in an array, other sorting algorithms such as quicksort.

Summary

One of the simplest sorting methods is insertion sort, which involves building up a sorted list one element at a time. By inserting each unexamined element into the sorted list between elements that are less than it and greater than it. As demonstrated in this article, it’s a simple algorithm to grasp and apply in many languages.

By clearly describing the insertion sort algorithm, accompanied by a step-by-step breakdown of the algorithmic procedures involved. Data Scientists are better equipped to implement the insertion sort algorithm and explore other comparable sorting algorithms such as quicksort and bubble sort, and so on.

Algorithms may be a touchy subject for many Data Scientists. It may be due to the complexity of the topic. The word “algorithm” is sometimes associated with complexity. With the appropriate tools, training, and time, even the most complicated algorithms are simple to understand when you have enough time, information, and resources. Algorithms are fundamental tools used in data science and cannot be ignored.

Misc

Jetson Project of the Month: Automate Your Household Duties with NVIDIA Jetson and DeepStack

Post author By
Post date December 9, 2021
No Comments on Jetson Project of the Month: Automate Your Household Duties with NVIDIA Jetson and DeepStack

The Jetson Project of the Month simplifies home automation projects using a combination of DeepStack and Home Assistant along with the NVIDIA Jetson.

The holidays should be a time for relaxation, and with NVIDIA Jetson technology, you can do just that. The latest Jetson Project of the Month comes from a developer who has created ways to simplify home automation projects, using a combination of DeepStack, Home Assistant, and NVIDIA Jetson.

Robin Cole, a senior data scientist at Satellite Vu with a background in physics, developed this project. He has a passion for problem solving using data and Python. Cole’s main activities on GitHub involve the practical application of machine learning.

As he notes on his GitHub profile: “I have a number of personal projects around training and deploying neural networks on edge devices such as the Raspberry Pi and Jetson Nano.”

In his spare time, he is actively involved in two community projects: Home Assistant and DeepStack AI, and is exploring ways to combine the two technologies. Cole’s Jetson project integrates DeepStack object and face detection and recognition services into the popular home automation platform Home Assistant.

DeepStack runs in a Docker container and exposes various computer vision models through a REST API. DeepStack object detection can identify 80 different kinds of objects, including people, vehicles, and animals. The current list does not include sleighs or reindeer, but you can use a custom object detection model. DeepStack is free to use and it is fully open source. You will need a machine with 8GB RAM or an NVIDIA Jetson to run DeepStack.

Home Assistant is an open-source and extensible platform for home automation that can run on local embedded devices or servers while maintaining local control and privacy. It can help automate tasks such as home energy management, intelligently turning on or off the lights, or sending alerts based on entrances being open. Home Assistant can be integrated with more than 1,000 different devices.

This project uses trained models in the DeepStack REST API to send responses to Home Assistant, which can then perform tasks around the home based on this and other input. DeepStack can run on a Jetson device and it achieves good inference times on its YOLOv5-based trained models, which are run with PyTorch. The DeepStack and Home Assistant mediate their communication through a Go server and a Redis layer.

“I hope this project inspires readers to think about novel applications in and around the home and business that could benefit from AI, and enables rapid prototyping and experimentation,” Cole wrote in a forum post describing his work.

These home automation processes could be applied to tasks like monitoring the Christmas tree lights, issuing alerts when special packages are delivered to the front door, or keeping an eye on the chimney to see if a big guy in a red suit tries to sneak inside the house late at night and eat all of your cookies.

Think of it as a spin on the old holiday classic: You can keep a lookout when you’re sleeping; there’s no need to be awake. Your home automation setup can do the work, which is certainly good for goodness’ sake.

The folks at Everything Smart Home put together demos of each part of this project:

Person and face recognition

Object detection

By combining DeepStack with Home Assistant, Cole has shown developers how to create their own home security or automation systems and the ability to integrate Jetson into larger projects. For those who want to dig deeper, this project opens further possibilities to integrate other NVIDIA technologies such as DeepStream, Metropolis, or even NVIDIA Isaac with Home Assistant.

As Cole points out in his forum post about this project, the developer community is all in and finding smart ways to use this DeepStack-Home Assistant combo, including:

Monitoring activity in a brick factory in Latin America.
Watching for intruding snakes in Thailand.
Monitoring parcel deliveries.
Checking that a motorcycle is locked.
Checking when a chicken lays an egg.
Greeting people when they return home and playing a theme tune.
Counting visitor numbers at a shop.
Checking when a parking spot has become available.

The possibilities are only limited by your imagination. We still think a setup to keep tabs on Santa could be very handy. Share your ideas for how to combine DeepStack and Home Assistant with NVIDIA Jetson in the developer forums.

Cole has also been developing a custom model to detect fires using camera feeds.

Details about Cole’s projects featured in this post are available on GitHub: object detection and face recognition.

You can see a more detailed demo of Cole’s integration of DeepStack with Home Assistant in this presentation he made at IceVision earlier this year.

Learn more about using DeepStack with NVIDIA Jetson.

Misc

Latest GeForce NOW Upgrade Rolling Out With Ubisoft Connect Account Linking and Improved PC Gaming on Mac

Post author By
Post date December 9, 2021
No Comments on Latest GeForce NOW Upgrade Rolling Out With Ubisoft Connect Account Linking and Improved PC Gaming on Mac

Get into the game quicker with the latest GeForce NOW update starting to roll out this GFN Thursday. Learn more about our latest app update — featuring Ubisoft Connect account linking for faster game launches — now rolling out to members, and the six new games joining the GeForce NOW library. The update also improves Read article >

The post Latest GeForce NOW Upgrade Rolling Out With Ubisoft Connect Account Linking and Improved PC Gaming on Mac appeared first on The Official NVIDIA Blog.