Categories
Misc

Using Neural Nano-Optics Researchers Create a Camera the Size of a Salt Grain

A half-millimeter metasurface on a fingertip, shown alongside a magnified view of the metasurface with its optical cylinders arranged in a circular pattern. The groundbreaking technology uses an optical metasurface and machine-learning algorithms to produce high-quality color images with a wide field of view.

A team of researchers from Princeton and the University of Washington created a new camera that captures stunning images and measures in at only a half-millimeter—the size of a coarse grain of salt. 

The new study, published in Nature Communications, outlines the use of optical metasurfaces combined with machine learning to produce high-quality color imagery with a wide field of view. The device could be used across industries ranging from robotics to, most notably, the medical field, where it could help with disease diagnosis.

Optical metasurfaces rely on a new method of light manipulation, using cylindrical posts set on a small, square surface. The posts, which vary in geometry, work like antennas that capture and reshape incoming light waves. The modulated light is recorded by the sensor and then processed by software to produce an image.

Figure 1. Comparison of a conventional camera, which uses a stack of six lenses and a sensor, versus neural nano-optics, which combines a metasurface, a sensor, and a software processing stage to create an image.

Tiny cameras offer vast potential for medical use, from brain imaging to minimally invasive endoscopies. Until now, however, the technology has captured blurry, distorted images with a limited field of view.

The researchers employed neural nano-optics—which combines optical metasurfaces with neural feature-based image reconstruction—to interpret the data and produce higher-quality images.

“To recover images from the measured data, we propose a feature-based deconvolution method that employs a differentiable inverse filter together with neural networks for feature extraction and refinement,” said senior author Felix Heide, an assistant professor of computer science at Princeton.
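The paper's full reconstruction pipeline isn't reproduced in this post, but the general idea can be sketched in a few lines of TensorFlow: a differentiable, Wiener-style inverse filter applied in the Fourier domain, followed by a small convolutional network that refines the result. The regularization weight and network shape below are illustrative placeholders, not the authors' actual design.

```python
# Minimal sketch of feature-based deconvolution: a differentiable inverse
# filter plus a learned refinement CNN. Shapes, the regularization weight,
# and the network architecture are assumptions for illustration only.
import tensorflow as tf

def wiener_deconvolve(measurement, psf, reg=1e-2):
    """Differentiable Wiener-style inverse filtering in the Fourier domain.

    measurement: [H, W] blurred sensor image (float32)
    psf:         [H, W] point spread function of the metasurface optic
    reg:         regularization weight (placeholder value)
    """
    y_f = tf.signal.fft2d(tf.cast(measurement, tf.complex64))
    k_f = tf.signal.fft2d(tf.cast(psf, tf.complex64))
    # Wiener filter: conj(K) / (|K|^2 + reg)
    inv = tf.math.conj(k_f) / tf.cast(tf.abs(k_f) ** 2 + reg, tf.complex64)
    return tf.math.real(tf.signal.ifft2d(y_f * inv))

def refinement_net():
    """Small CNN that removes residual artifacts (hypothetical architecture)."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(1, 3, padding="same"),
    ])
```

Because both stages are differentiable, the metasurface design and the reconstruction network can, in principle, be optimized jointly, which is the premise of the end-to-end approach described in the study.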

The team tested machine-learning algorithms with a simulator, comparing different configurations of the antennas. With 1.6 million cylinders on the optical surface and complex light interactions, the simulations demand massive amounts of memory. 

The algorithm was tested and trained using the cuDNN-accelerated TensorFlow deep learning framework with an NVIDIA P100 GPU. The result is a new system capable of producing images comparable to a modern camera lens, from a device 500,000 times smaller. According to the study, the new camera is also 10 times better at filtering out errors than previous techniques.

Figure 2. A video comparing an image taken by the nano-camera (right) and the previous best (left). Courtesy of Princeton Computational Imaging Lab.

“It’s been a challenge to design and configure these little microstructures to do what you want,” said Ethan Tseng, a computer science Ph.D. student at Princeton and study co-lead, in a press release. “For this specific task of capturing large field of view RGB images, it’s challenging because there are millions of these little microstructures, and it’s not clear how to design them in an optimal way.”

The team is now working to add more computational abilities to the camera and envisions features such as object detection and sensors for medicine or robotics in future iterations. Beyond that, they see a use case where ultra-compact imagers turn surfaces into sensors. 

“We could turn individual surfaces into cameras that have ultra-high resolution. So you wouldn’t need three cameras on the back of your phone anymore, but the whole back of your phone would become one giant camera. We can think of completely different ways to build devices in the future,” said Heide.

The raw capture data and code used to design and evaluate the neural nano-optic is publicly accessible. More information is also available on the Neural Nano-Optics for High-quality Thin Lens Imaging webpage.


Read the published research in Nature Communications. >>
Read the press release. >>

Feature image courtesy of Princeton/Ethan Tseng, et al.

Categories
Offsites

More Efficient In-Context Learning with GLaM

Large language models (e.g., GPT-3) have many significant capabilities, such as performing few-shot learning across a wide array of tasks, including reading comprehension and question answering with very few or no training examples. While these models can perform better by simply using more parameters, training and serving these large models can be very computationally intensive. Is it possible to train and use these models more efficiently?

In pursuit of that question, today we introduce the Generalist Language Model (GLaM), a trillion-weight model that can be trained and served efficiently (in terms of computation and energy use) thanks to sparsity, and which achieves competitive performance on multiple few-shot learning tasks. GLaM’s performance compares favorably to a dense language model, GPT-3 (175B), with significantly improved learning efficiency across 29 public NLP benchmarks in seven categories, spanning language completion, open-domain question answering, and natural language inference tasks.

Dataset
To build GLaM, we began by building a high-quality 1.6 trillion token dataset containing language usage representative of a wide range of downstream use-cases for the model. Web pages constitute the vast quantity of data in this unlabelled corpus, but their quality ranges from professional writing to low-quality comment and forum pages. We then developed a text quality filter that was trained on a collection of text from Wikipedia and books (both of which are generally higher quality sources) to determine the quality of the content for a webpage. Finally, we applied this filter to generate the final subset of webpages and combined this with books and Wikipedia to create the final training dataset.

Model and Architecture
GLaM is a mixture of experts (MoE) model, a type of model that can be thought of as having different submodels (or experts) that are each specialized for different inputs. The experts in each layer are controlled by a gating network that activates experts based on the input data. For each token (generally a word or part of a word), the gating network selects the two most appropriate experts to process the data. The full version of GLaM has 1.2T total parameters across 64 experts per MoE layer with 32 MoE layers in total, but only activates a subnetwork of 97B (8% of 1.2T) parameters per token prediction during inference.

The architecture of GLaM where each input token is dynamically routed to two selected expert networks out of 64 for prediction.

Similar to the GShard MoE Transformer, we replace the single feedforward network (the simplest layer of an artificial neural network, “Feedforward or FFN” in the blue boxes) of every other transformer layer with a MoE layer. This MoE layer has multiple experts, each a feedforward network with identical architecture but different weight parameters. Even though this MoE layer has many more parameters, the experts are sparsely activated, meaning that for a given input token, only two experts are used, giving the model more capacity while limiting computation. During training, each MoE layer’s gating network is trained to use its input to activate the best two experts for each token, which are then used for inference. For a MoE layer of E experts, this essentially provides a collection of E×(E-1) different feedforward network combinations (instead of one as in the classic Transformer architecture), leading to more computational flexibility.

The final learned representation of a token will be the weighted combination of the outputs from the two experts. This allows different experts to activate on different types of inputs. To enable scaling to larger models, each expert within the GLaM architecture can span multiple computational devices. We use the GSPMD compiler backend to solve the challenges in scaling the experts and train several variants (based on expert size and number of experts) of this architecture to understand the scaling effects of sparsely activated language models.
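As a rough illustration of the routing described above (not GLaM's actual implementation, which is sharded across many accelerators), a top-2 gated MoE layer for a single token can be sketched as follows; the dimensions, the single-matmul ReLU "experts", and the softmax gate are simplifications.

```python
# Toy sketch of top-2 expert routing for one token. Dimensions, the
# single-matmul experts, and the softmax gate are illustrative only.
import numpy as np

def moe_layer(x, expert_weights, gate_weights):
    """x: [d_model] token representation.
    expert_weights: list of E weight matrices, each [d_model, d_model].
    gate_weights:   [d_model, E] gating matrix.
    """
    logits = x @ gate_weights                    # [E] gating logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over the E experts
    top2 = np.argsort(probs)[-2:]                # the two most appropriate experts
    gates = probs[top2] / probs[top2].sum()      # renormalized gate values
    # Weighted combination of the two selected experts' outputs
    return sum(g * np.maximum(x @ expert_weights[e], 0.0)
               for g, e in zip(gates, top2))

# Example: d_model=64, E=8 experts, only two of which are evaluated per token.
rng = np.random.default_rng(0)
out = moe_layer(rng.normal(size=64),
                [rng.normal(size=(64, 64)) for _ in range(8)],
                rng.normal(size=(64, 8)))
```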

Evaluation
We use a zero-shot and one-shot setting where the tasks are never seen during training. The benchmarks for evaluation include (1) cloze and completion tasks [1,2,3]; (2) Open-domain question answering [4,5,6]; (3) Winograd-style tasks [7,8]; (4) commonsense reasoning [9,10,11]; (5) in-context reading comprehension [12,13,14,15,16]; (6) the SuperGLUE tasks; and (7) natural language inference [17]. In total, there are eight natural language generation tasks (NLG) where the generated phrases are evaluated against the ground truth targets via Exact Match (EM) accuracy and F1 measure, and 21 language understanding tasks (NLU) where the prediction from several options is chosen via conditional log-likelihood. Some tasks have variants and SuperGLUE consists of multiple tasks. Both EM accuracy and F1 are scaled from 0 to 100 across all our results and averaged for the NLG score below. The NLU score is an average of accuracy and F1 scores.

Results
GLaM reduces to a basic dense Transformer-based language model architecture when each MoE layer only has one expert. In all experiments, we adopt the notation of (base dense model size) / (number of experts per MoE layer) to describe the GLaM model. For example, 1B/64E represents the architecture of a 1B parameter dense model with every other layer replaced by a 64 expert MoE layer. In the following sections, we explore GLaM’s performance and scaling properties, including baseline dense models trained on the same datasets. Compared with the recently announced Megatron-Turing model, GLaM is on-par on the seven respective tasks if using a 5% margin, while using 5x less computation during inference.

Below, we show the 1.2T-parameter sparsely activated model (GLaM) achieved higher results on average and on more tasks than the 175B-parameter dense GPT-3 model while using less computation during inference.

Average score for GLaM and GPT-3 on NLG (left) and NLU (right) tasks (higher is better).

Below we show a summary of the performance on 29 benchmarks compared to the dense model (GPT-3, 175B). GLaM exceeds or is on-par with the performance of the dense model on almost 80% of zero-shot tasks and almost 90% of one-shot tasks.

Evaluation    Higher (>+5%)    On-par (within 5%)    Lower (<-5%)
Zero-shot          13                  11                  5
One-shot           14                  10                  5

Moreover, while the full version of GLaM has 1.2T total parameters, it only activates a subnetwork of 97B parameters (8% of 1.2T) per token during inference.

                        GLaM (64B/64E)    GPT-3 (175B)
Total Parameters            1.162T           0.175T
Activated Parameters        0.097T           0.175T

Scaling Behavior
GLaM has two ways to scale: 1) scale the number of experts per layer, where each expert is hosted within one computation device, or 2) scale the size of each expert to go beyond the limit of a single device. To evaluate the scaling properties, we compare the respective dense model (FFN layers instead of MoE layers) of similar FLOPS per token at inference time.

Average zero-shot and one-shot performance by increasing the size of each expert. The FLOPS per token prediction at inference time increases as the expert size grows.

As shown above, performance across tasks scales with the size of the experts. GLaM sparsely activated models also perform better than dense models for similar FLOPs during inference for generation tasks. For understanding tasks, we observed that they perform similarly at smaller scales, but sparsely activated models outperform at larger scales.

Data Efficiency
Training large language models is computationally intensive, so efficiency improvements are useful to reduce energy consumption.

Below we show the computation costs for the full version of GLaM.

Computation cost in GFLOPS both for inference, per token (left) and for training (right).

These compute costs show that GLaM uses more computation during training since it trains on more tokens, but uses significantly less computation during inference. We show comparisons using different numbers of tokens to train below.

We also evaluated the learning curves of our models compared to the dense baseline.

Average zero-shot and one-shot performance of sparsely-activated and dense models on eight generative tasks as more tokens are processed in training.
Average zero-shot and one-shot performance of sparsely-activated and dense models on 21 understanding tasks as more tokens are processed in training.

The results above show that sparsely activated models need to train with significantly less data than dense models to reach similar zero-shot and one-shot performance, and if the same amount of data is used, sparsely activated models perform significantly better.

Finally, we assessed the energy efficiency of GLaM.

Comparison of power consumption during training.

While GLaM uses more computation during training, thanks to the more efficient software implementation powered by GSPMD and the advantage of TPUv4, it uses less power to train than other models.

Conclusions
Our large-scale sparsely activated language model, GLaM, achieves competitive results on zero-shot and one-shot learning and is a more efficient model than prior monolithic dense counterparts. We also show quantitatively that a high-quality dataset is essential for large language models. We hope that our work will spark more research into compute-efficient language models.

Acknowledgements
We wish to thank Claire Cui, Zhifeng Chen, Yonghui Wu, Quoc Le, Macduff Hughes, Fernando Pereira, Zoubin Ghahramani‎ and Jeff Dean for their support and invaluable input. Special thanks to our collaborators: Yanping Huang, Simon Tong, Yanqi Zhou, Yuanzhong Xu, Dmitry Lepikhin, Orhan Firat, Maxim Krikun, Tao Wang, Noam Shazeer, Barret Zoph, Liam Fedus, Maarten Bosma, Kun Zhang, Emma Wang, David Patterson, Zongwei Zhou, Naveen Kumar, Adams Yu, Laurent Shafey, Jonathan Shen, Ben Lee, Anmol Gulati, David So, Marie Pellat, Kevin Robinson, Kathy Meier-Hellstern‎, Aakanksha Chowdhery, Sharan Narang, Erica Moreira and Eric Ni for helpful discussions and inspirations; and the larger Google Research team. We would also like to thank Tom Small for the animated figure used in this post.

Categories
Misc

Pre-processing layers in keras: What they are and how to use them

For keras, the last two releases have brought important new functionality, in terms of both low-level infrastructure and workflow enhancements. This post focuses on an outstanding example of the latter category: a new family of layers designed to help with pre-processing, data-augmentation, and feature-engineering tasks.

Categories
Misc

NVIDIA Partners with iD Tech on New AI and Machine Learning Course

NVIDIA has partnered with iD Tech to create the Artificial Intelligence and Machine Learning certification program, a boot-camp style course for teens.

NVIDIA has partnered with iD Tech, a leading STEM educator for kids and teens, to create a brand new certification program based on the NVIDIA Jetson Nano™. The Artificial Intelligence and Machine Learning certification program is a 10-week, bootcamp-style virtual course for students ages 13-19 who are looking to boost their Python skills and dive into AI.

Image 1. Students working with NVIDIA instructors.

Drawing from iD Tech’s in-house Python curriculum and the NVIDIA Jetson AI Fundamentals course, students will learn key concepts in machine learning, computer vision and neural networks. Throughout the program, participants will create projects and build their personal portfolio to demonstrate their mastery of the topics. Upon successful completion of the course, students will submit their work for evaluation and a chance to earn a Jetson AI Specialist certificate from NVIDIA’s Deep Learning Institute.

Image 2. Students who complete the course will be eligible for a “Jetson AI Specialist” certificate from the NVIDIA Deep Learning Institute.

The NVIDIA Foundation is providing scholarships for 30 female students to attend the course.  Plus, for every 10 students who participate, an additional student will attend on full scholarship.

The inaugural class will kick off in February 2022. Learn more about the course syllabus, enrollment and scholarship opportunities.

Categories
Misc

Forging New Pathways: Boys & Girls Clubs Teens Take AI From Idea to Application

Building AI projects to aid Alzheimer’s patients and monitor pedestrian safety might not be the typical way teens spend their summer. But that’s what a dozen teens with the Boys & Girls Clubs of Hudson County, in densely packed northeastern New Jersey, did as part of the AI Pathways Institute program. They spent three weeks Read article >

The post Forging New Pathways: Boys & Girls Clubs Teens Take AI From Idea to Application appeared first on The Official NVIDIA Blog.

Categories
Misc

Insertion Sort Explained–A Data Scientists Algorithm Guide

Learn a step-by-step breakdown of sorting algorithms, a fundamental tool used in data science.

Algorithms are commonplace in the world of data science and machine learning. Algorithms power social media applications, Google search results, banking systems and plenty more. Therefore, it’s paramount that Data Scientists and machine-learning practitioners have an intuition for analyzing, designing, and implementing algorithms.

Efficient algorithms have saved companies millions of dollars and reduced memory and energy consumption when applied to large-scale computational tasks. This article introduces a straightforward algorithm, Insertion Sort.

Although knowing how to implement algorithms is essential, this article also covers the details of the insertion sort algorithm that Data Scientists should consider when selecting an algorithm to use. It therefore discusses factors such as algorithm complexity, performance, analysis, explanation, and utilization.

Why?

It’s important to remember why Data Scientists should study data structures and algorithms before going into explanation and implementation.

Data Science and ML libraries and packages abstract away the complexity of commonly used algorithms. Algorithms that would otherwise take hundreds of lines of code and some logical deduction are reduced to simple method invocations thanks to this abstraction. That doesn’t remove the need for Data Scientists to study algorithm development and data structures.

When given a collection of pre-built algorithms to use, determining which algorithm is best for the situation requires understanding the fundamental algorithms in terms of parameters, performances, restrictions, and robustness. Data Scientists can learn all of this information after analyzing and, in some cases, re-implementing algorithms.

The selection of correct problem-specific algorithms and the capacity to troubleshoot algorithms are two of the most significant advantages of algorithm understanding.

K-Means, BIRCH, and Mean Shift are all commonly used clustering algorithms, and Data Scientists are by no means expected to implement them from scratch. Still, Data Scientists need to understand the properties of each algorithm and its suitability to specific datasets.

For example, centroid-based algorithms are favorable for high-density datasets where clusters can be clearly defined. In contrast, density-based algorithms such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise) are preferred when dealing with a noisy dataset.

In the context of sorting algorithms, Data Scientists come across data lakes and databases where traversing the elements to identify relationships is more efficient if the contained data is sorted.

Identifying library subroutines suitable for a dataset requires an understanding of the data structures each sorting algorithm favors. Quicksort is favorable when working with arrays, but if the data is presented as a linked list, merge sort is more performant, especially for large datasets. Still, both use the divide-and-conquer strategy to sort data.

Background

What’s a sorting algorithm?

The sorting problem is a well-known programming problem faced by Data Scientists and other software engineers. Its primary purpose is to arrange a set of objects in ascending or descending order. Sorting algorithms are sequential instructions executed to reorder the elements within a list or array into the desired ordering.

What’s the purpose of sorting?

In the data realm, the structured organization of elements within a dataset enables efficient traversal and quick lookup of specific elements or groups. At a macro level, applications built with efficient algorithms translate into everyday conveniences, such as navigation systems and search engines.

What’s insertion sort?

The insertion sort algorithm builds a sorted list through an iterative comparison of each element in the list with the elements that precede it.

An index pointing at the current element indicates the position of the sort. Starting from the second element (index = 1), the current value is compared to the adjacent value on its left. If that adjacent value is less than or equal to the current value, no modifications are made to the list.

However, if the adjacent value to the left is greater than the current value, it is shifted one position to the right. The current value continues moving left in this way until it meets a value that is less than or equal to it (or reaches the start of the list), at which point it is inserted.

The diagram below illustrates the steps the insertion sort algorithm takes on an unsorted list. The list in the diagram is sorted in ascending order (lowest to highest).

Figure 1: Insertion sort in action (this file is licensed under Creative Commons).

Algorithm Steps and Implementation (Python and JavaScript)

Steps

To order a list of elements in ascending order, the Insertion Sort algorithm requires the following operations:

  1. Begin with a list of unsorted elements.
  2. Iterate through the unsorted elements, from the second item to the last.
  3. At each step, compare the current element with the elements in the preceding positions to its left.
  4. Shift each preceding element that is greater than the current element one position to the right, then insert the current element into the gap that remains.

Implementation in Python
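A minimal, standard Python implementation of the steps above (a sketch rather than the article's original listing):

```python
def insertion_sort(items):
    """Sort a list in ascending order, in place, using insertion sort."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift every element greater than `current` one position to the right
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current  # Insert the current element into the gap
    return items

print(insertion_sort([7, 3, 5, 1, 9, 2]))  # [1, 2, 3, 5, 7, 9]
```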

Implementation in JavaScript

Performance and Complexity

In the realm of computer science, Big O notation is a strategy for measuring algorithm complexity. We won’t get too technical with Big O notation here, but it’s worth noting that computer scientists use this mathematical notation to quantify algorithms according to their time and space requirements.

Big O notation expresses complexity as a function of the input size. The letter ‘n’ often represents the size of the input to the function; simply put, n represents the number of elements in a list. In different scenarios, practitioners care about the worst-case, best-case, or average complexity of a function.

The worst-case (and average-case) complexity of the insertion sort algorithm is O(n²), meaning that, in the worst case, the time taken to sort a list is proportional to the square of the number of elements in the list.

The best-case time complexity of the insertion sort algorithm is O(n), meaning that the time taken is proportional to the number of elements in the list. This is the case when the list is already in the correct order: the outer loop still visits every element, but the inner loop never has to shift anything.
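These two cases are easy to verify empirically. The short check below (an illustrative addition, not from the original article) counts how many element shifts insertion sort performs on an already-sorted list versus a reverse-sorted list of the same size.

```python
def count_shifts(items):
    """Run insertion sort on a copy of `items` and count element shifts."""
    items, shifts = list(items), 0
    for i in range(1, len(items)):
        current, j = items[i], i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            shifts += 1
            j -= 1
        items[j + 1] = current
    return shifts

n = 1000
print(count_shifts(range(n)))         # 0 shifts: best case, O(n)
print(count_shifts(range(n, 0, -1)))  # n*(n-1)/2 = 499500 shifts: worst case, O(n^2)
```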

Insertion sort is frequently used to arrange small lists. It isn’t the most efficient method for handling large lists with numerous elements, however. Notably, the insertion sort algorithm is preferred when working with a linked list. And although the algorithm can be applied to data stored in an array, other sorting algorithms such as quicksort are usually the better choice for large arrays.

Summary

One of the simplest sorting methods is insertion sort, which builds up a sorted list one element at a time by inserting each unexamined element into its correct position among the elements already sorted. As demonstrated in this article, it’s a simple algorithm to grasp and to apply in many languages.

With a clear description of the insertion sort algorithm, accompanied by a step-by-step breakdown of the algorithmic procedures involved, Data Scientists are better equipped to implement insertion sort and to explore other comparable sorting algorithms such as quicksort and bubble sort.

Algorithms may be a touchy subject for many Data Scientists, perhaps because the word “algorithm” is often associated with complexity. Yet with the appropriate tools, training, and time, even the most complicated algorithms become simple to understand. Algorithms are fundamental tools used in data science and cannot be ignored.

Categories
Misc

Jetson Project of the Month: Automate Your Household Duties with NVIDIA Jetson and DeepStack

The Jetson Project of the Month simplifies home automation projects using a combination of DeepStack and Home Assistant along with the NVIDIA Jetson.

The holidays should be a time for relaxation, and with NVIDIA Jetson technology, you can do just that. The latest Jetson Project of the Month comes from a developer who has created ways to simplify home automation projects using a combination of DeepStack, Home Assistant, and NVIDIA Jetson.

Robin Cole, a senior data scientist at Satellite Vu with a background in physics, developed this project. He has a passion for problem solving using data and Python. Cole’s main activities on GitHub involve the practical application of machine learning. 

As he notes on his GitHub profile: “I have a number of personal projects around training and deploying neural networks on edge devices such as the Raspberry Pi and Jetson Nano.”

In his spare time, he is actively involved in two community projects: Home Assistant and DeepStack AI, and is exploring ways to combine the two technologies. Cole’s Jetson project integrates DeepStack object and face detection and recognition services into the popular home automation platform Home Assistant.

DeepStack runs in a Docker container and exposes various computer vision models through a REST API. DeepStack object detection can identify 80 different kinds of objects, including people, vehicles, and animals. The current list does not include sleighs or reindeer, but you can use a custom object detection model. DeepStack is free to use and it is fully open source. You will need a machine with 8GB RAM or an NVIDIA Jetson to run DeepStack.
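For a sense of what that REST API looks like in practice, the snippet below posts an image to a locally running DeepStack object-detection endpoint and prints the detected labels. The host, port, and response fields follow DeepStack's documented pattern, so treat them as assumptions and adjust for your own deployment.

```python
# Minimal sketch of querying a local DeepStack object-detection endpoint.
# Endpoint path, port, and response fields are assumed from DeepStack's
# documented REST pattern; adjust for your own setup.
import requests

with open("front_door.jpg", "rb") as image_file:
    response = requests.post(
        "http://localhost:80/v1/vision/detection",
        files={"image": image_file},
    )

for prediction in response.json().get("predictions", []):
    # Each prediction carries a label, a confidence score, and box coordinates
    print(prediction["label"], round(prediction["confidence"], 2))
```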

Home Assistant is an open-source and extensible platform for home automation that can run on local embedded devices or servers while maintaining local control and privacy. It can help automate tasks such as home energy management, intelligently turning on or off the lights, or sending alerts based on entrances being open. Home Assistant can be integrated with more than 1,000 different devices.

This project uses the trained models behind the DeepStack REST API to send detection results to Home Assistant, which can then perform tasks around the home based on this and other input. DeepStack can run on a Jetson device and achieves good inference times with its YOLOv5-based models, which run with PyTorch. DeepStack and Home Assistant communicate through a Go server and a Redis layer.

“I hope this project inspires readers to think about novel applications in and around the home and business that could benefit from AI, and enables rapid prototyping and experimentation,” Cole wrote in a forum post describing his work.

These home automation processes could be applied to tasks like monitoring the Christmas tree lights, issuing alerts when special packages are delivered to the front door, or keeping an eye on the chimney to see if a big guy in a red suit tries to sneak inside the house late at night and eat all of your cookies.

Think of it as a spin on the old holiday classic: you can keep a lookout while you’re sleeping; there’s no need to be awake. Your home automation setup can do the work, which is certainly good for goodness’ sake.

The folks at Everything Smart Home put together demos of each part of this project:

  • Person and face recognition
  • Object detection

By combining DeepStack with Home Assistant, Cole has shown developers how to create their own home security or automation systems and the ability to integrate Jetson into larger projects. For those who want to dig deeper, this project opens further possibilities to integrate other NVIDIA technologies such as DeepStream, Metropolis, or even NVIDIA Isaac with Home Assistant.

As Cole points out in his forum post about this project, the developer community is all in and finding smart ways to use this DeepStack-Home Assistant combo, including:

  • Monitoring activity in a brick factory in Latin America.
  • Watching for intruding snakes in Thailand.
  • Monitoring parcel deliveries.
  • Checking that a motorcycle is locked.
  • Checking when a chicken lays an egg.
  • Greeting people when they return home and playing a theme tune.
  • Counting visitor numbers at a shop.
  • Checking when a parking spot has become available.

The possibilities are only limited by your imagination. We still think a setup to keep tabs on Santa could be very handy. Share your ideas for how to combine DeepStack and Home Assistant with NVIDIA Jetson in the developer forums.

Cole has also been developing a custom model to detect fires using camera feeds.

Details about Cole’s projects featured in this post are available on GitHub: object detection and face recognition.

You can see a more detailed demo of Cole’s integration of DeepStack with Home Assistant in this presentation he made at IceVision earlier this year.

Learn more about using DeepStack with NVIDIA Jetson.

Categories
Misc

Latest GeForce NOW Upgrade Rolling Out With Ubisoft Connect Account Linking and Improved PC Gaming on Mac

Get into the game quicker with the latest GeForce NOW update starting to roll out this GFN Thursday. Learn more about our latest app update — featuring Ubisoft Connect account linking for faster game launches — now rolling out to members, and the six new games joining the GeForce NOW library. The update also improves Read article >

The post Latest GeForce NOW Upgrade Rolling Out With Ubisoft Connect Account Linking and Improved PC Gaming on Mac appeared first on The Official NVIDIA Blog.

Categories
Misc

Looking for tensorflow developer

Hi guys,

Anyone here able to help with this spec? We’re looking for an app developer familiar with TensorFlow who can make an Android app.

Mobile App Specifications

We are seeking to quickly develop a prototype Android application that separates vocals from pop songs. The required functionality for the app is as follows:

1. The app needs to load an audio file (wav/mp3, etc.) into memory.
2. A segment of audio (~12 secs in duration) will then be passed to a TensorFlow Lite model to separate the audio segment into vocal audio and non-vocal audio.
3. Another segment of audio is then taken, starting 9 seconds after the start of the previous segment, and passed to the TensorFlow Lite model to again separate the segment into vocal audio and non-vocal audio. This process is repeated until the full audio file has been processed.
4. The recovered segments need to be combined to create two complete signals, one for vocals and one for non-vocals. This will involve cross-fading between the audio segments.
5. The two audio signals need to be saved to disk as audio files (wav/mp3, etc.).

Notes:

1. We will supply the TensorFlow Lite model.
2. We can provide Python code which demonstrates how the segmentation of the audio file and the cross-fading should be done (a rough sketch of the idea follows below).
3. As this is a prototype, we are more concerned with functionality than with the design of the UI. The only concern is that it is easy and straightforward to load the audio file and that the saved audio files are easy to find and access.
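For anyone curious about points 2-4, here is a rough, illustrative sketch (not the poster's reference code) of the windowing and cross-fade scheme described: 12-second segments taken every 9 seconds, processed independently, then recombined with a linear cross-fade over the 3-second overlap.

```python
# Illustrative sketch of the segmentation and cross-fading described above.
# Window/hop lengths match the spec; everything else is an assumption.
import numpy as np

def segments(audio, sr, win_s=12.0, hop_s=9.0):
    """Yield ~12 s segments of `audio`, advancing 9 s each time."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    for start in range(0, len(audio), hop):
        yield audio[start:start + win]

def crossfade_join(chunks, sr, win_s=12.0, hop_s=9.0):
    """Overlap-add processed chunks with a linear fade over the 3 s overlap."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    overlap = win - hop
    total = max(i * hop + len(c) for i, c in enumerate(chunks))
    out = np.zeros(total, dtype=np.float32)
    norm = np.zeros(total, dtype=np.float32)
    for i, chunk in enumerate(chunks):
        start = i * hop
        fade = np.ones(len(chunk), dtype=np.float32)
        k = min(overlap, len(chunk))
        if i > 0:                          # fade in over the overlapping region
            fade[:k] = np.linspace(0.0, 1.0, k)
        if i < len(chunks) - 1:            # fade out over the overlapping region
            fade[len(chunk) - k:] = np.linspace(1.0, 0.0, k)
        out[start:start + len(chunk)] += chunk * fade
        norm[start:start + len(chunk)] += fade
    return out / np.maximum(norm, 1e-8)
```

In the real app, each segment would be passed through the supplied TensorFlow Lite model before being recombined; that model-invocation step is omitted here.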

submitted by /u/dublinjammers
[visit reddit] [comments]