Categories
Misc

Straight to ML or TF internships/jobs?

I am finishing up my CS program but have no interest in anything other than machine learning. Do you think getting the developer cert would be enough for applying to internships/jobs? TIA

submitted by /u/slowkevin
[visit reddit] [comments]

Categories
Offsites

The Importance of A/B Testing in Robotics

Disciplines in the natural sciences, social sciences, and medicine all have to grapple with how to evaluate and compare results within the context of the continually changing real world. In contrast, a significant body of machine learning (ML) research uses a different method that relies on the assumption of a fixed world: measure the performance of a baseline model on fixed data sets, then build a new model aimed at improving on the baseline, and evaluate it on the same fixed data by comparing its performance with that of the baseline.

Research into robotics systems and their applications to the real world requires a rethinking of this experiment design. Even in controlled robotic lab environments, it is possible that real-world changes cause the baseline model to perform inconsistently over time, making it unclear whether new models’ performance is an improvement compared to the baseline, or just the result of unintentional, random changes in the experiment setup. As robotics research advances into more complex and challenging real-world scenarios, there is a growing need for both understanding the impact of the ever-changing world on baselines and developing systematic methods to generate informative and clear results.

In this post, we demonstrate how robotics research, even in the relatively controlled environment of a lab, is meaningfully affected by changes in the environment, and discuss how to address this fundamental challenge using random assignment and A/B testing. Although these are classical research methods, they are not generally employed by default in robotics research — yet, they are critical to producing meaningful and measurable scientific results for robotics in real-world scenarios. Additionally, we cover the costs, benefits, and other considerations of using these methods.

The Ever-Changing Real World in Robotics
Even in a robotics lab environment, which is designed to minimize all changes that are not experimental conditions, it is notoriously difficult to set up a perfectly reproducible experiment. Robots get bumped and are subject to wear and tear, lighting changes affect perception, battery charge influences the torque applied to motors — all things that can affect results in ways large and small.

To illustrate this on real robot data, we collected success rate data on one of our simplest setups — moving identical foam dice from one bin to another. For this task, we ran about 33k task trials on two robots over more than five months with the same software and ML model, and took the overall success rate of the last two weeks as baseline. We then measured the historic performance over time in this “very well controlled” environment.

Video of a real robot completing the task: moving identical foam dice from one bin to another.

Given that we did not purposefully change anything during data collection, one would expect the success rate to be statistically similar over time. And yet, this is not what was observed.

The y-axis represents the 95% confidence interval of % change in success rate relative to baseline. If the confidence intervals contain zero, that indicates the success rate is statistically similar to the success rate of baseline. Confidence intervals were computed using Jackknife, with Cochran-Mantel-Haenszel correction to remove operator bias.

Using the sequential data from the plot above, one might conclude that the model run during weeks 13-14 performed best and that the one run during weeks 9-10 performed worst. One might also expect most, if not all, of the confidence intervals above to contain 0, yet only one did. Because no changes were made at any time during these trials, this example effectively demonstrates the impact of unintentional, random real-world changes on even very simple setups. It’s also worth noting that running more trials per experiment wouldn’t remove these differences; instead, it would likely produce a narrower confidence interval, making the impact more obvious.

However, what happens when one uses random assignment to compare results, grouping the data randomly rather than sequentially? To answer this, we randomly assigned the above data to the same number of groups for comparison with the baseline. This is equivalent to performing A/B testing where all groups receive the same treatment.
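For readers who want to try this kind of A/A analysis on their own logs, here is a minimal Python sketch (with simulated trial outcomes and made-up arm counts) that randomly assigns binary success/failure trials to arms and computes a jackknife confidence interval for the percent change in success rate relative to a baseline arm. It omits the Cochran-Mantel-Haenszel operator correction used in our study and, for brevity, jackknifes only over the experiment arm.

import numpy as np

rng = np.random.default_rng(0)

def jackknife_ci_pct_change(experiment, baseline, z=1.96):
    """~95% jackknife CI for the % change in success rate of `experiment` vs `baseline`.
    Both arguments are arrays of 0/1 trial outcomes; only the experiment arm is
    jackknifed here for simplicity."""
    def pct_change(e, b):
        return 100.0 * (e.mean() - b.mean()) / b.mean()

    estimate = pct_change(experiment, baseline)
    n = len(experiment)
    # Leave-one-out estimates over the experiment arm.
    loo = np.array([pct_change(np.delete(experiment, i), baseline) for i in range(n)])
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return estimate, (estimate - z * se, estimate + z * se)

# Simulated log: 0/1 success outcomes with a fixed underlying success rate.
trials = rng.binomial(1, p=0.85, size=33_000)

# A/A test: randomly assign each trial to one of 10 arms; arm 0 is the baseline.
arm_ids = rng.integers(0, 10, size=trials.shape)
baseline = trials[arm_ids == 0]
for arm in range(1, 10):
    est, (lo, hi) = jackknife_ci_pct_change(trials[arm_ids == arm], baseline)
    print(f"arm {arm}: {est:+.2f}% change, 95% CI [{lo:+.2f}%, {hi:+.2f}%]")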

Looking at the chart, we observe that the confidence intervals now include zero, indicating success rates statistically similar to the baseline, as expected.

We performed similar studies with a few other robotics tasks, comparing between sequential and random assignments. They all yielded similar results.

We see that even with no intentional changes, there are statistically significant differences observed for sequential assignment, while random assignment shows the expected result of no statistically significant differences.

Considerations for A/B testing in robotics
While it’s clear based on the above that A/B testing with random assignment is an effective way to control for the unexplainable variance of the real world in robotics, there are some considerations when adopting this approach. Here are several, along with their accompanying pros, cons, and solutions:

  • Absolute vs relative performance: Each experiment needs to be measured against a baseline that is run concurrently. The relative performance metric between baseline and experiment is published with a confidence interval. The absolute performance metric (in baseline or experiment) is less informative, because it depends to an unknown degree on the state of the world when the measurement was taken. However, the statistical differences we’ve measured between the experiment and baseline are sound and robust to reproduction.
  • Data efficiency: With this approach, the baseline always needs to run in parallel with the experimental conditions so they can be compared against each other. Although this may seem wasteful, it is worth the cost when compared against the drawbacks of making an invalid inference against a stale baseline. Furthermore, as the number of random assignment experiments scale up, we can use a single baseline arm with multiple simultaneous experiment arms across independent factors leveraging Google’s overlapping experiment infrastructure. Data efficiency improves with scale.
  • Environmental biases: If there’s any external factor affecting performance overall (lighting, slicker surfaces, etc.), both the baseline and all experiment arms will encounter this factor with similar probability, so its effect will cancel if there’s no relative impact. If there is a correlation between environmental factors and experiment arms, this will show up as differences over time (each environmental factor accumulates in the episodes collected). This can substantially reduce or eliminate the need for effortful environmental resets, and lets us run lifelong experiments and still measure improvements across experimental arms.
  • Human biases: One advantage of random assignment is a reduction in biases introduced by humans. Since human operators cannot know which data sample gets routed to which arm of the experiment, it is harder to have biased experimenters influence any particular outcome.

The Path Forward
The A/B testing experiment framework has been successfully used for a long time in many scientific disciplines to measure performance against changing, unpredictable real-world environments. In this blog post, we show that robotics research can benefit from using this same methodology: it improves the quality and confidence of research results, and avoids the impossible task of perfectly controlling all elements of a fundamentally changing environment. Doing this well requires infrastructure to continuously operate robots and collect data, as well as tools that make the statistical framework easily accessible to researchers.

Acknowledgements
Arnab Bose, Tuna Toksoz, Yuheng Kuang, Anthony Brohan, and Razvan Sudulescu developed the experiment infrastructure and conducted the research. Matthieu Devin suggested the A/A analysis to showcase the differences using existing data. Special thanks to Bill Heavlin, Chris Harris, and Vincent Vanhoucke, who provided invaluable feedback and support for the work.

Categories
Offsites

FRILL: On-Device Speech Representations using TensorFlow-Lite

Representation learning is a machine learning (ML) method that trains a model to identify salient features that can be applied to a variety of downstream tasks, ranging from natural language processing (e.g., BERT and ALBERT) to image analysis and classification (e.g., Inception layers and SimCLR). Last year, we introduced a benchmark for comparing speech representations and a new, generally-useful speech representation model (TRILL). TRILL is based on temporal proximity, and tries to map speech that occurs close together in time to a lower-dimensional embedding that captures temporal proximity in the embedding space. Since its release, the research community has used TRILL on a diverse set of tasks, such as age classification, video thumbnail selection, and language identification. However, despite achieving state-of-the-art performance, TRILL and other neural network-based approaches require more memory and take longer to compute than signal processing operations that deal with simple features, like loudness, average energy, pitch, etc.

In our recent paper “FRILL: A Non-Semantic Speech Embedding for Mobile Devices”, to appear at Interspeech 2021, we create a new model that is 40% the size of TRILL and a feature set that can be computed over 32x faster on a mobile phone, with an average decrease in accuracy of less than 2%. This marks an important step towards fully on-device applications of speech ML models, which will lead to better personalization, improved user experiences, and greater privacy, an important aspect of developing AI responsibly. We release the code to create FRILL on GitHub, and a pre-trained FRILL model on TensorFlow Hub.

FRILL: Smaller, Faster TRILL
The TRILL architecture is based on a modified version of ResNet50, an architecture that is computationally taxing for constrained hardware, like mobile phones or smart home devices. On the other hand, architectures like MobileNetV3 have been designed with hardware-aware AutoML to perform well on mobile devices. To take advantage of this, we leverage knowledge distillation to combine the benefits of MobileNetV3’s performance with TRILL’s representations.

In the distillation process, the smaller model (i.e., the “student”) tries to match the output of the larger model (“teacher”) on the AudioSet dataset. Whereas the original TRILL model learned its weights by optimizing a self-supervised loss that clustered audio segments close in time, the student model learns its weights through a fully-supervised loss that ignores temporal matching and instead tries to match TRILL outputs on the training data. The fully-supervised learning signal is often stronger than self-supervision, and allows us to train more quickly.
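As a rough, self-contained sketch of this setup (not the actual FRILL training code), the snippet below trains a MobileNetV3-based student to regress precomputed teacher embeddings with a mean-squared-error loss. The input shape, embedding size, and data are placeholders.

import numpy as np
import tensorflow as tf

EMBED_DIM = 2048  # hypothetical teacher embedding size

# Hypothetical student: MobileNetV3 trunk + global average pooling + dense bottleneck.
trunk = tf.keras.applications.MobileNetV3Small(
    input_shape=(96, 64, 1), include_top=False, weights=None, alpha=1.0)
student = tf.keras.Sequential([
    trunk,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(EMBED_DIM, name="bottleneck"),
])

# The regression targets are the teacher's embeddings (e.g., TRILL Layer 19),
# precomputed for each training example; random placeholders are used here.
spectrograms = np.random.rand(256, 96, 64, 1).astype("float32")
teacher_embeddings = np.random.rand(256, EMBED_DIM).astype("float32")

student.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
student.fit(spectrograms, teacher_embeddings, batch_size=32, epochs=1)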

Knowledge distillation for non-semantic speech embeddings. The dashed line shows the student model output. The “teacher network” is the TRILL network, where “Layer 19” was the best-performing internal representation. The “Student Hyperparameters” on the left are the options explored in this study, the result of which are 144 distinct models. These models were trained with mean-squared error (MSE) to try to match TRILL’s Layer 19.

Choosing the Best Student Model
We perform distillation with a variety of student models, each trained with a specific combination of architecture choices (explained below). To measure each student model’s latency, we leverage TensorFlow Lite (TFLite), a framework that enables execution of TensorFlow models on edge devices. Each candidate model is first converted into TFLite’s flatbuffer format for 32-bit floating point inference and then sent to the target device (in this case, a Pixel 1) for benchmarking. These measurements help us to accurately assess the latency versus quality tradeoffs across all student models and to minimize the loss of quality in the conversion process.
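The conversion step can be reproduced with the standard TFLite tooling. The sketch below (model, shapes, and file names are placeholders) converts a Keras model to a float32 flatbuffer and measures a rough per-inference latency with the Python interpreter on a workstation; on-device numbers on a Pixel 1 will differ.

import time
import numpy as np
import tensorflow as tf

# Placeholder candidate model standing in for a distilled student network.
candidate = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(96, 64, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2048),
])

# Convert to a 32-bit float TFLite flatbuffer and save it.
converter = tf.lite.TFLiteConverter.from_keras_model(candidate)
with open("candidate.tflite", "wb") as f:
    f.write(converter.convert())

# Rough latency estimate with the Python interpreter.
interpreter = tf.lite.Interpreter(model_path="candidate.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
x = np.random.rand(*inp["shape"]).astype(np.float32)

start = time.perf_counter()
for _ in range(100):
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
print(f"mean latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")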

Architecture Choices and Optimizations
We explored different neural network architectures and features that balance latency and accuracy — models with fewer parameters are usually smaller and faster, but have less representational power and therefore generate less generally-useful representations. We trained 144 different models across a number of hyperparameters, all based on the MobileNetV3 architecture:

  1. MobileNetV3 size and width: MobileNetV3 was released in different sizes for use in different environments. The size refers to which MobileNetV3 architecture we used. The width, sometimes known as alpha, proportionally decreases or increases the number of filters in each layer. A width of 1.0 corresponds to the number of filters in the original paper.
  2. Global average pooling: MobileNetV3 normally produces a set of two-dimensional feature maps. These are flattened, concatenated, and passed to the bottleneck layer. However, this bottleneck is often still too large to be computed quickly. We reduce the size of the bottleneck layer kernel by taking the global average of all ”pixels” in each output feature map. Our intuition is that the discarded temporal information is less important for learning a non-semantic speech representation due to the fact that relevant aspects of the signal are stable across time.
  3. Bottleneck compression: A significant portion of the student model’s weights are located in the bottleneck layer. To reduce the size of this layer, we apply a compression operator based on singular value decomposition (SVD) that learns a low-rank approximation of the bottleneck weight matrix (a small sketch of the low-rank idea follows this list).
  4. Quantization-aware training: Since the bottleneck layer has most of the model weights, we use quantization-aware training (QAT) to gradually reduce the numerical precision of the bottleneck weights during training. QAT allows the model to adjust to the lower numerical precision during training, instead of potentially causing performance degradation by introducing quantization after training finishes.
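To make the low-rank idea in item 3 concrete, here is a small post-hoc sketch: it factors a dense bottleneck’s weight matrix with a truncated SVD and replaces the single layer with two smaller ones. The layer sizes and rank are made up, and FRILL learns its compression during training rather than applying it after the fact.

import numpy as np
import tensorflow as tf

RANK = 64  # hypothetical target rank

# Hypothetical bottleneck: a dense layer mapping 1024 pooled features to 2048 outputs.
bottleneck = tf.keras.layers.Dense(2048)
model = tf.keras.Sequential([tf.keras.layers.InputLayer(input_shape=(1024,)), bottleneck])

w, b = bottleneck.get_weights()                   # w: (1024, 2048), b: (2048,)
u, s, vt = np.linalg.svd(w, full_matrices=False)  # w ~= u @ diag(s) @ vt

# Keep only the top-RANK singular values: two small matrices replace one big one.
w1 = u[:, :RANK] * s[:RANK]                       # (1024, RANK)
w2 = vt[:RANK, :]                                 # (RANK, 2048)

compressed = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1024,)),
    tf.keras.layers.Dense(RANK, use_bias=False, name="low_rank_u"),
    tf.keras.layers.Dense(2048, name="low_rank_v"),
])
compressed.get_layer("low_rank_u").set_weights([w1])
compressed.get_layer("low_rank_v").set_weights([w2, b])

print(f"params: {w.size + b.size:,} -> {w1.size + w2.size + b.size:,}")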

Results
We evaluated each of these models on the Non-Semantic Speech Benchmark (NOSS) and two new tasks — a challenging task to detect whether a speaker is wearing a mask and the human-noise subset of the Environment Sound Classification dataset, which includes labels like “coughing” and “sneezing”. After eliminating models that have strictly better alternatives, we are left with eight “frontier” models on the quality vs. latency curve: models for which no alternative in our batch of 144 was both faster and at least as accurate. We plot the latency vs. quality curve of only these “frontier” models below.
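The frontier selection itself is just a Pareto filter over (latency, quality) pairs; a minimal sketch with made-up numbers is below. A model is kept only if no other model is at least as fast and at least as accurate while being strictly better on one of the two axes.

def pareto_frontier(models):
    """Return the (name, latency, accuracy) tuples not dominated by any other model."""
    frontier = []
    for name, latency, accuracy in models:
        dominated = any(
            other_lat <= latency and other_acc >= accuracy
            and (other_lat < latency or other_acc > accuracy)
            for _, other_lat, other_acc in models
        )
        if not dominated:
            frontier.append((name, latency, accuracy))
    return sorted(frontier, key=lambda m: m[1])

# Hypothetical (name, latency in ms, accuracy) tuples for a few candidate students.
candidates = [("A", 8.5, 0.89), ("B", 9.0, 0.88), ("C", 12.0, 0.91), ("D", 20.0, 0.90)]
print(pareto_frontier(candidates))  # keeps A and C; B and D are dominated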

Embedding quality and latency tradeoff. The x-axis represents the inference latency and the y-axis shows the difference in accuracy from TRILL’s performance, averaged across benchmark datasets.

FRILL is the best performing sub-10ms inference model, with an inference time of 8.5 ms on a Pixel 1 (about 32x faster than TRILL), and is also roughly 40% the size of TRILL. The frontier curve plateaus at about 10ms latency, which means that at low latency, one can achieve much better performance with minimal latency costs, while achieving improved performance at latencies beyond 10ms is more difficult. This supports our choice of experiment hyperparameters. FRILL’s per-task performance is shown in the table below.

                     FRILL    TRILL
Size (MB)             38.5     98.1
Latency (ms)           8.5    275.3
Voxceleb1*            45.5     46.8
Voxforge              78.8     84.5
Speech Commands       81.0     81.7
CREMA-D               71.3     65.9
SAVEE                 63.3     70.0
Masked Speech         68.0     65.8
ESC-50 HS             87.9     86.4
Accuracy on each of the classification tasks (higher is better).
*Results in our study use a small subset of Voxceleb1 filtered according to internal privacy guidelines. Interested readers can run our study on the full dataset using TensorFlow Datasets and our open-source evaluation code.

Finally, we evaluate the relative contribution of each of our hyperparameters. We find that, for our experiments, quantization-aware training, bottleneck compression, and global average pooling most reduced the latency of the resulting models. At the same time, bottleneck compression most reduced the quality of the resulting model, while pooling reduced model performance the least. The architecture width parameter was an important factor in reducing the model size, with minimal performance degradation.

Linear regression weight magnitudes for predicting model quality, latency, and size. The weights indicate the expected impact of changing the input hyperparameter. A higher weight magnitude indicates a greater expected impact.
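This kind of attribution can be approximated by regressing each metric on the standardized hyperparameter settings and comparing coefficient magnitudes. The sketch below uses made-up experiment records and hypothetical feature names purely to illustrate the procedure; the same regression can then be repeated with model quality and size as the targets.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical records: one row per trained student model.
# Columns: width, global_avg_pool (0/1), bottleneck_compression (0/1), qat (0/1).
hparams = np.column_stack([
    rng.choice([0.5, 0.75, 1.0], size=144),
    rng.integers(0, 2, size=(144, 3)),
])
latency_ms = rng.uniform(5, 40, size=144)  # placeholder measurements

# Standardize inputs so coefficient magnitudes are comparable across hyperparameters.
X = StandardScaler().fit_transform(hparams)
reg = LinearRegression().fit(X, latency_ms)
for name, coef in zip(["width", "avg_pool", "compression", "qat"], reg.coef_):
    print(f"{name:12s} weight magnitude: {abs(coef):.3f}")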

Our work is an important step in bringing the full benefits of speech machine learning research to mobile devices. We also provide our public model, corresponding model card, and evaluation code to help the research community responsibly develop even more applications for on-device speech representation research.

Acknowledgements
We’d like to thank our paper co-authors: Jacob Peplinski and Shwetak Patel. We’d like to thank Aren Jansen for his technical support on this project, Françoise Beaufays and Tulsee Doshi for help open-sourcing the model, and Google Research, Tokyo for logistical support.

Categories
Misc

Startup’s AI Intersects With U.S. Traffic Lights for Better Flow, Safety

Thousands of U.S. traffic lights may soon be getting the green light on AI for safer streets. That’s because startup CVEDIA has designed better and faster vehicle and pedestrian detections to improve traffic flow and pedestrian safety for Cubic Transportation Systems. These new AI capabilities will be integrated into Cubic’s GRIDSMART Solution, a single-camera intersection Read article >

The post Startup’s AI Intersects With U.S. Traffic Lights for Better Flow, Safety appeared first on The Official NVIDIA Blog.

Categories
Misc

Accelerating Model Development and AI Training with Synthetic Data, SKY ENGINE AI platform, and NVIDIA Transfer Learning Toolkit


In AI and computer vision, data acquisition is costly and time-consuming, and human-based labeling can be error-prone. Model accuracy also suffers from insufficient and poorly balanced data, and improving a deep learning model is a prolonged process that usually requires reacquiring data in the real world.

Collecting and preparing data and developing accurate, reliable AI-based software solutions is an extremely laborious process, and the required investment can offset the expected benefits of deploying the system.

One way to bridge the data gap and accelerate model training is by using synthetic data instead of real data for training. SKY ENGINE provides an AI platform to move deep learning to virtual reality. It is possible to generate synthetic data using simulations where the synthetic images come with the annotation that can be used directly in training AI models.

Synthetic data can now be directly exported to run on the NVIDIA Transfer Learning Toolkit (TLT), an AI training toolkit that simplifies training by abstracting away the AI/DL framework complexity. This enables you to build production-quality models faster without needing any AI expertise. With the SKY ENGINE AI platform and TLT, you can quickly iterate and build AI.

In this post, you learn how you can harness the power of synthetic data by taking preannotated synthetic data from SKY ENGINE and training a model on it with TLT. I demonstrate a simple inspection use case: identifying antennas on a telco tower using segmentation.

About the SKY ENGINE AI approach

SKY ENGINE introduces a full-stack AI platform for deep learning in virtual reality, which is the next-generation active learning AI system for image and video analysis applications. The SKY ENGINE AI platform can generate data using a proprietary, dedicated simulation system where images come already annotated and ready for deep learning.

The output data stream can include any of the following:

  • Rendered images or other simulated sensor data in selected modalities
  • Object bounding boxes
  • 3D bounding boxes
  • Semantic masks
  • 2D or 3D skeletons
  • Depth maps
  • Normal vector maps

SKY ENGINE AI also includes advanced domain adaptation algorithms that can understand the characteristics of real data examples. They assure the high-quality performance of any trained AI model during the inference.

Figure 1. SKY ENGINE AI platform user interface, showing the Code Editor with the Sky Renderer configuration, a Render Layers preview (Beauty Pass), Node Settings, and the Objects Tree.

The SKY ENGINE simulation system enables physics-driven sensor simulations (cameras, thermal vision, IR, lidars, radars, and more) and sensor data fusion. It is tightly coupled with the deep learning pipeline so that data generation and training evolve together. During training, SKY ENGINE AI can spot ambiguous situations that deteriorate the accuracy of the AI model and then generate more imagery reflecting those problematic situations, so that deep learning accuracy improves quickly. SKY ENGINE AI learns more with every experiment performed.

SKY ENGINE AI delivers a collection of deep neural networks that are fully implemented, tested, and optimized. The provided models cover popular computer vision tasks like object detection and semantic segmentation, as well as more sophisticated topologies designed and implemented for 3D position and pose estimation, 3D geometry reasoning, and representation learning.

SKY ENGINE AI does not require sophisticated rendering or imaging knowledge, so the entry barrier is very low. It has a Python API with a large number of helpers to quickly build and configure the environment.

Neural network optimization

The SKY ENGINE AI platform can generate the datasets and enable the training of deep learning models that can use input data originating from any source. The input stream for AI models training in NVIDIA TLT and AI-driven inference can effectively include low-quality images obtained using smartphones, data from CCTV cameras, or cameras mounted on drones.

You can deploy analytical modules for telecommunication network performance optimization on the cloud, including data storage and multi-GPU scaling. The majority of software projects driven by machine learning in this space are unable to reach the final stage of solution deployment. This could be because of the high dependence of machine learning capabilities on the quality of the input data. The development of AI models with deep training on synthetic data, offered by SKY ENGINE, is a solution with predictable project development and guaranteed deployment in several industrial business processes.

Telecommunication equipment detection and classification

One of the common computer vision tasks is the localization and classification of the equipment of interest. In this post, I present the process of neural network optimization for bounding box localization of antenna instances on a telecommunication tower using the NVIDIA TLT environment with MaskRCNN. You use the synthetic data from SKY ENGINE AI to train the MaskRCNN model. The high-level workflow is as follows:

  1. Generate synthetic data with annotations.
  2. Convert the data format to COCO as required by NVIDIA TLT MaskRCNN model.
  3. Configure the NGC environment and data preprocessing.
  4. Train and evaluate the MaskRCNN model on synthetic data.
  5. Perform inference using the trained AI model on synthetic and real telco towers.

To follow along, see the SKY ENGINE AI Jupyter notebook on GitHub.

Given the real samples of a telco tower, I used the SE Rendering Engine to create an annotated synthetic dataset.

To launch the automatic generation of labeled data using SKY ENGINE AI and to prepare the data source object, you must define basic tools such as an empty renderer context, as well as the paths where the assets for the synthetic scene are located.

In this rendering scenario, I randomized the following:

  •   The number of antennas on a given telecommunication tower
  •   The direction of the light
  •   The positions of the camera
  •   The camera’s horizontal field of view
  •   A background map

There can be many projects in which the samples returned by SKY ENGINE are not shuffled enough. One example would be when your rendering process follows the camera trajectory. For this reason, I recommend extra shuffling of the data before dividing it into train and test sets.
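A minimal way to do this extra shuffling (shown here with scikit-learn and hypothetical file names, independent of any SKY ENGINE API) is:

from sklearn.model_selection import train_test_split

# Hypothetical parallel lists of rendered image paths and their annotation files.
samples = [f"render_{i:05d}.png" for i in range(1000)]
annotations = [f"render_{i:05d}.json" for i in range(1000)]

# shuffle=True breaks any ordering left over from the camera trajectory.
train_x, test_x, train_y, test_y = train_test_split(
    samples, annotations, test_size=0.2, shuffle=True, random_state=42)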

After generating the images, convert them to COCO format using the data export module of SKY ENGINE. This is required by the NVIDIA TLT framework. After you prepare the configuration file according to the documentation, you can run the training for the TLT pretrained Mask RCNN model with the TensorFlow backend:

!tlt mask_rcnn train -e $SPECS_DIR/maskrcnn_train_telco_resnet50.txt \
                     -d $USER_EXPERIMENT_DIR/experiment_telco_anchors \
                     -k $KEY \
                     --gpus 1

As a final step, run a trained deep learning model for inference on real data to see if the model is accurately performing tasks of interest.

!tlt mask_rcnn inference -i $DATA_DIR/valid_images \
                         -o $USER_EXPERIMENT_DIR/se_telco_maskrcnn_inference_synth \
                         -e $SPECS_DIR/maskrcnn_train_telco_resnet50.txt \
                         -m $USER_EXPERIMENT_DIR/experiment_telco_anchors/model.step-20000.tlt \
                         -l $SPECS_DIR/telco_labels.txt \
                         -t 0.5 \
                         -b 1 \
                         -k $KEY \
                         --include_mask

Figure 3 shows some results of telecommunication antenna detection.

Summary

In this post, I demonstrated how you can reduce your data collection and annotation effort by using synthetic data from SKY ENGINE and training and optimizing your model with NVIDIA TLT. I presented a single SKY ENGINE AI use case for the telecommunications industry. However, the platform unlocks a universe of further potential applications, delivering several advanced functionalities:

  • Automated dataset balancing (active learning)
  • Domain adaptation
  • Pretrained deep learning models for 3D reasoning
  • Simulations of sensors and training of deep learning models for sensor fusion

For more information, see the SKY ENGINE AI solution on GitHub and the accompanying videos of other computer vision use cases developed on the SKY ENGINE AI platform.

Categories
Misc

Preparing Models for Object Detection with Real and Synthetic Data and the NVIDIA Transfer Learning Toolkit


The long, cumbersome slog of data procurement has been slowing down innovation in AI, especially in computer vision, which relies on labeled images and video for training. But now you can jumpstart your machine learning process by quickly generating synthetic data using AI.Reverie.

With the AI.Reverie synthetic data platform, you can create the exact training data that you need in a fraction of the time it would take to find and label the right real photography. In AI.Reverie’s photorealistic 3D environments, you can generate data for all possible scenarios, including hard to reach places, unusual environmental conditions, and rare or unique events.

Training data generation includes labels. Choose the needed types, such as 2D or 3D bounding boxes, depth masks, and so on. After you test your model, you can return to the platform to quickly generate additional data to improve accuracy. Test and repeat in quick, iterative cycles.

We wanted to test performance of AI.Reverie synthetic data in NVIDIA Transfer Learning Toolkit 3.0. Originally, we set out to replicate the results in the research paper RarePlanes: Synthetic Data Takes Flight, which used synthetic imagery to create object detection models. We discovered new tools in TLT that made it possible to create more lightweight models that were as accurate as, but much faster than, those featured in the original paper.

In this post, we show you how we used TLT quantization-aware training and model pruning to accomplish this, and how to replicate the results yourself. We show you how to create an airplane detector, but you should be able to fine-tune the model for various satellite detection scenarios of your own.

Figure 1. A synthetic image featuring annotations that denote aircraft type, wing shape, and other distinguishing features.

Access the satellite detection model

To replicate these results, you can clone the GitHub repository and follow along with the included Jupyter notebook.

Clone the following repo:

git clone git@github.com:aireveries/rareplanes-tlt.git ~/Code/rareplanes-tlt 

Create a conda environment:

conda env create -f env.yaml

Activate the environment:

source activate rareplanes-tlt 

Start Jupyter:

jupyter notebook 

Learning objectives

  • Generate synthetic data using the AI.Reverie platform and use it with NVIDIA TLT.
  • Train highly accurate models using synthetic data.
  • Optimize a model for inference using the TLT.

Prerequisites

We tested the code with Python 3.8.8, using Anaconda 4.9.2 to manage dependencies and the virtual environment. The code may work with different versions of Python and other virtual environment solutions, but we haven’t tested those configurations. We used Ubuntu 18.04.5 LTS and NVIDIA driver 460.32.03 and CUDA Version 11.2. TLT requires driver 455.xx or later.

  • Set up the NVIDIA Container Toolkit / nvidia-docker2. For more information, see the NVIDIA Container Toolkit Installation Guide.
  •  Set up NGC to be able to download NVIDIA Docker containers. Follow steps 4 and 5 in the TLT User Guide. For more information about the NGC CLI tool, see CLI Install.
  • Have at least 250 GB of hard disk space available to store the dataset and model weights.

Downloading the datasets

For more information about the contents of the RarePlanes dataset, see RarePlanes Public User Guide.

For this tutorial, you need only download a subset of the data. The following code example is meant to be executed from within the Jupyter notebook. First, create the folders:

!mkdir -p data/real/tarballs/{train,test}
!mkdir -p data/synthetic

Now use this function to download the datasets from Amazon S3, extract them, and verify:

from pathlib import Path

def download(s3_path, out_folder, out_file_count):
    # Download a tarball from S3, extract it, and skip work that was already done.
    rel_file_path = Path('data') / Path(s3_path.replace('s3://rareplanes-public/', ''))
    rel_folder = rel_file_path.parent / out_folder
    num_files = !ls $rel_folder | wc -l
    try:
        if int(num_files[0]) == out_file_count:
            print(f'{s3_path} already downloaded and extracted')
        else:
            raise Exception
    except:
        if not rel_file_path.exists():
            print('Starting download')
            !aws s3 cp $s3_path $rel_file_path
        else:
            print(f'{s3_path} already downloaded')
        print('Extracting...')
        !cd {rel_folder.parent}; pv {rel_file_path.name} | tar xz
        print('Removing compressed file.')
        !rm $rel_file_path

Then download the dataset:

download('s3://rareplanes-public/real/tarballs/metadata_annotations.tar.gz',
         'metadata_annotations', 9)
download('s3://rareplanes-public/real/tarballs/train/RarePlanes_train_PS-RGB_tiled.tar.gz',
         'PS-RGB_tiled', 11630)
download('s3://rareplanes-public/real/tarballs/test/RarePlanes_test_PS-RGB_tiled.tar.gz',
         'PS-RGB_tiled', 5420)
!aws s3 cp --recursive s3://rareplanes-public/synthetic/ data/synthetic

Converting from COCO to KITTI format

TLT uses the KITTI format for object detection model training. RarePlanes is in the COCO format, so you must run a conversion script from within the Jupyter notebook. This converts the real train/test and synthetic train/test datasets.

%run convert_coco_to_kitti.py 

There should now be a folder for each dataset split inside of data/kitti that contains the KITTI formatted annotation text files and symlinks to the original images.
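For reference, each KITTI label file contains one object per line; for 2D detection training, only the class name and the four bounding-box columns matter, and the 3D fields can be zero-filled. A hypothetical annotation line for an aircraft might look like this:

# One hypothetical KITTI-format label line for an 'aircraft' bounding box.
# Columns: type, truncated, occluded, alpha, bbox (x1, y1, x2, y2),
#          3D dimensions (h, w, l), 3D location (x, y, z), rotation_y.
kitti_line = "aircraft 0.00 0 0.00 142.00 185.00 321.00 298.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(kitti_line.split())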

Setting up TLT mounts

The notebook has a script to generate a ~/.tlt_mounts.json file. For more information about the various settings, see Running the launcher.

{
     "Mounts": [
         {
             "source": "/home/patrick.rodriguez/Code/rareplanes-tlt",
             "destination": "/workspace/tlt-experiments"
         }
     ],
     "Envs": [
         {
             "variable": "CUDA_VISIBLE_DEVICES",
             "value": "0"
         }
     ],
     "DockerOptions": {
         "shm_size": "16G",
         "ulimits": {
             "memlock": -1,
             "stack": 67108864
         },
         "user": "1001:1001"
     }
 } 

Processing datasets into TFRecords

You must turn the KITTI labels into the TFRecord format used by TLT. The convert_split function in the notebook helps you bulk convert all the datasets:

def convert_split(name):
    !tlt detectnet_v2 dataset_convert --gpu_index 0 \
        -d /workspace/tlt-experiments/specs/detectnet_v2_tfrecords_{name}.txt \
        -o /workspace/tlt-experiments/data/tfrecords/{name}/{name}

You can then run the conversions:

convert_split('kitti_real_train')
convert_split('kitti_real_test')
convert_split('kitti_synthetic_train')
convert_split('kitti_synthetic_test')

Download the ResNet18 convolutional backbone

Using your NGC account and command-line tool, you can now download the model:


!ngc registry model download-version nvidia/tlt_pretrained_detectnet_v2:resnet18 

The model is now located at the following path:

./tlt_pretrained_detectnet_v2_vresnet18/resnet18.hdf5

Run a benchmark experiment using real data

The following command starts training and logs results to a file that you can tail:

!tlt detectnet_v2 train --key tlt --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti_real.txt \
    -r /workspace/tlt-experiments/detectnet_v2_outputs/resnet18_real_amp16 \
    -n resnet18_real_amp16 \
    --use_amp > out_resnet18_real_amp16.log

Follow along with the following command:

tail -f ./out_resnet18_real_amp16.log 

After training is complete, you can use the functions defined in the notebook to get relevant statistics on your model:

get_model_param_counts('./out_resnet18_real_amp16.log')
best_epoch = get_best_epoch('./out_resnet18_real_amp16.log')
best_epoch

You get something like the following output:

 Total params: 11,197,893
 Trainable params: 11,188,165
 Non-trainable params: 9,728
 Best epoch and map50 metric: (79, 94.2296) 

To reevaluate your trained model on your test set or other dataset, run the following:

!tlt detectnet_v2 evaluate --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_evaluate_real.txt \
    -m /workspace/tlt-experiments/{best_checkpoint} \
    -k tlt

The output should look something like this:

 Validation cost: 0.001133
 Mean average_precision (in %): 94.2563
  
 class name      average precision (in %)
 ------------  --------------------------
 aircraft                         94.2563
  
 Median Inference Time: 0.003877
 2021-04-06 05:47:00,323 [INFO] __main__: Evaluation complete.
 Time taken to run __main__:main: 0:00:27.031500.
 2021-04-06 05:47:02,466 [INFO] tlt.components.docker_handler.docker_handler: Stopping container. 

Running an experiment with synthetic data

!tlt detectnet_v2 train --key tlt --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti_synth.txt \
    -r /workspace/tlt-experiments/detectnet_v2_outputs/resnet18_synth_amp16 \
    -n resnet18_synth_amp16 \
    --use_amp > out_resnet18_synth_amp16.log

You can see the results for each epoch by running: !cat out_resnet18_synth_amp16.log | grep -i aircraft

Example output:

 aircraft                         58.1444
 aircraft                         65.1423
 aircraft                         64.3203
 aircraft                         68.1934
 aircraft                         71.5754
 aircraft                         68.5568 

Fine-tuning the synthetic-trained model with real data

Now, fine-tune your best-performing synthetic-data-trained model with 10% of the real data. To do so, you must first create the 10% split.

 %run ./create_train_split.py
 convert_split('kitti_real_train_10') 

You then use the following code to replace the checkpoint placeholder in your template spec with the best-performing model from the synthetic-only training.

with open('./specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10.txt', 'r') as f_in:
    with open('./specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_replaced.txt', 'w') as f_out:
        out = f_in.read().replace('REPLACE', best_checkpoint)
        f_out.write(out)

You can now begin a TLT training. Start your fine-tuning with the best-performing epoch of the model trained on synthetic data alone, in the previous section.

!tlt detectnet_v2 train --key tlt --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_replaced.txt \
    -r /workspace/tlt-experiments/detectnet_v2_outputs/resnet18_synth_finetune_10_amp16 \
    -n resnet18_synth_finetune_10_amp16 \
    --use_amp > out_resnet18_synth_finetune_10_amp16.log

After training has completed, you should see a best epoch of between 91-93% mAP50, which gets you close to the real-only model performance with only 10% of the real data.

In the notebook, there’s a command to evaluate the best performing model checkpoint on the test set:

!tlt detectnet_v2 evaluate --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_evaluate_real.txt \
    -m /workspace/tlt-experiments/{best_checkpoint} \
    -k tlt

You should see something like the following output:

2021-04-06 18:05:28,342 [INFO] iva.detectnet_v2.evaluation.evaluation: step 330 / 339, 0.05s/step
 Matching predictions to ground truth, class 1/1.: 100%|█| 14719/14719 [00:00



Figure 2. Results for three training regimes (real data only, synthetic only, and synthetic + 10% real): training on synthetic + 10% real data nearly matches the results of training on 100% of the real data.

Data enhancement refers to fine-tuning a model that was trained on AI.Reverie’s synthetic data with just 10% of the original, real dataset. As you can see, this technique produces a model as accurate as one trained on real data alone. That represents roughly 90% cost savings on real, labeled data and saves you from having to endure a long hand-labeling and QA process.

Pruning the model

Having trained a well-performing model, you can now decrease the number of weights to cut down on file size and inference time. TLT includes an easy-to-use pruning tool.

The one argument to play with is -pth, which sets the threshold for neurons to prune. The higher you set this, the more parameters are pruned, but after a certain point your accuracy metric may drop too low. We found that a value of 0.5 worked for these experiments, but you may find different results on other datasets.

!mkdir -p detectnet_v2_outputs/pruned

!tlt detectnet_v2 prune \
    -m /workspace/tlt-experiments/{best_checkpoint} \
    -o /workspace/tlt-experiments/detectnet_v2_outputs/pruned/pruned-model.tlt \
    -eq union \
    -pth 0.5 \
    -k tlt

You can now evaluate the pruned model:

!tlt detectnet_v2 evaluate --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_evaluate_real.txt \
    -m /workspace/tlt-experiments/detectnet_v2_outputs/pruned/pruned-model.tlt \
    -k tlt > out_pruned.txt

Now you can see how many parameters remain:

get_model_param_counts('./out_pruned.txt') 

You should see something like the following outputs:

 Total params: 3,372,973
 Trainable params: 3,366,573
 Non-trainable params: 6,400 

This is 70% smaller than the original model, which had 11.2 million parameters! Of course, you’ve lost performance by dropping so many parameters, which you can verify:

 !cat out_pruned.txt | grep -i aircraft
  
 aircraft                         68.8865 

Luckily, you can recover almost all the performance by retraining the pruned model.

Retraining the models

As before, there is a template spec to run this experiment that only requires you to fill in the location of the pruned model:

with open('./specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_pruned_retrain.txt', 'r') as f_in:
    with open('./specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_pruned_retrain_replaced.txt', 'w') as f_out:
        out = f_in.read().replace('REPLACE', 'detectnet_v2_outputs/pruned/pruned-model.tlt')
        f_out.write(out)

You can now retrain the pruned model:

!tlt detectnet_v2 train --key tlt --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_pruned_retrain_replaced.txt \
    -r /workspace/tlt-experiments/detectnet_v2_outputs/resnet18_synth_finetune_10_pruned_retrain_amp16 \
    -n resnet18_synth_finetune_10_pruned_retrain_amp16 \
    --use_amp > out_resnet18_synth_finetune_10_pruned_retrain_amp16.log

On a run of this experiment, the best performing epoch achieved 91.925 mAP50, which is about the same as the original nonpruned experiment.

2021-04-06 19:33:39,360 [INFO] iva.detectnet_v2.evaluation.evaluation: step 330 / 339, 0.05s/step
 Matching predictions to ground truth, class 1/1.: 100%|█| 17403/17403 [00:01



Quantizing the models

The final step in this process is quantizing the pruned model so that you can achieve much higher levels of inference speed with TensorRT. We have a quantization-aware training (QAT) spec template available:

with open('./specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_pruned_retrain_qat.txt', 'r') as f_in:
    with open('./specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_pruned_retrain_qat_replaced.txt', 'w') as f_out:
        out = f_in.read().replace('REPLACE', 'detectnet_v2_outputs/pruned/pruned-model.tlt')
        f_out.write(out)

Run the QAT training:

!tlt detectnet_v2 train --key tlt --gpu_index 0 \
    -e /workspace/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_pruned_retrain_qat_replaced.txt \
    -r /workspace/tlt-experiments/detectnet_v2_outputs/resnet18_synth_finetune_10_pruned_retrain_qat_amp16 \
    -n resnet18_synth_finetune_10_pruned_retrain_qat_amp16 \
    --use_amp > out_resnet18_synth_finetune_10_pruned_retrain_qat_amp16.log

Use the TLT export tool to export to INT8 quantized TensorRT format:

!tlt detectnet_v2 export \
    -m /workspace/tlt-experiments/{best_checkpoint} \
    -o /workspace/tlt-experiments/detectnet_v2_outputs/qat/resnet18_detector_qat.etlt \
    -k tlt \
    --data_type int8 \
    --batch_size 64 \
    --max_batch_size 64 \
    --engine_file /workspace/tlt-experiments/detectnet_v2_outputs/qat/resnet18_detector_qat.trt.int8 \
    --cal_cache_file /workspace/tlt-experiments/detectnet_v2_outputs/qat/calibration_qat.bin \
    --verbose

At this point, you can now evaluate your quantized model using TensorRT:

!tlt detectnet_v2 evaluate -e /workspace/tlt-experiments/specs/detectnet_v2_train_resnet18_kitti_synth_finetune_10_pruned_retrain_qat_replaced.txt \
                           -m /workspace/tlt-experiments/detectnet_v2_outputs/qat/resnet18_detector_qat.trt.int8 \
                           -f tensorrt

Looking at the output:

2021-04-06 23:08:28,471 [INFO] iva.detectnet_v2.evaluation.tensorrt_evaluator: step 330 / 339, 0.33s/step
 Matching predictions to ground truth, class 1/1.: 100%|█| 21973/21973 [00:01



Conclusion

We were impressed by these results. AI.Reverie’s synthetic data platform, with just 10% of the real dataset, enabled us to achieve the same performance as we did when training on the full real dataset. That represents a cost savings of roughly 90%, not to mention the time saved on procurement. It now takes days, not months, to generate the needed synthetic data.

TLT also produced a 25.2x reduction in parameter count, a 33.6x reduction in file size, and a 174.7x increase in performance (QPS), while retaining 95% of the original performance. TLT’s capabilities were particularly valuable for pruning and quantizing.

Go to AI.Reverie, download the synthetic training data for your project, and start training with TLT.

Categories
Misc

Tutorial on how to implement Hand Tracking at 30 FPS on CPU in 5 Minutes using OpenCV, Python and MediaPipe

https://youtu.be/pMXCZL8w-5Q

submitted by /u/AugmentedStartups
[visit reddit] [comments]

Categories
Misc

Tensorflow Developer Certification Exam Preparation

I am planning to give the Tensorflow Developer Certification Exam.

I have gone through a lot of resources online on how other candidates have successfully cleared this exam.

I have already gone through the TensorFlow Developer Certification Handbook (candidate handbook and environment setup) which outlines the different topics that will be covered in this exam.

I have created a learning path for myself and planning to go through the following resources:

-> Coursera Tensorflow in Practice Specialization

-> YouTube Playlists: Machine Learning Foundations by Laurence Moroney, Coding TensorFlow, MIT Introduction to Deep Learning, CNN, Sequence Models by Andrew Ng

-> Pycharm Tutorial Series and Environment set up guidelines

-> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Ch. 10 to Ch. 16)

Apart from the resources I have mentioned, do you recommend any other valuable source of material that I should go through or add to my current learning path?

submitted by /u/runtimeterror21
[visit reddit] [comments]

Categories
Misc

GFN Thursday Highlights Legendary Moments From the New Season of Apex Legends

GFN Thursday is our weekly celebration of games streaming from GeForce NOW. This week, we’re kicking off Legends of GeForce NOW, a special event that challenges gamers to show off the best Apex Legends: Legacy moments using one of the features that makes GeForce NOW unique — NVIDIA Highlights. Let No Victory Go Unrecorded That Read article >

The post GFN Thursday Highlights Legendary Moments From the New Season of Apex Legends appeared first on The Official NVIDIA Blog.

Categories
Misc

Am I predicting wrong?[Keras][CNN]

I am trying to implement my first CNN with Keras using the https://www.kaggle.com/gpiosenka/100-bird-species dataset. Training goes fine, reaching a val_acc of 0.75, but when I try to predict on new images, the results look random.

import os

import numpy as np
from tensorflow import keras, random
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

img_size = 80
batch_size = 64
root = "../input/100-bird-species"

image_generator_train = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
train_data_generated = image_generator_train.flow_from_directory(
    directory=os.path.join(root, "train"),
    target_size=(img_size, img_size),
    class_mode='categorical',
    batch_size=batch_size)

image_generator_valid = ImageDataGenerator(rescale=1./255)
valid_data_generated = image_generator_valid.flow_from_directory(
    directory=os.path.join(root, "valid"),
    target_size=(img_size, img_size),
    class_mode='categorical',
    batch_size=batch_size)

keras.backend.clear_session()
random.set_seed(42)

num_classes = len(os.listdir("../input/100-bird-species/train"))

inputs = keras.Input(shape=(img_size, img_size, 3))
x = layers.Conv2D(16, (5, 5), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(32, (5, 5), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(64, (5, 5), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(128, (5, 5), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(512, activation="relu")(x)
output = layers.Dense(num_classes, activation="softmax")(x)
model = keras.Model(inputs, output, name="bird_classifier")

early_stopping = keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)
model_checkpoint = keras.callbacks.ModelCheckpoint(
    "mymodel.h5", monitor='val_loss', verbose=0, save_best_only=True)

model.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    optimizer=keras.optimizers.Adam(lr=3e-4),
    metrics=["accuracy"])

history = model.fit(
    train_data_generated,
    validation_data=valid_data_generated,
    epochs=150,
    verbose=2,
    callbacks=[early_stopping, model_checkpoint])

classes = train_data_generated.class_indices
classes = dict((v, k) for k, v in classes.items())

test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    "../input/onetest",
    target_size=(img_size, img_size),
    color_mode="rgb",
    shuffle=False,
    class_mode='categorical',
    batch_size=1)

nb_samples = len(test_generator.filenames)
predictions = model.predict(test_generator, steps=nb_samples)
print([classes[i] for i in np.argmax(predictions, axis=1)])

I do not know if I am missing something in training or in the prediction step. Also, if you have any tips to get the val_acc above 0.75, I would be grateful.

submitted by /u/_AD1
[visit reddit] [comments]