Categories
Misc

TensorFlow Developer Certification Exam Preparation

I am planning to take the TensorFlow Developer Certification Exam.

I have gone through a lot of resources online on how other candidates have successfully cleared this exam.

I have already gone through the TensorFlow Developer Certification Handbook (candidate handbook and environment setup) which outlines the different topics that will be covered in this exam.

I have created a learning path for myself and am planning to go through the following resources:

-> Coursera TensorFlow in Practice Specialization

-> YouTube playlists: Machine Learning Foundations by Laurence Moroney, Coding TensorFlow, MIT Introduction to Deep Learning, CNN and Sequence Models by Andrew Ng

-> PyCharm tutorial series and environment setup guidelines

-> Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Ch. 10 to Ch. 16)

Apart from the resources I have mentioned, do you recommend any other valuable material that I should go through or add to my current learning path?

submitted by /u/runtimeterror21


GFN Thursday Highlights Legendary Moments From the New Season of Apex Legends

GFN Thursday is our weekly celebration of games streaming from GeForce NOW. This week, we’re kicking off Legends of GeForce NOW, a special event that challenges gamers to show off the best Apex Legends: Legacy moments using one of the features that makes GeForce NOW unique: NVIDIA Highlights.

The post GFN Thursday Highlights Legendary Moments From the New Season of Apex Legends appeared first on The Official NVIDIA Blog.


Am I predicting wrong? [Keras][CNN]

I am trying to implement my first CNN in Keras using the https://www.kaggle.com/gpiosenka/100-bird-species dataset. Training works without problems and reaches a val_acc of 0.75, but when I try to predict on a new image, the results look random.

import os

import numpy as np
from tensorflow import keras, random
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator

img_size = 80
batch_size = 64
root = "../input/100-bird-species"

# Training images: rescale to [0, 1] and augment with horizontal flips
image_generator_train = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
train_data_generated = image_generator_train.flow_from_directory(
    directory=os.path.join(root, "train"),
    target_size=(img_size, img_size),
    class_mode='categorical',
    batch_size=batch_size)

# Validation images: rescale only
image_generator_valid = ImageDataGenerator(rescale=1./255)
valid_data_generated = image_generator_valid.flow_from_directory(
    directory=os.path.join(root, "valid"),
    target_size=(img_size, img_size),
    class_mode='categorical',
    batch_size=batch_size)

keras.backend.clear_session()
random.set_seed(42)
num_classes = len(os.listdir(os.path.join(root, "train")))

# Four conv/pool stages followed by a dense classifier
inputs = keras.Input(shape=(img_size, img_size, 3))
x = layers.Conv2D(16, (5, 5), padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(32, (5, 5), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(64, (5, 5), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Conv2D(128, (5, 5), padding="same", activation="relu")(x)
x = layers.MaxPooling2D(pool_size=(2, 2))(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.2)(x)
x = layers.Dense(512, activation="relu")(x)
output = layers.Dense(num_classes, activation="softmax")(x)
model = keras.Model(inputs, output, name="bird_classifier")

early_stopping = keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=5, restore_best_weights=True)
model_checkpoint = keras.callbacks.ModelCheckpoint(
    "mymodel.h5", monitor='val_loss', verbose=0, save_best_only=True)

model.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=3e-4),  # "lr" is the deprecated name
    metrics=["accuracy"])

history = model.fit(
    train_data_generated,
    validation_data=valid_data_generated,
    epochs=150,
    verbose=2,
    callbacks=[early_stopping, model_checkpoint])

# Invert class_indices (name -> index) so indices map back to names
# (the original referenced an undefined variable "cosas" here)
classes = train_data_generated.class_indices
classes = dict((v, k) for k, v in classes.items())

test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
    "../input/onetest",
    target_size=(img_size, img_size),
    color_mode="rgb",
    shuffle=False,
    class_mode='categorical',
    batch_size=1)

nb_samples = len(test_generator.filenames)
predictions = model.predict(test_generator, steps=nb_samples)
# argmax returns an array, so look up each predicted index separately
print([classes[i] for i in np.argmax(predictions, axis=1)])

I do not know if I am missing something in the training or in the prediction step. Also, if you have any tips for increasing val_acc above 0.75, I would be grateful.
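The last two steps of the prediction code (inverting `class_indices` and looking up the arg-max of each prediction row) can be checked in isolation, without a model. The sketch below is only an illustration: the three class names and the probabilities are made up, and `argmax` is a small helper defined here in place of `np.argmax`.

```python
# Hypothetical stand-in for train_data_generated.class_indices (name -> index).
class_indices = {"ALBATROSS": 0, "BALD EAGLE": 1, "CARDINAL": 2}

# Invert the mapping so a predicted index can be turned back into a name.
index_to_class = {index: name for name, index in class_indices.items()}

# Made-up softmax outputs, one row per image (each row sums to 1).
predictions = [
    [0.10, 0.70, 0.20],
    [0.80, 0.15, 0.05],
]


def argmax(row):
    """Return the position of the largest value in a row."""
    return max(range(len(row)), key=lambda i: row[i])


# One class name per prediction row.
predicted_names = [index_to_class[argmax(row)] for row in predictions]
print(predicted_names)
```

The mapping has to come from the training generator's `class_indices`, since that is the index order the model's output units were trained against.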

submitted by /u/_AD1


Detection label is interchanged (dog detected as cat and cat detected as dog)

I was learning about object detection for multiple classes. If given a cat image, it should detect a cat. If given a dog image, it should detect a dog. Here is the gist of what I did.

1) Created a dataset of dogs and labelled them.
2) Created a dataset of cats and labelled them.
3) Put them both in a single folder (dogs + cats).
4) Split them into train and test (80/20).
5) Generated TF records (test, train).
6) Downloaded the pretrained ssd_mobilenetv2 from the model zoo and changed pipeline.config (classes=2, batch_size=16, steps=2000).
7) Trained the model, which ended at a total loss of 1.2.
8) Exported the model to a .pb file.
9) Tested the model by giving it an image.

This is where I am confused. If I give it an image of a dog, the bounding box says it is a cat; if I give it an image of a cat, it says it is a dog. I am really confused as to where I made the mistake.

Did I make a mistake in the dataset preparation by creating the datasets separately and then merging them together? Or can anything else cause this?

Any thoughts on the causes?

submitted by /u/goalscorer101


Riding Solo: MIT Develops Single Self-Driving Network on NVIDIA DRIVE AGX Pegasus

A new approach to autonomous driving is pursuing a solo career. Researchers at MIT are developing a single deep neural network (DNN) to power autonomous vehicles, rather than a system of multiple networks. The research, published at COMPUTEX this week, used NVIDIA DRIVE AGX Pegasus to run the network in the vehicle, processing mountains of …

The post Riding Solo: MIT Develops Single Self-Driving Network on NVIDIA DRIVE AGX Pegasus appeared first on The Official NVIDIA Blog.


NVIDIA SimNet v21.06 Released for General Availability

NVIDIA SimNet is a physics-informed neural network (PINN) toolkit that applies AI and physics to simulation problems.

Today, NVIDIA announces the release of SimNet v21.06 for general availability, enabling physics simulations across a variety of use cases.

NVIDIA SimNet is a Physics-Informed Neural Networks (PINNs) toolkit for engineers, scientists, students, and researchers who either want to get started with AI-driven physics simulations, or would like to leverage a powerful framework to implement their domain knowledge to solve complex nonlinear physics problems with real-world applications. 

V21.06 builds on a successful early access program of baseline features and layers on additional new capabilities. This GA release introduces support for new physics such as electromagnetics and 2D wave propagation, and delivers a new algorithm that enables a wider range of use cases for simulating more complex fluid-thermal systems. New time stepping schemes have been implemented for solving temporal problems, treating time as both discrete and continuous.

Other features and enhancements include a gradient aggregation method for increased batch size on each GPU, adaptive sampling for increased point cloud density in regions of high loss, homoscedastic task uncertainty quantification for loss weighting, a transfer learning algorithm that enables rapid training for efficient surrogate-based parameterization of STL and constructive solid geometries, and a Polynomial Chaos Expansion method for assessing how uncertainties in a model input manifest in its output. SimNet v21.06 also expands the existing network architectures with Multiplicative Filter Networks.

SimNet v21.06 Highlights 

Electromagnetics
Frequency-domain electromagnetic simulation can be carried out using SimNet v21.06. A solution of the real form of the frequency-domain Maxwell’s equations is available either in scalar form (Helmholtz equation) for 1D, 2D, and 3D cases, or in vector form for the 3D case. The boundary conditions can be perfect electric conductor (PEC) for 2D and 3D cases, a radiation (absorbing) boundary condition for 3D, and a waveguide port solver for a 2D waveguide source. The implementation can solve for 2D TEz and TMz mode frequency-domain electromagnetics and 3D electromagnetics in real form.
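For reference, the scalar Helmholtz equation in its common source-free textbook form is written out below, together with the kind of mean-squared residual a physics-informed network would typically minimize at sampled points. The notation (field u, wavenumber k, network output u_θ, N collocation points x_i) is chosen here for illustration and is not taken from the SimNet documentation; SimNet's actual formulation and loss weighting may differ.

```latex
\nabla^2 u(\mathbf{x}) + k^2\, u(\mathbf{x}) = 0,
\qquad
\mathcal{L}_{\text{PDE}}(\theta) = \frac{1}{N} \sum_{i=1}^{N}
  \left( \nabla^2 u_\theta(\mathbf{x}_i) + k^2\, u_\theta(\mathbf{x}_i) \right)^2
```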

Time Stepping Scheme for Temporal Physics
Transient simulations are required for many computational problems in fields such as fluid dynamics and electromagnetism. Until recently, neural network solvers have struggled to obtain accurate results. Using several innovations in this field, SimNet is now able to solve a variety of transient problems with significantly greater speed and accuracy. Shown below is Taylor-Green vortex decay using transient and turbulent Navier-Stokes simulation.

Transfer Learning
In repetitive trainings, such as training for surrogate-based design optimization or uncertainty quantification, transfer learning reduces the time to convergence for neural network solvers. Once a model is trained for a single geometry, the trained model parameters are transferred to solve a different geometry, without having to train on the new geometry from scratch. 
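The warm-start idea can be seen in a deliberately tiny setting. The sketch below is not SimNet code: it assumes a one-parameter "solver" trained by gradient descent on a quadratic loss, with two nearby targets (3.0 and 3.2) standing in for two similar geometries. All names and numbers are hypothetical.

```python
def train(target, initial_weight, learning_rate=0.1, tolerance=1e-3):
    """Gradient descent on loss(w) = (w - target)^2; returns (weight, steps)."""
    weight, steps = initial_weight, 0
    while abs(weight - target) > tolerance:
        weight -= learning_rate * 2.0 * (weight - target)
        steps += 1
    return weight, steps


# Train on the first "geometry" from scratch.
weight_a, steps_a = train(target=3.0, initial_weight=0.0)

# Second, similar "geometry": from scratch versus starting from weight_a.
_, steps_scratch = train(target=3.2, initial_weight=0.0)
_, steps_transfer = train(target=3.2, initial_weight=weight_a)

print("from scratch:", steps_scratch, "steps; with transfer:", steps_transfer, "steps")
```

Because the transferred weight already sits close to the new optimum, fewer update steps are needed to reach the same tolerance. How much a real solver gains depends on how similar the two problems are.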

Transfer learning accelerates patient-specific intracranial aneurysm simulations.  

Aneurysms with two different shapes. 

Gradient Aggregation
Training of a neural network solver for complex problems requires a large batch size that can be beyond the available GPU memory limits. Increasing the number of GPUs can effectively increase the batch size but in case of limited GPU availability, you can use gradient aggregation. With gradient aggregation, the required gradients are computed in several forward/backward iterations using different mini batches of the point cloud and are then aggregated and applied to update the model parameters. This will, in effect, increase the batch size (although at the cost of increasing the training time). 
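A minimal sketch of the aggregation step, assuming a one-parameter linear model with a mean-squared-error loss and made-up data. It is not SimNet's implementation; it only shows that averaging the gradients of equal-sized mini batches before a single update reproduces the gradient of the large batch.

```python
def mse_gradient(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


# Made-up (x, y) points standing in for a point cloud.
points = [(float(i), 3.0 * i + 1.0) for i in range(16)]
w = 0.5
learning_rate = 1e-3

# One large batch: the gradient we want but, by assumption, cannot fit in memory.
full_gradient = mse_gradient(w, points)

# Gradient aggregation: 4 forward/backward passes over mini batches of 4 points,
# accumulating the gradients before a single parameter update.
num_aggregations = 4
mini_batch_size = len(points) // num_aggregations
accumulated = 0.0
for step in range(num_aggregations):
    mini_batch = points[step * mini_batch_size:(step + 1) * mini_batch_size]
    accumulated += mse_gradient(w, mini_batch)
aggregated_gradient = accumulated / num_aggregations

# Single update using the aggregated gradient.
w_updated = w - learning_rate * aggregated_gradient
```

The cost is the one the text mentions: four passes are run per parameter update instead of one, so each update takes longer.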

Increasing the batch size can improve the accuracy of neural network solvers.  

4 Gradient Aggregations on 1 GPU = 4 GPUs without Gradient Aggregations.  
These results are more accurate than the 1 GPU result without any Gradient Aggregation.

Recent SimNet On-Demand Technical Sessions

  • “Physics-Informed Neural Networks for Mechanics of Heterogeneous Media” – IIT-Bombay presented a session in which the PINN-based NVIDIA SimNet toolkit is used to develop a framework for the simulation of damage in elastic and elastoplastic materials. For verification, the SimNet results are found to be in good agreement with the analytical solution based on Haghighat et al., 2020.
  • “Using Physics-Informed Neural Networks and SimNet to Accelerate Product Development” – Kinetic Vision presented a session in which the Coanda effect, encountered in aerospace and several industrial applications, is simulated using SimNet. Both 2D and 3D geometries are constructed using SimNet’s internal geometry module and simulated using a modified Fourier network architecture. The results showed that, qualitatively, the velocity flow fields predicted by the commercial CFD code Ansys Fluent and by the trained SimNet PINN are very similar. Furthermore, Kinetic Vision ran parametric simulations with SimNet and went a step further by integrating the results into CAD with SolidWorks for automated inference, as well as providing a way for users to interact with SimNet from within the SolidWorks UI.
  • “Hybrid Physics-Informed Neural Networks for Digital Twin in Prognosis and Health Management” – University of Central Florida presented a session in which a digital twin model is built to predict damage and fatigue crack growth in aircraft window panels. SimNet models are based in physics, which ensures the accuracy needed for prognosis and health management of structural materials. Once SimNet models are trained, they can be used to perform fast and accurate computations as a function of different input conditions. SimNet also achieves accuracy comparable to what commercial solvers achieve with a high degree of mesh refinement. With SimNet, they can scale the predictive model to a fleet of 500 aircraft and get predictions in less than 10 seconds, as opposed to the few days to weeks it would take to perform the same computations using high-fidelity finite element models.
  • “Physics-Informed Neural Network for Flow and Transport in Porous Media” – Stanford University presented a session on Physics-Informed Deep Learning for Flow and Transport in Porous Media, in which a methodology is used to simulate a two-phase immiscible transport problem (Buckley-Leverett). The model can produce an accurate physical solution both in terms of shock and rarefaction, and it honors the governing partial differential equation along with the initial and boundary conditions. Read more about this on our NVIDIA blog here.
  • “AI-Accelerated Computational Science and Engineering Using Physics-Based Neural Networks” – NVIDIA presented a session that covers state-of-the-art AI for addressing diverse areas of application, ranging from real-time simulation (e.g., digital twin and autonomous machines) to design space exploration (generative design and product design optimization), inverse problems (e.g., medical imaging, full wave inversion in oil and gas exploration), and improved science (e.g., micromechanics, turbulence), that are difficult to solve because of various gradients and discontinuities, due to physics laws and complex shapes.
  • On-Demand Webinar: “Building AI-Based Simulation Capabilities in Science & Engineering Courses with NVIDIA SimNet” – Learn how NVIDIA SimNet addresses a wide range of use cases involving coupled forward simulations without any training data, as well as inverse and data assimilation problems.

Read the paper “NVIDIA SimNet: an AI-accelerated multi-physics simulation framework.”

Join our new Forum for community discussion. Post your questions and comments pertaining to AI-driven physics simulations with NVIDIA SimNet.  

Give SimNet v21.06 a try today by downloading it here


Advancing the State of the Art in AutoML, Now 10x Faster with NVIDIA GPUs and RAPIDS

To achieve state-of-the-art machine learning (ML) solutions, data scientists often build complex ML models. However,  these techniques are computationally expensive, and until recently required extensive background knowledge, experience, and human effort.

Recently, at GTC21, AWS Senior Data Scientist  Nick Erickson gave a session sharing how the combination of AutoGluon, RAPIDS, and NVIDIA GPU computing simplifies achieving state-of-the-art ML accuracy, while improving performance and lowering costs.  This post gives an overview of some key points from Nick’s session:

  • What is AutoML and what is different about AutoGluon?
  • How does AutoGluon outperform 99% of human data science teams in Kaggle prediction competitions with just three lines of code,  without the need for expert knowledge?
  • How does the integration of AutoGluon with RAPIDS enable up to 40x faster training and 10x faster inference?

What is AutoGluon? 

AutoGluon is an open-source AutoML library that enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning text, image, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to:

  • Quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code.
  • Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge.
  • Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing.
  • Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use case.

This post focuses on AutoGluon-Tabular, an AutoGluon API that requires only a few lines of Python to train highly accurate machine learning models on an unprocessed tabular dataset such as a CSV file.  In order to understand how AutoGluon-Tabular does this, we will first explain some concepts.

What is Supervised Machine Learning? 

Supervised machine learning takes a set of labelled training instances as input and builds a model that aims to correctly predict the label of each training example based on other information that we know about the example (known as features of the instance). The purpose of this is to build an accurate model that can automatically label future data with unknown labels.

The diagram shows data consisting of labels and features used to build a model. The model is then used to make predictions on new data features.
Figure 1: Supervised Machine Learning uses labeled data to build a model to make predictions on unlabeled data.

In tabular datasets, columns represent the measurements of a variable (a.k.a. feature), and rows represent individual data points.   For example, the table below shows a small dataset with three columns: “has job”, “owns house” and “income”. In this example “income” is the label (sometimes known as the target variable for prediction) and the other columns are features used to try to predict the income.

Table 1: Income dataset

Supervised machine learning is an iterative, exploratory process that involves data preparation, feature engineering, validation splitting, missing value handling, training, testing, hyperparameter tuning, ensembling, and evaluating ML models before a model can be used in production to make predictions.

The diagram shows machine learning consisting of feature extraction, training, and evaluating before a model can be deployed to make predictions.
Figure 2: Machine learning is an iterative process involving feature extraction, training, and evaluating before a model can be deployed to make predictions.

What is AutoML?

Historically, achieving state-of-the-art ML performance required extensive background knowledge, experience, and human effort. Depending on the tool and level of automation, AutoML uses different algorithmic techniques to try to find the best features, hyperparameters, algorithms, and/or combinations of algorithms for an ML pipeline. By automating time-consuming ML pipelines, practitioners and enterprises can apply machine learning to solve business problems faster and more easily.

AutoML in 3 steps with AutoGluon Tabular

AutoGluon Tabular can be used to automatically build state-of-the-art models that predict a particular column’s value based on the other columns in the same row using two functions, fit() and predict(), as shown below.

from autogluon.tabular import TabularPredictor, TabularDataset
# load dataset
train_data = TabularDataset(DATASET_PATH)
# fit the model
predictor = TabularPredictor(label=LABEL_COLUMN_NAME).fit(train_data)
# make predictions on new data
prediction = predictor.predict(new_data)

The fit() function studies the dataset, performs data preprocessing, fits several models and combines them to produce a high accuracy model. For a more complete example to try out, see the AutoGluon Quick Start tutorial on predicting columns in a table. 

The diagram shows  AutoGluon fit() function building  an ML model which can be used with the predict() function.
Figure 3: The AutoGluon fit() function automatically builds an ML model which can be used to predict a particular column’s value based on the other columns in the same row with the predict() function. 

With this simple code, AutoGluon beats other AutoML frameworks and many top data scientists. An extensive evaluation with tests on a suite of 50 classification and regression tasks from Kaggle and the OpenML AutoML Benchmark revealed that AutoGluon is faster, more robust, and more accurate than TPOT, H2O, AutoWEKA, auto-sklearn, and Google AutoML Tables. Also, in two popular Kaggle competitions, AutoGluon beat 99% of the participating data scientists after merely 4 hours of training on the raw data.

The image shows AutoGluon outperformed other AutoML frameworks and many top Kaggle data scientists.
Figure 4: AutoGluon outperformed other AutoML frameworks and many top Kaggle data scientists.

What is different about AutoGluon?

Most AutoML frameworks focus on the task of Combined Algorithm Selection and Hyperparameter optimization (CASH), offering strategies to find the best model and its hyperparameters from a wide selection of possibilities. However, CASH has some drawbacks:

  • It requires many repeated model training runs, and most of the models are thrown away without contributing to the final result.
  • The more hyperparameter tuning is done, the higher the risk of overfitting the validation data.
  • Hyperparameter tuning is less helpful when ensembling.
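To make the CASH loop concrete, here is a toy sketch. Everything in it is hypothetical: the two "algorithms", the single `shrink` hyperparameter, and the data are invented for illustration and are unrelated to any real AutoML framework. Each (algorithm, hyperparameter) pair is trained, scored on validation data, and only the winner is kept.

```python
# Made-up train/validation split of (x, y) points.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]
valid = [(5.0, 9.8), (6.0, 12.1)]


def fit_constant(data, shrink):
    """'Algorithm' 1: predict a shrunken mean of y, ignoring x."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: shrink * mean_y


def fit_slope(data, shrink):
    """'Algorithm' 2: least-squares line through the origin with a shrunken slope."""
    slope = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    return lambda x: shrink * slope * x


def validation_error(model, data):
    """Mean squared error of a model on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Combined search space: every algorithm paired with every hyperparameter value.
search_space = [(name, fit, shrink)
                for name, fit in (("constant", fit_constant), ("slope", fit_slope))
                for shrink in (0.5, 0.75, 1.0)]

results = []
for name, fit, shrink in search_space:
    model = fit(train, shrink)
    results.append((validation_error(model, valid), name, shrink))

# Keep the single best configuration; every other trained model is discarded.
best_error, best_name, best_shrink = min(results)
discarded = len(results) - 1
print(best_name, best_shrink, "discarded models:", discarded)
```

Six models are trained and five are thrown away, which is the first drawback in the list above. The winner is also chosen on the same validation data it will be judged on, which is where the overfitting risk in the second point comes from.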

In contrast, AutoGluon-Tabular outperforms other frameworks by relying on methods used by expert data scientists to win competitions: ensembling multiple models and stacking them in multiple layers. 

How does Ensembling Work?

Ensemble learning methods combine multiple machine learning (ML) algorithms to obtain a better model. To understand this better, let’s go over Random Forests, which is an ensemble of decision trees. 

Decision trees create a model that predicts the target label by evaluating a tree of if-then-else and true/false feature questions and estimating the minimum number of questions needed to assess the probability of making a correct decision. Decision trees can be used for classification to predict a category or for regression to predict a continuous numeric value. For example, the decision tree below (based on the table above) tries to predict the label “income” using two decision nodes for the features “has job” and “owns house”.

The image shows  a simple decision tree model with two decision nodes and three leaves.
Figure 5: A simple decision tree model with two decision nodes and three leaves.
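A tree of this shape is just nested if-then-else questions. The sketch below is a hypothetical reading of it: the order of the two questions and the three leaf labels ("high", "medium", "low") are assumptions made for illustration, not values taken from the figure or the table.

```python
def predict_income(has_job: bool, owns_house: bool) -> str:
    """Two decision nodes, three leaves. Leaf labels are made up for illustration."""
    if has_job:                 # decision node 1
        if owns_house:          # decision node 2
            return "high"       # leaf 1
        return "medium"         # leaf 2
    return "low"                # leaf 3


# Walk every combination of the two features through the tree.
for has_job in (True, False):
    for owns_house in (True, False):
        print(has_job, owns_house, "->", predict_income(has_job, owns_house))
```

Note that both rows with `has_job=False` end in the same leaf: the tree never asks the second question for them, which is why two decision nodes give three leaves rather than four.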

Decision trees have the advantage that they are easy to interpret, but they have problems with overfitting and accuracy. An accurate model sits somewhere between underfitting and overfitting: its predictions match how the training data behaves, yet it is general enough to make accurate predictions on unseen data.

Decision trees seek to find the best split to subset the data, which results in harsh splits.  For example, given the dataset below on the left we want to predict the color of a dot where the lighter the dot is the higher the value. A decision tree, shown on the right, would split the data into harsh chunks.  Next we will look at how to improve on decision trees with ensembling.

The image shows an example dataset on the left, where the goal is to predict the color of a dot where the lighter the dot is the higher the value. The decision tree for this dataset on the right splits the data into harsh chunks.
Figure 6: Example dataset on the left, where the goal is to predict the color of a dot where the lighter the dot is the higher the value. The decision tree for this dataset  on the right splits the data into harsh chunks.

Ensembling is a proven approach to improve the accuracy of models by combining their predictions and improving generalization. Random forest is a popular ensemble learning method for classification and regression. It uses a technique called bagging (bootstrap aggregating) to build full decision trees in parallel from random bootstrap samples of the data set and features. Predictions are made by aggregating the output from all the trees, which reduces the variance and improves the predictive accuracy. The final prediction is the majority class (for classification) or the mean (for regression) of all the decision tree predictions. Randomness is critical to the success of the forest: bagging makes sure that no two decision trees are the same, reducing the problems of overfitting seen with individual trees.

Multiple decision trees training on subsets of the data are shown.
Figure 7: Random forest uses a technique called bagging to build decision trees from random bootstrap samples of the data set and features.
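Bagging itself fits in a few lines. The sketch below is a simplification under stated assumptions: it uses one-split regression trees (stumps) instead of full trees, a single feature (so only rows are resampled, not features), and made-up data. The function names are hypothetical.

```python
import random


def fit_stump(points):
    """Fit a one-split regression tree (a stump) by minimizing squared error."""
    xs = sorted({x for x, _ in points})
    mean = lambda values: sum(values) / len(values)
    best = None
    for left_x, right_x in zip(xs, xs[1:]):
        threshold = (left_x + right_x) / 2.0
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every sampled x was identical: fall back to a constant
        constant = mean([y for _, y in points])
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagged_stumps(points, n_trees, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(points) points with replacement.
        sample = [rng.choice(points) for _ in points]
        trees.append(fit_stump(sample))
    return trees


def predict_bagged(trees, x):
    """Regression aggregate: the mean of all tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)


points = [(x / 10.0, (x / 10.0) ** 2) for x in range(21)]  # y = x^2 on [0, 2]
single_tree = fit_stump(points)
trees = fit_bagged_stumps(points, n_trees=25)
print("single stump at x=1.0:", single_tree(1.0))
print("bagged stumps at x=1.0:", predict_bagged(trees, 1.0))
```

A single stump can only output two values, one per side of its split. Because each bootstrap sample may place the split slightly differently, the averaged prediction can take intermediate values.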

To understand how this gives better predictions, let’s look at an example. Here are four different decision trees for the data set seen in Figure 6, with different prediction colors for a test data point. We can see that each gives an approximation of the solution that is not generalized enough to make accurate predictions.

The image shows four different decision trees for the data set seen in figure 6, with different prediction colors for a test data point.
Figure 8: Four different decision trees for the data set seen in figure 6, with different prediction colors for a test data point.
image reference https://gist.github.com/tylerwx51/fc8b316337833c877785222d463a45b0

When these four decision trees are combined and averaged together the harsh boundaries go away and are smoothed as in the random forest example below. Now  the prediction color for the test data point is a blend of the colors from the other tree predictions.

The image shows Random forest  model  for the  four decision trees from figure 8.
Figure 9: Random forest  model  for the  four decision trees from figure 8.

All of the decision trees in a random forest are suboptimal: they are all wrong in random directions. When you average the decision trees, the reasons they are wrong cancel each other out. This is called variance cancellation. The results are of higher quality because they reflect decisions reached by the majority of trees. The averaging limits errors: even though some trees are wrong, others will be right, so the group of trees collectively moves in the correct direction.

When many uncorrelated decision trees are combined, they produce models with high predictive power that are resilient to overfitting. These concepts are foundational to popular machine learning algorithms such as Random Forest, XGBoost, CatBoost, and LightGBM, which are employed by AutoGluon.
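The cancellation effect can be simulated directly. The sketch below makes a strong simplifying assumption: each "tree" is modeled as the true value plus independent zero-mean Gaussian noise, in which case averaging n trees divides the error variance by roughly n. Real trees trained on overlapping data are correlated, so the reduction in practice is smaller. All numbers are made up.

```python
import random
import statistics

rng = random.Random(42)
true_value = 10.0
n_trees = 25
n_trials = 2000

single_errors = []
averaged_errors = []
for _ in range(n_trials):
    # Each "tree" is modeled as the truth plus independent zero-mean noise.
    tree_predictions = [true_value + rng.gauss(0.0, 1.0) for _ in range(n_trees)]
    single_errors.append(tree_predictions[0] - true_value)
    averaged_errors.append(statistics.fmean(tree_predictions) - true_value)

# Spread of the error for one tree versus the average of all trees.
single_variance = statistics.pvariance(single_errors)
averaged_variance = statistics.pvariance(averaged_errors)
print("error variance of one tree:      ", round(single_variance, 3))
print("error variance of the 25-average:", round(averaged_variance, 3))
```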

Multi-layer Stack Ensembling

You can go further than this with ensembling: experienced machine learning practitioners combine the outputs of RandomForest, CatBoost, k-nearest neighbors, and others to further improve model accuracy. In the ML competition community, it is hard to find a competition won by a single model; nearly every winning solution incorporates ensembles of models.

Stacking is a technique that uses the aggregated predictions of a collection of “base” regression or classification models as the features for training a meta-classifier or regressor “stacker” model. 

The image shows stacking technique.
Figure 10: Stacking technique.
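A single stacking layer can be sketched with two weak base models and a linear stacker. The setup is hypothetical: each base model sees only one of two features, the data is made up, and the stacker is an ordinary least-squares fit on the two base predictions. For brevity, the stacker is trained on the same rows as the base models; robust stacking uses held-out predictions instead.

```python
def fit_slope(xs, ys):
    """Base model: least-squares line through the origin on one feature."""
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: slope * x


def fit_stacker(pred_a, pred_b, ys):
    """Stacker: least-squares weights for combining two base predictions."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, ys))
    sby = sum(b * y for b, y in zip(pred_b, ys))
    det = saa * sbb - sab * sab  # 2x2 normal equations solved by Cramer's rule
    weight_a = (say * sbb - sby * sab) / det
    weight_b = (sby * saa - say * sab) / det
    return lambda a, b: weight_a * a + weight_b * b


def mse(predictions, ys):
    """Mean squared error between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)


# Made-up data: the label depends on both features.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

# Base layer: each model only sees one feature.
model_a = fit_slope(x1, y)
model_b = fit_slope(x2, y)
pred_a = [model_a(v) for v in x1]
pred_b = [model_b(v) for v in x2]

# Stacker layer: the base predictions become the stacker's input features.
stacker = fit_stacker(pred_a, pred_b, y)
stacked = [stacker(a, b) for a, b in zip(pred_a, pred_b)]

base_a_error = mse(pred_a, y)
base_b_error = mse(pred_b, y)
stacked_error = mse(stacked, y)
print(base_a_error, base_b_error, stacked_error)
```

Each base model alone is inaccurate because it ignores one feature; the stacker recovers the label by weighting the two sets of predictions.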

Multi-layer stacking feeds the predictions output by the stacker models as inputs to additional higher-layer stacker models. Iterating this process in multiple layers has been a winning strategy in many Kaggle competitions. Multi-layer stacking ensembles are powerful but difficult to use and implement robustly, and are not currently utilized by any AutoML framework except AutoGluon.

Without the need for expert knowledge, AutoGluon automatically assembles and trains a novel form of multi-layer stack ensembling with k-fold bagging, shown in Figure 11. Here’s how it works:

  • Base: The first layer has multiple base models, which are individually trained and bagged using k-fold ensemble bagging (discussed below).
  • Concatenating: The base layer model predictions are concatenated along with the input features, to use as input for training the next layer.
  • Stacking: Multiple stacker models are trained on the concat layer output. Unlike traditional stacking strategies, AutoGluon reuses the same base layer model types (with the same hyperparameter values) as stackers. Also, the stacker models take as input not only the predictions of the models at the previous layer but also the original data features themselves.
  • Weighting: The final stacking layer applies ensemble selection to aggregate the stacker models’ predictions in a weighted manner. Aggregating predictions across a high-capacity stack of models improves resilience against overfitting.
The image shows AutoGluon’s multi-layer stack ensembling.
Figure 11: AutoGluon’s multi-layer stack ensembling. 

k-fold Ensemble Bagging

AutoGluon improves stacking performance by utilizing all of the available data for both training and validation, through k-fold ensemble bagging of all models at all layers of the stack. k-fold ensemble bagging is similar to k-fold cross-validation, a method that makes full use of the training dataset and is typically used for hyperparameter tuning to determine the best model parameters. With k-fold cross-validation, the data is randomly split into k partitions (folds). Each fold is used one time as the validation dataset, while the remaining folds are used for training. Models are trained on the training folds and evaluated on the held-out validation fold, resulting in k model accuracy measurements. Instead of determining the best model and throwing away the rest, AutoGluon bags all models and obtains out-of-fold (OOF) predictions from each model on the partition it did not see during training. This creates k-fold predictions of each model, which are used as meta-features for the next layer.

The image shows k-fold Ensemble Bagging.
Figure 12: k-fold Ensemble Bagging.
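The out-of-fold mechanics can be sketched as follows. This is an illustration under assumptions, not AutoGluon's implementation: the base model is a toy that always predicts the mean training label, the data is made up, and averaging the k fold models at inference time is assumed here as the bagging step.

```python
import random


def k_fold_oof_predictions(points, k, fit, seed=0):
    """Return the k fold models and one out-of-fold prediction per point."""
    indices = list(range(len(points)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal partitions

    models = []
    oof = [None] * len(points)
    for held_out in folds:
        held_out_set = set(held_out)
        training = [points[i] for i in indices if i not in held_out_set]
        model = fit(training)
        models.append(model)
        # Each point is predicted only by the model that never saw it.
        for i in held_out:
            oof[i] = model(points[i][0])
    return models, oof


def fit_mean(training):
    """Toy base model: always predict the mean training label."""
    mean_y = sum(y for _, y in training) / len(training)
    return lambda x: mean_y


points = [(float(i), 2.0 * i) for i in range(10)]
models, oof_predictions = k_fold_oof_predictions(points, k=5, fit=fit_mean)


def bagged_predict(x):
    """Assumed inference step: average the k fold models."""
    return sum(model(x) for model in models) / len(models)


print(oof_predictions)
print(bagged_predict(0.0))
```

Every row ends up with exactly one prediction from a model that did not train on it, so the full dataset can serve as leak-free input features for the next stacking layer, and none of the k models is thrown away.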

To further improve predictive accuracy and reduce overfitting, AutoGluon-Tabular  repeats the k-fold bagging process on n different random partitions of the training data, averaging all OOF predictions over the repeated bags. The number n is chosen by estimating how many rounds can be completed within the specified time constraints when calling the fit() function.

Why AutoGluon Needs GPU Acceleration

Multilayer stack ensembling improves accuracy; however, it means training hundreds of models, a much more compute-intensive task than basic ML use cases, and 10 to 20 times more expensive than weighted ensembling. In the past, the complexity and computational requirements made multilayer stack ensembling difficult to implement for many production use cases and large datasets. With AutoGluon and NVIDIA GPU computing, this is no longer the case.

Architecturally, the CPU is composed of just a few cores with lots of cache memory that can handle a few software threads at a time. In contrast, a GPU is composed of hundreds of cores that can handle thousands of threads simultaneously.  GPUs have been shown to perform over 20x  faster than CPUs in ML workflows and have revolutionized the deep learning field.

The image shows A CPU is composed of just a few cores, in contrast, a GPU is composed of hundreds of cores.
Figure 13: A CPU is composed of just a few cores; in contrast, a GPU is composed of hundreds of cores.

NVIDIA developed RAPIDS, an open-source data analytics and machine learning acceleration platform, for executing end-to-end data science training pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high memory bandwidth through user-friendly Python interfaces modeled on the pandas and scikit-learn APIs.

With RAPIDS cuML, popular machine learning algorithms like random forest, XGBoost, and many others are supported for both single-GPU and large data center deployments. For large datasets, these GPU-based implementations can accelerate the training of machine learning models by up to 45x in the case of random forests, over 100x for support vector machines, and up to 600x for k-nearest neighbors. These speedups can turn overnight jobs into interactive jobs, allow exploration of larger datasets, and enable trying dozens of model variants in the time it would previously have taken to train a single model.

Figure 14: Data science pipeline with GPUs and RAPIDS.

The latest release of AutoGluon leverages the full potential of NVIDIA GPU computing through integration with RAPIDS. With this integration, AutoGluon is able to train popular ML algorithms on GPUs and increase performance, making highly performant AutoML accessible to a broader audience.

AutoGluon + RAPIDS Benchmark

For the 115-million-row airline dataset used in the gradient boosting machines (GBM) benchmarks suite, AutoGluon + RAPIDS accelerated training by 25x compared to AutoGluon on CPUs, with 81.92% accuracy, 7% above the XGBoost baseline. The GPU advantage grows with longer training runs, because fixed startup costs become less significant.

Figure 15: AutoGluon + RAPIDS accelerated training by 25x compared to AutoGluon on CPUs, with 81.92% accuracy.

In order to obtain 81.92% accuracy, AutoGluon + RAPIDS on GPUs trained in 4 hours versus 4.5 days for CPUs.

Figure 16: AutoGluon + RAPIDS on GPUs trained in 4 hours versus 4.5 days for CPUs.

AutoGluon + RAPIDS on GPUs was not only faster but also cheaper, costing about ¼ as much as CPUs to train to the same accuracy (AWS EC2 pricing: p3.2xlarge $0.9180/hr, m5.2xlarge $0.1480/hr).

Figure 17: AutoGluon + RAPIDS on GPUs also cost less, ¼ as much as CPUs to train to the same accuracy.
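The ¼ figure can be checked with quick arithmetic from the numbers quoted above. The sketch below assumes a single instance of each type, billed at the quoted hourly rate for the full training time; the variable names are illustrative only.

```python
# Training times quoted in the benchmark.
gpu_hours = 4.0
cpu_hours = 4.5 * 24          # 4.5 days expressed in hours

# Hourly prices quoted in the text (USD).
gpu_rate = 0.9180             # p3.2xlarge
cpu_rate = 0.1480             # m5.2xlarge

# Assumption: one instance of each type, billed for the whole run.
gpu_cost = gpu_hours * gpu_rate
cpu_cost = cpu_hours * cpu_rate

speedup = cpu_hours / gpu_hours
cost_ratio = gpu_cost / cpu_cost

print(f"GPU cost: ${gpu_cost:.2f}, CPU cost: ${cpu_cost:.2f}")
print(f"speedup: {speedup:.0f}x, GPU/CPU cost ratio: {cost_ratio:.2f}")
```

Under these assumptions the GPU run costs roughly $3.67 against roughly $15.98 for the CPU run, a ratio of about 0.23, which is consistent with the stated ¼.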

To Get Started

To get started with AutoGluon and RAPIDS:

Conclusion

The AutoGluon AutoML toolkit makes it easy to train and deploy cutting-edge, accurate machine learning models for complex business problems. In addition, the integration of AutoGluon with RAPIDS leverages the full potential of NVIDIA GPU computing, enabling complex models to train up to 40x faster and predict 10x faster.

For more information, see the following resource

Fully Vectorized Conv2D Implementation

Hey guys.

I wrote a post describing in detail a fully vectorized implementation of the convolution operation in NumPy: https://lucasdavid.github.io/vectorization/

I would appreciate it if you could give me any notes. I’m also trying to translate this to TensorFlow, but it’s not as trivial as I initially thought, considering indexing is very different (my implementation relies on selecting the multiple regions at once with `image[..., r, c]`, where `r` and `c` are two index matrices).
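For readers who haven't seen the trick, here is a minimal sketch of the index-matrix idea in NumPy. It is not the code from the linked post: the function name `conv2d_indexed` is made up for illustration, it handles only stride 1 with no padding, and it computes cross-correlation (no kernel flip), as deep learning frameworks usually do.

```python
import numpy as np

def conv2d_indexed(image, kernel):
    """Valid cross-correlation over the last two axes of `image`, with no Python loops."""
    H, W = image.shape[-2:]
    kh, kw = kernel.shape
    oh, ow = H - kh + 1, W - kw + 1

    # r[i, a] = i + a: image row touched by kernel row a at output row i.
    r = np.arange(oh)[:, None] + np.arange(kh)[None, :]   # (oh, kh)
    # c[j, b] = j + b: image column touched by kernel column b at output column j.
    c = np.arange(ow)[:, None] + np.arange(kw)[None, :]   # (ow, kw)

    # Reshape so the two index arrays broadcast to (oh, ow, kh, kw).
    rows = r[:, None, :, None]
    cols = c[None, :, None, :]

    # Select every kh-by-kw region at once: (..., oh, ow, kh, kw).
    patches = image[..., rows, cols]

    # Multiply each patch by the kernel and sum over the kernel axes.
    return np.einsum("...ijab,ab->...ij", patches, kernel)

image = np.arange(16.0).reshape(4, 4)
kernel = np.ones((2, 2))
print(conv2d_indexed(image, kernel))
```

The leading `...` means the same function works on a batch, e.g. an array of shape `(N, H, W)`, without changes.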

Any ideas on this would be greatly appreciated!
Have a great day. 🙂

submitted by /u/deepdipship

From Experimentation to Products: The Production Machine Learning Journey + Google’s experience with TensorFlow Extended (TFX)

submitted by /u/mto96

Building Real-time Dermatology Classification with NVIDIA Clara AGX

The most commonly diagnosed cancer in the US today is skin cancer. There are three main variants: melanoma, basal cell carcinoma (BCC), and squamous cell carcinoma (SCC). Though melanoma only accounts for roughly 1% of all skin cancers, it is the most fatal, metastasizing rapidly without early detection and treatment. This makes early detection critical, as numerous studies show significantly better survival rates when detection is done in its earliest stages.

The current diagnosis procedure is done through a visual examination by a dermatologist, followed by a biopsy to confirm any suspected pathology. This manual examination is dependent on human subjectivity and thus suffers from error at a concerning rate. When a primary care physician looks for skin cancer, their sensitivity, or ability to identify a patient with the disease correctly, is only 0.45, while a dermatologist has a sensitivity of 0.97.
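To make the sensitivity figures concrete, the short sketch below computes sensitivity as true positives divided by all patients who actually have the disease. The cohort of 100 patients is hypothetical and chosen only so that the counts reproduce the quoted 0.45 and 0.97.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of patients who have the disease that the examiner correctly flags."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical cohort: 100 patients who all have skin cancer.
primary_care = sensitivity(true_positives=45, false_negatives=55)
dermatologist = sensitivity(true_positives=97, false_negatives=3)

print(f"primary care: {primary_care:.2f}, dermatologist: {dermatologist:.2f}")
```

In this hypothetical cohort, the primary care physician would miss 55 of the 100 cases, while the dermatologist would miss 3.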

In recent years, the use of deep learning to perform medical diagnostics has become a quickly growing field. In this post, we discuss developing an end-to-end example of how deep learning could lead to an automated dermatology exam system free of human bias, using the recently announced NVIDIA Clara AGX development kit.

Datasets and models

This reference application is the pairing of two deep learning models:

  • An object detection model (YOLOv4) that looks for moles on the body through a camera. This model was trained with an original dataset created by annotating body mole images.
  • A classification model (EfficientNet) that receives each detected mole from the object detection model and then determines whether it is benign, unknown, or melanoma. The classification model was trained using the SIIM-ISIC melanoma Kaggle challenge dataset.

Figure 1 shows the workflow of the algorithm using a single video frame. The application can use a high-definition webcam or IP camera as input to the models, or even run on a previously captured video.

A 3-step diagram showing the workflow for skin mole detection and classification: starting from an input, moving to the YOLOv4 model for detection, and ending with an EfficientNet model for final classification.
Figure 1. Skin mole detection and classification workflow.
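The per-frame control flow of this two-stage design can be sketched as follows. This is only an outline of how the two models are chained: `detect_moles` and `classify_crop` are hypothetical placeholders that return fixed boxes and brightness-based scores, standing in for the real YOLOv4 and EfficientNet models, which are not reproduced here.

```python
import numpy as np

LABELS = ("benign", "unknown", "melanoma")

def detect_moles(frame):
    """Hypothetical stand-in for the YOLOv4 detector: returns (x, y, w, h) boxes."""
    return [(10, 20, 32, 32), (100, 50, 24, 24)]

def classify_crop(crop):
    """Hypothetical stand-in for the EfficientNet classifier: returns one score per label.

    The brightness rule is a dummy so the sketch runs; it has no medical meaning.
    """
    brightness = crop.mean() / 255.0
    return np.array([1.0 - brightness, 0.1, brightness])

def process_frame(frame):
    """Stage 1 finds candidate moles; stage 2 classifies each cropped region."""
    results = []
    for x, y, w, h in detect_moles(frame):
        crop = frame[y:y + h, x:x + w]
        scores = classify_crop(crop)
        results.append(((x, y, w, h), LABELS[int(np.argmax(scores))]))
    return results

# A synthetic frame standing in for one webcam or video frame.
frame = np.zeros((240, 320, 3), dtype=np.uint8)
frame[50:74, 100:124] = 255

for box, label in process_frame(frame):
    print(box, label)
```

Keeping detection and classification as separate functions mirrors the workflow in Figure 1 and lets either model be swapped or retrained independently.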

Clara AGX development kit

This reference application was built using the NVIDIA Clara AGX development kit, a high-performance workstation built with medical applications in mind. The system includes an RTX 6000 GPU, delivering 200+ INT8 AI TOPS of peak performance and 24 GB of VRAM, leaving plenty of headroom for running multiple models.

A rendered image of the Clara AGX Developer Kit showing the inside of the case with key components highlighted. The three main components are the NVIDIA Jetson AGX Xavier, NVIDIA Mellanox ConnectX-6, and an NVIDIA RTX 6000 GPU.
Figure 2. Clara AGX Developer Kit.

In addition, the AGX platform offers support for high bandwidth sensors through 100G Ethernet and an NVIDIA ConnectX-6 network interface card (NIC). NVIDIA partners are currently using the NVIDIA Clara AGX development kit to develop applications in ultrasound, genomics, and endoscopy.

The Clara AGX Developer Kit is currently available exclusively for members of the NVIDIA Clara Developer Partner Program. After you register, we’ll be in touch.

Summary

We’ve provided a research prototype of a dermatology application, but what would it take to transform this into a real application?

  • Commercially usable data. The SIIM-ISIC dataset is strictly for non-commercial use.
  • A much larger object detection dataset. The dataset that we used consisted of only a few hundred annotated images, which led to a larger-than-desired number of false positives.
  • Run the models at the “speed of light” (SOL). SOL often entails training models to run using mixed precision and then transforming the models to work with the NVIDIA TensorRT framework. TensorRT is designed to optimize model inference on NVIDIA GPUs and work with common frameworks such as PyTorch and TensorFlow. These steps would help to ensure that your application pipeline runs in real-time.
  • FDA clearance. Any developed medical application must be cleared by the FDA. Today, there are over 70 FDA-cleared AI applications, and the FDA has been active in soliciting feedback from developers in this area. Clearance is typically a long (18 months) and arduous process, but a necessary one.

For more information, see the dermatology reference Docker container on NGC.