Categories
Misc

The Future of Computer Vision

Demonstrate your computer vision expertise by mastering cloud services, AutoML, and Transformer architectures.

Computer vision is a rapidly growing field in research and applications. Advances in computer vision research are now more directly and immediately applicable to the commercial world.

AI developers are implementing computer vision solutions that identify and classify objects and even react to them in real time. Image classification, face detection, pose estimation, and optical flow are some of the typical tasks. Computer vision engineers are a subset of deep learning (DL) or machine learning (ML) engineers who program computer vision algorithms to accomplish these tasks.

The structure of DL algorithms lends itself well to solving computer vision problems. The architectural characteristics of convolutional neural networks (CNNs) enable the detection and extraction of spatial patterns and features present in visual data.

The field of computer vision is rapidly transforming industries like automotive, healthcare, and robotics, and it can be difficult to stay up-to-date on the latest discoveries, trends, and advancements. This post highlights the core technologies that are influencing and will continue to shape the future of computer vision development in 2022 and beyond:

  • Cloud computing services that help scale DL solutions.
  • Automated ML (AutoML) solutions that reduce the repetitive work required in a standard ML pipeline.
  • Transformer architectures developed by researchers that optimize computer vision tasks.
  • Mobile devices incorporating computer vision technology.

Cloud computing

Cloud computing provides data storage, application servers, networks, and other computer system infrastructure to individuals or businesses over the internet. Cloud computing solutions offer quick, cost-effective, and scalable on-demand resources.

Storage and high processing power are required for most ML solutions. The early-phase development of dataset management (aggregation, cleaning, and wrangling) often requires cloud computing resources for storage or access to solution applications like BigQuery, Hadoop, or BigTable.

Figure 1. Interconnected data center, representing the need for cloud computing and cloud services
(Photo by Taylor Vick on Unsplash)

Recently, there has been a notable increase in devices and systems enabled with computer vision capabilities, such as pose estimation for gait analysis, face recognition for smartphones, and lane detection in autonomous vehicles.

The demand for cloud storage is growing rapidly, and the industry is projected to be valued at $390.33 billion, roughly five times its 2021 market value. The increased market size will bring more inbound data for training ML models, which translates directly into larger data storage capacity requirements and increasingly powerful compute resources.

GPU availability has accelerated computer vision solutions. However, GPUs alone aren’t always enough to provide the scalability and uptime required by these applications, especially when servicing thousands or even millions of consumers. Cloud computing provides the resources needed to get started and to fill gaps in existing on-premises infrastructure.

Cloud computing platforms, including Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, provide end-to-end solutions to core components of the ML and data science project pipeline, including data aggregation, model implementation, deployment, and monitoring. For computer vision developers designing vision systems, it’s important to be aware of these major cloud service providers, their strengths, and how they can be configured to meet specific and complex pipeline needs.

Computer vision at scale requires cloud service integration

The following are examples of NVIDIA services that support typical computer vision systems.

The NGC Catalog of pretrained DL models reduces the complexity of model training and implementation.

DL scripts provide ready-made, customizable pipelines, and the robust model deployment solution automates delivery to end users.

NVIDIA Triton Inference Server enables the deployment of models from frameworks such as TensorFlow and PyTorch on any GPU- or CPU-based infrastructure. Triton Inference Server scales models across various platforms, including cloud, edge, and embedded devices.
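As a rough illustration, the sketch below sends a request to a running Triton server with the Python HTTP client. The model name and tensor names ("resnet50", "input__0", "output__0") are placeholder assumptions that depend on how the deployed model is actually configured.

```python
# Minimal Triton HTTP client sketch; assumes a Triton server is running on localhost:8000
# and serving a model with the placeholder name and tensor names used below.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build the request: one FP32 image batch shaped to the model's expected input.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("input__0", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("output__0")]

response = client.infer(model_name="resnet50", inputs=inputs, outputs=outputs)
print(response.as_numpy("output__0").shape)   # scores returned by the server
```

The same pattern applies over gRPC with the tritonclient.grpc module, which is often preferred for lower-latency, high-throughput deployments.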

The NVIDIA partnership with cloud service providers such as AWS enables the deployment of computer vision-based assets, so computer vision engineers can focus more on model performance and optimization.

Businesses reduce costs and optimize strategies wherever feasible. Cloud computing and cloud service providers support both goals by billing based on usage and scaling based on demand.

AutoML

ML algorithms and model development involve a number of tasks that can benefit from automation, such as feature engineering and model selection.

Feature engineering involves the detection and selection of relevant characteristics, properties, and attributes from datasets.

Model selection involves evaluating the performance of a group of ML classifiers, algorithms, or solutions to a given problem.

Both feature engineering and model selection activities require considerable time for ML engineers and data scientists to complete. Software developers frequently revisit these phases of the workflow to enhance model performance or accuracy.

Figure 2. AutoML enables the automation of repetitive tasks such as numeric calculations
(Photo by Stephen Dawson on Unsplash)

There are several large ongoing projects to simplify the intricacies of an ML project pipeline. AutoML focuses on automating and augmenting workflows and their procedures to make ML easily accessible and less manually intensive for non-ML experts.

In terms of market value, projections expect the AutoML market to reach $14 billion by 2030, roughly 42 times its current value.

This particular marriage of ML and automation is gaining traction, but there are limitations.

AutoML in practice

AutoML saves data scientists and computer engineers time. AutoML capabilities enable computer vision developers to dedicate more effort to other phases of the computer vision development pipeline that best use their skill set, such as model training, evaluation, and deployment. AutoML helps accelerate data aggregation, preparation, and hyperparameter optimization, but these parts of the workflow still require human input.

Data preparation and aggregation are needed to build the right model, but they are repetitive, time-consuming tasks that depend on locating data sources of appropriate quality.

Likewise, hyperparameter tuning can require many iterations to reach the right algorithm performance. It is a trial-and-error process guided by educated guesses. The repeated work of finding appropriate hyperparameters can be tedious, but it is critical for training the model to the desired accuracy.

For those interested in exploring GPU-powered AutoML, the widely used Tree-based Pipeline Optimization Tool (TPOT) is an automated ML library aimed at optimizing ML processes and pipelines through the utilization of genetic programming. RAPIDS cuML provides TPOT functionalities accelerated with GPU compute resources. For more information, see Faster AutoML with TPOT and RAPIDS.
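As a hedged sketch of what this can look like in practice, the snippet below runs TPOT with its cuML configuration on a small generated dataset. The dataset and search settings are illustrative, and the "TPOT cuML" configuration assumes RAPIDS cuML and a compatible GPU are installed.

```python
# Minimal sketch: GPU-accelerated AutoML with TPOT's cuML configuration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tpot = TPOTClassifier(
    generations=5,            # genetic-programming iterations
    population_size=20,       # candidate pipelines per generation
    config_dict="TPOT cuML",  # restrict the search to GPU-accelerated estimators (assumes cuML)
    verbosity=2,
    random_state=0,
)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("best_pipeline.py")   # export the winning pipeline as plain Python code
```

The exported pipeline file can then be reviewed and versioned like any other piece of training code.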

Machine learning libraries and frameworks

ML libraries and frameworks are essential elements in any computer vision developer’s toolkit. Major DL libraries such as TensorFlow, PyTorch, Keras, and MXNet received continuous updates and fixes in 2021, and will likely continue to do so in the future.

More recently, there have been exciting advances in mobile-focused DL libraries and in packages that optimize commonly used DL libraries.

MediaPipe extended its pose estimation capabilities in 2021 to provide 3D pose estimation through the BlazePose model, and this solution is available in the browser and on mobile environments. In 2022, expect to see more pose estimation applications in use cases involving dynamic movement and those that require robust solutions, such as motion analysis in dance and virtual character motion simulation.
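For illustration, here is a minimal sketch of 3D pose estimation with MediaPipe’s Pose solution (built on BlazePose); the input image path is a placeholder.

```python
# Minimal sketch of 3D pose estimation with MediaPipe Pose; "image.jpg" is a placeholder.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

image = cv2.imread("image.jpg")
with mp_pose.Pose(static_image_mode=True, model_complexity=2) as pose:
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.pose_world_landmarks:   # 3D landmarks, hip-centered, in meters
    for lm in results.pose_world_landmarks.landmark:
        print(lm.x, lm.y, lm.z, lm.visibility)
```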

PyTorch Lightning is becoming increasingly popular among researchers and professional ML practitioners due to its simplicity, its abstraction of complex neural network implementation details, and its handling of hardware considerations.
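A minimal sketch of what that abstraction looks like follows: the LightningModule holds the model and training step, while the Trainer takes care of the training loop and hardware details. The toy data and layer sizes are illustrative.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self, n_features=20, n_classes=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.cross_entropy(self.net(x), y)
        self.log("train_loss", loss)   # logging handled by Lightning
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Dummy data; the Trainer hides hardware details, so the same code runs on CPU or GPU.
pl.seed_everything(0)
ds = TensorDataset(torch.randn(256, 20), torch.randint(0, 4, (256,)))
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)
trainer.fit(LitClassifier(), DataLoader(ds, batch_size=32))
```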

State-of-the-art deep learning

DL methods have long been used to tackle computer vision challenges. Neural network architectures for face detection, lane detection, and pose estimation all use deep consecutive layers of CNNs. A new architecture for computer vision algorithms is emerging: transformers.

The Transformer is a DL architecture introduced in Attention Is All You Need. The paper’s methodology creates a computational representation of data by using the attention mechanism to derive the significance of one part of the input data relative to other segments of the input data.
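At the heart of the architecture is scaled dot-product attention, sketched below in PyTorch for illustration; the tensor shapes are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # relevance of each key to each query
    weights = F.softmax(scores, dim=-1)             # attention weights sum to 1 over the keys
    return weights @ v                              # weighted sum of the values

# Placeholder shapes: (batch, sequence length, embedding dimension).
q = k = v = torch.randn(1, 16, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)   # torch.Size([1, 16, 64])
```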

The Transformer does not use the conventions of CNNs, but research has shown the application of transformer models to vision-related tasks. Transformers have made a considerable impact within the NLP domain. For more information, see Generative Pre-trained Transformer (GPT) and Bidirectional Encoder Representations from Transformers (BERT).

Explore a transformer model through the NGC Catalog, which includes details of the architecture and the use of an actual transformer model in PyTorch.

For more information about applying the Transformer network architecture to computer vision, see the Transformers in Vision: A Survey paper.

Mobile devices

Edge devices are becoming increasingly powerful. On-device inference capabilities are a must-have feature for mobile applications used by customers who expect quick service delivery and AI features.

Figure 3. Mobile devices are a direct commercial application of computer vision features
(Photo by Homescreenify on Unsplash)

The incorporation of computer vision-enabling functionalities within mobile devices, like image and pattern recognition, reduces the latency for obtaining model inference results and provides benefits such as the following:

  • Reduced waiting time for obtaining inference results due to on-device computing.
  • Enhanced privacy and security due to limited data transfer to and from cloud servers.
  • Reduced cost by removing dependencies on cloud GPU and CPU servers for inference.

Many businesses are exploring mobile offerings, including how existing AI functionality can be replicated on mobile devices using platforms, tools, and frameworks built for mobile-first AI solutions.
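As one hedged example, on-device inference with TensorFlow Lite typically looks like the sketch below; the model path and input shape are placeholder assumptions.

```python
# Minimal sketch of on-device inference with TensorFlow Lite; "classifier.tflite" and
# the input shape are placeholders for a real converted model and camera frame.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="classifier.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

frame = np.random.rand(1, 224, 224, 3).astype(np.float32)   # stand-in for a camera frame
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
probs = interpreter.get_tensor(output_details[0]["index"])
print(probs.argmax())   # predicted class, computed entirely on the device
```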

Summary

Computer vision technology continues to advance as AI becomes more integrated into our daily lives, and it appears more and more often in the latest news headlines. As this technology scales, the demand for specialists with knowledge of computer vision systems will also rise, driven by trends in cloud computing services, AutoML pipelines, transformers, mobile-focused DL libraries, and computer vision mobile applications.

In 2022, increased development in augmented reality and VR applications will enable computer vision developers to extend their skills into new domains, like developing intuitive and efficient methods of replicating and interacting with real objects in 3D space. Looking ahead, computer vision applications will continue to change and influence the future.

Categories
Offsites

Olympiad level counting

Categories
Misc

Newbie could use some guidance

I’ve been reading about TensorFlow for a few days now but it’s pretty overwhelming and I’m at that stage where everything I read makes me more confused and I need to get my foot in the door. I’m trying to use AI as part of a project I’m working on and maybe use it for other things in the future.

So what I’m working on needs to perform classification on time series. Most time series stuff is about forecasting, and what little I’ve found on time series classification only seems to have one attribute. I have multiple time series which have about 45 columns and about 250 rows and I’m reading them in from an SQL database and I will put them into NumPy for TF to use. I also have a few other little bits of data which may help with the prediction but aren’t really part of the time series. So each example will consist of a 45×250 array and a 3×1 array. How do I start with this?

I understand Flatten will turn the big array into an 11250×1 array, then I could (somehow) join it with the 3×1 array. Will this mess anything up? Does TensorFlow need to understand that these are values that are changing over time, or does it not care?

I also have a column which has the day of the week in text. Do I need to do something with this so that TF understands it? I figure I need to turn it into an integer, but then TF needs to understand that it only goes from 1 to 7 and that after 7 it loops back to 1, rather than 8.

Last of all, I need to understand how to build my neural network but don’t really have a clue how to do this. Are there any recommended guides out there for someone who’s coming from a Python programming background rather than a statistical analysis background?
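One common way to handle both points above is a cyclical sine/cosine encoding for the day of week plus a two-input Keras model that concatenates the flattened series with the extra values. A minimal sketch (the layer sizes and number of classes are placeholder guesses):

```python
import numpy as np
import tensorflow as tf

# Cyclical encoding: map day-of-week 1-7 onto a circle so that 7 sits next to 1.
day = np.arange(1, 8)
day_sin = np.sin(2 * np.pi * (day - 1) / 7)
day_cos = np.cos(2 * np.pi * (day - 1) / 7)

# Two-input model: flatten the 250x45 time series, then concatenate the 3 extra values.
series_in = tf.keras.Input(shape=(250, 45))
extra_in = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Flatten()(series_in)           # 250 * 45 = 11250 values
x = tf.keras.layers.Concatenate()([x, extra_in])   # 11253 values total
x = tf.keras.layers.Dense(64, activation="relu")(x)
out = tf.keras.layers.Dense(2, activation="softmax")(x)   # 2 classes as a placeholder

model = tf.keras.Model([series_in, extra_in], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```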

Sorry for the stupid questions, I appreciate any help I can get.

submitted by /u/TwinnieH

Categories
Misc

Energy Grids Plug into AI for a Brighter, Cleaner Future

Electric utilities are taking a course in machine learning to create smarter grids for tough challenges ahead. The winter 2021 megastorm in Texas left millions without power. Grid failures the past two summers sparked devastating wildfires amid California’s record drought. “Extreme weather events of 2021 highlighted the risks climate change is introducing, and the importance Read article >

The post Energy Grids Plug into AI for a Brighter, Cleaner Future appeared first on NVIDIA Blog.

Categories
Misc

A couple of questions about an UNET CNN implementation in Tensorflow

So I am a beginner in this field. For a summer project, I took datasets of chest CT scans and lesions and aimed to segment them, and to diagnose specific ILDs (interstitial lung diseases) based on them. My chest segmentation model runs very well (acc > 98%), but it’s the lesion segmentation part (which I hope will diagnose specific diseases) which is giving me problems. My model is a multi-class model with 17 classes (same as labels and features too, right?) such as ground_glass, fibrosis, bronchial_wall_thickening and so on, and the way it works is that if the input has a specific set of these features, a specific disease can be diagnosed. 17 classes seem too much for my system (32 GB DDR4, RTX 2060 mobile), and the code crashes during the train-test split part. The code runs well if I do not read the full dataset (which contains 1724 train and 431 test images, all 512x512x1), but then I get confused about which classes are being processed and how significant the parameter values are. How should I proceed to run my model without my IDE crashing due to RAM overload, and will Colab Pro do the trick? Also, what can I optimize in my code? Will resizing the images to 128x128x1 help? And if yes, how do I proceed with that?
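One way to avoid loading everything into RAM is a streaming tf.data pipeline that reads and resizes images on the fly. A minimal sketch under the shapes described above (the file lists and batch size are placeholders):

```python
import tensorflow as tf

train_image_paths = ["ct_0001.png"]    # placeholder; in practice, the full list of 1724 files
train_mask_paths = ["mask_0001.png"]

def load_pair(image_path, mask_path):
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=1)
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    image = tf.image.resize(image, [128, 128]) / 255.0
    # Nearest-neighbor keeps the mask values as valid class IDs (0-16).
    mask = tf.image.resize(mask, [128, 128], method="nearest")
    return image, mask

train_ds = (
    tf.data.Dataset.from_tensor_slices((train_image_paths, train_mask_paths))
    .map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(8)                      # small batches keep GPU memory in check
    .prefetch(tf.data.AUTOTUNE)
)
```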

P.S: Here’s my code (the dataset is not uploaded, haha, but my thought process would be better understood there)

P.Sx2: Also posted this on the DeepLearning sub, my apologies if you had to read this twice.

submitted by /u/Intelligent-Aioli-43

Categories
Misc

Upcoming Event: How to Build an Edge Solution

Join us on June 16 to take a deep dive into AI at the edge and learn how you can build an edge computing solution that delivers real-time results.

Categories
Misc

What is Extended Reality?

Advances in extended reality have already changed the way we work, live and play, and it’s just getting started. Extended reality, or XR, is an umbrella category that covers a spectrum of newer, immersive technologies, including virtual reality, augmented reality and mixed reality. From gaming to virtual production to product design, XR has enabled people Read article >

The post What is Extended Reality? appeared first on NVIDIA Blog.

Categories
Misc

Best Practices: Explainable AI Powered by Synthetic Data

Learn how financial institutions are using high-quality synthetic data to validate explainable AI models and comply with data privacy regulations.

Data sits at the heart of model explainability. Explainable AI (XAI) is a rapidly advancing field looking to provide insights into the complex decision-making processes of AI algorithms.

Where AI has a significant impact on individuals’ lives, like credit risk scoring, managers and consumers alike rightfully demand insight into these decisions. Leading financial institutions are already leveraging XAI for validating their models. Similarly, regulators are demanding insight into financial institutions’ algorithmic environment. But how is it possible to do that in practice?

Pandora’s closed box

The more advanced AI gets, the more important data becomes for explainability.

Modern-day ML algorithms rely on ensemble methods and deep learning that result in thousands, if not millions, of model parameters. They are impossible to grasp without seeing them in action when applied to actual data.

The need for broad access to data is apparent even and especially in cases where the training data is sensitive. Financial and healthcare data used for credit scoring and insurance pricing are some of the most frequently used, but also some of the most sensitive data types in AI.

It’s a conundrum of opposing needs: You want your data protected and you want a transparent decision.

Explainable AI needs data

So, how can these algorithms be made transparent? How can you judge model decisions made by machines? Given their complexity, disclosing the mathematical model, implementation, or the full training data won’t serve the purpose.

Instead, you have to explore a system’s behavior by observing its decisions across a variety of actual cases and probe its sensitivity with respect to modifications. These sample-based, what-if explorations help our understanding of what drives the decision of a model.

This simple yet powerful concept of systematically exploring changes in model output given variations of input data is also referred to as local interpretability, and it can be performed in a domain- and model-agnostic manner at scale. Thus, the same principle can be applied to help interpret credit-scoring systems, sales demand forecasts, fraud detection systems, text classifiers, recommendation systems, and more.

However, local interpretability methods like SHAP require access not only to the model but also to a large number of representative and relevant data samples.

Figure 1 shows a basic demonstration, performed on a model, predicting customer response to marketing activity within the finance industry. Looking at the corresponding Python calls reveals the need for the trained model, as well as a representative dataset for performing these types of analyses. However, what if that data is actually sensitive and can’t be accessed by AI model validators?

Figure 1. Example of model explainability through SHAP using actual data: driver analysis and dependency plots built with a local interpretability method
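The sketch below illustrates the kind of Python calls behind Figure 1: local interpretability with SHAP requires both the trained model and representative data. The model and features here are stand-ins generated on the fly, not the actual response model.

```python
# Hedged sketch of SHAP-based driver analysis and dependency plots on stand-in data.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["age", "income", "tenure"])
y = (X["income"] + 0.5 * X["tenure"] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.Explainer(model, X)   # the explainer needs the model AND representative data
shap_values = explainer(X)             # ...plus samples to explain

shap.plots.beeswarm(shap_values)               # driver analysis across all features
shap.plots.scatter(shap_values[:, "income"])   # dependency plot for a single feature
```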

Synthetic data for scaling XAI across teams

In the early days of AI adoption, it was typically the same group of engineers who developed models and validated them. In both cases, they used real-world production data.

Given the real-world impact of algorithms on individuals, it is now increasingly understood that independent groups should inspect and assess models and their implications. These people would ideally bring diverse perspectives to the table from engineering and non-engineering backgrounds.

External auditors and certification bodies are being contracted to establish additional confidence that the algorithms are fair, unbiased, and nondiscriminative. However, privacy concerns and modern-day data protection regulations, like GDPR, limit access to representative validation data. This severely hampers model validation being broadly conducted.

Fortunately, model validation can be performed using high-quality AI-generated synthetic data that serves as a highly accurate, anonymized, drop-in replacement for sensitive data. For example, MOSTLY AI’s synthetic data platform enables organizations to generate synthetic datasets in a fully self-service, automated manner.

Figure 2 shows the XAI analysis being performed for the model with synthetic data. There are barely any discernible differences in results when comparing Figure 1 and Figure 2. The same insights and inspections are possible by leveraging MOSTLY AI’s privacy-safe synthetic data, which finally enables true collaboration to perform XAI at scale and on a continuous basis.

Figure 2. Example of model explainability through SHAP using synthetic data: driver analysis and dependency plots built with a local interpretability method

Figure 3 shows the process of scaling model validation across teams. An organization runs a state-of-the-art synthetic data solution within their controlled compute environment. It continuously generates synthetic replicas of their data assets, which can be shared with a diverse team of internal and external AI validators.

Figure 3. Process flow for model validation through synthetic data: financial institutions use real-world data to generate synthetic data for external AI auditing and validation

Scaling to real-world data volumes with GPUs

GPU-accelerated libraries, like RAPIDS and Plotly, enable model validation at the scale required for real-world use cases encountered in practice. The same applies to generating synthetic data, where AI-powered synthetization solutions such as MOSTLY AI can benefit significantly from running on top of a full-stack accelerated computing platform. For more information, see Accelerating Trustworthy AI for Credit Risk Management.

To demonstrate, we turned to the mortgage loan dataset published by Fannie Mae (FNMA) for the purpose of validating an ML model for loan delinquencies. We started by generating a statistically representative synthetic replica of the training data, consisting of tens of millions of synthetic loans, composed of dozens of synthetic attributes (Figure 4).

All data is artificially created, and no single record can be linked back to any actual record from the original dataset. However, the structure, patterns, and correlations of the data are faithfully retained in the synthetic dataset.

This ability to capture the diversity and richness of data is critical for model validation. The process seeks to validate model behavior not only on the dominant majority classes but also on under-represented and most vulnerable minority segments within a population.

Figure 4. A snapshot of real and synthetic samples from the Fannie Mae mortgage loan dataset used to validate ML models for loan delinquencies

Given the generated synthetic data, you can then use GPU-accelerated XAI libraries to compute statistics of interest to assess model behavior.

Figure 5, for example, displays a side-by-side comparison of SHAP values for the loan delinquency model, explained on the real data and on the synthetic data. The same conclusions regarding the model can be reliably derived by using high-quality synthetic data as a drop-in alternative to the sensitive original data.

Figure 5. SHAP values of the ML model for loan delinquencies, compared side by side on real-world and synthetic data

Figure 5 shows that synthetic data serves as a safe drop-in replacement for the actual data for explaining model behavior.
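For illustration, a rough sketch of the Figure 5 check follows; delinquency_model, X_real, and X_synthetic are placeholders for the trained model, the sensitive dataset, and its synthetic replica, with the explainer built the same way as in the earlier SHAP sketch.

```python
# Rough illustration of the Figure 5 check; all names below are placeholders.
import numpy as np
import shap

explainer = shap.Explainer(delinquency_model)                    # placeholder trained model

imp_real = np.abs(explainer(X_real).values).mean(axis=0)         # global importance on real data
imp_synth = np.abs(explainer(X_synthetic).values).mean(axis=0)   # ...and on synthetic data

for name, r, s in zip(X_real.columns, imp_real, imp_synth):
    print(f"{name:20s} real={r:.3f} synthetic={s:.3f}")
```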

Further, the ability of synthetic data generators to yield an arbitrary amount of new data enables you to significantly improve model validation for smaller groups.

Figure 6 shows a side-by-side comparison of SHAP values for a specific ZIP code found within the dataset. While the original data had fewer than 100 loans for that geography, we leverage 10x the data volume to inspect the model behavior in that area, enabling more detail and richer insights.

Figure 6. Richer insights from model validation with synthetic oversampling: SHAP values for a specific ZIP code, compared side by side

Individual-level inspection with synthetic samples

While summary statistics and visualizations are key to analyzing the general model behavior, our understanding of models further benefits from inspecting individual samples on a case-by-case basis.

XAI tooling reveals the impact of multiple signals on the final model decision. These cases need not be actual cases, as long as the synthetic data is realistic and representative.

Figure 7 displays four randomly generated synthetic cases with their final model predictions and the corresponding decomposition across each of the input variables. This enables you to gain insight into which factors contributed, to what extent and in which direction, to the model decision for an unlimited number of potential cases, without exposing the privacy of any individual.

Figure 7. Inspecting model predictions of four randomly sampled synthetic records, with the decomposition of each prediction across the input variables

Effective AI governance with synthetic data

AI-powered services are becoming more present across private and public sectors, playing an ever bigger role in our daily lives. Yet, we are only at the dawn of AI governance.

While regulations, like Europe’s proposed AI Act, will take time to manifest, developers and decision-makers must act responsibly today and adopt XAI best practices. Synthetic data enables a collaborative, broad setting, without putting the privacy of customers at risk. It’s a powerful, novel tool to support the development and governance of fair and robust AI.

For more information about AI explainability in banking, see the following resources:

Categories
Misc

From Cloud to Car: How NIO Develops Intelligent Vehicles on NVIDIA HGX

Building next-generation intelligent vehicles requires an AI infrastructure that pushes the cutting edge. Electric vehicle maker NIO is using NVIDIA HGX to build a comprehensive data center infrastructure for developing AI-powered, software-defined vehicles. With high-performance compute, the automaker can continuously iterate on sophisticated deep learning models, creating robust autonomous driving algorithms in a closed-loop environment. Read article >

The post From Cloud to Car: How NIO Develops Intelligent Vehicles on NVIDIA HGX appeared first on NVIDIA Blog.

Categories
Misc

Issue with Val accuracy never changing

I’m building a model that categorizes 1/2s of data at a 256 sample rate; the input has a shape of (128, 4) and is flattened. The output has 9 different classes. I can’t seem to correctly tune the hidden layers. As soon as I start training, training accuracy jumps to 95-ish percent with a very low loss, but my val accuracy is 0.1111, which is what I’d expect from guessing a random class, and my val loss is always much, much higher than training loss.

Should I use a different optimizer? Different activations? What do you guys recommend?

submitted by /u/RhinoGaming1187