A one-liner: For the DevOps nerds, AutoDeploy allows configuration-based MLOps.
For the rest: So you’re a data scientist and have the greatest model on planet Earth to classify dogs and cats! :) What next? It’s a steep learning curve from building your model to getting it to production: MLOps, Docker, Kubernetes, asynchronous processing, Prometheus, logging, monitoring, versioning, and much more. The immediate next thoughts and tasks are:
How do you get it out to your consumers to use as a service?
How do you monitor its use?
How do you test your model once deployed? And it can get trickier once you have multiple versions of your model. How do you perform A/B testing?
Can I configure custom metrics and monitor them?
What if my data distribution changes in production? How can I monitor data drift?
My models use different frameworks. Am I covered? … and many more.
What if you only had to configure a single file and could get up and running with a single command? That is what AutoDeploy is!
Read our documentation to learn how to get set up and start serving your models.
AI is at play on a global stage, and local developers are stealing the show. Grassroot communities are essential to driving AI innovation, according to Kate Kallot, head of emerging areas at NVIDIA. On its opening day, Kallot gave a keynote speech at the largest AI Expo Africa to date, addressing a virtual crowd of ...
The transportation industry is adding more torque toward realizing autonomy, electrification and sustainability. That was a key takeaway from Germany’s premier auto show, IAA Mobility 2021 (Internationale Automobil-Ausstellung), which took place this week in Munich. The event brought together leading automakers, as well as execs at companies that deliver mobility solutions spanning from electric vehicles ...
Running PyCaret on GPU not only streamlines model building but also offsets the time cost.
PyCaret is a low-code Python machine learning library based on the popular Caret library for R. It automates the data science process from data preprocessing to insights, such that short lines of code can accomplish each step with minimal manual effort. In addition, the ability to compare and tune many models with simple commands streamlines efficiency and productivity with less time spent in the weeds of creating useful models.
This post will go over how to use PyCaret on GPUs to save both development and computation costs by an order of magnitude.
All benchmarks were run with nearly identical code on a machine with a 32-core CPU and four NVIDIA Tesla T4s. For simplicity, GPU code was written to run on a single GPU.
Getting started with PyCaret
Using PyCaret is as simple as importing the library and executing a setup statement. The setup() function creates the environment and offers a host of preprocessing features all in one go.
After a simple setup, a data scientist can develop the rest of their pipeline, including data preprocessing/preparation, model training, ensembling, analysis, and deployment. After the data is prepared, a great place to start is by comparing models.
True to PyCaret’s ethos of simplicity, we can compare a host of standard models to see which are best for our data with a single line of code. The compare_models command trains all the models in PyCaret’s model library using default hyperparameters and evaluates performance metrics using cross-validation. A data scientist can then select the models they’d like to use, tune, and ensemble based on this info.
Figure 1: Output of the compare_models command in PyCaret.
Models are sorted best to worst, and PyCaret highlights the top results in each metric category for ease of use.
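To make the idea behind compare_models concrete, here is a minimal pure-Python sketch of cross-validated model comparison. It is not PyCaret's implementation: the three toy models, the synthetic data, and the names cross_val_mae and compare are assumptions made purely for illustration.

```python
import random
import statistics


def mean_model(train_x, train_y):
    """Baseline: always predict the mean of the training targets."""
    mu = statistics.fmean(train_y)
    return lambda x: mu


def linear_model(train_x, train_y):
    """One-feature least-squares line fit."""
    mx, my = statistics.fmean(train_x), statistics.fmean(train_y)
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx
    return lambda x: my + slope * (x - mx)


def nearest_neighbor_model(train_x, train_y):
    """Predict the target of the single closest training point."""
    pairs = list(zip(train_x, train_y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cross_val_mae(fit, xs, ys, folds=5):
    """Mean absolute error averaged over k folds."""
    fold_scores = []
    for k in range(folds):
        test_idx = set(range(k, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        predict = fit(train_x, train_y)
        errors = [abs(predict(xs[i]) - ys[i]) for i in sorted(test_idx)]
        fold_scores.append(statistics.fmean(errors))
    return statistics.fmean(fold_scores)


def compare(models, xs, ys, folds=5):
    """Score every model with the same folds and sort best (lowest MAE) first."""
    scores = {name: cross_val_mae(fit, xs, ys, folds) for name, fit in models.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Synthetic data (assumption): a noisy straight line.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

leaderboard = compare(
    {"mean": mean_model, "linear": linear_model, "nearest_neighbor": nearest_neighbor_model},
    xs,
    ys,
)
for name, mae in leaderboard:
    print(f"{name:18s} MAE={mae:.3f}")
```

Every candidate is trained with its defaults and scored on the same folds, so the sorted leaderboard is a like-for-like comparison, which is the property that makes a single comparison command useful as a starting point.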
Accelerating PyCaret with RAPIDS cuML
PyCaret is a great tool for any data scientist to have in their arsenal, as it streamlines model building and makes running many models easy. PyCaret can be made even better with GPUs. Since PyCaret does so much work behind the scenes, seemingly simple commands can take a long time. For example, we ran the preceding commands on a dataset with roughly half a million instances and over 90 attributes (UC Irvine’s Year Prediction MSD dataset). On the CPU, it took over 3 hours. On a GPU, it took less than half that.
In the past, using PyCaret on a GPU would have required a lot of manual coding, but thankfully, the PyCaret team has integrated the RAPIDS machine learning library (cuML), meaning you can use the same simple API that makes PyCaret so effective while also using the computational ability of your GPU.
Running PyCaret on a GPU tends to be much faster, meaning you can make full use of everything PyCaret has to offer without balancing time costs. Using the same dataset just mentioned, we tested PyCaret ML functionality on both a CPU and a GPU, including comparing, creating, tuning, and ensembling models. Performing the switch to GPU is simple: set use_gpu to True in the setup function.
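As a minimal sketch of that switch, the snippet below wraps setup and compare_models from pycaret.regression in a helper. The helper name compare_on_gpu and its arguments are hypothetical, the regression module is an assumption (the benchmark dataset predicts a year), and the exact setup arguments can vary between PyCaret versions.

```python
def compare_on_gpu(data, target, use_gpu=True):
    """Set up a PyCaret regression experiment and compare all models.

    `data` is a pandas DataFrame and `target` is the name of the label
    column. Both are supplied by the caller; nothing here is specific to
    the benchmark dataset used in this post.
    """
    # Imported lazily so this module can be loaded on machines without PyCaret.
    from pycaret.regression import compare_models, setup

    # use_gpu=True asks PyCaret to use GPU-enabled estimators where available.
    setup(data=data, target=target, use_gpu=use_gpu)

    # Trains the model library with default hyperparameters and returns the best model.
    return compare_models()
```

Calling compare_on_gpu(df, "label", use_gpu=False) on the same data gives the CPU baseline, so the only difference between the two runs is the flag.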
With PyCaret set to run on GPU, it uses cuML to train all of the following models:
Logistic Regression
Ridge Classifier
Random Forest
K Neighbors Classifier
K Neighbors Regressor
Support Vector Machine
Linear Regression
Ridge Regression
Lasso Regression
K-Means Clustering
Density-Based Spatial Clustering
Running the same compare_models code solely on GPU was over 2.5 times as fast.
The impact was even greater on a model-by-model basis with popular but computationally expensive models. The K Neighbors Regressor, for example, was 265 times as fast on GPU.
Figure 2: Comparison of common PyCaret actions run on CPU versus GPU.
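"Times as fast" figures like these are the ratio of the baseline wall-clock time to the accelerated wall-clock time. The sketch below shows one way to measure that ratio for any two callables. The helper names and the stand-in workloads are assumptions for illustration and are not the benchmark code behind the figures above.

```python
import time


def time_call(fn, *args, **kwargs):
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn(*args, **kwargs)
    return time.perf_counter() - start


def speedup(baseline_seconds, accelerated_seconds):
    """How many times as fast the accelerated run is compared with the baseline."""
    return baseline_seconds / accelerated_seconds


# Stand-in workloads (hypothetical): replace with the CPU and GPU versions
# of the same PyCaret command when benchmarking for real.
def slow_workload():
    return sum(i * i for i in range(200_000))


def fast_workload():
    return sum(i * i for i in range(20_000))


slow_seconds = time_call(slow_workload)
fast_seconds = time_call(fast_workload)
print(f"measured speedup: {speedup(slow_seconds, fast_seconds):.1f}x")
```

For a fair comparison, both runs should use the same data, the same code path, and an otherwise idle machine.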
Impact
The simplicity of PyCaret’s API frees up time that would otherwise be spent coding, so data scientists can run more experiments and fine-tune them. When paired with GPUs, this impact is even greater, as the computation costs of taking full advantage of PyCaret’s suite of evaluation and comparison tools are significantly lower.
Conclusion
Extensively comparing and evaluating models can help improve the quality of your results, and doing so efficiently is exactly what PyCaret is for. PyCaret on GPU offsets the time costs that go along with so much processing.
The goal of RAPIDS is to accelerate your data science, and PyCaret is among a growing list of libraries whose compatibility with the RAPIDS suite can help bring a new layer of efficiency to your machine learning pursuits.
Combining mentoring, socializing, and specialized training proved key for the virtual 2021 KISTI GPU Hackathon.
Due to the coronavirus, the 2021 Korea Institute of Science and Technology Information (KISTI) GPU Hackathon was held virtually, under the guidance of expert mentors from KISTI, NVIDIA, and the OpenACC Organization. With the goal of inspiring possibilities for scientists to accelerate their AI research or HPC codes, the hackathon provided opportunities for solving research problems and expanding expertise using NVIDIA GPU parallel computing technology.
Hackathons are known for being face-to-face events, so a virtual format poses its own challenges for both attendees and hosts. The new format also required juggling a diverse set of teams: three HPC and AI teams, four higher education and research teams, and two industry teams.
The event team found the following recipe helped create a meaningful and successful experience for the participants:
Mentoring
Based on their expertise in specific domains or programming languages, dedicated mentors were paired with teams for guidance in setting goals and considering different approaches. The mentors worked collaboratively to solve problems and troubleshoot obstacles the teams encountered. Daily mentor sync-up calls kept everyone focused and working toward the best strategy for meeting their goals.
Figure 1. KISTI GPU Hackathon 2021.
Socializing
Everyone knows that all work and no play can actually hinder a team’s productivity. The hackathon provided a TGIF social hour session for participants and mentors. Using the Metaverse Gather Town Space, mentors and teams shared experiences, recharged their batteries, and developed connections that helped them continue forward for the duration of the event.
Figure 2. The TGIF social hour.
Resources and Live Seminars
Another important ingredient for success was making specialized training and resources available to attendees. For example, an NVIDIA Deep Learning Institute (DLI) workshop covering CUDA C/C++ topics was presented by a DLI ambassador and mentor. Other mentors provided team-dedicated tech sessions focused on TensorRT (TRT) and NVIDIA Triton, OpenACC, and Nsight Systems for profiling, parallel computing, and optimization.
Figure 3. PaScaL team working on their project.
Hard Work Pays Off
The PaScaL team from Yonsei University is developing a thermal fluid solver that efficiently calculates the thermal motion of turbulence. At this hackathon, the team ported existing CPU-based code to a multi-GPU environment using OpenACC and the cuFFT library. This made the RHS (right-hand side, fractional step) calculation, one of the most time-consuming subroutines, 4.84 times faster.
The Amore Opt team from the cosmetics company AmorePacific worked on GPU optimization of the DeepLabV3+ segmentation model. By applying what they learned about the TensorRT inference optimizer and NVIDIA Triton Inference Server, they made inference 26 times faster while maintaining the accuracy of their AI models, which detect skin problems for a future large-scale customer service.
Video 1. TFC Team interview for KISTI Hackathon.
The TFC team from Seoul National University joined a project to accelerate an in-house, CPU-based Fortran fluid calculation code. By using NVIDIA GPUs at KISTI, the team accelerated the time-consuming Tri-Diagonal Matrix Algorithm (TDMA) for the thermal and momentum solvers and the Fast Fourier Transform (FFT) for pressure solver calculations. They achieved an 11.15 times speedup on a single V100 GPU.
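For readers unfamiliar with TDMA, it is commonly known as the Thomas algorithm: a forward elimination sweep followed by back substitution on a tridiagonal system. The serial Python sketch below is only a reference illustration of that textbook algorithm, not the team's Fortran or GPU code, and the function name and argument layout are assumptions.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm (TDMA).

    lower[i] is the sub-diagonal entry of row i (lower[0] is unused),
    diag[i] is the main-diagonal entry of row i,
    upper[i] is the super-diagonal entry of row i (upper[-1] is unused).
    No pivoting is done, so the matrix should be diagonally dominant.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward sweep: eliminate the sub-diagonal.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


# Small 1D diffusion-style system: 2 on the diagonal, -1 off the diagonal.
solution = solve_tridiagonal(
    lower=[0.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, 0.0],
    rhs=[1.0, 0.0, 1.0],
)
print(solution)
```

Each step of both sweeps depends on the previous one, which is why a single tridiagonal solve is inherently sequential and why GPU versions typically get their parallelism from solving many independent systems at once.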
NVIDIA Inception member Nota and Hanyang University teamed up to optimize the Nota model compression engine by leveraging the Tensor Cores in NVIDIA GPUs for INT4 quantization. Named NOTA-HYU, the team learned to use the NVIDIA profiling tools Nsight Systems and Nsight Compute. They then applied the NVIDIA CUTLASS library to achieve an overall 1.85 times speedup for their residual block with CUDA optimization.
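As background on what INT4 quantization means, the sketch below maps floating-point values onto the 16 signed 4-bit levels from -8 to 7 using a single scale factor. The symmetric per-tensor scheme and the function names are assumptions chosen for simplicity. This is not Nota's compression engine and it does not use CUTLASS or Tensor Cores.

```python
def quantize_int4(values):
    """Symmetric per-tensor quantization to signed 4-bit integers in [-8, 7]."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude onto 7 so that +max_abs and -max_abs are both representable.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the 4-bit codes."""
    return [q * scale for q in quantized]


weights = [-3.5, -1.0, 0.0, 0.6, 3.5]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
print(codes, scale, restored)
```

Each value now needs 4 bits instead of 32, at the cost of a rounding error of at most half the scale per value, which is the trade-off that a quantized inference pipeline has to keep within its accuracy budget.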
I just started using TensorFlow and Keras not long ago, and I really like the field of deep learning. Right now I am doing it as more of a hobby than anything, and I recently learned about the concepts of transfer learning and fine-tuning. I tried to apply them to a dataset of microscopic images using the tutorial here: https://www.tensorflow.org/tutorials/images/transfer_learning.
I am using ResNet50 with the ImageNet weights, but am far from getting good results. I think it might be because of the learning rate OR because of the activation function in my last layer OR because of the fact that I use the Adam optimizer and not SGD.
Learn how AI is transforming the retail industry through enabling intelligent stores, omnichannel management, and automated supply chains.
Global retailers and suppliers are faced with navigating rapidly changing consumer demand, behavior, and expectations. These changes are driving uncertainty in forecasting and putting pressure on global supply chains as well as omnichannel and store operations. Real-time agility is required to navigate the rapidly evolving challenges for the retail industry.
Artificial intelligence offers a powerful solution for retailers to rapidly and more accurately forecast daily demand and automate supply chain logistics. This has a major impact on in-store supply and last-mile delivery cost challenges, while meeting consumer expectations on delivery timelines. For online shoppers, AI is helping create personalized shopping journeys and product recommendations. In-store, retailers are using AI to reduce shrinkage and stockout, while creating frictionless experiences and ensuring the health and safety of employees and shoppers.
The financial opportunity is significant across the $26 trillion retail industry, which historically has averaged a 2% net profit margin. An estimated 3X increase in profit margin with AI-enabled solutions would lead to over a $1 trillion increase in annual revenue for retailers, according to analysis by the McKinsey Global Institute. From customer engagement, to operational agility, to seamless omnichannel management, AI at the edge is transforming retail.
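The arithmetic behind that estimate can be checked in a few lines. The sketch assumes industry revenue stays constant and reads "3X increase in profit margin" as the margin rising from 2% to three times that; both are assumptions about how the estimate was built, not statements from the cited analysis.

```python
industry_revenue = 26e12   # $26 trillion retail industry (from the text)
net_margin = 0.02          # 2% historical net profit margin (from the text)
margin_multiplier = 3      # "3X" profit margin (from the text)

baseline_profit = industry_revenue * net_margin
improved_profit = industry_revenue * net_margin * margin_multiplier
added_profit = improved_profit - baseline_profit

print(f"baseline profit: ${baseline_profit / 1e12:.2f} trillion")
print(f"improved profit: ${improved_profit / 1e12:.2f} trillion")
print(f"added per year:  ${added_profit / 1e12:.2f} trillion")
```

Under these assumptions the added amount is net profit, and it comes out slightly above $1 trillion per year, in line with the estimate quoted above.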
Retailers Adopt AI for Improved In-Store Experiences, Less Shrinkage
Retailers are leveraging AI at the edge to analyze data from in-store cameras and sensors, and create intelligent stores. One application alerts store associates when shelf inventory levels are low, reducing the impact of stockout. Another is decreasing shrinkage—the loss of inventory from theft, errors, fraud, waste, and damage—which costs the industry an estimated $100 billion per year globally. AI helps retailers protect their assets with store analytics that monitor points-of-sale and floor merchandise to prevent ticket switching, misscans, and shoplifting.
Deep North, a computer vision startup, uses intelligent video analytics to power in-store analytics, providing insights into customer traffic and heatmaps and determining dwell times, queue and wait times, conversion metrics, demographics, and more. These insights help retailers improve inventory planning, store layout, and merchandising, as well as the customer experience and conversion.
Frictionless store concepts are being tested where shoppers can skip the checkout lines entirely and be automatically billed for their purchases. AiFi is a leader in these AI-enabled autonomous stores from nano stores to full-sized grocery chains. These “grab-and-go” stores are rising in popularity and locations are projected to expand 4X in the next 3 years.
Figure 1. AiFi provides autonomous shopping solutions leveraging computer vision powered by NVIDIA Metropolis for in-store analytics and the NVIDIA EGX platform for AI at the edge.
AI Personalizes Online Shopping
Personalized experiences are critical in ecommerce, which can account for as much as 30% of revenue for the world’s largest retailers. Underlying ecommerce are sophisticated recommender systems (RecSys). GPU-powered machine learning and deep learning enable recommenders by learning how customers shop, personalizing their shopping experience, and finding related items that shoppers are most likely to purchase. NVIDIA has deep domain expertise with our data science teams winning three global RecSys competitions in 5 months.
To support the online personalization and recommendations, global retailers are using AI to automatically generate metadata for new items listed in their vast digital catalogs. With up to millions of new products to onboard, comprehensively and accurately labeling and describing every product is a daunting task. AI quickly produces accurate, comprehensive, and engaging product content that a RecSys engine uses to provide personalized recommendations that attract a shopper’s interest.
Visual search is a key trend for retailers to provide a customer-centric experience with product search and discovery. Clarifai, a startup and NVIDIA Inception member, uses computer vision and AI to deliver more relevant search results and hyperpersonalized product recommendations quickly. With “snap-and-search” capabilities, shoppers are able to take a photo of a product they’re interested in with their mobile device and automatically have the photo matched to a product catalog. Related products to complete their desired outfit, room, or other look are also shared.
Smart virtual assistants, used by hundreds of millions of users each month, are also driving growth in ecommerce. Retailers are looking to improve the digital customer experience by introducing voice ordering to replace text-based searches with voice commands. With voice ordering, shoppers can easily search for items, ask for product information, and place online orders using smart speakers or other intelligent mobile devices.
Creating a Resilient Global Supply Chain
AI empowers retailers to create more resilient supply chains that respond quickly to changing consumer demand and effectively manage inventory distribution.
Walmart, for example, uses AI to run accurate daily forecasts of millions of item-to-store combinations across thousands of stores in the United States. Built on open-source data processing and machine learning libraries, this AI-powered platform enabled Walmart to get the right products to the right stores more efficiently, react in real time to shopper trends, and realize inventory cost savings at scale.
In smart warehouses, AI improves operational efficiency and throughput with customer orders automatically picked, packed, and shipped by robots. These robots use edge computing and intelligent video analytics to identify, position, and sort packages while adjusting the speed of conveyor belts to minimize product damage and machine downtime.
KION Group simplifies the deployment and management of AI at the edge, including autonomous forklifts and pick-and-place robotics, across thousands of retail distribution centers.
Figure 2. KION uses pick-and-place robots in its intelligent warehouses with AI deployed at the edge.
AI delivers end-to-end visibility that combines data from GPS, weather, traffic, and construction to determine optimal shipping routes. This can significantly reduce overhead costs for last-mile delivery associated with fuel, transportation, and delivery personnel—and can provide more accurate delivery windows that enhance customer service and create more satisfied shoppers.
Future of Edge Computing in Retail and Beyond
Billions of connected sensors are coming online in retail stores, city streets, hospitals, warehouses, and more. By deploying and managing scalable, secure AI at the edge, enterprises can turn this data into faster insights and more robust services and applications.
BlueField-2 offers protection, while delivering high security, integrity, and reliability for the new hybrid cloud era.
Data Processing Units, or DPUs, are the new foundation for a comprehensive and innovative security offering. The hyperscale giants and telecom providers have adopted this strategy for building and securing highly efficient cloud data centers, and it’s now available for enterprise customers. This strategy has revolutionized the approach to minimize risks and enforce security policies inside the data center.
A DPU is an integrated system on a chip (SoC) that combines a high-performance CPU, a network interface, and data center function accelerators into a single ASIC. DPUs use programmable hardware to offload and accelerate inline security services at line rate.
BlueField DPU
The NVIDIA® BlueField®-2 DPU provides the first line of defense against attacks. Internal attacks that try to infiltrate the data center are prevented through an isolated, secure boot process and secure firmware updates. Aimed at accelerating security throughout the data center, DPUs are capable of filtering packets in support of next-generation firewalls and intrusion detection/prevention systems. A DPU can detect, block, and protect sensitive assets and data from threats.
The design offers built-in functional isolation to protect individual workloads and provide flexible control and visibility at the server level, reducing risk, and increasing efficiency. The isolation enables software agents and applications to run securely on the DPU, irrespective of the rest of the system.
Isolated from the host, and leveraging unique hardware capabilities, the DPU can deliver better security by reducing the attack surface and bolstering individual workload isolation—providing additional protection to reduce risks and simplify security management policies.
Furthermore, DPUs offload and accelerate cryptographic operations and offer encryption of data at rest or in motion. This frees the host CPU to run critical applications, and because the DPU is isolated from the application domain, the cryptographic keys stay secure even if the host is compromised.
The Changing Role of Firewall Security
The adoption of a cloud compute model calls for intelligent security solutions capable of delivering maximum performance and agility. In the age of hybrid cloud and virtualized computing, security functions have transformed. They are being deployed within every host to provide strict policy enforcement, as well as visibility into possible attacks.
Previously, protection within the host required running a software security solution or VM appliance, which was slow and consumed CPU resources. As new technology and bandwidth requirements increase, host-based protection now requires hardware performance at line-rate.
The BlueField-2 DPU can serve as such a device, filtering traffic traveling between computers. NVIDIA DPUs employ powerful, programmable hardware and software networking components, with a configurable set of rules to inspect all traffic traversing the connection at line rate. When combined with perimeter security solutions, a DPU can extend security to include host-based protection.
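To illustrate what a configurable set of rules means in practice, here is a minimal software sketch of first-match packet filtering with a default-deny policy. It says nothing about how BlueField-2 is programmed: the Rule class, the decide function, and the addresses are all hypothetical, and a DPU would evaluate such rules in hardware rather than in Python.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                     # "allow" or "deny"
    src: str                        # source network in CIDR notation
    dst: str                        # destination network in CIDR notation
    dst_port: Optional[int] = None  # None matches any destination port

    def matches(self, src_ip, dst_ip, port):
        return (
            ip_address(src_ip) in ip_network(self.src)
            and ip_address(dst_ip) in ip_network(self.dst)
            and (self.dst_port is None or self.dst_port == port)
        )


def decide(rules, src_ip, dst_ip, port, default="deny"):
    """Return the action of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(src_ip, dst_ip, port):
            return rule.action
    return default


# Hypothetical policy: web tier may reach one backend over HTTPS only.
rules = [
    Rule("allow", "10.0.1.0/24", "10.0.2.10/32", 443),
    Rule("deny", "10.0.0.0/8", "10.0.2.0/24"),
]

print(decide(rules, "10.0.1.5", "10.0.2.10", 443))
print(decide(rules, "10.0.1.5", "10.0.2.10", 22))
```

Because the rules are plain data, the policy can be changed without touching the matching logic, which is the sense in which a rule set is configurable.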
The DPU can be used as a platform that sits at the ingress and egress points of each host, adding a new layer where it’s needed most: at the computing edge. Together, perimeter and host-based protection better guard company assets and resources against malicious threats.
Software-based firewalls also place an additional burden on the CPU, reducing available computing resources for application processing and decreasing the number of VMs, applications, and services that can run on a single host. The BlueField-2 DPU security platform, with a software security stack, offloads security services from the host, freeing CPU cycles for applications running on host resources.
Micro-Segmentation
The complexity of security increases with the use of computing and network virtualization technologies. Micro-segmentation is a network security technique in virtualized environments that logically divides a data center into distinct individual security segments at the workload level. At this low level, security controls can be defined, and security services delivered, for each unique segment. BlueField-2 provides a platform for hosting micro-segmentation and network connection tracking services for flow-based network analytics and application-level security for traffic inside the data center.
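The two ideas in this paragraph, segment-level policy and connection tracking, can be sketched in a few lines. The workload names, segments, and the SegmentFirewall class below are hypothetical and exist only to show the logic; they are not a BlueField-2 interface.

```python
class SegmentFirewall:
    """Allow traffic only between permitted segment pairs, plus tracked replies."""

    def __init__(self, segment_of, allowed_pairs):
        self.segment_of = segment_of        # workload name -> segment name
        self.allowed_pairs = allowed_pairs  # set of (source segment, destination segment)
        self.tracked = set()                # established connections as (src, dst)

    def permit(self, src, dst):
        # Reply traffic for an already-tracked connection is allowed.
        if (dst, src) in self.tracked:
            return True
        pair = (self.segment_of[src], self.segment_of[dst])
        if pair in self.allowed_pairs:
            self.tracked.add((src, dst))
            return True
        return False


# Hypothetical three-tier application.
segment_of = {"web-1": "web", "web-2": "web", "app-1": "app", "db-1": "db"}
allowed_pairs = {("web", "app"), ("app", "db")}

firewall = SegmentFirewall(segment_of, allowed_pairs)
print(firewall.permit("web-1", "app-1"))  # web tier may open connections to the app tier
print(firewall.permit("web-1", "db-1"))   # web tier may not talk to the database directly
```

Policy is written once per segment pair rather than per workload, so adding another web server only requires assigning it to the web segment.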
The DPU can run a security software stack without impacting servers or hindering application performance. This hardware-accelerated solution includes security policies and enforcement capabilities at wire speed and is fully isolated from the application workloads themselves.
The decade-old approach of perimeter security defense has reached a tipping point of performance limitations and operational complexities. A new holistic approach to security is needed for enterprises to achieve robust protection. Enterprise data centers are evolving, following the lead of hyperscale and public cloud providers with CDI, and should adopt similar tactics for network security. BlueField-2 can offer protection for all types of workloads against current and future threats, and can deliver the highest security, integrity, and reliability for the new hybrid cloud era.
As the cybersecurity landscape grows in complexity, security experts continue to turn to NVIDIA BlueField-2 as a platform to provide advanced security features.
Our security experts are here to help answer questions about how BlueField-2 can augment and expand your security solution. If you’re a security architect or product manager at a cybersecurity company, contact us.