Edge computing and edge AI are powering the digital transformation of business processes. But, as a growing field, there are still many questions about what exactly needs to be in an edge management platform.
The benefits of edge computing include low latency for real-time responses, using local area networks for higher bandwidth, and storage at lower costs compared to cloud computing.
However, the distributed nature of edge nodes can make managing edge AI complex and challenging. It can be time-consuming and costly when gathering insights from separate locations, installing hardware, deploying software, and maintaining upgrades at individual nodes.
Centralized management platforms are a critical component of a company’s edge AI solution. They enable organizations to deploy and manage industry applications at the edge, automate management tasks, allocate computing resources, update system software over the air, and monitor locations.
However, the entire stack that makes up an edge AI management solution is complicated, making the question of whether to build or buy an edge management platform exceedingly difficult.
In this post, I break down some of the most important factors to consider when evaluating an edge AI solution for your company.
To get started, consider asking the following questions:
What is the problem you’re solving? Clarify the requirements needed for your platform and prioritize them. No solution will be perfect.
What is your budget? Financial resources will inform your approach. Evaluate the cost of using vendor software compared to bringing in resources to your existing team. Management and maintenance costs are also a factor.
What is your timeline? Are there competitive reasons for you to move quickly? Remember to factor in integration and customization.
Benefits of building or buying
Similar to building a home, when building an edge management platform you are part of the entire process and maintain control of the design. This can be extremely beneficial to an enterprise, especially in terms of customization, data control, and security.
However, buying a solution can be a benefit, especially when it comes to ensuring quality and support from a vendor. Faster time-to-market and lower long-term costs are also significant advantages to buying. In the following sections, I lay out the top points for each option.
Benefits of building an edge management solution
Customization
Data control
Security risk
Customization
Understanding business needs is paramount to having a proper edge management solution. In doing your due diligence, you may find specific use cases or edge devices that require lots of customization. In this case, you are better off building the platform yourself.
Data control
Maintaining local storage and control of all critical data could be necessary depending on your business. It is important to ask how the third party will use your proprietary data. By building the platform, you ensure complete access and oversight to important data and business insights. If your data is a vital component of your competitive advantage, it becomes imperative to maintain this information internally.
Security risk
Enterprise-level software companies are the targets, and sometimes victims, of large-scale cyber attacks. These attacks compromise all users of their software, potentially leaking vitally important data or opening up pathways into your network. Building the entire platform in-house enables you to add security to places you deem the most important and limit exposure to any breach that a third party may have.
Benefits of buying an edge management solution
Ensured quality, expertise, and support
Faster time to market
Lower cost
Ensured quality, expertise, and support
Enterprise edge AI management platforms are extremely complex, with many layers. A solution provider is incentivized to ensure that the solution meets your needs. They have dedicated expert resources to build an optimal, enterprise-grade solution as well as provide enterprise support for all issues from low level to critical. This means that the platform not only resolves all your current needs but also solves future issues, and that you have a dedicated resource to call upon when needed.
Faster time to market
Buying can help you deploy an edge computing solution faster. Enterprises across the world are working to find the best way to manage all their disparate edge nodes. It would be a competitive disadvantage to wait several months to build a quality solution.
Being an early adopter of edge AI management software can also give you a competitive advantage. You’re able to realize insights from your data in nearly real time and deploy or update new AI applications faster.
Lower cost
Enterprise software often has usage-based pricing, which can lower long-term expenses. Providers are able to spread maintenance and support costs across many customers, which is something you are unable to do in-house. Purchasing enterprise-grade software under such pricing is an operating expense as opposed to a capital expenditure. In the long run, it tends to be cost-effective to purchase.
Risks of building or buying
There are also downsides to consider. There is some assumed risk with building your own solution. These risks—specifically around quality, opportunity cost, and support—can hinder development and slow down business growth.
But nothing comes without risk, and buying a solution is no exception. The risks of buying can be summarized in three main buckets: potential data leaks; a solution that doesn’t meet your needs; and trusting someone else to do the job. In the following section, I examine these risks in detail.
Risks of building an edge management solution
Quality compromise
Technical debt
Opportunity cost
Long-term support
Quality compromise
A proper and complete solution must deploy AI workloads at scale, have layered security, and orchestrate containers, among other things. There is a tremendous amount of detail required to have a complete edge management platform. While this may seem simple to create, the many layers of complex software below the user interface could require an outside expert to solve your problem.
Technical debt
Another option is to extend your current solution to support edge computing and AI, but that often brings more trouble than benefit. It could be costly, with additional licensing costs, and may not encompass all the benefits and features needed. A loop of continual repairs rather than rip and replace is not only costly but also time-consuming, leaving you with a platform that does not perform as needed.
Opportunity cost
Even in cases that do not require bringing in outside developers, the existing team may be of better value in building unique and custom AI applications for use cases rather than the platform. A solution provider can also offer expertise in edge computing and management, saving you time bringing the solution to market while meeting all your requirements.
Long-term support
By building your own solution, you also take on the cost of maintenance and support. Those costs rise as more applications and users come onto the platform. This can strain your IT personnel and end users, while also growing operating expenses and lowering your net income.
Risks of buying an edge management solution
Access to private data
Unmet requirements
Market changes
Access to private data
The solution provider becomes a responsible owner for several components of the edge compute stack and could have access to some edge data. If there is data vital to your company’s competitive advantage, this is a risk you must consider.
Unmet requirements
The vendor’s solution may not meet the exact needs of your organization. You may have a niche or unique need that off-the-shelf products cannot solve. These could include specific connectivity, firewall, or provisioning issues limiting your ability to use a service provider.
Market changes
Using a third party could leave you vulnerable to any changes that the third party makes on their own. They could decide to leave the market or may struggle with market shifts, leaving you exposed and without a trusted partner.
Choosing the right edge management solution
A lot goes into a quality edge AI management platform. While you still may be thinking through the best option, one approach to consider is a hybrid model, where you buy the primary solution but build out customizations for your organization’s needs.
This is only possible if the provider’s solution has APIs for integration. Be sure to ask if integration into other management tools and the wider ecosystem is possible. Also, when performing due diligence, ask about on-premises storage of local app data to minimize any data concerns.
The most important thing is to understand the capabilities of both the vendor and your own organization. Work closely with the vendor, ask for demos, ask questions about the flexibility of the pricing structure, and ensure it is a collaborative effort between all parties that are involved.
NVIDIA works with many customers who have chosen to build their own edge solutions and also offers the edge management platform NVIDIA Fleet Command. Fleet Command is a cloud service that enables the management of distributed edge computing environments at scale.
A proof-of-concept (POC) is the first step towards a successful edge AI deployment.
Companies adopt edge AI to drive efficiency, automate workflows, reduce cost, and improve overall customer experiences. As they do so, many realize that deploying AI at the edge is a new process that requires different tools and procedures than the traditional data center.
Without a clear understanding of what distinguishes a successful and unsuccessful edge AI solution, organizations often succumb to common pitfalls, starting in the POC process.
In fact, Gartner details that by 2025, 50% of edge computing solutions deployed without an enterprise edge computing strategy in place will fail to meet goals in deployment time, functionality, or cost.
As the leading AI infrastructure company, NVIDIA has helped countless organizations, customers, and partners successfully build their edge AI POCs. This post details the common edge AI POC challenges and solutions.
Before you start
The first decision that an organization makes before starting the process is to determine whether to buy a solution from an AI software vendor or to build their own.
Typically, companies that do not have in-house AI expertise partner with a software vendor. Vendors have insight into the best practices and can provide guidance to make the POC process as streamlined and cost-effective as possible.
Companies that have the technical capability can build a custom solution at a lower cost.
Defining the steps from development to production
While the process of developing and deploying an application may vary for different organizations, most organizations follow this process:
AI model development
Hands-on trial
Proof of concept
Production
AI model development
Your data requirements depend on whether you’re using pretrained models or building from scratch. Even when an AI application is purchased, most models must still be retrained on labeled data from your environment to achieve the desired accuracy.
Some data sources may include raw data from sensors at the edge, synthetic data, or crowdsourced data. Expect data collection to be the most time-consuming task of model development, followed by optimizing the training pipeline.
The purpose of this phase is to prove the feasibility of the project and model accuracy, not to get production-level performance. This phase is ongoing, as the model is continually retrained as new data is collected.
Hands-on trial
The more prepared organizations are for their POC, the smoother deployments will run. We highly recommend that you use free trials to test different software options before committing to them in the POC phase.
For example, free programs such as NVIDIA LaunchPad provide a curated experience with all of the hardware and software stacks necessary to test and prototype end-to-end solution workflows. Because the same stack can be deployed in production, these trials enable more confident software and infrastructure decisions.
Testing a solution before starting the POC streamlines the overall process and minimizes the common trap of entering a never-ending POC.
Proof of concept
The POC is a 1–3-month engagement where IT requirements are defined, hardware is acquired, and models are trained with company data and deployed in the company’s production environment to limited locations.
Unlike the hands-on trial, the key to this step is incorporating the company’s data rather than just testing standard software and hardware and generic data. The goal of a POC’s validation process is to verify the problem-solution fit, and that the solution can meet business requirements. It acts as the final test before a solution is fully scaled.
Production
In production, the AI model is deployed to every intended location and is fully functioning. Ongoing monitoring is expected.
What are the common challenges?
Following these four steps maximizes the chances of a smooth deployment. Unfortunately, most enterprises get stuck in the POC phase because they did not properly scope out the project, understand the requirements, define the measures of success, or have the correct tools and processes in place.
To get the most out of your POC program, have a solution in mind to combat the following common challenges that enterprises face when deploying AI at the edge:
Misalignment on POC design
Manual management of edge environments
POC creeps into production
Misalignment on POC design
When preparing for a POC project, first set expectations and then align on them. The steps should include identifying a high-value use case to solve, setting the project scope, determining measures of success, and ensuring stakeholder alignment.
High-value use case
Make sure that your problem statement is of high value and can be solved with AI. The key is to recognize which types of problems to hand off to the AI and which problems can be solved through managerial changes, or improved employee training.
Solving a problem that provides high value to your organization helps justify the resources and budget needed to prove the solution’s efficacy and enable scaling. Selecting a low-value use case runs the risk of the project losing focus before a full solution can be rolled out.
Examples of high-value use cases that solve a business problem include improving safety, efficiency, and customer experiences, and reducing costs and waste.
Measures of success
The purpose of a POC is to validate a solution quickly, so it’s important to run a focused POC with clear project goals.
If the success criteria are not properly defined, organizations typically experience the “moving goal post” phenomenon, where they find themselves constantly re-adjusting and re-designing the POC to meet ever-changing goals. A never-ending POC is costly and time-consuming.
The most common measures of success include:
Accuracy: Can the problem be solved with AI? Verify by testing whether the model can reach the desired accuracy. Accuracy is the first metric that should be tested. If model accuracy cannot be reached, then another solution should be put in place.
Latency: Does the solution add value to the overall system or process? It is not enough for a problem to be solvable with AI; the solution must also provide value. For example, if a computer vision application at a manufacturing line works but requires the company to operate the line at 50% speed, the cost of slowing down the manufacturing line is not worth the benefit of using AI.
Efficiency: Is the solution cost-effective? Check whether the solution’s capital expenditures and operating expenditures are more favorable than other solutions. For example, if a network upgrade is necessary for the edge AI model to be effective, is it cheaper just to hire people to inspect products at your manufacturing line?
Defining the POC objectives, scope, and success criteria before executing the POC is the best way to understand whether the selected use case and solution can really achieve the intended benefits.
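One way to keep the goal posts fixed is to write the success criteria down as explicit thresholds before the POC starts. The following is a minimal sketch, not a prescribed method: the class names, field names, and every threshold value are hypothetical placeholders to be replaced with figures that your stakeholders agree on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PocCriteria:
    """Hypothetical success thresholds agreed on before the POC starts."""
    min_accuracy: float       # fraction of correct predictions, 0..1
    max_latency_ms: float     # slowest acceptable response time
    max_cost_per_unit: float  # highest acceptable cost per inspected unit


@dataclass(frozen=True)
class PocMeasurement:
    """Values measured while the POC is running."""
    accuracy: float
    latency_ms: float
    cost_per_unit: float


def evaluate(criteria: PocCriteria, measured: PocMeasurement) -> dict:
    """Return a pass/fail flag for each measure of success."""
    return {
        "accuracy": measured.accuracy >= criteria.min_accuracy,
        "latency": measured.latency_ms <= criteria.max_latency_ms,
        "efficiency": measured.cost_per_unit <= criteria.max_cost_per_unit,
    }


if __name__ == "__main__":
    # Illustrative numbers only.
    criteria = PocCriteria(min_accuracy=0.95, max_latency_ms=50.0, max_cost_per_unit=0.02)
    measured = PocMeasurement(accuracy=0.97, latency_ms=80.0, cost_per_unit=0.015)
    for name, passed in evaluate(criteria, measured).items():
        print(f"{name}: {'pass' if passed else 'fail'}")
```

Because the thresholds are fixed up front, a failed check points to a concrete gap (here, latency) rather than prompting a redesign of the POC goals.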
Stakeholder alignment
A POC requires a diverse team. To optimize your chances of success, identify and engage with both technical and business experts early on.
The involved stakeholders are usually business owners, AI developers, data scientists, IT, SecOps teams, and AI software providers. The AI software providers are particularly important because they have the knowledge, experience, and best practices. At this stage, identify the responsibilities of each stakeholder, including who owns the project after it scales.
Manual management of edge environments
Edge environments are unique because they are highly distributed, deployed in remote locations without trained IT staff, and often lack the physical security that a data center boasts.
These features present unique, often overlooked challenges when deploying, managing, and upgrading edge systems. It is extremely difficult and time-consuming for IT teams to troubleshoot issues manually at every remote edge site every time an upgrade is required or an issue arises.
Unfortunately, existing data center tools are not always applicable to edge AI environments. Moreover, because a POC is deployed to limited locations, organizations usually overlook a management tool during this phase and opt to update their models manually.
The POC is a highly iterative process, so implementing a management platform in this phase can help organizations save time. For customers who do not already have edge management tools in place, turnkey solutions like NVIDIA Fleet Command can help with the rollout of a POC as well as its transition to production.
Remote management
After setup, day-1 and day-2 operations begin. Organizations must deploy and scale new applications, update existing applications, troubleshoot bugs, and validate new configurations.
Having remote management capabilities that are secure is critical because production deployments contain important data and insights that you want to keep safe.
Third-party access
Organizations should implement a management solution with advanced functionality for third-party access and security functions such as just-in-time (JIT) access, clearly defined access controls, and timed sessions.
Software vendors, system integrators, and hardware partners are just a few different parties that may need access to your systems. Coupled with remote management functionality, third parties can help make updates to your POC environment without gaining physical access to your edge location.
Monitoring
Tracking performance is important, even in the POC phase, because it can help with sizing and showing where bottlenecks may occur. These are important considerations to iron out before scaling.
POC creeps into production
A POC does not have to be fully production-ready for it to be successful. While it is true that the closer an organization can get to production specs in the POC phase, the easier it will be to scale, most POCs are not designed for production.
Many times, companies use whatever hardware or software they have on hand. This means that upon completion of a POC, businesses should go back and update their models and hardware before their final deployment. Many do not.
Here are some tips for transitioning from POC to production.
Measure efficacy
Track the efficacy of all software and hardware to help make decisions on what should be moved into production, and what must be upgraded.
Use enterprise-grade hardware and software
While it is okay to use existing systems that a business may already have during a POC, take extra time to understand what systems are needed for production and any implications of that change.
Only use software from a trusted source with a line of support to speak to when needed. Many organizations deploying edge applications download software online without checking whether it comes from a trusted source and accidentally download malware as a result.
Prepare for success
Ultimately, POCs are just the first step to a successful deployment. They are designed to help organizations determine whether a project should move forward and whether it is an effective use of their resources. Edge AI is a paradigm shift for most organizations. To avoid common pitfalls when deploying your solution, see An IT Manager’s Guide: How to Successfully Deploy an Edge AI Solution.
Nowadays, a huge number of implementations of state-of-the-art (SOTA) models and modeling solutions are present for different frameworks like TensorFlow, ONNX, PyTorch, Keras, MXNet, and so on. These models can be used for out-of-the-box inference if you are interested in categories already in the datasets, or they can be embedded to custom business scenarios with minor fine-tuning.
This post gives you an overview of prevalent DL model categories and walks you through end-to-end examples of deploying image classification, object detection, and image segmentation public models using NVIDIA Triton Inference Server. The client applications can be used as is or modified to fit your use case. The steps outlined in this post can also be applied to other open-source models with minor changes.
Deep learning inference challenges
Recent years have seen remarkable advancements in deep learning (DL). By resolving numerous complex and intricate problems that have hampered the AI community for years, it has completely revolutionized the future of AI. It is now used in a rapidly growing range of applications across industries, from healthcare and aerospace engineering to autonomous driving and user authentication.
Deep learning, however, has various challenges when it comes to inference:
Support of multiple frameworks
Ease of use
Cost of deployment
Support of multiple frameworks
The first key challenge is around supporting multiple different types of model frameworks.
Developers and data scientists today are using various frameworks for their production models. For instance, there can be difficulties modifying the system for testing and deployment if a machine learning project is written in Keras, but a team member has more experience with TensorFlow.
Also, converting the models can be expensive and complicated, especially if new data is required for their training. They must have a server application to support each of those models.
Ease of use
The next key challenge is to have a serving application that can support different inference queries and use cases.
In some applications, you’re focused on real-time online inferencing where the priority is to minimize latency as much as possible. On the other hand, there might be use cases that require you to do offline batch inferencing where you’re focused on maximizing throughput.
It’s essential to have solutions that can support each type of query and use case and optimize for them.
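The trade-off between the two modes can be made concrete with a toy cost model. This sketch assumes, purely for illustration, that running a batch costs a fixed overhead plus a per-item cost; the function name and the millisecond figures are made up and do not describe any real model or server.

```python
def batch_stats(batch_size: int, overhead_ms: float, per_item_ms: float) -> tuple:
    """Return (latency in ms, throughput in items/s) for one batch.

    Toy model: batch time = fixed overhead + per-item cost * batch size.
    Every item in the batch waits for the whole batch to finish.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    latency_ms = overhead_ms + per_item_ms * batch_size
    throughput = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput


if __name__ == "__main__":
    for size in (1, 8, 32):
        latency, throughput = batch_stats(size, overhead_ms=4.0, per_item_ms=1.0)
        print(f"batch={size:>2}  latency={latency:.1f} ms  throughput={throughput:.0f} items/s")
```

Under this model, larger batches amortize the fixed overhead and raise throughput, but every request in the batch waits longer, which is why online and offline use cases call for different settings.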
Cost of deployment
The next challenge is managing the cost of deployment and lowering the cost of inference.
A key part of this is having one serving application that can run on mixed infrastructure. Otherwise, you might create one serving solution for CPU, another for GPU, and yet others for deployment in the cloud, in the data center, and at the edge. That would make costs skyrocket and lead to a nonscalable implementation.
Triton Inference Server
Triton Inference Server is an open-source inference serving application that allows inference on both CPU and GPU in different environments. It supports various backends, including TensorRT, PyTorch, TensorFlow, ONNX, and Python. To maximize hardware utilization, NVIDIA Triton allows concurrent execution of different models. Further, dynamic batching groups inference queries together to maximize throughput for different types of queries. For more information, see NVIDIA Triton Inference Server.
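To illustrate the idea behind dynamic batching, the following sketch groups queued requests into batches up to a size limit. It is a simplified, hypothetical illustration in plain Python, not Triton's scheduler, which also takes settings such as queuing delay and preferred batch sizes into account.

```python
from typing import Iterator, List, Sequence


def group_into_batches(requests: Sequence[str], max_batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_batch_size queued requests."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(requests), max_batch_size):
        yield list(requests[start:start + max_batch_size])


if __name__ == "__main__":
    queued = [f"request-{i}" for i in range(5)]
    for batch in group_into_batches(queued, max_batch_size=2):
        print(batch)
```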
Quickstart with NVIDIA Triton
The easiest way to install and run NVIDIA Triton is to use the pre-built Docker image available from NGC.
Server: Pull the Docker image
Pull the image using the following command:
$ docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3
NVIDIA Triton is optimized to provide the best inferencing performance by using GPUs, but it can also work on CPU-only systems. In both cases, you can use the same Docker image.
Use the following command to run NVIDIA Triton with the example model repository that you just created:
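As a hedged sketch of what such a launch can look like, the following Python helper assembles a docker run command. It assumes Triton's default ports (8000 for HTTP, 8001 for gRPC, 8002 for metrics) and mounts the model repository at /models inside the container. The helper name, the host path, and the single-GPU default are illustrative assumptions, and the release tag is left as a parameter.

```python
import shlex
from typing import List


def build_triton_run_command(release: str, model_repo: str, gpus: int = 1) -> List[str]:
    """Assemble a docker run command that starts Triton on a model repository.

    release    -- image release tag without the "-py3" suffix (caller-supplied)
    model_repo -- absolute host path of the model repository
    gpus       -- number of GPUs to expose; 0 runs on CPU only
    """
    command = ["docker", "run", "--rm"]
    if gpus > 0:
        command.append(f"--gpus={gpus}")
    command += [
        "-p8000:8000",  # HTTP endpoint
        "-p8001:8001",  # gRPC endpoint
        "-p8002:8002",  # metrics endpoint
        "-v", f"{model_repo}:/models",
        f"nvcr.io/nvidia/tritonserver:{release}-py3",
        "tritonserver", "--model-repository=/models",
    ]
    return command


if __name__ == "__main__":
    # "<xx.yy>" stays a placeholder; substitute the release you pulled.
    print(shlex.join(build_triton_run_command("<xx.yy>", "/path/to/the/repo/server/models")))
```

Returning the command as a list keeps each argument separate, so it can be passed to subprocess.run without shell quoting issues.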
In these commands, <xx.yy> is the version of the image to pull.
Client: Run the client image
To start the client, run the following command:
$ docker run -it --rm --net=host -v /path/to/the/repo/client/:/python_examples nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk
End-to-end model deployment
The NVIDIA Triton project provides client libraries in C++ and Python that simplify communication with the server. Using these APIs, client applications process the input and communicate with NVIDIA Triton to perform inferencing.
In general, the interaction of client applications with NVIDIA Triton can be summarized as follows:
Input
Preprocess
Inference
Postprocess
Output
Input: Depending upon the application type, one or more inputs are read to be inferred by the neural network.
Preprocess: Preprocessing data is a common first step in the deep learning workflow to prepare raw data in a format the network can accept, for example, image resizing, normalization, or noise removal.
Inference: For the inference part, a client initially serializes the inference request into a message and sends it to Triton Inference Server. The message travels over the network from the client to the server and gets deserialized. The request is placed on the queue. The request is removed from the queue and computed. The completed request is serialized in a message and sent back to the client. The message travels over the network from the server to the client. The message arrives at the client and is deserialized.
Postprocess: When the message arrives at the client application, it is processed as a completed inference request. Depending upon the network type and application use case, post-processing is applied. For example, in object detection, postprocessing involves suppressing the superfluous boxes, aiding in selecting the best possible boxes, and mapping them back to the input image.
Output: After inference and processing, depending upon the application, the output can be stored, displayed, or passed to the network.
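The stages above can be sketched with the Python HTTP client from the tritonclient package. This is a minimal sketch under assumptions: the server address, model name, and tensor names are placeholders passed in by the caller, the mean and standard deviation values are the commonly used ImageNet statistics rather than anything this post specifies, and infer_once only works against a running server.

```python
from typing import List, Sequence


def normalize_pixel(rgb: Sequence[int],
                    mean: Sequence[float] = (0.485, 0.456, 0.406),
                    std: Sequence[float] = (0.229, 0.224, 0.225)) -> List[float]:
    """Preprocess: scale one 0-255 RGB pixel to 0-1, then standardize per channel."""
    return [(value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std)]


def top_k(scores: Sequence[float], k: int = 3) -> List[int]:
    """Postprocess: return the indices of the k highest scores, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def infer_once(batch, model_name: str, input_name: str, output_name: str,
               url: str = "localhost:8000"):
    """Inference: send one FP32 NumPy array to Triton and return the output array.

    The import is local so the helpers above work without tritonclient installed.
    """
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url=url)
    infer_input = httpclient.InferInput(input_name, list(batch.shape), "FP32")
    infer_input.set_data_from_numpy(batch)
    requested = httpclient.InferRequestedOutput(output_name)
    result = client.infer(model_name=model_name, inputs=[infer_input], outputs=[requested])
    return result.as_numpy(output_name)
```

The client library handles serialization and deserialization of the request and response, so application code is mostly concerned with the preprocessing and postprocessing steps.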
Image classification
Image classification is the task of comprehending an entire image and assigning it a single label. Typically, a single object is present in the image, which is analyzed and comprehended. For more information, see image classification.
Server: Download the model
Download the ResNet-18 image classification model from the ONNX model zoo:
$ cd /path/to/the/repo/server/models/classification/1
$ wget https://github.com/onnx/models/raw/main/vision/classification/resnet/model/resnet18-v1-7.onnx && mv resnet18-v1-7.onnx model.onnx
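Triton expects each model in its own directory, with numbered version subdirectories holding the model file, which is why the download lands in classification/1. The helper below is a small sketch for checking that layout before starting the server. The function name is hypothetical, and it only checks directory structure, not whether the model loads.

```python
from pathlib import Path
from typing import List


def find_model_versions(model_repo: str, model_name: str) -> List[int]:
    """Return the numeric version directories that contain at least one file."""
    model_dir = Path(model_repo) / model_name
    if not model_dir.is_dir():
        return []
    versions = []
    for child in model_dir.iterdir():
        # Version directories are named with an integer, such as "1".
        if child.is_dir() and child.name.isdecimal():
            if any(entry.is_file() for entry in child.iterdir()):
                versions.append(int(child.name))
    return sorted(versions)
```

For example, find_model_versions("/path/to/the/repo/server/models", "classification") would return [1] after the commands above have run.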
The following code example shows the model configuration file:
The name property is optional. If the name of the model is not specified in the configuration, it is assumed to be the same as the model repository directory containing the model. The model is executed by the NVIDIA Triton backend, which is simply a wrapper around the DL frameworks like TensorFlow, PyTorch, TensorRT, and so on. For more information, see backend.
Maximum batch size
The maximum batch size that a model can support is indicated by the max_batch_size property. A value of zero indicates that batching is not supported. For more information, see batch size.
Inputs and outputs
For each model, the expected input, output, and data types must be specified in the model configuration file. Based on the input and output tensors, different data types are allowed. For more information, see Datatypes.
The image classification model accepts a single input, and after the inference returns a single output.
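As a rough sketch of how these properties fit together, the following Python function renders a minimal config.pbtxt for a model with one input and one output. The platform string onnxruntime_onnx and the TYPE_FP32 data type are Triton's own identifiers. The tensor names and shapes passed in the usage example are assumptions about the ResNet-18 ONNX file and should be checked against the actual model.

```python
from typing import Sequence


def render_config(name: str, platform: str, max_batch_size: int,
                  input_name: str, input_dims: Sequence[int],
                  output_name: str, output_dims: Sequence[int],
                  data_type: str = "TYPE_FP32") -> str:
    """Render a minimal single-input, single-output Triton model configuration."""
    def tensor(tensor_name: str, dims: Sequence[int]) -> str:
        dims_text = ", ".join(str(d) for d in dims)
        return (
            "  {\n"
            f'    name: "{tensor_name}"\n'
            f"    data_type: {data_type}\n"
            f"    dims: [ {dims_text} ]\n"
            "  }\n"
        )

    return (
        f'name: "{name}"\n'
        f'platform: "{platform}"\n'
        f"max_batch_size: {max_batch_size}\n"
        f"input [\n{tensor(input_name, input_dims)}]\n"
        f"output [\n{tensor(output_name, output_dims)}]\n"
    )


if __name__ == "__main__":
    # Tensor names and shapes below are assumptions; inspect the model to confirm them.
    print(render_config("classification", "onnxruntime_onnx", 0,
                        "data", [1, 3, 224, 224],
                        "resnetv15_dense0_fwd", [1, 1000]))
```

With max_batch_size set to 0, the dims lists must describe the full tensor shape, including the leading batch dimension.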
Client: Run the image classification client
In a separate console, launch the image_client example from the NGC NVIDIA Triton container.
To run the image classification client, use the following command:
For the classification case, the model returns a single classification output for the input image. The class is decoded and printed in the console.
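import numpy as np

# output_array is the NumPy array returned for the classification output;
# supports_batching is True when the model has a nonzero max_batch_size.
for results in output_array:
    if not supports_batching:
        # Without batching, wrap the single result so the inner loop still works.
        results = [results]
    for result in results:
        if output_array.dtype.type == np.object_:
            # Entries arrive as byte sequences; rebuild the colon-separated string.
            cls = "".join(chr(x) for x in result).split(':')
        else:
            cls = result.split(':')
        print(" {} ({}) = {}".format(cls[0], cls[1], cls[2]))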
For more information, see classification.py.
Figure 4 shows the sample output.
Object detection
The process of finding instances of objects of a particular class within an image is known as object detection. The problem of object detection combines classification with localization. It also examines more plausible scenarios in which an image might contain several objects. For more information, see object detection.
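Detection models typically produce many overlapping candidate boxes, so postprocessing keeps only the best ones. The following is a minimal sketch of greedy non-maximum suppression in plain Python, a common technique for this step. It is an illustration, not the code used by this post's client, and the 0.5 overlap threshold is an arbitrary default.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes: Sequence[Box], scores: Sequence[float],
                        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for index in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[index], boxes[k]) <= iou_threshold for k in kept):
            kept.append(index)
    return kept
```

The kept indices can then be used to select the boxes, classes, and scores that are drawn back onto the input image.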
Server: Download the model
Download the faster_rcnn_inception_v2_coco object detection model:
$ cd /path/to/the/repo/server/models/detection/1
$ wget http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz && tar xvf faster_rcnn_inception_v2_coco_2018_01_28.tar.gz && cp faster_rcnn_inception_v2_coco_2018_01_28/frozen_inference_graph.pb ./model.graphdef && rm -r faster_rcnn_inception_v2_coco_2018_01_28 faster_rcnn_inception_v2_coco_2018_01_28.tar.gz
The following code example shows the model configuration file for the object detection model:
Image segmentation
The process of clustering parts of an image that correspond to the same object class is known as image segmentation. Image segmentation entails splitting images or video frames into multiple objects or segments. For more information, see image segmentation.
Server: Download the model
To download the model, use the following commands:
$ cd /path/to/the/repo/server/models/segmentation/1
$ wget https://github.com/onnx/models/raw/main/vision/object_detection_segmentation/fcn/model/fcn-resnet50-11.onnx && mv fcn-resnet50-11.onnx model.onnx
The following code example shows the model configuration file for the image segmentation model:
The segmentation model accepts a single input and returns a single output. After inferencing, the client uses this output to generate the segmented and blended images.
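Turning the raw output into a segmented image usually means picking the highest-scoring class for each pixel and then blending a class color over the original image. The sketch below shows both steps on nested Python lists to stay dependency-free. The function names, the [class][row][column] score layout, and the 0.5 blend weight are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def class_map(scores: Sequence[Sequence[Sequence[float]]]) -> List[List[int]]:
    """Pick the best class per pixel from scores laid out as [class][row][col]."""
    num_classes = len(scores)
    rows, cols = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][r][col]) for col in range(cols)]
        for r in range(rows)
    ]


def blend(pixel: Tuple[int, int, int], color: Tuple[int, int, int],
          alpha: float = 0.5) -> Tuple[int, ...]:
    """Mix a class color into an RGB pixel; alpha is the weight of the color."""
    return tuple(round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color))
```

In practice, the same two steps are usually done on whole arrays with NumPy rather than pixel by pixel.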
Learn how to write simple, portable, parallel-first GPU-accelerated applications using only C++ standard language features in this self-paced course from the NVIDIA Deep Learning Institute