Categories
Offsites

Accelerating Text Generation with Confident Adaptive Language Modeling (CALM)

Language models (LMs) are the driving force behind many recent breakthroughs in natural language processing. Models like T5, LaMDA, GPT-3, and PaLM have demonstrated impressive performance on various language tasks. While multiple factors can contribute to improving the performance of LMs, some recent studies suggest that scaling up the model’s size is crucial for revealing emergent capabilities. In other words, some instances can be solved by small models, while others seem to benefit from increased scale.

Despite recent efforts that enabled the efficient training of LMs over large amounts of data, trained models can still be slow and costly for practical use. When generating text at inference time, most autoregressive LMs output content similar to how we speak and write (word after word), predicting each new word based on the preceding words. This process cannot be parallelized since LMs need to complete the prediction of one word before starting to compute the next one. Moreover, predicting each word requires significant computation given the model’s billions of parameters.

In “Confident Adaptive Language Modeling”, presented at NeurIPS 2022, we introduce a new method for accelerating the text generation of LMs by improving efficiency at inference time. Our method, named CALM, is motivated by the intuition that some next word predictions are easier than others. When writing a sentence, some continuations are trivial, while others might require more effort. Current LMs devote the same amount of compute power for all predictions. Instead, CALM dynamically distributes the computational effort across generation timesteps. By selectively allocating more computational resources only to harder predictions, CALM generates text faster while preserving output quality.

Confident Adaptive Language Modeling

When possible, CALM skips some compute effort for certain predictions. To demonstrate this, we use the popular encoder-decoder T5 architecture. The encoder reads the input text (e.g., a news article to summarize) and converts the text to dense representations. Then, the decoder outputs the summary by predicting it word by word. Both the encoder and decoder include a long sequence of Transformer layers. Each layer includes attention and feedforward modules with many matrix multiplications. These layers gradually modify the hidden representation that is ultimately used for predicting the next word.

Instead of waiting for all decoder layers to complete, CALM attempts to predict the next word earlier, after some intermediate layer. To decide whether to commit to a certain prediction or to postpone the prediction to a later layer, we measure the model’s confidence in its intermediate prediction. The rest of the computation is skipped only when the model is confident enough that the prediction won’t change. For quantifying what is “confident enough”, we calibrate a threshold that statistically satisfies arbitrary quality guarantees over the full output sequence.

Text generation with a regular language model (top) and with CALM (bottom). CALM attempts to make early predictions. Once confident enough (darker blue tones), it skips ahead and saves time.

Language Models with Early Exits

Enabling this early exit strategy for LMs requires minimal modifications to the training and inference processes. During training, we encourage the model to produce meaningful representations in intermediate layers. Instead of predicting only using the top layer, our learning loss function is a weighted average over the predictions of all layers, assigning higher weight to top layers. Our experiments demonstrate that this significantly improves the intermediate layer predictions while preserving the full model’s performance. In one model variant, we also include a small early-exit classifier trained to classify if the local intermediate layer prediction is consistent with the top layer. We train this classifier in a second quick step where we freeze the rest of the model.
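As a rough illustration of the weighted training objective described above, the sketch below averages per-layer losses with weights that grow with depth. The linear weighting scheme and the function names are assumptions made for this example, not necessarily the exact choice used in CALM.

```python
def layer_weights(num_layers):
    """Weights proportional to layer index (1-based), normalized to sum to 1.

    Linear growth is an assumed scheme chosen for illustration.
    """
    total = num_layers * (num_layers + 1) / 2
    return [i / total for i in range(1, num_layers + 1)]


def weighted_loss(per_layer_losses):
    """Weighted average of the losses computed from each layer's prediction."""
    weights = layer_weights(len(per_layer_losses))
    return sum(w * loss for w, loss in zip(weights, per_layer_losses))


# Hypothetical per-layer losses, from the bottom layer to the top layer.
losses = [2.0, 1.5, 1.0, 0.5]
total_loss = weighted_loss(losses)
```

Because the top layers carry the largest weights, the full model's prediction still dominates the objective while lower layers receive a training signal.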

Once the model is trained, we need a method to allow early-exiting. First, we define a local confidence measure for capturing the model’s confidence in its intermediate prediction. We explore three confidence measures (described in the results section below): (1) softmax response, taking the maximum predicted probability out of the softmax distribution; (2) state propagation, the cosine distance between the current hidden representation and the one from the previous layer; and (3) early-exit classifier, the output of a classifier specifically trained for predicting local consistency. We find the softmax response to be statistically strong while being simple and fast to compute. The other two alternatives are lighter in floating point operations (FLOPS).
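The first two confidence measures can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical function names; state propagation is written here as a cosine similarity so that larger values mean higher confidence, and the learned early-exit classifier is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def softmax_response(logits):
    """Confidence = largest probability in the softmax distribution."""
    return max(softmax(logits))


def state_propagation(h_prev, h_curr):
    """Confidence = cosine similarity between consecutive hidden states."""
    dot = sum(a * b for a, b in zip(h_prev, h_curr))
    norm_prev = math.sqrt(sum(a * a for a in h_prev))
    norm_curr = math.sqrt(sum(b * b for b in h_curr))
    return dot / (norm_prev * norm_curr)


def first_confident_layer(confidences, threshold):
    """1-based index of the first layer whose confidence meets the threshold.

    Falls back to the top layer if no intermediate layer is confident enough.
    """
    for layer, confidence in enumerate(confidences, start=1):
        if confidence >= threshold:
            return layer
    return len(confidences)
```

For example, `first_confident_layer([0.3, 0.6, 0.95, 0.99], 0.9)` exits after the third layer and skips the rest of the decoder for that token.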

Another challenge is that the self-attention of each layer depends on hidden-states from previous words. If we exit early for some word predictions, these hidden-states might be missing. Instead, we attend back to the hidden state of the last computed layer.

Finally, we set up the local confidence threshold for exiting early. In the next section, we describe our controlled process for finding good threshold values. As a first step, we simplify this infinite search space by building on a useful observation: mistakes that are made at the beginning of the generation process are more detrimental since they can affect all of the following outputs. Therefore, we start with a higher (more conservative) threshold, and gradually reduce it with time. We use a negative exponent with user-defined temperature to control this decay rate. We find this allows better control over the performance-efficiency tradeoff (the obtained speedup per quality level).
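A decaying threshold of this kind could look like the sketch below. The mixing weight `mix`, the normalization by `max_len`, and the parameter names are assumptions for illustration; only the use of a negative exponent with a user-defined temperature comes from the description above.

```python
import math


def decayed_threshold(base, t, max_len, temperature, mix=0.9):
    """Confidence threshold at decoding step t.

    Blends a base threshold with exp(-temperature * t / max_len), so the
    threshold starts higher and decreases as generation proceeds.
    The result is clipped to [0, 1].
    """
    decay = math.exp(-temperature * t / max_len)
    value = mix * base + (1.0 - mix) * decay
    return min(1.0, max(0.0, value))


# With these hypothetical settings the threshold shrinks over time.
early = decayed_threshold(base=0.9, t=0, max_len=100, temperature=4.0)
late = decayed_threshold(base=0.9, t=100, max_len=100, temperature=4.0)
```

A larger temperature makes the threshold drop faster, which permits more early exits later in the sequence.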

Reliably Controlling the Quality of the Accelerated Model

Early exit decisions have to be local; they need to happen when predicting each word. In practice, however, the final output should be globally consistent or comparable to the original model. For example, if the original full model generated “the concert was wonderful and long”, one would accept CALM switching the order of the adjectives and outputting “the concert was long and wonderful”. However, at the local level, the word “wonderful” was replaced with “long”. Therefore, the two outputs are globally consistent, but include some local inconsistencies. We build on the Learn then Test (LTT) framework to connect local confidence-based decisions to globally consistent outputs.

In CALM, local per-timestep confidence thresholds for early exiting decisions are derived, via LTT calibration, from user-defined consistency constraints over the full output text. Red boxes indicate that CALM used most of the decoder’s layers for that specific prediction. Green boxes indicate that CALM saved time by using only a few Transformer layers. Full sentence shown in the last example of this post.

First, we define and formulate two types of consistency constraints from which to choose:

  1. Textual consistency: We bound the expected textual distance between the outputs of CALM and the outputs of the full model. This doesn’t require any labeled data.
  2. Risk consistency: We bound the expected increase in loss that we allow for CALM compared to the full model. This requires reference outputs against which to compare.

For each of these constraints, we can set the tolerance that we allow and calibrate the confidence threshold to allow early exits while reliably satisfying our defined constraint with an arbitrarily high probability.
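The calibration step can be approximated by the following sketch. It assumes losses bounded in [0, 1], a Hoeffding-based p-value, and fixed-sequence testing from the most conservative threshold downwards; the function names and the numbers are hypothetical, and the procedure used in the paper may differ in its details.

```python
import math


def hoeffding_p_value(empirical_risk, tolerance, n):
    """p-value for the null hypothesis 'true risk > tolerance' (losses in [0, 1])."""
    gap = max(0.0, tolerance - empirical_risk)
    return math.exp(-2.0 * n * gap * gap)


def calibrate_threshold(candidates, empirical_risks, tolerance, epsilon, n):
    """Test candidate thresholds in order, stopping at the first failure.

    `candidates` must be sorted from most to least conservative. Returns the
    least conservative threshold certified so far, or None if none passes.
    """
    chosen = None
    for threshold, risk in zip(candidates, empirical_risks):
        if hoeffding_p_value(risk, tolerance, n) > epsilon:
            break
        chosen = threshold
    return chosen


# Hypothetical calibration-set estimates of the inconsistency with the full model.
candidates = [0.9, 0.7, 0.5, 0.3]
empirical_risks = [0.00, 0.01, 0.04, 0.09]
selected = calibrate_threshold(candidates, empirical_risks,
                               tolerance=0.05, epsilon=0.05, n=2000)
```

Here `tolerance` plays the role of the user-defined constraint and `1 - epsilon` the probability with which it should hold. Stopping at the first failed test is what keeps the overall guarantee valid across many candidate thresholds.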

CALM Saves Inference Time

We run experiments on three popular generation datasets: CNN/DM for summarization, WMT for machine translation, and SQuAD for question answering. We evaluate each of the three confidence measures (softmax response, state propagation and early-exit classifier) using an 8-layer encoder-decoder model. To evaluate global sequence-level performance, we use the standard ROUGE-L, BLEU, and Token-F1 scores that measure distances against human-written references. We show that one can maintain full model performance while using only a third or half of the layers on average. CALM achieves this by dynamically distributing the compute effort across the prediction timesteps.

As an approximate upper bound, we also compute the predictions using a local oracle confidence measure, which enables exiting at the first layer that leads to the same prediction as the top one. On all three tasks, the oracle measure can preserve full model performance when using only 1.5 decoder layers on average. In contrast to CALM, a static baseline uses the same number of layers for all predictions, requiring 3 to 7 layers (depending on the dataset) to preserve its performance. This demonstrates why the dynamic allocation of compute effort is important. Only a small fraction of the predictions require most of the model’s complexity, while for others much less should suffice.

Performance per task against the average number of decoder layers used.

Finally, we also find that CALM enables practical speedups. When benchmarking on TPUs, we saved almost half of the compute time while maintaining the quality of the outputs.

Example of a generated news summary. The top cell presents the reference human-written summary. Below is the prediction of the full model (8 layers) followed by two different CALM output examples. The first CALM output is 2.9x faster and the second output is 3.6x faster than the full model, benchmarked on TPUs.

Conclusion

CALM allows faster text generation with LMs, without reducing the quality of the output text. This is achieved by dynamically modifying the amount of compute per generation timestep, allowing the model to exit the computational sequence early when confident enough.

As language models continue to grow in size, studying how to efficiently use them becomes crucial. CALM is orthogonal to, and can be combined with, many efficiency-related efforts, including model quantization, distillation, sparsity, effective partitioning, and distributed control flows.

Acknowledgements

It was an honor and privilege to work on this with Adam Fisch, Ionel Gog, Seungyeon Kim, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, and Donald Metzler. We also thank Anselm Levskaya, Hyung Won Chung, Tao Wang, Paul Barham, Michael Isard, Orhan Firat, Carlos Riquelme, Aditya Menon, Zhifeng Chen, Sanjiv Kumar, and Jeff Dean for helpful discussions and feedback. Finally, we thank Tom Small for preparing the animation in this blog post.


Differential Privacy Accounting by Connecting the Dots

Differential privacy (DP) is an approach that enables data analytics and machine learning (ML) with a mathematical guarantee on the privacy of user data. DP quantifies the “privacy cost” of an algorithm, i.e., the level of guarantee that the algorithm’s output distribution for a given dataset will not change significantly if a single user’s data is added to or removed from it. The algorithm is characterized by two parameters, ε and δ, where smaller values of both indicate “more private”. There is a natural tension between the privacy budget (ε, δ) and the utility of the algorithm: a smaller privacy budget requires the output to be more “noisy”, often leading to less utility. Thus, a fundamental goal of DP is to attain as much utility as possible for a desired privacy budget.

A key property of DP that often plays a central role in understanding privacy costs is that of composition, which reflects the net privacy cost of a combination of DP algorithms, viewed together as a single algorithm. A notable example is the differentially-private stochastic gradient descent (DP-SGD) algorithm. This algorithm trains ML models over multiple iterations — each of which is differentially private — and therefore requires an application of the composition property of DP. A basic composition theorem in DP says that the privacy cost of a collection of algorithms is, at most, the sum of the privacy cost of each. However, in many cases, this can be a gross overestimate, and several improved composition theorems provide better estimates of the privacy cost of composition.

In 2019, we released an open-source library (on GitHub) to enable developers to use analytic techniques based on DP. Today, we announce the addition to this library of Connect-the-Dots, a new privacy accounting algorithm based on a novel approach for discretizing privacy loss distributions that is a useful tool for understanding the privacy cost of composition. This algorithm is based on the paper “Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions”, presented at PETS 2022. The main novelty of this accounting algorithm is that it uses an indirect approach to construct more accurate discretizations of privacy loss distributions. We find that Connect-the-Dots provides significant gains over other privacy accounting methods in literature in terms of accuracy and running time. This algorithm was also recently applied for the privacy accounting of DP-SGD in training Ads prediction models.

Differential Privacy and Privacy Loss Distributions

A randomized algorithm is said to satisfy DP guarantees if its output “does not depend significantly” on any one entry in its training dataset, quantified mathematically with parameters (ε, δ). For example, consider the motivating example of DP-SGD. When trained with (non-private) SGD, a neural network could, in principle, be encoding the entire training dataset within its weights, thereby allowing one to reconstruct some training examples from a trained model. On the other hand, when trained with DP-SGD, we have a formal guarantee that if one were able to reconstruct a training example with non-trivial probability then one would also be able to reconstruct the same example even if it was not included in the training dataset.

The hockey stick divergence, parameterized by ε, is a measure of distance between two probability distributions, as illustrated in the figure below. The privacy cost of most DP algorithms is dictated by the hockey stick divergence between two associated probability distributions P and Q. The algorithm satisfies DP with parameters (ε, δ) if the value of the hockey stick divergence for ε between P and Q is at most δ. The hockey stick divergence between (P, Q), denoted δ_{P||Q}(ε), is in turn completely characterized by its associated privacy loss distribution, denoted PLD_{P||Q}.

Illustration of hockey stick divergence δ_{P||Q}(ε) between distributions P and Q (left), which corresponds to the probability mass of P that is above e^ε·Q, where e^ε·Q is an e^ε scaling of the probability mass of Q (right).
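For discrete distributions, the hockey stick divergence reduces to a short sum. The sketch below uses a hypothetical randomized-response mechanism on a single bit as the pair (P, Q); the function name is chosen for this example.

```python
import math


def hockey_stick(p, q, epsilon):
    """Sum over outcomes x of max(0, P(x) - e^epsilon * Q(x))."""
    scale = math.exp(epsilon)
    outcomes = set(p) | set(q)
    return sum(max(0.0, p.get(x, 0.0) - scale * q.get(x, 0.0)) for x in outcomes)


# Randomized response: report the true bit with probability 0.75.
P = {0: 0.75, 1: 0.25}  # output distribution when the true bit is 0
Q = {0: 0.25, 1: 0.75}  # output distribution when the true bit is 1

delta_at_zero = hockey_stick(P, Q, 0.0)
delta_at_ln3 = hockey_stick(P, Q, math.log(3))
```

At ε = 0 the divergence equals the total variation distance (0.5 here), and it drops to zero at ε = ln 3, the point where e^ε·Q covers all of the mass of P.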

The main advantage of dealing with PLDs is that compositions of algorithms correspond to the convolution of the corresponding PLDs. Exploiting this fact, prior work has designed efficient algorithms to compute the PLD corresponding to the composition of individual algorithms by simply performing convolution of the individual PLDs using the fast Fourier transform algorithm.
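The correspondence between composition and convolution can be seen on a small discrete example. This sketch uses a direct double loop instead of the fast Fourier transform, represents a PLD as a dictionary from privacy loss to probability, and assumes Q(x) > 0 wherever P(x) > 0; all names are illustrative.

```python
import math
from collections import defaultdict


def pld_from_pair(p, q):
    """PLD of (P, Q): loss ln(P(x)/Q(x)) occurs with probability P(x)."""
    pld = defaultdict(float)
    for x, px in p.items():
        if px > 0.0:
            pld[math.log(px / q[x])] += px
    return dict(pld)


def compose(pld_a, pld_b):
    """Convolution of two PLDs: losses add, probabilities multiply."""
    out = defaultdict(float)
    for loss_a, prob_a in pld_a.items():
        for loss_b, prob_b in pld_b.items():
            out[loss_a + loss_b] += prob_a * prob_b
    return dict(out)


def delta_for_epsilon(pld, epsilon):
    """Hockey stick divergence recovered from the PLD."""
    return sum(prob * max(0.0, 1.0 - math.exp(epsilon - loss))
               for loss, prob in pld.items())


P = {0: 0.75, 1: 0.25}
Q = {0: 0.25, 1: 0.75}
single = pld_from_pair(P, Q)
double = compose(single, single)  # the same mechanism applied twice
```

With these numbers, one application has δ = 0 at ε = ln 3, while two applications have δ = 0.375 at the same ε, which shows how the privacy cost accumulates under composition.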

However, one challenge when dealing with many PLDs is that they often are continuous distributions, which make the convolution operations intractable in practice. Thus, researchers often apply various discretization approaches to approximate the PLDs using equally spaced points. For example, the basic version of the Privacy Buckets algorithm assigns the probability mass of the interval between two discretization points entirely to the higher end of the interval.

Illustration of discretization by rounding up probability masses. Here a continuous PLD (in blue) is discretized to a discrete PLD (in red), by rounding up the probability mass between consecutive points.
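The rounding-up discretization can be sketched as follows. The PLD is a dictionary from privacy loss to probability, and the helper names and example values are hypothetical.

```python
import math
from collections import defaultdict


def round_up_pld(pld, grid):
    """Move each loss up to the next multiple of `grid` (pessimistic rounding)."""
    out = defaultdict(float)
    for loss, prob in pld.items():
        bucket = math.ceil(loss / grid)  # integer index of the grid point
        out[bucket * grid] += prob
    return dict(out)


def delta_for_epsilon(pld, epsilon):
    """Hockey stick divergence recovered from a discrete PLD."""
    return sum(prob * max(0.0, 1.0 - math.exp(epsilon - loss))
               for loss, prob in pld.items())


original = {0.2: 0.5, 0.7: 0.3, -0.4: 0.2}
rounded = round_up_pld(original, grid=0.5)
```

Because every loss can only move upwards, the rounded PLD never reports a smaller δ than the original one. The price is a looser estimate, which is the gap that better discretizations aim to reduce.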

Connect-the-Dots: A New Algorithm

Our new Connect-the-Dots algorithm provides a better way to discretize PLDs towards the goal of estimating hockey stick divergences. This approach works indirectly by first discretizing the hockey stick divergence function and then mapping it back to a discrete PLD supported on equally spaced points.

Illustration of high-level steps in the Connect-the-Dots algorithm.

This approach relies on the notion of a “dominating PLD”, namely, PLD_{P′||Q′} dominates PLD_{P||Q} if the hockey stick divergence of the former is greater than or equal to the hockey stick divergence of the latter for all values of ε. The key property of dominating PLDs is that they remain dominating after compositions. Thus for purposes of privacy accounting, it suffices to work with a dominating PLD, which gives us an upper bound on the exact privacy cost.

Our main insight behind the Connect-the-Dots algorithm is a characterization of discrete PLD, namely that a PLD is supported on a given finite set of ε values if and only if the corresponding hockey stick divergence as a function of e^ε is linear between consecutive e^ε values. This allows us to discretize the hockey stick divergence by simply connecting the dots to get a piecewise linear function that precisely equals the hockey stick divergence function at the given e^ε values. See a more detailed explanation of the algorithm.

Comparison of the discretizations of hockey stick divergence by Connect-the-Dots vs Privacy Buckets.
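The "connecting the dots" step can be illustrated with a small sketch that interpolates linearly in e^ε between grid points. It covers only the interpolation, not the mapping back to a discrete PLD, and the distributions and function names are made up for this example. Since the hockey stick divergence is a convex function of e^ε, the interpolated curve lies on or above the true one.

```python
import math


def hockey_stick(p, q, epsilon):
    """Sum over outcomes x of max(0, P(x) - e^epsilon * Q(x))."""
    scale = math.exp(epsilon)
    outcomes = set(p) | set(q)
    return sum(max(0.0, p.get(x, 0.0) - scale * q.get(x, 0.0)) for x in outcomes)


def connect_the_dots(eps_grid, deltas, epsilon):
    """Piecewise-linear interpolation of delta as a function of e^epsilon."""
    t = math.exp(epsilon)
    ts = [math.exp(e) for e in eps_grid]
    for i in range(len(ts) - 1):
        t0, t1 = ts[i], ts[i + 1]
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return (1.0 - w) * deltas[i] + w * deltas[i + 1]
    raise ValueError("epsilon lies outside the grid")


P = {0: 0.6, 1: 0.3, 2: 0.1}
Q = {0: 0.2, 1: 0.3, 2: 0.5}
grid = [0.0, 1.0, 2.0]
deltas = [hockey_stick(P, Q, e) for e in grid]
approx = connect_the_dots(grid, deltas, 1.05)
exact = hockey_stick(P, Q, 1.05)
```

At the grid points the two curves agree exactly; between them the interpolation can only overestimate δ, so the result remains a valid upper bound.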

Experimental Evaluation

The DP-SGD algorithm involves a noise multiplier parameter, which controls the magnitude of noise added in each gradient step, and a sampling probability, which controls how many examples are included in each mini-batch. We compare Connect-the-Dots against the other accounting algorithms discussed below on the task of privacy accounting for DP-SGD with noise multiplier = 0.5, sampling probability = 0.2 × 10^-4 and δ = 10^-8.

We plot the value of the ε computed by each of the algorithms against the number of composition steps, and additionally, we plot the running time of the implementations. As shown in the plots below, privacy accounting using Renyi DP provides a loose estimate of the privacy loss. However, when comparing the approaches using PLD, we find that in this example, the implementation of Connect-the-Dots achieves a tighter estimate of the privacy loss, with a running time that is 5x faster than the Microsoft PRV Accountant and >200x faster than the previous approach of Privacy Buckets in the Google-DP library.

Left: Upper bounds on the privacy parameter ε for varying number of steps of DP-SGD, as returned by different algorithms (for fixed δ = 10^-8). Right: Running time of the different algorithms.

Conclusion & Future Directions

This work proposes Connect-the-Dots, a new algorithm for computing optimal privacy parameters for compositions of differentially private algorithms. When evaluated on the DP-SGD task, we find that this algorithm gives tighter estimates on the privacy loss with a significantly faster running time.

So far, the library only supports the pessimistic estimate version of the Connect-the-Dots algorithm, which provides an upper bound on the privacy loss of DP algorithms. However, the paper also introduces a variant of the algorithm that provides an “optimistic” estimate of the PLD, which can be used to derive lower bounds on the privacy cost of DP algorithms (provided those admit a “worst case” PLD). Currently, the library does support optimistic estimates as given by the Privacy Buckets algorithm, and we hope to incorporate the Connect-the-Dots version as well.

Acknowledgements

This work was carried out in collaboration with Vadym Doroshenko, Badih Ghazi, and Ravi Kumar. We thank Galen Andrew, Stan Bashtavenko, Steve Chien, Christoph Dibak, Miguel Guevara, Peter Kairouz, Sasha Kulankhina, Stefan Mellem, Jodi Spacek, Yurii Sushko and Andreas Terzis for their help.


EHR-Safe: Generating High-Fidelity and Privacy-Preserving Synthetic Electronic Health Records

Analysis of Electronic Health Records (EHR) has a tremendous potential for enhancing patient care, quantitatively measuring performance of clinical practices, and facilitating clinical research. Statistical estimation and machine learning (ML) models trained on EHR data can be used to predict the probability of various diseases (such as diabetes), track patient wellness, and predict how patients respond to specific drugs. For such models, researchers and practitioners need access to EHR data. However, it can be challenging to leverage EHR data while ensuring data privacy and conforming to patient confidentiality regulations (such as HIPAA).

Conventional methods to anonymize data (e.g., de-identification) are often tedious and costly. Moreover, they can distort important features from the original dataset, decreasing the utility of the data significantly; they can also be susceptible to privacy attacks. Alternatively, an approach based on generating synthetic data can maintain both important dataset features and privacy.

To that end, we propose a novel generative modeling framework in “EHR-Safe: Generating High-Fidelity and Privacy-Preserving Synthetic Electronic Health Records”. With the innovative methodology in EHR-Safe, we show that synthetic data can satisfy two key properties: (i) high fidelity (i.e., they are useful for the task of interest, such as having similar downstream performance when a diagnostic model is trained on them), and (ii) privacy (i.e., they meet certain privacy measures and do not reveal any real patient’s identity). Our state-of-the-art results stem from novel approaches for encoding/decoding features, normalizing complex distributions, conditioning adversarial training, and representing missing data.

Generating synthetic data from the original data with EHR-Safe.

Challenges of Generating Realistic Synthetic EHR Data

There are multiple fundamental challenges to generating synthetic EHR data. EHR data contain heterogeneous features with different characteristics and distributions. There can be numerical features (e.g., blood pressure) and categorical features with many or two categories (e.g., medical codes, mortality outcome). Some of these may be static (i.e., not varying during the modeling window), while others are time-varying, such as regular or sporadic lab measurements. Distributions might come from different families — categorical distributions can be highly non-uniform (e.g., for under-represented groups) and numerical distributions can be highly skewed (e.g., a small proportion of values being very large while the vast majority are small). Depending on a patient’s condition, the number of visits can also vary drastically — some patients visit a clinic only once whereas some visit hundreds of times, leading to a variance in sequence lengths that is typically much higher compared to other time-series data. There can be a high ratio of missing features across different patients and time steps, as not all lab measurements or other input data are collected.

Examples of real EHR data: temporal numerical features (upper) and temporal categorical features (lower).

EHR-Safe: Synthetic EHR Data Generation Framework

EHR-Safe consists of a sequential encoder-decoder architecture and generative adversarial networks (GANs), depicted in the figure below. Because EHR data are heterogeneous (as described above), direct modeling of raw EHR data is challenging for GANs. To circumvent this, we propose utilizing a sequential encoder-decoder architecture to learn the mapping from the raw EHR data to the latent representations, and vice versa.

Block diagram of EHR-Safe framework.

While learning the mapping, esoteric distributions of numerical and categorical features pose a great challenge. For example, some values or numerical ranges might dominate the distribution, but the capability of modeling rare cases is essential. The proposed feature mapping and stochastic normalization (transforming original feature distributions into uniform distributions without information loss) are key to handling such data: they convert it to distributions for which the training of the encoder-decoder and the GAN is more stable (details can be found in the paper). The mapped latent representations, generated by the encoder, are then used for GAN training. After training both the encoder-decoder framework and the GANs, EHR-Safe can generate synthetic heterogeneous EHR data from any input, for which we feed randomly sampled vectors. Note that only the trained generator and decoders are used for generating synthetic data.
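One plausible reading of stochastic normalization is sketched below: each distinct value receives a sub-interval of [0, 1) whose width is proportional to its frequency, and a value is encoded by sampling uniformly inside its interval. The function names and the toy data are assumptions for illustration, and the procedure in the paper may differ in its details.

```python
import random
from collections import Counter


def fit_intervals(values):
    """Map each distinct value to a sub-interval of [0, 1) sized by its frequency."""
    counts = Counter(values)
    n = len(values)
    intervals, lo = {}, 0.0
    for value in sorted(counts):
        hi = lo + counts[value] / n
        intervals[value] = (lo, hi)
        lo = hi
    return intervals


def normalize(value, intervals, rng):
    """Encode a value as a uniform sample from its interval."""
    lo, hi = intervals[value]
    return lo + (hi - lo) * rng.random()


def denormalize(u, intervals):
    """Recover the original value from its encoded position."""
    for value, (lo, hi) in intervals.items():
        if lo <= u < hi:
            return value
    return max(intervals)  # guards the upper edge against rounding


# A skewed toy feature: one value dominates, one is rare.
values = [0, 0, 0, 0, 0, 0, 0, 5, 5, 120]
intervals = fit_intervals(values)
rng = random.Random(0)
encoded = [normalize(v, intervals, rng) for v in values]
decoded = [denormalize(u, intervals) for u in encoded]
```

The encoded values are spread approximately uniformly over [0, 1), while the interval lookup recovers every original value, which matches the "without information loss" property mentioned above.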

Datasets

We focus on two real-world EHR datasets to showcase the EHR-Safe framework, MIMIC-III and eICU. Both are inpatient datasets that consist of varying lengths of sequences and include multiple numerical and categorical features with missing components.

Fidelity Results

The fidelity metrics focus on the quality of synthetically generated data by measuring the realism of the synthetic data. Higher fidelity implies that it is more difficult to differentiate between synthetic and real data. We evaluate the fidelity of synthetic data in terms of multiple quantitative and qualitative analyses.

Visualization

Having similar coverage and avoiding under-representation of certain data regimes are both important for synthetic data generation. As the t-SNE analyses below show, the coverage of the synthetic data (blue) is very similar to that of the original data (red). With membership inference metrics (introduced in the privacy section below), we also verify that EHR-Safe does not just memorize the original training data.

t-SNE analyses on temporal and static data on MIMIC-III (upper) and eICU (lower) datasets.

Statistical Similarity

We provide quantitative comparisons of statistical similarity between original and synthetic data for each feature. Most statistics are well aligned between original and synthetic data. For example, the Kolmogorov-Smirnov (KS) statistic, i.e., the maximum difference between the cumulative distribution functions (CDFs) of the original and the synthetic data, is mostly lower than 0.03. More detailed tables can be found in the paper. The figure below shows the CDF graphs for original vs. synthetic data for two features; overall they are very close in most cases.

CDF graphs of two features between original and synthetic EHR data. Left: Mean Airway Pressure. Right: Minute Volume Alarm.
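The KS statistic mentioned above can be computed directly from two samples of a feature. The sketch below is a minimal standard-library version with made-up sample values.

```python
import bisect


def ks_statistic(real, synthetic):
    """Maximum absolute gap between the empirical CDFs of two samples."""
    real, synthetic = sorted(real), sorted(synthetic)
    points = sorted(set(real) | set(synthetic))

    def ecdf(sample, x):
        # Fraction of the sample that is less than or equal to x.
        return bisect.bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(real, x) - ecdf(synthetic, x)) for x in points)


# Hypothetical samples of one numerical feature.
real_values = [1, 2, 3, 4]
synthetic_values = [2, 3, 4, 5]
gap = ks_statistic(real_values, synthetic_values)
```

A value of 0 means the two empirical distributions coincide and a value of 1 means they do not overlap at all, so values below 0.03 indicate very close agreement.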

Utility

Because one of the most important use cases of synthetic data is enabling ML innovations, we focus on the fidelity metric that measures the ability of models trained on synthetic data to make accurate predictions on real data. We compare such model performance to an equivalent model trained with real data. Similar model performance would indicate that the synthetic data captures the relevant informative content for the task. As one of the important potential use cases of EHR, we focus on the mortality prediction task. We consider four different predictive models: Gradient Boosting Tree Ensemble (GBDT), Random Forest (RF), Logistic Regression (LR), and Gated Recurrent Units (GRU).

Mortality prediction performance with the model trained on real vs. synthetic data. Left: MIMIC-III. Right: eICU.

In the figure above we see that in most scenarios, models trained on synthetic and real data perform very similarly in terms of Area Under the Receiver Operating Characteristic Curve (AUC). On MIMIC-III, the best model (GBDT) on synthetic data is only 2.6% worse than the best model on real data; whereas on eICU, the best model (RF) on synthetic data is only 0.9% worse.

Privacy Results

We consider three different privacy attacks to quantify the robustness of the synthetic data with respect to privacy.

  • Membership inference attack: An adversary predicts whether a known subject was present in the training data used for training the synthetic data model.
  • Re-identification attack: The adversary explores the probability of some features being re-identified using synthetic data and matching to the training data.
  • Attribute inference attack: The adversary predicts the value of sensitive features using synthetic data.

Privacy risk evaluation across three privacy metrics: membership-inference (top-left), re-identification (top-right), and attribute inference (bottom). The ideal value of privacy risk for membership inference is random guessing (0.5). For re-identification, the ideal case is to replace the synthetic data with disjoint holdout original data.

The figure above summarizes the results along with the ideal achievable value for each metric. We observe that the privacy metrics are very close to the ideal in all cases. The risk of inferring whether a sample of the original data was used for training the model is very close to random guessing; this also verifies that EHR-Safe does not just memorize the original training data. For the attribute inference attack, we focus on the prediction task of inferring specific attributes (e.g., gender, religion, and marital status) from other attributes. We compare prediction accuracy when training a classifier with real data against the same classifier trained with synthetic data. Because the EHR-Safe bars are all lower, the results demonstrate that access to synthetic data does not lead to higher prediction performance on specific features as compared to access to the original data.

Comparison to Alternative Methods

We compare EHR-Safe to alternatives (TimeGAN, RC-GAN, C-RNN-GAN) proposed for time-series synthetic data generation. As shown below, EHR-Safe significantly outperforms each.

Downstream task performance (AUC) in comparison to alternatives.

Conclusions

We propose a novel generative modeling framework, EHR-Safe, that can generate highly realistic synthetic EHR data that are robust to privacy attacks. EHR-Safe is based on generative adversarial networks applied to the encoded raw data. We introduce multiple innovations in the architecture and training mechanisms that are motivated by the key challenges of EHR data. These innovations are key to our results that show almost-identical properties with real data (when desired downstream capabilities are considered) with almost-ideal privacy preservation. An important future direction is generative modeling capability for multimodal data, including text and image, as modern EHR data might contain both.

Acknowledgements

We gratefully acknowledge the contributions of Michel Mizrahi, Nahid Farhady Ghalaty, Thomas Jarvinen, Ashwin S. Ravi, Peter Brune, Fanyu Kong, Dave Anderson, George Lee, Arie Meir, Farhana Bandukwala, Elli Kanal, and Tomas Pfister.

Categories
Misc

Toy Jensen Rings in Holidays With AI-Powered ‘Jingle Bells’

In a moment of pure serendipity, Lah Yileh Lee and Xinting Lee, a pair of talented singers who often stream their performances online, found themselves performing in a public square in Taipei when NVIDIA founder and CEO Jensen Huang happened upon them. Huang couldn’t resist joining in, cheering on their serenade as they recorded Lady…

The post Toy Jensen Rings in Holidays With AI-Powered ‘Jingle Bells’ appeared first on NVIDIA Blog.


Top Food Stories From 2022: Meet 4 Startups Putting AI on the Plate

This holiday season, feast on the bounty of food-themed stories NVIDIA Blog readers gobbled up in 2022. Startups in the retail industry — and particularly in quick-service restaurants — are using NVIDIA AI and robotics technology to make it easier to order food in drive-thrus, find beverages on store shelves and have meals delivered. They’re…

The post Top Food Stories From 2022: Meet 4 Startups Putting AI on the Plate appeared first on NVIDIA Blog.


Make Your Spirit Merry and Bright With Hit Games on GeForce NOW This Holiday Season

Gear up for some festive fun this GFN Thursday with some of the GeForce NOW community’s top picks of games to play during the holidays, as well as a new title joining the GeForce NOW library this week. And, following the recent update that enabled Ubisoft Connect account syncing with GeForce NOW, select Ubisoft+ Multi-Access…

The post Make Your Spirit Merry and Bright With Hit Games on GeForce NOW This Holiday Season appeared first on NVIDIA Blog.

Categories
Misc

Explainer: What Is an AI Cockpit?

Intelligent interiors are transforming transportation.

Categories
Misc

Top 5 Robots of 2022: Watch Them Change the World

Robots have rolled into action for sustainability in farms, lower energy in food delivery, efficiency in retail inventory, improved throughput in warehouses and just about everything in between. What’s not to love? In addition to reshaping industries and helping end users, robots play a vital role in the transition away from fossil fuels. The …

The post Top 5 Robots of 2022: Watch Them Change the World appeared first on NVIDIA Blog.

Categories
Misc

Doing the Best They Can: EverestLabs Ensures Fewer Recyclables Go to Landfills

All of us recycle. Or, at least, all of us should. Now, AI is joining the effort. On the latest episode of the NVIDIA AI Podcast, host Noah Kravitz spoke with JD Ambati, founder and CEO of EverestLabs, developer of RecycleOS, the first AI-enabled operating system for recycling. The company reports that an average of …

The post Doing the Best They Can: EverestLabs Ensures Fewer Recyclables Go to Landfills appeared first on NVIDIA Blog.

Categories
Misc

Evaluating Hidden Costs When Building or Buying an Edge Management Platform

Edge computing and edge AI are powering the digital transformation of business processes. But, as a growing field, there are still many questions about what exactly needs to be in an edge management platform.

The benefits of edge computing include low latency for real-time responses, using local area networks for higher bandwidth, and storage at lower costs compared to cloud computing.

However, the distributed nature of edge nodes can make managing edge AI complex and challenging. Gathering insights from separate locations, installing hardware, deploying software, and maintaining upgrades at individual nodes can be time-consuming and costly.

Centralized management platforms are a critical component of a company’s edge AI solution. They enable organizations to deploy and manage industry applications at the edge, automate management tasks, allocate computing resources, update system software over the air, and monitor locations.

However, the entire stack that makes up an edge AI management solution is complicated, making the question of whether to build or buy an edge management platform exceedingly difficult. 

In this post, I break down some of the most important factors to consider when evaluating an AI edge solution for your company.

Visual is a single user sitting at a computer with arrows pointing to different nodes showcasing how one user can manage the end-to-end lifecycle of their edge deployments
Figure 1. Managing edge AI deployments from a central plane

To get started, consider asking the following questions:

  • What is the problem you’re solving? Clarify the requirements needed for your platform and prioritize them. No solution will be perfect.
  • What is your budget? Financial resources will inform your approach. Evaluate the cost of using vendor software compared to bringing in resources to your existing team. Management and maintenance costs are also a factor.
  • What is your timeline? Are there competitive reasons for you to move quickly? Remember to factor in integration and customization.
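One way to make the answers to these questions comparable is a simple weighted scoring sheet. The following is a minimal sketch, not a prescribed method: the criteria names, weights, and 1-to-5 scores are hypothetical placeholders that you would replace with your own priorities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float     # relative priority (hypothetical values)
    build_score: int  # 1 (poor fit) to 5 (strong fit) if you build
    buy_score: int    # 1 (poor fit) to 5 (strong fit) if you buy


def weighted_totals(criteria):
    """Return the weight-normalized (build, buy) scores."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    build = sum(c.weight * c.build_score for c in criteria) / total_weight
    buy = sum(c.weight * c.buy_score for c in criteria) / total_weight
    return build, buy


# Illustrative inputs only: one row per question in the list above.
criteria = [
    Criterion("fit to requirements", 0.5, build_score=5, buy_score=3),
    Criterion("budget", 0.3, build_score=2, buy_score=4),
    Criterion("timeline", 0.2, build_score=1, buy_score=5),
]

build, buy = weighted_totals(criteria)
print(f"build: {build:.2f}, buy: {buy:.2f}")
```

The totals are only as good as the weights behind them, so the sheet is most useful as a way to surface disagreements about priorities rather than as a final answer.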

Benefits of building or buying

Similar to building a home, when building an edge management platform you are part of the entire process and maintain control of the design. This can be extremely beneficial to an enterprise, especially in terms of customization, data control, and security.

However, buying a solution can be a benefit, especially when it comes to ensuring quality and support from a vendor. Faster time-to-market and lower long-term costs are also significant advantages to buying. In the following, I lay out the top points for either option.

Benefits of building an edge management solution

  • Customization
  • Data control
  • Security risk

Customization

Understanding business needs is paramount to having a proper edge management solution. In doing your due diligence, you may find specific use cases or edge devices that require lots of customization. In this case, you are better off building the platform yourself. 

Data control

Maintaining local storage and control of all critical data could be necessary depending on your business. It is important to ask how the third party will use your proprietary data. By building the platform, you ensure complete access to, and oversight of, important data and business insights. If your data is a vital component of your competitive advantage, it becomes imperative to maintain this information internally.

Security risk

Enterprise-level software companies are the targets, and sometimes victims, of large-scale cyber attacks. These attacks compromise all users of their software, potentially leaking vitally important data or opening up pathways into your network. Building the entire platform in-house enables you to add security to places you deem the most important and limit exposure to any breach that a third party may have.

Benefits of buying an edge management solution

  • Ensured quality, expertise, and support
  • Faster time to market
  • Lower cost

Ensured quality, expertise, and support

Enterprise edge AI management platforms are extremely complex, with many layers. A solution provider is incentivized to ensure that the solution meets your needs. They have dedicated expert resources to build an optimal, enterprise-grade solution as well as provide enterprise support for all issues from low level to critical. This means that the platform not only meets your current needs but can also address future issues, and you have a dedicated resource to call upon when needed.

Faster time to market

Buying can help you deploy an edge computing solution faster. Enterprises across the world are working to find the best way to manage all their disparate edge nodes. It would be a competitive disadvantage to wait several months to build a quality solution.  

Being an early adopter of edge AI management software can also give you a competitive advantage. You’re able to realize insights from your data in nearly real time and deploy or update new AI applications faster.

Lower cost

Enterprise software often has usage-based pricing, which can lower long-term expenses. Providers are able to spread maintenance and support costs, which is something you are unable to do in-house. Purchasing enterprise-grade software is a capital expenditure as opposed to an operating expense. In the long run, it tends to be cost-effective to purchase. 
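To test the long-run cost argument against your own numbers, you can compare cumulative spend year by year. The sketch below assumes a deliberately simple model (a one-time upfront cost plus a flat annual cost for each option); every figure in it is hypothetical, and real pricing, staffing, and growth in usage will change the outcome.

```python
def cumulative_cost(upfront, annual, years):
    """Total spend after `years`, given a one-time and a recurring cost."""
    return upfront + annual * years


def compare(years, build_upfront, build_annual, buy_upfront, buy_annual):
    """Return (year, build_total, buy_total, cheaper_option) for each year."""
    rows = []
    for year in range(1, years + 1):
        build = cumulative_cost(build_upfront, build_annual, year)
        buy = cumulative_cost(buy_upfront, buy_annual, year)
        cheaper = "buy" if buy < build else "build"
        rows.append((year, build, buy, cheaper))
    return rows


# Hypothetical figures: in-house development and maintenance staff
# versus vendor onboarding and a subscription.
rows = compare(
    years=5,
    build_upfront=400_000,
    build_annual=150_000,
    buy_upfront=50_000,
    buy_annual=120_000,
)
for year, build, buy, cheaper in rows:
    print(f"year {year}: build={build:,} buy={buy:,} cheaper={cheaper}")
```

With these made-up inputs, buying stays cheaper throughout, but the result depends entirely on the inputs. If a vendor's recurring fee exceeds your in-house annual cost, the comparison can flip after enough years.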

Risks of building or buying

There are also downsides to consider. There is some assumed risk with building your own solution. These risks—specifically around quality, opportunity cost, and support—can hinder development and slow down business growth.

But nothing comes without risk, and buying a solution is no exception. The risks of buying fall into three main buckets: potential data leaks, a solution that doesn’t meet your needs, and trusting someone else to do the job. In the following section, I examine the risks in detail.

Risks of building an edge management solution

  • Quality compromise
  • Technical debt
  • Opportunity cost

Quality compromise 

A proper and complete solution must deploy AI workloads at scale, have layered security, and orchestrate containers, among other things. There is a tremendous amount of detail required to have a complete edge management platform. While this may seem simple to create, the many layers of complex software below the user interface could require an outside expert to solve your problem.   

Technical debt

Another option is to extend your current solution to support edge computing and AI, but that often brings more trouble than benefit. It could be costly, with additional licensing costs, and may not encompass all the benefits and features needed. A loop of continual repairs, rather than a rip-and-replace, is not only costly but also time-consuming, leaving you with a platform that does not perform as needed.

Opportunity cost

Even in cases that do not require bringing in outside developers, your existing team may deliver more value by building unique, custom AI applications for your use cases rather than the platform itself. A solution provider can also offer expertise in edge computing and management, saving you time in bringing the solution to market while meeting all your requirements.

Risks of buying an edge management solution

  • Long-term support
  • Access to private data
  • Unmet requirements
  • Market changes

Long-term support

By building your own solution, you also take on the cost of maintenance and support. Those costs rise as more applications and users come onto the platform. This can strain your IT personnel and end-users, while also growing operating expenses and lowering your net income.

Access to private data

The solution provider becomes a responsible owner for several components of the edge compute stack and could have access to some edge data. If there is data vital to your company’s competitive advantage, this is a risk you must consider.

Unmet requirements 

The vendor’s solution may not meet the exact needs of your organization. You may have a niche or unique need that off-the-shelf products cannot solve. These could include specific connectivity, firewall, or provisioning issues limiting your ability to use a service provider.  

Market changes

Using a third party could leave you vulnerable to any changes that the third party makes on their own. They could decide to leave the market or may struggle with market shifts, leaving you exposed and without a trusted partner.

Choosing the right edge management solution 

A lot goes into a quality edge AI management platform. While you still may be thinking through the best option, one approach to consider is a hybrid model, in which you buy the primary solution but build out customizations for your organization’s needs.

This is only possible if the provider’s solution has APIs for integration. Be sure to ask if integration into other management tools and the wider ecosystem is possible. Also, when performing due diligence, ask about local app data storage on-premises to minimize any data concerns.

The most important thing is to understand the capabilities of both the vendor and your own organization. Work closely with the vendor, ask for demos, ask questions about the flexibility of the pricing structure, and ensure it is a collaborative effort between all parties that are involved.

NVIDIA works with many customers who have chosen to build their own edge solutions and also offers the edge management platform NVIDIA Fleet Command. Fleet Command is a cloud service that enables the management of distributed edge computing environments at scale. 

Get started on your edge management journey with a free trial of Fleet Command on NVIDIA LaunchPad.