Meta’s AI supercomputer — the largest NVIDIA DGX A100 customer system to date — will deliver 5 exaflops of AI performance.
How to make custom models for tensorflow.js?
i seem to be unable to find anything about making a custom tf model for js.
so i thought i’d ask here. how would you go about doing it?
what’s the best way to create and train custom models? can you do it using python and use those models in js?
and if you can, then how would you do that?
also as a side note, are there any articles or video series that take you through this step-by-step?
thanks for all the help!
submitted by /u/CCmamo
Nsight Compute kernel profiler now includes Range Replay, Memory Analysis, and Guided Analysis enhancements.
NVIDIA Nsight Compute is an interactive kernel profiler for CUDA applications. It provides detailed performance metrics and API debugging through a user interface and a command-line tool. Nsight Compute 2022.1 brings updates to improve data collection modes enabling new use cases and options for performance profiling.
What’s New
Range Replay
This release of Nsight Compute extends the existing replay modes with the highly requested feature of Range Replay. Range Replay captures and replays complete ranges of CUDA API calls and kernel launches within the profiled application. Metrics are associated with the entire range as opposed to individual kernels. This allows the tool to execute kernels without serialization and to support profiling kernels that need to be run concurrently for correctness or performance reasons. A range consists of a start marker and an end marker, and includes all CUDA API calls and kernels launched between these markers from any CPU thread.
Range markers can be defined using either:
- Profiler Start/Stop API
- NVTX Ranges
For complete details, see the “Replay” section in Nsight Compute’s Kernel Profiling Guide.

Memory Analysis
When profiling on A100, a new L2 Cache Eviction Policies table in the Memory Analysis section helps you understand the number of accesses and achieved hit rates by the various cache eviction policies. In the same section, the L2 Cache table now has a new ECC row to show traffic created from enabling hardware Error Correction Code on the GPU.

Guided Analysis
Nsight Compute now makes it easier to select initial analysis targets in multiresult collection by dynamically selecting between the Summary and Details pages when opening a report. Rules were extended to detect non-fused floating-point instructions as an optimization opportunity. Last, but not least, when the Uncoalesced Memory Access rules are triggered, they show a table of the five most valuable instances, making it easier to inspect and resolve them on the Source page.

Additional improvements
Further improvements include an Occupancy Calculator auto-update. There is also a new ‘Thread Instructions Executed’ metric and register name tooltips for the Register Dependency columns in the Source page, as well as NVLink updates.
At GTC in November of 2021, we released insightful assets showcasing Nsight tools capabilities:
- Understanding CUDA Application Behavior, Performance, and Optimization Just Got Easier with the Latest Developer Tools [A31048]
- Optimizing CUDA Machine Learning Codes with Nsight Profiling Tools [DLIT1605]
- Guided Analysis with Nsight Compute Demo
With NVIDIA Jetson embedded platforms, teams at the DARPA SubT Challenge detected objects with both high accuracy and high throughput.
Performing real-time inference with high accuracy is a challenging task, especially in a poor-visibility environment. With NVIDIA Jetson embedded platforms, teams at the recently concluded Defense Advanced Research Projects Agency (DARPA) Subterranean (SubT) Challenge were able to detect objects of interest with both high accuracy and high throughput. In this post, we will cover the results, systems, and challenges faced by teams in the final leg of the systems competition.
The SubT Challenge is an international robotics competition organized and coordinated by DARPA. The competition encourages researchers to develop new approaches for robots to map, navigate, and search environments that pose various challenges such as poor visibility, presence of hazards, unknown maps, or poor communication infrastructure.
The challenge consists of three preliminary circuit events: Tunnel Circuit, Urban Circuit, and Cave Circuit (canceled due to the COVID-19 pandemic), as well as a final integrated challenge course. Each circuit and the final event are held in different environments with various types of terrain. According to the event organizers, the competition was held over 3 years in different phases with the final event held in September of 2021 in Louisville, KY.
Competitors in the SubT Challenge leveraged NVIDIA technology for both their hardware and software needs. Teams used desktop/server GPUs to train models that were deployed on robots using the NVIDIA Jetson embedded platform for real-time detection of artifacts and objects of interest, the main criterion used to determine the winning team. Five out of seven competitors also used the Jetson platform to perform real-time object detection.
The SubT Challenge
The SubT Challenge is inspired by real-world scenarios faced by first responders during search and rescue operations or disaster response.
The state-of-the-art methods developed through this competition will help reduce the risk of casualties of search and rescue personnel and first responders while they explore the unknown underground environments. Additionally, the autonomous robots will assist personnel in exploring the environment to find survivors, objects of interest, and access locations that are otherwise risky for humans.

Technical challenges
The competition incorporates various technical challenges such as dealing with unknown, unstructured, and uneven terrain that some robots might not be able to maneuver easily.
These environments typically would not have any infrastructure for communication with the central command. From a perception perspective, these environments will have poor visibility where the robots must find artifacts and objects of interest.
The competing teams were tasked with addressing these challenges by developing novel sensor fusion methods as well as developing new or modifying existing robotic platforms with different capabilities to locate and detect objects of interest.
Team CERBERUS
Team CERBERUS (CollaborativE walking and flying RoBots for autonomous ExploRation in Underground Settings) is a joint consortium between several universities and industrial organizations worldwide.

The team participated in the competition with four quadruped robots called ANYmal, five primarily in-house-built drones with variable size and payload capacity, and a rover robot in the form of Super Mega Bot. In the competition finals, the team ended up using four ANYmal robots and the Super Mega Bot for exploration and artifact detection.
Each ANYmal robot was equipped with two CPU-based computers and an NVIDIA Jetson AGX Xavier. The rover robot was equipped with an NVIDIA GTX 1070 GPU.
The CERBERUS team used a modified version of the You Only Look Once (YOLO) model for object detection. The model was trained on 40,000 labeled images using two NVIDIA RTX 3090 GPUs.
The trained model was further optimized using TensorRT before being deployed on Jetson for real-time inference. The Jetson AGX Xavier was able to perform inference at a collective rate of 20 Hz. In the competition finals, the CERBERUS team was the first to detect 23 of the 40 artifacts located in the environment, clinching the number one spot.
The CERBERUS team also used GPUs for the elevation mapping of the terrain and for training the locomotion policy controller of the ANYmal quadruped robot. The elevation mapping was done in real time using Jetson AGX Xavier. The ANYmal robot’s locomotion policy training for the rough terrain was done offline using desktop GPUs.
Team Co-STAR
Led by researchers at NASA’s Jet Propulsion Laboratory (JPL) in Southern California along with other universities and industrial collaborators, team Collaborative SubTerranean Autonomous Robots (Co-STAR) was the winner of the 2020 competition focused on exploring complex underground urban environments.

They also successfully participated in the 2021 competition in mixed artificial and natural environments, placing fifth. The Co-STAR team entered the competition with four Spots, four Husky robots, and two drones.
Following an unexpected hardware issue in the final run, the team ended up using one Spot and three Husky robots. Each robot was equipped with a CPU-based computer along with one NVIDIA Jetson AGX Xavier.
For object detection, team Co-STAR used RGB and thermal images. They used the medium variant of the YOLO v5 model to process high-resolution images for real-time inference. The team trained two different models to perform inference on captured RGB and thermal images.
The RGB image model was trained using approximately 54,000 labeled frames, whereas the thermal image model was trained using about 2,400 labeled images. To train the models on their customized dataset, team Co-STAR started from a YOLO v5 model pretrained on the COCO dataset and performed transfer learning using the NVIDIA Transfer Learning Toolkit (now known as TAO Toolkit).
The models were trained using two on-premise NVIDIA A100 GPUs and an AWS instance that consisted of eight V100 GPUs. Before deploying the models on Jetson AGX Xavier, the team pruned the models using TensorRT.
Using this setup, team Co-STAR was able to perform inference at 28 Hz with RGB images received from five RealSense cameras and images received from one thermal camera. In the final run, the robots were able to detect all 13 artifacts present in the designated areas. The exploration time was limited due to the delayed deployment caused by unexpected hardware issues at the deployment site.
Equipped with the NVIDIA Jetson platform and NVIDIA GPU hardware, teams competing in the DARPA SubT event were able to effectively train models for real-time inference, addressing the challenge posed by underground environments with accurate object detection.
Deep machine learning (ML) systems have achieved considerable success in medical image analysis in recent years. One major contributing factor is access to abundant labeled datasets, which are used to train highly effective supervised deep learning models. However, in the real world, these models may encounter samples exhibiting rare conditions that are individually too infrequent for per-condition classification. Nevertheless, such conditions can be collectively common because they follow a long-tail distribution and, taken together, can represent a significant portion of cases. For example, in a recent deep learning dermatological study, hundreds of rare conditions made up around 20% of cases encountered by the model at test time.
To prevent models from generating erroneous outputs on rare samples at test time, there remains a considerable need for deep learning systems with the ability to recognize when a sample is not a condition it can identify. Detecting previously unseen conditions can be thought of as an out-of-distribution (OOD) detection task. By successfully identifying OOD samples, preventive measures can be taken, like abstaining from prediction or deferring to a human expert.
Traditional computer vision OOD detection benchmarks work to detect dataset distribution shifts. For example, a model may be trained on CIFAR images but be presented with street view house numbers (SVHN) as OOD samples, two datasets with very different semantic meanings. Other benchmarks seek to detect slight differences in semantic information, e.g., between images of a truck and a pickup truck, or two different skin conditions. The semantic distribution shifts in such near-OOD detection problems are more subtle in comparison to dataset distribution shifts, and thus, are harder to detect.
In “Does Your Dermatology Classifier Know What it Doesn’t Know? Detecting the Long-Tail of Unseen Conditions”, published in Medical Image Analysis, we tackle this near-OOD detection task in the application of dermatology image classification. We propose a novel hierarchical outlier detection (HOD) loss, which leverages existing fine-grained labels of rare conditions from the long tail and modifies the loss function to group unseen conditions and improve identification of these near OOD categories. Coupled with various representation learning methods and the diverse ensemble strategy, this approach enables us to achieve better performance for detecting OOD inputs.
The Near-OOD Dermatology Dataset
We curated a near-OOD dermatology dataset that includes 26 inlier conditions, each of which is represented by at least 100 samples, and 199 rare conditions considered to be outliers. Outlier conditions can have as few as one sample per condition. The separation criteria between inlier and outlier conditions can be specified by the user. Here the cutoff sample size between inlier and outlier was 100, consistent with our previous study. The outliers are further split into training, validation, and test sets that are intentionally mutually exclusive to mimic real-world scenarios, where rare conditions shown during test time may not have been seen in training.
Statistic | Train set, inlier | Train set, outlier | Validation set, inlier | Validation set, outlier | Test set, inlier | Test set, outlier
Number of classes | 26 | 68 | 26 | 66 | 26 | 65
Number of samples | 8854 | 1111 | 1251 | 1082 | 1192 | 937
Inlier and outlier conditions in our benchmark dataset and detailed dataset split statistics. The outliers are further split into mutually exclusive train, validation, and test sets.
Hierarchical Outlier Detection Loss
We propose to use “known outlier” samples during training to aid detection of “unknown outlier” samples during test time. Our novel hierarchical outlier detection (HOD) loss performs a fine-grained classification of individual classes for all inlier and outlier classes and, in parallel, a coarse-grained binary classification of inliers vs. outliers in a hierarchical setup (see the figure below). Our experiments confirmed that HOD is more effective than performing a coarse-grained classification followed by a fine-grained classification, because the sequential setup could create a bottleneck that impacts the performance of the fine-grained classifier.
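As an illustration, the two parallel terms can be sketched in a few lines of plain Python. This is a minimal sketch, not the implementation from the paper: the function names, the class ordering (inliers first, then known outliers), and the `coarse_weight` hyperparameter that balances the two terms are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def hod_loss(logits, label, num_inlier_classes, coarse_weight=0.1):
    """Fine-grained cross-entropy plus a weighted coarse inlier-vs-outlier term.

    Assumed layout: classes 0..num_inlier_classes-1 are inliers and the
    remaining classes are known outliers. coarse_weight is a hypothetical
    hyperparameter, not a value taken from the post.
    """
    eps = 1e-12  # guards against log(0) if a probability underflows
    probs = softmax(logits)

    # Fine-grained term: cross-entropy over every individual class.
    fine = -math.log(max(probs[label], eps))

    # Coarse term: sum the fine-grained probabilities into two groups.
    p_inlier = sum(probs[:num_inlier_classes])
    p_outlier = sum(probs[num_inlier_classes:])
    is_outlier = label >= num_inlier_classes
    coarse = -math.log(max(p_outlier if is_outlier else p_inlier, eps))

    return fine + coarse_weight * coarse

# Toy example: 2 inlier classes and 2 known outlier classes.
example_loss = hod_loss([2.0, 0.5, 0.1, -1.0], label=3, num_inlier_classes=2)
```

Because the coarse probabilities are derived from the fine-grained ones rather than from a separate head, both terms train the same output layer, which matches the parallel (rather than sequential) setup described above.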
We use the sum of the predictive probabilities of the outlier classes as the OOD score. As the primary OOD detection metric we use the area under the receiver operating characteristic curve (AUROC), which ranges between 0 and 1 and measures the separability between inliers and outliers. A perfect OOD detector, which separates all inliers from outliers, is assigned an AUROC score of 1. A popular baseline method, called reject bucket, separates each inlier individually from the outliers, which are grouped into a single dedicated abstention class. In addition to a fine-grained classification for each individual inlier and outlier class, the HOD loss–based approach separates the inliers collectively from the outliers with a coarse-grained prediction loss, resulting in better generalization. Although the two approaches are similar, we demonstrate that our HOD loss–based approach outperforms other baseline methods that leverage outlier data during training, achieving an AUROC score of 79.4% on the benchmark, a significant improvement over reject bucket, which achieves 75.6%.
Our model architecture and the HOD loss. The encoder (green) represents the wide ResNet 101×3 model pre-trained with different representation learning models (ImageNet, BiT, SimCLR, and MICLe; see below). The output of the encoder is sent to the HOD loss where fine-grained and coarse-grained predictions for inliers (blue) and outliers (orange) are obtained. The coarse predictions are obtained by summing over the fine-grained probabilities as indicated in the figure. The OOD score is defined as the sum of the probabilities of outlier classes.
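The OOD score and the AUROC metric described above can be illustrated with a small, self-contained sketch. The helper names and the toy probabilities are hypothetical; the AUROC is computed here with the standard pairwise definition (the probability that a randomly chosen outlier receives a higher score than a randomly chosen inlier, with ties counted as one half), which is fine for small examples but quadratic in the number of samples.

```python
def ood_score(probs, num_inlier_classes):
    """OOD score: total probability mass assigned to the outlier classes."""
    return sum(probs[num_inlier_classes:])

def auroc(scores, is_outlier):
    """Pairwise AUROC: fraction of (outlier, inlier) pairs ranked correctly."""
    pos = [s for s, o in zip(scores, is_outlier) if o]
    neg = [s for s, o in zip(scores, is_outlier) if not o]
    if not pos or not neg:
        raise ValueError("need at least one inlier and one outlier")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Toy data: 2 inlier classes followed by 1 outlier class (made-up numbers).
toy_probs = [
    [0.7, 0.2, 0.1],  # true inlier
    [0.3, 0.3, 0.4],  # true inlier
    [0.1, 0.1, 0.8],  # true outlier
    [0.4, 0.3, 0.3],  # true outlier
]
toy_is_outlier = [False, False, True, True]

toy_scores = [ood_score(p, num_inlier_classes=2) for p in toy_probs]
toy_auroc = auroc(toy_scores, toy_is_outlier)  # 3 of 4 pairs ranked correctly
```

In this toy case one outlier (score 0.3) is ranked below one inlier (score 0.4), so the detector falls short of the perfect AUROC of 1.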
Representation Learning and the Diverse Ensemble Strategy
We also investigate how different types of representation learning help in OOD detection in conjunction with HOD by pretraining on ImageNet, BiT-L, SimCLR and MICLe models. We observe that including HOD loss improves OOD performance compared to the reject bucket baseline method for all four representation learning methods.
Representation learning method | AUROC (%) with reject bucket | AUROC (%) with HOD loss
ImageNet | 74.7 | 77.0
BiT-L | 75.6 | 79.4
SimCLR | 75.2 | 77.2
MICLe | 76.7 | 78.8
OOD detection performance (AUROC) for different representation learning models with reject bucket and with HOD loss.
Another, orthogonal approach for improving OOD detection performance and accuracy is the deep ensemble, which aggregates outputs from multiple independently trained models to provide a final prediction. We build upon the deep ensemble, but instead of using a fixed architecture with fixed pre-training, we combine different representation learning architectures (ImageNet, BiT-L, SimCLR and MICLe) and different objective loss functions (HOD and reject bucket). We call this a diverse ensemble strategy, and we demonstrate that it outperforms the deep ensemble in both OOD performance and inlier accuracy.
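One way to picture the aggregation step is the sketch below. It assumes, purely for illustration, that members are combined by simple averaging and that every member lists its inlier classes first; the post does not specify the aggregation rule, and the function names are hypothetical. Under that layout an HOD member (many outlier classes) and a reject-bucket member (one abstention class) can be reduced to the same pair of outputs before averaging.

```python
def member_outputs(probs, num_inlier_classes):
    """Split one member's probabilities into inlier probabilities and an OOD score.

    Works for both heads: an HOD member has several outlier classes after the
    inliers, while a reject-bucket member has a single abstention class.
    """
    return probs[:num_inlier_classes], sum(probs[num_inlier_classes:])

def diverse_ensemble(members, num_inlier_classes):
    """Average inlier probabilities and OOD scores over all ensemble members.

    Averaging is an assumption of this sketch, not a detail from the post.
    """
    inlier_sum = [0.0] * num_inlier_classes
    ood_sum = 0.0
    for probs in members:
        inlier, ood = member_outputs(probs, num_inlier_classes)
        inlier_sum = [a + b for a, b in zip(inlier_sum, inlier)]
        ood_sum += ood
    n = len(members)
    return [v / n for v in inlier_sum], ood_sum / n

# Two toy members for one sample, each with 2 inlier classes:
hod_member = [0.5, 0.25, 0.125, 0.125]  # two fine-grained outlier classes
reject_bucket_member = [0.25, 0.25, 0.5]  # one abstention class
avg_inlier, avg_ood = diverse_ensemble([hod_member, reject_bucket_member], 2)
```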
Downstream Clinical Trust Analysis
While we mainly focus on improving the performance for OOD detection, the ultimate goal for our dermatology model is to have high accuracy in predicting inlier and outlier conditions. We go beyond traditional performance metrics and introduce a “penalty” matrix that jointly evaluates inlier and outlier predictions for model trust analysis to approximate downstream impact. For a fixed confidence threshold, we count the following types of mistakes: (i) incorrect inlier predictions (i.e., mistaking inlier condition A as inlier condition B); (ii) incorrect abstention of inliers (i.e., abstaining from making a prediction for an inlier); and (iii) incorrect prediction for outliers as one of the inlier classes.
To account for the asymmetrical consequences of the different types of mistakes, penalties can be 0, 0.5, or 1. Both incorrect inlier and outlier-as-inlier predictions can potentially erode user trust in the model and were penalized with a score of 1. Incorrect abstention of an inlier as an outlier was penalized with a score of 0.5, indicating that potential model users should seek additional guidance given the model-expressed uncertainty or abstention. For correct decisions no cost is incurred, indicated by a score of 0.
Ground truth | Model action: prediction as inlier | Model action: abstain
Inlier | 0 (Correct); 1 (Incorrect, mistakes that may erode trust) | 0.5 (Incorrect, abstains inliers)
Outlier | 1 (Incorrect, mistakes that may erode trust) | 0 (Correct)
The penalty matrix is designed to capture the potential impact of different types of model errors.
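To make the penalty matrix concrete, the following sketch totals the penalties over a handful of made-up predictions at one operating point. The sample tuple layout, the function name, and the rule that the model abstains whenever its confidence is below the threshold are assumptions for this example; only the penalty values (0, 0.5, 1) come from the matrix above.

```python
def estimated_cost(samples, threshold):
    """Sum penalty-matrix costs over samples for one confidence threshold.

    Each sample is (is_inlier, true_class, predicted_class, confidence).
    Assumption: the model abstains when confidence < threshold.
    """
    cost = 0.0
    for is_inlier, true_class, predicted_class, confidence in samples:
        abstains = confidence < threshold
        if is_inlier:
            if abstains:
                cost += 0.5  # incorrect abstention of an inlier
            elif predicted_class != true_class:
                cost += 1.0  # inlier condition A mistaken for inlier condition B
        elif not abstains:
            cost += 1.0  # outlier predicted as one of the inlier classes
        # Correct inlier predictions and abstentions on outliers cost 0.
    return cost

# Hypothetical predictions, not data from the study.
toy_samples = [
    (True, "A", "A", 0.9),    # correct inlier prediction
    (True, "A", "B", 0.8),    # wrong inlier prediction
    (True, "C", "C", 0.3),    # inlier, low confidence
    (False, None, "A", 0.7),  # outlier, confident prediction
    (False, None, "B", 0.2),  # outlier, low confidence
]
cost_at_half = estimated_cost(toy_samples, threshold=0.5)
```

Sweeping `threshold` over a range of values and plotting the resulting cost gives a curve over operating points, which is the kind of comparison the trust analysis relies on.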
Because real-world scenarios are more complex and contain a variety of unknown variables, the numbers used here represent simplifications to enable qualitative approximations for the downstream impact on user trust of outlier detection models, which we refer to as “cost”. We use the penalty matrix to estimate a downstream cost on the test set and compare our method against the baseline, thereby making a stronger case for its effectiveness in real-world scenarios. As shown in the plot below, our proposed solution incurs a much lower estimated cost in comparison to baseline over all possible operating points.
Trust analysis comparing our proposed method to the baseline (reject bucket) for a range of outlier recall rates, indicated by 𝛕. We show that our method reduces downstream estimated cost, potentially reflecting improved downstream impact.
Conclusion
In real-world deployment, medical ML models may encounter conditions that were not seen in training, and it’s important that they accurately identify when they do not know a specific condition. Detecting those OOD inputs is an important step to improving safety. We develop an HOD loss that leverages outlier data during training, and combine it with pre-trained representation learning models and a diverse ensemble to further boost performance, significantly outperforming the baseline approach on our new dermatology benchmark dataset. We believe that our approach, aligned with our AI Principles, can aid successful translation of ML algorithms into real-world scenarios. Although we have primarily focused on OOD detection for dermatology, most of our contributions are fairly generic and can be easily incorporated into OOD detection for other applications.
Acknowledgements
We would like to thank Shekoofeh Azizi, Aaron Loh, Vivek Natarajan, Basil Mustafa, Nick Pawlowski, Jan Freyberg, Yuan Liu, Zach Beaver, Nam Vo, Peggy Bui, Samantha Winter, Patricia MacWilliams, Greg S. Corrado, Umesh Telang, Yun Liu, Taylan Cemgil, Alan Karthikesalingam, Balaji Lakshminarayanan, and Jim Winkens for their contributions. We would also like to thank Tom Small for creating the post animation.
Learn how edge computing is powering efficient energy operations, protecting worker health and safety, and improving power grid resiliency.
Each day, energy flows throughout our lives – from the fuel that powers cars and planes, to the gas used for stove top cooking, to the electricity that keeps the lights on in homes and businesses. Oil, gas, and electricity are mature commodity markets, but AI is transforming the processes used to produce, transport, and deliver these resources.
Enter AI deployed at the edge: on oil rigs, within power plants, riding along utility trucks, even embedded in smart buildings. Oil and gas enterprises and utilities are using AI and edge computing to improve operational efficiency, protect worker health and safety, integrate renewable energy, increase grid resiliency, and provide more reliable and affordable sources of energy to consumers.

As companies and countries race to decarbonize and meet net-zero emissions goals, edge AI will play a key role in managing distributed energy resources such as electric vehicles, home batteries, solar panels, and wind farms to enhance power grid resiliency and accelerate the energy transition. The following examples highlight the top AI use cases across the energy industry:
- Software-defined smart grids: Future smart meters will use edge computing to optimize power flow, detect grid anomalies, deliver more reliable energy at a lower cost, and unlock opportunities for new energy applications. Utilidata, a leading grid-edge software company, is developing a software-defined smart grid chip with NVIDIA that will power next-generation smart meters to increase grid resiliency, decarbonization, and consumer value.
- Autonomous operations: Industrial sites, such as oil rigs and power plants, require extensive monitoring for efficiency and safety because liquid, steam, or oil leakages can be catastrophic, costly, and wasteful. Global energy leaders, such as Siemens Energy, are using AI and machine learning to deliver a path to autonomous power plants. The company trains AI models using thousands of images and video streams from millions of onsite cameras and sensors to detect process anomalies. These models are deployed at the edge in power plants and use real-time inferencing to identify leaks. Rig operators are using computer vision, deep learning, and intelligent video analytics (IVA) to monitor heavy machinery, detect potential hazards, and alert workers in real-time to protect their health and safety, prevent accidents, and assign repair technicians for maintenance.
- Pipeline optimization: Oil and gas enterprises rely on finding the best-fit routes to transfer oil to refineries and eventually fuel stations. Edge AI can calculate the optimal flow of oil to ensure reliability of production and protect long-term pipeline health. Using IVA, these companies can inspect pipelines for defects that could lead to dangerous failures and automatically alert pipeline operators. Further downstream, NVIDIA ReOpt uses GPU-accelerated solvers for logistics and route optimization, which can efficiently route fuel to fueling stations.
- Power grid maintenance: With proactive maintenance, utilities can accurately detect defects and reduce unplanned outages to better serve customers. FirstEnergy worked with Noteworthy AI, an NVIDIA Inception member, on a pilot project to automate utility pole inspections. Fixed camera systems powered by NVIDIA Jetson were secured to the roof of service trucks and collected standardized, high-resolution images of their utility poles, power lines, and pole-mounted assets. The images were analyzed at the edge to determine if repairs or vegetation management were needed. Edge computing can help monitor the estimated 185 million utility poles in the United States, and reduce the tens of millions of dollars spent each year by utilities to manually track and maintain poles.
- Power grid simulation: Intelligent forecasting using GPU-accelerated grid simulations combined with historical data on energy usage and weather can inform more efficient generation, distribution, and management of energy resources to consumers. AI helps manage the bidirectional flow of power in a grid, delivering reliable energy to residents and enterprises while automating the process for consumers to sell their additional energy back to the grid.
Thanks to edge AI, the future of energy is more sustainable than ever. Explore how NVIDIA is building an ecosystem to accelerate the energy transition.
GeForce NOW is taking cloud gaming to new heights. This GFN Thursday delivers an upgraded streaming experience as part of an update that is now available to all members. It includes new resolution upscaling options to make members’ gaming experiences sharper, plus the ability to customize streaming settings in session. The GeForce NOW app is ...
The post Let Me Upgrade You: GeForce NOW Adds Resolution Upscaling and More This GFN Thursday appeared first on The Official NVIDIA Blog.
From the largest firms trading on Wall Street to banks providing customers with fraud protection to fintechs recommending best-fit products to consumers, AI is driving innovation across the financial services industry. New research from NVIDIA found that 78 percent of financial services professionals state that their company uses accelerated computing to deliver AI-enabled applications through ...
The post Nearly 80 Percent of Financial Firms Use AI to Improve Services, Reduce Fraud appeared first on The Official NVIDIA Blog.
I have replicated an architecture from one of the research papers, and running it on GPU gives Out Of Memory even on Colab (the model is quite deep and huge). So naturally, I want to train it using a TPU.
The same code using GPU doesn’t cause any issue. However, it throws this error if I train on TPU:
InvalidArgumentError: 9 root error(s) found. (0) INVALID_ARGUMENT: {{function_node __inference_train_function_56959}} Reshape’s input dynamic dimension is decomposed into multiple output dynamic dimensions, but the constraint is ambiguous and XLA can’t infer the output dimension %reshape.8395 = f32[3,3,2,86,86,32]{5,4,3,2,1,0} reshape(f32[<=18,86,86,32]{3,2,1,0} %convolution.8393), metadata={op_type=”BatchToSpaceND” op_name=”model/conv2d_3/Conv2D/BatchToSpaceND”}.
[[{{node TPUReplicate/_compile/_11021259981217135469/_4}}]]
The colab notebook can be found here:
https://colab.research.google.com/drive/1D-laydrWwLnqAehVREhSkTPoqNXyRGS8?usp=sharing
submitted by /u/SuccMyStrangerThings
Hey all!
I have trained a TensorFlow model with faces of people I want to detect.
It detects the people and gives the correct label to the faces I trained the model to detect. However, if I point the webcam at a face that I did not train the model with, it gives the label of one of the people I trained the model with.
I’ve tried many things to stop this, but nothing has worked.
I can share all the code and faces I am trying to detect if needed, but is there any way to stop this?
Any advice is greatly appreciated! I’m still learning TensorFlow, and while I’m a little better than in my previous posts, I’m still learning!
Thanks!
submitted by /u/Adhesive_Hooks