NVIDIA Announces Nsight Graphics 2021.1

Nsight Graphics 2021.1 is available to download – check out this article to see what’s new.

We now provide you with the ability to set any key to be the capture shortcut. This new keybinding is supported for all activities, including GPU Trace. F11 is the default binding for both capture and trace, but if you prefer the old behavior, the original capture keybinding is still supported (when the ‘Frame Capture (Target) > Legacy Capture Chord’ setting is set to Yes).

You can now profile applications that use D3D12 or Vulkan strictly for compute tasks using the new ‘One-shot’ option in GPU Trace. Tools that generate normal maps or use DirectML for image upscaling can now be properly profiled and optimized. To enable this, set the ‘Capture Type’ to ‘One-shot [Beta]’.

While TraceRays/DispatchRays has been the common way to initiate ray generation, it’s now possible to ray trace directly from your compute shaders using DXR1.1 and the new Khronos Vulkan Ray Tracing extension. In order to support this new approach, we’ve added links to the acceleration structure data for applications that use RayQuery calls in compute shaders.  

It’s important to know how much GPU Memory you’re using and to keep this as low as possible in Ray Tracing applications. We’re now making this even easier for you by adding size information to the Acceleration Structure Viewer.

Finally, we’ve added the Nsight HUD to Windows Vulkan applications in all frame debugging capture states. Previously the HUD was only activated once an application was captured.

We’re always looking to improve our HUD so please make sure to give us any feedback you might have.

For more details on Nsight Graphics 2021.1, check out the release notes (link).

We want to hear from you! Please continue to use the integrated feedback button that lets you send comments, feature requests, and bugs directly to us with the click of a button. You can send feedback anonymously or provide an email so we can follow up with you about your feedback. Just click on the little speech bubble at the top right of the window.

Try out the latest version of Nsight Graphics today!

Khronos released the final Vulkan Ray Tracing extensions today. NVIDIA Vulkan beta drivers available for download. Welcome to the era of portable, cross-vendor, cross-platform ray tracing acceleration! 


Certifiably Fast: Top OEMs Debut World’s First NVIDIA-Certified Systems Built to Crush AI Workloads

AI, the most powerful technology of our time, demands a new generation of computers tuned and tested to drive it forward. Starting today, data centers can boot up a new class of accelerated servers from our partners to power their journey into AI and data analytics. Top system makers are delivering the first wave. Read article >

The post Certifiably Fast: Top OEMs Debut World’s First NVIDIA-Certified Systems Built to Crush AI Workloads appeared first on The Official NVIDIA Blog.


Stabilizing Live Speech Translation in Google Translate

The transcription feature in the Google Translate app may be used to create a live, translated transcription for events like meetings and speeches, or simply for a story at the dinner table in a language you don’t understand. In such settings, it is useful for the translated text to be displayed promptly to help keep the reader engaged and in the moment.

However, with early versions of this feature the translated text suffered from multiple real-time revisions, which can be distracting. This was because of the non-monotonic relationship between the source and the translated text, in which words at the end of the source sentence can influence words at the beginning of the translation.

Transcribe (old) — Left: Source transcript as it arrives from speech recognition. Right: Translation that is displayed to the user. The frequent corrections made to the translation interfere with the reading experience.

Today, we are excited to describe some of the technology behind a recently released update to the transcribe feature in the Google Translate app that significantly reduces translation revisions and improves the user experience. The research enabling this is presented in two papers. The first formulates an evaluation framework tailored to live translation and develops methods to reduce instability. The second demonstrates that these methods do very well compared to alternatives, while still retaining the simplicity of the original approach. The resulting model is much more stable and provides a noticeably improved reading experience within Google Translate.

Transcribe (new) — Left: Source transcript as it arrives from speech recognition. Right: Translation that is displayed to the user. At the cost of a small delay, the translation now rarely needs to be corrected.

Evaluating Live Translation
Before attempting to make any improvements, it was important to first understand and quantifiably measure the different aspects of the user experience, with the goal of maximizing quality while minimizing latency and instability. In “Re-translation Strategies For Long Form, Simultaneous, Spoken Language Translation”, we developed an evaluation framework for live-translation that has since guided our research and engineering efforts. This work presents a performance measure using the following metrics:

  • Erasure: Measures the additional reading burden on the user due to instability. It is the number of words that are erased and replaced for every word in the final translation.
  • Lag: Measures the average time that has passed between when a user utters a word and when the word’s translation displayed on the screen becomes stable. Requiring stability avoids rewarding systems that can only manage to be fast due to frequent corrections.
  • BLEU score: Measures the quality of the final translation. Quality differences in intermediate translations are captured by a combination of all metrics.
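
As a rough illustration, the erasure metric can be sketched in a few lines of Python. This is a simplified reading (a word counts as erased when a revision changes the display after the common prefix; the evaluation in the paper is more involved):

```python
def erasure(revisions):
    """Erased words per final-translation word, across successive
    displayed translations (each revision is a list of tokens)."""
    erased = 0
    for prev, cur in zip(revisions, revisions[1:]):
        # the longest common prefix survives; the rest of `prev` was erased
        lcp = 0
        while lcp < min(len(prev), len(cur)) and prev[lcp] == cur[lcp]:
            lcp += 1
        erased += len(prev) - lcp
    return erased / len(revisions[-1])
```

For example, revising "a b" to "a c d" erases one word, giving an erasure of 1/3 over a three-word final translation.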

It is important to recognize the inherent trade-offs between these different aspects of quality. Transcribe enables live translation by stacking machine translation on top of real-time automatic speech recognition. For each update to the recognized transcript, a fresh translation is generated in real time; several updates can occur each second. This approach placed Transcribe at one extreme of the three-dimensional quality framework: it exhibited minimal lag and the best quality, but also had high erasure. Understanding this allowed us to work towards finding a better balance.

Stabilizing Re-translation
One straightforward solution to reduce erasure is to decrease the frequency with which translations are updated. Along this line, “streaming translation” models (for example, STACL and MILk) intelligently learn to recognize when sufficient source information has been received to extend the translation safely, so the translation never needs to be changed. In doing so, streaming translation models are able to achieve zero erasure.

The downside with such streaming translation models is that they once again take an extreme position: zero erasure necessitates sacrificing BLEU and lag. Rather than eliminating erasure altogether, a small budget for occasional instability may allow better BLEU and lag. More importantly, streaming translation would require retraining and maintenance of specialized models specifically for live-translation. This precludes the use of streaming translation in some cases, because keeping a lean pipeline is an important consideration for a product like Google Translate that supports 100+ languages.

In our second paper, “Re-translation versus Streaming for Simultaneous Translation”, we show that our original “re-translation” approach to live-translation can be fine-tuned to reduce erasure and achieve a more favorable erasure/lag/BLEU trade-off. Without training any specialized models, we applied a pair of inference-time heuristics to the original machine translation models — masking and biasing.

The end of an on-going translation tends to flicker because it is more likely to have dependencies on source words that have yet to arrive. We reduce this by truncating some number of words from the translation until the end of the source sentence has been observed. This masking process thus trades latency for stability, without affecting quality. This is very similar to delay-based strategies used in streaming methods such as Wait-k, but applied only during inference and not during training.
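
In pseudocode terms, the masking heuristic amounts to something like the following sketch (the word-level truncation and the fixed `k` are illustrative simplifications of what the deployed system does):

```python
def mask_tail(partial_translation, source_finished, k=3):
    """Hold back the last k words of an in-progress translation; once the
    end of the source sentence has been observed, show everything."""
    if source_finished:
        return partial_translation
    words = partial_translation.split()
    return " ".join(words[:max(len(words) - k, 0)])
```

The held-back words are exactly the ones most likely to flicker, which is why this trades a little latency for stability.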

Neural machine translation often “see-saws” between equally good translations, causing unnecessary erasure. We improve stability by biasing the output towards what we have already shown the user. On top of reducing erasure, biasing also tends to reduce lag by stabilizing translations earlier. Biasing interacts nicely with masking, as masking words that are likely to be unstable also prevents the model from biasing toward them. However, this process does need to be tuned carefully, as a high bias, along with insufficient masking, may have a negative impact on quality.
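
A toy sketch of the biasing idea (real decoding operates on beam-search log-probabilities; the token scores and the `beta` bonus below are made-up values for illustration):

```python
def biased_pick(candidates, shown_prefix, step, beta=0.5):
    """Pick the next token, boosting agreement with what is on screen.
    candidates: dict mapping token -> log-probability at this step.
    shown_prefix: tokens of the translation currently displayed."""
    scores = dict(candidates)
    if step < len(shown_prefix) and shown_prefix[step] in scores:
        # reward the token that keeps the displayed translation stable
        scores[shown_prefix[step]] += beta
    return max(scores, key=scores.get)
```

Note that the bonus only tips the balance between near-equal candidates; a clearly better token still wins, which is why a moderate bias barely affects quality.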

The combination of masking and biasing produces a re-translation system with high quality and low latency, while virtually eliminating erasure. The table below shows how the metrics react to the heuristics we introduced and how they compare to the other systems discussed above. The graph demonstrates that even with a very small erasure budget, re-translation surpasses zero-flicker streaming translation systems (MILk and Wait-k) trained specifically for live-translation.

System                             BLEU    Lag    Erasure
Re-translation (Transcribe old)    20.4    4.1    2.1
+ Stabilization (Transcribe new)   20.2    4.1    0.1
Evaluation of re-translation on IWSLT test 2018 English-German (TED talks) with and without the inference-time stabilization heuristics of masking and biasing. Stabilization drastically reduces erasure. Translation quality, measured in BLEU, is very slightly impacted due to biasing. Despite masking, the effective lag remains the same because the translation stabilizes sooner.
Comparison of re-translation with stabilization and specialized streaming models (Wait-k and MILk) on WMT 14 English-German. The BLEU-lag trade-off curve for re-translation is obtained via different combinations of bias and masking while maintaining an erasure budget of less than 2 words erased for every 10 generated. Re-translation offers better BLEU / lag trade-offs than streaming models which cannot make corrections and require specialized training for each trade-off point.

The solution outlined above returns a decent translation very quickly, while allowing it to be revised as more of the source sentence is spoken. The simple structure of re-translation enables the application of our best speech and translation models with minimal effort. However, reducing erasure is just one part of the story — we are also looking forward to improving the overall speech translation experience through new technology that can reduce lag when the translation is spoken, or that can enable better transcriptions when multiple people are speaking.


TensorFlow Lite Support

submitted by /u/nbortolotti

[visit reddit]


[Help Please] Applying an already trained model to an image


I am new to tensorflow and am trying to figure out what I think should be a rather simple task. I have a model (.pb file) given to me and I need to use it to mark up an image.

I have two classes that the model was trained on: background and

From this point on, I have literally no idea what I am doing. I tried searching online and there is a lot about how to train a model, but I don’t need to be able to do that.

Any help pointing me in the right direction would be appreciated.

submitted by /u/barrinmw

[visit reddit]



Is my data actually predictive

Hey, very much a beginner with tensorflow, but I’ve been enjoying it so far.

Background: response between 0-200, have 43 variables,
regression type problem, data set is over 200k rows

I’ve built a basic sequential model using Keras, and my loss and validation loss are ideal, i.e. validation loss is slightly above loss, and it looks as it should.

However my actual loss seems quite high; it is converging around 34 and I’d have liked it to be around 20. Because of the above, I’m not sure if this means my data isn’t actually predictive.
I have standardised many variables rather than normalised, I’m
not sure if this would make any difference.

Is there anything I could add, do you think? I don’t think the data set is particularly lacking with the dimensions.
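
For reference, one common Keras pattern is to bake the standardisation into the model with a `Normalization` layer, so the exact same transform is applied at training and prediction time. A minimal sketch with made-up data shaped like the post describes (43 variables, response in 0–200); the layer sizes are arbitrary:

```python
import numpy as np
import tensorflow as tf

x_train = np.random.rand(1000, 43).astype("float32")       # 43 variables
y_train = (np.random.rand(1000) * 200).astype("float32")   # response in [0, 200]

norm = tf.keras.layers.Normalization()  # standardizes: (x - mean) / std
norm.adapt(x_train)                     # learns per-feature mean/variance

model = tf.keras.Sequential([
    norm,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
```

Whether standardising vs. normalising changes the converged loss depends on the data; the main win of the layer is that the preprocessing can’t drift out of sync with the model.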

submitted by /u/Accomplished-Big4227

[visit reddit]



Upcoming Webinars: Learn About the New Features of JetPack 4.5 and VPI API for Jetson

JetPack SDK 4.5 is now available. This production release features enhanced secure boot, disk encryption, a new way to flash Jetson devices through Network File System, and the first production release of Vision Programming Interface.


For embedded and edge AI developers, the latest update to NVIDIA JetPack is available. It includes the first production release of Vision Programming Interface (VPI) to accelerate computer vision on Jetson. Visit our download page to learn more.

This production release features:

  • Enhanced secure boot and support for disk encryption
  • Improved Jetson Nano™ bootloader functionality
  • A new way of flashing Jetson devices using Network File System
  • V4L2 API extended to support CSI cameras

Download now >>

Upcoming Webinars

The Jetson team is hosting two webinars with live Q&A to dive into JetPack’s new capabilities. Learn how to get the most out of your Jetson device and accelerate development. 

NVIDIA JetPack 4.5 Overview and Feature Demo
February 9 at 9 a.m. PT

This webinar is a great way to learn about what’s new in JetPack 4.5. We’ll provide an in-depth look at the new release and show a live demo of select features. Come with questions—our Jetson experts will be hosting a live Q&A after the presentation.

Register now >> 

Implementing Computer Vision and Image Processing Solutions with VPI
February 11 at 9 a.m. PT

Get a comprehensive introduction to the VPI API. You’ll learn how to build a complete and efficient stereo disparity-estimation pipeline using VPI that runs on Jetson-family devices. VPI provides a unified API to both CPU and NVIDIA CUDA algorithm implementations, as well as interoperability between VPI and OpenCV and CUDA.
Register now >>


Anyone have a good example/ tutorial for TF attention/ transformers from scratch?

I have searched a lot of tutorials and courses; most start with a BERT model or some variation of it. I want to watch/learn how a transformer/attention model is trained from scratch.

I want to try to build an attention/transformer model for solved games like chess (i.e., I will have generate-able data).
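
For what it’s worth, the core operation such tutorials build up to is small enough to write out by hand. A minimal NumPy sketch of single-head, unmasked scaled dot-product attention (no multi-head split, no causal mask, which a chess-move model would likely add):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V for batched inputs of shape
    (batch, seq_len, d)."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (batch, tq, tk)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1
    return weights @ v
```

A full transformer layer wraps this with learned Q/K/V projections, residual connections, layer norm, and a feed-forward block, then trains end to end with an ordinary optimizer.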

submitted by /u/Ok_Cryptographer2209

[visit reddit]



Training custom EfficientNet from scratch (greyscale)

I’m looking at reducing the costs of EfficientNet for a task
that only deals with greyscale data.

To do this, I need to reduce the number of filters across the network by 1/3rd (RGB -> B/W), and train on COCO.

The TensorFlow 2 Detection Model Zoo has a link of training configs if you want to train from scratch.

However, I can’t seem to find how I would edit the architecture
to reduce the number of channels.

I can see there’s an official definition in Keras, but I’m unsure if this is what’s used by the config.

If there was some way to load the saved model and then edit its structure that way, that could work. But I’m unsure if there’s a better way to do this.
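
One workaround, if editing the internal channel counts proves awkward: keep the stock backbone and put a tiny learned grey-to-RGB projection in front of it. This does not shrink the network’s filters (that would mean changing the width settings in the official Keras definition), but it lets the model consume single-channel input directly at negligible extra cost. The input size below is illustrative:

```python
import tensorflow as tf

inp = tf.keras.Input(shape=(224, 224, 1))            # greyscale input
x = tf.keras.layers.Conv2D(3, 1, use_bias=False,     # learned 1->3 channel
                           name="grey_to_rgb")(inp)  # projection (3 weights)
backbone = tf.keras.applications.EfficientNetB0(
    include_top=False, weights=None, input_tensor=x)
model = tf.keras.Model(inp, backbone.output)
```

With `weights=None` there are no pretrained RGB weights to preserve, so this is mostly a convenience; truly reducing filter counts would require rebuilding from the Keras source with a smaller width coefficient.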

submitted by /u/pram-ila

[visit reddit]



How do I convert my checkpoint file to a pb file

So at this point I’ve managed to get ahold of my checkpoint file, which is of type `DATA-00000-OF-00001`, and there is also a similar one of type `INDEX` that is significantly smaller in size. I would like to convert these two into a single `*.pb` file. Is that possible?
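
For what it’s worth, those two files are the TF2 `tf.train.Checkpoint` format: they hold only variable values, so you also need the code that defines the model before you can export a `saved_model.pb`. A self-contained sketch with a stand-in model (the architecture, variable names, and paths here are made up for illustration):

```python
import tensorflow as tf

# Stand-in for the real architecture the checkpoint was saved from.
class Net(tf.Module):
    def __init__(self):
        self.w = tf.Variable(tf.zeros([4, 2]), name="w")

    @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
    def __call__(self, x):
        return x @ self.w

# Simulate the files from the post: .index + .data-00000-of-00001
net = Net()
tf.train.Checkpoint(model=net).write("demo_ckpt")

# Restore the variables into a fresh model, then export a SavedModel;
# the exported directory contains the single saved_model.pb graph file.
restored = Net()
tf.train.Checkpoint(model=restored).restore("demo_ckpt").expect_partial()
tf.saved_model.save(restored, "exported_model")
```

If the checkpoint came from an older TF1 workflow, the equivalent step is freezing the graph, but the same principle applies: the `.pb` needs the graph definition, which the checkpoint alone does not contain.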

submitted by /u/SilentWolfDev

[visit reddit]