Categories
Misc

Record, Edit, and Rewind in Virtual Reality with NVIDIA VR Capture and Replay

Developers and users can capture and replay VR sessions for performance testing and scene troubleshooting with early access to NVIDIA Virtual Reality Capture and Replay.

Developers and early access users can now accurately capture and replay VR sessions for performance testing, scene troubleshooting, and more with NVIDIA Virtual Reality Capture and Replay (VCR).

The potential of virtual worlds is limitless, but working with VR content poses challenges, especially when it comes to recording or recreating a virtual experience. Unlike the real world, capturing an immersive scene isn’t as easy as taking a video on your phone or hitting the record button on your computer.

It’s impossible to repeat an identical experience in VR, and immersive demos are often jittery and difficult to watch due to excessive camera motion. Creating VR applications can also be cumbersome, as developers have to jump in and out of their headsets to code, test, and refine their work. Plus, all of these tasks require a 1:1 device connection to launch and run a VR application.

All of this makes recording anything in VR an extremely time-consuming and tedious process. 

“We often find ourselves spending more time getting the hardware ready and navigating to a location within VR than we actually do testing or troubleshooting an issue,” explains Lukas Faeth, Senior Product Manager at Autodesk. “The NVIDIA VCR SDK should help us test performance between builds without having to put someone in VR for hours at a time.”

“The NVIDIA VCR SDK seems at first promising and rather cool, and when I tried it out it left my head spinning! With a little creative thought, this tool can be very powerful. I am still trying to get my head around the myriad of ways I can use it in my day-to-day workflows. It has opened up quite a few use cases for me in terms of automatic testing, training potential VR users, and creating high-quality GI renders of an OpenGL VR session,” said Danny Tierney, Automotive Design Solution Specialist at Autodesk.

Easier, faster VR video production

NVIDIA VCR started as an internal project for VR performance testing across NVIDIA GPUs. The NVIDIA XR team continued to expand the feature set as they recognized new use cases. The team is making it available to select partners to help evaluate, test, and identify additional applications for the project.

Figure 1. Potential NVIDIA VCR use cases: Performance testing, scene troubleshooting, and VR video generation.

With NVIDIA VCR, developers and creators can more easily develop VR applications, assist end users with QA and troubleshooting, and generate quality VR videos. 

NVIDIA VCR features include:

  • Accurate and painless VR session playback. This is especially useful for performance testing and QC.
  • Less time in a headset. With a reduced number of development steps, users spend less time jumping in and out of VR.
  • Multirole recordings in the same VR scene using one HMD. Replay the recordings simultaneously to simulate collaboration.

Early partners like ESI Group imagine promising opportunities to leverage the SDK. “NVIDIA VCR opens up infinite possibilities for immersive experiences,” says Eric Kam, Solutions Marketing Manager at ESI Group. 

“Recording and playback of virtuality add a temporal dimension to VR sessions,” Kam adds, pointing out that VCR could be developed to serve downstream workflows in addition to addressing challenges with performance testing.

Getting started with NVIDIA VCR

NVIDIA VCR records time-stamped HMD and controller inputs during an immersive VR session. Users can then replay the recording, without an HMD attached, to reproduce the session. It’s also possible to filter the recorded session through an optional processing step, cleaning up the data and removing excessive camera motion.
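
The filtering step itself is exposed through the VCR C++ API, which is not shown here. As a purely hypothetical Python sketch of the kind of processing such a step performs, the snippet below smooths a list of time-stamped head positions with a centered moving average to damp camera jitter; the data layout and function name are illustrative assumptions, not part of the SDK.

    # Hypothetical illustration of a "filter" pass over recorded tracking data.
    def smooth_positions(samples, window=5):
        """samples: list of (timestamp, (x, y, z)) tuples; returns a smoothed copy."""
        half = window // 2
        smoothed = []
        for i, (t, _) in enumerate(samples):
            lo, hi = max(0, i - half), min(len(samples), i + half + 1)
            n = hi - lo
            avg = tuple(sum(s[1][axis] for s in samples[lo:hi]) / n for axis in range(3))
            smoothed.append((t, avg))
        return smoothed

    # Example: a jittery recording of a head moving along x at 100 Hz.
    recording = [(0.01 * i, (i + (0.2 if i % 2 else -0.2), 1.6, 0.0)) for i in range(20)]
    print(smooth_positions(recording)[:3])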

Figure 2. NVIDIA VCR workflow to capture, filter, and replay VR content.

Components of NVIDIA VCR:

  • Capture is an OpenVR background application that stores HMD and controller properties and logs motion and button presses into tracking data.
  • Filter is an optional processing step to read and write recorded sessions. Using the VCR C++ API, developers can analyze a session, clean up data, or retime HMD motion paths.
  • Replay uses an OpenVR driver to emulate an HMD and controllers, read tracking data, and replay motion and button presses in the scene. Hardware properties such as display resolution and refresh rate can be edited as a JSON file.

Four NVIDIA VCR use cases

  1. Use a simple capture and replay workflow to record tracking data and replay it an infinite number of times. This is ideal for verifying scene correctness, such as in performance testing or QC use cases.
Video 1. Example of a simple capture and replay scenario.
  2. In a filtering workflow, apply motion data smoothing to minimize jitter and produce a more professional-looking VR demo video or tutorial.
Video 2. Filtering a VCR session to reduce jitter.
  3. Repeat and mix segments captured in VCR to generate an entirely new sequence. In the video below, the same set of segments (the letters “H,” “o,” “l,” and “e” in addition to movement and interaction data) was reordered to spell a completely new word.
Video 3. How to repeat and mix segments captured in VCR.
  4. Use NVIDIA VCR within the Autodesk VRED application to capture an example of single-user collaboration. In this workflow, one user generates four separate VCR captures with a single HMD system. These captures are then replayed simultaneously on multiple systems to simulate multiuser collaboration.
Video 4. Building a collaborative scene in VCR using a single HMD system.

Apply to become an early access partner

NVIDIA VCR is available to a limited number of early access partners. If you have an innovative use case and are willing to provide feedback on VCR, apply for early access.

Categories
Misc

Latest Releases and Resources: NVIDIA GTC 2022

This GTC-focused roundup features updates to the HPC SDK, the cuQuantum SDK, Nsight Graphics and Nsight Systems 2022.2, CUDA 11.6 Update 1, cuNumeric, and Warp.

Our weekly roundup covers the most recent software updates, learning resources, events, and notable news. This week we have several software releases.


Software releases 

Leveraging standard languages for portable and performant code with the HPC SDK

The NVIDIA HPC SDK is a comprehensive suite of compilers, libraries, and tools for developing accelerated HPC applications. With a breadth of flexible support options, users can create applications with a programming model most relevant to their situation. 

The HPC SDK offers a variety of programming models including performance-optimized drop-in libraries, standard languages, directives-based methods, and specialization provided by CUDA. Many of the latest enhancements have been in the area of standard language support for ISO C++, ISO Fortran, and Python. 

The NVIDIA HPC compilers use recent advances in the public specifications for these languages, delivering a productive programming environment that is both portable and performant for scaling on GPU accelerated platforms. 

Visit our site to download the new HPC SDK version 22.3 and read our new post on parallel programming with standard languages under our “Resources” section.

Get started: NVIDIA HPC SDK 22.3     

Accelerate quantum circuit simulation with the NVIDIA cuQuantum SDK  

cuQuantum – An SDK for accelerating quantum circuit simulation
NVIDIA cuQuantum is an SDK of optimized libraries and tools for accelerating quantum computing workflows. Developers can use cuQuantum to create and verify new algorithms more easily and reliably. Speeding up quantum circuit simulations by orders of magnitude, for both state vector and tensor network methods, helps developers simulate bigger problems faster.
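
As a rough illustration of the tensor network path, the cuQuantum Python package exposes an einsum-style contract interface. The snippet below is a minimal sketch assuming that package and CuPy are installed; the tensors are random stand-ins rather than a real circuit.

    import cupy as cp
    from cuquantum import contract  # cuQuantum Python package (assumed installed)

    # Two random rank-3 tensors standing in for pieces of a circuit's tensor network.
    a = cp.random.rand(2, 2, 2)
    b = cp.random.rand(2, 2, 2)

    # Contract over the shared index 'k'; cuTensorNet chooses the contraction path.
    result = contract("ijk,klm->ijlm", a, b)
    print(result.shape)  # (2, 2, 2, 2)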

Expanding ecosystem integrations and collaborations
cuQuantum is now integrated as a backend in popular industry simulators. It is also offered as a part of quantum application development platforms, and is used to power quantum research at scale in areas from chemistry to climate modeling.

SDK available with new appliance beta 
The cuQuantum SDK is now generally available (GA) and free to download. NVIDIA has also packaged an optimized beta software container, the cuQuantum DGX appliance, available from the NGC Catalog.

Learn more: cuQuantum SDK 

Boost ray tracing application performance using Nsight Graphics 2022.2    

Nsight Graphics is a performance analysis tool designed to visualize, analyze, and optimize graphics applications. It also helps tune applications to scale efficiently across any quantity or size of CPUs and GPUs—from workstations to supercomputers.

The latest features in Nsight Graphics 2022.2 include:

  • AABB Overlay Heatmap display
  • Shader Timing Heatmap (with D3D12/Vulkan + RT support)
  • Display of other processes using the GPU with ETW
  • Vulkan video extension 

Download now: Nsight Graphics 2022.2

Simplify system profiling and debugging with Nsight Systems 2022.2    

Nsight Systems is a triage and performance analysis tool designed to track GPU workloads to their CPU origins within a system-wide view. The features help you analyze your applications’ GPU utilization, graphics and compute API activity, and OS runtime operations. This helps optimize your application to perform and scale efficiently across any quantity or size of CPUs and GPUs—from workstations to supercomputers.

What’s new:

  • NVIDIA NIC Ethernet metrics sampling
  • Vulkan memory operations and warnings
  • Vulkan graphics pipeline library
  • Multireport view enhancements

Download now: Nsight Systems 2022.2

Enhanced CUDA 11.6 Update 1 platform for all new SDKs

This CUDA Toolkit release focuses on enhancing the programming model and performance of CUDA applications. CUDA 11.6 ships with the R510 driver, an update branch. CUDA Toolkit 11.6 Update 1 is available to download.

What’s new:

  • GSP driver architecture is now the default on NVIDIA Turing and Ampere GPUs
  • New API for disabling nodes in an instantiated graph
  • Full support for 128-bit integer type
  • Cooperative groups namespace update
  • CUDA compiler update
  • Nsight Compute 2022.1 release

Learn more: CUDA Toolkit 11.6, Update 1                            

Deliver distributed and accelerated computing to Python with the help of cuNumeric 

NVIDIA cuNumeric is a Legate library that aspires to provide a drop-in replacement for the NumPy API on top of the Legion runtime. This brings distributed and accelerated computing on the NVIDIA platform to the Python community.
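
Because cuNumeric targets the NumPy API, existing code can often be ported by changing only the import. A minimal sketch, assuming cuNumeric is installed on a Legate-enabled system:

    # Drop-in style usage: only the import changes relative to plain NumPy code.
    import cunumeric as np  # instead of: import numpy as np

    x = np.random.rand(1000, 1000)
    y = np.random.rand(1000, 1000)

    # Standard NumPy expressions now run on, and can scale across, GPUs via Legate.
    z = np.dot(x, y)
    print(z.sum())

Scripts are typically launched through the Legate driver (for example, something like legate ./script.py --gpus 2) to choose how many GPUs or nodes to use.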

What’s new:

  • Transparently accelerates and scales existing NumPy workflows
  • Scales optimally to up to thousands of GPUs
  • Requires zero code changes to ensure developer productivity
  • Freely available; get started on GitHub or Conda

Learn more: cuNumeric

Warp helps Python coders with GPU-accelerated graphics simulation    

Warp is a Python framework that gives coders an easy way to write GPU-accelerated, kernel-based programs in NVIDIA Omniverse and OmniGraph. With Warp, Omniverse developers can create GPU-accelerated 3D simulation workflows and fantastic virtual worlds!
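
A minimal Warp kernel, sketched from the publicly documented basics (wp.init, the @wp.kernel decorator, and wp.launch); the kernel itself and the array sizes are illustrative.

    import numpy as np
    import warp as wp

    wp.init()

    @wp.kernel
    def scale(values: wp.array(dtype=float), factor: float):
        tid = wp.tid()                      # one thread per array element
        values[tid] = values[tid] * factor

    n = 1024
    data = wp.array(np.ones(n, dtype=np.float32), device="cuda")
    wp.launch(kernel=scale, dim=n, inputs=[data, 2.0], device="cuda")
    wp.synchronize()
    print(data.numpy()[:4])                 # [2. 2. 2. 2.]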

What’s new:

  • Kernel-based code in Python
  • Differentiable programming
  • Built-in geometry processing
  • Simulation performance on par with native code
  • Shorter time to market with improved iteration time

Learn more: Warp

Categories
Misc

Segment Objects without Masks and Reduce Annotation Effort Using the DiscoBox DL Framework

DiscoBox is a weakly supervised learning algorithm that delivers high-quality instance segmentation quickly and efficiently, without costly mask annotations during training.

Instance segmentation is a core visual recognition problem for detecting and segmenting objects. In the past several years, this area has been one of the holy grails in the computer vision community, with wide applications spanning autonomous vehicles (AVs), robotics, video analysis, smart homes, digital humans, and healthcare. 

Annotation, the process of classifying every object in an image or video, is a challenging component of instance segmentation. Training a conventional instance segmentation method, such as Mask R-CNN, requires class labels, bounding boxes, and segmentation masks of objects simultaneously.

However, obtaining segmentation masks is costly and time-consuming. The COCO dataset, for example, required about 70,000 hours to annotate 200K images, with 55,000 of those hours spent gathering object masks.

Introducing DiscoBox

Working to expedite the annotation process, NVIDIA researchers developed the DiscoBox framework. The solution uses a weakly supervised learning algorithm that can output high-quality instance segmentation without mask annotations during training.

The framework generates instance segmentation directly from bounding box supervisions, rather than using mask annotations to directly supervise the task. Bounding boxes were introduced as a fundamental form of annotation for training modern object detectors and use labeled rectangles to tightly enclose objects. Each rectangle encodes the localization, size, and category information of an object.

Bounding box annotation is the sweet spot of industrial computer vision applications. It contains rich localization information and is very easy to draw, making it more affordable and scalable when annotating large amounts of data. However, by itself, it does not provide pixel-level information, and cannot be directly used for training instance segmentation.

Figure 1. Given a pair of input images, DiscoBox is able to jointly output detection, instance segmentation, and multi-object semantic correspondence.

Innovative features of DiscoBox

DiscoBox is the first weakly supervised instance segmentation algorithm that delivers performance comparable to fully supervised methods while reducing labeling time and costs. The method, for example, is faster and more accurate than the widely used Mask R-CNN, without requiring mask annotations during training. This raises the question of whether mask annotations are truly needed for future instance segmentation applications.

DiscoBox is also the first weakly supervised algorithm that unifies both instance segmentation and multi-object semantic correspondence under box supervision. These two tasks are useful in many computer vision applications such as 3D reconstruction and are shown to mutually help each other. For example, predicted object masks from instance segmentation can help semantic correspondence to focus on foreground object pixels, whereas semantic correspondence can refine mask prediction. DiscoBox unifies both tasks under box supervision, making their model training easy and scalable.

At the center of DiscoBox is a teacher-student design. The design uses self-consistency as a form of self-supervision to replace the mask supervision that is missing in DiscoBox training, and it is effective in promoting high-quality mask prediction even though mask annotations are absent during training.
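
The full implementation is available on GitHub (linked below). As a loose conceptual sketch only, with the mask head, data, and losses reduced to a bare minimum, the teacher-student self-consistency idea looks roughly like this in PyTorch; every name here is illustrative, and none of it is the actual DiscoBox code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Toy mask head: predicts per-pixel foreground probability inside a box crop.
    class MaskHead(nn.Module):
        def __init__(self, in_ch=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 1),
            )

        def forward(self, x):
            return torch.sigmoid(self.net(x))

    student = MaskHead()
    teacher = MaskHead()
    teacher.load_state_dict(student.state_dict())
    for p in teacher.parameters():
        p.requires_grad_(False)

    def ema_update(teacher, student, momentum=0.999):
        # The teacher is a slowly moving average of the student.
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.data.mul_(momentum).add_(ps.data, alpha=1 - momentum)

    box_crops = torch.rand(8, 3, 64, 64)      # image regions cropped by ground-truth boxes
    student_masks = student(box_crops)
    with torch.no_grad():
        teacher_masks = teacher(box_crops)    # pseudo-targets from the teacher

    # Self-consistency stands in for the missing mask supervision:
    # the student is pushed toward the teacher's prediction.
    consistency_loss = F.binary_cross_entropy(student_masks, teacher_masks)
    consistency_loss.backward()
    ema_update(teacher, student)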

DiscoBox applications

There are many applications of DiscoBox beyond its use as an auto-labeling toolkit for AI applications at NVIDIA. By automating costly mask annotations, the tool could help product teams in intelligent video analytics or AV save a significant amount on annotation budgets.

Another potential application is 3D reconstruction, an area where both object masks and semantic correspondence are important information for a reconstruction task. DiscoBox is capable of giving these two outputs with only bounding box supervision, helping generate large-scale 3D reconstruction in an open-world scenario. This could benefit many applications for building virtual worlds, such as content creation, virtual reality, and digital humans.

For more information on the model or to use the code, visit DiscoBox on GitHub.

To learn more about research NVIDIA is conducting, visit NVIDIA Research.

Categories
Misc

Orchestrated to Perfection: NVIDIA Data Center Grooves to Tune of Millionfold Speedups

The hum of a bustling data center is music to an AI developer’s ears — and NVIDIA data centers have found a rhythm of their own, grooving to the swing classic “Sing, Sing, Sing” in this week’s GTC keynote address. The lighthearted video, created with the NVIDIA Omniverse platform, features Louis Prima’s iconic music track, Read article >

The post Orchestrated to Perfection: NVIDIA Data Center Grooves to Tune of Millionfold Speedups appeared first on NVIDIA Blog.

Categories
Misc

Take Control This GFN Thursday With New Stratus+ Controller From SteelSeries

GeForce NOW gives you the power to game almost anywhere, at GeForce quality. And with the latest controller from SteelSeries, members can stay in control of the action on Android and Chromebook devices. This GFN Thursday takes a look at the SteelSeries Stratus+, now part of the GeForce NOW Recommended program. And it wouldn’t be Read article >

The post Take Control This GFN Thursday With New Stratus+ Controller From SteelSeries appeared first on NVIDIA Blog.

Categories
Misc

What’s the utility of the audio embeddings from Google Audioset for audio classification?

I have extracted the audio embeddings from Google Audioset corpus (https://research.google.com/audioset/dataset/index.html). The audio embeddings contain a list of “bytes_lists” which is similar to the following

 feature { bytes_list { value: "#226]06(N223K377207r36333337700Y322v935130300377311375215E342377J0000_00370222:2703773570024500377213jd267353377J33$2732673073537700207244Q00002060000312356<R325g30335616N224377270377237240377377321252j357O217377377,33000377|24600133400377357212267300b000000251236002333500326377327327377377223009{" } } 

From the documentation and forum discussions, I learnt that these embeddings are the output of a pretrained model (MFCC+CNN) run on 10-second chunks of the respective YouTube videos. I have also learnt that these embeddings make it easy to work on deep learning models. How does this help ML engineers?

My confusion is: if these audio embeddings are already pre-trained, what is their utility? That is, how can I use these embeddings to train advanced models for performing Sound Event Detection?

submitted by /u/sab_1120
[visit reddit] [comments]

Categories
Misc

Tensorflow Transfer Learning (VGG16) Error: ValueError: Shapes (None, 1) and (None, 4) are incompatible

Hello! So I am trying to create a multiclass classifier using VGG16 in transfer learning to classify users’ emotions. The data is sorted into 4 classes, which have their proper directories so I can use the ‘image_dataset_from_directory’ function.

    def dataset_creator(directory=""):
        from keras.preprocessing.image import ImageDataGenerator
        data = image_dataset_from_directory(directory=directory, labels='inferred')
        return data

    train_ds = dataset_creator(directory=traindir)
    val_set = dataset_creator(directory="~/Documents/CC/visSystems/val_set/")
    print(type(train_ds))

    num_classes = 4

    base_model = VGG16(weights="imagenet", include_top=False, input_shape=(256,256,3), classes=4)
    base_model.trainable = False

    normalization_layer = layers.Rescaling(scale=1./127.5, offset=-1)
    flatten_layer = layers.Flatten()
    dense_layer_0 = layers.Dense(520, activation='relu')
    dense_layer_1 = layers.Dense(260, activation='relu')
    dense_layer_2 = layers.Dense(160, activation='relu')
    dense_layer_3 = layers.Dense(80, activation='relu')
    prediction_layer = layers.Dense(4, activation='softmax')

    model = models.Sequential([
        base_model,
        normalization_layer,
        flatten_layer,
        dense_layer_1,
        dense_layer_2,
        dense_layer_3,
        prediction_layer
    ])

    from tensorflow.keras.callbacks import EarlyStopping

    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )

    es = EarlyStopping(monitor='val_accuracy', mode='max', patience=3, restore_best_weights=True)
    model.fit(train_ds, validation_data=val_set, epochs=10, callbacks=[es])
    model.save("~/Documents/CC/visSystems/affect2model/saved_model")

My code correctly identifies X number of images to 4 classes, but when I try to execute model.fit() it returns this error:

    ValueError: in user code:

        File "/home/blabs/.local/lib/python3.9/site-packages/keras/engine/training.py", line 878, in train_function *
            return step_function(self, iterator)
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/engine/training.py", line 867, in step_function **
            outputs = model.distribute_strategy.run(run_step, args=(data,))
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/engine/training.py", line 860, in run_step **
            outputs = model.train_step(data)
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/engine/training.py", line 809, in train_step
            loss = self.compiled_loss(
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/engine/compile_utils.py", line 201, in __call__
            loss_value = loss_obj(y_t, y_p, sample_weight=sw)
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/losses.py", line 141, in __call__
            losses = call_fn(y_true, y_pred)
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/losses.py", line 245, in call **
            return ag_fn(y_true, y_pred, **self._fn_kwargs)
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/losses.py", line 1664, in categorical_crossentropy
            return backend.categorical_crossentropy(
        File "/home/blabs/.local/lib/python3.9/site-packages/keras/backend.py", line 4994, in categorical_crossentropy
            target.shape.assert_is_compatible_with(output.shape)

        ValueError: Shapes (None, 1) and (None, 4) are incompatible

How can I approach solving this issue? Thank you for your help.
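
A likely culprit, going by the traceback: image_dataset_from_directory defaults to label_mode='int', so the labels arrive as integer class indices of shape (None, 1), while the 4-unit softmax output and categorical_crossentropy expect one-hot labels of shape (None, 4). A minimal sketch of the two usual fixes, reusing the names from the code above:

    # Option 1: request one-hot labels so they match categorical_crossentropy.
    data = image_dataset_from_directory(
        directory=directory,
        labels='inferred',
        label_mode='categorical',  # yields (None, 4) one-hot labels
    )

    # Option 2: keep the default integer labels and switch the loss instead.
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',  # accepts (None, 1) integer labels
        metrics=['accuracy'],
    )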

submitted by /u/blevlabs
[visit reddit] [comments]

Categories
Misc

Newb Question: How to host and load Tensorflow Models (as a directory) in the Cloud?

We have a Tensorflow workflow and model that works great when used in a local environment (Python); however, we now need to push it to production (Heroku). So we’re thinking we need to move our model into some type of cloud hosting.

If possible, I’d like to upload the model directory (not an H5 file) to a cloud service/storage provider and then load that model into Tensorflow.

Here is how we’re currently loading in a model, and what we’d like to be able to do:

    # Current setup loads model from local directory
    dnn_model = tf.keras.models.load_model('./neural_network/true_overall')

    # We'd like to be able to load the model from a cloud service/storage
    dnn_model = tf.keras.models.load_model('https://some-kinda-storage-service.com/neural_network/true_overall')

Downloading the directory and running it from a temp directory isn’t an option with our setup – so we’ll need to be able to run the model from the cloud. We don’t necessarily need to “train” the model in the cloud, we just need to be able to load it.

I’ve looked into some things like TensorServe and TensorCloud, but I’m not 100% sure if that’s what we need (we’re super new to Tensorflow and AI in general).

What’s the best way to get the models (as a directory) into the cloud so we can load them into our code?
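
One commonly used approach, sketched below under the assumption that the SavedModel directory is uploaded to a Google Cloud Storage bucket (the bucket name and path are placeholders): TensorFlow's file-system layer can read gs:// paths directly, and with the tensorflow-io plugin installed, s3:// paths as well.

    import tensorflow as tf

    # Load a SavedModel directory straight from object storage.
    # "your-bucket" is a placeholder; credentials must be available to the app
    # (for example, via GOOGLE_APPLICATION_CREDENTIALS on Heroku).
    dnn_model = tf.keras.models.load_model(
        "gs://your-bucket/neural_network/true_overall"
    )

    # An alternative is to serve the model behind TensorFlow Serving (or a small
    # API) in the cloud and call it over HTTP instead of loading it in-process.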

submitted by /u/jengl
[visit reddit] [comments]

Categories
Offsites

Auto-generated Summaries in Google Docs

For many of us, it can be challenging to keep up with the volume of documents that arrive in our inboxes every day: reports, reviews, briefs, policies and the list goes on. When a new document is received, readers often wish it included a brief summary of the main points in order to effectively prioritize it. However, composing a document summary can be cognitively challenging and time-consuming, especially when a document writer is starting from scratch.

To help with this, we recently announced that Google Docs now automatically generates suggestions to aid document writers in creating content summaries, when they are available. Today we describe how this was enabled using a machine learning (ML) model that comprehends document text and, when confident, generates a 1-2 sentence natural language description of the document content. However, the document writer maintains full control — accepting the suggestion as-is, making necessary edits to better capture the document summary or ignoring the suggestion altogether. Readers can also use this section, along with the outline, to understand and navigate the document at a high level. While all users can add summaries, auto-generated suggestions are currently only available to Google Workspace business customers. Building on grammar suggestions, Smart Compose, and autocorrect, we see this as another valuable step toward improving written communication in the workplace.

A blue summary icon appears in the top left corner when a document summary suggestion is available. Document writers can then view, edit, or ignore the suggested document summary.

Model Details
Automatically generated summaries would not be possible without the tremendous advances in ML for natural language understanding (NLU) and natural language generation (NLG) over the past five years, especially with the introduction of Transformer and Pegasus.

Abstractive text summarization, which combines the individually challenging tasks of long document language understanding and generation, has been a long-standing problem in NLU and NLG research. A popular method for combining NLU and NLG is training an ML model using sequence-to-sequence learning, where the inputs are the document words, and the outputs are the summary words. A neural network then learns to map input tokens to output tokens. Early applications of the sequence-to-sequence paradigm used recurrent neural networks (RNNs) for both the encoder and decoder.

The introduction of Transformers provided a promising alternative to RNNs because Transformers use self-attention to provide better modeling of long input and output dependencies, which is critical in document summarization. Still, these models require large amounts of manually labeled data to train sufficiently, so the advent of Transformers alone was not enough to significantly advance the state-of-the-art in document summarization.

The combination of Transformers with self-supervised pre-training (e.g., BERT, GPT, T5) led to a major breakthrough in many NLU tasks for which limited labeled data is available. In self-supervised pre-training, a model uses large amounts of unlabeled text to learn general language understanding and generation capabilities. Then, in a subsequent fine-tuning stage, the model learns to apply these abilities on a specific task, such as summarization or question answering.

The Pegasus work took this idea one step further, by introducing a pre-training objective customized to abstractive summarization. In Pegasus pre-training, also called Gap Sentence Prediction (GSP), full sentences from unlabeled news articles and web documents are masked from the input and the model is required to reconstruct them, conditioned on the remaining unmasked sentences. In particular, GSP attempts to mask sentences that are considered essential to the document through different heuristics. The intuition is to make the pre-training as close as possible to the summarization task. Pegasus achieved state-of-the-art results on a varied set of summarization datasets. However, a number of challenges remained to apply this research advancement into a product.
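
The model behind the Google Docs feature is not public, but the released Pegasus checkpoints can be tried through the Hugging Face transformers library to get a feel for abstractive summarization. A minimal sketch, assuming transformers and PyTorch are installed; the checkpoint name below is one of the published fine-tuned variants and is used here purely as an example.

    from transformers import PegasusTokenizer, PegasusForConditionalGeneration

    model_name = "google/pegasus-xsum"  # published Pegasus checkpoint, example only
    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)

    document = (
        "Pegasus is pre-trained by masking whole sentences from a document and "
        "asking the model to reconstruct them, which makes the pre-training task "
        "resemble abstractive summarization."
    )
    inputs = tokenizer(document, truncation=True, return_tensors="pt")
    summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))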

Applying Recent Research Advances to Google Docs

  • Data

    Self-supervised pre-training results in an ML model that has general language understanding and generation capabilities, but a subsequent fine-tuning stage is critical for the model to adapt to the application domain. We fine-tuned early versions of our model on a corpus of documents with manually-generated summaries that were consistent with typical use cases.

    However, early versions of this corpus suffered from inconsistencies and high variation because they included many types of documents, as well as many ways to write a summary — e.g., academic abstracts are typically long and detailed, while executive summaries are brief and punchy. This led to a model that was easily confused because it had been trained on so many different types of documents and summaries that it struggled to learn the relationships between any of them.

    Fortunately, one of the key findings in the Pegasus work was that an effective pre-training phase required less supervised data in the fine-tuning stage. Some summarization benchmarks required as few as 1,000 fine-tuning examples for Pegasus to match the performance of Transformer baselines that saw 10,000+ supervised examples — suggesting that one could focus on quality rather than quantity.

    We carefully cleaned and filtered the fine-tuning data to contain training examples that were more consistent and represented a coherent definition of summaries. Despite the fact that we reduced the amount of training data, this led to a higher quality model. The key lesson, consistent with recent work in domains like dataset distillation, was that it was better to have a smaller, high quality dataset, than a larger, high-variance dataset.

  • Serving

    Once we trained the high quality model, we turned to the challenge of serving the model in production. While the Transformer version of the encoder-decoder architecture is the dominant approach to train models for sequence-to-sequence tasks like abstractive summarization, it can be inefficient and impractical to serve in real-world applications. The main inefficiency comes from the Transformer decoder where we generate the output summary token by token through autoregressive decoding. The decoding process becomes noticeably slow when summaries get longer since the decoder attends to all previously generated tokens at each step. RNNs are a more efficient architecture for decoding since there is no self-attention with previous tokens as in a Transformer model.

    We used knowledge distillation, which is the process of transferring knowledge from a large model to a smaller more efficient model, to distill the Pegasus model into a hybrid architecture of a Transformer encoder and an RNN decoder. To improve efficiency we also reduced the number of RNN decoder layers. The resulting model had significant improvements in latency and memory footprint while the quality was still on par with the original model. To further improve the latency and user experience, we serve the summarization model using TPUs, which provide significant speed ups and allow more requests to be handled by a single machine.
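
As a generic illustration of the distillation objective (not the exact production recipe, and with tensor shapes chosen arbitrarily), a token-level distillation loss can be written as a temperature-softened KL divergence between the teacher's and student's output distributions:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        # The student matches the teacher's softened distribution at each decoding step.
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        # KL divergence, scaled by t^2 as is conventional for distillation.
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

    # Placeholder shapes: (batch, sequence_length, vocab_size).
    student_logits = torch.randn(2, 16, 32000)
    teacher_logits = torch.randn(2, 16, 32000)
    loss = distillation_loss(student_logits, teacher_logits)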

Ongoing Challenges and Next Steps
While we are excited by the progress so far, there are a few challenges we are continuing to tackle:

  • Document coverage: Developing a set of documents for the fine-tuning stage was difficult due to the tremendous variety that exists among documents, and the same challenge is true at inference time. Some of the documents our users create (e.g., meeting notes, recipes, lesson plans and resumes) are not suitable for summarization or can be difficult to summarize. Currently, our model only suggests a summary for documents where it is most confident, but we hope to continue broadening this set as our model improves.
  • Evaluation: Abstractive summaries need to capture the essence of a document while being fluent and grammatically correct. A specific document may have many summaries that can be considered correct, and different readers may prefer different ones. This makes it hard to evaluate summaries with automatic metrics alone; user feedback and usage statistics will be critical for us to understand and keep improving quality.
  • Long documents: Long documents are some of the toughest documents for the model to summarize because it is harder to capture all the points and abstract them in a single summary, and it can also significantly increase memory usage during training and serving. However, long documents are perhaps most useful for the model to automatically summarize because it can help document writers get a head start on this tedious task. We hope we can apply the latest ML advancements to better address this challenge.

Conclusion
Overall, we are thrilled that we can apply recent progress in NLU and NLG to continue assisting users with reading and writing. We hope the automatic suggestions now offered in Google Workspace make it easier for writers to annotate their documents with summaries, and help readers comprehend and navigate documents more easily.

Acknowledgements
The authors would like to thank the many people across Google that contributed to this work: AJ Motika, Matt Pearson-Beck, Mia Chen, Mahdis Mahdieh, Halit Erdogan, Benjamin Lee, Ali Abdelhadi, Michelle Danoff, Vishnu Sivaji, Sneha Keshav, Aliya Baptista, Karishma Damani, DJ Lick, Yao Zhao, Peter Liu, Aurko Roy, Yonghui Wu, Shubhi Sareen, Andrew Dai, Mekhola Mukherjee, Yinan Wang, Mike Colagrosso, and Behnoosh Hariri.

Categories
Misc

What Is Path Tracing?

Turn on your TV. Fire up your favorite streaming service. Grab a Coke. A demo of the most important visual technology of our time is as close as your living room couch. Propelled by an explosion in computing power over the past decade and a half, path tracing has swept through visual media. It brings Read article >

The post What Is Path Tracing? appeared first on NVIDIA Blog.