Categories
Misc

NVIDIA Releases Open-Source GPU Kernel Modules

The first open-source release of GPU kernel modules for the Linux community helps improve NVIDIA GPU driver quality and security.

NVIDIA is now publishing Linux GPU kernel modules as open source with a dual GPL/MIT license, starting with the R515 driver release. You can find the source code for these kernel modules in the NVIDIA Open GPU Kernel Modules repo on GitHub.

This release is a significant step toward improving the experience of using NVIDIA GPUs in Linux, enabling tighter integration with the OS and letting developers debug, integrate, and contribute back. For Linux distribution providers, the open-source modules are easier to package, and they improve the out-of-the-box user experience for signing and distributing the NVIDIA GPU driver. Canonical and SUSE are able to immediately package the open kernel modules with Ubuntu and SUSE Linux Enterprise distributions.

Developers can trace into code paths and see how kernel event scheduling is interacting with their workload for faster root cause debugging. In addition, enterprise software developers can now integrate the driver seamlessly into the customized Linux kernel configured for their project.

This will further help improve NVIDIA GPU driver quality and security with input and reviews from the Linux end-user community.

With each new driver release, NVIDIA publishes a snapshot of the source code on GitHub. Community-submitted patches are reviewed and, if approved, integrated into a future driver release.

Refer to the NVIDIA contribution guidelines and the driver release cadence and life-cycle documentation for more information.

Supported functionality

The first release of the open GPU kernel modules is R515. Along with the source code, fully-built and packaged versions of the drivers are provided.

For data center GPUs in the NVIDIA Turing and NVIDIA Ampere architecture families, this code is production ready. This was made possible by the phased rollout of the GSP driver architecture over the past year, designed to make the transition easy for NVIDIA customers. We focused on testing across a wide variety of workloads to ensure feature and performance parity with the proprietary kernel-mode driver.

In the future, functionality such as HMM will be a foundational component for confidential computing on the NVIDIA Hopper architecture.

In this open-source release, support for GeForce and Workstation GPUs is alpha quality. GeForce and Workstation users can use this driver on Turing and NVIDIA Ampere architecture GPUs to run Linux desktops and use features such as multiple displays, G-SYNC, and NVIDIA RTX ray tracing in Vulkan and NVIDIA OptiX. Users can opt in using the kernel module parameter NVreg_EnableUnsupportedGpus as highlighted in the documentation. More robust and fully featured GeForce and Workstation support will follow in subsequent releases and the NVIDIA Open Kernel Modules will eventually supplant the closed-source driver. 

Customers with Turing and Ampere GPUs can choose which modules to install. Pre-Turing customers will continue to run the closed source modules.

The open-source kernel-mode driver works with the same firmware and the same user-mode stacks such as CUDA, OpenGL, and Vulkan. However, all components of the driver stack must match versions within a release. For instance, you cannot take a release of the source code, build, and run it with the user-mode stack from a previous or future release. 

Refer to the driver README document for instructions on installing the right versions and additional troubleshooting steps.

Installation opt in

The R515 release contains precompiled versions of both the closed-source driver and the open-source kernel modules. These versions are mutually exclusive, and the user can make the choice at install time. The default option ensures that silent installs will pick the optimal path for NVIDIA Volta and older GPUs versus Turing+ GPUs.

Users can build kernel modules from the source code and install them with the relevant user-mode drivers.

Figure 1: Installation options for the end user to opt in to the open GPU kernel modules or take the default path of the closed-source modules.

Partner ecosystem

NVIDIA has been working with Canonical, Red Hat, and SUSE for better packaging, deployment, and support models for our mutual customers.

Canonical

“The new NVIDIA open-source GPU kernel modules will simplify installs and increase security for Ubuntu users, whether they’re AI/ML developers, gamers, or cloud users,” commented Cindy Goldberg, VP of Silicon alliances at Canonical. “As the makers of Ubuntu, the most popular Linux-based operating system for developers, we can now provide even better support to developers working at the cutting edge of AI and ML by enabling even closer integration with NVIDIA GPUs on Ubuntu.”

In the coming months, the NVIDIA Open GPU kernel modules will make their way into the recently launched Canonical Ubuntu 22.04 LTS.

SUSE

“We at SUSE are excited that NVIDIA is releasing their GPU kernel-mode driver as open source. This is a true milestone for the open-source community and accelerated computing. SUSE is proud to be the first major Linux distribution to deliver this breakthrough with SUSE Linux Enterprise 15 SP4 in June. Together, NVIDIA and SUSE power your GPU-accelerated computing needs across cloud, data center, and edge with a secure software supply chain and excellence in support.” — Markus Noga, General Manager, Business Critical Linux at SUSE

Red Hat

“Enterprise open source can spur innovation and improve customers’ experience, something that Red Hat has always championed. We applaud NVIDIA’s decision to open source its GPU kernel driver. Red Hat has collaborated with NVIDIA for many years, and we are excited to see them take this next step. We look forward to bringing these capabilities to our customers and to improve interoperability with NVIDIA hardware.” — Mike McGrath, Vice President, Linux Engineering at Red Hat

Upstream approach

NVIDIA GPU drivers have been designed over the years to share code across operating systems, GPUs, and Jetson SoCs so that we can provide a consistent experience across all our supported platforms. The current codebase does not conform to the Linux kernel design conventions and is not a candidate for Linux upstream.

There are plans to work on an upstream approach with the Linux kernel community and partners such as Canonical, Red Hat, and SUSE. 

In the meantime, published source code serves as a reference to help improve the Nouveau driver. Nouveau can leverage the same firmware used by the NVIDIA driver, exposing many GPU functionalities, such as clock management and thermal management, bringing new features to the in-tree Nouveau driver.

Stay tuned for more developments in future driver releases and collaboration on GitHub. 

Frequently asked questions

Where can I download the R515 driver?

You can download the R515 development driver as part of CUDA Toolkit 11.7, or from the driver downloads page under “Beta” drivers. The R515 data center driver will follow in subsequent releases per our usual cadence.

Can open GPU Kernel Modules be distributed?

Yes, the NVIDIA open kernel modules are licensed under a dual GPL/MIT license, and the terms of those licenses govern distribution and repackaging.

Will the source for user-mode drivers such as CUDA be published?

These changes apply only to the kernel modules; the user-mode components are untouched. The user-mode stack remains closed source and is published as pre-built binaries with the driver and the CUDA Toolkit.

Which GPUs are supported by Open GPU Kernel Modules?

The open kernel modules support all NVIDIA Ampere and Turing GPUs. Data center GPUs are supported for production use, and support for GeForce and Workstation GPUs is alpha quality. Refer to the Data Center, NVIDIA RTX, and GeForce product tables for more details (Turing and later GPUs have a compute capability of 7.5 or greater).
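
If you are unsure whether a particular GPU clears that bar, one quick way to check from Python is shown below. This is a minimal sketch that assumes PyTorch with CUDA support is installed; it is not part of the NVIDIA driver tooling.

# Minimal sketch: report whether each local GPU is Turing or newer
# (compute capability >= 7.5). Assumes PyTorch with CUDA support.
import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        status = "open kernel modules" if (major, minor) >= (7, 5) else "closed-source modules"
        print(f"{name}: compute capability {major}.{minor} -> {status}")
else:
    print("No CUDA device detected.")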

How do I report bugs?

Problems can be reported through the GitHub repository issue tracker or through our existing end-user support forum. Please report security issues through the channels listed on the GitHub repository security policy.

What is the process for patch submission and SLA/CLA for patches?

We encourage community submissions through pull requests on the GitHub page. Submitted patches will be reviewed and, if approved, integrated (possibly with modifications) into a future driver release. Please refer to the NVIDIA driver lifecycle document.

The published source code is a snapshot generated from a shared codebase, so contributions may not be reflected as separate Git commits in the GitHub repo. We are working on a process for acknowledging community contributions. For the same reason, we also advise against significant reformatting of the code.

The process for submitting pull requests is described on our GitHub page and such contributions are covered under the contributor license agreement (CLA).

More detailed FAQs are also available on the GitHub page.

Categories
Misc

Optimize Your Ray Tracing Graphics with the New NVIDIA RTX Branch of Unreal Engine 5

This feature-rich branch is fully compatible with Unreal Engine 5, and contains all of the latest developments from NVIDIA in the world of ray tracing.

The NVIDIA Branch of Unreal Engine 5 (NvRTX 5.0) is now available. This feature-rich branch is fully compatible with Unreal Engine 5 and has all of the latest developments in the world of ray tracing.

NVIDIA makes it easy for developers to add leading-edge technologies to their Unreal Engine games and applications through custom branches on GitHub. With NvRTX 5.0, developers can leverage RTXGI, RTXDI, and other ray tracing improvements, such as addressing shadow mismatching when using Nanite.

NvRTX offers developers an interactive solution for creating stunning visuals and enhancing ray tracing development on any DirectX ray tracing (DXR) capable GPU. With these custom branches, developers can pick and choose which NVIDIA technologies to take advantage of in their games and applications.

Featured technologies include:

  • RTX Global Illumination (RTXGI)
    • RTXGI provides scalable solutions to compute infinite multibounce lighting and soft-shadow occlusions without bake times, light leaks, or expensive per-frame costs.
  • RTX Direct Illumination (RTXDI)
    • RTXDI helps artists add unlimited shadow-casting and dynamic lights to game environments in real time without worrying about performance or resource constraints.
  • Deep Learning Super Sampling (DLSS)
    • DLSS leverages the power of a deep learning neural network to boost frame rates and generate beautiful, detailed images for your games.
  • Deep Learning Anti-Aliasing (DLAA)
    • DLAA is an AI-based anti-aliasing mode that uses the same technology powering NVIDIA DLSS, for even better graphics in your games.
  • NVIDIA Real-Time Denoisers (NRD)
    • NRD is a spatio-temporal, API-agnostic denoising library that’s designed to work with low ray-per-pixel signals.
  • NVIDIA Image Scaling
    • NVIDIA Image Scaling is a driver-based spatial upscaler and sharpener.

Learn more about NVIDIA resources for Unreal Engine developers.

Categories
Misc

Discover Industry Breakthroughs Using AI Technology at Microsoft Build 2022

Join Microsoft Build 2022 to learn how NVIDIA AI technology solutions are transforming industries such as retail, manufacturing, automotive, and healthcare.

AI continues to transform global industries such as retail, manufacturing, automotive, and healthcare. The NVIDIA and Microsoft Azure partnership provides developers with global access to AI infrastructure on-demand, simplified infrastructure management, and solutions deploying AI-enabled applications.

Join the NVIDIA team virtually at Microsoft Build May 24-26, and learn more about the latest developer technologies, tools, and techniques for data scientists and developers to take AI to production faster. Connect live with subject matter experts from NVIDIA and Microsoft, get your technical questions answered, and hear how customers like BMW and Archer Daniels Midland (ADM) are harnessing the power of NVIDIA technologies on Azure.

NVIDIA developer sessions at Microsoft Build 2022

The full NVIDIA content line-up can be found on our Microsoft Build showcase page.

Below is a quick preview:

Organizing data for machine learning

Live Customer Interview | 5/25, 10:45-11:15 am 
Isaac Himanga, Archer Daniels Midland (ADM)
Watch the live interview. Registration is not required.

Many tools analyze equipment data to identify degraded performance or opportunities for improvement. What is not easy: finding relevant data for hundreds of thousands of assets to feed these models. ADM discusses how they’re organizing process data into a structure for quick deployment of AI to make data-based decisions. A new and better tool is needed to organize this data, and partnering with Sight Machine is helping move ADM closer to data-centric AI using NVIDIA GPU technology on Azure.

Azure Cognitive Service deployment: AI inference with NVIDIA Triton Server

Breakout | 5/25, 12-12:45 pm

Join this session to see how Azure Cognitive Services uses the NVIDIA Triton Inference Server for inference at scale. We highlight two use cases: deploying the first-ever Mixture of Experts model for document translation and an acoustic model for Microsoft Teams Live Captioning. Tune in to learn about serving models with NVIDIA Triton, ONNX Runtime, and custom backends.

How vision AI applications use NVIDIA DeepStream and Azure IoT Edge services

Ask the Experts | 5/25, 1-1:30 pm

Join experts from NVIDIA and Microsoft where you can ask questions about developing applications with Graph Composer and new DeepStream features, deployment through IoT Hub, connecting to other Azure IoT services, or transmitting inference results to the cloud.

Discussing accelerated model inference for Azure ML deployment with ONNX-RT, OLive, NVIDIA Triton Inference Server and Triton Model Analyzer

Table Topic | 5/25, 2-2:30 pm

Leaving performance on the table for AI inference deployments leads to poor cloud infrastructure utilization, high operational costs, and sluggish UX. Learn how to optimize the model configuration to maximize inference performance by using ONNX Runtime, OLive, Azure ML, NVIDIA Triton Inference Server, and NVIDIA Triton Model Analyzer.

NVIDIA RAPIDS Spark plug-in on Azure Synapse

Video On-Demand

Accelerate your ETL and ML Spark applications using Azure Synapse with NVIDIA RAPIDS.

Hands-on labs, tutorials, and resources

As a developer, you are a key contributor to the advancement of every field. We have created an online space devoted to your needs, with access to free SDKs, technical documentation, peer and domain expert help, and information on hardware to tackle the biggest challenges.

Join the NVIDIA Developer Program for free and exclusive access to SDKs, technical documentation, and peer and domain-expert help. NVIDIA offers tools and training to accelerate AI, HPC, and graphics applications.

Connect with NVIDIA at Microsoft Build 2022.

Categories
Offsites

Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate

Machine translation (MT) technology has made significant advances in recent years, as deep learning has been integrated with natural language processing (NLP). Performance on research benchmarks like WMT has soared, and translation services have improved in quality and expanded to include new languages. Nevertheless, while existing translation services cover languages spoken by the majority of people worldwide, they only include around 100 languages in total, just over 1% of those actively spoken globally. Moreover, the languages that are currently represented are overwhelmingly European, largely overlooking regions of high linguistic diversity, like Africa and the Americas.

There are two key bottlenecks towards building functioning translation models for the long tail of languages. The first arises from data scarcity; digitized data for many languages is limited and can be difficult to find on the web due to quality issues with Language Identification (LangID) models. The second challenge arises from modeling limitations. MT models usually train on large amounts of parallel (translated) text, but without such data, models must learn to translate from limited amounts of monolingual text, which is a novel area of research. Both of these challenges need to be addressed for translation models to reach sufficient quality.

In “Building Machine Translation Systems for the Next Thousand Languages”, we describe how to build high-quality monolingual datasets for over a thousand languages that do not have translation datasets available and demonstrate how one can use monolingual data alone to train MT models. As part of this effort, we are expanding Google Translate to include 24 under-resourced languages. For these languages, we created monolingual datasets by developing and using specialized neural language identification models combined with novel filtering approaches. The techniques we introduce supplement massively multilingual models with a self-supervised task to enable zero-resource translation. Finally, we highlight how native speakers have helped us realize this accomplishment.

Meet the Data
Automatically gathering usable textual data for under-resourced languages is much more difficult than it may seem. Tasks like LangID, which work well for high-resource languages, are unsuccessful for under-resourced languages, and many publicly available datasets crawled from the web often contain more noise than usable data for the languages they attempt to support. In our early attempts to identify under-resourced languages on the web by training a standard Compact Language Detector v3 (CLD3) LangID model, we too found that the dataset was too noisy to be usable.

As an alternative, we trained a Transformer-based, semi-supervised LangID model on over 1000 languages. This model supplements the LangID task with the MAsked Sequence-to-Sequence (MASS) task to better generalize over noisy web data. MASS simply garbles the input by randomly removing sequences of tokens from it, and trains the model to predict these sequences. We applied the Transformer-based model to a dataset that had been filtered with a CLD3 model and trained to recognize clusters of similar languages.
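
To make the MASS objective concrete, below is a minimal sketch of how a masked sequence-to-sequence training example could be constructed from one monolingual sentence. The mask token and span length are illustrative assumptions; this is not the production pipeline.

import random

MASK = "<mask>"  # illustrative mask token, not the exact symbol used in training

def make_mass_example(tokens, span_frac=0.5, seed=None):
    """Garble a sentence by masking a contiguous span of tokens.

    Returns (source, target): the garbled input and the span the model
    is trained to reconstruct.
    """
    rng = random.Random(seed)
    span_len = max(1, int(len(tokens) * span_frac))
    start = rng.randint(0, len(tokens) - span_len)
    target = tokens[start:start + span_len]
    source = tokens[:start] + [MASK] * span_len + tokens[start + span_len:]
    return source, target

src, tgt = make_mass_example("the quick brown fox jumps over the lazy dog".split(), seed=0)
print(src)  # garbled input with a masked span
print(tgt)  # tokens the model must predict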

We then applied the open-sourced Term Frequency-Inverse Internet Frequency (TF-IIF) filtering to the resulting dataset to find and discard sentences that were actually in related high-resource languages, and developed a variety of language-specific filters to eliminate specific pathologies. The result of this effort was a dataset with monolingual text in over 1000 languages, of which 400 had over 100,000 sentences. We performed human evaluations on samples of 68 of these languages and found that the majority (>70%) reflected high-quality, in-language content.
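
The core idea of TF-IIF filtering can be sketched as follows: rank tokens by how frequent they are in the target-language corpus relative to their frequency on the general internet, then keep only sentences that contain enough of these distinctive tokens. The snippet below is a simplified illustration of that idea, not the released implementation; all function names here are made up.

from collections import Counter

def tfiif_scores(lang_corpus_tokens, web_token_freq, smoothing=1e-9):
    """Simplified TF-IIF: frequency in the target-language corpus divided by
    frequency on the general internet (higher = more distinctive)."""
    tf = Counter(lang_corpus_tokens)
    total = sum(tf.values())
    return {tok: (count / total) / (web_token_freq.get(tok, 0.0) + smoothing)
            for tok, count in tf.items()}

def distinctive_tokens(scores, k=1000):
    """The k tokens most characteristic of the target language."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def keep_sentence(sentence_tokens, distinctive, min_hits=1):
    """Discard sentences with no distinctive in-language tokens, which are
    likely to be in a related high-resource language instead."""
    return sum(1 for tok in sentence_tokens if tok in distinctive) >= min_hits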

The amount of monolingual data per language versus the amount of parallel (translated) data per language. A small number of languages have large amounts of parallel data, but there is a long tail of languages with only monolingual data.

Meet the Models
Once we had a dataset of monolingual text in over 1000 languages, we then developed a simple yet practical approach for zero-resource translation, i.e., translation for languages with no in-language parallel text and no language-specific translation examples. Rather than limiting our model to an artificial scenario with only monolingual text, we also include all available parallel text data with millions of examples for higher resource languages to enable the model to learn the translation task. Simultaneously, we train the model to learn representations of under-resourced languages directly from monolingual text using the MASS task. In order to solve this task, the model is forced to develop a sophisticated representation of the language in question, developing a complex understanding of how words relate to other words in a sentence.

Relying on the benefits of transfer learning in massively multilingual models, we train a single giant translation model on all available data for over 1000 languages. The model trains on monolingual text for all 1138 languages and on parallel text for a subset of 112 of the higher-resourced languages.

At training time, any input the model sees has a special token indicating which language the output should be in, exactly like the standard formulation for multilingual translation. Our additional innovation is to use the same special tokens for both the monolingual MASS task and the translation task. Therefore, the token translate_to_french may indicate that the source is in English and needs to be translated to French (the translation task), or it may mean that the source is in garbled French and needs to be translated to fluent French (the MASS task). By using the same tags for both tasks, a translate_to_french tag takes on the meaning, “Produce a fluent output in French that is semantically close to the input, regardless of whether the input is garbled in the same language or in another language entirely.” From the model’s perspective, there is not much difference between the two.
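
A minimal sketch of how such training examples might be assembled is shown below. The tag format and helper names are illustrative assumptions rather than the exact internal representation.

def translation_example(src_sentence, tgt_sentence, tgt_lang):
    # Parallel data: the source is in another language; the tag names the output language.
    return {"input": f"<translate_to_{tgt_lang}> {src_sentence}", "target": tgt_sentence}

def mass_example(garbled_sentence, original_sentence, lang):
    # Monolingual data: same tag, but the source is a garbled version of the target.
    return {"input": f"<translate_to_{lang}> {garbled_sentence}", "target": original_sentence}

# Both tasks share one tag, so the tag simply means "produce fluent output in
# this language that is semantically close to the input".
print(translation_example("The cat sleeps.", "Le chat dort.", "french"))
print(mass_example("Le <mask> dort.", "Le chat dort.", "french"))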

Surprisingly, this simple procedure produces high quality zero-shot translations. The BLEU and ChrF scores for the resulting model are in the 10–40 and 20–60 ranges respectively, indicating mid- to high-quality translation. We observed meaningful translations even for highly inflected languages like Quechua and Kalaallisut, despite these languages being linguistically dissimilar to all other languages in the model. However, we only computed these metrics on the small subset of languages with human-translated evaluation sets. In order to understand the quality of translation for the remaining languages, we developed an evaluation metric based on round-trip translation, which allowed us to see that several hundred languages are reaching high translation quality.

To further improve quality, we use the model to generate large amounts of synthetic parallel data, filter the data based on round-trip translation (comparing a sentence translated into another language and back again), and continue training the model on this filtered synthetic data via back-translation and self-training. Finally, we fine-tune the model on a smaller subset of 30 languages and distill it into a model small enough to be served.
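
Below is a simplified sketch of round-trip filtering under stated assumptions: translate is a hypothetical stand-in for the trained model, and a crude character n-gram overlap stands in for ChrF. Neither is the exact metric or API used in this work.

def char_ngrams(text, n=4):
    return {text[i:i + n] for i in range(max(len(text) - n + 1, 1))}

def overlap(a, b, n=4):
    """Crude stand-in for ChrF: character n-gram overlap in [0, 1]."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / max(len(ga | gb), 1)

def round_trip_filter(sentences, translate, src_lang, tgt_lang, threshold=0.5):
    """Keep synthetic pairs whose back-translation stays close to the original."""
    kept = []
    for sentence in sentences:
        forward = translate(sentence, src_lang, tgt_lang)   # synthetic target side
        back = translate(forward, tgt_lang, src_lang)       # round trip
        if overlap(sentence, back) >= threshold:
            kept.append((sentence, forward))
    return kept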

Translation accuracy scores for 638 of the languages supported in our model, using the metric we developed (RTTLangIDChrF), for both the higher-resource supervised languages and the low-resource zero-resource languages.

Contributions from Native Speakers
Regular communication with native speakers of these languages was critical for our research. We collaborated with over 100 people at Google and other institutions who spoke these languages. Some volunteers helped develop specialized filters to remove out-of-language content overlooked by automatic methods, for instance Hindi mixed with Sanskrit. Others helped with transliterating between different scripts used by the languages, for instance between Meetei Mayek and Bengali, for which sufficient tools didn’t exist; and yet others helped with a gamut of tasks related to evaluation. Native speakers were also key for advising in matters of political sensitivity, like the appropriate name for the language, and the appropriate writing system to use for it. And only native speakers could answer the ultimate question: given the current quality of translation, would it be valuable to the community for Google Translate to support this language?

Closing Notes
This advance is an exciting first step toward supporting more language technologies in under-resourced languages. Most importantly, we want to stress that the quality of translations produced by these models still lags far behind that of the higher-resource languages supported by Google Translate. These models are certainly a useful first tool for understanding content in under-resourced languages, but they will make mistakes and exhibit their own biases. As with any ML-driven tool, one should consider the output carefully.

The complete list of new languages added to Google Translate in this update:

Acknowledgements
We would like to thank Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, and Macduff Hughes for their contributions to the research, engineering, and leadership of this project.

We would also like to extend our deepest gratitude to the following native speakers and members of affected communities, who helped us in a wide variety of ways: Yasser Salah Eddine Bouchareb (Algerian Arabic); Mfoniso Ukwak (Anaang); Bhaskar Borthakur, Kishor Barman, Rasika Saikia, Suraj Bharech (Assamese); Ruben Hilare Quispe (Aymara); Devina Suyanto (Balinese); Allahserix Auguste Tapo, Bakary Diarrassouba, Maimouna Siby (Bambara); Mohammad Jahangir (Baluchi); Subhajit Naskar (Bengali); Animesh Pathak, Ankur Bapna, Anup Mohan, Chaitanya Joshi, Chandan Dubey, Kapil Kumar, Manish Katiyar, Mayank Srivastava, Neeharika, Saumya Pathak, Tanya Sinha, Vikas Singh (Bhojpuri); Bowen Liang, Ellie Chio, Eric Dong, Frank Tang, Jeff Pitman, John Wong, Kenneth Chang, Manish Goregaokar, Mingfei Lau, Ryan Li, Yiwen Luo (Cantonese); Monang Setyawan (Caribbean Javanese); Craig Cornelius (Cherokee); Anton Prokopyev (Chuvash); Rajat Dogra, Sid Dogra (Dogri); Mohamed Kamagate (Dyula); Chris Assigbe, Dan Ameme, Emeafa Doe, Irene Nyavor, Thierry Gnanih, Yvonne Dumor (Ewe); Abdoulaye Barry, Adama Diallo, Fauzia van der Leeuw, Ibrahima Barry (Fulfulde); Isabel Papadimitriou (Greek); Alex Rudnick (Guarani); Mohammad Khdeir (Gulf Arabic); Paul Remollata (Hiligaynon); Ankur Bapna (Hindi); Mfoniso Ukwak (Ibibio); Nze Lawson (Igbo); D.J. Abuy, Miami Cabansay (Ilocano); Archana Koul, Shashwat Razdan, Sujeet Akula (Kashmiri); Jatin Kulkarni, Salil Rajadhyaksha, Sanjeet Hegde Desai, Sharayu Shenoy, Shashank Shanbhag, Shashi Shenoy (Konkani); Ryan Michael, Terrence Taylor (Krio); Bokan Jaff, Medya Ghazizadeh, Roshna Omer Abdulrahman, Saman Vaisipour, Sarchia Khursheed (Kurdish (Sorani));Suphian Tweel (Libyan Arabic); Doudou Kisabaka (Lingala); Colleen Mallahan, John Quinn (Luganda); Cynthia Mboli (Luyia); Abhishek Kumar, Neeraj Mishra, Priyaranjan Jha, Saket Kumar, Snehal Bhilare (Maithili); Lisa Wang (Mandarin Chinese); Cibu Johny (Malayalam); Viresh Ratnakar (Marathi); Abhi Sanoujam, Gautam Thockchom, Pritam Pebam, Sam Chaomai, Shangkar Mayanglambam, Thangjam Hindustani Devi (Meiteilon (Manipuri)); Hala Ajil (Mesopotamian Arabic); Hamdanil Rasyid (Minangkabau); Elizabeth John, Remi Ralte, S Lallienkawl Gangte,Vaiphei Thatsing, Vanlalzami Vanlalzami (Mizo); George Ouais (MSA); Ahmed Kachkach, Hanaa El Azizi (Morrocan Arabic); Ujjwal Rajbhandari (Newari); Ebuka Ufere, Gabriel Fynecontry, Onome Ofoman, Titi Akinsanmi (Nigerian Pidgin); Marwa Khost Jarkas (North Levantine Arabic); Abduselam Shaltu, Ace Patterson, Adel Kassem, Mo Ali, Yonas Hambissa (Oromo); Helvia Taina, Marisol Necochea (Quechua); AbdelKarim Mardini (Saidi Arabic); Ishank Saxena, Manasa Harish, Manish Godara, Mayank Agrawal, Nitin Kashyap, Ranjani Padmanabhan, Ruchi Lohani, Shilpa Jindal, Shreevatsa Rajagopalan, Vaibhav Agarwal, Vinod Krishnan (Sanskrit); Nabil Shahid (Saraiki); Ayanda Mnyakeni (Sesotho, Sepedi); Landis Baker (Seychellois Creole); Taps Matangira (Shona); Ashraf Elsharif (Sudanese Arabic); Sakhile Dlamini (Swati); Hakim Sidahmed (Tamazight); Melvin Johnson (Tamil); Sneha Kudugunta (Telugu); Alexander Tekle, Bserat Ghebremicael, Nami Russom, Naud Ghebre (Tigrinya); Abigail Annkah, Diana Akron, Maame Ofori, Monica Opoku-Geren, Seth Duodu-baah, Yvonne Dumor (Twi); Ousmane Loum (Wolof); and Daniel Virtheim (Yiddish).

Categories
Offsites

Language Models Perform Reasoning via Chain of Thought

In recent years, scaling up the size of language models has been shown to be a reliable way to improve performance on a range of natural language processing (NLP) tasks. Today’s language models at the scale of 100B or more parameters achieve strong performance on tasks like sentiment analysis and machine translation, even with little or no training examples. Even the largest language models, however, can still struggle with certain multi-step reasoning tasks, such as math word problems and commonsense reasoning. How might we enable language models to perform such reasoning tasks?

In “Chain of Thought Prompting Elicits Reasoning in Large Language Models,” we explore a prompting method for improving the reasoning abilities of language models. Called chain of thought prompting, this method enables models to decompose multi-step problems into intermediate steps. With chain of thought prompting, language models of sufficient scale (~100B parameters) can solve complex reasoning problems that are not solvable with standard prompting methods.

Comparison to Standard Prompting
With standard prompting (popularized by GPT-3) the model is given examples of input–output pairs (formatted as questions and answers) before being asked to predict the answer for a test-time example (shown below on the left). In chain of thought prompting (below, right), the model is prompted to produce intermediate reasoning steps before giving the final answer to a multi-step problem. The idea is that a model-generated chain of thought would mimic an intuitive thought process when working through a multi-step reasoning problem. While producing a thought process has been previously accomplished via fine-tuning, we show that such thought processes can be elicited by including a few examples of chain of thought via prompting only, which does not require a large training dataset or modifying the language model’s weights.
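
For illustration, a chain of thought prompt can be as simple as a few-shot prompt whose exemplar answers spell out the intermediate steps. The sketch below shows the prompt format only; the exemplar follows the style of the arithmetic examples in the paper, and generate is a hypothetical stand-in for whatever language-model API is being used.

# A worked exemplar whose answer shows its reasoning, followed by the test question.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has
3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more,
how many apples do they have?
A:"""

# With standard prompting, the exemplar answer would be just "The answer is 11."
# completion = generate(COT_PROMPT)  # hypothetical model call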

Whereas standard prompting asks the model to directly give the answer to a multi-step reasoning problem, chain of thought prompting induces the model to decompose the problem into intermediate reasoning steps, in this case leading to a correct final answer.

Chain of thought reasoning allows models to decompose complex problems into intermediate steps that are solved individually. Moreover, the language-based nature of chain of thought makes it applicable to any task that a person could solve via language. We find through empirical experiments that chain of thought prompting can improve performance on various reasoning tasks, and that successful chain of thought reasoning is an emergent property of model scale — that is, the benefits of chain of thought prompting only materialize with a sufficient number of model parameters (around 100B).

Arithmetic Reasoning
One class of tasks where language models typically struggle is arithmetic reasoning (i.e., solving math word problems). Two benchmarks in arithmetic reasoning are MultiArith and GSM8K, which test the ability of language models to solve multi-step math problems similar to the one shown in the figure above. We evaluate both the LaMDA collection of language models ranging from 422M to 137B parameters, as well as the PaLM collection of language models ranging from 8B to 540B parameters. We manually compose chains of thought to include in the examples for chain of thought prompting.

For these two benchmarks, using standard prompting leads to relatively flat scaling curves: increasing the scale of the model does not substantially improve performance (shown below). However, we find that when using chain of thought prompting, increasing model scale leads to improved performance that substantially outperforms standard prompting for large model sizes.

Employing chain of thought prompting enables language models to solve arithmetic reasoning problems for which standard prompting has a mostly flat scaling curve.

On the GSM8K dataset of math word problems, PaLM shows remarkable performance when scaled to 540B parameters. As shown in the table below, combining chain of thought prompting with the 540B parameter PaLM model leads to new state-of-the-art performance of 58%, surpassing the prior state of the art of 55% achieved by fine-tuning GPT-3 175B on a large training set and then ranking potential solutions via a specially trained verifier. Moreover, follow-up work on self-consistency shows that the performance of chain of thought prompting can be improved further by taking the majority vote of a broad set of generated reasoning processes, which results in 74% accuracy on GSM8K.
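
A minimal sketch of the self-consistency idea follows, assuming a hypothetical sample_reasoning_path function that returns one sampled chain of thought ending in a final numeric answer.

from collections import Counter
import re

def final_answer(reasoning: str) -> str:
    """Pull the last number out of a generated chain of thought."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", reasoning)
    return numbers[-1] if numbers else ""

def self_consistency(question, sample_reasoning_path, n_samples=40):
    """Sample several reasoning paths and take a majority vote on the answer."""
    votes = Counter(final_answer(sample_reasoning_path(question)) for _ in range(n_samples))
    answer, _ = votes.most_common(1)[0]
    return answer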

Chain of thought prompting with PaLM achieves a new state of the art on the GSM8K benchmark of math word problems. For a fair comparison against fine-tuned GPT-3 baselines, the chain of thought prompting results shown here also use an external calculator to compute basic arithmetic functions (i.e., addition, subtraction, multiplication and division).

Commonsense Reasoning
In addition to arithmetic reasoning, we consider whether the language-based nature of chain of thought prompting also makes it applicable to commonsense reasoning, which involves reasoning about physical and human interactions under the presumption of general background knowledge. For these evaluations, we use the CommonsenseQA and StrategyQA benchmarks, as well as two domain-specific tasks from the BIG-Bench collaboration regarding date understanding and sports understanding. Example questions are below:

As shown below, for CommonsenseQA, StrategyQA, and Date Understanding, performance improved with model scale, and employing chain of thought prompting led to additional small improvements. Chain of thought prompting had the biggest improvement on sports understanding, for which PaLM 540B’s chain of thought performance surpassed that of an unaided sports enthusiast (95% vs 84%).

Chain of thought prompting also improves performance on various types of commonsense reasoning tasks.

Conclusions
Chain of thought prompting is a simple and broadly applicable method for improving the ability of language models to perform various reasoning tasks. Through experiments on arithmetic and commonsense reasoning, we find that chain of thought prompting is an emergent property of model scale. Broadening the range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.

Acknowledgements
It was an honor and privilege to work with Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Quoc Le on this project.

Categories
Misc

Some results look interesting!

submitted by /u/limapedro

Categories
Misc

Need help finding an updated, bug-free GitHub example that shows how to use the Object Detection API with Google Colab

Hi there, I’m trying to learn how to use the object detection models in the TensorFlow Object Detection API, but so far all the .ipynb files from GitHub that I have found are outdated (using TensorFlow 1.x instead of 2.x) or plagued with lots of bugs. Any help will be appreciated.

Thank you,

submitted by /u/Puzzleheaded-Beat-42

Categories
Misc

Testing Container Images Against Multiple Platforms with Container Canary

This post details how to use Container Canary from installation and validation to writing custom manifests and container automation.

Bring-your-own-container models are widely supported on today’s modern compute platforms. In other words, you can provide your own container images within your custom software environment.

However, user-provided containers must satisfy each platform’s unique requirements, which can vary from platform to platform. For example, you may need to:

  • Use a specific non-root user.
  • Place the home directory in a certain location.
  • Install dependency packages.
  • Run web applications on designated ports.

Keeping your container images conformant with these arbitrary requirements can be challenging. As a result, we are eager to introduce a new open-source tool called Container Canary to capture these requirements and automatically test against them. Container Canary provides a specification for recording these requirements as a manifest that can be checked into version control. You can then use the canary CLI tool to validate containers against that manifest.

This is useful in test and continuous integration (CI) environments to avoid regressions in containers while allowing container developers to move quickly.

$ canary validate --file somespec.yaml foo/bar:latest
Validating foo/bar:latest against somespec
 📦 Required packages are installed              	[passed]
 🤖 Expected services are running                	[passed]
 🎉 Your container is awesome                    	[passed]
validation passed

Installing Container Canary

Container Canary is written in Golang and distributed as static binaries, making it portable and easy to install in CI environments.

To install it, go to the releases page and download the appropriate distribution for your system. For example, Linux users with x86_64 processors would use the canary_linux_amd64 binary. Be sure to replace VERSION in the following commands with the version to install.

$ curl -L https://github.com/NVIDIA/container-canary/releases/download/VERSION/canary_linux_amd64 > canary_linux_amd64

Container Canary also provides sha256 sums to verify the binaries.

$ curl -L https://github.com/NVIDIA/container-canary/releases/download/VERSION/canary_linux_amd64.sha256sum > canary_linux_amd64.sha256sum
 
$ sha256sum --check --status canary_linux_amd64.sha256sum

Now, you can put the binary somewhere on your path.

$ chmod +x canary_linux_amd64
 
$ mv canary_linux_amd64 /usr/local/bin/canary

Finally, validate that it works.

$ canary version
Container Canary
 Version:         VERSION
 ...

Validating containers with a Kubeflow example

With Container Canary installed, you can begin validating containers. The /examples/ GitHub directory contains some manifests for popular container platforms, including the Kubeflow example. You can use these manifests to get started right away.

Kubeflow is a popular platform for designing, training, and inferencing machine learning models. The Kubeflow Notebooks service enables you to launch web-based development environments inside Kubeflow. While it does have default containers maintained by the Kubeflow community for running tools like JupyterLab, RStudio, and Visual Studio Code (code-server), you can also choose your own container images with your own software environment.

The list of requirements specifies what your custom container must satisfy to run correctly on Kubeflow Notebooks. That list looks like the following example:

For Kubeflow Notebooks to work with a container image, the image must:

  • expose an HTTP interface on port 8888:
    • kubeflow sets an environment variable NB_PREFIX at runtime with the URL path that we expect the container to be listening under
    • kubeflow uses IFrames, so ensure your application sets Access-Control-Allow-Origin: * in HTTP response headers
  • run as a user called jovyan:
    • the home directory of jovyan should be /home/jovyan
    • the UID of jovyan should be 1000
  • start successfully with an empty PVC mounted at /home/jovyan:
    • kubeflow mounts a PVC at /home/jovyan to keep state across Pod restarts

With Container Canary, we have written these requirements out in our example manifest. If you have ever written a Kubernetes pod manifest, this syntax should look familiar to you. You can see that each requirement has been written out in the form of a probe that Container Canary runs against your container to check that the requirement is met.

The resulting manifest looks like the following example:

apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: kubeflow
description: Kubeflow notebooks
documentation: https://www.kubeflow.org/docs/components/notebooks/container-images/#custom-images
env:
 - name: NB_PREFIX
   value: /hub/jovyan/
ports:
 - port: 8888
   protocol: TCP
volumes:
 - mountPath: /home/jovyan
checks:
 - name: user
   description: 👩 User is jovyan
   probe:
     exec:
       command:
         - /bin/sh
         - -c
         - "[ $(whoami) = jovyan ]"
 - name: uid
   description: 🆔 User ID is 1000
   probe:
     exec:
       command:
         - /bin/sh
         - -c
         - "id | grep uid=1000"
 - name: home
   description: 🏠 Home directory is /home/jovyan
   probe:
     exec:
       command:
         - /bin/sh
         - -c
         - "[ $HOME = /home/jovyan ]"
 - name: http
   description: 🌏 Exposes an HTTP interface on port 8888
   probe:
     httpGet:
       path: /
       port: 8888
     initialDelaySeconds: 10
 - name: NB_PREFIX
   description: 🧭 Correctly routes the NB_PREFIX
   probe:
     httpGet:
       path: /hub/jovyan/lab
       port: 8888
     initialDelaySeconds: 10
 - name: allow-origin-all
   description: "🔓 Sets 'Access-Control-Allow-Origin: *' header"
   probe:
     httpGet:
       path: /
       port: 8888
       responseHttpHeaders:
         - name: Access-Control-Allow-Origin
           value: "*"
     initialDelaySeconds: 10

Now that there is a manifest, I can test a container against it. First, I chose a public image that I knew would not pass the requirements, such as the popular web server NGINX.

$ canary validate --file https://github.com/NVIDIA/container-canary/raw/main/examples/kubeflow.yaml nginx:latest   
Cannot find nginx:latest, pulling…
Validating nginx:latest against kubeflow
 🏠 Home directory is /home/jovyan               	[failed]
 👩 User is jovyan                               	[failed]
 🆔 User ID is 1000                              	[failed]
 🌏 Exposes an HTTP interface on port 8888       	[failed]
 🔓 Sets 'Access-Control-Allow-Origin: *' header 	[failed]
 🧭 Correctly routes the NB_PREFIX               	[failed]
validation failed

Unsurprisingly, this image fails validation.

Next, I tried one of the official Kubeflow images that have been designed to run on Kubeflow Notebooks.

$ canary validate --file https://github.com/NVIDIA/container-canary/raw/main/examples/kubeflow.yaml public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-pytorch-cuda:v1.5.0       	 
Cannot find public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-pytorch-cuda:v1.5.0, pulling…
Validating public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-pytorch-cuda:v1.5.0 against kubeflow
 🏠 Home directory is /home/jovyan               	[passed]
 👩 User is jovyan                               	[passed]
 🆔 User ID is 1000                              	[passed]
 🔓 Sets 'Access-Control-Allow-Origin: *' header 	[passed]
 🧭 Correctly routes the NB_PREFIX               	[passed]
 🌏 Exposes an HTTP interface on port 8888       	[passed]
validation passed

Success! This image passes validation. 

If you are building images for use on Kubeflow, you can validate them in the same way and be confident that changes you make will not cause issues when other users come to run them.

Writing your own validation manifest

You can also write your own manifests to validate containers. Container Canary can help you ensure that your containers will run in your own deployments and on third-party platforms. It also helps you run unit tests on container builds.

Each manifest is a YAML file that begins with some metadata.

# Manifest versioning
apiVersion: container-canary.nvidia.com/v1
kind: Validator
 
# Metadata
name: foo  # The name of the platform that this manifest validates for
description: Foo runs containers for you  # Optional, A description of that platform
documentation: https://example.com  # Optional, A link to the documentation that defines the container requirements in prose

Next, you can configure some runtime options for the container. These are used when Container Canary starts the image to validate and should imitate the options set on your target platform. These include environment variables, ports to expose, and volumes to attach.

env:
  - name: NB_PREFIX
    value: /hub/jovyan/
ports:
  - port: 8888
    protocol: TCP
volumes:
  - mountPath: /home/jovyan

Then, you specify your checks. Checks are the tests to be run against the container to ensure it is compliant. Every check contains a probe that interacts with the container. These interactions include running commands, making HTTP requests, and pinging TCP sockets.

The probes in Container Canary are a superset of those in Kubernetes, so if you have used those before, they should be familiar.

checks:
  - name: mycheck  # Name of the check
    description: Ensuring a thing  # Description of what is being checked (will be used in output)
    probe:
      ...  # A probe to run

An exec check runs a command inside the running container. If the command exits with 0, the check passes.

checks:
  - name: uid
    description: User ID is 1234
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "id | grep uid=1234"

An HTTP Get check performs an HTTP GET request against your container. If the response code indicates success (and any expected response headers are present), the check passes.

checks:
  - name: http
    description: Exposes an HTTP interface on port 80
    probe:
      httpGet:
        path: /
        port: 80
        httpHeaders:  # Optional, headers to set in the request
          - name: Foo-Header
            value: "myheader"
        responseHttpHeaders:  # Optional, headers that you expect to see in the response
          - name: Access-Control-Allow-Origin
            value: "*"

For more information, see the Validator API reference.

After you’ve written your manifest, you can use canary to test containers with it.

$ canary validate --file examples/awesome.yaml your/container:latest
Validating your/container:latest against awesome
 📦 Required packages are installed                  [passed]
 🤖 Expected services are running                    [passed]
 🎉 Your container is awesome                        [passed]
validation passed

Example of automating Container Canary with GitHub Actions

Now that I’ve covered installing Container Canary, validating containers, and writing your own manifests, here’s a quick CI example.

Suppose that you want to build a container that runs a web application on a specific port and also has Python installed. In a new repository, you can create a small Python web application called app.py using FastAPI.

from fastapi import FastAPI
import uvicorn
 
app = FastAPI()
 
 
@app.get("/")
def read_root():
	return {"Hello": "World"}
 
 
@app.get("/foo")
def foo():
	return {"foo": "bar"}
 
if __name__ == "__main__":
	uvicorn.run(app, host="0.0.0.0", port=5000, log_level="info")

Then you can create a Dockerfile to package the application into a container.

FROM python
 
COPY app.py /app.py
 
RUN pip install fastapi uvicorn[standard]
 
EXPOSE 5000
 
CMD python /app.py

Now, write a Container Canary Validator manifest that tests the container image to ensure that it runs a web server on port 5000 and has Python installed. Call it canary-validator.yaml.

apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: example
description: Container Canary CI Example
env: []
ports:
 - port: 5000
   protocol: TCP
volumes: []
checks:
 - name: http
   description: Exposes an HTTP interface on port 5000
   probe:
     httpGet:
       path: /foo
       port: 5000
     failureThreshold: 30
 - name: python
   description: Has Python installed
   probe:
     exec:
       command:
         - /bin/sh
         - -c
         - "which python"

Finally, create a GitHub Actions config to run this in CI. We chose GitHub Actions for this example because it is popular, free, and easily available, but this configuration should translate for all CI systems.

Create a file called .github/workflows/ci.yaml.

name: ci
 
on:
 push:
 pull_request:
 
jobs:
 canary:
   runs-on: ubuntu-latest
   steps:
     - name: Checkout
       uses: actions/checkout@v2
 
     - name: Install Container Canary
       run: |
         curl -L https://github.com/NVIDIA/container-canary/releases/download/v0.2.0/canary_linux_amd64 > /usr/local/bin/canary
         chmod +x /usr/local/bin/canary
 
     - name: Build Container
       run: docker build -t foo/canary-ci-example:latest .
 
     - name: Validate container
       run: canary validate --file canary-validator.yaml foo/canary-ci-example:latest

Now when you push your code to GitHub, the Actions runner checks out the code, installs Container Canary, builds the container image, and validates it with canary validate.

Figure 1. Canary validation running successfully in a GitHub Actions workflow

The workflow has been executed and our container image has been validated successfully, and fast! For more information, see all the code for this example in the /jacobtomlinson/canary-ci-example GitHub repo.

Apply what you learned

With Container Canary, you can define concrete interfaces for your container images and validate them to ensure that the images you build always meet a defined specification.

If you are regularly building container images, Container Canary is a must-have in your testing toolkit due to its usefulness in test and CI environments. Container developers can avoid regressions in their containers and move through their projects more quickly.

Categories
Misc

Creator Karen X. Cheng Brings Keen AI for Design ‘In the NVIDIA Studio’

The future of content creation is in AI. This week In the NVIDIA Studio, discover how AI-assisted painting is bringing a new level of inspiration to the next generation of artists.

The post Creator Karen X. Cheng Brings Keen AI for Design ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.

Categories
Misc

New lightweight library for head-gesture detection

The Nodding Pigeon library provides a pre-trained model and a simple inference API for detecting head gestures in short videos. Under the hood, it uses Google MediaPipe for collecting the landmark features.

For ML practitioners, this project is also an example of using generative data from a small base-dataset for model training.

Please take a look! 🙂

https://github.com/bhky/nodding-pigeon

submitted by /u/xtorch501