Posted by Jason Wei and Denny Zhou, Research Scientists, Google Research, Brain team
In recent years, scaling up the size of language models has been shown to be a reliable way to improve performance on a range of natural language processing (NLP) tasks. Today’s language models at the scale of 100B or more parameters achieve strong performance on tasks like sentiment analysis and machine translation, even with few or no training examples. Even the largest language models, however, can still struggle with certain multi-step reasoning tasks, such as math word problems and commonsense reasoning. How might we enable language models to perform such reasoning tasks?
In “Chain of Thought Prompting Elicits Reasoning in Large Language Models,” we explore a prompting method for improving the reasoning abilities of language models. Called chain of thought prompting, this method enables models to decompose multi-step problems into intermediate steps. With chain of thought prompting, language models of sufficient scale (~100B parameters) can solve complex reasoning problems that are not solvable with standard prompting methods.
Comparison to Standard Prompting
With standard prompting (popularized by GPT-3), the model is given examples of input–output pairs (formatted as questions and answers) before being asked to predict the answer for a test-time example (shown below on the left). In chain of thought prompting (below, right), the model is prompted to produce intermediate reasoning steps before giving the final answer to a multi-step problem. The idea is that a model-generated chain of thought would mimic an intuitive thought process when working through a multi-step reasoning problem. While producing a thought process has previously been accomplished via fine-tuning, we show that such thought processes can be elicited by including a few examples of chain of thought via prompting only, which requires neither a large training dataset nor modifying the language model’s weights.
Whereas standard prompting asks the model to directly give the answer to a multi-step reasoning problem, chain of thought prompting induces the model to decompose the problem into intermediate reasoning steps, in this case leading to a correct final answer.
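The contrast between the two prompting styles can be made concrete with a short sketch. The worked problems below follow the illustrative examples in the paper's figures; the strings are assembled here only to show the structural difference between the two prompts.

```python
# Standard prompting: the exemplar answer gives only the final result.
standard_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"
)

# Chain of thought prompting: the exemplar answer spells out the
# intermediate reasoning steps before the final answer.
chain_of_thought_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?\n"
    "A:"
)
```

The only change is in the exemplar answers; the test question is identical, which is what makes the comparison between the two methods fair.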
Chain of thought reasoning allows models to decompose complex problems into intermediate steps that are solved individually. Moreover, the language-based nature of chain of thought makes it applicable to any task that a person could solve via language. We find through empirical experiments that chain of thought prompting can improve performance on various reasoning tasks, and that successful chain of thought reasoning is an emergent property of model scale — that is, the benefits of chain of thought prompting only materialize with a sufficient number of model parameters (around 100B).
Arithmetic Reasoning
One class of tasks where language models typically struggle is arithmetic reasoning (i.e., solving math word problems). Two benchmarks in arithmetic reasoning are MultiArith and GSM8K, which test the ability of language models to solve multi-step math problems similar to the one shown in the figure above. We evaluate both the LaMDA collection of language models, ranging from 422M to 137B parameters, and the PaLM collection of language models, ranging from 8B to 540B parameters. We manually compose chains of thought to include in the examples for chain of thought prompting.
For these two benchmarks, using standard prompting leads to relatively flat scaling curves: increasing the scale of the model does not substantially improve performance (shown below). However, we find that when using chain of thought prompting, increasing model scale leads to improved performance that substantially outperforms standard prompting for large model sizes.
Employing chain of thought prompting enables language models to solve arithmetic reasoning problems for which standard prompting has a mostly flat scaling curve.
On the GSM8K dataset of math word problems, PaLM shows remarkable performance when scaled to 540B parameters. As shown in the table below, combining chain of thought prompting with the 540B parameter PaLM model leads to new state-of-the-art performance of 58%, surpassing the prior state of the art of 55% achieved by fine-tuning GPT-3 175B on a large training set and then ranking potential solutions via a specially trained verifier. Moreover, follow-up work on self-consistency shows that the performance of chain of thought prompting can be improved further by taking the majority vote of a broad set of generated reasoning processes, which results in 74% accuracy on GSM8K.
Chain of thought prompting with PaLM achieves a new state of the art on the GSM8K benchmark of math word problems. For a fair comparison against fine-tuned GPT-3 baselines, the chain of thought prompting results shown here also use an external calculator to compute basic arithmetic functions (i.e., addition, subtraction, multiplication and division).
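The self-consistency idea mentioned above can be sketched in a few lines: sample multiple chains of thought for the same question, extract each final answer, and return the majority answer. This toy illustration assumes the final answers have already been extracted from the sampled outputs; it is not the authors' implementation.

```python
from collections import Counter

def self_consistency_vote(final_answers):
    """Majority vote over final answers extracted from sampled chains of thought."""
    return Counter(final_answers).most_common(1)[0][0]

# For example, five sampled reasoning paths might end in these answers:
print(self_consistency_vote(["18", "18", "26", "18", "9"]))  # -> 18
```

Because different reasoning paths that reach the same correct answer reinforce each other, the majority answer is often more reliable than any single greedy decode.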
Commonsense Reasoning
In addition to arithmetic reasoning, we consider whether the language-based nature of chain of thought prompting also makes it applicable to commonsense reasoning, which involves reasoning about physical and human interactions under the presumption of general background knowledge. For these evaluations, we use the CommonsenseQA and StrategyQA benchmarks, as well as two domain-specific tasks from the BIG-Bench collaboration concerning date understanding and sports understanding. Example questions are below:
As shown below, for CommonsenseQA, StrategyQA, and date understanding, performance improved with model scale, and employing chain of thought prompting led to additional small improvements. Chain of thought prompting had the biggest improvement on sports understanding, for which PaLM 540B’s chain of thought performance surpassed that of an unaided sports enthusiast (95% vs. 84%).
Chain of thought prompting also improves performance on various types of commonsense reasoning tasks.
Conclusions
Chain of thought prompting is a simple and broadly applicable method for improving the ability of language models to perform various reasoning tasks. Through experiments on arithmetic and commonsense reasoning, we find that the benefits of chain of thought prompting emerge only at sufficient model scale. Broadening the range of reasoning tasks that language models can perform will hopefully inspire further work on language-based approaches to reasoning.
Acknowledgements
It was an honor and privilege to work with Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Quoc Le on this project.
Hi there, I’m trying to learn how to use the object detection models in the TensorFlow Object Detection API, but so far all the .ipynb files from GitHub that I have found are either outdated (using TensorFlow 1.x instead of 2.x) or plagued with lots of bugs. Any help will be appreciated.
This post details how to use Container Canary from installation and validation to writing custom manifests and container automation.
Bring-your-own-container models are widely supported on today’s modern compute platforms. In other words, you can provide your own container images containing your own custom software environment.
However, user-provided containers must satisfy each platform’s unique requirements, which can vary from platform to platform. For example, you may need to:
Use a specific non-root user.
Place the home directory in a certain location.
Install dependency packages.
Run web applications on designated ports.
Keeping your container images conformant with these arbitrary requirements can be challenging. To address this, we are introducing a new open-source tool called Container Canary, which captures these requirements and automatically tests containers against them. Container Canary provides a specification for recording these requirements as a manifest that can be checked into version control. You can then use the canary CLI tool to validate containers against that manifest.
This is useful in test and continuous integration (CI) environments to avoid regressions in containers while allowing container developers to move quickly.
$ canary validate --file somespec.yaml foo/bar:latest
Validating foo/bar:latest against somespec
Required packages are installed [passed]
Expected services are running [passed]
Your container is awesome [passed]
validation passed
Installing Container Canary
Container Canary is written in Golang and distributed as static binaries, making it portable and easy to install in CI environments.
To install it, go to the releases page and download the appropriate distribution for your system. For example, Linux users with x86_64 processors would use the canary_linux_amd64 binary. Be sure to replace VERSION in the following commands with the version to install.
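On Linux with an x86_64 processor, the install steps might look like the following (the release URL pattern follows the project's GitHub releases page; adjust the binary name and paths for your platform):

```
$ curl -L https://github.com/NVIDIA/container-canary/releases/download/VERSION/canary_linux_amd64 -o canary_linux_amd64
$ chmod +x canary_linux_amd64
$ sudo mv canary_linux_amd64 /usr/local/bin/canary
```

You can then confirm the installation by checking the version.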
$ canary version
Container Canary
Version: VERSION
...
Validating containers with a Kubeflow example
With Container Canary installed, you can begin validating containers. The /examples/ GitHub directory contains some manifests for popular container platforms, including the Kubeflow example. You can use these manifests to get started right away.
Kubeflow is a popular platform for designing, training, and inferencing machine learning models. The Kubeflow Notebooks service enables you to launch web-based development environments inside Kubeflow. While it does have default containers maintained by the Kubeflow community for running tools like JupyterLab, RStudio, and Visual Studio Code (code-server), you can also choose your own container images with your own software environment.
Kubeflow Notebooks publishes a list of requirements that your custom container must meet to run correctly. That list looks like the following example:
For Kubeflow Notebooks to work with a container image, the image must:
expose an HTTP interface on port 8888:
kubeflow sets an environment variable NB_PREFIX at runtime with the URL path that the container is expected to be listening under
kubeflow uses IFrames, so ensure your application sets Access-Control-Allow-Origin: * in HTTP response headers
run as a user called jovyan:
the home directory of jovyan should be /home/jovyan
the UID of jovyan should be 1000
start successfully with an empty PVC mounted at /home/jovyan:
kubeflow mounts a PVC at /home/jovyan to keep state across Pod restarts
With Container Canary, we have written these requirements out in our example manifest. If you have ever written a Kubernetes pod manifest, this syntax should look familiar to you. You can see that each requirement has been written out in the form of a probe that Container Canary runs against your container to check that the requirement is met.
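An abbreviated sketch of that manifest is shown below. It is paraphrased, not copied; see the examples/kubeflow.yaml file in the container-canary repository for the authoritative version.

```yaml
# Abbreviated, paraphrased sketch of examples/kubeflow.yaml
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: kubeflow
description: Kubeflow notebook containers
env:
  - name: NB_PREFIX
    value: /hub/jovyan/
ports:
  - port: 8888
    protocol: TCP
volumes:
  - mountPath: /home/jovyan
checks:
  - name: user
    description: User is jovyan
    probe:
      exec:
        command:
          - /bin/sh
          - -c
          - "whoami | grep jovyan"
  - name: http
    description: Exposes an HTTP interface on port 8888
    probe:
      httpGet:
        path: /
        port: 8888
  # ...further checks (home directory, user ID, CORS header,
  # NB_PREFIX routing) omitted for brevity.
```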
Now that there is a manifest, you can test a container against it. First, I chose a public image that I knew would not pass the requirements: the popular web server NGINX.
$ canary validate --file https://github.com/NVIDIA/container-canary/raw/main/examples/kubeflow.yaml nginx:latest
Cannot find nginx:latest, pulling…
Validating nginx:latest against kubeflow
Home directory is /home/jovyan [failed]
User is jovyan [failed]
User ID is 1000 [failed]
Exposes an HTTP interface on port 8888 [failed]
Sets 'Access-Control-Allow-Origin: *' header [failed]
Correctly routes the NB_PREFIX [failed]
validation failed
Unsurprisingly, this image fails validation.
Next, I tried one of the official Kubeflow images that have been designed to run on Kubeflow Notebooks.
$ canary validate --file https://github.com/NVIDIA/container-canary/raw/main/examples/kubeflow.yaml public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-pytorch-cuda:v1.5.0
Cannot find public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-pytorch-cuda:v1.5.0, pulling…
Validating public.ecr.aws/j1r0q0g6/notebooks/notebook-servers/jupyter-pytorch-cuda:v1.5.0 against kubeflow
Home directory is /home/jovyan [passed]
User is jovyan [passed]
User ID is 1000 [passed]
Sets 'Access-Control-Allow-Origin: *' header [passed]
Correctly routes the NB_PREFIX [passed]
Exposes an HTTP interface on port 8888 [passed]
validation passed
Success! This image passes validation.
If you are building images for use on Kubeflow, you can validate them in the same way and be confident that changes you make will not cause issues when other users come to run them.
Writing your own validation manifest
You can also write your own manifests to validate containers. Container Canary can help you ensure that your containers will run in your own deployments and on third-party platforms. It also helps you run unit tests on container builds.
Each manifest is a YAML file that begins with some metadata.
# Manifest versioning
apiVersion: container-canary.nvidia.com/v1
kind: Validator
# Metadata
name: foo # The name of the platform that this manifest validates for
description: Foo runs containers for you # Optional: a description of that platform
documentation: https://example.com # Optional: a link to the documentation that defines the container requirements in prose
Next, you can configure some runtime options for the container. These are used when Container Canary starts the image to validate and should imitate the options set on your target platform. These include environment variables, ports to expose, and volumes to attach.
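These runtime options might look like the following fragment (the values shown are illustrative assumptions, not required settings):

```yaml
env:                           # Environment variables set on the container
  - name: NB_PREFIX
    value: /hub/user/
ports:                         # Ports to expose when the container starts
  - port: 8888
    protocol: TCP
volumes:                       # Volumes to attach during validation
  - mountPath: /home/jovyan
```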
Then, you specify your checks. Checks are the tests to be run against the container to ensure it is compliant. Every check contains a probe that interacts with the container. These interactions include running commands, making HTTP requests, and pinging TCP sockets.
The probes in Container Canary are a superset of those in Kubernetes, so if you have used those before, they should be familiar.
checks:
- name: mycheck # Name of the check
description: Ensuring a thing # Description of what is being checked (will be used in output)
probe:
... # A probe to run
An exec check runs a command inside the running container. If the command exits with 0, the check passes.
checks:
- name: uid
description: User ID is 1234
probe:
exec:
command:
- /bin/sh
- -c
- "id | grep uid=1234"
An HTTP Get check performs an HTTP GET request against your container. If the request succeeds (the response code indicates success and any expected response headers match), the check passes.
checks:
- name: http
description: Exposes an HTTP interface on port 80
probe:
httpGet:
path: /
port: 80
httpHeaders: # Optional, headers to set in the request
- name: Foo-Header
value: "myheader"
responseHttpHeaders: # Optional, headers that you expect to see in the response
- name: Access-Control-Allow-Origin
value: "*"
After you’ve written your manifest, you can use canary to test containers with it.
$ canary validate --file examples/awesome.yaml your/container:latest
Validating your/container:latest against awesome
Required packages are installed [passed]
Expected services are running [passed]
Your container is awesome [passed]
validation passed
Example of automating Container Canary with GitHub Actions
Now that I’ve covered installing Container Canary, validating containers, and writing your own manifests, here’s a quick CI example.
Suppose that you want to build a container that should run a web application on a specific port and also has Python installed. In a new repository, you can create a small Python web application called app.py using fastapi.
Then you can create a Dockerfile to package the application into a container.
FROM python
COPY app.py /app.py
RUN pip install fastapi "uvicorn[standard]"
EXPOSE 5000
CMD python /app.py
Now, write a Container Canary Validator manifest that tests the container image to ensure that it runs a web server on port 5000 and has Python installed. Call it canary-validator.yaml.
apiVersion: container-canary.nvidia.com/v1
kind: Validator
name: example
description: Container Canary CI Example
env: []
ports:
- port: 5000
protocol: TCP
volumes: []
checks:
- name: http
description: Exposes an HTTP interface on port 5000
probe:
httpGet:
path: /foo
port: 5000
failureThreshold: 30
- name: python
description: Has Python installed
probe:
exec:
command:
- /bin/sh
- -c
- "which python"
Finally, create a GitHub Actions config to run this in CI. We chose GitHub Actions for this example because it is popular, free, and easily available, but this configuration should translate to other CI systems.
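A minimal workflow might look like the following sketch (the file name, action versions, and the pinned Canary release are assumptions; adjust them to suit your project):

```yaml
# .github/workflows/canary.yml (illustrative sketch)
name: Validate container
on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install Container Canary
        run: |
          sudo curl -L "https://github.com/NVIDIA/container-canary/releases/download/v0.2.0/canary_linux_amd64" \
            -o /usr/local/bin/canary
          sudo chmod +x /usr/local/bin/canary
      - name: Build container image
        run: docker build -t canary-ci-example:latest .
      - name: Validate image
        run: canary validate --file canary-validator.yaml canary-ci-example:latest
```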
Now when you push your code to GitHub, the Actions runner checks out the code, installs Container Canary, builds the container image, and validates it with canary validate.
The workflow executed, and our container image was validated successfully, and quickly. For more information, see all the code for this example in the /jacobtomlinson/canary-ci-example GitHub repo.
Apply what you learned
With Container Canary, you can define concrete interfaces for your container images and validate them to ensure that the images you build always meet a defined specification.
If you regularly build container images, Container Canary is a valuable addition to your testing toolkit, particularly in test and CI environments. It helps container developers avoid regressions and move through their projects more quickly.
The future of content creation is in AI. This week in the NVIDIA Studio, discover how AI-assisted painting is bringing a new level of inspiration to the next generation of artists.
The Nodding Pigeon library provides a pre-trained model and a simple inference API for detecting head gestures in short videos. Under the hood, it uses Google MediaPipe for collecting the landmark features.
For ML practitioners, this project is also an example of using synthetic data generated from a small base dataset for model training.
How do you train a tensorflow model on a set of “background” or “normal” images in order to detect objects that are not “normal”?
I’m using tensorflow to detect birds at a bird feeder via images from a wyze security camera. I have had it working okay on and off over the last several weeks (3:1 false positives to actual positives is about as good as it’s gotten…), but lately it’s been really struggling, especially with wind moving the birdfeeders around (today I had almost 30,000 images, meaning it only dropped about half of all possible images run through the model…). This got me thinking that, from my point of view, it would be easiest to have a model that I continually teach what NOT to call an object and just retrain whenever I find the false positives to be a problem. THEN I can start training it to differentiate between the things that are “not-background”… I’m sure this isn’t a novel idea, so how is this done? I’ve found plenty of tutorials and articles about the opposite (training for recognition of specific objects) but can’t find anything helpful in taking this approach, at least nothing related to how to actually do this…
Where I’m coming from: I’ve some very basic ML exposure with Andrew Ng’s Machine Learning course. I think I’d like to prepare for Google’s TF certificate exam and take the exam by the end of this summer (end of August).
What I’m thinking of doing: I was thinking of implementing all assignments from Ng’s ML course in Python first, then doing his Deep Learning Specialization, and then doing Laurence Moroney’s course specifically designed for the exam. (This would probably take ~2 months if I put in 4-5 hrs per day, so I would still have ~1 month to do whatever you guys recommend.)
My questions:
Is Ng’s ML course + deep learning specialization enough to start Moroney’s course?
Should I do anything else before taking the exam? (Kaggle competitions, projects, etc.?)
Migrating from Onyx to NVIDIA Cumulus Linux optimizes operational efficiency, enabling a DevOps approach to data center operations.
Data center organizations are looking for more efficient, modern network architectures that can be managed, monitored, and deployed in a scalable manner. Emerging DevOps and NetDevOps operational models are bringing the agile development models of continuous integration and continuous development (CI/CD) to data center infrastructure.
Why Cumulus Linux?
The Cumulus Linux operating system was built from the ground up to optimize operational efficiency, enabling a DevOps approach to data center operations.
This DevOps-centric approach means that the complete data center network can be simulated in a digital twin hosted on the NVIDIA Air platform. Using a digital twin for validation and automation improves security, reliability, and productivity.
Migrate from Onyx to Cumulus Linux
NVIDIA recommends migrating to the latest version of Cumulus Linux (5.x, as of April 2022).
Before starting the Onyx to Cumulus Linux migration, make sure that you have a valid support contract with NVIDIA.
First, back up the Onyx configuration file. Run the following command on every Onyx switch and copy its output to a local file:
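The command itself is not shown in this excerpt; on Onyx switches, the running configuration can typically be displayed from enable mode with `show running-config` (verify the exact command against your Onyx documentation):

```
switch > enable
switch # show running-config
```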
Next, create a Cumulus Linux configuration with NVUE (NVIDIA User Experience) for each switch. Before creating the configuration, confirm that you have received a valid license for the Cumulus Linux switches.
(Optional) Validate the configuration using a data center digital twin
To ensure configuration integrity, build a data center simulation on NVIDIA Air.
Log in with a business email address.
Choose BUILD A SIMULATION, Build Your Own, and Create your own.
Add Cumulus switches per the number of production switches and connect them accordingly.
Add servers as needed to enable end-to-end testing.
Choose START SIMULATION.
Log in to each switch by clicking it, and apply the NVUE configuration you created.
Configure the servers with the corresponding interfaces on the production network.
Conduct end-to-end testing.
When testing is complete, apply the configuration to production and repeat testing.
Summary
To maximize the Cumulus Linux operational efficiency features, organizations can use NVIDIA Air and integrate it into their CI/CD workflows. Having a data center digital twin helps eliminate production risks, perform end-to-end testing in a risk-free environment, and deploy with confidence.
Using NVIDIA Air should be sufficient for testing and validating the migration. However, we strongly recommend working with an NVIDIA solution architect to validate the migration code integrity and ensure an uneventful migration.
For more information, see the following resources: