CUDA 11.6 Toolkit New Release Revealed

NVIDIA announces the newest release of the CUDA development environment, CUDA 11.6. This release is focused on enhancing the programming model and performance of your CUDA applications. CUDA continues to push the boundaries of GPU acceleration and lay the foundation for new applications in HPC, visualization, AI, ML and DL, and data science.

CUDA 11.6 has several important features. This post offers an overview of the key capabilities:

  • GSP driver architecture now default on Turing and Ampere GPUs
  • New API to allow disabling nodes in an instantiated graph
  • Full support for the 128-bit integer type
  • Cooperative groups namespace update
  • CUDA compiler update
  • Nsight Compute 2022.1 release

CUDA 11.6 ships with the R510 driver, an update branch. CUDA 11.6 Toolkit is available to download.

GSP driver architecture

The GSP (GPU System Processor) driver architecture is now the default driver mode for all listed Turing and Ampere GPUs. The older driver architecture remains supported as a fallback. For more information, see the R510 Driver Readme.

Instantiated Graph Node API additions

We added a new API, cudaGraphNodeSetEnabled, to allow disabling nodes in an instantiated graph. Support is limited to kernel nodes in this release. A corresponding API, cudaGraphNodeGetEnabled, allows querying the enabled state of a node. We’ve also added the ability to disable NULL kernel graph node launches.
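As a rough illustration of how the two calls fit together, the sketch below builds a small graph with two dependent kernel nodes, launches it once as built, then disables the second node in the instantiated graph and launches again. The kernel add_kernel, the helper run_graph_demo, and the CHECK macro are hypothetical names made up for this example; only the CUDA runtime calls come from the toolkit. It assumes CUDA 11.6 or later and a single visible GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds `delta` to a single counter.
__global__ void add_kernel(int *counter, int delta) { *counter += delta; }

// Return -1 from the enclosing function if a CUDA runtime call fails.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "%s failed: %s\n", #call,                 \
                         cudaGetErrorString(err_));                        \
            return -1;                                                     \
        }                                                                  \
    } while (0)

// Builds the graph (+1, then +10), launches it with both nodes enabled,
// disables the second node, launches again, and returns the final counter.
int run_graph_demo() {
    int *counter = nullptr;
    CHECK(cudaMalloc(&counter, sizeof(int)));
    CHECK(cudaMemset(counter, 0, sizeof(int)));

    cudaGraph_t graph;
    CHECK(cudaGraphCreate(&graph, 0));

    int one = 1, ten = 10;
    void *args_one[] = {&counter, &one};
    void *args_ten[] = {&counter, &ten};

    cudaKernelNodeParams params = {};
    params.func = (void *)add_kernel;
    params.gridDim = dim3(1);
    params.blockDim = dim3(1);
    params.sharedMemBytes = 0;
    params.kernelParams = args_one;
    params.extra = nullptr;

    cudaGraphNode_t add_one_node, add_ten_node;
    CHECK(cudaGraphAddKernelNode(&add_one_node, graph, nullptr, 0, &params));
    params.kernelParams = args_ten;
    // The second node depends on the first, so the two run in order.
    CHECK(cudaGraphAddKernelNode(&add_ten_node, graph, &add_one_node, 1, &params));

    cudaGraphExec_t exec;
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // First launch: both nodes run, so the counter goes from 0 to 11.
    CHECK(cudaGraphLaunch(exec, stream));

    // Disable the +10 node in the instantiated graph only.
    CHECK(cudaGraphNodeSetEnabled(exec, add_ten_node, 0));
    unsigned int enabled = 1;
    CHECK(cudaGraphNodeGetEnabled(exec, add_ten_node, &enabled));
    std::printf("second node enabled: %u\n", enabled);

    // Second launch: only the +1 node runs, so the counter goes from 11 to 12.
    CHECK(cudaGraphLaunch(exec, stream));
    CHECK(cudaStreamSynchronize(stream));

    int result = 0;
    CHECK(cudaMemcpy(&result, counter, sizeof(int), cudaMemcpyDeviceToHost));

    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(counter));
    return result;
}

int main() {
    int result = run_graph_demo();
    std::printf("counter = %d\n", result);
    return result < 0 ? 1 : 0;
}
```

The node handle passed to both calls is the one returned when the node was added to the original graph. Passing a nonzero value to cudaGraphNodeSetEnabled turns the node back on for later launches.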

128-bit integer support

CUDA 11.6 includes full support for the 128-bit integer (__int128) data type, including compiler and developer tools support. To use this feature, the host-side compiler must also support the __int128 type.
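A minimal sketch of __int128 in device code follows: a kernel computes the full 128-bit product of two 64-bit operands and returns the high and low halves. The names mul_wide_kernel and mul_wide are hypothetical and exist only for this example. It assumes a host compiler that provides __int128, such as GCC or Clang.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Computes the full 64x64 -> 128-bit product on the device and stores the
// high 64 bits in out[0] and the low 64 bits in out[1].
__global__ void mul_wide_kernel(unsigned long long a, unsigned long long b,
                                unsigned long long *out) {
    unsigned __int128 product = (unsigned __int128)a * b;
    out[0] = (unsigned long long)(product >> 64);
    out[1] = (unsigned long long)product;
}

// Host wrapper (hypothetical helper): returns false on any CUDA error.
bool mul_wide(unsigned long long a, unsigned long long b,
              unsigned long long *hi, unsigned long long *lo) {
    unsigned long long *d_out = nullptr;
    if (cudaMalloc(&d_out, 2 * sizeof(unsigned long long)) != cudaSuccess) {
        return false;
    }
    mul_wide_kernel<<<1, 1>>>(a, b, d_out);

    unsigned long long h_out[2] = {0, 0};
    // cudaMemcpy waits for the kernel and reports any launch failure.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        return false;
    }
    *hi = h_out[0];
    *lo = h_out[1];
    return true;
}

int main() {
    unsigned long long hi = 0, lo = 0;
    if (!mul_wide(0xFFFFFFFFFFFFFFFFull, 2ull, &hi, &lo)) {
        std::fprintf(stderr, "mul_wide failed\n");
        return 1;
    }
    // (2^64 - 1) * 2 = 2^65 - 2, so hi is 1 and lo is 0xFFFFFFFFFFFFFFFE.
    std::printf("hi = 0x%llx, lo = 0x%llx\n", hi, lo);
    return 0;
}
```

The result is split into two 64-bit halves before it is printed because printf has no standard conversion for 128-bit integers.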

Cooperative groups namespace

The cooperative groups namespace has been updated with new functions to improve consistency in naming, function scope, and unit dimension and size.

Implicit Group/Member | Threads                                             | Blocks
thread_block::        | dim_threads, num_threads, thread_rank, thread_index | (Not needed)
grid_group::          | num_threads, thread_rank                            | dim_blocks, num_blocks, block_rank, block_index

Table 1. New functions in cooperative groups namespace
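To show how these accessors relate to one another, here is a small sketch that checks, for a one-dimensional launch, that a thread's rank in the grid equals its block rank times the block size plus its rank in the block. The kernel check_ranks and the helper count_rank_mismatches are hypothetical names for this example. It assumes CUDA 11.6 or later and an ordinary (non-cooperative) launch, which is enough here because the kernel only queries ranks and sizes and never calls grid.sync().

```cuda
#include <cstdio>
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Each thread compares the cooperative groups accessors against each other
// and against the built-in launch variables, counting any disagreement.
__global__ void check_ranks(int *mismatches) {
    cg::thread_block block = cg::this_thread_block();
    cg::grid_group grid = cg::this_grid();

    // Expected grid-wide rank for a 1D launch.
    unsigned long long expected =
        grid.block_rank() * block.num_threads() + block.thread_rank();

    bool ok = grid.thread_rank() == expected &&
              grid.num_threads() == grid.num_blocks() * block.num_threads() &&
              block.dim_threads().x == blockDim.x &&
              grid.dim_blocks().x == gridDim.x;
    if (!ok) {
        atomicAdd(mismatches, 1);
    }
}

// Host wrapper (hypothetical helper): returns the number of threads that
// saw a mismatch, or -1 on a CUDA error.
int count_rank_mismatches(unsigned int blocks, unsigned int threads) {
    int *d_mismatches = nullptr;
    if (cudaMalloc(&d_mismatches, sizeof(int)) != cudaSuccess) {
        return -1;
    }
    cudaMemset(d_mismatches, 0, sizeof(int));

    check_ranks<<<blocks, threads>>>(d_mismatches);

    int h_mismatches = -1;
    cudaError_t err = cudaMemcpy(&h_mismatches, d_mismatches, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_mismatches);
    return err == cudaSuccess ? h_mismatches : -1;
}

int main() {
    int mismatches = count_rank_mismatches(4, 128);
    std::printf("mismatches = %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The new names follow one pattern: num_ for a scalar count, dim_ for a three-component extent, _rank for a linear position, and _index for a three-component position.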

CUDA compiler

  • Added the -arch=native compilation option to target the installed GPUs during compilation. This extends the existing -gencode=arch=compute_xx,code=sm_xx architecture specification.
  • Added the ability to create PTX files from nvlink.

Deprecated features

  • Calling cudaDeviceSynchronize() from device code for on-device fork and join parallelism is deprecated in preparation for a replacement programming model with higher performance. The call continues to work in this release, but the tools emit a warning about the upcoming change.
  • CentOS Linux 8 reached end of life on December 31, 2021, and support for this OS is now deprecated in the CUDA Toolkit. CentOS Linux 8 support will be completely removed in a future release.
