Thrust 1.12.0 and CUB 1.12.0 are distributed with the NVIDIA HPC SDK 21.3 and the CUDA Toolkit 11.4.
Thrust 1.12.0 is a major release providing bug fixes and performance enhancements. It introduces thrust::universal_vector, a container whose data is accessible from both host and device, which enables the use of CUDA unified memory with Thrust. It also adds asynchronous scan algorithms, thrust::async::exclusive_scan and thrust::async::inclusive_scan. The synchronous versions of these algorithms have been updated to use cub::DeviceScan directly. This release deprecates support for Clang
CUB 1.12.0 is a major release providing bug fixes and performance enhancements. It includes improved radix sort stability. Please see the CUB 1.12 Release Notes for more information.
Both packages are available today from GitHub.
About Thrust and CUB
Thrust is a modern C++ parallel algorithms library that provides an interface modeled on the C++ Standard Library (std::). Thrust abstractions are agnostic of any particular parallel programming model or hardware. With Thrust, you can write code once and run it in parallel on either your CPU or GPU.
CUB is a C++ library of collective primitives and utilities for parallel algorithm authors. CUB is specific to CUDA C++, and its interfaces explicitly accommodate CUDA-specific features.
Thrust and CUB are complementary and are often used together.