Accelerating ReLU and GeLU Activation Functions and Batched Sparse GEMM in cuSPARSELt v0.2.0

NVIDIA cuSPARSELt v0.2 now supports ReLU and GeLU activation functions, bias vectors, and batched Sparse GEMM.

Today, NVIDIA is announcing the availability of cuSPARSELt version 0.2.0, which adds support for activation functions, bias vectors, and Batched Sparse GEMM. The software is available now as a free download.

Download the cuSPARSELt software.

What’s New?

Support for activation functions and bias vector:
- ReLU + upper bound and threshold setting for all kernels.
- GeLU for INT8 I/O, INT32 Tensor Core compute kernels.
Support for Batched Sparse GEMM:
- Single sparse matrix / Multiple dense matrices (Broadcast).
- Multiple sparse and dense matrices.
- Batched bias vector.
Compatibility notes:
- cuSPARSELt does not require the nvrtc library anymore.
- Support for Ubuntu 16.04 (gcc-5) is now deprecated and will be removed in a future release.
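To make the new epilogue options concrete, here is a small CPU reference model in plain Python. It is not the cuSPARSELt API, and its semantics are assumptions: the bias vector holds one value per row of the output, ReLU maps inputs at or below the threshold to zero and then clamps the result to the upper bound, and GeLU uses its exact erf-based definition (the library's kernels may use an approximation). The names `relu`, `gelu`, and `epilogue` are hypothetical.

```python
import math


def relu(x, upper_bound=math.inf, threshold=0.0):
    """ReLU with an optional upper bound and threshold (assumed semantics)."""
    # Assumption: values at or below `threshold` become 0, and the
    # remaining values are clamped to `upper_bound`.
    if x <= threshold:
        return 0.0
    return min(x, upper_bound)


def gelu(x):
    """Exact erf-based GeLU; the library may use an approximation instead."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def epilogue(acc, c, bias, alpha=1.0, beta=0.0, activation=None):
    """Compute act(alpha * acc + beta * C + bias) for an M x N result.

    acc  -- M x N list of lists holding op(A) * op(B)
    c    -- M x N list of lists
    bias -- length-M list; bias[i] is added to every element of row i (assumption)
    """
    act = activation if activation is not None else (lambda v: v)
    return [
        [act(alpha * acc[i][j] + beta * c[i][j] + bias[i]) for j in range(len(acc[i]))]
        for i in range(len(acc))
    ]


# Usage: per-row bias followed by a ReLU capped at 6.
acc = [[-2.0, 3.0], [5.0, 10.0]]
c = [[0.0, 0.0], [0.0, 0.0]]
d = epilogue(acc, c, bias=[1.0, -1.0], activation=lambda v: relu(v, upper_bound=6.0))
print(d)  # [[0.0, 4.0], [4.0, 6.0]]
```

With the default arguments, `relu` reduces to the standard ReLU; a finite `upper_bound` gives a ReLU6-style clamp.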

For more technical information, see the cuSPARSELt Release Notes.

About cuSPARSELt

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

$D = \alpha \, op(A) \cdot op(B) + \beta \, op(C)$

In this equation, $op(A)$ and $op(B)$ refer to in-place operations such as transpose and nontranspose.
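As a reference for the semantics only, the following plain-Python sketch evaluates the equation densely, with `op()` being either the identity or a transpose, and applies it independently to each entry of a batch. A one-entry `a_batch` is reused for every batch entry, mirroring the single sparse matrix / multiple dense matrices (broadcast) mode. The sketch ignores sparsity, data types, and the actual library API; all function names are hypothetical.

```python
def op(m, transpose):
    """op(): the identity or the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)] if transpose else m


def matmul(a, b):
    """Naive product of an M x K matrix and a K x N matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gemm(alpha, a, b, beta, c, trans_a=False, trans_b=False, trans_c=False):
    """D = alpha * op(A) * op(B) + beta * op(C), evaluated densely."""
    ab = matmul(op(a, trans_a), op(b, trans_b))
    cc = op(c, trans_c)
    return [[alpha * ab[i][j] + beta * cc[i][j] for j in range(len(ab[0]))]
            for i in range(len(ab))]


def batched_gemm(alpha, a_batch, b_batch, beta, c_batch, **ops):
    """Apply gemm() to every batch entry; a one-entry a_batch is broadcast."""
    return [gemm(alpha, a_batch[0] if len(a_batch) == 1 else a_batch[k],
                 b_batch[k], beta, c_batch[k], **ops)
            for k in range(len(b_batch))]


a = [[1, 0], [0, 2]]                      # one shared operand A
bs = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # two dense B matrices
cs = [[[0, 0], [0, 0]]] * 2                # C is read-only here
print(batched_gemm(1, [a], bs, 0, cs))
# [[[1, 2], [6, 8]], [[5, 6], [14, 16]]]
```

Passing one matrix per batch entry in `a_batch` instead corresponds to the multiple sparse and dense matrices mode.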

The cuSPARSELt APIs provide flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Key Features

NVIDIA Sparse MMA Tensor Core support.
Mixed-precision computation support:
- FP16 I/O, FP32 Tensor Core accumulate.
- BFLOAT16 I/O, FP32 Tensor Core accumulate.
- INT8 I/O, INT32 Tensor Core compute.
- FP32 I/O, TF32 Tensor Core compute.
- TF32 I/O, TF32 Tensor Core compute.

Matrix pruning and compression functionalities.
Auto-tuning functionality (see cusparseLtMatmulSearch()).
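The Sparse Tensor Cores in the NVIDIA Ampere architecture operate on 2:4 structured sparsity: at most two non-zero values in every group of four consecutive elements. The sketch below illustrates the idea behind pruning and compression using a simple magnitude-based rule and a per-group position list. It is an illustration under assumptions only: cuSPARSELt provides its own pruning modes and a hardware-specific compressed layout, and the functions here (`prune_2_4`, `compress_2_4`, `decompress_2_4`) are hypothetical, not part of the library.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in each group of four."""
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    out = list(row)
    for g in range(0, len(row), 4):
        group = range(g, g + 4)
        # Keep the two largest-magnitude entries (ties keep the earlier index).
        keep = sorted(group, key=lambda i: abs(row[i]), reverse=True)[:2]
        for i in group:
            if i not in keep:
                out[i] = 0.0
    return out


def compress_2_4(row):
    """Store two values per group of four plus their in-group positions (0-3)."""
    values, indices = [], []
    for g in range(0, len(row), 4):
        pos = [i for i in range(4) if row[g + i] != 0]
        assert len(pos) <= 2, "row is not 2:4 sparse"
        # Pad with explicit zeros so every group stores exactly two entries.
        pos += [i for i in range(4) if i not in pos][: 2 - len(pos)]
        pos.sort()
        values.extend(row[g + i] for i in pos)
        indices.extend(pos)
    return values, indices


def decompress_2_4(values, indices, n):
    """Rebuild the dense length-n row from compress_2_4() output."""
    row = [0.0] * n
    for k, (v, i) in enumerate(zip(values, indices)):
        row[(k // 2) * 4 + i] = v  # entry k belongs to group k // 2
    return row


dense = [0.1, -0.9, 0.4, 0.2, 3.0, 0.0, -1.0, 0.5]
sparse = prune_2_4(dense)
values, indices = compress_2_4(sparse)
print(sparse)   # [0.0, -0.9, 0.4, 0.0, 3.0, 0.0, -1.0, 0.0]
print(values)   # [-0.9, 0.4, 3.0, -1.0]
print(indices)  # [1, 2, 0, 2]
```

Only half of the values are kept, and each kept value needs just 2 bits to record its position within its group of four, which is why the compressed operand is roughly half the size of the dense one.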

Learn more

For more about Math Libraries, see Recent Developments in NVIDIA Math Libraries (GTC 2021 #S31754).
To get the latest on HPC software, see A Deep Dive into the latest HPC software (GTC 2021 #S31286).
Catch up on Tensor Core-Accelerated Math Libraries for Dense and Sparse Linear Algebra in AI and HPC (GTC 2021 #CWES1098).
Read technical details in our cuSPARSELt Product Documentation.

Recent Developer posts

For advanced matrix multiply techniques, read Accelerating Matrix Multiplication with Block Sparse Format and NVIDIA Tensor Cores.
To leverage NVIDIA Ampere architecture performance, read Exploiting NVIDIA Ampere Structured Sparsity with cuSPARSELt.
To benefit from A100 acceleration, read Getting Immediate Speedups with NVIDIA A100 TF32.
To gain AI training benefits, see Accelerating AI Training with NVIDIA TF32 Tensor Cores.
