NVIDIA cuTENSOR version 1.4 supports up to 64-dimensional tensors and distributed multi-GPU tensor operations, and improves the tensor contraction performance model.
Today, NVIDIA is announcing the availability of cuTENSOR version 1.4, which supports up to 64-dimensional tensors and distributed multi-GPU tensor operations, and improves the tensor contraction performance model. The software is available to download now, free of charge.
- Supports up to 64-dimensional tensors.
- Supports distributed, multi-GPU tensor operations.
- Improved tensor contraction performance model (i.e.,
- Improved performance for tensor contractions that have a large overall contracted dimension (i.e., a parallel reduction was added).
- Improved performance for tensor contractions that have a tiny contracted dimension (
- Improved performance for outer-product-like tensor contractions (e.g., C[a,b,c,d] = A[b,d] * B[a,c]).
- Additional bug fixes.
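To make the outer-product-like case in the list above concrete, here is a minimal pure-Python reference for C[a,b,c,d] = A[b,d] * B[a,c]. It only illustrates what the expression computes and does not use the cuTENSOR API. The function name `outer_product_contraction` and the nested-list layout are assumptions made for this sketch.

```python
from itertools import product


def outer_product_contraction(A, B):
    """Reference semantics for C[a,b,c,d] = A[b,d] * B[a,c].

    A is indexed as A[b][d] and B as B[a][c] (nested lists).
    No mode is shared between A and B, so nothing is summed over:
    every element of C is a single product.
    """
    nb, nd = len(A), len(A[0])
    na, nc = len(B), len(B[0])
    # Allocate C so that it is indexed as C[a][b][c][d].
    C = [[[[0] * nd for _ in range(nc)] for _ in range(nb)] for _ in range(na)]
    for a, b, c, d in product(range(na), range(nb), range(nc), range(nd)):
        C[a][b][c][d] = A[b][d] * B[a][c]
    return C


A = [[1, 2, 3], [4, 5, 6]]  # modes (b, d), extents 2 x 3
B = [[10, 20], [30, 40]]    # modes (a, c), extents 2 x 2
C = outer_product_contraction(A, B)
print(C[1][0][1][2])        # A[0][2] * B[1][1]
```

Because A and B share no mode, the contracted dimension is empty and the output has as many modes as both inputs combined, which is what makes this case resemble an outer product rather than a matrix-multiply-like contraction.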
For more information, see the cuTENSOR Release Notes.
cuTENSOR is a high-performance CUDA library for tensor primitives; its key features include:
- Extensive mixed-precision support:
- Complex-times-real operations.
- Conjugate (without transpose) support.
- Support for up to 64-dimensional tensors.
- Support for arbitrary data layouts.
- Trivially serializable data structures.
- Enhancements to main computational routines:
- On math libraries, see Recent Developments in NVIDIA Math Libraries (GTC #S31754).
- For the latest on HPC software, see A Deep Dive into the latest HPC software (GTC #S31286).
- Catch up on Tensor Core-Accelerated Math Libraries for Dense and Sparse Linear Algebra in AI and HPC (GTC #CWES1098).
- Read technical details in our cuTENSOR Product Documentation.
Recent Developer posts
- On Fortran enhancements to support Tensor Cores, read Bringing Tensor Cores to Standard Fortran.
- Benefit from A100 acceleration and read Getting Immediate Speedups with NVIDIA A100 TF32.
- To gain AI training benefits, see Accelerating AI Training with NVIDIA TF32 Tensor Cores.