CUTLASS 3.8 extends support to the NVIDIA Blackwell SM100 architecture, reaching 99% of peak performance for Tensor Core operations. It brings essential features such as Mixed Input GEMMs for efficient model quantization and Grouped GEMM, which accelerates Mixture-of-Experts (MoE) models by computing the experts in parallel.
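
To make the Grouped GEMM idea concrete, here is a minimal reference sketch in plain Python. It shows only the semantics (many independent matrix products of differing shapes, one per expert) and does not call CUTLASS. The function names `matmul`, `grouped_gemm`, and `moe_forward` are hypothetical and chosen for illustration, and the top-1 routing is an assumption rather than something the release describes.

```python
# Reference semantics only: plain Python, no CUTLASS or GPU calls.
# All function names here are hypothetical, chosen for illustration.

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both nested lists."""
    k, n = len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

def grouped_gemm(problems):
    """Compute C_i = A_i @ B_i for each (A_i, B_i) pair.

    Each problem may have a different shape. A GPU grouped GEMM would
    handle all of them in a single launch; this sketch simply loops.
    """
    return [matmul(a, b) for a, b in problems]

def moe_forward(tokens, expert_ids, expert_weights):
    """Route each token to one expert (top-1 routing assumed) and apply it."""
    # Collect the token indices assigned to each expert.
    groups = {}
    for idx, expert in enumerate(expert_ids):
        groups.setdefault(expert, []).append(idx)

    # One GEMM problem per expert; the row count varies with the routing.
    order = sorted(groups)
    problems = [([tokens[i] for i in groups[e]], expert_weights[e])
                for e in order]
    results = grouped_gemm(problems)

    # Scatter the per-expert outputs back to the original token order.
    out = [None] * len(tokens)
    for expert, rows in zip(order, results):
        for row, i in zip(rows, groups[expert]):
            out[i] = row
    return out

tokens = [[1, 2], [3, 4], [5, 6]]
expert_ids = [1, 0, 1]                  # token -> expert assignment
expert_weights = [
    [[1, 0], [0, 1]],                   # expert 0: identity
    [[2, 0], [0, 2]],                   # expert 1: scale by 2
]
outputs = moe_forward(tokens, expert_ids, expert_weights)
```

The point of the sketch is that the number of rows per expert depends on the routing, so the individual problems rarely share a shape. That is why a grouped formulation suits MoE layers better than a fixed-shape batched GEMM.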