
Just Released: CUTLASS 3.8

CUTLASS 3.8 extends support to the NVIDIA Blackwell SM100 architecture, reaching 99% of peak performance for Tensor Core operations. It also brings essential features such as Mixed Input GEMMs for efficient model quantization and Grouped GEMM capabilities that accelerate Mixture-of-Experts (MoE) models through parallel expert computation.
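
For readers new to these two terms, the following is a minimal reference sketch in plain Python of the arithmetic they describe. It is not CUTLASS code and calls no CUTLASS API. The helper names (`matmul`, `dequantize`, `grouped_gemm`), the per-tensor scale, and the two toy "experts" are all hypothetical and chosen only for illustration. A grouped GEMM is treated here as a batch of independent matrix products whose shapes may differ, and a mixed-input GEMM as a product where the weights are stored as integers and scaled back to floating point before multiplying.

```python
# Reference semantics only: plain Python, no GPU, no CUTLASS calls.
# All names below are hypothetical and exist only for this illustration.

def matmul(a, b):
    """Multiply an M x K matrix by a K x N matrix (both lists of rows)."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

def dequantize(q_weights, scale):
    """Mixed-input idea: integer weights become floats via a scale factor.
    A single per-tensor scale is an assumption made for simplicity."""
    return [[q * scale for q in row] for row in q_weights]

def grouped_gemm(problems):
    """Grouped GEMM idea: one call covers many independent GEMMs.
    Each problem is (activations, quantized_weights, scale), and the
    (M, N, K) shape may differ from problem to problem."""
    return [matmul(a, dequantize(qw, s)) for a, qw, s in problems]

# Two toy "experts" with different shapes, as in an MoE layer where
# each expert receives a different number of tokens.
expert0 = ([[1.0, 2.0]], [[2, 0], [0, 2]], 0.5)       # 1x2 times 2x2
expert1 = ([[1.0], [2.0], [3.0]], [[4, 8]], 0.25)     # 3x1 times 1x2

outputs = grouped_gemm([expert0, expert1])
print(outputs)
```

The loop over problems is sequential here. On a GPU the per-expert products are independent, which is what allows them to be computed in parallel.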
