NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over…
NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model, the largest and most powerful model available in the Llama 4 collection. This speed was independently measured by the AI benchmarking service…