Categories
Misc

Per-Tensor and Per-Block Scaling Strategies for Effective FP8 Training

Decorative image.In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the…Decorative image.

In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the Blackwell-backed MXFP8 format)—and explain why each is essential for maintaining numerical stability and accuracy during low-precision training. Understanding these approaches will help with choosing the right recipe for your own FP8 workflows.

Source

Categories
Misc

How to Build Custom AI Agents with NVIDIA NeMo Agent Toolkit Open Source Library

AI agents are revolutionizing the digital workforce by transforming business operations, automating complex tasks, and unlocking new efficiencies. With the…

AI agents are revolutionizing the digital workforce by transforming business operations, automating complex tasks, and unlocking new efficiencies. With the ability to collaborate, these agents can now work together to tackle complex problems and drive even greater impact. The NVIDIA NeMo Agent toolkit is an open source library that simplifies the integration of agents…

Source

Categories
Misc

How AI Factories Can Help Relieve Grid Stress

In many parts of the world, including major technology hubs in the U.S., there’s a yearslong wait for AI factories to come online, pending the buildout of new energy infrastructure to power them. Emerald AI, a startup based in Washington, D.C., is developing an AI solution that could enable the next generation of data centers
Read Article

Categories
Misc

Training and Finetuning Sparse Embedding Models with Sentence Transformers v5

Categories
Misc

NVIDIA NeMo Retriever Scores First Place for Visual Retrieval

NeMo Retriever tops several visual document retrieval leaderboards, setting new standards for RAG apps.

NeMo Retriever tops several visual document retrieval leaderboards, setting new standards for RAG apps.

Source

Categories
Misc

Best-in-Class Multimodal RAG: How the Llama 3.2 NeMo Retriever Embedding Model Boosts Pipeline Accuracy

Data goes far beyond text—it is inherently multimodal, encompassing images, video, audio, and more, often in complex and unstructured formats. While the…

Data goes far beyond text—it is inherently multimodal, encompassing images, video, audio, and more, often in complex and unstructured formats. While the common method is to convert PDFs, scanned images, slides, and other documents into text, it is challenging to capture all information in text format, as shown in Figure 1. The loss of visual information in text motivated the development of…

Source

Categories
Misc

Run Google DeepMind’s Gemma 3n on NVIDIA Jetson and RTX

NVIDIA now supports the general availability of Gemma 3n on NVIDIA RTX and Jetson. Gemma, previewed by Google DeepMind at Google I/O last month, includes two new models optimized for multi-modal on-device deployment. Gemma now includes audio in addition to the text and vision capabilities introduced in version 3.5. Each component integrates trusted research models: Universal Speech Model for audio, MobileNet v4 for vision, and MatFormer for text.
Read Article

Categories
Misc

Introducing NVFP4 for Efficient and Accurate Low-Precision Inference

To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques—such as quantization, distillation, and pruning—typically come to mind. The most common of the three, without a doubt, is quantization. This is typically due to its post-optimization task-specific accuracy performance and broad choice of supported frameworks and techniques.
Read Article

Categories
Misc

Just Released: NVIDIA PhysicsNeMo v25.06

New functionality to curate and train DoMINO at scale and validate against a physics-based benchmark suite.

New functionality to curate and train DoMINO at scale and validate against a physics-based benchmark suite.

Source

Categories
Misc

How to Work with Data Exceeding VRAM in the Polars GPU Engine

In high-stakes fields such as quant finance, algorithmic trading, and fraud detection, data practitioners frequently need to process hundreds of gigabytes (GB)…

In high-stakes fields such as quant finance, algorithmic trading, and fraud detection, data practitioners frequently need to process hundreds of gigabytes (GB) of data to make quick, informed decisions. Polars, one of the fastest-growing data processing libraries, meets this need with a GPU engine powered by NVIDIA cuDF that accelerates compute-bound queries that are common in these fields.

Source