Black Forest Labs, one of the world’s leading AI research labs, just changed the game for image generation. The lab’s FLUX.1 image models have earned global attention for delivering high-quality visuals with exceptional prompt adherence. Now, with its new FLUX.1 Kontext model, the lab is fundamentally changing how users can guide and refine the image…
FLUX.1 Kontext, the recently released model from Black Forest Labs, is a fascinating addition to the repertoire of community image generation models. The open weights FLUX.1 Kontext [dev] variant, the focus of this post, is a model meticulously optimized for image-to-image transformation tasks. This pioneering tool stands out for its incremental image editing capabilities…
In this blog post, we’ll break down the main FP8 scaling strategies—per-tensor scaling, delayed and current scaling, and per-block scaling (including the Blackwell-backed MXFP8 format)—and explain why each is essential for maintaining numerical stability and accuracy during low-precision training. Understanding these approaches will help with choosing the right recipe for your own FP8 workflows.
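As a rough illustration of the strategies named above, the sketch below computes FP8 scaling factors in plain Python. It is a minimal sketch under stated assumptions, not the method from the post: the helper names (`current_scale`, `delayed_scale`, `per_block_scales`) are hypothetical, the E4M3 maximum of 448 is used as the target range, and the block size of 32 is an illustrative default.

```python
import math

# Largest finite magnitude representable in the FP8 E4M3 format.
FP8_E4M3_MAX = 448.0


def amax(values):
    """Largest absolute value in a sequence."""
    return max(abs(v) for v in values)


def scale_from_amax(a):
    """Scale that maps an absolute maximum `a` onto the FP8 range."""
    return FP8_E4M3_MAX / a if a > 0 else 1.0


def current_scale(values):
    """Per-tensor current scaling: derive the scale from the tensor itself."""
    return scale_from_amax(amax(values))


def delayed_scale(amax_history):
    """Per-tensor delayed scaling: derive the scale from amax values
    recorded in earlier iterations (hypothetical history list)."""
    return scale_from_amax(max(amax_history))


def per_block_scales(values, block_size=32):
    """Per-block scaling: one scale per contiguous block of values."""
    return [
        current_scale(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


# One block of tiny values followed by one block of large values.
tensor = [0.001] * 32 + [100.0] * 32

print("per-tensor scale:", current_scale(tensor))
print("per-block scales:", per_block_scales(tensor))
```

With a single per-tensor scale, the large values dictate the factor, so the tiny values may be left with very little precision after scaling. Per-block scaling gives each block its own factor, which is the motivation for finer-grained recipes.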
AI agents are revolutionizing the digital workforce by transforming business operations, automating complex tasks, and unlocking new efficiencies. With the ability to collaborate, these agents can now work together to tackle complex problems and drive even greater impact. The NVIDIA NeMo Agent toolkit is an open source library that simplifies the integration of agents…
In many parts of the world, including major technology hubs in the U.S., there’s a yearslong wait for AI factories to come online, pending the buildout of new energy infrastructure to power them. Emerald AI, a startup based in Washington, D.C., is developing an AI solution that could enable the next generation of data centers…
NeMo Retriever tops several visual document retrieval leaderboards, setting new standards for RAG apps.
Data goes far beyond text—it is inherently multimodal, encompassing images, video, audio, and more, often in complex and unstructured formats. While the common method is to convert PDFs, scanned images, slides, and other documents into text, it is challenging to capture all information in text format, as shown in Figure 1. The loss of visual information in text motivated the development of…
To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques such as quantization, distillation, and pruning typically come to mind. The most common of the three, without a doubt, is quantization, largely because it tends to preserve task-specific accuracy after optimization and is supported by a broad choice of frameworks and techniques.
NVIDIA now supports the general availability of Gemma 3n on NVIDIA RTX and Jetson. Gemma 3n, previewed by Google DeepMind at Google I/O last month, includes two new models optimized for multimodal on-device deployment. Gemma 3n now includes audio in addition to the text and vision capabilities introduced in version 3. Each component integrates trusted research models: Universal Speech Model for audio, MobileNet v4 for vision, and MatFormer for text.