To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques (such as quantization, distillation, and pruning) typically come to mind. Quantization is by far the most common of the three, largely because of its task-specific accuracy after optimization and the broad choice of supported frameworks and techniques.
NVIDIA now supports the general availability of Gemma 3n on NVIDIA RTX and Jetson. Gemma, previewed by Google DeepMind at Google I/O last month, includes two new models optimized for multi-modal on-device deployment. Gemma now includes audio in addition to the text and vision capabilities introduced in version 3.5. Each component integrates trusted research models: Universal Speech Model for audio, MobileNet v4 for vision, and MatFormer for text.
Just Released: NVIDIA PhysicsNeMo v25.06
New functionality to curate and train DoMINO at scale and validate against a physics-based benchmark suite.
In high-stakes fields such as quant finance, algorithmic trading, and fraud detection, data practitioners frequently need to process hundreds of gigabytes (GB) of data to make quick, informed decisions. Polars, one of the fastest-growing data processing libraries, meets this need with a GPU engine powered by NVIDIA cuDF, which accelerates the compute-bound queries common in these fields.
Researchers have developed an AI-powered tool that can analyze nurses’ shift notes to identify, far earlier than traditional methods, when an admitted patient’s health may be deteriorating or on the cusp of “crashing.” In early trials, the AI tool, dubbed the CONCERN Early Warning System (CONCERN EWS), helped lower a patient’s risk of death by more than 35% while reducing the average hospital…
Customizing embedding models is crucial for effective information retrieval, especially when working with domain-specific data like legal text, medical records, or multi-turn customer conversations. Generic, open-domain models often struggle to capture the nuances and structure of such specialized content. Coxwave Align, an analytics platform for conversational-AI products…
Simulated driving environments enable engineers to safely and efficiently train, test and validate autonomous vehicles (AVs) across countless real-world and edge-case scenarios without the risks and costs of physical testing.
Mark Theriault founded the startup FITY envisioning a line of clever cooling products: cold drink holders that come with freezable pucks to keep beverages cold for longer without the mess of ice. The entrepreneur started with 3D prints of products in his basement, building one unit at a time, before eventually scaling to mass production.