Categories
Misc

Finding the Best Chunking Strategy for Accurate AI Responses

A chunking strategy is the method of breaking down large documents into smaller, manageable pieces for AI retrieval, and it determines how effectively relevant information is fetched for accurate AI responses. Poor chunking leads to irrelevant results, inefficiency, and reduced business value. With so many options available (page-level, section-level, or token-based chunking with various sizes), how do…

How Early Access to NVIDIA GB200 Systems Helped LMArena Build a Model to Evaluate LLMs

LMArena at the University of California, Berkeley is making it easier to see which large language models excel at specific tasks, thanks to help from NVIDIA and Nebius. Its rankings, powered by the Prompt-to-Leaderboard (P2L) model, are built from human votes on which AI performs best in areas such as math, coding, or creative writing. “We capture user preferences across tasks and apply…

Compiler Explorer: The Kernel Playground for CUDA Developers

Have you ever wondered exactly what the CUDA compiler generates when you write GPU kernels? Ever wanted to share a minimal CUDA example with a colleague effortlessly, without the need for them to install a specific CUDA toolkit version first? Or perhaps you’re completely new to CUDA and looking for an easy way to start without needing to install anything or even having a GPU on hand?

Improved Performance and Monitoring Capabilities with NVIDIA Collective Communications Library 2.26

The NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multinode communication primitives optimized for NVIDIA GPUs and networking. NCCL is a central piece of software for multi-GPU deep learning training. It handles any kind of inter-GPU communication, be it over PCI, NVIDIA NVLink, or networking. It uses advanced topology detection, optimized communication graphs…

AI in Manufacturing and Operations at NVIDIA: Accelerating ML Models with NVIDIA CUDA-X Data Science

NVIDIA leverages data science and machine learning to optimize chip manufacturing and operations workflows—from wafer fabrication and circuit probing to packaged chip testing. These stages generate terabytes of data, and turning that data into actionable insights at speed and scale is critical to ensuring quality, throughput, and cost efficiency. Over the years, we’ve developed robust ML pipelines…

Benchmarking LLM Inference Costs for Smarter Scaling and Deployment

This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to determine the cost of LLM inference by estimating the total cost of ownership (TCO). See LLM Inference Benchmarking: Fundamental Concepts for background knowledge on common metrics for benchmarking and parameters. See LLM Inference Benchmarking Guide: NVIDIA…

Plug and Play: Build a G-Assist Plug-In Today

Project G-Assist — available through the NVIDIA App — is an experimental AI assistant that helps tune, control and optimize NVIDIA GeForce RTX systems. NVIDIA’s Plug and Play: Project G-Assist Plug-In Hackathon — running virtually through Wednesday, July 16 — invites the community to explore AI and build custom G-Assist plug-ins for a chance to

Fine-Tuning LLMOps for Rapid Model Evaluation and Ongoing Optimization

Large language models (LLMs) have created unprecedented opportunities across various industries. However, moving LLMs from research and development into reliable, scalable, and maintainable production systems presents unique operational challenges. LLMOps, or large language model operations, are designed to address these challenges. Building upon the principles of traditional machine…

Power Real-Time AI Media Effects with New AI Reference Apps on NVIDIA Holoscan for Media

Live media workflows are increasingly using AI microservices to augment production capabilities. However, advanced AI models are mostly hosted in the cloud, making it challenging to process high-bitrate, uncompressed media streams due to constraints around network latency, bandwidth, and real-time scalability. NVIDIA released new AI reference applications that facilitate the ease of AI…

R²D²: Building AI-based 3D Robot Perception and Mapping with NVIDIA Research

Robots must perceive and interpret their 3D environments to act safely and effectively. This is especially critical for tasks such as autonomous navigation, object manipulation, and teleoperation in unstructured or unfamiliar spaces. Advances in robotic perception increasingly focus on integrating 3D scene understanding, generalizable object tracking, and persistent spatial memory—using robust…
