Month: December 2024

Misc

Bamba: Inference-Efficient Hybrid Mamba2 Model

Post date December 18, 2024

Misc

NVIDIA TensorRT-LLM Now Supports Recurrent Drafting for Optimizing LLM Inference

Post date December 18, 2024

Recurrent drafting (referred to as ReDrafter) is a novel speculative decoding technique for large language model (LLM) inference, developed and open-sourced by Apple and now available with NVIDIA TensorRT-LLM. ReDrafter helps developers significantly boost LLM workload performance on NVIDIA GPUs. NVIDIA TensorRT-LLM is a library for optimizing LLM inference. It provides an easy-to-use Python API to define…
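
As a rough illustration of the verification step shared by speculative decoding methods, here is a minimal Python sketch of greedy acceptance: a cheap draft model proposes several tokens, the target model scores all of them in one forward pass, and the longest prefix on which both agree is kept. This is the generic scheme, not ReDrafter's recurrent drafter with beam search, and the function name `verify_draft` and its inputs are hypothetical rather than part of the TensorRT-LLM API.

```python
from typing import List, Sequence


def verify_draft(draft: Sequence[int], target_preds: Sequence[int]) -> List[int]:
    """Greedy speculative-decoding verification (generic sketch, not ReDrafter).

    draft: k token ids proposed by a cheap draft model.
    target_preds: k + 1 token ids; target_preds[i] is the target model's greedy
        (argmax) choice given the prompt plus draft[:i]. All k + 1 come from a
        single batched forward pass of the target model.
    Returns the tokens committed this step (always at least one).
    """
    if len(target_preds) != len(draft) + 1:
        raise ValueError("target_preds must have exactly len(draft) + 1 entries")
    accepted: List[int] = []
    for proposed, expected in zip(draft, target_preds):
        if proposed != expected:
            # First disagreement: commit the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposed)
    # Every draft token matched, so the target's extra prediction is a free bonus token.
    accepted.append(target_preds[-1])
    return accepted


print(verify_draft([5, 9, 2], [5, 9, 7, 4]))  # [5, 9, 7]
```

The speedup comes from committing several tokens per target-model pass whenever the draft agrees with the target; with greedy acceptance the output is identical to decoding with the target model alone.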

Misc

AI at Your Service: Digital Avatars With Speech Capabilities Offer Interactive Customer Experiences

Post date December 18, 2024

To enhance productivity and upskill workers, organizations worldwide are seeking ways to provide consistent, around-the-clock customer service with greater speed, accuracy and scale.

Misc

Imbue’s Kanjun Qiu Shares Insights on How to Build Smarter AI Agents

Post date December 18, 2024

Imagine a future in which everyone is empowered to build and use their own AI agents. That future may not be far off, as new software is infused with intelligence through collaborative AI systems that work alongside users rather than merely automating tasks. In this episode of the NVIDIA AI Podcast, Kanjun Qiu, CEO of

Misc

NVIDIA Awards up to $60,000 Research Fellowships to PhD Students

Post date December 18, 2024

For more than two decades, the NVIDIA Graduate Fellowship Program has supported graduate students doing outstanding work relevant to NVIDIA technologies. Today, the program announced the latest awards of up to $60,000 each to 10 Ph.D. students involved in research that spans all areas of computing innovation. Selected from a highly competitive applicant pool, the

Misc

Data-Efficient Knowledge Distillation for Supervised Fine-Tuning with NVIDIA NeMo-Aligner

Post date December 17, 2024

Knowledge distillation is an approach for transferring the knowledge of a much larger teacher model to a smaller student model, ideally yielding a compact, easily deployable student with comparable accuracy to the teacher. Knowledge distillation has gained popularity in pretraining settings, but there are fewer resources available for performing knowledge distillation during supervised fine-tuning…
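
A minimal sketch of the classic soft-target distillation loss for a single token position, assuming plain Python lists of logits: the student is trained to match the teacher's temperature-softened distribution by minimizing the KL divergence between the two. This is the generic formulation, not necessarily the objective NeMo-Aligner implements, and the helper names `softmax` and `distillation_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) over softened distributions for one token position.

    Multiplying by temperature**2 is a common convention that keeps gradient
    magnitudes roughly independent of the chosen temperature.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2


print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```

In a fine-tuning loop this per-position loss would typically be averaged over the response tokens and often combined with the usual cross-entropy against the ground-truth labels.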

Misc

Efficient Ray Tracing with NVIDIA OptiX Shader Binding Table Optimization

Post date December 17, 2024

NVIDIA OptiX is the API for GPU-accelerated ray tracing with CUDA and is often used to render scenes containing a wide variety of objects and materials. During an OptiX launch, when a ray intersects a geometric primitive, a hit shader is executed. The Shader Binding Table (SBT) determines which shader is executed for a given intersection. The SBT may also be used to map input…
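
To make the SBT lookup concrete, the sketch below computes which hit-group record is selected for an intersection, following the indexing rule as we understand it from the OptiX Programming Guide: the instance's SBT offset, plus the geometry's SBT index multiplied by the stride passed at trace time, plus the offset passed at trace time. The Python function names are hypothetical; in real OptiX code the inputs correspond roughly to `OptixInstance::sbtOffset`, the build input's SBT index, and the `SBTstride` and `SBToffset` arguments of `optixTrace`, while the base address and record stride come from the `hitgroupRecordBase` and `hitgroupRecordStrideInBytes` fields of `OptixShaderBindingTable`.

```python
def hitgroup_sbt_index(instance_sbt_offset: int, gas_sbt_index: int,
                       trace_sbt_stride: int, trace_sbt_offset: int) -> int:
    """Index of the hit-group record chosen for an intersection (sketch).

    instance_sbt_offset: per-instance offset set on the instance in the IAS.
    gas_sbt_index: SBT index of the intersected build input in the GAS.
    trace_sbt_stride: stride passed at trace time, usually the number of ray types.
    trace_sbt_offset: offset passed at trace time, usually the ray-type index.
    """
    return instance_sbt_offset + gas_sbt_index * trace_sbt_stride + trace_sbt_offset


def hitgroup_record_address(record_base: int, record_stride_bytes: int,
                            sbt_index: int) -> int:
    """Device address of the selected record, given the table's base and stride."""
    return record_base + sbt_index * record_stride_bytes


# Two ray types: radiance (offset 0) and shadow (offset 1), so the stride is 2.
# A shadow ray hits the second build input (index 1) of an instance whose records start at 4.
idx = hitgroup_sbt_index(instance_sbt_offset=4, gas_sbt_index=1,
                         trace_sbt_stride=2, trace_sbt_offset=1)
print(idx, hex(hitgroup_record_address(0x1000, 64, idx)))  # 7 0x11c0
```

Interleaving records per ray type like this is one common layout; the formula itself does not require it, which is what leaves room for the kind of SBT optimization the post describes.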

Misc

Fine-Tuning Small Language Models to Optimize Code Review Accuracy

Post date December 17, 2024

Generative AI is transforming enterprises by driving innovation and boosting efficiency across numerous applications. However, adopting large foundational…

Misc

Deploy Agents, Assistants, and Avatars on NVIDIA RTX AI PCs with New Small Language Models

Post date December 17, 2024

NVIDIA just announced a series of small language models (SLMs) that increase the amount and type of information digital humans can use to augment their responses. This includes new large-context models that provide more relevant answers and new multi-modal models that allow images as inputs. These models are available now as part of NVIDIA ACE, a suite of digital human technologies that brings…

Misc

Boost Llama 3.3 70B Inference Throughput 3x with NVIDIA TensorRT-LLM Speculative Decoding

Post date December 17, 2024

Meta’s Llama collection of open large language models (LLMs) continues to grow with the recent addition of Llama 3.3 70B, a text-only instruction-tuned model. Llama 3.3 provides enhanced performance relative to the older Llama 3.1 70B model and can even match the capabilities of the larger, more computationally expensive Llama 3.1 405B model on several tasks including math, reasoning, coding…
