DataBloom - Part 11

Misc

Delivering the Missing Building Blocks for NVIDIA CUDA Kernel Fusion in Python

Post date July 9, 2025

C++ libraries like CUB and Thrust provide high-level building blocks that enable NVIDIA CUDA application and library developers to write speed-of-light code that is portable across architectures. Many widely used projects, such as PyTorch, TensorFlow, XGBoost, and RAPIDS, use these abstractions to implement core functionality. The same abstractions are missing in Python. There are high-level…


Misc

Creating custom kernels for the AMD MI300

Post date July 9, 2025

Misc

Reachy Mini – The Open-Source Robot for AI Builders

Post date July 9, 2025

Misc

Upskill your LLMs with Gradio MCP Servers

Post date July 9, 2025

Misc

New Learning Pathway: Deploy AI Models with NVIDIA NIM on GKE

Post date July 8, 2025

Get hands-on with Google Kubernetes Engine (GKE) and NVIDIA NIM when you join the new Google Cloud and NVIDIA community.


Misc

SmolLM3: smol, multilingual, long-context reasoner

Post date July 8, 2025

Misc

Asking an Encyclopedia-Sized Question: How To Make the World Smarter with Multi-Million Token Real-Time Inference

Post date July 8, 2025

Helix Parallelism, introduced in this blog, enables up to a 32x increase in the number of concurrent users at a given latency, compared to the best known prior parallelism methods for real-time decoding with ultra-long context.

Misc

Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure

Post date July 8, 2025

Misc

Efficient MultiModal Data Pipeline

Post date July 8, 2025

Misc

Asking an Encyclopedia-Sized Question: How To Make the World Smarter with Multi-Million Token Real-Time Inference

Post date July 7, 2025

Modern AI applications increasingly rely on models that combine huge parameter counts with multi-million-token context windows. Whether it is AI agents following months of conversation, legal assistants reasoning through gigabytes of case law as big as an entire encyclopedia set, or coding copilots navigating sprawling repositories, preserving long-range context is essential for relevance and…
