NVIDIA Dynamo: Distributed LLM Inference

Dynamo is NVIDIA’s open-source, datacenter-scale distributed inference framework for serving generative AI and reasoning models. Built in Rust for performance and Python for extensibility, it supports disaggregated prefill and decode, dynamic GPU scheduling, and LLM-aware request routing across multi-node, multi-GPU topologies. The project has 6k+ GitHub stars and supports backends including TensorRT-LLM, vLLM, and SGLang. ...

October 1, 2025 · 10 min · 2045 words · PeaBrane