15 September 2026 16:00 - 16:30
Squeezing the hardware: Systems efficiency for LLM training and inference
Systems efficiency is what ultimately determines whether a model ships on time and on budget.
This talk gives practitioners a concrete toolkit to answer a critical question: is your hardware actually working? We walk through a diagnostic-first workflow that combines metrics such as GPU memory utilization and SM occupancy with tools such as PyTorch Profiler and Nsight Systems to identify where compute time is lost.
We will dissect common production bottlenecks, starting with data starvation in multimodal training pipelines, and introduce SPDL, a new open-source thread-based data loader designed to solve it. The session concludes with strategies for identifying and optimizing other core training and inference bottlenecks.
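As a rough illustration of data starvation, the sketch below measures what share of a training loop's wall time is spent blocked on the next batch, first with a serial loader and then with a simple thread-based prefetcher. It uses only the Python standard library and is not SPDL's API. The names `slow_batches`, `prefetch`, and `data_wait_fraction`, and the sleep durations standing in for loading and compute, are all assumptions made for the example.

```python
import queue
import threading
import time


def slow_batches(n, load_s):
    """Hypothetical data source: each batch takes load_s seconds to produce."""
    for i in range(n):
        time.sleep(load_s)  # stands in for I/O-bound work such as fetch and decode
        yield i


def prefetch(batches, depth=4):
    """Run the source in a background thread so loading overlaps with compute."""
    q = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking the end of the stream

    def worker():
        for b in batches:
            q.put(b)
        q.put(done)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is done:
            return
        yield item


def data_wait_fraction(batches, compute_s):
    """Return the share of wall time the loop spends blocked waiting for data."""
    wait = 0.0
    start = time.perf_counter()
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        wait += time.perf_counter() - t0
        time.sleep(compute_s)  # stands in for the forward/backward pass
    total = time.perf_counter() - start
    return wait / total


# Loading (10 ms) is cheaper than compute (20 ms), so it can be hidden entirely.
serial = data_wait_fraction(slow_batches(20, 0.01), compute_s=0.02)
overlapped = data_wait_fraction(prefetch(slow_batches(20, 0.01)), compute_s=0.02)
print(f"wait fraction, serial loader:      {serial:.2f}")
print(f"wait fraction, prefetching loader: {overlapped:.2f}")
```

In this toy setup the prefetching loop should wait far less than the serial one, because loading overlaps with the simulated compute step. Threads help here because `time.sleep`, like most I/O, releases the GIL; CPU-bound preprocessing written in pure Python would not overlap this cleanly.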