
Breaking Down ML Inference Bottlenecks

Summary

  • Profile a sample inference pipeline (see the profiling sketch after this list)
  • Identify the bottleneck: CPU-bound vs. GPU-bound
  • When CPU-bound: e.g., move data loading onto the GPU with the NeMo dataloader (generic dataloader-tuning sketch below)
  • When GPU-bound: e.g., cut per-batch compute with reduced precision or larger batches (mixed-precision sketch below)
  • Overlap CPU and GPU work: e.g., prefetch so the CPU always has the next batch ready for the GPU (sketch below)
  • Hidden I/O costs of moving data back and forth between CPU and GPU (sketch below)
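
To make the first two steps concrete, here is a minimal profiling sketch using PyTorch's built-in torch.profiler. The ResNet-50 model and batch shape are placeholder assumptions, not the pipeline this post has in mind; sorting the table by CUDA time and comparing against CPU time is usually enough to tell which side dominates.

```python
import torch
import torchvision.models as models
from torch.profiler import profile, ProfilerActivity

# Placeholder pipeline: any model + input batch works here.
model = models.resnet50().eval().cuda()
batch = torch.randn(32, 3, 224, 224).cuda()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        for _ in range(10):
            model(batch)

# Sort by CUDA time to see whether GPU kernels or CPU-side ops dominate.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```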
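I won't reproduce the NeMo dataloader API here. As a generic stand-in for relieving a CPU-side bottleneck, the standard torch.utils.data.DataLoader knobs (parallel workers, pinned memory, prefetch depth) illustrate the same idea; the dataset below is synthetic.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 3, 224, 224))  # placeholder data

# More workers parallelize CPU-side decode/augment; pinned memory enables
# fast asynchronous host-to-device copies; prefetch_factor keeps batches queued.
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=8,
    pin_memory=True,
    prefetch_factor=4,
)
```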
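When the GPU itself is the bottleneck, one common and generic option (the original outline left this example open) is mixed-precision inference, which cuts per-kernel compute and memory traffic. A sketch with torch.autocast, reusing the same placeholder model and input:

```python
import torch
import torchvision.models as models

model = models.resnet50().eval().cuda()      # placeholder model
batch = torch.randn(32, 3, 224, 224).cuda()  # placeholder input

# Run matmuls/convolutions in FP16 where safe; weights stay in FP32.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(batch)
```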
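To keep the CPU a step ahead of the GPU, the usual trick is pinned host memory plus non-blocking copies: the copy and the kernel launch return immediately, so the host loop can fetch batch N+1 while the GPU is still working on batch N. A sketch under those assumptions (run_overlapped is a hypothetical helper name):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(128, 10).cuda().eval()  # placeholder model
loader = DataLoader(TensorDataset(torch.randn(4096, 128)),
                    batch_size=64, num_workers=4, pin_memory=True)

def run_overlapped(model, loader, device="cuda"):
    """With pinned memory, .to(..., non_blocking=True) returns immediately;
    the CPU moves on to prepare the next batch while the GPU is still
    copying and computing the current one."""
    outputs = []
    with torch.no_grad():
        for (cpu_batch,) in loader:
            gpu_batch = cpu_batch.to(device, non_blocking=True)
            outputs.append(model(gpu_batch))  # async launch; host not blocked
    return outputs

outputs = run_overlapped(model, loader)
```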
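Finally, a sketch of the hidden-transfer trap: every .cpu() or .item() on a GPU tensor forces a device synchronization, so calling them inside the loop serializes the CPU and GPU. Names and shapes below are illustrative only.

```python
import torch

model = torch.nn.Linear(128, 10).cuda().eval()        # placeholder model
batches = [torch.randn(64, 128) for _ in range(100)]  # placeholder data

with torch.no_grad():
    # Anti-pattern: .cpu() each iteration = one sync + one transfer per batch.
    slow = [model(b.cuda()).argmax(dim=1).cpu() for b in batches]

    # Better: keep intermediates on the GPU and pay for one transfer at the end.
    fast = torch.cat([model(b.cuda()).argmax(dim=1) for b in batches]).cpu()
```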
This post is licensed under CC BY 4.0 by the author.