
Breaking Down ML Inference Bottlenecks

Summary

  • Profile a sample inference pipeline (see the profiling sketch after this list)
  • Identify the bottleneck: CPU-bound vs. GPU-bound
  • When CPU-bound: e.g., move data loading onto the GPU with the NeMo dataloader (generic dataloader-tuning sketch below)
  • When GPU-bound: e.g., cut per-batch compute with reduced precision or larger batches (mixed-precision sketch below)
  • Overlap CPU and GPU work: e.g., prefetch so the CPU always has the next batch ready for the GPU (sketch below)
  • Hidden I/O costs of moving data back and forth between CPU and GPU (sketch below)
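
To make the first two steps concrete, here is a minimal profiling sketch using PyTorch's built-in torch.profiler. The ResNet-50 model and batch shape are placeholder assumptions, not the pipeline this post has in mind; sorting the table by CUDA time and comparing against CPU time is usually enough to tell which side dominates.

```python
import torch
import torchvision.models as models
from torch.profiler import profile, ProfilerActivity

# Placeholder pipeline: any model + input batch works here.
model = models.resnet50().eval().cuda()
batch = torch.randn(32, 3, 224, 224).cuda()

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        for _ in range(10):
            model(batch)

# Sort by CUDA time to see whether GPU kernels or CPU-side ops dominate.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```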
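I won't reproduce the NeMo dataloader API here. As a generic stand-in for relieving a CPU-side bottleneck, the standard torch.utils.data.DataLoader knobs (parallel workers, pinned memory, prefetch depth) illustrate the same idea; the dataset below is synthetic.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 3, 224, 224))  # placeholder data

# More workers parallelize CPU-side decode/augment; pinned memory enables
# fast asynchronous host-to-device copies; prefetch_factor keeps batches queued.
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=8,
    pin_memory=True,
    prefetch_factor=4,
)
```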
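When the GPU itself is the bottleneck, one common and generic option (the original outline left this example open) is mixed-precision inference, which cuts per-kernel compute and memory traffic. A sketch with torch.autocast, reusing the same placeholder model and input:

```python
import torch
import torchvision.models as models

model = models.resnet50().eval().cuda()      # placeholder model
batch = torch.randn(32, 3, 224, 224).cuda()  # placeholder input

# Run matmuls/convolutions in FP16 where safe; weights stay in FP32.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    out = model(batch)
```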
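To keep the CPU a step ahead of the GPU, the usual trick is pinned host memory plus non-blocking copies: the copy and the kernel launch return immediately, so the host loop can fetch batch N+1 while the GPU is still working on batch N. A sketch under those assumptions (run_overlapped is a hypothetical helper name):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(128, 10).cuda().eval()  # placeholder model
loader = DataLoader(TensorDataset(torch.randn(4096, 128)),
                    batch_size=64, num_workers=4, pin_memory=True)

def run_overlapped(model, loader, device="cuda"):
    """With pinned memory, .to(..., non_blocking=True) returns immediately;
    the CPU moves on to prepare the next batch while the GPU is still
    copying and computing the current one."""
    outputs = []
    with torch.no_grad():
        for (cpu_batch,) in loader:
            gpu_batch = cpu_batch.to(device, non_blocking=True)
            outputs.append(model(gpu_batch))  # async launch; host not blocked
    return outputs

outputs = run_overlapped(model, loader)
```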
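Finally, a sketch of the hidden-transfer trap: every .cpu() or .item() on a GPU tensor forces a device synchronization, so calling them inside the loop serializes the CPU and GPU. Names and shapes below are illustrative only.

```python
import torch

model = torch.nn.Linear(128, 10).cuda().eval()        # placeholder model
batches = [torch.randn(64, 128) for _ in range(100)]  # placeholder data

with torch.no_grad():
    # Anti-pattern: .cpu() each iteration = one sync + one transfer per batch.
    slow = [model(b.cuda()).argmax(dim=1).cpu() for b in batches]

    # Better: keep intermediates on the GPU and pay for one transfer at the end.
    fast = torch.cat([model(b.cuda()).argmax(dim=1) for b in batches]).cpu()
```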
This post is licensed under CC BY 4.0 by the author.