Breaking Down ML Inference Bottlenecks
Summary
- Profile a sample inference pipeline (see the profiling sketch after this list)
- Identify the bottleneck: is the pipeline CPU bound or GPU bound?
- When we are CPU bound: e.g., use a GPU dataloader such as NVIDIA NeMo's (a framework-agnostic sketch follows the list)
- When we are GPU bound: e.g., run inference at lower precision (one candidate remedy; sketched after the list)
- Ensure overlap of CPU and GPU work: e.g., keep the CPU staging the next batch while the GPU computes (prefetching sketch after the list)
- Hidden I/O costs of moving data back and forth between CPU and GPU (sync-point sketch after the list)
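
To make the profiling step concrete, here is a minimal sketch using PyTorch's built-in torch.profiler. The model, shapes, and batch count are stand-ins, not the actual pipeline.

```python
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
).to(device).eval()

# CPU tensors, as a dataloader would yield them.
batches = [torch.randn(64, 1024) for _ in range(20)]

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities, record_shapes=True) as prof, torch.no_grad():
    for x in batches:
        x = x.to(device)  # host-to-device copy shows up in the trace
        _ = model(x)

sort_key = "cuda_time_total" if torch.cuda.is_available() else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
```

If copy ops (aten::to, Memcpy HtoD) and dataloading dominate the table, the GPU is being starved by the CPU side; if compute kernels dominate, the pipeline is GPU bound.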
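The idea behind a GPU dataloader can be illustrated without any particular framework: move the raw, compact batch to the device first and run the transform there, instead of burning CPU workers on it. A minimal sketch, assuming image-like uint8 input and a made-up normalization:

```python
import torch

device = torch.device("cuda")

# CPU-side preprocessing: the cast happens on CPU workers, and the
# resulting float32 tensor (4x the bytes of uint8) crosses the PCIe bus.
def cpu_preprocess(raw: torch.Tensor) -> torch.Tensor:
    x = raw.float().div_(255)
    return x.to(device)

# GPU-side preprocessing: ship the compact uint8 tensor, then do the
# arithmetic on the GPU; this frees CPU workers and shrinks the transfer.
def gpu_preprocess(raw: torch.Tensor) -> torch.Tensor:
    x = raw.to(device, non_blocking=True)  # async if the source is pinned
    return x.float().div_(255)

raw_batch = torch.randint(0, 256, (64, 3, 224, 224), dtype=torch.uint8)
assert torch.allclose(cpu_preprocess(raw_batch), gpu_preprocess(raw_batch))
```

With pinned host memory the copy can additionally run asynchronously; see the prefetching sketch further down.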
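For the GPU-bound case, lower-precision inference is one common remedy, shown here as an illustration rather than the right fix for every pipeline. A sketch with torch.autocast:

```python
import torch

device = torch.device("cuda")
model = torch.nn.Linear(1024, 1024).to(device).eval()
x = torch.randn(64, 1024, device=device)

# Matmul-heavy ops run in FP16 on tensor cores; autocast keeps
# numerically sensitive ops (softmax, norms) in FP32 automatically.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

print(y.dtype)  # torch.float16
```

When precision alone is not enough, torch.compile or an inference engine such as TensorRT is the usual next step.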
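To keep the CPU ready to serve the GPU, the standard PyTorch recipe is pinned host memory plus non-blocking copies: the CPU enqueues the copy and the forward pass without waiting, then moves on to staging the next batch while the GPU works. A sketch with a synthetic dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    device = torch.device("cuda")
    model = torch.nn.Linear(1024, 10).to(device).eval()
    data = TensorDataset(torch.randn(10_000, 1024))

    # pin_memory stages batches in page-locked RAM so the host-to-device
    # copy can be asynchronous; workers prepare batch i+1 while the GPU
    # is still running batch i.
    loader = DataLoader(data, batch_size=256, num_workers=4, pin_memory=True)

    with torch.no_grad():
        for (batch,) in loader:
            batch = batch.to(device, non_blocking=True)  # CPU does not wait here
            _ = model(batch)

if __name__ == "__main__":
    main()  # guard needed because num_workers spawns subprocesses
```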
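Finally, the hidden transfer cost often hides in scalar reads: calling .item() or .cpu() inside the loop forces a full device synchronization on every iteration. A sketch contrasting the two patterns, again with placeholder model and data:

```python
import torch

device = torch.device("cuda")
model = torch.nn.Linear(1024, 10).to(device).eval()
batches = [torch.randn(256, 1024, device=device) for _ in range(100)]

# Slow: .item() forces a GPU->CPU round trip and a stall every iteration.
total = 0.0
with torch.no_grad():
    for x in batches:
        total += model(x).sum().item()  # hidden sync point

# Better: accumulate on the GPU and cross the bus once at the end.
with torch.no_grad():
    acc = torch.zeros((), device=device)
    for x in batches:
        acc += model(x).sum()
    total = acc.item()  # single sync
```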