GPU
- FP8 Quantization - Viable for production?
- Attention Olympics - Fastest attention kernel?
- Building Custom TensorRT Plugins
- Understanding Attention Approximation
- FP8 Quantization using TensorRT
- TensorRT - From Frustration to Production
- Flash, Fused and Fast Attention
- Why does attention deserve your attention?
- Understanding Inference Optimization Frameworks
- Breaking Down ML Inference Bottlenecks
- GPU vs CPU - Matmul, Sine Waves, and the Myth of Speed
- How does a GPU work?