GPU
- FP8 Quantization - Viable for production?
- Attention Olympics - Fastest attention kernel?
- Building Custom TensorRT Plugins
- Understanding Attention Approximation
- FP8 Quantization using TensorRT
- TensorRT - From Frustration to Production
- Flash, Fused and Fast Attention
- Why does attention deserve your attention?
- Understanding Inference Optimization Frameworks
- Breaking Down ML Inference Bottlenecks
- GPU vs CPU - Matmul, Sine Waves, and the Myth of Speed
- How does a GPU work?