ML (26)
- FP8 Quantization - Viable for production?
- Attention Olympics - Fastest attention kernel?
- Building Custom TensorRT Plugins
- Understanding Attention Approximation
- FP8 Quantization using TensorRT
- TensorRT - From Frustration to Production
- Flash, Fused and Fast Attention
- Why attention deserves your attention
- Understanding Inference Optimization Frameworks
- Breaking Down ML Inference Bottlenecks
- GPU vs CPU - Matmul, Sine Waves, and the Myth of Speed
- How does a GPU work?
- LLMs Explained - Part 6 - Transformers
- LLMs Explained - Part 5 - Attention!
- LLMs Explained - Part 4 - seq2seq
- LLMs Explained - Part 3 - RNNs
- LLMs Explained - Part 2 - Word Embeddings
- LLMs Explained - Part 1 - Tokenizers
- Variational Autoencoders
- Deep Learning Basics
- Generative Modeling
- Understanding Feature Engineering
- Understanding Training Data
- Data Engineering Fundamentals
- Designing Machine Learning Systems
- Machine Learning Systems