Archives
- 01 Feb FP8 Quantization - Viable for production?
- 29 Jan Attention Olympics - Fastest attention kernel?
- 13 Dec Building Custom TensorRT Plugins
- 05 Nov Understanding Attention Approximation
- 27 Oct FP8 Quantization using TensorRT
- 11 Oct TensorRT - From Frustration to Production
- 23 Sep Flash, Fused and Fast Attention
- 16 Aug Why attention deserves your attention?
- 04 Aug Understanding Inference Optimization Frameworks
- 11 Jul Breaking Down ML Inference Bottlenecks
- 14 Jun GPU vs CPU - Matmul, Sine Waves, and the Myth of Speed
- 11 Jun How does a GPU work?
- 04 Jun LLMs Explained - Part 6 - Transformers
- 03 Jun LLMs Explained - Part 5 - Attention!
- 02 Jun LLMs Explained - Part 4 - seq2seq
- 01 Jun LLMs Explained - Part 3 - RNNs
- 28 May LLMs Explained - Part 2 - Word Embeddings
- 27 May LLMs Explained - Part 1 - Tokenizers
- 23 May Python Data Class Builders
- 22 May Python Mappings
- 21 May Python Sequences
- 06 Feb Variational Autoencoders
- 22 Jan Deep Learning Basics
- 22 Jan Generative Modeling
- 26 Jun Man In The Middle (MITM) attack
- 23 Jun What happens when you open a website?
- 18 Jun Authentication Fundamentals
- 07 Jun How does the internet work?
- 24 Mar Python Data Model
- 27 Jul Understanding Feature Engineering
- 27 Jul Understanding Training Data
- 26 Jul Data Engineering Fundamentals
- 26 Jul Designing Machine Learning Systems
- 22 Jul Machine Learning Systems
- 21 Dec Effective Java | Creating & Destroying objects
- 03 Sep Performance tuning the document distance problem
- 31 Aug Design Web Crawler from Go Tutorial
- 28 Aug MapReduce Explained
- 27 Aug Newton's method for finding roots
- 27 Aug Syntax highlighting & Mathematical markup in HTML
- 27 Aug Understanding peak finding algorithm
- 25 Aug Create a website using NodeJS in 5 mins
- 19 Aug Create your own podcast on Spotify for free