Blog | Allen Philip J

Jan 8, 2025

LLM Inference: From KV Caching to vLLM

LLM Inference Optimization

Jan 8, 2025

GRPO: From Supervised Finetuning to Reinforcement Learning

LLM GRPO Finetuning

Dec 13, 2024

Building Custom TensorRT Plugins

TensorRT GPU Optimization

Oct 15, 2024

The Mathematics of Transformers

Transformers Attention ML

Oct 11, 2024

TensorRT: From Frustration to Production

TensorRT GPU Optimization

Sep 23, 2024

Flash, Fused and Fast Attention

Attention GPU Optimization

Aug 16, 2024

Why Attention Deserves Your Attention?

Attention Transformers ML

Aug 4, 2024

Understanding Inference Optimization Frameworks

Optimization TensorRT PyTorch

Jun 14, 2024

GPU vs CPU: Matmul, Sine Waves, and the Myth of Speed

GPU Performance PyTorch

Jun 11, 2024

How Does a GPU Work?

GPU CUDA Hardware

Jan 29, 2024

Deep Learning Basics

Deep Learning PyTorch ML