ALLEN PHILIP J
Jan 8, 2025

LLM Inference: From KV Caching to vLLM

LLM Inference Optimization
Jan 8, 2025

GRPO: From Supervised Finetuning to Reinforcement Learning

LLM GRPO Finetuning
Dec 13, 2024

Building Custom TensorRT Plugins

TensorRT GPU Optimization
Oct 15, 2024

The Mathematics of Transformers

Transformers Attention ML
Oct 11, 2024

TensorRT: From Frustration to Production

TensorRT GPU Optimization
Sep 23, 2024

Flash, Fused and Fast Attention

Attention GPU Optimization
Aug 16, 2024

Why Attention Deserves Your Attention?

Attention Transformers ML
Aug 4, 2024

Understanding Inference Optimization Frameworks

Optimization TensorRT PyTorch
Jun 14, 2024

GPU vs CPU: Matmul, Sine Waves, and the Myth of Speed

GPU Performance PyTorch
Jun 11, 2024

How Does a GPU Work?

GPU CUDA Hardware
Jan 29, 2024

Deep Learning Basics

Deep Learning PyTorch ML

© 2026 Allen Philip J
