About
Hey 👋, I’m Allen Philip!
ML engineer optimizing big models for real-time performance. Writing about inference, attention, and the edge of what’s practical.
About me
I’m a Machine Learning Engineer currently at Adobe, working at the intersection of large vision and audio models, LLMs, and GPU-level performance optimization. My work lives in the wild: production pipelines, low-latency inference systems, and GenAI features used by millions.
I specialize in making large models run fast and cheap—whether that means quantization, attention kernel optimization, or pushing TFLOPs on A100s without melting the cluster.
I started my career in backend systems and large-scale data engineering, eventually making the jump into ML systems in 2016. Since then, I’ve led and built pipelines for everything from retail demand forecasting to anomaly detection to Adobe Firefly’s generative stack.
This blog is my personal space to document what I’m learning, debugging, optimizing or just obsessing over.
What You’ll Find Here
- Deep dives into ML inference and model optimization (FlashAttention, quantization, TensorRT, FSDP, etc.)
- Notes and explainers on attention mechanisms, transformer internals, and how things really work under the hood
- Reflections on books, engineering, and ideas that don’t fit into a Jira ticket
And off the clock?
- 💻 I love deconstructing & exploring new tech stacks to build cool stuff.
- 📰 Reading & writing about tech whenever possible.
- 🍕 Fan of horror & thriller movies, anime, music (literally anything).
Projects and Dev Stuff:
Oh, I hope this starts looking better soon :’(