Allen Philip J

Building AI platforms & making large models run fast

Dublin, Ireland

Background

I'm a Senior Product Engineer at Intercom, working on AI platforms. Previously, I spent 4 years at Adobe optimizing inference for Firefly's generative video and audio models.

My focus is post-training inference optimization: quantization, attention efficiency, runtime acceleration, and deep profiling. I work at the intersection of model architecture and system performance—getting the most out of inference frameworks and hardware (A100/H100).

I don't just run models—I make them run well.

Focus Areas

Attention & Kernels — Integrating FlashAttention variants, writing custom CUDA kernels, optimizing diffusion loop inference for video models.

Quantization — INT8 and FP8 workflows, calibration strategies, navigating the accuracy-performance tradeoff for production GenAI models.

Model Compilation — TensorRT optimization, torch.compile debugging, operator fusion strategies for real-world speedups on A100/H100.

AI Platforms — Building self-serve ML platforms, anomaly detection systems, and scalable inference pipelines.

Experience

2025 – Present

Senior Product Engineer, Intercom

AI Infrastructure

Building AI platform infrastructure for customer support automation
Scaling LLM-powered features for enterprise customers

2021 – 2025

ML Engineer 4, Adobe

Firefly Video/Audio GenAI

Reduced inference latency by 35% via FP8 quantization and by 25% via custom TensorRT attention plugins
Scaled the Enhance Speech pipeline to 150 RPM on 50+ A100 GPUs; cut operational costs by 33%

2016 – 2021

Senior ML Engineer, Blue Yonder

Promotional Demand Forecasting

Led ML pipelines for demand modeling, serving 90M+ SKUs for Walmart, Loblaws, and Woolworths
Optimized batch processing to meet SLAs for retailers with massive product catalogs

2014 – 2016

Software Engineer, Oracle

Cloud Commerce

Full-stack development for B2B catalog, pricing, and user management features
Recipient of Best Rookie Award (2015)

B.Tech Electrical Engineering, IIT Madras (2014)

Now

Building AI platform infrastructure at Intercom. Expanding deeper into LLMs while continuing to write about inference optimization, attention mechanisms, and the practical side of ML systems.

Tools

PyTorch, CUDA, TensorRT, Triton, Python, C++, Docker, Kubernetes

Get in touch

Have a question about ML inference or want to collaborate? Feel free to reach out.

allenphilip93@gmail.com