Philip Kiely is the Head of Developer Relations at Baseten, where he focuses on inference infrastructure for AI applications. He is the author of "Inference Engineering," an in-depth guide covering GPU architectures, CUDA, vLLM, TensorRT-LLM, and large-scale distributed model serving. He has spoken at NVIDIA GTC, PyTorch Conference, AI Engineer World's Fair, and AWS re:Invent.
Philip Kiely appeared as a guest on ThursdAI, the weekly AI news podcast hosted by Alex Volkov.