$ whoami

CS Ph.D. Student @ Georgia Tech
Research Area: Systems for AI, LLM Inference

I'm a final-year PhD student advised by Prof. Alexey Tumanov, working on high-performance systems for foundation models. I'm currently building Vajra, the world's fastest open-source AI inference engine, designed for real-time multimodal agentic workloads. We're looking for talented systems builders to join us!

Experience Timeline

Present: Project Lead @ Project Vajra (Mentor: Alexey Tumanov)

2025: Research Intern @ Microsoft Research (Mentors: Sadjad Fouladi & Ganesh Ananthanarayanan)

2024: Research Intern @ Azure Systems Research (Mentor: Esha Choukse)

2023: Research Intern @ Microsoft Research India (Mentors: Ram Ramjee & Bhargav Gulavani)

2021: Research Engineer II @ Microsoft Research (Mentor: Muthian Sivathanu)

2018: Member of Technical Staff II @ Qubole

$ ls projects/

Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving

Amey Agrawal*, Mayank Yadav*, Sukrit Kumar, Anirudha Agrawal, Garv Ghai, Souradeep Bera, Elton Pinto, Sirish Gambhira, Mohammad Adain, Kasra Sohrab, Chus Antonanzas, Alexey Tumanov

[PDF]

On Evaluating Performance of LLM Inference Serving Systems

Amey Agrawal, Nitin Kedia, Anmol Agarwal, Jayashree Mohan, Nipun Kwatra, Souvik Kundu, Ramachandran Ramjee, Alexey Tumanov

[PDF]

No Request Left Behind: Tackling Heterogeneity in Long-Context LLM Inference with Medha

Amey Agrawal, Haoran Qiu, Junda Chen, Íñigo Goiri, Chaojie Zhang, Rayyan Shahid, Ramachandran Ramjee, Alexey Tumanov, Esha Choukse

[PDF]

Maya: Optimizing Deep Learning Training Workloads using Emulated Virtual Accelerators

Srihas Yarlagadda*, Amey Agrawal*, Elton Pinto*, Hakesh Darapaneni, Mitali Meratwal, Shivam Mittal, Pranavi Bajjuri, Srinivas Sridharan, Alexey Tumanov

» EuroSys'26

[PDF]

Inshrinkerator: Compressing Deep Learning Training Checkpoints via Dynamic Quantization

Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov

» SoCC'24

[PDF]

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve

Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and Ramachandran Ramjee

» OSDI'24

[PDF] [Code] [Video]

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Amey Agrawal*, Anmol Agarwal*, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov

[PDF] [Code]

Vidur: A Large Scale Simulation Framework For LLM Inference

Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, and Alexey Tumanov

» MLSys'24

[PDF] [Code] [Video]

Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms

Amey Agrawal, Abhishek Dixit, Namrata Shettar, Darshil Kapadia, Rohit Karlupia, Vikram Agrawal, and Rajat Gupta

» IEEE Big Data 2019

[PDF]

Logan: A Distributed Online Log Parser

Amey Agrawal, Rajat Gupta, and Rohit Karlupia

» ICDE 2019

[PDF] [Website]

Sarathi: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Singularity: Planet-Scale, Preemptible and Elastic Scheduling of AI Workloads

Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks
