Amey Agrawal
I am a PhD student at Georgia Tech, where I am advised by Prof. Alexey Tumanov. My primary area of interest is systems for foundation models.
Previously, I was a research engineer at Microsoft Research, where I worked on Dr. Muthian Sivathanu’s team on low-level systems for deep learning infrastructure. Before that, I spent a couple of years at Qubole, a big data platform start-up. I completed my bachelor’s in Computer Science at BITS Pilani, India in 2018. For more details, refer to my resume or drop me an email.
Publications
DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
Amey Agrawal, Sameer Reddy, Satwik Bhattamishra, Venkata Prabhakara Sarath Nookala, Vidushi Vashishth, Kexin Rong, Alexey Tumanov
15th ACM Symposium on Cloud Computing (SoCC 2024), Redmond [pdf]
Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations
Amey Agrawal, Junda Chen, Íñigo Goiri, Ramachandran Ramjee, Chaojie Zhang, Alexey Tumanov, Esha Choukse
Preprint: arXiv:2409.17264 (2024) [pdf]
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
Amey Agrawal, Anmol Agarwal, Nitin Kedia, Jayashree Mohan, Souvik Kundu, Nipun Kwatra, Ramachandran Ramjee, Alexey Tumanov
Preprint: arXiv:2407.07000 (2024) [pdf] [code]
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Alexey Tumanov, and Ramachandran Ramjee
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI’24), Santa Clara [pdf] [code] [video]
Vidur: A Large Scale Simulation Framework For LLM Inference
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee, and Alexey Tumanov
7th Annual Conference on Machine Learning Systems (MLSys’24), Santa Clara [pdf] [code] [video]
Sarathi: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills
Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani and Ramachandran Ramjee
Preprint: arXiv:2308.16369 (2023) [pdf]
Singularity: Planet-Scale, Preemptible and Elastic Scheduling of AI Workloads
Singularity Team, Microsoft
Preprint: arXiv:2202.07848 (2022) [pdf]
Learning Digital Circuits: A Journey Through Weight Invariant Self-Pruning Neural Networks
Amey Agrawal and Rohit Karlupia
Proceedings of New in ML Workshop, NeurIPS, 2019, Vancouver
Proceedings of Sparsity in Neural Networks Workshop, 2021, Virtual [pdf] [code]
Delog: A Privacy Preserving Log Filtering Framework for Online Compute Platforms
Amey Agrawal, Abhishek Dixit, Namrata Shettar, Darshil Kapadia, Rohit Karlupia, Vikram Agrawal, and Rajat Gupta
Proceedings of IEEE International Conference on Big Data, 2019, Los Angeles [pdf]
Logan: A Distributed Online Log Parser
Amey Agrawal, Rajat Gupta, and Rohit Karlupia
Proceedings of IEEE International Conference on Data Engineering (ICDE), 2019, Macau [pdf] [blog]
Select Projects
Learning Efficient Job Placement Policy for ETL jobs on Big Data Platforms
Mentors: Joydeep Sen Sarma, Rohit Karlupia
A learnt scheduling algorithm that leverages the recurrent nature of ETL workloads to minimize operational cost through optimal job placement.
Callisto: Bringing Jupyter notebooks to classroom
Advisor: Prof. Surekha Bhanot
A cross-platform desktop application to host and grade assignments designed in Jupyter notebooks. The system strives to lower the barrier to entry to the scientific Python ecosystem for newcomers by providing a one-click setup of the development environment and a Google Colab-like interface for hosted assignments. This work was later presented at PyCon India, 2020. [blog] [code] [demo]
Deep Reinforcement Learning for Autonomous Warehouse Robots
Advisor: Prof. Surekha Bhanot
A framework to create Q-learning agents for autonomous navigation tasks in warehouses. The agents are pre-trained in a custom simulation environment built on top of V-REP, a popular robotics simulation package. [code]
Disentanglement Learning for Iris Image Indexing
Advisor: Prof. Kamlesh Tiwari
An autoencoder architecture to learn representations of normalized iris images that are robust to the geometric variations found in real-world iris samples. [blog] [code]
Automated news-in-shorts
Advisor: Prof. Poonam Goyal
A news aggregation system that collects the latest posts from the RSS feeds of multiple news agencies to automatically generate abstracts for top stories. Trending topics on Twitter are mapped to news articles, and extractive text summaries are generated using a natural language processing pipeline. [code]