I'm a Berkeley EECS student pursuing my M.S. with a focus on ML Systems, and I'm interested in backend and full-stack development. I currently do research at Berkeley Sky Lab, collaborating with Profs. Ion Stoica and Joseph Gonzalez. My work has improved LLM serving performance by implementing efficient attention kernels and integrating control vectors in vLLM. Previously, I interned at Myko AI and ASUS, developing NLP business analytics and optimizing AI compute environments.
Here are some of my recent projects and research work.
Implemented a two-level cascade-attention kernel in vLLM, unifying the FlashInfer backend and cutting kernel latency by 2.1× on beam-search workloads.
A weekly financial newsletter that uses custom AI agents to research and analyze the latest developments across several industries.
Created a multi-hop, multi-document question-answering system using LangChain, Pinecone, and the OpenAI API that lets users query across multiple file formats.
Developed core components for the PintOS Operating System by implementing a priority-based scheduler, virtual memory, buffer page caching, and file systems.
Built a version-control system in Java with commit, branch, merge, and checkout functionality using serialization and blob storage.
Client Project for Payload CMS. Engineered a configuration import/export module for Payload, enabling seamless transfer of content models across environments.
Client Project for Spoak.com. Created an automated solution that detects and removes furniture in room photos using computer vision, cutting manual editing time by 80%.