Gaurav Tarlok Kakkar

Ph.D. Student @ Georgia Tech

Research Area: Data Systems for AI, NL2SQL

I am a final-year Ph.D. student at Georgia Tech's College of Computing. My research sits at the intersection of databases and machine learning, where I work on improving the resource efficiency and usability of data systems. I am passionate about building novel systems that accelerate workloads for emerging AI applications.

Research

I build efficient AI-powered data systems. My work lies at the intersection of databases and machine learning, with a focus on improving resource efficiency and enhancing usability. I started by rethinking how databases handle video analytics and extended those ideas to multimodal data. Most recently, my focus has shifted to making databases more usable and accessible through natural language.

Video & Multimodal Analytics

  • EvaDB (SIGMOD 2022, DEEM @ SIGMOD 2023, VLDB 2023 (Seiden)) — An open-source database system that lets users query multimodal data through a SQL- or dataframe-like interface. Under the hood, it transparently applies optimizations centered on AI UDFs (user-defined functions), such as UDF result reuse, UDF reordering, and parallelization, to improve hardware utilization. EvaDB has approximately 2.7K GitHub stars.
  • Aero (SIGMOD 2025) — Argues that ML‑centric DBMSs should replace static, profile‑based optimizers with adaptive query processing. Aero notes that collecting accurate UDF (AI operator) statistics is costly and unreliable; instead, it measures selectivity and cost at runtime to reorder and route predicates on the fly, and employs dynamic scaling to improve resource utilization.
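As a rough illustration of the runtime reordering idea described for Aero, the sketch below filters rows through a chain of predicates, records each predicate's observed selectivity and cost as it runs, and periodically re-sorts the chain. It is not Aero's implementation: the names `Predicate`, `adaptive_filter`, and `reorder_every` are hypothetical, and the rank formula cost / (1 - selectivity) is the classic predicate-ordering heuristic, assumed here for illustration. Routing and dynamic scaling are left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Predicate:
    """A filter plus the statistics observed for it at runtime (hypothetical)."""
    name: str
    fn: Callable[[object], bool]
    seen: int = 0            # rows this predicate has evaluated
    passed: int = 0          # rows that satisfied it
    total_time: float = 0.0  # seconds spent evaluating it

    def selectivity(self) -> float:
        # Fraction of evaluated rows that passed; 1.0 until anything is observed.
        return self.passed / self.seen if self.seen else 1.0

    def cost(self) -> float:
        # Average seconds per evaluation; 0.0 until anything is observed.
        return self.total_time / self.seen if self.seen else 0.0

    def rank(self) -> float:
        # Lower is better: cheap predicates that drop many rows should run first.
        drop = 1.0 - self.selectivity()
        return self.cost() / drop if drop > 0 else float("inf")


def adaptive_filter(
    rows: Iterable[object],
    predicates: List[Predicate],
    reorder_every: int = 100,
) -> Tuple[List[object], List[str]]:
    """Return the rows passing every predicate and the final predicate order."""
    order = list(predicates)
    kept = []
    for i, row in enumerate(rows, start=1):
        keep = True
        for p in order:
            start = time.perf_counter()
            ok = bool(p.fn(row))
            p.total_time += time.perf_counter() - start
            p.seen += 1
            p.passed += ok
            if not ok:
                keep = False
                break  # short-circuit: later predicates never see this row
        if keep:
            kept.append(row)
        if i % reorder_every == 0:
            # Re-sort using the statistics measured so far.
            order.sort(key=Predicate.rank)
    return kept, [p.name for p in order]
```

Because evaluation short-circuits, a predicate placed later in the chain only sees rows that survived the earlier ones, so its measured selectivity is conditional on the current order. A fuller design would also need a way to keep sampling predicates that rarely get evaluated.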

Natural Language Interfaces to Databases (NL2SQL)

  • PRISM (under review) — While NL2SQL research has largely focused on maximizing accuracy, the monetary cost of LLM-based pipelines has been overlooked. Real-world deployments must balance accuracy and cost, but the configuration space (LLM choice, prompting, schema linking) is highly interdependent and schema-sensitive. PRISM tackles this schema-aware tuning challenge.

Industrial Experience

Google Research | Research Intern in Systems Research Group | May 2024 – Dec 2024

Snowflake | Software Development Intern in SQL Optimization Team | Summer 2021

Google Cloud SQL | Software Development Intern in Cloud SQL | Summer 2020

Adobe | Member of Technical Staff | Jul 2017 – Aug 2019

Adobe Research | Research Intern in Big Data Experience Lab | May 2016 – Jul 2016