I'm a third-year Computer Science PhD candidate at Stanford, where I'm co-advised by Kayvon Fatahalian and Chris Ré, affiliated with the Stanford AI Lab, the Statistical Machine Learning Group, DAWN, and the Stanford Computer Graphics Lab. In my research, I'm interested in building systems for machine learning, computer vision, and graphics, and I'm currently working on systems for rapidly creating machine learning models. I'm very fortunate to be supported by a Department of Defense NDSEG fellowship and a Magic Grant from the Brown Institute.
In 2018, I graduated from Harvard with an AB and an SM in Computer Science, cum laude with highest thesis honors. When I'm not working on school work or other projects, I spend most of my free time ballroom dancing.
|03/18/21||New blog post about three lessons I've learned about applying the design process to my ML research.|
|08/17/20||Preprint of our work on analyzing a decade of US cable TV news is now available on arXiv!|
|06/30/20||Epoxy, our new work on using weak supervision + pre-trained embeddings without fine-tuning, is now available - paper, code, and a short video online!|
|06/29/20||A pre-recording of our FlyingSquid talk for ICML 2020 is available on YouTube!|
|06/01/20||FlyingSquid paper Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods accepted to ICML 2020!|
|02/28/20||Excited to announce FlyingSquid! Blog post, paper, and code now available!|
|02/28/20||New blog post on Software 2.0 and Data Programming: Lessons Learned and What's Next for systems and machine learning!|
|12/12/19||Presented Multi-Resolution Weak Supervision for Sequential Data at NeurIPS 2019 in Vancouver!|
|11/13/19||Gave a talk on Rekall at Intel's Autonomous Driving Community of Practice Event!|
|10/27/19||Presented Video Event Specification using Programmatic Composition at AI Systems @ SOSP 2019 - poster and oral presentation!|
|10/21/19||Preprint of our paper on Multi-Resolution Weak Supervision for Sequential Data now available on arXiv!|
|10/09/19||A new blog post about Rekall up on the DAWN blog - Why Train What You Can Code? Rekall: A Compositional Approach to Video Analysis!|
|10/03/19||Our poster on Video Event Specification using Programmatic Composition accepted to AI Systems @ SOSP 2019!|
|09/03/19||Our paper on Multi-Resolution Weak Supervision for Sequential Data accepted to NeurIPS 2019!|
|05/15/19||I was awarded a Department of Defense NDSEG fellowship!|
|05/03/19||James Hong and I won a Magic Grant from the Brown Institute for Media Innovation for Public Analysis of TV News!|
Cable TV news reaches millions of U.S. households each day, and
decisions about who appears on the news, and what stories get talked
about, can profoundly influence public opinion and discourse.
We use computational techniques to analyze a data set of nearly 24/7
video, audio, and text captions from three major U.S. cable TV
networks (CNN, FOX News, and MSNBC) from the last decade.
arXiv / stanford cable tv news analyzer
We take a first step towards building models that
can be iterated on at programmatically-interactive speeds (seconds
instead of hours or days).
We build on work in weak supervision and pre-trained embeddings to
create models that can approach the quality of training deep networks,
but without the cost of training to fine-tune feature representations.
arXiv / code / video
We present FlyingSquid, a new weak supervision framework that runs
orders of magnitude faster than previous work.
Our speedups come from a closed-form solution to latent variable
estimation, which enables weakly-supervised video analysis and online
arXiv / blog post / code / talk
We present a framework to apply weak supervision to multi-resolution
data like videos and time-series data that can handle sequential
correlations among supervision sources.
We experimentally validate our system over population-level video
datasets and gait sensor data.
We present Rekall, a data model and programming model for detecting
new events in video collections by composing the outputs of
We demonstrate the use of Rekall in analyzing video from cable TV
news broadcasts, films, static-camera vehicular video streams, and
commercial autonomous vehicle logs.
arXiv / blog post / code / demo videos
We study the problem of how to control flocking behavior in
We find that some strategies that have worked well in high-density
conditions are often less effective in lower-density environments,
and propose new strategies for low-density environments.
In an undergraduate thesis,
we also explored using genetic algorithms to evolve influencing agent
aamas paper / undergraduate thesis / code
The ASC project seeks to create a powerful, practical automatic
parallelization runtime by using machine learning to predict future
states of a program and running speculative executions from those
If the main program hits any of the predicted states, ASC can speed
up execution by skipping the main program to the end of the
has shown that this approach is promising, but did not
actually manage to achieve speedup on native hardware,
since it ran the main program on an emulator.
We have been working on making this approach work on native hardware.
We develop mathematical models to understand the complex response
of body temperature to methamphetamine and use our models to separate
out the individual components of the response.
We further analyze the ways that this response changes when
orexinergic neurotransmission is inhibited.
We analyze a family of delay differential equations used to model
genetic oscillatory networks, and use our findings to develop two new
mathematical methods for analyzing such models.
pdf / talk / coverage
In Summer 2017, I conducted a machine learning/data science
project for the Google Flights team to try to predict flight arrival
times using a mix of historical data and live positional data.
I built a few models (including two unique ML models) and trained
and evaluated them.
Some of these models could outperform official airline
estimates in many cases.
This exploratory work was converted into production after my internship.
In Summer 2016, I did some work on image classification with the
Google My Business team (a subset of Maps).
I was very fortunate to intern at Tamr in summer 2015. I worked on two main projects: augmenting the main product to handle a new use case for a client, and working on Tamr on Google Cloud Platform prior to its launch. I've been told my internship work eventually helped land a client and launch a new product--but what I really took away from the internship was an early love for MLSys!
I did a lot of ballroom dancing in undergrad, and put together a few
apps to help me run the team.
One of the common problems is matching up new first-year dancers
based on their interests (and how much time they want to practice).
The ballroom matchmaker project uses a simple local search with a
complex cost function.
I also wrote a simple Python script to run competitive dance rounds
from my laptop.
Check out both projects below!
For the final project of an AI class I took in fall 2016, I helped
build a ballroom matchmaker program to intelligently form
partnerships in a rookie class.
We treated the task as a local search problem with a complex cost function.
We are hopeful that this tool will help make the lives of future
HBDT rookie captains much easier.
ballroom matchmaker / dance rounds
When I was a freshman in high school, I built a sports app for
students to report live game results with a writeup.
A surprisingly fun project looking back--my high school's first brush
with social media!