Dan Fu

About Me

I'm building a world-class kernels team at Together AI. Please reach out if you're interested in joining the team!

I'm an assistant professor at UCSD, where I lead the SandyResearch Lab and am affiliated with the MLSys group. I'm also VP of Kernels at Together AI, where I'm thrilled to be building a world-class team focused on low-level performance engineering and GPU kernels. My research focuses on making machine learning models faster and more efficient, especially by designing efficient ML architectures that scale well on modern hardware.

Selected Research Interests:

Hardware-aware algorithms for ML - how can we build the best systems and kernel algorithms for ML primitives like attention on modern hardware? ThunderKittens, ThunderMLA, FlashAttention, FlashFFTConv
Efficient ML architectures - how do we change ML models to scale better along dimensions such as sequence length? Chipmunk, Hungry Hungry Hippos, Monarch Mixer

Some links:

An Opinionated Perspective on the Paths to AGI
My lab website: SandyResearch
How I Structure Introductions to Research Papers
The Coffee Experiment
Stanford MLSys Seminar on YouTube (we started during the pandemic)