Work

I’m chief scientist and co-founder of Goodfire. We’re an AI interpretability startup.

I was at DeepMind from 2019 to late 2023, where I worked on:

Interpretability for LLMs (e.g. the Hydra Effect, Copy Suppression) and AlphaZero.
Science of training data.
RLHF data quality and self-annotation.
Evaluation of generalist deep RL agents.

I did my PhD (thesis) at Imperial College with Nick Jones and Kevin Murphy.

Research

If there’s something in my research you’re interested in collaborating on, please get in touch!

I have a Substack if you prefer to read there.

Email is probably best, but you can reach me on Twitter or LinkedIn as well.