Eric Anthony Mitchell
Post-training Frontiers Co-Lead, OpenAI · Ph.D., Stanford University

I co-lead the Post-training Frontiers team at OpenAI with Yann Dubois. We post-train and deploy large frontier models like o1, o3, and GPT-5-Thinking.

Before OpenAI, I earned my PhD from Stanford’s CS department. I was fortunate to be advised by Chelsea Finn and Christopher D. Manning. My PhD work focused on making foundation models, particularly language models, more trustworthy and easy to use. Some particular topics of interest were (and still are) factuality, continual learning, intent understanding, and scalable oversight. Much of my PhD was generously supported by a Knight-Hennessy Graduate Fellowship and a Stanford Accelerator for Learning grant for Generative AI for the Future of Learning.

In the summer of 2022, I was a research scientist intern at DeepMind in London, where I was lucky to spend four months working with Junyoung Chung, Nate Kushman, and Aäron van den Oord.

Before my PhD, I was a research engineer at Samsung’s AI Center in New York City, where I learned constantly from Volkan Isler, Daniel D. Lee, and many other wonderful (and patient) people. As an undergraduate, I completed my thesis under the guidance of H. Sebastian Seung after many hours in the Seung Lab at the Princeton Neuroscience Institute. I was also a captain of Princeton’s varsity men’s golf team.

In my free time, I make music for guitar and voice. I enjoy the outdoors, particularly playing golf, exploring mountains, and SCUBA diving.

Selected Works

Fine-tuning Language Models for Factuality
Katherine Tian*, Eric Mitchell*, Huaxiu Yao, Christopher D. Manning, Chelsea Finn
ICLR, 2024

An Emulator for Fine-Tuning Large Language Models using Small Language Models
Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning
ICLR, 2024

Meta-Learning Online Adaptation of Language Models
Nathan Hu*, Eric Mitchell*, Christopher D. Manning, Chelsea Finn
EMNLP, 2023

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian*, Eric Mitchell*, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
EMNLP, 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov*, Archit Sharma*, Eric Mitchell*, Stefano Ermon, Christopher D. Manning, Chelsea Finn
NeurIPS (Oral), 2023

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Eric Mitchell, Yoonho Lee, Sasha Khazatsky, Christopher D. Manning, Chelsea Finn
ICML (Oral), 2023

Enhancing Self-Consistency and Performance of Pretrained Language Models with NLI
Eric Mitchell, Joseph J. Noh, Siyan Li, William S. Armstrong, Ananth Agarwal, Patrick Liu, Chelsea Finn, Christopher D. Manning
EMNLP (Oral), 2022

Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models
Peter Henderson*, Eric Mitchell*, Christopher D. Manning, Dan Jurafsky, Chelsea Finn
AAAI/ACM Conference on AI, Ethics, and Society, 2022

Fast Model Editing at Scale
Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
ICLR, 2022