Eric Anthony Mitchell
Post-training Frontiers Co-Lead, OpenAI · Ph.D., Stanford University

I co-lead the Post-training Frontiers team at OpenAI with Yann Dubois. We post-train and deploy large frontier models like o1, o3, and GPT-5-Thinking.

Before OpenAI, I earned my PhD from Stanford’s CS department. I was fortunate to be advised by Chelsea Finn and Christopher D. Manning. My PhD work focused on making foundation models, particularly language models, more trustworthy and easy to use. Some particular topics of interest were (and still are) factuality, continual learning, intent understanding, and scalable oversight. Much of my PhD was generously supported by a Knight-Hennessy Graduate Fellowship and a Stanford Accelerator for Learning grant for Generative AI for the Future of Learning.

In the summer of 2022, I was a research scientist intern at DeepMind in London, where I was lucky to spend four months working with Junyoung Chung, Nate Kushman, and Aäron van den Oord.

Before my PhD, I was a research engineer at Samsung’s AI Center in New York City, where I learned constantly from Volkan Isler, Daniel D. Lee, and many other wonderful (and patient) people. As an undergraduate, I completed my thesis under the guidance of H. Sebastian Seung after many hours in the Seung Lab at the Princeton Neuroscience Institute. I was also a captain of Princeton’s varsity men’s golf team.

In my free time, I make music for guitar and voice. I enjoy the outdoors, particularly playing golf, exploring mountains, and SCUBA diving.

Selected Works

Fine-tuning Language Models for Factuality
Katherine Tian*, Eric Mitchell*, Huaxiu Yao, Christopher D. Manning, Chelsea Finn
ICLR, 2024

An Emulator for Fine-Tuning Large Language Models using Small Language Models
Eric Mitchell, Rafael Rafailov, Archit Sharma, Chelsea Finn, Christopher D. Manning
ICLR, 2024

Meta-Learning Online Adaptation of Language Models
Nathan Hu*, Eric Mitchell*, Christopher D. Manning, Chelsea Finn
EMNLP, 2023

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback
Katherine Tian*, Eric Mitchell*, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
EMNLP, 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov*, Archit Sharma*, Eric Mitchell*, Stefano Ermon, Christopher D. Manning, Chelsea Finn
NeurIPS (Oral), 2023

DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
Eric Mitchell, Yoonho Lee, Sasha Khazatsky, Christopher D. Manning, Chelsea Finn
ICML (Oral), 2023

Enhancing Self-Consistency and Performance of Pretrained Language Models with NLI
Eric Mitchell, Joseph J. Noh, Siyan Li, William S. Armstrong, Ananth Agarwal, Patrick Liu, Chelsea Finn, Christopher D. Manning
EMNLP (Oral), 2022

Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models
Peter Henderson*, Eric Mitchell*, Christopher D. Manning, Dan Jurafsky, Chelsea Finn
AAAI/ACM Conference on AI, Ethics, and Society, 2022

Fast Model Editing at Scale
Eric Mitchell, Charles Lin, Antoine Bosselut, Chelsea Finn, Christopher D. Manning
ICLR, 2022