DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature

Eric Mitchell
Yoonho Lee
Alexander (Sasha) Khazatsky
Christopher D. Manning
Chelsea Finn
Stanford University

Abstract
The fluency and factual knowledge of large language models (LLMs) heighten the need for corresponding systems to detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning. In this paper, we first demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging whether a passage was generated by a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by the 20B-parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT.
TL;DR: We introduce DetectGPT, a method that detects samples from pre-trained LLMs using the local curvature of the model's log probability function.
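
To make the curvature criterion concrete, here is a minimal Python sketch of the idea described in the abstract: compare the log probability of a passage with the average log probability of randomly perturbed copies of it. This is an illustration, not the released implementation. The names `perturbation_discrepancy`, `toy_log_prob`, and `make_toy_perturb` are hypothetical, the unigram table stands in for the LLM being tested, and the random word swap stands in for perturbations from a mask-filling model such as T5.

```python
import math
import random
from statistics import mean
from typing import Callable


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """Log probability of `text` minus the mean log probability of its perturbations.

    A large positive value means the passage sits near a local maximum of the
    log probability function (a negative curvature region), which the abstract
    associates with text sampled from the model.
    """
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return original - mean(perturbed)


# --- Toy stand-ins below: assumptions for illustration only. ---

# Hypothetical unigram "model"; a real detector would score text with the LLM under test.
TOY_UNIGRAM = {"the": 0.4, "cat": 0.2, "sat": 0.2, "mat": 0.1, "zebra": 0.05, "quark": 0.05}


def toy_log_prob(text: str) -> float:
    """Sum of per-word log probabilities under the toy unigram table."""
    return sum(math.log(TOY_UNIGRAM.get(word, 1e-6)) for word in text.split())


def make_toy_perturb(rng: random.Random) -> Callable[[str], str]:
    """Return a perturbation that replaces one random word with a random vocabulary word."""
    vocab = list(TOY_UNIGRAM)

    def perturb(text: str) -> str:
        words = text.split()
        words[rng.randrange(len(words))] = rng.choice(vocab)
        return " ".join(words)

    return perturb


if __name__ == "__main__":
    perturb = make_toy_perturb(random.Random(0))
    for passage in ["the the the the", "quark zebra quark zebra"]:
        score = perturbation_discrepancy(passage, toy_log_prob, perturb)
        print(f"{passage!r}: discrepancy = {score:.3f}")
```

In this sketch, a passage would be flagged as model-generated when its discrepancy exceeds a threshold. The threshold is left to the caller here, since choosing it depends on the model and data being evaluated.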


Check out some other cool work building on DetectGPT!
  • Mireshghallah, Mattern, Gao, Shokri, & Berg-Kirkpatrick. Smaller Language Models are Better Black-box Machine-Generated Text Detectors, 2023. TL;DR: in the black-box setting, where we don't know which model generated the text, using a very small model to compute log probabilities for detection actually works best!
  • Bao, Zhao, Teng, Yang, & Zhang. Fast-DetectGPT: Efficient Zero-Shot Detection of Machine-Generated Text via Conditional Probability Curvature, 2023. TL;DR: we can make DetectGPT 300x faster while increasing its accuracy as well!
  • Mireshghallah, Jin, Schölkopf, Sachan, & Berg-Kirkpatrick. Membership Inference Attacks against Language Models via Neighbourhood Comparison, 2023. TL;DR: DetectGPT-style local curvature is also useful for membership inference attacks (identifying whether a piece of data was in a model's training set)!

Citing the paper

    @inproceedings{mitchell2023detectgpt,
        author = {Mitchell, Eric and Lee, Yoonho and Khazatsky, Alexander and 
                  Manning, Christopher D. and Finn, Chelsea},
        title = {DetectGPT: Zero-Shot Machine-Generated Text Detection Using Probability Curvature},
        year = {2023},
        booktitle = {Proceedings of the 40th International Conference on Machine Learning},
        articleno = {1038},
        numpages = {13},
        series = {ICML'23}
    }

