Summary: We formalize the correctness of the main stochastic oracle doubly-efficient debate protocol from Brown-Cohen, Irving, Piliouras (2023), Scalable AI safety via doubly-efficient debate in Lean 4.
Irving, Christiano, Amodei (2018), AI safety via debate is one approach to AI alignment of strong agents, using two agents ("provers") competing in a zero-sum game to convince a human judge ("verifier") of the truth or falsity of a claim. Theoretically, if we model the judge as a polynomial time Turing machine, optimal play in the debate game can convince the judge of any statement in PSPACE. However, this theoretical model is limited in several ways: the agents are assumed to have unbounded computational power, which is not a realistic assumption for ML agents, and the results consider only deterministic arguments.
Brown-Cohen, Irving, Piliouras (2023), Scalable AI safety via doubly-efficient debate improves the complexity theoretic model of debate to be "doubly-efficient": both the provers and the verifier have limited computational power. It also treats stochastic arguments: the provers try to convince the judge of the result of a randomized computation involving calls to a random oracle. Concretely, the main formalized result is
Definition 6.1 (Lipschitz oracle machines). An oracle Turing machine
Theorem 6.2 (doubly-efficient stochastic oracle debate). Let
The formalized result differs from the paper result slightly in that we focus
only on correctness: we do not yet formalize space complexity, count only oracle
queries for time complexity, and represent those queries only via the code for
the protocol. We also define Lipschitz oracle machines differently: the paper
fixes
Prob/Defs.lean
: Finitely supported probability distributions, representing stochastic computations.Comp/Defs.lean
: Stochastic computations relative to a set of oracles.Comp/Oracle.lean
: Our computation model, including the definition of Lipschitz oracles.Debate/Protocol.lean
: The debate protocol, honest players, and the definition of correctness.Debate/Correct.lean
: The final correctness theorems.Debate/Details.lean
: Proof details.Debate/Cost.lean
: Query complexity of each player.
We thank Eric Wieser for his careful review and many helpful suggestions. The code is much better as a result!
- Install Lean 4 and Lake via elan: https://github.com/leanprover/elan
- Run
lake build
within the directory.
Copyright 2023 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.