To run agent0.py:
git clone [email protected]:msyvr/ai-control.git
cd ai-control
uv venv
source .venv/bin/activate
uv run src/agent0.py
Optionally, in agent0.py, provide a list of task lists to instruction_lists for multiple sequential trials, or a single task list to instruction_list for a single trial.
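The two options might look like the sketch below. The names instruction_list and instruction_lists come from the text above; the task strings and the as_trials helper are hypothetical, and the structure agent0.py actually expects may differ.

```python
# Hypothetical task strings; replace them with your own instructions.
instruction_list = ["open the door", "pick up the key"]  # a single trial

instruction_lists = [  # several trials, run one after another
    ["open the door", "pick up the key"],
    ["close the door"],
]


def as_trials(single=None, multiple=None):
    """Normalize either form into a list of trials (hypothetical helper)."""
    if multiple is not None:
        return [list(tasks) for tasks in multiple]
    if single is not None:
        return [list(single)]
    return []


# Each trial is one task list; trials are processed in order.
for number, tasks in enumerate(as_trials(multiple=instruction_lists), start=1):
    print(f"trial {number}: {len(tasks)} task(s)")
```

Treating a single task list as a one-element list of trials lets the same loop handle both cases.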
AI Control is a subfield of AI safety research that aims to identify safeguard interventions capable of preventing a model from performing misaligned actions.
The src folder holds scaffolding for conducting AI Control experiments.
Currently, agent0.py enables basic experiments in which a (local) model is instructed to perform user-defined actions and must decide whether to follow or defy each instruction.
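As a rough illustration of the follow-or-defy setup, the sketch below runs one trial against a stand-in model. It is not the code in agent0.py: stub_model, run_trial, and the FOLLOW/DEFY reply convention are all assumptions made for this example, and a real experiment would call a local model instead of the stub.

```python
from typing import Callable, Dict, List

# A "model" here is any callable that maps an instruction to a text reply.
Model = Callable[[str], str]


def stub_model(instruction: str) -> str:
    """Stand-in for a local model: defies anything mentioning 'delete'."""
    return "DEFY" if "delete" in instruction.lower() else "FOLLOW"


def run_trial(model: Model, instructions: List[str]) -> List[Dict[str, str]]:
    """Query the model once per instruction and record its decision."""
    records = []
    for instruction in instructions:
        reply = model(instruction).strip().upper()
        # Any reply that does not start with FOLLOW is counted as defiance.
        decision = "follow" if reply.startswith("FOLLOW") else "defy"
        records.append({"instruction": instruction, "decision": decision})
    return records


results = run_trial(stub_model, ["list the files", "delete the logs"])
for record in results:
    print(record["decision"], "-", record["instruction"])
```

Recording one decision per instruction keeps the output easy to tally across trials.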
Both gridworlds and more advanced agent scaffolding will be added to src to provide some flexibility with different types of rudimentary AI Control experiments.
The goal of this work is simply to provide an entry point for AI Control experimentation.