AI Control

Usage

agent0.py:

git clone [email protected]:msyvr/ai-control.git
cd ai-control
uv venv
source .venv/bin/activate
uv run src/agent0.py

Optionally, in agent0.py, provide a list of task lists to instruction_lists for multiple sequential trials, or a single task list to instruction_list for a single trial.

What is this?

AI Control is a subfield of AI safety research that aims to identify safeguard interventions capable of preventing a model from performing misaligned actions.

The src folder holds scaffolding for conducting AI Control experiments.

Currently, agent0.py enables basic experiments in which a (local) model is instructed to perform user-defined actions and must decide to either follow or defy instructions.

Both gridworlds and more advanced agent scaffolding will be added to src to provide some flexibility with different types of rudimentary AI Control experiments.

The goal of this work is simply to provide an entry point for AI Control experimentation.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
src		src
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AI Control

Usage

What is this?

About

Uh oh!

Releases

Packages

Languages

msyvr/ai-control

Folders and files

Latest commit

History

Repository files navigation

AI Control

Usage

What is this?

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages