Skip to content

msyvr/ai-control

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Control

Usage

agent0.py:

git clone [email protected]:msyvr/ai-control.git
cd ai-control
uv venv
source .venv/bin/activate
uv run src/agent0.py

Optionally, in agent0.py, provide a list of task lists to instruction_lists for multiple sequential trials, or a single task list to instruction_list for a single trial.

What is this?

AI Control is a subfield of AI safety research that aims to identify safeguard interventions capable of preventing a model from performing misaligned actions.

The src folder holds scaffolding for conducting AI Control experiments.

Currently, agent0.py enables basic experiments in which a (local) model is instructed to perform user-defined actions and must decide to either follow or defy instructions.

Both gridworlds and more advanced agent scaffolding will be added to src to provide some flexibility with different types of rudimentary AI Control experiments.

The goal of this work is simply to provide an entry point for AI Control experimentation.

About

scaffolding for basic AI Control research experiments

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages