This repository contains the code for reproducing the experiments and figures presented in the paper "Statistical Collusion by Collectives on Learning Platforms".
The repository is organized into four main folders:
├── notebooks/
│ ├── dataset/
│ │ └── create_synthetic_dataset.ipynb # generates the dataset
│ ├── run/
│ │ ├── signal-planting.ipynb # signal planting using the feature-label strategy
│ │ ├── signal-planting-fo.ipynb # signal planting using the feature-only strategy
│ │ ├── signal-planting-N.ipynb # signal planting for y^* = Poor across different N values
│ │ └── signal-unplanting.ipynb # signal unplanting for y^* = Excellent
│ └── plot/ # generates plots based on runs
├── output/ # contains outputs generated by the code
├── plots/ # stores the final plots
├── src/
│ └── utils.py # main code used across all notebooks
This repository considers four key scenarios from the paper:
- Signal planting with feature-label strategy
- Signal planting with feature-only strategy
- Signal planting for y* = Poor across different values of N
- Signal unplanting for y* = Excellent
- Create the Dataset:
  - Run the notebook notebooks/dataset/create_synthetic_dataset.ipynb to generate the dataset.
  - This creates a data/ folder at the root level containing the dataset.
Note: The dataset is not included in this repository due to size constraints. Running the dataset generation notebook will create the necessary data locally.
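Because the dataset has to be generated locally, it can help to confirm that data/ exists before starting the experiments. The following is a minimal sketch, not part of the repository: the helper name dataset_ready is hypothetical, and since the file names inside data/ are not documented here, it only checks that the folder exists and is non-empty.

```python
from pathlib import Path


def dataset_ready(repo_root: Path) -> bool:
    """Return True if <repo_root>/data exists and holds at least one entry.

    Hypothetical helper: it does not validate the dataset contents,
    only that the generation notebook appears to have produced something.
    """
    data_dir = repo_root / "data"
    return data_dir.is_dir() and any(data_dir.iterdir())


if __name__ == "__main__":
    # Assumes the script is launched from the repository root.
    if dataset_ready(Path.cwd()):
        print("data/ found; the experiment notebooks can be run.")
    else:
        print(
            "data/ is missing or empty; run "
            "notebooks/dataset/create_synthetic_dataset.ipynb first."
        )
```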
- Run the Experiments:
  - Navigate to notebooks/run/ and execute any of the following notebooks, depending on the scenario you wish to reproduce:
    - notebooks/run/signal-planting.ipynb
    - notebooks/run/signal-planting-fo.ipynb
    - notebooks/run/signal-planting-N.ipynb
    - notebooks/run/signal-unplanting.ipynb
  - The outputs from the experiments are stored in the output/ folder.
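The experiment notebooks can also be executed without opening the Jupyter interface. The sketch below is an assumption about workflow rather than something the repository ships: it relies on the standard jupyter nbconvert command being installed, and the names nbconvert_command and run_notebooks are hypothetical. By default it only prints the commands (a dry run), so nothing is executed until dry_run=False is passed.

```python
import subprocess
from pathlib import Path

# The four scenario notebooks listed above, relative to the repository root.
RUN_NOTEBOOKS = [
    "notebooks/run/signal-planting.ipynb",
    "notebooks/run/signal-planting-fo.ipynb",
    "notebooks/run/signal-planting-N.ipynb",
    "notebooks/run/signal-unplanting.ipynb",
]


def nbconvert_command(notebook):
    """Build the command that executes one notebook and saves it in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_notebooks(notebooks, repo_root=Path("."), dry_run=True):
    """Print (dry run) or execute the nbconvert command for each notebook."""
    for notebook in notebooks:
        cmd = nbconvert_command(notebook)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first notebook that fails to execute.
            subprocess.run(cmd, cwd=repo_root, check=True)


if __name__ == "__main__":
    # Dry run only; call run_notebooks(RUN_NOTEBOOKS, dry_run=False)
    # from the repository root to actually execute the notebooks.
    run_notebooks(RUN_NOTEBOOKS)
```

Running the notebooks one at a time, in the scenario of interest, matches the instructions above; the loop is only a convenience for reproducing all four scenarios in one go.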
- Plot the Figures:
  - Execute the notebook in notebooks/plot/ to generate the figures.
  - The resulting figures are saved in the plots/ folder.
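The plotting notebook is not named in this README, so one way to locate it is to list the notebooks in notebooks/plot/. This is a small sketch under that assumption; find_notebooks is a hypothetical helper, and it only searches the top level of the folder, which also skips the .ipynb_checkpoints subfolder that Jupyter may create.

```python
from pathlib import Path


def find_notebooks(folder: Path) -> list:
    """Return the .ipynb files directly inside folder, sorted by name."""
    # Non-recursive glob: subfolders such as .ipynb_checkpoints are ignored.
    return sorted(folder.glob("*.ipynb"))


if __name__ == "__main__":
    # Assumes the script is launched from the repository root.
    for notebook in find_notebooks(Path("notebooks/plot")):
        print(notebook)
```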