A deep learning project focused on detecting media bias using a multi-task learning approach. This model adapts the MAGPIE architecture to create a more computationally efficient solution while maintaining high accuracy in bias detection. A demo interface of the model can be tried out here.
- Python 3.8 or higher
- CUDA-compatible GPU or CPU
- 8GB RAM minimum
- **Clone Repository**

  ```bash
  git clone https://github.com/heddels/applied-deep-learning-project-24.git
  cd applied-deep-learning-project-24
  ```
- **Set up Python Environment**

  ```bash
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  # OR
  venv\Scripts\activate      # Windows
  ```
- **Install Dependencies**

  ```bash
  pip install -r requirements.txt
  ```
- **Configure WandB**

  - Create an account at wandb.ai
  - Run `wandb login` and enter your API key
  - Set up a new project and link it to your repository
- **Run Scripts**

  - For testing the code:

    ```bash
    python scripts/tests/train_debug.py
    ```

  - For training the baseline model:

    ```bash
    python scripts/training_baseline/pre_finetune.py
    python scripts/training_baseline/finetune.py
    ```

  - For hyperparameter tuning:

    ```bash
    python scripts/hyperparameter_tuning/hyperparameter_tuning.py
    ```

  - For training the final model:

    ```bash
    python scripts/training_final_model/train_prefinetuning_v2.py
    python scripts/training_final_model/finetuning_BABE_final_model_robust.py
    ```
In this project, I aim to tackle the challenge of media bias detection, particularly in an age where information overload and cognitive biases make it increasingly difficult for individuals to critically analyze the media they consume. Drawing on established research in automated bias detection, I plan to build a model that helps identify potential biases in news articles, primarily as a reminder to remain critical without replacing personal judgment.
The project will be a mix of Bring Your Own Method and Bring Your Own Data. Existing models such as MAGPIE (Horych et al., 2024) and earlier work by Spinde et al. (2022) have already made significant advancements in automated media bias detection with deep learning models, particularly for English-language data. My approach involves adapting these models by simplifying the architecture to make it computationally more feasible while trying to preserve accuracy.
Key modifications include:
- Replacing MAGPIE's pre-trained model encoder RoBERTa with DistilBERT, which is more efficient and suitable for the available computational resources (see the sketch after this list).
- Simplifying and redesigning MAGPIE's architecture, leveraging the foundational work of Spinde et al. (2022), whose model uses a simpler framework.
- Reducing the number of datasets and tasks in the multitask learning setup to decrease the computational load.
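As a minimal sketch of the encoder swap, the DistilBERT backbone can be loaded via Hugging Face `transformers` (the checkpoint name below is the standard `distilbert-base-uncased`, assumed rather than taken from the project config):

```python
from transformers import AutoModel, AutoTokenizer

# DistilBERT has 6 transformer layers instead of RoBERTa-base's 12,
# roughly halving memory footprint and inference time.
backbone = AutoModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

inputs = tokenizer(
    "The senator's reckless plan will ruin the economy.",
    return_tensors="pt",
)
# Contextual embeddings that the task-specific heads consume downstream
hidden_states = backbone(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
```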
To present my work, I aim to build a simple interface where users can input text and receive bias detection results. Menzner & Leidner (2024) developed a similar interface using GPT-3.5. However, I will implement the architecture described above for a more targeted and resource-efficient bias detection solution.
As mentioned above, for the project to be feasible I need to reduce the data used in the multitask learning setup. I will therefore use the following datasets from the MAGPIE paper, which are the smallest datasets in each task family (except for the news bias task family):
| Task Family | Dataset | # Sentences | Task |
|---|---|---|---|
| Subjective bias | CW_HARD (Hube and Fetahu, 2019) | 6,843 | Binary Classification |
| News bias | BABE (Spinde et al., 2021c) | 3,672 | Binary Classification |
| Hate speech | MeTooMA (Gautam et al., 2020) | 7,388 | Binary Classification |
| Gender bias | GAP (Webster et al., 2018) | 4,373 | Binary Classification |
| Sentiment analysis | MDGender (Dinan et al., 2020) | 2,332 | Binary Classification |
| Fake news | MPQA (Wilson, 2008) | 5,508 | Token-Level Classification |
| Emotionality | GoodNewsEveryone (Bostan et al., 2020) | 4,428 | Token-Level Classification |
| Group bias | StereotypeDataset (Pujari et al., 2022) | 2,208 | Binary Classification |
| Stance detection | GWSD (Luo et al., 2020) | 2,010 | Multi-Class Classification |
The datasets are available in the MAGPIE repository, and I will use the same data preprocessing steps as in the original paper.
However, the final choice of datasets is subject to change based on the computational resources available and the model's performance during the hacking phase of the project.
The following table outlines the tasks and their respective time estimates and due dates for the project:
| Task | Time Estimate (hrs) | Due Date |
|---|---|---|
| Dataset Collection | 2 | 24.10.24 |
| Designing and Building an Appropriate Network | 15-20 | 17.11.24 |
| Training and Fine-tuning that Network | 20-25 | 17.12.24 |
| Building an Application to Present the Results | 15-20 | 05.01.25 |
| Writing the Final Report | 10 | 19.01.25 |
| Preparing the Presentation of Your Work | 5-10 | 28.01.25 |
- Horych, T., Wessel, M., Wahle, J. P., Ruas, T., Waßmuth, J., Greiner-Petter, A., Aizawa, A., Gipp, B., & Spinde, T. (2024). MAGPIE: Multi-task media-bias analysis generalization for pre-trained identification of expressions. Paper | Repository
- Menzner, T., & Leidner, J. L. (2024). BiasScanner: Automatic detection and classification of news bias to strengthen democracy. arXiv. Paper | Web Page
- Rodrigo-Ginés, F.-J., Carrillo-de-Albornoz, J., & Plaza, L. (2024). A systematic review on media bias detection: What is media bias, how it is expressed, and how to detect it. Expert Systems with Applications, 237, 121641. Paper
- Spinde, T., Hinterreiter, S., Haak, F., Ruas, T., Giese, H., Meuschke, N., & Gipp, B. (2024). The media bias taxonomy: A systematic literature review on the forms and automated detection of media bias. Paper
- Spinde, T., Krieger, J.-D., Ruas, T., Mitrović, J., Götz-Hahn, F., Aizawa, A., & Gipp, B. (2022). Exploiting transformer-based multitask learning for the detection of media bias in news articles. In Information for a Better World: Shaping the Global Future (pp. 225–235). Springer International Publishing. Paper | Repository
- Error Metric: Macro F1 Score
- Target Metric: 0.78 (Spinde et al., 2022)
Note: Details regarding the target are given in the Target Metric Specification section
- Achieved Metrics:
- 0.71 (Maximum of Baseline Model runs)
- 0.78 (Maximum after hyperparameter tuning)
Note: Full results and analysis are available in the Results section
- **Initial Setup (12h)**
  - Environment setup and MLFlow configuration with MLOps tutorial (6h)
  - Code understanding and repository analysis (6h)
- **First Implementation Attempt (30h)**
  - Data pipeline development (4h)
  - Baseline model notebook (16h)
  - Code modularization (10h)
- **Project Reset and Main Implementation (29h)**
  - New architecture setup (13h)
  - Training pipeline debugging (11h)
  - Baseline model training (4h)
  - Hyperparameter optimization setup (6h)
- **Running Experiments (~52h compute time)**
  - Pre-finetuning run for baseline (8h)
  - Finetuning across 30 seeds for baseline (16h)
  - Hyperparameter optimization (16h)
  - Final model training (12h)
The error metric for this project is the macro F1 score. The F1 score is the harmonic mean of precision and recall; the macro variant averages the per-class F1 scores with equal weight. It was chosen because it suits the classification problem at hand and is also the metric used in the MAGPIE paper, which is the basis for this project.
The target for my project, however, is not the one achieved by MAGPIE, since I chose a simpler setup. Instead, I take the target from the other MTL approach, by Spinde et al. (2022): a macro F1 score of 0.78 for the MTL model.
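As a small, self-contained illustration of the metric (the toy labels below are made up; scikit-learn is assumed to be available):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Macro averaging computes F1 per class and then takes the unweighted
# mean, so minority classes count as much as majority classes.
print(f1_score(y_true, y_pred, average="macro"))  # ~0.667
```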
Reference results from Spinde et al. (2022) and Horych et al. (2024) are reported in the respective papers.
The final model architecture is a simplified version of the MAGPIE model, using DistilBERT as the backbone in a multitask learning setup with nine datasets covering 11 subtasks. The implementation consists of the following components:
- Data processing and handling
- Tokenizer
- Model architecture components
- Training components
- Source-specific utilities
The code was taken from the MAGPIE repository and adapted to the chosen setting. Apart from that, error handling and logging were added, and the code was modularized in a slightly different way.
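The core idea (a shared encoder with one lightweight classification head per subtask) can be sketched as follows. This is a simplified illustration rather than the project's actual `model.py`; the class name and example task names are hypothetical:

```python
from typing import Dict

import torch.nn as nn
from transformers import AutoModel

class MultiTaskBiasModel(nn.Module):
    """Shared DistilBERT encoder with one classification head per subtask."""

    def __init__(self, task_num_labels: Dict[str, int]):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("distilbert-base-uncased")
        hidden = self.backbone.config.hidden_size  # 768 for DistilBERT
        # One linear head per subtask, e.g. {"babe": 2, "gwsd": 3, ...}
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, input_ids, attention_mask, task: str):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # embedding at the [CLS] position
        return self.heads[task](cls)       # logits for the selected subtask
```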
```
project_root/
├── README.md                    # Main documentation file
├── 2501_ADL_report.pdf          # Project report
├── requirements.txt             # Package dependencies
├── setup.py                     # Installation configuration
├── bias-detection-app/          # Code and requirements for the Streamlit app
├── datasets/                    # Raw and processed data files
├── src/                         # Source code directory
│   ├── tokenizer.py             # Text tokenization logic
│   ├── config/                  # Configuration files
│   │   └── config.py            # Model and training settings
│   ├── data/                    # Data handling
│   │   ├── __init__.py          # Package initialization
│   │   ├── task.py              # Task definitions
│   │   └── dataset.py           # Dataset operations
│   ├── model/                   # Model components
│   │   ├── model.py             # Main MTL implementation
│   │   ├── heads.py             # Task-specific layers
│   │   ├── model_factory.py     # Model creation
│   │   └── backbone.py          # Base DistilBERT model
│   ├── training/                # Training logic
│   │   ├── trainer.py           # Training loop
│   │   ├── logger.py            # WandB logging
│   │   ├── metrics.py           # Performance metrics
│   │   ├── gradient.py          # Gradient operations
│   │   └── training_utils.py    # Helper functions
│   └── utils/                   # Utility functions
│       ├── common.py            # Shared utilities
│       ├── enums.py             # Constants and enums
│       └── logger.py            # Debug logging
├── research/                    # Analysis notebooks
│   ├── magpie_repo_test.ipynb   # MAGPIE testing
│   └── updated_code_test.ipynb  # Updated code validation
└── scripts/                     # Execution scripts
    ├── tests/                   # Testing scripts
    │   ├── train_debug.py       # Single-step test
    │   └── full_train_debug.py  # Full pipeline test
    ├── training_baseline/       # Baseline training
    │   ├── pre_finetune.py      # Initial training
    │   └── finetune.py          # Fine-tuning
    ├── hyperparameter_tuning/   # Parameter optimization
    │   └── hyperparameter_tuning.py  # Grid search
    └── training_final_model/    # Production training
        ├── train_prefinetuning_v2.py  # Enhanced pre-training
        └── finetuning_BABE_final_model_robust.py  # Final model training
```
The process of training the model is as follows:
1. Data initialization (preprocessed data from the repository)
2. Train a baseline model with the hyperparameter settings of the MAGPIE paper:
   - pre-finetune the DistilBERT model on all datasets except the BABE dataset
   - finetune the model on the BABE dataset (subtask 1) and compare results across different random seeds
3. Perform hyperparameter tuning to find the optimal hyperparameters for the model (see the next chapter for details)
4. Train and evaluate the final model with the optimal hyperparameters
All training steps were run on my own machine (MacBook Air M2, 16 GB RAM, 8 cores), and the results were tracked with WandB, as sketched below.
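A minimal sketch of such a seed comparison with WandB tracking (`train_and_eval_on_babe` is a hypothetical stand-in for the project's finetuning step, and the project name is illustrative):

```python
import random

import numpy as np
import torch
import wandb

def set_seed(seed: int) -> None:
    """Make a run reproducible across Python, NumPy, and PyTorch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

for seed in range(30):  # the baseline was compared across 30 seeds
    set_seed(seed)
    run = wandb.init(project="media-bias-detection", name=f"finetune-seed-{seed}")
    test_f1 = train_and_eval_on_babe(seed=seed)  # placeholder training call
    run.log({"test_f1": test_f1})
    run.finish()
```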
- 100 steps for pre-finetuning; this should be increased, as the loss is still fluctuating considerably
- Some tasks perform very poorly, with F1 scores under 0.5:
  - MeTooMA (108, F1 < 0.1)
  - MDGender (116, F1 < 0.35)
  - Stereotype (109, subtask 2, F1 < 0.45)

Mean test F1: 68.77%
Max test F1: 71.43%

- 50 steps for finetuning; this should be increased, as the loss is still fluctuating considerably
Since the MAGPIE repository already includes hyperparameter tuning of the subtasks for:
- learning rate,
- max epochs, and
- early stopping patience,

I will use the hyperparameters from the MAGPIE paper for both pre-finetuning and finetuning. Compared to my baseline, I increase max_steps to 500 and set the warmup steps to 10% of max_steps for pre-finetuning.
For the finetuning step, I will do a hyperparameter optimization with a grid search for the following parameters:
- Dropout rate for regularization
- Batch size variations
- Warmup steps for learning rate scheduler
- Best configuration: dropout_rate: 0.1, batch_size: 64 for 01, warmup_steps: 100 (for 500 steps)
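A minimal sketch of such a grid search (the value grids are illustrative, and `run_finetuning` is a hypothetical stand-in for one finetuning run; the actual search lives in `scripts/hyperparameter_tuning/hyperparameter_tuning.py`):

```python
from itertools import product

# Illustrative search space, not the exact grid used in the project
grid = {
    "dropout_rate": [0.1, 0.2, 0.3],
    "batch_size": [16, 32, 64],
    "warmup_steps": [0, 50, 100],
}

best_f1, best_cfg = 0.0, None
for dropout, batch, warmup in product(*grid.values()):
    cfg = {"dropout_rate": dropout, "batch_size": batch, "warmup_steps": warmup}
    f1 = run_finetuning(**cfg)  # placeholder: trains and returns validation F1
    if f1 > best_f1:
        best_f1, best_cfg = f1, cfg

print(f"Best configuration: {best_cfg} (F1 = {best_f1:.4f})")
```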
With the optimal hyperparameters from the hyperparameter optimization, I trained the final model finetuning step over 10 random seeds.
| Metric | Mean | Std Dev | Min | Max |
|---|---|---|---|---|
| F1 Score | 0.7587 | 0.0108 | 0.7415 | 0.7773 |
| Accuracy | 0.7772 | 0.0098 | 0.7604 | 0.7946 |
| Loss | 0.5120 | 0.0134 | 0.4920 | 0.5282 |
- Did not build a proper CI pipeline (only manual testing)
- The config file for hyperparameters etc. is not in a good format and should live in a different place
- Better organization of the scripts would have made the code base easier to maintain
For the demo application, I built a simple Streamlit app that allows users to input text and receive bias detection results. To use the trained model, I uploaded the final model weights to huggingface.co.
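A minimal sketch of such an app (the model ID is a placeholder, not the actual published weights):

```python
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once per session, not on every rerun
def load_classifier():
    # "<user>/bias-detection-model" is a placeholder Hugging Face model ID
    return pipeline("text-classification", model="<user>/bias-detection-model")

st.title("Media Bias Detection")
text = st.text_area("Enter a sentence to analyze:")

if st.button("Detect bias") and text:
    result = load_classifier()(text)[0]
    st.write(f"Prediction: {result['label']} (score: {result['score']:.2f})")
```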
This project is licensed under the MIT License - see the LICENSE file for details.
Last updated: January 14, 2025