
Applied Deep Learning: Project Proposal

Project Status: Completed | Python 3.8+ | License: MIT

A deep learning project focused on detecting media bias using a multi-task learning approach. This model adapts the MAGPIE architecture to create a more computationally efficient solution while maintaining high accuracy in bias detection. A demo interface of the model can be tried out here.


🛠 Setup & Installation

Prerequisites

  • Python 3.8 or higher
  • CUDA-compatible GPU or CPU
  • 8GB RAM minimum

Step-by-Step Installation

  1. Clone Repository

    git clone https://github.com/heddels/applied-deep-learning-project-24.git
    cd applied-deep-learning-project-24
  2. Set up Python Environment

    python -m venv venv
    source venv/bin/activate  # Linux/Mac
    # OR
    venv\Scripts\activate     # Windows
  3. Install Dependencies

    pip install -r requirements.txt
  4. Configure WandB

    • Create account at wandb.ai
    • Run wandb login and enter your API key
    • Set up a new project and link it to your repository (a minimal logging sketch follows step 5)
  5. Run Scripts

    • For testing the code:
    python scripts/tests/train_debug.py
    • For training the baseline model:
     python scripts/training_baseline/pre_finetune.py
     python scripts/training_baseline/finetune.py
    • For hyperparameter tuning:
    python scripts/hyperparameter_tuning/hyperparameter_tuning.py
    • For training the final model:
    python scripts/training_final_model/train_prefinetuning_v2.py
    python scripts/training_final_model/finetuning_BABE_final_model_robust.py
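For step 4 above, here is a minimal sketch of how a training script might log to WandB. The project name, config values, and metric names are placeholders for illustration, not the identifiers the actual scripts use:

```python
import wandb

# Log in once per machine (or set the WANDB_API_KEY environment variable).
wandb.login()

# Placeholder project/config values for illustration only.
run = wandb.init(project="media-bias-detection", config={"lr": 5e-5})
run.log({"train/loss": 0.42})  # logged metrics appear live in the WandB dashboard
run.finish()
```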

📋 Project Overview & Proposal

In this project, I aim to tackle the challenge of media bias detection, particularly in an age where information overload and cognitive biases make it increasingly difficult for individuals to critically analyze the media they consume. Drawing on established research in automated bias detection, I plan to build a model that helps identify potential biases in news articles, primarily as a reminder to remain critical without replacing personal judgment.

Idea and Approach

The project will be a mix of Bring Your Own Method and Bring Your Own Data. Existing models, such as MAGPIE (Horych et al., 2024) and earlier work by Spinde et al. (2022), have already made significant advances in automated media bias detection with deep learning, particularly for English-language data. My approach involves adapting these models by simplifying the architecture to make it computationally more feasible while trying to preserve accuracy.

Key modifications include:

  • Replacing MAGPIE's pre-trained RoBERTa encoder with DistilBERT, which is more efficient and better suited to the available computational resources (see the snippet below this list).
  • Simplifying and redesigning MAGPIE's architecture, leveraging the foundational work of Spinde et al. (2022), whose model uses a simpler framework.
  • Reducing the number of datasets and tasks in the multi-task learning setup to decrease the computational load.
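With the Hugging Face transformers library, swapping the encoder is a small change. A minimal sketch using the standard Hub checkpoints (illustrative, not MAGPIE's actual loading code):

```python
from transformers import AutoModel, AutoTokenizer

# MAGPIE builds on a RoBERTa encoder; this project swaps in DistilBERT,
# which has roughly half the parameters and trains noticeably faster.
# backbone = AutoModel.from_pretrained("roberta-base")  # MAGPIE-style choice
backbone = AutoModel.from_pretrained("distilbert-base-uncased")  # this project
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```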

To present my work, I aim to build a simple interface where users can input text and receive bias detection results. Menzner & Leidner (2024) developed a similar interface using GPT-3.5. However, I will implement the architecture described above for a more targeted and resource-efficient bias detection solution.

The Datasets

As mentioned above, to keep the project feasible, I need to simplify the data used in the multi-task learning setup. I will therefore use the following datasets from the MAGPIE paper, taking the smallest dataset in each task family, except for the News bias task family:

| Task Family | Dataset | # Sentences | Task |
|---|---|---|---|
| Subjective bias | CW_HARD (Hube and Fetahu, 2019) | 6,843 | Binary Classification |
| News bias | BABE (Spinde et al., 2021c) | 3,672 | Binary Classification |
| Hate speech | MeTooMA (Gautam et al., 2020) | 7,388 | Binary Classification |
| Gender bias | GAP (Webster et al., 2018) | 4,373 | Binary Classification |
| Sentiment analysis | MDGender (Dinan et al., 2020) | 2,332 | Binary Classification |
| Fake news | MPQA (Wilson, 2008) | 5,508 | Token-Level Classification |
| Emotionality | GoodNewsEveryone (Bostan et al., 2020) | 4,428 | Token-Level Classification |
| Group bias | StereotypeDataset (Pujari et al., 2022) | 2,208 | Binary Classification |
| Stance detection | GWSD (Luo et al., 2020) | 2,010 | Multi-Class Classification |

The datasets are available in the MAGPIE repository, and I will use the same data preprocessing steps as in the original paper.

However, the final choice of datasets is subject to change based on the computational resources available and the model's performance during the hacking phase of the project.

Initial Work Breakdown Structure

The following table outlines the tasks and their respective time estimates and due dates for the project:

| Task | Time Estimate (hrs) | Due Date |
|---|---|---|
| Dataset Collection | 2 | 24.10.24 |
| Designing and Building an Appropriate Network | 15-20 | 17.11.24 |
| Training and Fine-tuning that Network | 20-25 | 17.12.24 |
| Building an Application to Present the Results | 15-20 | 05.01.25 |
| Writing the Final Report | 10 | 19.01.25 |
| Preparing the Presentation of Your Work | 5-10 | 28.01.25 |

📚 References

  1. Horych, T., Wessel, M., Wahle, J. P., Ruas, T., Waßmuth, J., Greiner-Petter, A., Aizawa, A., Gipp, B., & Spinde, T. (2024). MAGPIE: Multi-task media-bias analysis generalization for pre-trained identification of expressions. Paper | Repository

  2. Menzner, T., & Leidner, J. L. (2024). BiasScanner: Automatic detection and classification of news bias to strengthen democracy. arXiv. Paper | Web Page

  3. Rodrigo-Ginés, F.-J., Carrillo-de-Albornoz, J., & Plaza, L. (2024). A systematic review on media bias detection: What is media bias, how it is expressed, and how to detect it. Expert Systems with Applications, 237, 121641. Paper

  4. Spinde, T., Hinterreiter, S., Haak, F., Ruas, T., Giese, H., Meuschke, N., & Gipp, B. (2024). The media bias taxonomy: A systematic literature review on the forms and automated detection of media bias. Paper

  5. Spinde, T., Krieger, J.-D., Ruas, T., Mitrović, J., Götz-Hahn, F., Aizawa, A., & Gipp, B. (2022). Exploiting transformer-based multitask learning for the detection of media bias in news articles. In Information for a better world: Shaping the global future (pp. 225–235). Springer International Publishing. Paper | Repository


🔬 Hacking Phase Documentation

Brief Summary of Hacking Phase

  • Error Metric: Macro F1 Score
  • Target Metric: 0.78 (Spinde et al., 2022)

Note: Details regarding the target are in the Target Metric Specification section

  • Achieved Metrics:
    • 0.71 (Maximum of Baseline Model runs)
    • 0.78 (Maximum after hyperparameter tuning)

Note: Full results and analysis available in the Results section

Time Tracking

  1. Initial Setup (12h)

    • Environment setup and MLFlow configuration with MLOps tutorial (6h)
    • Code understanding and repository analysis (6h)
  2. First Implementation Attempt (30h)

    • Data pipeline development (4h)
    • Baseline model notebook (16h)
    • Code modularization (10h)
  3. Project Reset and Main Implementation (29h)

    • New architecture setup (13h)
    • Training pipeline debugging (11h)
    • Baseline model training (4h)
    • Hyperparameter optimization setup (6h)
  4. Running Experiments (~52h compute time)

    • Pre-finetuning run for baseline (8h)
    • Finetuning across 30 seeds for baseline (16h)
    • Hyperparameter optimization (16h)
    • Final model training (12h)

Target Metric Specification

The error metric for this project is the macro F1 score: the unweighted mean of the per-class F1 scores, where each class's F1 is the harmonic mean of its precision and recall. It was chosen because it suits the classification problem at hand and is also used in the MAGPIE paper, on which this project is based.

The target for my project, however, is not the score achieved by MAGPIE, since I chose a simpler setup. Instead, I take the target from the earlier MTL approach by Spinde et al. (2022): a macro F1 score of 0.78 for their MTL model.
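As a quick illustration of the metric on toy labels, using scikit-learn's standard f1_score:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# Macro F1: compute F1 per class, then take the unweighted mean,
# so the minority class counts as much as the majority class.
print(f1_score(y_true, y_pred, average="macro"))  # ~0.73
```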

Reference result tables from Spinde et al. (2022) and Horych et al. (2024) (figures from the papers, omitted here).

Final Model Architecture

The final model architecture is a simplified version of the MAGPIE model, using DistilBERT as the backbone and a multi-task learning setup with nine datasets comprising 11 subtasks. The model consists of the following components:

  • Data processing and handling
  • Tokenizer
  • Model architecture components
  • Training components
  • Source-specific utilities

The code was taken from the MAGPIE repository and adapted to the chosen setting. Apart from that, error handling and logging were added, and the code was modularized in a slightly different way.
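To make the shared-backbone/per-task-head pattern concrete, here is a minimal, self-contained sketch; the class, head, and task names are illustrative, not the identifiers used in src/model/:

```python
from typing import Dict

import torch.nn as nn
from transformers import AutoModel

class MultiTaskModel(nn.Module):
    """Shared DistilBERT backbone with one lightweight head per subtask."""

    def __init__(self, task_num_labels: Dict[str, int]):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("distilbert-base-uncased")
        hidden = self.backbone.config.hidden_size  # 768 for DistilBERT
        # One linear classification head per subtask, e.g. {"babe": 2, "gwsd": 3}.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in task_num_labels.items()}
        )

    def forward(self, task: str, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # first-token ([CLS]) representation
        return self.heads[task](cls)       # logits for the requested subtask
```

Token-level subtasks (e.g. MPQA, GoodNewsEveryone) would apply their head to every token representation rather than only the first token.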

Repository Structure

project_root/
├── README.md                        # Main documentation file
├── 2501_ADL_report.pdf              # Project report
├── requirements.txt                 # Package dependencies
├── setup.py                         # Installation configuration
├── bias-detection-app/              # Code and requirements for the Streamlit app
├── datasets/                        # Raw and processed data files
├── src/                             # Source code directory
│   │
│   ├── tokenizer.py                 # Text tokenization logic
│   ├── config/                      # Configuration files
│   │   └── config.py                # Model and training settings
│   ├── data/                        # Data handling
│   │   ├── __init__.py              # Package initialization
│   │   ├── task.py                  # Task definitions
│   │   └── dataset.py               # Dataset operations
│   │
│   ├── model/                       # Model components
│   │   ├── model.py                 # Main MTL implementation
│   │   ├── heads.py                 # Task-specific layers
│   │   ├── model_factory.py         # Model creation
│   │   └── backbone.py              # Base DistilBERT model
│   │
│   ├── training/                    # Training logic
│   │   ├── trainer.py               # Training loop
│   │   ├── logger.py                # WandB logging
│   │   ├── metrics.py               # Performance metrics
│   │   ├── gradient.py              # Gradient operations
│   │   └── training_utils.py        # Helper functions
│   │
│   └── utils/                       # Utility functions
│       ├── common.py                # Shared utilities
│       ├── enums.py                 # Constants and enums
│       └── logger.py                # Debug logging
│
├── research/                        # Analysis notebooks
│   ├── magpie_repo_test.ipynb       # MAGPIE testing
│   └── updated_code_test.ipynb      # Updated code validation
│
└── scripts/                         # Execution scripts
    │
    ├── tests/                       # Testing scripts
    │   ├── train_debug.py           # Single-step test
    │   └── full_train_debug.py      # Full pipeline test
    │
    ├── training_baseline/           # Baseline training
    │   ├── pre_finetune.py          # Initial training
    │   └── finetune.py              # Fine-tuning
    │
    ├── hyperparameter_tuning/       # Parameter optimization
    │   └── hyperparameter_tuning.py # Grid search
    │
    └── training_final_model/        # Production training
        ├── train_prefinetuning_v2.py             # Enhanced pre-training
        └── finetuning_BABE_final_model_robust.py # Final model training

Training and Evaluation

The process of training the model is as follows:

  1. Data Initialization (preprocessed from repository)

  2. Train a baseline model with the hyperparameter settings of the MAGPIE paper:

    • pre-finetune the DistilBERT model on all datasets except for the BABE dataset
    • finetune the model on the BABE dataset (Subtask 1) and compare results across different random seeds
  3. Perform hyperparameter tuning to find the optimal hyperparameters for the model (see the next section for details)

  4. Train and evaluate the final model with the optimal hyperparameters
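Schematically, the two-stage procedure looks like the loop below. Here, train_on and eval_f1 are hypothetical stand-ins for the routines in src/training/, stubbed out so the sketch runs on its own:

```python
import random
import statistics

import torch

def train_on(model, tasks, steps):
    """Hypothetical stand-in for the trainer in src/training/."""

def eval_f1(model, task):
    """Hypothetical stand-in for the metrics code; stubbed with noise."""
    return random.uniform(0.65, 0.75)

ALL_TASKS = ["babe", "cw_hard", "metooma", "gap"]  # abbreviated task list
model = None  # stand-in for the MTL model sketched above

# Stage 1: pre-finetune the shared backbone on every task except BABE.
train_on(model, tasks=[t for t in ALL_TASKS if t != "babe"], steps=100)

# Stage 2: finetune on BABE, restarting from the pre-finetuned checkpoint
# for each random seed, then compare test F1 across seeds.
scores = []
for seed in range(30):
    torch.manual_seed(seed)
    train_on(model, tasks=["babe"], steps=50)
    scores.append(eval_f1(model, "babe"))

print(f"mean F1: {statistics.mean(scores):.4f}, max F1: {max(scores):.4f}")
```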

Results

All training steps were run on my own machine (MacBook Air M2, 16 GB RAM, 8 cores), and the results were tracked with WandB.

Baseline Model

Pre-finetuning results (WandB plots, omitted here):

  • Pre-finetuning ran for only 100 steps; this should be increased, as the loss is still "moving around" quite a lot
  • Some tasks perform very poorly, with F1 scores under 0.5:
    • MeTooMA (108, F1 < 0.1)
    • MDGender (116, F1 < 0.35)
    • Stereotype (109 subtask 2, F1 < 0.45)
Finetuning with BABE (over 30 seeds; WandB plots omitted):

  • Mean test F1: 68.77%
  • Max test F1: 71.43%
  • Finetuning ran for only 50 steps; this should be increased, as the loss is still "moving around" quite a lot

Hyperparameter Tuning

The MAGPIE repository already includes subtask-level hyperparameter tuning for:

  • learning rate,
  • max epochs, and
  • early stopping patience.

I therefore reuse the hyperparameters from the MAGPIE paper for both pre-finetuning and finetuning. Compared to my baseline, I increase max_steps to 500 and set the warmup steps to 10% of max_steps for the pre-finetuning.

For the finetuning step, I perform a hyperparameter optimization via grid search over the following parameters (see the sketch after this list):

  • Dropout rate for regularization
  • Batch size variations
  • Warmup steps for learning rate scheduler
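A plain grid search over these three knobs is just an iteration over their Cartesian product. The value grids below are illustrative, and train_and_eval is a hypothetical stand-in for a full finetuning run:

```python
import itertools

# Illustrative search grids; the real ones live in the tuning script.
dropout_rates = [0.1, 0.2, 0.3]
batch_sizes = [32, 64]
warmup_steps = [50, 100]

def train_and_eval(dropout, batch_size, warmup):
    """Hypothetical stand-in: finetune with this config, return validation macro F1."""
    return 0.0

# Evaluate every combination and keep the best-scoring configuration.
best = max(
    itertools.product(dropout_rates, batch_sizes, warmup_steps),
    key=lambda cfg: train_and_eval(*cfg),
)
print("best (dropout, batch_size, warmup):", best)
```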

Results (WandB plots omitted):

  • Best configuration: dropout_rate: 0.1, batch_size: 64 for 01, warmup_steps: 100 (for 500 steps)

Final Model Results

With the optimal hyperparameters from the hyperparameter optimization, I trained the final model's finetuning step over 10 random seeds.

| Metric | Mean | Std Dev | Min | Max |
|---|---|---|---|---|
| F1 Score | 0.7587 | 0.0108 | 0.7415 | 0.7773 |
| Accuracy | 0.7772 | 0.0098 | 0.7604 | 0.7946 |
| Loss | 0.5120 | 0.0134 | 0.4920 | 0.5282 |

Open Issues

  • Did not build a proper CI pipeline (only manual testing)
  • The config file for hyperparameters etc. is not in a good format and should live in a different place
  • Better organization of the scripts would have helped in building the code base

🚀 Demo Application

For the demo application, I built a simple Streamlit app that lets users input text and receive bias detection results. To make the trained model available to the app, I uploaded the final model weights to huggingface.co.
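A minimal version of such an app fits in a few lines of Streamlit. The Hub repository id below is a placeholder, and the label order is an assumption to verify against the uploaded model's config:

```python
import streamlit as st
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "your-username/media-bias-distilbert"  # placeholder Hub repo id

@st.cache_resource  # load the model once per session, not on every rerun
def load_model():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    return tokenizer, model

st.title("Media Bias Detection")
text = st.text_area("Paste a sentence or short paragraph:")

if st.button("Analyze") and text:
    tokenizer, model = load_model()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    # Assumes index 1 = "biased"; check the model's config.id2label mapping.
    st.write(f"Biased: {probs[1].item():.1%} / Not biased: {probs[0].item():.1%}")
```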

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Last updated: January 14, 2025
