A deep learning project focused on detecting media bias using a multi-task learning approach. This model adapts the MAGPIE architecture to create a more computationally efficient solution while maintaining high accuracy in bias detection. A demo interface of the model can be tried out here.
- Python 3.8 or higher
- CUDA-compatible GPU or CPU
- 8GB RAM minimum
- **Clone Repository**

  ```bash
  git clone https://github.com/heddels/applied-deep-learning-project-24.git
  cd applied-deep-learning-project-24
  ```
- **Set up Python Environment**

  ```bash
  python -m venv venv
  source venv/bin/activate   # Linux/Mac
  # OR
  venv\Scripts\activate      # Windows
  ```
- **Install Dependencies**

  ```bash
  pip install -r requirements.txt
  ```
- **Configure WandB**

  - Create an account at wandb.ai
  - Run `wandb login` and enter your API key
  - Set up a new project and link it to your repository
- **Run Scripts**

  - For testing the code:

    ```bash
    python scripts/tests/train_debug.py
    ```

  - For training the baseline model:

    ```bash
    python scripts/training_baseline/pre_finetune.py
    python scripts/training_baseline/finetune.py
    ```

  - For hyperparameter tuning:

    ```bash
    python scripts/hyperparameter_tuning/hyperparameter_tuning.py
    ```

  - For training the final model:

    ```bash
    python scripts/training_final_model/train_prefinetuning_v2.py
    python scripts/training_final_model/finetuning_BABE_final_model_robust.py
    ```
In this project, I aim to tackle the challenge of media bias detection, particularly in an age where information overload and cognitive biases make it increasingly difficult for individuals to critically analyze the media they consume. Drawing on established research in automated bias detection, I plan to build a model that helps identify potential biases in news articles, primarily as a reminder to remain critical without replacing personal judgment.
The project will be a mix of Bring Your Own Method and Bring Your Own Data. Existing models such as MAGPIE (Horych et al., 2024) and earlier work by Spinde et al. (2022) have already made significant advancements in automated media bias detection with deep learning models, particularly for English-language data. My approach involves adapting these models by simplifying the architecture to make it computationally more feasible while trying to preserve accuracy.
Key modifications include:
- Replacing MAGPIE's pre-trained model encoder RoBERTa with DistilBERT, which is more efficient and suitable for the available computational resources (see the sketch after this list).
- Simplifying and redesigning MAGPIE's architecture, leveraging the foundational work of Spinde et al. (2022), whose model uses a simpler framework.
- Reducing the number of datasets and tasks in the multitask learning setup to decrease the computational load.
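As a minimal sketch of the encoder swap, the DistilBERT backbone can be loaded via Hugging Face `transformers` (the checkpoint name below is the standard `distilbert-base-uncased`, assumed rather than taken from the project config):

```python
from transformers import AutoModel, AutoTokenizer

# DistilBERT has 6 transformer layers instead of RoBERTa-base's 12,
# roughly halving memory footprint and inference time.
backbone = AutoModel.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

inputs = tokenizer(
    "The senator's reckless plan will ruin the economy.",
    return_tensors="pt",
)
# Contextual embeddings that the task-specific heads consume downstream
hidden_states = backbone(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
```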
To present my work, I aim to build a simple interface where users can input text and receive bias detection results. Menzner & Leidner (2024) developed a similar interface using GPT-3.5. However, I will implement the architecture described above for a more targeted and resource-efficient bias detection solution.
As mentioned above, for the project to be feasible I need to reduce the data used in the multitask learning setup. I will therefore use the following datasets from the MAGPIE paper, which are the smallest datasets in each task family (except for the news bias task family):
| Task Family | Dataset | # Sentences | Task |
|---|---|---|---|
| Subjective bias | CW_HARD (Hube and Fetahu, 2019) | 6,843 | Binary Classification |
| News bias | BABE (Spinde et al., 2021c) | 3,672 | Binary Classification |
| Hate speech | MeTooMA (Gautam et al., 2020) | 7,388 | Binary Classification |
| Gender bias | GAP (Webster et al., 2018) | 4,373 | Binary Classification |
| Sentiment analysis | MDGender (Dinan et al., 2020) | 2,332 | Binary Classification |
| Fake news | MPQA (Wilson, 2008) | 5,508 | Token-Level Classification |
| Emotionality | GoodNewsEveryone (Bostan et al., 2020) | 4,428 | Token-Level Classification |
| Group bias | StereotypeDataset (Pujari et al., 2022) | 2,208 | Binary Classification |
| Stance detection | GWSD (Luo et al., 2020) | 2,010 | Multi-Class Classification |
The datasets are available in the MAGPIE repository, and I will use the same data preprocessing steps as in the original paper.
However, the final choice of datasets is subject to change based on the computational resources available and the model's performance during the hacking phase of the project.
The following table outlines the tasks and their respective time estimates and due dates for the project:
| Task | Time Estimate (hrs) | Due Date |
|---|---|---|
| Dataset Collection | 2 | 24.10.24 |
| Designing and Building an Appropriate Network | 15-20 | 17.11.24 |
| Training and Fine-tuning that Network | 20-25 | 17.12.24 |
| Building an Application to Present the Results | 15-20 | 05.01.25 |
| Writing the Final Report | 10 | 19.01.25 |
| Preparing the Presentation of Your Work | 5-10 | 28.01.25 |
- Horych, T., Wessel, M., Wahle, J. P., Ruas, T., Waßmuth, J., Greiner-Petter, A., Aizawa, A., Gipp, B., & Spinde, T. (2024). MAGPIE: Multi-task media-bias analysis generalization for pre-trained identification of expressions. Paper | Repository
- Menzner, T., & Leidner, J. L. (2024). BiasScanner: Automatic detection and classification of news bias to strengthen democracy. arXiv. Paper | Web Page
- Rodrigo-Ginés, F.-J., Carrillo-de-Albornoz, J., & Plaza, L. (2024). A systematic review on media bias detection: What is media bias, how it is expressed, and how to detect it. Expert Systems with Applications, 237, 121641. Paper
- Spinde, T., Hinterreiter, S., Haak, F., Ruas, T., Giese, H., Meuschke, N., & Gipp, B. (2024). The media bias taxonomy: A systematic literature review on the forms and automated detection of media bias. Paper
- Spinde, T., Krieger, J.-D., Ruas, T., Mitrović, J., Götz-Hahn, F., Aizawa, A., & Gipp, B. (2022). Exploiting transformer-based multitask learning for the detection of media bias in news articles. In Information for a Better World: Shaping the Global Future (pp. 225–235). Springer International Publishing. Paper | Repository
- Error Metric: Macro F1 Score
- Target Metric: 0.78 (Spinde et al., 2022)
Note: Details regarding the target are given in the Target Metric Specification section
- Achieved Metrics:
- 0.71 (Maximum of Baseline Model runs)
- 0.78 (Maximum after hyperparameter tuning)
Note: Full results and analysis are available in the Results section
- **Initial Setup (12h)**
  - Environment setup and MLFlow configuration with MLOps tutorial (6h)
  - Code understanding and repository analysis (6h)
- **First Implementation Attempt (30h)**
  - Data pipeline development (4h)
  - Baseline model notebook (16h)
  - Code modularization (10h)
- **Project Reset and Main Implementation (29h)**
  - New architecture setup (13h)
  - Training pipeline debugging (11h)
  - Baseline model training (4h)
  - Hyperparameter optimization setup (6h)
- **Running Experiments (~52h compute time)**
  - Pre-finetuning run for baseline (8h)
  - Finetuning across 30 seeds for baseline (16h)
  - Hyperparameter optimization (16h)
  - Final model training (12h)
The error metric for this project is the macro F1 score. The F1 score is the harmonic mean of precision and recall; the macro variant averages the per-class F1 scores with equal weight. It was chosen because it suits the classification problem at hand and is also the metric used in the MAGPIE paper, which is the basis for this project.
The target for my project, however, is not the one achieved by MAGPIE, since I chose a simpler setup. Instead, I take the target from the other MTL approach, by Spinde et al. (2022): a macro F1 score of 0.78 for the MTL model.
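As a small, self-contained illustration of the metric (the toy labels below are made up; scikit-learn is assumed to be available):

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Macro averaging computes F1 per class and then takes the unweighted
# mean, so minority classes count as much as majority classes.
print(f1_score(y_true, y_pred, average="macro"))  # ~0.667
```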
Reference results from Spinde et al. (2022) and Horych et al. (2024) are reported in the respective papers.
The final model architecture is a simplified version of the MAGPIE model, using DistilBERT as the backbone in a multitask learning setup with nine datasets covering 11 subtasks. The implementation consists of the following components:
- Data processing and handling
- Tokenizer
- Model architecture components
- Training components
- Source-specific utilities
The code was taken from the MAGPIE repository and adapted to the chosen setting. Apart from that, error handling and logging were added, and the code was modularized in a slightly different way.
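The core idea (a shared encoder with one lightweight classification head per subtask) can be sketched as follows. This is a simplified illustration rather than the project's actual `model.py`; the class name and example task names are hypothetical:

```python
from typing import Dict

import torch.nn as nn
from transformers import AutoModel

class MultiTaskBiasModel(nn.Module):
    """Shared DistilBERT encoder with one classification head per subtask."""

    def __init__(self, task_num_labels: Dict[str, int]):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("distilbert-base-uncased")
        hidden = self.backbone.config.hidden_size  # 768 for DistilBERT
        # One linear head per subtask, e.g. {"babe": 2, "gwsd": 3, ...}
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()}
        )

    def forward(self, input_ids, attention_mask, task: str):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # embedding at the [CLS] position
        return self.heads[task](cls)       # logits for the selected subtask
```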
```
project_root/
├── README.md                    # Main documentation file
├── 2501_ADL_report.pdf          # Project report
├── requirements.txt             # Package dependencies
├── setup.py                     # Installation configuration
├── bias-detection-app/          # Code and requirements for the Streamlit app
├── datasets/                    # Raw and processed data files
├── src/                         # Source code directory
│   ├── tokenizer.py             # Text tokenization logic
│   ├── config/                  # Configuration files
│   │   └── config.py            # Model and training settings
│   ├── data/                    # Data handling
│   │   ├── __init__.py          # Package initialization
│   │   ├── task.py              # Task definitions
│   │   └── dataset.py           # Dataset operations
│   ├── model/                   # Model components
│   │   ├── model.py             # Main MTL implementation
│   │   ├── heads.py             # Task-specific layers
│   │   ├── model_factory.py     # Model creation
│   │   └── backbone.py          # Base DistilBERT model
│   ├── training/                # Training logic
│   │   ├── trainer.py           # Training loop
│   │   ├── logger.py            # WandB logging
│   │   ├── metrics.py           # Performance metrics
│   │   ├── gradient.py          # Gradient operations
│   │   └── training_utils.py    # Helper functions
│   └── utils/                   # Utility functions
│       ├── common.py            # Shared utilities
│       ├── enums.py             # Constants and enums
│       └── logger.py            # Debug logging
├── research/                    # Analysis notebooks
│   ├── magpie_repo_test.ipynb   # MAGPIE testing
│   └── updated_code_test.ipynb  # Updated code validation
└── scripts/                     # Execution scripts
    ├── tests/                   # Testing scripts
    │   ├── train_debug.py       # Single-step test
    │   └── full_train_debug.py  # Full pipeline test
    ├── training_baseline/       # Baseline training
    │   ├── pre_finetune.py      # Initial training
    │   └── finetune.py          # Fine-tuning
    ├── hyperparameter_tuning/   # Parameter optimization
    │   └── hyperparameter_tuning.py  # Grid search
    └── training_final_model/    # Production training
        ├── train_prefinetuning_v2.py  # Enhanced pre-training
        └── finetuning_BABE_final_model_robust.py  # Final model training
```
The process of training the model is as follows:
1. Data initialization (preprocessed data from the repository)
2. Train a baseline model with the hyperparameter settings of the MAGPIE paper:
   - pre-finetune the DistilBERT model on all datasets except the BABE dataset
   - finetune the model on the BABE dataset (subtask 1) and compare results across different random seeds
3. Perform hyperparameter tuning to find the optimal hyperparameters for the model (see the next chapter for details)
4. Train and evaluate the final model with the optimal hyperparameters
All training steps were run on my own machine (MacBook Air M2, 16 GB RAM, 8 cores), and the results were tracked with WandB, as sketched below.
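A minimal sketch of such a seed comparison with WandB tracking (`train_and_eval_on_babe` is a hypothetical stand-in for the project's finetuning step, and the project name is illustrative):

```python
import random

import numpy as np
import torch
import wandb

def set_seed(seed: int) -> None:
    """Make a run reproducible across Python, NumPy, and PyTorch."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

for seed in range(30):  # the baseline was compared across 30 seeds
    set_seed(seed)
    run = wandb.init(project="media-bias-detection", name=f"finetune-seed-{seed}")
    test_f1 = train_and_eval_on_babe(seed=seed)  # placeholder training call
    run.log({"test_f1": test_f1})
    run.finish()
```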
- 100 steps for pre-finetuning; this should be increased, as the loss is still fluctuating considerably
- Some tasks perform very poorly, with F1 scores under 0.5:
  - MeTooMA (108, F1 < 0.1)
  - MDGender (116, F1 < 0.35)
  - Stereotype (109, subtask 2, F1 < 0.45)

Mean test F1: 68.77%
Max test F1: 71.43%

- 50 steps for finetuning; this should be increased, as the loss is still fluctuating considerably
Since the MAGPIE repository already includes hyperparameter tuning of the subtasks for:
- learning rate,
- max epochs, and
- early stopping patience,

I will use the hyperparameters from the MAGPIE paper for both pre-finetuning and finetuning. Compared to my baseline, I increase max_steps to 500 and set the warmup steps to 10% of max_steps for pre-finetuning.
For the finetuning step, I will do a hyperparameter optimization with a grid search for the following parameters:
- Dropout rate for regularization
- Batch size variations
- Warmup steps for learning rate scheduler
- Best configuration: dropout_rate: 0.1, batch_size: 64 for 01, warmup_steps: 100 (for 500 steps)
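A minimal sketch of such a grid search (the value grids are illustrative, and `run_finetuning` is a hypothetical stand-in for one finetuning run; the actual search lives in `scripts/hyperparameter_tuning/hyperparameter_tuning.py`):

```python
from itertools import product

# Illustrative search space, not the exact grid used in the project
grid = {
    "dropout_rate": [0.1, 0.2, 0.3],
    "batch_size": [16, 32, 64],
    "warmup_steps": [0, 50, 100],
}

best_f1, best_cfg = 0.0, None
for dropout, batch, warmup in product(*grid.values()):
    cfg = {"dropout_rate": dropout, "batch_size": batch, "warmup_steps": warmup}
    f1 = run_finetuning(**cfg)  # placeholder: trains and returns validation F1
    if f1 > best_f1:
        best_f1, best_cfg = f1, cfg

print(f"Best configuration: {best_cfg} (F1 = {best_f1:.4f})")
```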
With the optimal hyperparameters from the hyperparameter optimization, I trained the final model finetuning step over 10 random seeds.
| Metric | Mean | Std Dev | Min | Max |
|---|---|---|---|---|
| F1 Score | 0.7587 | 0.0108 | 0.7415 | 0.7773 |
| Accuracy | 0.7772 | 0.0098 | 0.7604 | 0.7946 |
| Loss | 0.5120 | 0.0134 | 0.4920 | 0.5282 |
- Did not build a proper CI pipeline (only manual testing)
- The config file for hyperparameters etc. is not in a good format and should live in a different place
- Better organization of the scripts would have made the code base easier to maintain
For the demo application, I built a simple Streamlit app that allows users to input text and receive bias detection results. To use the trained model, I uploaded the final model weights to huggingface.co.
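A minimal sketch of such an app (the model ID is a placeholder, not the actual published weights):

```python
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once per session, not on every rerun
def load_classifier():
    # "<user>/bias-detection-model" is a placeholder Hugging Face model ID
    return pipeline("text-classification", model="<user>/bias-detection-model")

st.title("Media Bias Detection")
text = st.text_area("Enter a sentence to analyze:")

if st.button("Detect bias") and text:
    result = load_classifier()(text)[0]
    st.write(f"Prediction: {result['label']} (score: {result['score']:.2f})")
```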
This project is licensed under the MIT License - see the LICENSE file for details.
Last updated: January 14, 2025