This repo contains both files from the original LLMs4Subjects repository and our own files 📂
We explore three approaches of increasing complexity:
- Embedding-based retrieval: Generate embeddings for both documents and tags using an encoder LLM, compute cosine similarity, and assign the top-k highest-scoring tags (see the sketch right after this list). 🧩
- Fine-tuned embedding model: Fine-tune a transformer-based encoder, then apply the embedding-based retrieval approach above (a fine-tuning sketch follows the results table). 🔧
- Binary classification model: Train a binary MLP that, given a document and a tag, predicts a similarity score, then select the top-k highest-scoring tags (see the sketch after the file tree). 🧠
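A minimal sketch of the embedding-based retrieval approach using the sentence-transformers library; the model name, the example texts, and the value of k are illustrative assumptions, not the settings used in this repo.

```python
from sentence_transformers import SentenceTransformer, util

# Hypothetical encoder choice; not necessarily the model used in this repo.
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = ["A study of deep learning methods for medical imaging."]
tags = ["machine learning", "medicine", "astronomy", "computer vision"]

# Encode documents and tags into the same embedding space.
doc_emb = model.encode(documents, convert_to_tensor=True)
tag_emb = model.encode(tags, convert_to_tensor=True)

# Cosine similarity between every document and every tag: (num_docs, num_tags).
scores = util.cos_sim(doc_emb, tag_emb)

k = 2  # illustrative value
for i, doc in enumerate(documents):
    top = scores[i].topk(k)
    print(doc, "->", [tags[j] for j in top.indices.tolist()])
```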
Below, we list only the files we created or modified:
```
├── images/                          # Directory containing images used in the project 🖼️
├── results/                         # Directory where results are stored 📊
├── main.ipynb                       # Main Jupyter notebook for running experiments 📓
├── embedding_similarity_tagging.py  # Script for tagging using embedding similarity 🏷️
├── finetune_sentence_transformer.py # Script for fine-tuning a sentence transformer model 🔧
├── binary_classifier.py             # Script for defining the multi-layer perceptron 🧠
├── binary_mlp.py                    # Script for training the multi-layer perceptron 🏋️
├── performances.py                  # Script for evaluating model performance 📈
├── plots.py                         # Script for generating plots 📉
├── README.md                        # Project documentation 📃
└── requirements.txt                 # List of dependencies required for the project 📋
```
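The binary classification approach (cf. binary_classifier.py and binary_mlp.py) can be sketched in PyTorch as below; the class name, layer sizes, and embedding dimension are hypothetical choices, not the repo's actual configuration.

```python
import torch
import torch.nn as nn

class TagClassifierMLP(nn.Module):  # hypothetical name
    """Scores a (document, tag) embedding pair with a single logit."""

    def __init__(self, emb_dim: int = 384, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden_dim),  # concatenated doc + tag embeddings
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),            # one logit per (document, tag) pair
        )

    def forward(self, doc_emb: torch.Tensor, tag_emb: torch.Tensor) -> torch.Tensor:
        x = torch.cat([doc_emb, tag_emb], dim=-1)
        return self.net(x).squeeze(-1)

# Score one document against four candidate tags, then keep the top-k.
model = TagClassifierMLP()
doc = torch.randn(1, 384).expand(4, -1)  # same document embedding, repeated per tag
tag_cands = torch.randn(4, 384)          # four candidate tag embeddings
scores = torch.sigmoid(model(doc, tag_cands))
print(scores.topk(k=2).indices)          # indices of the 2 highest-scoring tags
```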
The best results are obtained by fine-tuning for three epochs with MultipleNegativesRankingLoss (a sketch of this setup follows the table). 🔄
| k  | Precision (%) | Recall (%) | F1 Score (%) |
|----|---------------|------------|--------------|
| 5  | 9.74          | 21.70      | 13.28        |
| 10 | 6.30          | 27.12      | 10.14        |
| 15 | 4.73          | 29.75      | 8.11         |
| 20 | 3.87          | 32.16      | 6.88         |
| 25 | 3.30          | 33.89      | 5.98         |
| 30 | 2.91          | 35.68      | 5.36         |
| 35 | 2.61          | 36.99      | 4.86         |
| 40 | 2.36          | 37.83      | 4.43         |
| 45 | 2.16          | 38.92      | 4.08         |
| 50 | 1.99          | 39.76      | 3.87         |
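For reference, a minimal sketch of how such fine-tuning can be set up with the sentence-transformers library. The base model, the example pairs, the batch size, and the warmup steps are illustrative assumptions; the loss and the three epochs match the setting reported above.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Assumed base model; replace with the encoder actually fine-tuned in this repo.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Positive (document, gold tag) pairs; other pairs in a batch act as negatives.
train_examples = [
    InputExample(texts=["A study of deep learning for medical imaging.", "medicine"]),
    InputExample(texts=["An introduction to stellar evolution.", "astronomy"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# MultipleNegativesRankingLoss uses in-batch negatives for contrastive training.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,          # the best-performing setting reported above
    warmup_steps=100,  # illustrative value
)
```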
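The metrics in the table can be computed per document as in the sketch below and then averaged over the test set; this is an illustrative implementation, not the actual performances.py.

```python
def precision_recall_f1_at_k(predicted, gold, k):
    """Precision, recall, and F1 for one document's top-k predicted tags."""
    top_k = predicted[:k]
    hits = sum(1 for tag in top_k if tag in gold)
    precision = hits / k
    recall = hits / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1_at_k(
    predicted=["machine learning", "medicine", "astronomy"],
    gold={"machine learning", "computer vision"},
    k=3,
)
print(f"P@3={p:.2%}  R@3={r:.2%}  F1@3={f1:.2%}")  # P@3=33.33%  R@3=50.00%  F1@3=40.00%
```

This also makes the trend in the table expected: precision falls as k grows because the denominator is k, while recall keeps rising as more gold tags are eventually retrieved.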