Official implementation of *Revisiting Hierarchical Text Classification: Inference and Metrics* (CoNLL 2024).
Based on the HiTIN repository.
Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multi-label classification problem and therefore evaluate it as such. We instead propose to evaluate models with specifically designed hierarchical metrics, and we demonstrate the intricacy of the choice of metric and of prediction inference method. We introduce a new challenging dataset and fairly evaluate recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC.
We present Hierarchical Wikivitals, a novel high-quality HTC dataset, extracted from Wikipedia. Equipped with a deep and complex hierarchy, it provides a harder challenge.
The conditional probability is computed as follows:
The loss function is defined as:
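The two formulas were rendered as images in the original README. The sketch below gives the conditional softmax formulation in LaTeX; the notation (logits $s_n$, parent $\pi(n)$, children set $\mathcal{C}(\cdot)$, gold label set $\mathcal{Y}(x)$) is chosen here for illustration and may not match the article's exact symbols:

```latex
% Conditional probability of a node n given its parent pi(n):
% a softmax restricted to the siblings of n, i.e. the children of pi(n).
P\bigl(n \mid \pi(n), x\bigr)
  = \frac{\exp\bigl(s_n(x)\bigr)}{\sum_{n' \in \mathcal{C}(\pi(n))} \exp\bigl(s_{n'}(x)\bigr)}

% Loss: negative log-likelihood of the gold label set,
% summing conditional log-probabilities along the hierarchy.
\mathcal{L}\bigl(x, \mathcal{Y}(x)\bigr)
  = - \sum_{n \in \mathcal{Y}(x)} \log P\bigl(n \mid \pi(n), x\bigr)
```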
For full details regarding the derivation and implications of these formulas, please refer to the article.
We quantitatively evaluate HTC methods based on specifically designed hierarchical metrics and with a rigorous methodology.
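As an illustration of what a hierarchical metric looks like, here is a short sketch of set-augmented hierarchical F1, a standard metric in the HTC literature where gold and predicted label sets are expanded with their ancestors before computing F1. This is an illustrative implementation with our own naming, not necessarily the exact metric definitions used in the article:

```python
def ancestors(label, parent):
    """Walk up a child -> parent map, collecting every ancestor of `label`."""
    out = set()
    while label in parent:
        label = parent[label]
        out.add(label)
    return out

def hierarchical_f1(gold, pred, parent):
    """Hierarchical F1: F1 over label sets augmented with their ancestors.

    Note: whether the root node is excluded varies across papers; this
    sketch keeps every node on the path for simplicity.
    """
    aug = lambda labels: set(labels) | {a for l in labels for a in ancestors(l, parent)}
    g, p = aug(gold), aug(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    prec, rec = tp / len(p), tp / len(g)
    return 2 * prec * rec / (prec + rec)

# Toy hierarchy: Root -> Science -> {Physics, Chemistry}
parent = {"Physics": "Science", "Chemistry": "Science", "Science": "Root"}
# A wrong leaf still earns partial credit for the shared ancestors.
print(hierarchical_f1({"Physics"}, {"Chemistry"}, parent))  # -> 0.666...
```

The key property, and the reason flat multi-label F1 can be misleading, is that an error on a deep leaf sharing most of its path with the gold label is penalized less than a prediction in a completely different subtree.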
- Clone the repository:

```shell
git clone https://github.com/RomanPlaud/revisitingHTC.git
cd revisitingHTC
```

- Create and activate the conda environment:

```shell
conda create -n revisiting_htc_env --file revisiting-htc.txt
conda activate revisiting_htc_env
```
Our newly introduced dataset is available here. Feel free to use it for your experiments. This dataset is released under the MIT License.
Obtain the RCV1, WOS, and BGC datasets by referring to:
Ensure the datasets match the following format:

```json
{
    "token": ["Sample input text"],
    "label": ["Category", "Subcategory", "Further Subcategory"]
}
```
In addition, a taxonomy file (such as hwv.taxonomy) is required, where each line lists a parent category followed by its children, separated by tabs. Ensure that all labels used in the dataset are covered by the taxonomy.
Example:

```
Root	Science	Technology	Arts
Science	Physics	Chemistry	Biology
```
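The coverage requirement above can be sanity-checked with a short script. This is an illustrative sketch, not part of the repository; the helper names and the in-memory demo data are our own:

```python
def read_taxonomy(path):
    """Parse a tab-separated taxonomy file into a parent -> children dict."""
    taxonomy = {}
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if parts and parts[0]:
                taxonomy[parts[0]] = parts[1:]
    return taxonomy

def missing_labels(samples, taxonomy):
    """Return the set of dataset labels absent from the taxonomy."""
    known = set(taxonomy) | {c for children in taxonomy.values() for c in children}
    used = {label for sample in samples for label in sample["label"]}
    return used - known

# Demo with the example taxonomy from above, inlined instead of read from disk.
taxonomy = {"Root": ["Science", "Technology", "Arts"],
            "Science": ["Physics", "Chemistry", "Biology"]}
samples = [{"token": ["Sample input text"], "label": ["Science", "Physics"]}]
print(missing_labels(samples, taxonomy))  # -> set()
```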
To accelerate the training process, you can tokenize your dataset. Below are the instructions for tokenizing the HWV dataset:
```shell
python3 tokenize_dataset.py \
    --data_train_path data/HWV/hwv_train.json \
    --data_test_path data/HWV/hwv_test.json \
    --data_valid_path data/HWV/hwv_val.json \
    --config_file data/HWV/config_hwv.json
```
To reproduce the results of our article, execute the following command:
```shell
bash bash_files/hwv/train_hwv_hitin_cond_softmax_la.sh
```
You may also use any other bash file contained in the bash_files folder.
Note: If your dataset is not tokenized, please set "tokenized" to false in the config file and update the paths to the dataset accordingly.
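For illustration, the change might look like the fragment below; apart from `"tokenized"`, the key names are hypothetical and depend on your actual config file:

```json
{
    "tokenized": false,
    "data_train_path": "data/HWV/hwv_train.json"
}
```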
To evaluate a trained model, run:

```shell
python3 evaluate.py \
    --config_file configs/aaaa_final_hwv/vanilla_bert_hwv_conditional_softmax_la.json \
    --model_file ckpt/1001_1653_vanilla_bert_hwv_conditional_softmax_la/best_micro_Origin \
    --output_file results_hwv_conditional_softmax_la.json
```
This project and its dataset are released under the MIT License.
```bibtex
@inproceedings{plaud-etal-2024-revisiting,
title = "Revisiting Hierarchical Text Classification: Inference and Metrics",
author = "Plaud, Roman and
Labeau, Matthieu and
Saillenfest, Antoine and
Bonald, Thomas",
editor = "Barak, Libby and
Alikhani, Malihe",
booktitle = "Proceedings of the 28th Conference on Computational Natural Language Learning",
month = nov,
year = "2024",
address = "Miami, FL, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.conll-1.18",
doi = "10.18653/v1/2024.conll-1.18",
pages = "231--242",
abstract = "Hierarchical text classification (HTC) is the task of assigning labels to a text within a structured space organized as a hierarchy. Recent works treat HTC as a conventional multilabel classification problem, therefore evaluating it as such. We instead propose to evaluate models based on specifically designed hierarchical metrics and we demonstrate the intricacy of metric choice and prediction inference method. We introduce a new challenging dataset and we evaluate fairly, recent sophisticated models, comparing them with a range of simple but strong baselines, including a new theoretically motivated loss. Finally, we show that those baselines are very often competitive with the latest models. This highlights the importance of carefully considering the evaluation methodology when proposing new methods for HTC",
}
```