This repository contains the implementation associated with the paper *Klironomos A., Zhou B., Zheng Z., Gad-Elrab M., Paulheim H., Kharlamov E.: "ReaLitE: Enrichment of Relation Embeddings in Knowledge Graphs using Numeric Literals"*, accepted at ESWC 2025. The paper presents a novel Knowledge Graph Embedding (KGE) model, named ReaLitE, that demonstrates improved performance compared to the state of the art in both link prediction and node classification tasks.
Our contributions are summarized below:

- We propose **ReaLitE**, an approach that can be combined with any vanilla KGE method that uses relation embeddings. In addition, we demonstrate the integration of our method into existing KGE frameworks, highlighting its versatility and ease of adoption.
- We experiment with different methods of aggregating numeric literals, including an automated method to learn a combination of multiple aggregation types.
- We evaluate **ReaLitE** extensively and compare it with the state of the art on two tasks: link prediction and node classification. For the former, we evaluate on the standard link prediction setting, along with a more granular relation-focused evaluation. The results show that our approach is comparable or superior to state-of-the-art methods, particularly on relations with highly correlated numeric literals and on long-tail relations.
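To build intuition for the second contribution, the sketch below shows one way a learned combination of multiple aggregation types could work. This is a minimal NumPy sketch for illustration only, not the repository's implementation; all names are hypothetical. Several fixed aggregators summarize a relation's numeric literals, and softmax-normalized learnable weights mix the summaries into a single value.

```python
import numpy as np

def softmax(w):
    """Numerically stable softmax over a 1-D weight vector."""
    e = np.exp(w - w.max())
    return e / e.sum()

def combine_aggregations(literals, weights):
    """Mix several aggregation types of numeric literals using learned weights.

    literals: 1-D array of numeric literal values observed for a relation
    weights:  1-D array of 4 learnable scores, one per aggregation type
    """
    aggregates = np.array([
        literals.mean(),  # mean aggregation
        literals.min(),   # min aggregation
        literals.max(),   # max aggregation
        literals.std(),   # std aggregation
    ])
    # Softmax turns the raw scores into a convex combination of aggregates.
    return float(softmax(weights) @ aggregates)

# With zero (i.e. equal) weights, this reduces to a plain average of the
# four aggregate values.
vals = np.array([1.0, 2.0, 3.0])
combined = combine_aggregations(vals, np.zeros(4))
```

In training, the `weights` vector would be a learnable parameter updated by gradient descent alongside the embeddings, so the model itself discovers which aggregation types matter for each relation.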
## Dependencies

- Dependencies for the outsourced and modified codebase in `MKGA/`
  - Details in the original repo
- Dependencies for the codebase in `experiments/`
  - Python 3.10.13
  - Poetry 1.7.1 (https://python-poetry.org/docs/#installation)
  - The remaining dependencies are listed in `pyproject.toml` and are installed using `poetry` (see section Preparation).
## Repository Structure

This project consists of a main codebase (in `experiments/`) and two outsourced codebases. The top-level structure is shown below:

```
├── MKGA                  # outsourced and modified codebase for training and testing KGE models on node classification
├── pykeen-with-realite   # outsourced and modified codebase for training and testing KGE models on link prediction, incl. proposed KGE model
└── experiments           # codebase for link prediction evaluation as presented in the paper
```
A more detailed structure of the main codebase (`experiments/`):

```
├── datasets                                  # download locations for datasets used for link prediction
│   ├── fb15k237                              # FB15k-237 dataset enhanced with numeric literals
│   └── yago15k                               # YAGO15k dataset enhanced with numeric literals
├── results                                   # link prediction results for ReaLitE incl. the best found configurations
│   ├── FB15K237Literal
│   └── YAGO15KLiteral
└── model_evaluation                          # scripts for extended link prediction evaluation
    └── trained_model_tests                   # results of additional link prediction tests
        ├── test_with_relation_filter_kga     # results of relation-focused link prediction tests for existing KGE models
        └── test_with_relation_filter_pykeen  # results of relation-focused link prediction tests for ReaLitE
```
The implementation of **ReaLitE** can be found in `pykeen-with-realite/src/pykeen/models/multimodal/` in the following files:

```
├── base.py
├── complex_realite_variations.py
├── conve_realite_variations.py
├── distmult_realite_variations.py
├── rotate_realite_variations.py
├── transe_realite_variations.py
└── tucker_realite_variations.py
```
## Preparation

- Run `poetry install`
- For downloading the datasets and preparing the KGA codebase, follow the instructions in the `Preparation_README.md` file.
## Link Prediction Overall Evaluation

```
python experiments/reproduce_pipeline.py <dataset> <model>
```

- Options for `<dataset>`: `YAGO15KLiteral` or `FB15K237Literal`
- Options for `<model>`: `TransEReaLitE`, `DistMultReaLitE`, `ComplExReaLitE`, `RotatEReaLitE`, or `TuckERReaLitE`
## Link Prediction Relation-Focused Evaluation

For the existing KGE models trained with KGA:

- Activate Environment: Ensure you are in the Python environment for the main **ReaLitE** project, not the KGA environment.
- Navigate: Change your current directory to the root of the **ReaLitE** project.
- Run: Execute the following script. It internally calls the KGA environment (using the `KGA_ENV_PYTHON_PATH` you configured) to perform tests using the trained KGA models.

  ```
  python experiments/model_evaluation/relation_focused_test_kga.py
  ```
The following steps require a trained **ReaLitE** model. Training should be done using either the configurations from the paper (see Link Prediction Overall Evaluation) or the PyKEEN interface.

Note: Step 1 will generate (and overwrite) the `experiments/model_evaluation/best_runs.csv` file, so these steps should be repeated for each dataset.
1. Identify Best Model Instances: Find the best trained model instances per vanilla KGE model on a specific dataset.

   ```
   python experiments/model_evaluation/best_runs_finder.py <dataset>
   ```

   Options for `<dataset>`: `YAGO15KLiteral` or `FB15K237Literal`
2. Perform Relation-Focused Testing: Execute relation-specific tests using the identified model instances.

   Note: This step is configured for the `YAGO15KLiteral` dataset. To run it for other datasets, you need to modify the `YAGO15K_RELS_WITH_HIGH_LITERAL_CORR`, `YAGO15K_SYMMETRIC_RELS`, and `DATASET_CLASS_TO_EVALUATE` variables in the script.

   ```
   python experiments/model_evaluation/relation_focused_test_pykeen.py
   ```
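Conceptually, relation-focused testing restricts ranking metrics to the test triples of a single relation. The snippet below is an illustrative sketch of that idea in plain Python (the helper names are hypothetical; this is not the code used by the scripts above):

```python
def mean_reciprocal_rank(ranks):
    """MRR over a list of integer ranks (1 = correct entity ranked first)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def relation_focused_mrr(ranked_results, relation):
    """Compute MRR only over test triples whose relation matches `relation`.

    ranked_results: list of (relation, rank) pairs produced by a
    link-prediction evaluation run.
    """
    ranks = [rank for rel, rank in ranked_results if rel == relation]
    return mean_reciprocal_rank(ranks)

results = [("bornIn", 1), ("bornIn", 4), ("marriedTo", 2)]
mrr = relation_focused_mrr(results, "bornIn")  # (1/1 + 1/4) / 2 = 0.625
```

Evaluating per relation like this is what makes differences on long-tail or highly literal-correlated relations visible, which an overall average would hide.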
## Node Classification Evaluation

- Configure: Modify the contents of `MKGA/config/multiple_realite.yaml`. Choose the dataset(s) and **ReaLitE** model variation(s) you want to evaluate by tweaking the `dataload` and `embed` properties.
- Overwrite: Copy the contents of `MKGA/config/multiple_realite.yaml` and use them to overwrite the contents of `MKGA/config/multiple.yaml`.
- Activate Environment: Switch to your Python environment created for the `MKGA/` codebase.
- Navigate: Change your current directory to `MKGA/src/` (adjust the path relative to your current location if needed):

  ```
  cd MKGA/src/
  ```

- Run: Execute the auto-evaluation script:

  ```
  python autoevaluate.py
  ```
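The Overwrite step above amounts to a single copy command, assuming a POSIX shell and the repository root as the working directory (the first two lines only create a placeholder so the sketch can also be tried outside a repository checkout; inside the repo the real file already exists):

```shell
# Create a placeholder config only when trying this outside the repo checkout.
mkdir -p MKGA/config
[ -f MKGA/config/multiple_realite.yaml ] || echo "placeholder" > MKGA/config/multiple_realite.yaml
# The actual "Overwrite" step: replace the default config with the ReaLitE one.
cp MKGA/config/multiple_realite.yaml MKGA/config/multiple.yaml
```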
## License

This software is open-sourced under the AGPL-3.0 license. See the LICENSE file for details. For a list of open-source components included in this project, see the file `3rd-party-licenses.txt`.
## Citation

If you use our software in your scientific work, please cite our paper:

```
@article{klironomos2025realite,
  title={ReaLitE: Enrichment of Relation Embeddings in Knowledge Graphs using Numeric Literals},
  author={Klironomos, Antonis and Zhou, Baifan and Zheng, Zhuoxun and Gad-Elrab, Mohamed and Paulheim, Heiko and Kharlamov, Evgeny},
  journal={arXiv preprint arXiv:2504.00852},
  year={2025}
}
```