This repository contains the official implementation of our SIGIR 2025 paper:
Lightweight and Direct Document Relevance Optimization for Generative IR (DDRO)
- Optimizing Generative Retrieval with Ranking-Aligned Objectives
This repository is actively under development; changes and improvements may land frequently. Thanks for your patience, and stay tuned for updates!
- Motivation
- What DDRO Does
- Learning Objectives
- Setup & Dependencies
- Steps to Reproduce
- Preprocessed Data & Model Checkpoints
- Evaluate Pre-trained Models from HuggingFace
- Citation
Misalignment in Learning Objectives:
Gen-IR models are typically trained via next-token prediction (cross-entropy loss) over docid tokens.
While effective for language modeling, this objective:
- Optimizes token-level generation
- Is not designed for document-level ranking
As a result, Gen-IR models are not directly optimized for learning-to-rank, which is the core requirement in IR systems.
In this work, we ask:
How can Gen-IR models directly learn to rank documents, instead of just predicting the next token?
We propose DDRO:
Lightweight and Direct Document Relevance Optimization for Gen-IR
- Aligns training objective with ranking by using pairwise preference learning
- Trains the model to prefer relevant documents over non-relevant ones
- Bridges the gap between autoregressive training and ranking-based optimization
- Requires no reinforcement learning or reward modeling
We optimize DDRO in two phases:

Phase 1 (Supervised Fine-Tuning, SFT): learn to generate the correct docid sequence for a given query by minimizing the autoregressive token-level cross-entropy loss, i.e., maximize the likelihood of the correct docid given the query.
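Written out, with $\pi_\theta$ denoting the model being trained (symbols as defined for Phase 2 below), this is the standard sequence-to-sequence cross-entropy. The formula is reconstructed from the description above, so the paper's exact notation may differ:

$$
\mathcal{L}_{\text{SFT}} = -\,\mathbb{E}_{(q,\,\mathrm{docid})}\Big[\textstyle\sum_{t=1}^{|\mathrm{docid}|} \log \pi_\theta\big(\mathrm{docid}_t \mid \mathrm{docid}_{<t},\, q\big)\Big]
$$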
Phase 2 (Direct Document Relevance Optimization): this phase improves the ranking quality of generated document identifiers by applying a pairwise learning-to-rank objective inspired by Direct Preference Optimization (DPO).
Reference: Rafailov et al. (2023), "Direct Preference Optimization: Your Language Model is Secretly a Reward Model".
This Direct Document Relevance Optimization (DDRO) loss guides the model to prefer relevant documents (docid⁺) over non-relevant ones (docid⁻) by comparing how both the current model and a frozen reference model score each document:

- docid⁺: a relevant document for the query q
- docid⁻: a non-relevant or less relevant document
- $\pi_\theta$: the current model being optimized
- $\pi^{\text{ref}}$: a frozen reference model (typically the SFT model trained in Phase 1)
- $\beta$: a temperature-like factor controlling sensitivity
- $\sigma$: the sigmoid function, mapping scores to the [0, 1] preference space
Encourage the model to rank the relevant docid⁺ higher than the non-relevant docid⁻.
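Using the symbols above, the objective takes the standard DPO form adapted to docid pairs. This equation is reconstructed from those definitions and Rafailov et al. (2023), so it may differ cosmetically from the paper's formulation:

$$
\mathcal{L}_{\text{DDRO}} = -\,\mathbb{E}_{(q,\,\mathrm{docid}^{+},\,\mathrm{docid}^{-})}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(\mathrm{docid}^{+}\mid q)}{\pi^{\text{ref}}(\mathrm{docid}^{+}\mid q)} - \beta \log \frac{\pi_\theta(\mathrm{docid}^{-}\mid q)}{\pi^{\text{ref}}(\mathrm{docid}^{-}\mid q)}\right)\right]
$$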
The DPO loss is used after the SFT phase to fine-tune the ranking behavior of the model. Instead of just generating a docid, the model now learns to rank docid⁺ higher than docid⁻ in a relevance/preference-aligned manner.
- Directly encourages higher generation scores for relevant documents
- Uses contrastive ranking rather than token-level generation
- Avoids reward modeling or RL while remaining efficient and scalable
While our optimization is inspired by the DPO framework (Rafailov et al., 2023), its adaptation to Generative Document Retrieval is non-trivial:
- In contrast to open-ended preference alignment, our task involves structured docid generation under beam decoding constraints
- Our model uses an encoder-decoder architecture rather than decoder-only
- The objective is document-level ranking, not open-ended preference generation
This required novel integration of preference optimization into retrieval-specific pipelines, making DDRO uniquely suited for GenIR.
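For intuition on the beam decoding constraints mentioned above: valid docids are typically enforced at inference time with a prefix trie, so beam search can only emit token sequences that correspond to real documents. The sketch below is a minimal illustration of that idea using the `prefix_allowed_tokens_fn` hook of `generate()` in transformers; it is not the repository's `src/utils` trie implementation, and the query and docid strings are illustrative placeholders.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

class DocidTrie:
    """Minimal prefix trie over docid token-id sequences."""
    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def next_tokens(self, prefix):
        node = self.root
        for tok in prefix:
            node = node.get(tok)
            if node is None:
                return []  # prefix not in the trie -> no valid continuation
        return list(node.keys())

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Toy docid vocabulary; in DDRO these are the encoded document identifiers.
docid_token_ids = [
    tokenizer("doc-123", add_special_tokens=False).input_ids + [tokenizer.eos_token_id],
    tokenizer("doc-456", add_special_tokens=False).input_ids + [tokenizer.eos_token_id],
]
trie = DocidTrie(docid_token_ids)

def allowed_tokens(batch_id, generated_ids):
    # generated_ids starts with the decoder start token; skip it for the trie lookup.
    allowed = trie.next_tokens(generated_ids.tolist()[1:])
    return allowed or [tokenizer.eos_token_id]

query = tokenizer("what is generative retrieval", return_tensors="pt")
beams = model.generate(**query, num_beams=5, num_return_sequences=5,
                       max_new_tokens=16, prefix_allowed_tokens_fn=allowed_tokens)
print(tokenizer.batch_decode(beams, skip_special_tokens=True))
```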
src/
├── data/            # Data downloading, preprocessing, and docid instance generation
├── pretrain/        # DDRO model training and evaluation logic (incl. ddro)
├── scripts/         # Entry-point shell scripts for SFT, ddro, BM25, and preprocessing
├── utils/           # Core utilities (tokenization, trie, metrics, trainers)
├── ddro.yml         # Conda environment (for training DDRO)
├── pyserini.yml     # Conda environment (for BM25 retrieval with Pyserini)
├── README.md        # You're here!
└── requirements.txt # Additional Python dependencies
Each subdirectory includes a detailed README.md with instructions.
Clone the repository and create the conda environment:
git clone https://github.com/kidist-amde/ddro.git
cd ddro
conda env create -f ddro_env.yml
conda activate ddro_env
We use the MS MARCO document (top-300K) and Natural Questions (NQ-320K) datasets, and a pretrained T5 model.
To download them, run the following commands from the project root (ddro/):
bash ./src/data/download/download_msmarco_datasets.sh
bash ./src/data/download/download_nq_datasets.sh
python ./src/data/download/download_t5_model.py
For details and download links, refer to: src/data/download/README.md
DDRO is evaluated on both the Natural Questions (NQ) and MS MARCO datasets.

Sample Top-300K MS MARCO Subset: run the following script to preprocess and extract the top-300K most relevant MS MARCO documents based on qrels:
bash scripts/preprocess/sample_top_docs.sh
- This will generate: resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz (sentence-tokenized JSONL format, ranked by relevance frequency)
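If you want to sanity-check the sampled corpus, it can be read as a gzipped JSON-lines file. The sketch below only assumes one JSON object per line and makes no assumptions about the field names:

```python
import gzip
import json

path = "resources/datasets/processed/msmarco-docs-sents.top.300k.json.gz"

with gzip.open(path, "rt", encoding="utf-8") as f:
    for i, line in enumerate(f):
        doc = json.loads(line)
        print(doc.keys())  # inspect the available fields
        if i == 2:         # only peek at the first few records
            break
```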
Once everything is downloaded and processed, your resources/ directory should look like this:
resources/
├── datasets/
│   ├── raw/
│   │   ├── msmarco-data/ # Raw MS MARCO dataset
│   │   └── nq-data/      # Raw Natural Questions dataset
│   └── processed/        # Preprocessed outputs
└── transformer_models/
    └── t5-base/          # Local copy of T5 model & tokenizer
To process and sample both datasets, generate document IDs, and prepare training/evaluation instances, please refer to the corresponding README in src/data.
We first train a Supervised Fine-Tuning (SFT) model using next-token prediction across three stages:

- Pretraining on document content (doc → docid)
- Search pretraining on pseudo queries (pseudoquery → docid)
- Finetuning on real queries using supervised pairs from qrels, with gold docids (query → docid)
This results in a seed model trained to autoregressively generate document identifiers.
You can run all stages with a single command:
bash ddro/src/scripts/sft/launch_SFT_training.sh
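Conceptually, each SFT stage reduces to ordinary sequence-to-sequence cross-entropy where the target sequence is the docid string. Below is a minimal, self-contained illustration with a T5 backbone and made-up query/docid strings; the repository's actual data pipeline and trainers live under src/:

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Toy (input -> docid) pair; in Phase 1 the input is a document, a pseudo query,
# or a real query, and the target is its encoded docid.
inputs = tokenizer("what is generative retrieval", return_tensors="pt")
labels = tokenizer("doc-123", return_tensors="pt").input_ids

# Token-level cross-entropy over the docid sequence (next-token prediction).
loss = model(**inputs, labels=labels).loss
loss.backward()
print(float(loss))
```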
After training the SFT model (Phase 1), we apply Phase 2, Direct Document Relevance Optimization, which fine-tunes the model with a pairwise ranking objective that trains it to prefer relevant documents over non-relevant ones.
This bridges the gap between autoregressive generation and ranking-based retrieval.
We implement this using a custom version of Hugging Face's DPOTrainer.
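For intuition, the pairwise update compares sequence-level log-likelihoods of docid⁺ and docid⁻ under the trained policy and the frozen SFT reference. The sketch below is a stripped-down illustration of that computation (assuming unpadded single sequences), not the repository's customized DPOTrainer:

```python
import torch
import torch.nn.functional as F

def sequence_logprob(model, inputs, labels):
    """Sum of token log-probs of `labels` given `inputs` for a seq2seq model.

    Assumes `labels` contains no padding / -100 entries."""
    logits = model(**inputs, labels=labels).logits               # (B, T, vocab)
    logps = torch.log_softmax(logits, dim=-1)
    return logps.gather(-1, labels.unsqueeze(-1)).squeeze(-1).sum(-1)

def ddro_pairwise_loss(policy, reference, query_inputs, pos_docid, neg_docid, beta=0.1):
    pi_pos = sequence_logprob(policy, query_inputs, pos_docid)
    pi_neg = sequence_logprob(policy, query_inputs, neg_docid)
    with torch.no_grad():                                        # reference model stays frozen
        ref_pos = sequence_logprob(reference, query_inputs, pos_docid)
        ref_neg = sequence_logprob(reference, query_inputs, neg_docid)
    # -log sigmoid(beta * ((pi+ - ref+) - (pi- - ref-)))
    margin = beta * ((pi_pos - ref_pos) - (pi_neg - ref_neg))
    return -F.logsigmoid(margin).mean()
```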
Run DDRO training and evaluation:
bash scripts/ddro/slurm_submit_ddro_training.sh
bash scripts/ddro/slurm_submit_ddro_eval.sh
You can directly evaluate our published models without training from scratch:
- kiyam/ddro-msmarco-pq: MS MARCO with PQ encoding
- kiyam/ddro-msmarco-tu: MS MARCO with Title+URL encoding
- kiyam/ddro-nq-pq: Natural Questions with PQ encoding
- kiyam/ddro-nq-tu: Natural Questions with Title+URL encoding
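If you just want to poke at a checkpoint interactively, it can presumably be loaded with the standard transformers auto classes; this is an assumption about the published checkpoints, and the supported path remains the evaluation script below, which also handles docid constraints and ranking metrics.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumption: the published checkpoints are compatible with the seq2seq auto classes.
model_id = "kiyam/ddro-msmarco-tu"   # any of the checkpoints listed above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

query = tokenizer("what is generative retrieval", return_tensors="pt")
docids = model.generate(**query, num_beams=5, num_return_sequences=5, max_new_tokens=64)
print(tokenizer.batch_decode(docids, skip_special_tokens=True))
```

For the full ranking evaluation (with docid constraints and metrics), use the provided scripts: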
# For SLURM clusters:
sbatch src/pretrain/hf_eval/slurm_submit_hf_eval.sh
# Or run directly:
encoding="url_title" # Choose from: "url_title", "pq"
python src/pretrain/hf_eval/eval_hf_docid_ranking.py \
--per_gpu_batch_size 4 \
--log_path logs/msmarco/dpo_HF_url.log \
--pretrain_model_path kiyam/ddro-msmarco-tu \
--docid_path resources/datasets/processed/msmarco-data/encoded_docid/${encoding}_docid.txt \
--test_file_path resources/datasets/processed/msmarco-data/eval_data/query_dev.${encoding}.jsonl \
--dataset_script_dir src/data/data_scripts \
--dataset_cache_dir ./cache \
--num_beams 15 \
--add_doc_num 6144 \
--max_seq_length 64 \
--max_docid_length 100 \
--use_docid_rank True \
--docid_format msmarco \
--lookup_fallback True \
--device cuda:0
- --encoding: use "url_title" or "pq" to match your model type
- --docid_format: use "msmarco" or "nq" depending on the dataset
- --pretrain_model_path: the HuggingFace model you want to evaluate
You can use our pre-generated encoded document IDs from HuggingFace Datasets to skip the data preparation step.
Evaluation logs and metrics are saved to logs/ and outputs/.
We evaluate DDRO on two standard retrieval benchmarks: MS MARCO document (top-300K) and Natural Questions (NQ-320K).
All datasets, pseudo queries, docid encodings, and model checkpoints are available here:
DDRO Generative IR Collection on Hugging Face
We gratefully acknowledge the following open-source projects:
This project is licensed under the Apache 2.0 License.
@inproceedings{mekonnen2025lightweight,
title={Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval},
author={Mekonnen, Kidist Amde and Tang, Yubao and de Rijke, Maarten},
booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages={1327--1338},
year={2025}
}
For questions, please open an issue.
© 2025 Kidist Amde Mekonnen · Made with ❤️ at IRLab, University of Amsterdam.