- Added results for an additional model size (7B) under more metrics (F1, Cover EM).
- Added a quick start for the Gradio demo and quick inference. Refer to Quick Start.
- The homepage is available [Here].
- The paper is available on [arXiv].
- Checkpoints are released on [🤗HuggingFace].
Official implementation of the paper "Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs".
AutoRefine is an RL post-training framework that adopts a new "search-and-refine-during-think" paradigm. It introduces:
- explicit knowledge refinement steps between successive search calls, enabling the model to iteratively filter, distill, and organize evidence before generating an answer;
- tailored retrieval-specific rewards alongside answer-correctness rewards to guide the model's search behavior (a conceptual sketch of the resulting rollout loop is shown below).
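To make the paradigm concrete, here is a minimal, illustrative sketch of the inference-time rollout loop. The tag names (`<search>`, `<documents>`, `<refine>`, `<answer>`) and the `generate`/`retrieve` helpers are assumptions for illustration only, not the exact implementation in this repository.

```python
import re

def generate(prompt: str) -> str:
    """Placeholder for an LLM call that stops after </search> or </answer>."""
    raise NotImplementedError

def retrieve(query: str) -> str:
    """Placeholder for a call to the local retrieval server (see retrieval_launch.sh)."""
    raise NotImplementedError

def rollout(question: str, max_turns: int = 4) -> str:
    trajectory = question
    for _ in range(max_turns):
        step = generate(trajectory)  # think, then either search or answer
        trajectory += step
        answer = re.search(r"<answer>(.*?)</answer>", step, re.DOTALL)
        if answer:  # terminal answer reached
            return answer.group(1).strip()
        query = re.search(r"<search>(.*?)</search>", step, re.DOTALL)
        if query:
            docs = retrieve(query.group(1).strip())
            # Retrieved evidence is appended to the context; the next generation
            # step is expected to open with an explicit <refine> block that
            # distills this evidence before searching again or answering.
            trajectory += f"<documents>{docs}</documents>"
    return ""
```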
Main Environment
The environment for training/testing AutoRefine can be built by running:
conda create -n autorefine python=3.9
conda activate autorefine
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip3 install vllm==0.5.4
# build verl
pip install -e .
# flash attention 2
pip install flash-attn==2.7.0.post2
pip install wandb
Retrieval Environment
This environment is for the local retrieval server.
conda create -n faiss_env python=3.10
conda activate faiss_env
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
pip install uvicorn fastapi
To quickly test the model, you can run the demo script:
- Start the retrieval server:
conda activate faiss_env
bash retrieval_launch.sh
Please refer to the Retrieval Corpus section below to prepare the retrieval corpus; the download should not take long with a good internet connection.
- Run the demo script:
conda activate autorefine
python demo.py
This will start a Gradio interface where you can input questions and see the model's responses.
If you prefer local inference without the Gradio interface, you can run the inference script directly:
conda activate autorefine
python infer.py
This will print the model's response to the console. You may modify the infer.py script to change the input question or adjust the model parameters.
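For example, the kind of edits you might make are sketched below; the variable names are hypothetical placeholders, and the actual names used in infer.py may differ.

```python
# Hypothetical example of edits to infer.py; the real variable names may differ.
question = "Who proposed the theory of general relativity?"  # input question
sampling_params = {
    "temperature": 0.7,       # decoding randomness
    "max_new_tokens": 1024,   # generation length cap
}
```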
Retrieval Corpus
Download the prebuilt index and the Wikipedia corpus, then assemble them:
save_path=./data
python preprocess/download.py --save_path $save_path
cat $save_path/part_* > $save_path/e5_Flat.index
gzip -d $save_path/wiki-18.jsonl.gz
We download the data for model training/evaluation from the FlashRAG Collection.
To download and build the dataset, run:
bash preprocess/scripts/data_process.sh
This will merge the training sets of NQ and HotpotQA as the training data, and merge the test/dev sets of nq, triviaqa, popqa, hotpotqa, 2wikimultihopqa, musique, and bamboogle as the test set.
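Conceptually, the merge amounts to something like the sketch below; the file paths are hypothetical placeholders rather than the script's actual layout.

```python
# Conceptual sketch of the merge performed by preprocess/scripts/data_process.sh.
# The jsonl paths below are hypothetical placeholders.
from datasets import load_dataset, concatenate_datasets

train = concatenate_datasets([
    load_dataset("json", data_files="data/nq/train.jsonl", split="train"),
    load_dataset("json", data_files="data/hotpotqa/train.jsonl", split="train"),
])
train.to_parquet("data/train.parquet")
```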
Before running the code for training/evaluation, you need to launch the retrieval server first:
conda activate faiss_env
bash retrieval_launch.sh
This will start a server listening on http://127.0.0.1:8000/retrieve.
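You can sanity-check the server with a small request. The payload fields below ("queries", "topk", "return_scores") follow a Search-R1-style retrieval API and are assumptions; adjust them if the server expects a different schema.

```python
import requests

# Query the local retrieval server started by retrieval_launch.sh.
resp = requests.post(
    "http://127.0.0.1:8000/retrieve",
    json={"queries": ["Who wrote The Origin of Species?"], "topk": 3, "return_scores": True},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```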
To reproduce the results in the paper (Table 1), run the following command for training:
conda activate autorefine
bash cmd/train.sh
The script above will train the model for 300 steps while saving the checkpoints with (1) the highest reward and (2) the highest evaluation accuracy.
If you want to log the results to wandb, you may set the wandb_token and WAND_PROJECT variables in the scripts to your wandb token and preferred project name.
For evaluation, run:
conda activate autorefine
bash cmd/eval.sh
This project is built upon the foundational work of VeRL and Search-R1. We sincerely thank the authors of these projects for their valuable contributions, which have significantly supported and inspired our work.
Thanks to Search-R1 for mentioning our work Here.
@article{AutoRefine,
title={Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs},
author={Shi, Yaorui and Li, Shihan and Wu, Chang and Liu, Zhiyuan and Fang, Junfeng and Cai, Hengxing and Zhang, An and Wang, Xiang},
journal={arXiv preprint arXiv:2505.11277},
year={2025}
}