[IJCAI 2025] The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection
MAEDet uses self-supervised pre-training with a Masked Autoencoder (MAE) to learn better text representations from unlabeled data, an approach aimed at the diverse challenges of scene text detection.
This implementation is based on MMOCR v1.0.1. Please install the official repository first, then replace the following three directories with the provided code:
mmocr
├── configs
├── mmocr
└── tools
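The directory replacement described above can be scripted. The following is a minimal sketch, assuming the provided code and the official MMOCR checkout both live on the local filesystem; the function name `replace_dirs` and the example paths are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path

# The three directories named in this README.
REPLACED_DIRS = ("configs", "mmocr", "tools")


def replace_dirs(provided_root, mmocr_root, names=REPLACED_DIRS):
    """Replace each named directory under mmocr_root with the copy from provided_root."""
    provided_root, mmocr_root = Path(provided_root), Path(mmocr_root)
    # Check everything up front so a missing source never leaves a half-replaced tree.
    missing = [n for n in names if not (provided_root / n).is_dir()]
    if missing:
        raise FileNotFoundError(f"missing in {provided_root}: {missing}")
    for name in names:
        target = mmocr_root / name
        if target.exists():
            shutil.rmtree(target)  # drop the upstream copy
        shutil.copytree(provided_root / name, target)
    return [mmocr_root / n for n in names]


# Hypothetical paths; adjust to your own layout before running.
# replace_dirs("path/to/this_repo/mmocr", "path/to/official/mmocr")
```

The upstream directories are deleted rather than merged, so back up any local changes in the official checkout first.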
- MARIO-LAION-OCR [download]
- ICDAR2013 [download]
- ICDAR2015 [download]
- COCO-Text [download]
- Total-Text [download]
- MLT2017 [download]
- MLT2019 [download]
- ArT [download]
- LSVT [download]
- TextOCR [download]
Please adjust the dataset settings in configs/textdet/_base_/datasets/english_scene_text.py according to your environment. Refer to Dataset Preparation Guide for detailed instructions.
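For orientation, a dataset entry in the usual MMOCR 1.x base-config style looks roughly like the sketch below. This is an assumption based on the upstream ICDAR2015 config; the actual `english_scene_text.py` in this repository may be organized differently, and the `data/icdar2015` root and annotation file names are placeholders to replace with your own paths.

```python
# Placeholder root; point this at wherever the dataset lives on your machine.
icdar2015_textdet_data_root = 'data/icdar2015'

# Training split: annotation file path is relative to data_root.
icdar2015_textdet_train = dict(
    type='OCRDataset',
    data_root=icdar2015_textdet_data_root,
    ann_file='textdet_train.json',
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=None)  # the pipeline is normally filled in by the model config

# Test split: test_mode=True disables training-time filtering.
icdar2015_textdet_test = dict(
    type='OCRDataset',
    data_root=icdar2015_textdet_data_root,
    ann_file='textdet_test.json',
    test_mode=True,
    pipeline=None)
```

In most setups only the data root and annotation file names need to change.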
cd MAE-std
CUDA_VISIBLE_DEVICES=<gpu_ids> python -m torch.distributed.launch --nproc_per_node=<gpu_num> std_pretrain.py
cd mmocr
CUDA_VISIBLE_DEVICES=<gpu_ids> bash tools/dist_train.sh configs/textdet/dbnetpp/dbnetpp_vit_w_pretrain.py <save_dir> <gpu_num>
See document for more usage.
cd mmocr
CUDA_VISIBLE_DEVICES=<gpu_id> python tools/test.py configs/textdet/dbnetpp/dbnetpp_vit_w_pretrain.py <checkpoint_path> --eval hmean-iou
See document for more usage.
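To make the `hmean-iou` metric concrete, here is a simplified sketch of how precision, recall, and their harmonic mean can be derived from IoU matching. It is an illustration only, not MMOCR's implementation: it assumes axis-aligned boxes instead of polygons, a greedy one-to-one match, and an IoU threshold of 0.5, and it ignores "don't care" regions. The names `box_iou` and `hmean_iou` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hmean_iou(preds: List[Box], gts: List[Box], iou_thr: float = 0.5):
    """Return (precision, recall, hmean) using greedy one-to-one IoU matching."""
    matched_gt = set()
    true_pos = 0
    for pred in preds:
        for gt_idx, gt in enumerate(gts):
            # Each ground-truth box may be matched at most once.
            if gt_idx not in matched_gt and box_iou(pred, gt) >= iou_thr:
                matched_gt.add(gt_idx)
                true_pos += 1
                break
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    denom = precision + recall
    hmean = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, hmean


preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10), (21, 21, 31, 31)]
precision, recall, hmean = hmean_iou(preds, gts)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
# precision=0.667 recall=1.000 hmean=0.800
```

Two of the three predictions match a ground-truth box, so precision is 2/3 while recall is 1, and the harmonic mean penalizes the imbalance between them.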
LTB poses 13 long-tailed challenges that comprehensively evaluate the capabilities of scene text detectors in real-world scenarios. The benchmark includes 924 carefully curated images and 2770 text instances that are hard to detect.
We hope LTB will inspire the development of more robust text detection algorithms and facilitate research into unified approaches for tackling these diverse and complex problems.
Download LTB and read instructions.
If you find MAEDet or LTB useful for your research, please kindly cite using this BibTeX:
@inproceedings{cao2025devil,
title={The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection},
author={Cao, Tian-Jiao and Lyu, Jia-Hao and Zeng, Wei-Chao and Mu, Wei-Min and Zhou, Yu},
booktitle={Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence},
year={2025}
}
This work is heavily based on MAE and MMOCR. Thanks to all the authors for their great work.