[IJCAI 2025] The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection

MAEDet

MAEDet uses self-supervised learning with a Masked Autoencoder (MAE) to learn better text representations from unlabeled data, which helps it address the diverse challenges of scene text detection.

🛠️ Prerequisites

Python > 3.7
PyTorch > 1.7
CUDA > 10.2
MMOCR 1.0.1 [Source]
CLIP [Source]

This implementation is based on MMOCR v1.0.1. Please install the official repository first, then replace the following three directories with the provided code:

mmocr
├── configs
├── mmocr
└── tools
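The replacement step above can be scripted. The following is a minimal sketch, assuming the provided code sits in one directory that contains `configs`, `mmocr`, and `tools`, and that MMOCR v1.0.1 is checked out elsewhere. The function name `overlay_directories` and both path arguments are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path

# The three directories the README asks you to replace.
OVERLAY_DIRS = ("configs", "mmocr", "tools")


def overlay_directories(provided_root, mmocr_root, names=OVERLAY_DIRS):
    """Replace each named directory under mmocr_root with the provided copy.

    provided_root and mmocr_root are hypothetical paths chosen by the user.
    """
    provided_root = Path(provided_root)
    mmocr_root = Path(mmocr_root)
    replaced = []
    for name in names:
        source = provided_root / name
        target = mmocr_root / name
        if not source.is_dir():
            raise FileNotFoundError(f"missing provided directory: {source}")
        # Remove the official directory first so no stale files remain.
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(source, target)
        replaced.append(target)
    return replaced
```

A call such as `overlay_directories("MAEDet/mmocr", "path/to/official/mmocr")` would then perform the replacement. Both paths are placeholders to adapt to your checkout.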

🎒 Data Preparation

SSL Pretraining

MARIO-LAION-OCR [download]

Joint Training

ICDAR2013 [download]
ICDAR2015 [download]
COCO-Text [download]
Total-Text [download]
MLT2017 [download]
MLT2019 [download]
ArT [download]
LSVT [download]
TextOCR [download]

Config

Please adjust the dataset settings in configs/textdet/_base_/datasets/english_scene_text.py according to your environment. Refer to Dataset Preparation Guide for detailed instructions.
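As a rough illustration of what such a dataset setting looks like, here is a sketch in the style of the stock MMOCR 1.x dataset configs. It assumes that `english_scene_text.py` follows the same layout, which is not confirmed here. The ICDAR2015 variable names, the `data/icdar2015` path, and the annotation file names are illustrative and should be replaced with the values for your environment.

```python
# Hypothetical dataset location; point this at your own copy of the data.
icdar2015_textdet_data_root = 'data/icdar2015'

# Training split: an MMOCR 1.x style dataset entry.
icdar2015_textdet_train = dict(
    type='OCRDataset',
    data_root=icdar2015_textdet_data_root,
    ann_file='textdet_train.json',  # annotation file name is an assumption
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=None)  # the pipeline is normally filled in by the model config

# Test split: same root, evaluated in test mode.
icdar2015_textdet_test = dict(
    type='OCRDataset',
    data_root=icdar2015_textdet_data_root,
    ann_file='textdet_test.json',  # annotation file name is an assumption
    test_mode=True,
    pipeline=None)
```

In most cases only the `data_root` and `ann_file` values need to change to match where the datasets were downloaded and converted.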

🖥️ Usage

SSL Pretraining

cd MAE-std
CUDA_VISIBLE_DEVICES=<gpu_ids> python -m torch.distributed.launch --nproc_per_node=<gpu_num> std_pretrain.py

Joint Training

cd mmocr
CUDA_VISIBLE_DEVICES=<gpu_ids> bash tools/dist_train.sh configs/textdet/dbnetpp/dbnetpp_vit_w_pretrain.py <save_dir> <gpu_num>

See the documentation for more usage options.

Evaluation

cd mmocr
CUDA_VISIBLE_DEVICES=<gpu_id> python tools/test.py configs/textdet/dbnetpp/dbnetpp_vit_w_pretrain.py <checkpoint_path> --eval hmean-iou

See the documentation for more usage options.
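For readers unfamiliar with the `hmean-iou` metric named in the evaluation command, the sketch below shows how the reported score is derived once detections have been matched to ground truth. It assumes the matching step (pairing predicted and ground-truth text regions by IoU) has already produced the counts. The function names and the example counts are illustrative and are not taken from MMOCR or from the paper.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (equivalent to the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def detection_scores(num_matched, num_pred, num_gt):
    """Compute precision, recall, and hmean from match counts.

    num_matched: predictions matched to a ground-truth instance by IoU
    num_pred:    total predicted text instances
    num_gt:      total ground-truth text instances
    """
    precision = num_matched / num_pred if num_pred else 0.0
    recall = num_matched / num_gt if num_gt else 0.0
    return precision, recall, hmean(precision, recall)


# Illustrative counts only: 8 matches out of 10 predictions and 16 ground truths.
precision, recall, score = detection_scores(8, 10, 16)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a detector cannot reach a high hmean by trading away recall for precision or the reverse.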

🛢️ LTB

LTB poses 13 long-tailed challenges that comprehensively evaluate the capabilities of scene text detectors in real-world scenarios. The benchmark includes 924 carefully curated images and 2770 challenging text instances that are hard to detect.

We hope LTB will inspire the development of more robust text detection algorithms and facilitate research into unified approaches for tackling these diverse and complex problems.

Download LTB and read the instructions.

📖 Citation

If you find MAEDet or LTB useful for your research, please cite it using the following BibTeX entry:

@inproceedings{cao2025devil,
  title={The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection},
  author={Cao, Tian-Jiao and Lyu, Jia-Hao and Zeng, Wei-Chao and Mu, Wei-Min and Zhou, Yu},
  booktitle={Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence},
  year={2025}
}

🙏 Acknowledgments

This work is heavily based on MAE and MMOCR. Thanks to all the authors for their great work.
