[2025-IJCAI] The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection

[IJCAI 2025] The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection

📃 arXiv Paper | 🛢️ Data | 📖 Citation

MAEDet

MAEDet uses self-supervised learning with a Masked Autoencoder (MAE) to learn better text representations from unlabeled data, which helps it address the diverse challenges of scene text detection.


🛠️ Prerequisites

  • Python > 3.7
  • PyTorch > 1.7
  • CUDA > 10.2
  • MMOCR 1.0.1 [Source]
  • CLIP [Source]

This implementation is based on MMOCR v1.0.1. Please install the official repository first, then replace the following three directories with the provided code:

mmocr
├── configs
├── mmocr
└── tools
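The replacement step can be scripted. Below is a minimal sketch, assuming the provided code and the MMOCR checkout each contain `configs`, `mmocr`, and `tools` at their top level. The function name `replace_dirs` and the `.orig` backup suffix are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def replace_dirs(provided_root, mmocr_root, names=("configs", "mmocr", "tools")):
    """Replace each named directory in mmocr_root with the copy from provided_root.

    The original directory is kept next to the new one as '<name>.orig'
    (a hypothetical convention chosen here so the step can be undone).
    """
    provided_root, mmocr_root = Path(provided_root), Path(mmocr_root)

    # Check every source directory first so nothing is changed on a bad path.
    for name in names:
        if not (provided_root / name).is_dir():
            raise FileNotFoundError(f"missing directory: {provided_root / name}")

    for name in names:
        src, dst = provided_root / name, mmocr_root / name
        backup = mmocr_root / f"{name}.orig"
        if dst.exists():
            if backup.exists():
                shutil.rmtree(backup)  # drop a stale backup from an earlier run
            dst.rename(backup)
        shutil.copytree(src, dst)
```

For example, `replace_dirs("path/to/provided/mmocr", "path/to/official/mmocr")`, where both paths are placeholders for your local checkouts.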

🎒 Data Preparation

SSL Pretraining

Joint training

Config

Please adjust the dataset settings in configs/textdet/_base_/datasets/english_scene_text.py according to your environment. Refer to the Dataset Preparation Guide for detailed instructions.
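As an illustration of the kind of setting to adjust, here is a sketch that follows the layout MMOCR 1.x uses for its stock dataset base configs (for example `icdar2015.py`). The variable names, `data_root`, and annotation file names are placeholders. The actual contents of `english_scene_text.py` may differ, so treat this as a guide to what to look for rather than a drop-in file.

```python
# Assumed layout, modeled on MMOCR 1.x dataset base configs.
# Point data_root at wherever the prepared dataset lives on your machine.
icdar2015_textdet_data_root = 'data/icdar2015'  # placeholder path

icdar2015_textdet_train = dict(
    type='OCRDataset',
    data_root=icdar2015_textdet_data_root,
    ann_file='textdet_train.json',  # annotation file produced by dataset preparation
    filter_cfg=dict(filter_empty_gt=True, min_size=32),
    pipeline=None)

icdar2015_textdet_test = dict(
    type='OCRDataset',
    data_root=icdar2015_textdet_data_root,
    ann_file='textdet_test.json',
    test_mode=True,
    pipeline=None)
```

In this layout, changing the single `*_data_root` variable is usually enough to move a dataset, because the train and test entries both reference it.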

🖥️ Usage

SSL Pretraining

cd MAE-std
CUDA_VISIBLE_DEVICES=<gpu_ids> python -m torch.distributed.launch --nproc_per_node=<gpu_num> std_pretrain.py

Joint Training

cd mmocr
CUDA_VISIBLE_DEVICES=<gpu_ids> bash tools/dist_train.sh configs/textdet/dbnetpp/dbnetpp_vit_w_pretrain.py <save_dir> <gpu_num>

See the documentation for more usage.

Evaluation

cd mmocr
CUDA_VISIBLE_DEVICES=<gpu_id> python tools/test.py configs/textdet/dbnetpp/dbnetpp_vit_w_pretrain.py <checkpoint_path> --eval hmean-iou

See the documentation for more usage.
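The `hmean-iou` metric reports precision, recall, and their harmonic mean, where a detection counts as correct when its IoU with a ground-truth instance exceeds a threshold (typically 0.5). The sketch below shows only the final arithmetic and assumes the IoU matching has already produced the counts. The function name `hmean` and its arguments are illustrative, and this is not MMOCR's implementation.

```python
def hmean(num_matched: int, num_gt: int, num_det: int) -> dict:
    """Compute precision, recall and their harmonic mean from match counts.

    num_matched: detections matched to a ground-truth instance by IoU
    num_gt:      total ground-truth text instances
    num_det:     total predicted text instances
    """
    recall = num_matched / num_gt if num_gt else 0.0
    precision = num_matched / num_det if num_det else 0.0
    denom = precision + recall
    # The harmonic mean is defined as 0 when both precision and recall are 0.
    score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "hmean": score}
```

For instance, 8 matches against 10 ground-truth instances and 16 detections give a recall of 0.8, a precision of 0.5, and an hmean of about 0.615.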

🛢️ LTB

LTB poses 13 long-tailed challenges that comprehensively evaluate the capabilities of scene text detectors in real-world scenarios. The benchmark includes 924 carefully curated images and 2,770 challenging text instances that are hard to detect.

We hope LTB will inspire the development of more robust text detection algorithms and facilitate research into unified approaches for tackling these diverse and complex problems:

Challenges

Download LTB and read the instructions.

📖 Citation

If you find MAEDet or LTB useful for your research, please kindly cite using this BibTeX:

@inproceedings{cao2025devil,
  title={The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection},
  author={Cao, Tian-Jiao and Lyu, Jia-Hao and Zeng, Wei-Chao and Mu, Wei-Min and Zhou, Yu},
  booktitle={Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence},
  year={2025}
}

🙏 Acknowledgments

This work is heavily based on MAE and MMOCR. Thanks to all the authors for their great work.
