Description
I have been using Layout Parser to analyze different types of documents. I get the expected results on true PDFs, but not on scanned PDFs: the contents of a scanned page are either detected as a Figure or otherwise do not match the expected layout.
The issue occurs only with scanned PDFs.
Checklist
- I have searched related issues but cannot get the expected help.
- The bug has not been fixed in the latest version, see the Layout Parser Releases
To Reproduce
import cv2
import layoutparser as lp

# Load the page image; OpenCV reads BGR, so reverse the channels to get RGB.
image = cv2.imread("test.png")
image = image[..., ::-1]

# PubLayNet Faster R-CNN model, keeping only detections scored 0.8 or higher.
model = lp.models.Detectron2LayoutModel(
    'lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
)

color_map = {
    'Text': 'red',
    'Title': 'blue',
    'List': 'green',
    'Table': 'purple',
    'Figure': 'pink',
}

# Detect layout blocks and draw them on the page.
layout = model.detect(image)
lp.draw_box(image, layout, box_width=3, color_map=color_map)
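To make the misdetection easier to report, it may help to list how many blocks were assigned to each label and with what scores. The following is only a minimal sketch: `summarize_layout` is a hypothetical helper (not part of Layout Parser), and it assumes each detected block exposes `type` and `score` attributes, as the blocks in the `layout` returned by `model.detect` are expected to. Stand-in objects are used here so the snippet runs without the model.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_layout(blocks):
    """Count blocks per label and collect their scores (hypothetical helper)."""
    counts = Counter()
    scores = {}
    for block in blocks:
        # Assumption: each block has `type` and `score` attributes.
        label = getattr(block, "type", None)
        counts[label] += 1
        scores.setdefault(label, []).append(getattr(block, "score", None))
    return counts, scores


# Stand-in blocks for illustration only; these are not real detection results.
fake_layout = [
    SimpleNamespace(type="Figure", score=0.93),
    SimpleNamespace(type="Text", score=0.85),
    SimpleNamespace(type="Figure", score=0.81),
]

counts, scores = summarize_layout(fake_layout)
for label, n in counts.items():
    print(label, n, scores[label])
```

With the real model, passing `layout` in place of `fake_layout` should show whether a scanned page is dominated by a few large Figure blocks.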
Environment
- I am using Windows
- Latest Layout Parser version
Contains 2 images: