How do I use the pretrained model for prediction?
monuminu opened this issue · 34 comments
Hi All,
Thanks a lot for this awesome dataset and the pretrained weights. How can I use them to predict bounding boxes for a given page image?
Hi @monuminu The model was trained with the Detectron framework, so you can load it in Detectron and use its inference examples. I will also provide a Jupyter notebook for inference soon.
Hi,
Once again, thanks for releasing your great work.
I am not sure whether we should get Detectron or Detectron2. Detectron is built on Caffe2, and installing Caffe at this point is just a pain. Detectron2 is a rewrite of Detectron in PyTorch and seems to enjoy better support.
Looking forward to the tutorial!
Hi @Lambert-Shirzad Thank you. I trained the models on Detectron, which is powered by Caffe2. But I found the installation was quite easy. I just followed the instructions in this link. We also plan to retrain the models in the Detectron2 framework.
I agree with @Lambert-Shirzad. It would be awesome if you could train on Detectron2. I am also waiting for your Jupyter notebook; it would help me greatly.
If anyone is interested, I have trained it on Detectron2. You can find training config and trained models (resnet101, resnext101) in this repo https://github.com/hpanwar08/detectron2
Note: The models were trained on ~60% of the original dataset for ~1.5 epochs, but they work well for fine-tuning on a domain-specific dataset.
@hpanwar08 I am trying your trained models for prediction but I get this error:
Config '\configs\DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
@nodechef This is just a warning. You should still be able to get the predictions.
I have updated the models. Please download the trimmed model "model_final_trimmed.pth" from Dropbox; it is smaller in size.
Use the command below for prediction:
python demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input "<path to image.jpg>" --output <path to save the predicted image> --confidence-threshold 0.5 --opts MODEL.WEIGHTS <path to model_final_trimmed.pth> MODEL.DEVICE cpu
@hpanwar08 Yes, it worked. I realized later that it was just a warning. However, I have a question.
How do we get the predicted classes, such as title, paragraph, or figure?
I guess we need to add this to demo.py in order to get the predicted class, right? Correct me if I am wrong.
from detectron2.data import MetadataCatalog
MetadataCatalog.get("dataset_name").thing_classes = ["title", "text","figure", "table","list"]
Would the dataset name be the one you used for training?
@nodechef If you want just the predictions and not the visualization, use detectron2.engine.defaults.DefaultPredictor() to get the classes. You will get class indexes, which you then need to convert to the actual class names.
Refer to lines 35 and 48 in predictor.py.
Or you can try this:
classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')
pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
"instances" contains the bounding boxes (pred_boxes), the confidence scores (scores) and the class indexes (pred_classes).
@hpanwar08 I am looking for class labels instead of just a percentage on each bbox (along with the visualization).
@nodechef If that is the case, then what you said should work. dataset_name will be "dla_val"
MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ['text', 'title', 'list', 'table', 'figure']
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]) , scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(v.get_image()[:, :, ::-1])
This worked.
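For anyone who only needs the index-to-name step discussed above, here is a minimal sketch that runs without Detectron2. The helper names (indices_to_labels, summarize) are made up for illustration, and the class order is simply the one quoted in this thread; please check it against the metadata of the model you actually load.

```python
# Class order assumed from the snippets in this thread; verify it for your model.
CLASSES = ["text", "title", "list", "table", "figure"]


def indices_to_labels(pred_classes, classes=CLASSES):
    """Map integer class indexes to class names.

    Accepts plain ints or anything int() can convert, such as the
    single-element tensors you get when iterating over pred_classes.
    """
    labels = []
    for idx in pred_classes:
        i = int(idx)
        if not 0 <= i < len(classes):
            raise ValueError(f"class index {i} is outside 0..{len(classes) - 1}")
        labels.append(classes[i])
    return labels


def summarize(pred_classes, scores, classes=CLASSES):
    """Pair each label with its confidence score, rounded for printing."""
    labels = indices_to_labels(pred_classes, classes)
    return [(label, round(float(score), 3)) for label, score in zip(labels, scores)]


if __name__ == "__main__":
    # Hypothetical values standing in for instances.pred_classes and instances.scores.
    for label, score in summarize([3, 0, 1], [0.991, 0.874, 0.652]):
        print(f"{label}: {score}")
```

Raising on an out-of-range index is a deliberate choice here: a silent mismatch between the class list and the model is harder to notice than an exception.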
Hi, is there a built-in feature that lets each class be saved as a separate segment or image? I am trying to identify tables, separate them, and then run them through a tabular data analyzer and OCR. So far I am able to get the image predictions with your code, but not the actual annotations or segmented regions for further analysis and OCR.
@hpanwar08 Sorry, can you expand on where I would add or change the lines you posted? Do I run them as an external .py script and reference the inputs/outputs, or do I change the predict code as mentioned above?
@nodechef Does this give you both the classes and the visualization? Could you elaborate on whether you changed these lines in the predict.py file or ran an external .py file?
MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ['text', 'title', 'list', 'table', 'figure']
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]) , scale=1.2)
v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(v.get_image()[:, :, ::-1])
@nodechef when adding the script you added to the bottom of demo.py, I get the following error:
Traceback (most recent call last):
  File "C:\projects\pytorch\detectron2\demo\demo.py", line 155, in <module>
    v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]) , scale=1.2)
NameError: name 'Visualizer' is not defined
When adding it at the top of the demo.py file, I get the following:
Traceback (most recent call last):
  File "C:\projects\pytorch\detectron2\demo\demo.py", line 22, in <module>
    MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ['text', 'title', 'list', 'table', 'figure']
NameError: name 'cfg' is not defined
Is there somewhere specific I should be adding your edit?
@hpanwar08 Thanks, the above throws no errors. How can I save the instances to a location on my system? I am running it from a shell.
@elnazsn1988 In addition to the above code you can extract the bounding boxes and then crop the image based on the bounding box and save.
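import cv2
import numpy as np
import detectron2.data.detection_utils
import detectron2.engine.defaults
import detectron2.structures

classes = ['text', 'title', 'list', 'table', 'figure']
default_predictor = detectron2.engine.defaults.DefaultPredictor(cfg)
img = detectron2.data.detection_utils.read_image(path_to_image, format="BGR")
predictions = default_predictor(img)
instances = predictions["instances"].to('cpu')
pred_classes = instances.pred_classes
labels = [classes[i] for i in pred_classes]
boxes = instances.pred_boxes
if isinstance(boxes, detectron2.structures.boxes.Boxes):
    boxes = boxes.tensor.numpy()
else:
    boxes = np.asarray(boxes)
for idx, (label, bbox) in enumerate(zip(labels, boxes)):
    if label == "table":
        # img is a NumPy array (height, width, channels), so crop by slicing.
        # Each box is (x1, y1, x2, y2) in pixel coordinates.
        x1, y1, x2, y2 = bbox.astype(int)
        cropped_img = img[y1:y2, x1:x2]
        # The image was read as BGR, which is what cv2.imwrite expects.
        cv2.imwrite(f"{label}_{idx}.png", cropped_img)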
Hope this helps.
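One detail worth handling when cropping: the predicted boxes are floating-point (x1, y1, x2, y2) values, so they need to be turned into integer pixel ranges that stay inside the image. Below is a small sketch of that step using only the standard library. The function names box_to_slices and crop_filename are hypothetical helpers, not part of Detectron2.

```python
import math


def box_to_slices(bbox, width, height):
    """Turn a float (x1, y1, x2, y2) box into (row_slice, col_slice).

    The box is expanded outward to whole pixels and clamped to the image
    size, so the slices can be applied directly to an array of shape
    (height, width, channels).
    """
    x1, y1, x2, y2 = (float(v) for v in bbox)
    left = max(0, math.floor(x1))
    top = max(0, math.floor(y1))
    right = min(width, math.ceil(x2))
    bottom = min(height, math.ceil(y2))
    if right <= left or bottom <= top:
        raise ValueError(f"box {bbox} is empty after clamping to {width}x{height}")
    return slice(top, bottom), slice(left, right)


def crop_filename(label, index, bbox):
    """Build a filesystem-friendly name such as table_002_10_21_110_221.png."""
    x1, y1, x2, y2 = (int(round(float(v))) for v in bbox)
    return f"{label}_{index:03d}_{x1}_{y1}_{x2}_{y2}.png"
```

With a NumPy image this would be used as rows, cols = box_to_slices(bbox, img.shape[1], img.shape[0]) followed by img[rows, cols]. Rounding outward rather than truncating is a judgment call; it avoids clipping a table border by a pixel at the cost of a slightly larger crop.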
This may already be resolved, but I am leaving it here for future reference. Based on @nodechef's code, I added the following lines in demo.py after demo.run_on_image:
from detectron2.data import MetadataCatalog
# demo.py does not import Visualizer itself, which causes the NameError reported earlier
from detectron2.utils.visualizer import Visualizer

MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes = ["text",
                                                            "title",
                                                            "list",
                                                            "table",
                                                            "figure"]
v = Visualizer(img[:, :, ::-1],
               MetadataCatalog.get(cfg.DATASETS.TRAIN[0]),
               scale=1.2)
v = v.draw_instance_predictions(predictions["instances"].to("cpu"))
and then
cv2.imwrite(out_filename, v.get_image()[:, :, ::-1])
where you have the output filename.
Note that you may have to reorder the labels in the classes array so that they match the order used in training.
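On the label order: as far as I understand, Detectron2's COCO loader assigns class indexes by sorting the category ids in the annotation JSON, so one way to avoid guessing is to derive the list from the "categories" section of the file the model was trained on. The sketch below assumes a COCO-style JSON; the embedded excerpt is an assumed example, deliberately shuffled, and thing_classes_from_coco is a made-up helper name.

```python
import json

# Assumed excerpt of a COCO-style annotation file; replace with your own JSON.
ANNOTATION_JSON = """
{
  "categories": [
    {"id": 4, "name": "table"},
    {"id": 1, "name": "text"},
    {"id": 5, "name": "figure"},
    {"id": 2, "name": "title"},
    {"id": 3, "name": "list"}
  ]
}
"""


def thing_classes_from_coco(coco):
    """Return category names ordered by ascending category id.

    Position k in the returned list is then the name for class index k,
    provided the training data was registered from this same file.
    """
    categories = sorted(coco["categories"], key=lambda cat: cat["id"])
    return [cat["name"] for cat in categories]


if __name__ == "__main__":
    print(thing_classes_from_coco(json.loads(ANNOTATION_JSON)))
```

The resulting list is what you would assign to thing_classes instead of typing the names by hand.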
Did anyone try to fine-tune with other categories? For example, I have additional categories such as authors and introduction, so my categories are [title, text, figure, table, list, authors, introduction]. I know that IBM has no plans for additional categories, but how can I try to do this?
Hi @ChungNPH do you have annotations of the additional categories?
I created a custom dataset myself. Its format differs somewhat from the PubLayNet format:
{ image:
    { file name: 'filename',
      height: 'height',
      width: 'width',
      id: 'id',
      annotations:
        { obj 1: [],
          obj 2: [],
        }
    }
}
I also tried to train on PubLayNet alone with Detectron2, but Detectron2 does not seem to understand the PubLayNet format, { 'image': [...], 'annotations' : [...], 'categories' : [...] }. Should I change the JSON file format before training? Can you give me an overview of how to train with PubLayNet? Thank you so much!
https://github.com/facebookresearch/detectron2/blob/master/docs/tutorials/datasets.md
I see that Detectron2 needs the dataset to be converted to its own format before training. Is that right?
Hey, can anybody help me correctly load the pretrained model and test it on documents?
I am trying to use the pretrained weights on the test set images.
Case 1: Using config configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml
Command: python3.7 demo/demo.py --config-file configs/DLA_mask_rcnn_X_101_32x8d_FPN_3x.yaml --input examples/PMC3654277_00006.jpg --output out/ --confidence-threshold 0.25 --opts MODEL.DEVICE cpu
But it gives me pretty bad results. If I raise confidence-threshold above 0.25, there are no detections.
Case 2: Using config pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml
Command: python3.7 demo/demo.py --config-file pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml --input examples/PMC3576793_00004.jpg --output out/ --confidence-threshold 0.2 --opts TYPE Mask-RCNN MODEL.DEVICE cpu
This throws the error below:
'pre-trained-models/Mask-RCNN/e2e_mask_rcnn_X-101-64x4d-FPN_1x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
  File "demo/demo.py", line 76, in <module>
    cfg = setup_cfg(args)
  File "demo/demo.py", line 28, in setup_cfg
    cfg.merge_from_file(args.config_file)
  File "/opt/anaconda3/lib/python3.7/site-packages/detectron2/config/config.py", line 49, in merge_from_file
    self.merge_from_other_cfg(loaded_cfg)
  File "/opt/anaconda3/lib/python3.7/site-packages/fvcore/common/config.py", line 120, in merge_from_other_cfg
    return super().merge_from_other_cfg(cfg_other)
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 464, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/opt/anaconda3/lib/python3.7/site-packages/yacs/config.py", line 477, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.TYPE'
Where did you download the trained model from?
@ChungNPH You have probably solved it by now, but just for info: if your data is in COCO format, it is easy to train with Detectron2. And yes, Detectron2 converts the data into its own data structure.
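For a custom per-image format like the one described earlier in the thread, a conversion to COCO-style JSON could look roughly like the sketch below. Everything about the input layout is an assumption on my part: I assume each record carries file_name, height, width, id, and an annotations dict that maps a category name to a list of [x, y, width, height] boxes. The function name to_coco is hypothetical, and the category list is the one quoted in the question.

```python
import json

# Category list taken from the question above; ids are assigned 1..N in this order.
CATEGORIES = ["title", "text", "figure", "table", "list", "authors", "introduction"]


def to_coco(records, categories=CATEGORIES):
    """Convert assumed per-image records into a COCO-style dict."""
    cat_ids = {name: i + 1 for i, name in enumerate(categories)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": cid, "name": name} for name, cid in cat_ids.items()],
    }
    for rec in records:
        coco["images"].append({
            "id": rec["id"],
            "file_name": rec["file_name"],
            "height": rec["height"],
            "width": rec["width"],
        })
        for name, boxes in rec["annotations"].items():
            for x, y, w, h in boxes:
                coco["annotations"].append({
                    "id": len(coco["annotations"]) + 1,  # unique across the file
                    "image_id": rec["id"],
                    "category_id": cat_ids[name],
                    "bbox": [x, y, w, h],  # COCO boxes are [x, y, width, height]
                    "area": w * h,
                    "iscrowd": 0,
                })
    return coco


if __name__ == "__main__":
    sample = [{
        "id": 7, "file_name": "page_7.jpg", "height": 800, "width": 600,
        "annotations": {"title": [[50, 40, 500, 30]], "authors": [[50, 80, 300, 20]]},
    }]
    print(json.dumps(to_coco(sample), indent=2))
```

A file written this way can then be registered with detectron2.data.datasets.register_coco_instances. Note that box-only annotations like these have no "segmentation" field, so they suit a box detector rather than a mask head.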
Hi, I need a little help, I'm new to this. I just want to test the pre-trained models of PubLayNet dataset on the test data given on PubLayNet. I have downloaded the pkl file and un-pickled it. But I don't know how to use it on test data. Can you help me out? Thank you
You could try this https://github.com/hpanwar08/document-layout-analysis-app
Hey @ShabariGadewar @zhxgj, do you have a Jupyter notebook now for loading and testing the PubLayNet pretrained models? I also want to test the pretrained models on the PubLayNet test data, but I get the same error that @jmandivarapu1 reported in his comment above. Also, @hpanwar08, I was able to run your trimmed models, but can you please explain how we can further train the pretrained (trimmed) models that you have included in your repo?
+1 to @ankur-sentieo's request for a Jupyter/Colab/Binder notebook, or at least a working Docker container.
@hpanwar08 Thanks for the Detectron2 models!
Note that the layout-parser team also retrained with detectron2. See:
https://github.com/Layout-Parser/layout-parser/blob/main/src/layoutparser/models/detectron2/catalog.py#L25
And
https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html#model-catalog
Hi, just wondering how I can save the annotation output from the model into the same format as Publaynet's dataset?
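One possible approach, sketched under assumptions: PubLayNet annotations follow the COCO layout, where each box is stored as [x, y, width, height], while the predictor returns (x1, y1, x2, y2). The helper below (predictions_to_coco, a made-up name) does that conversion on plain Python values. It assumes that category ids are the class indexes plus one, in the class order used in this thread, and the "score" field is an extra that ground-truth files would not normally have.

```python
def predictions_to_coco(image_id, boxes_xyxy, class_indexes, scores, start_id=1):
    """Build COCO-style annotation dicts from per-image predictions.

    boxes_xyxy: iterable of (x1, y1, x2, y2) in pixels
    class_indexes: iterable of 0-based class indexes
    scores: iterable of confidence values
    """
    results = []
    for offset, (box, cls, score) in enumerate(zip(boxes_xyxy, class_indexes, scores)):
        x1, y1, x2, y2 = (float(v) for v in box)
        w, h = x2 - x1, y2 - y1
        results.append({
            "id": start_id + offset,
            "image_id": image_id,
            "category_id": int(cls) + 1,  # assumption: ids are 1-based, same order
            "bbox": [x1, y1, w, h],       # xyxy converted to xywh
            "area": w * h,
            "iscrowd": 0,
            "score": float(score),
        })
    return results
```

With Detectron2 output, the inputs would come from instances.pred_boxes.tensor.tolist(), instances.pred_classes.tolist() and instances.scores.tolist(). Polygon "segmentation" entries are left out here, since converting predicted masks to polygons is a separate step.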