This is the internal UIUC repo for the Legend Extraction portion of DARPA CMASS project.
- Finish ReadME
- Add Description of all the plots
- Put comments in notebooks so they are useful as learning tools
This repo requires the following packages:
- opencv
- matplotlib
- pandas
- tqdm
- easyocr
legendExtractor.py is the example script for showing how to use the legend extraction code.
# Example Usage
python LegendExtractor.py training/images -o train_legend_results -m training/uncharted_masks -t training/usgs_legends
Argument | Default | Description |
---|---|---|
data | N/A | Path to the directory containing the images to perform legend extraction on. |
-o, --outputDir | legend_results | Path to the directory that the resulting json will be written to. If automatic scoring or debugging is enabled there respective outputs will be written here as well. |
-m, --maskDir | None | Path to a directory containing json files to be used for masking out the legend area. json should be in uncharted format and json file should have same name as its corrasponding image file. |
-t, --truthDir | None | Path to a directory containing json files to be used as truth values in automatic scoring. json should be in usgs format and json file should have same name as its corrasponding image file. |
--debug | False | Flag to set enable debug statements in the log. |
extractLegends is the main function for generating predicted legends. It takes the map image and returns a json of the predictions in the USGS style. filename is an optional argument to add the img_filename that it was generated on which is consistent with USGS given jsons.
This function converts the output of extractLegends to the json format that follows the usgs format from the compition
scoreLegendExtraction is the main function for scoring and takes the map image the json of the truth legends and the json of predicted legends. It then scores these based on the label and intersection of the contours.
outputDir is an optional argument that controls weather the debugging plots are generated. When the outputDir argument is set scoreLegendExtraction will save the scores of each map as well as an annotated version of the map with some helpful plots and cutouts of all of the missed and ocr failed polygon labels. The legendExtractionDemo adds on this by producing a full dataset score and overview files as well. The structure of this output is as shown below.
outputDir
├── #[sourceDir]_Overview.png
├── #[sourceDir]_Scores.csv
├── Map1
│ ├── #Map1_Overview.png
│ ├── #Map1_Scores.png
│ ├── Poly1.tif
│ ├── Poly2.tif...
│ └── PolyX.tif
├── Map2...
└── MapX
The USGS json format is mostly the same format as the one used for the CMA competition, but now the shape_type field indicates whether there are 2 points (rectangle) or 4 points (quad) to describe the bounding box.
Current as of Dec 2023
The Uncharted Json is based on the TA1 schema (i.e., list of PageExtraction schema objects) and has three segmentation classes:
- map -- the main map region
- legend_polygons -- legend region for polygon features
- legend_points_lines -- legend region for point and/or line features
Current as of Dec 2023
This repo was created by and currently maintained by Albert Bode.