Chart Infographics - Tools for chart annotations
The official release of the tools developed at the University at Buffalo, which were previously used to annotate charts for the ICDAR 2019 CHART-Info competition (https://chartinfo.github.io/). This code is
Note. This software is in an early stage of development. It is distributed as is, without any warranties, under a GNU public license version 3.0. A copy of the license is included in the package.
This tool was tested with, and is intended for use with, Python 3.6+.
Main Library Requirements:
- Pygame
- OpenCV
- Numpy
- Scipy
- Shapely
- PyTesseract
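A quick way to confirm that the libraries above are installed is to probe for their import names before launching the tools. The sketch below is not part of the package. The mapping uses the usual import names for these libraries (for example, OpenCV is imported as cv2), and the helper name find_missing_modules is made up for illustration.

```python
import importlib.util

# Usual import names for the libraries listed above.
REQUIRED_MODULES = {
    "Pygame": "pygame",
    "OpenCV": "cv2",
    "Numpy": "numpy",
    "Scipy": "scipy",
    "Shapely": "shapely",
    "PyTesseract": "pytesseract",
}


def find_missing_modules(modules=REQUIRED_MODULES):
    """Return the display names of libraries whose module cannot be found."""
    missing = []
    for display_name, module_name in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            missing.append(display_name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules()
    if missing:
        print("Missing libraries: " + ", ".join(missing))
    else:
        print("All main libraries were found.")
```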
Main Annotation Tool
The main annotation tool, chart_annotator.py, is used to annotate chart images. Currently, only bar charts, line charts, and box plots are supported for full annotation. Other images can still be annotated at the level of panels (for multi-panel figures), chart classes (including other chart types currently not supported for data annotation), and text annotation (for chart images and other figures).
Usage:
python chart_annotator.py config [small/large]
Where:
- config: Path to the Configuration File
- size: Window Size (normal by default)
- small: for smaller window
- large: for larger window
Example:
python chart_annotator.py config.txt large
Note. The pygame window created by the annotation tool is sensitive to the local OS font scaling settings used to improve readability on high-resolution displays. On Windows, we recommend adding python.exe to the list of exceptions for these settings to avoid scaling issues.
Note. A parameter can be added to or removed from the config file to enable or disable the Administrator mode. Only administrators can validate annotations.
For administrators or supervisors of the annotation process, use:
ENABLE_ADMIN_MODE = 1
For the rest of the annotators, just delete this line or set it to 0 to disable it.
ENABLE_ADMIN_MODE = 0
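For scripts that need to inspect this setting, the sketch below reads a flag from config text. It assumes the file consists of KEY = VALUE lines, as the two examples above suggest. The rest of the config syntax is not documented here, and read_flag and is_admin_enabled are hypothetical helpers, not functions from the package.

```python
def read_flag(config_text, key, default=0):
    """Return the integer value of `key` in KEY = VALUE text, or `default`."""
    value = default
    for line in config_text.splitlines():
        name, sep, raw = line.partition("=")
        if sep and name.strip() == key:
            try:
                value = int(raw.strip())
            except ValueError:
                # Assumption: values that are not integers are ignored.
                continue
    return value


def is_admin_enabled(config_text):
    # A missing line counts as disabled, matching the note above.
    return read_flag(config_text, "ENABLE_ADMIN_MODE") == 1
```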
Chart Annotation Stats tool
An overview of the annotation process status can be obtained using the chart_stats.py program. This tool shows how many images have been annotated per class per stage.
Usage:
python chart_stats.py config
Where:
- config: Path to the Configuration File
Example:
python chart_stats.py config.txt
Tool for batch-merging annotations
This tool merges two existing sets of annotations for a single set of images. It assumes that one of the annotation sources represents the newer version while the other represents the original annotations. Instead of simply using every annotation from the source to overwrite the annotations at the destination, the tool attempts to determine whether the source contains a more complete annotation of each file at the destination, and overwrites only the annotations for which the source has a strictly higher completion status (e.g. number of tasks annotated/validated).
Usage:
python chart_update_annotations.py src_config dst_config
Where:
- src_config: Source Configuration (newer annotations)
- dst_config: Destination Configuration (annotations to update)
Example:
python chart_update_annotations.py config_bar_charts.txt config_all_charts.txt
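The overwrite rule described above can be sketched as follows. This illustrates the rule only and is not the code in chart_update_annotations.py: completion is modeled as a single integer per file (for example, the number of tasks annotated or validated), and the dictionaries and the merge_by_completion name are hypothetical.

```python
def merge_by_completion(src, dst):
    """Keep destination annotations unless the source is strictly more complete.

    `src` and `dst` map a file name to a (completion, annotation) pair.
    Returns the merged mapping and the list of file names that were overwritten.
    """
    merged = dict(dst)
    overwritten = []
    for name, (src_done, src_annotation) in src.items():
        if name not in dst:
            # Assumption: only files already at the destination are considered.
            continue
        dst_done, _ = dst[name]
        if src_done > dst_done:
            merged[name] = (src_done, src_annotation)
            overwritten.append(name)
    return merged, overwritten
```

With this rule, a tie in completion status leaves the destination annotation untouched.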
Tool for exporting XML annotations to JSON format used by CHART-Info 2019
Our annotation tool uses a custom XML-based file format to store all the annotations. However, for the Competition on HArvesting Raw Tables from Infographics (CHART-Infographics) at ICDAR 2019 we used a different, JSON-based file format designed to provide only the information required as input and output of each task of the competition. A synthetic dataset was also released using this file format. To generate JSON annotations from the XML files produced by our tool, the chart_json_export.py program is provided. This allows the creation of new annotated datasets which can be further evaluated using the evaluation tools from this competition.
Usage:
python chart_json_export.py img_folder xml_folder json_folder task_num test_mode
Where:
- img_folder: Path to Directory containing annotated images
- xml_folder: Path to Directory containing XML based annotations
- json_folder: Path to Directory where the generated JSON annotations will be stored.
- task_num: Competition task to export. Note that each task includes the outputs from some of the previous tasks automatically.
- test_mode: Determines if the JSON files are being generated for a testing dataset or not. Normally, the JSON files will contain both the input and outputs for the indicated task, but for testing datasets, only the inputs will be included for this task.
Example:
# This will generate a testing dataset for Task 3
python chart_json_export.py data/images data/annotations data/task3_json 3 0
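When exporting several tasks, the command above can be assembled programmatically. The sketch below only builds the argument list in the documented order and prints it. The helper build_export_command is hypothetical, and the values for task_num and test_mode are forwarded as given because their accepted values are not specified here.

```python
import sys


def build_export_command(img_folder, xml_folder, json_folder, task_num, test_mode):
    """Return the argument list for one chart_json_export.py run."""
    return [
        sys.executable,
        "chart_json_export.py",
        str(img_folder),
        str(xml_folder),
        str(json_folder),
        str(task_num),
        str(test_mode),
    ]


if __name__ == "__main__":
    # Same arguments as the example above.
    command = build_export_command(
        "data/images", "data/annotations", "data/task3_json", 3, 0
    )
    print(" ".join(command))
```

The returned list can be passed to subprocess.run(command, check=True) from the directory that contains chart_json_export.py.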
Update (July 28, 2020)
- Extended, re-factored and improved JSON export
- New validations added for Task 4
- Added parsing-based generation of GT for Tasks 6a, 6b and 7 for:
- Bar Charts
- Box Charts
- Line Charts
- Scatter Charts
- Added partial support for additional Chart types included in ICPR 2020 - CHART-Infographics
- These can have annotations of text, axes and legends, but data annotation is not yet implemented
- Added the "Auto-Check" function to help detect annotation errors
- This feature is based on the same parsing functions used by the exporter
- Full chart parsing based on user annotations is attempted
- Errors found by the exporter are then reported to the annotator
- Annotations now record if Auto-check has been used and if it was a Success or a Failure
- The Chart Stats tool can be used to get a report on Auto-checks for a given set of charts
Update (July 7, 2020)
- Minor improvements on Line Chart Data Annotation
- For text regions containing data series names (e.g. from the legend):
- Double-clicking them takes the tool to the edition mode menu for that line.
- Added Inverted color view modes.
- Bug fix for the Box plot annotator to properly handle charts without categorical values by using a single unnamed default category
Update (July 3, 2020)
- Minor improvements on Axis Annotation
- Zoom-in shown on right side panel:
- when setting or editing the axes bounding box
- when setting the new position of a given tick
- Ticks are now displayed in red by default if they do not have associated labels. Otherwise, they are shown in dark orange.
- Minor improvements on Line Chart Data Annotation
- It is now possible to swap data points between lines
- For text regions containing line names (e.g. from the legend or from data mark labels):
- Moving the mouse over them will highlight the corresponding line.
- Double-clicking them takes the tool to the edition mode menu for that line.
- Double right-click can now be used to stop the "line point edition" mode.
Update (June 28, 2020)
- Minor improvements to data annotations
- In some cases, legend boxes will be automatically inferred from image
- Reduced the number of intermediate zoom levels between 100% and 400%
- Added heuristics to help annotate line and scatter charts by using right click on "Add Points" mode.
- For Scatter Charts, the user can save time on charts containing solid colored marks by placing the cursor over a data mark and right-clicking. The new point is then added at the centroid of the current mark, as shown on the right-side panel of the annotation tool.
- For Line Charts, the user can save time on charts containing solid colored lines by placing the cursor close enough to the line and right-clicking to add the suggested line point, which is shown in green in the right-side panel.
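The scatter chart heuristic amounts to finding the connected region of uniform color under the cursor and taking its centroid. The sketch below illustrates that idea on a plain grid of color values using a 4-connected flood fill. It is an assumption about how such a heuristic could work, not the tool's implementation, and mark_centroid is a hypothetical name.

```python
from collections import deque


def mark_centroid(image, x, y):
    """Return the (x, y) centroid of the same-colored region containing pixel (x, y).

    `image` is a list of rows; each entry is any comparable color value.
    """
    height, width = len(image), len(image[0])
    target = image[y][x]
    seen = {(x, y)}
    queue = deque([(x, y)])
    sum_x = sum_y = 0
    while queue:
        cx, cy = queue.popleft()
        sum_x += cx
        sum_y += cy
        # Visit the four direct neighbors that share the clicked color.
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                if image[ny][nx] == target:
                    seen.add((nx, ny))
                    queue.append((nx, ny))
    return sum_x / len(seen), sum_y / len(seen)
```

Exact color matching only suits solid colored marks; anti-aliased edges would need a color tolerance.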
Update (June 17, 2020)
- Major improvements to data annotations
- Tools for Bar Charts and Box Charts now use draggable tools to quickly adjust bar/box parameters
- Tool for Bar Charts now includes semi-automatic bar height adjustment, which greatly simplifies annotation of bar charts using solid color bars.
- Tools for Axes and Line Charts now use draggable tools to make adjusting points much easier/faster
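For solid color bars, a semi-automatic height adjustment like the one mentioned above can be approximated by scanning upward from the bar base until the color changes. The sketch below shows that idea on a plain grid of color values. It is an illustrative assumption, not the tool's implementation, and bar_top is a hypothetical name.

```python
def bar_top(image, x, base_y):
    """Return the row index of the topmost pixel of a solid-colored vertical bar.

    `image` is a list of rows (row 0 at the top). The scan starts at
    (x, base_y) and moves up while pixels keep the starting color.
    """
    color = image[base_y][x]
    top = base_y
    while top > 0 and image[top - 1][x] == color:
        top -= 1
    return top
```

The bar height in pixels is then base_y - bar_top(image, x, base_y) + 1.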
Update (June 05, 2020)
- Major improvements to text annotation including:
- Improved Text Box Behavior
- Copy/Paste Options
- Two new text roles: Tick Grouping and Data Mark Label
- Data Mark labels are now used as the next default name for data series on charts which have them but do not include legends
- Other minor improvements on the UI