PrettyPaper
Member Info
Work Group
Data Filter (3 people a group)
1.- Determine whether each block of data belongs to the paper.
- Remove any data that comes from another paper.
- Skill Set: mostly textual analysis, Python PDF parser, maybe some image processing
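As a rough illustration of the filtering step, the sketch below scores each block by how much of its vocabulary also appears in a reference text for the paper (for example its title and abstract). The function names, the reference text, and the 0.2 threshold are assumptions made for this example, not part of the project.

```python
import re

def tokens(text):
    """Lowercase alphabetic words longer than three characters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}

def is_related(block, reference, threshold=0.2):
    """True if enough of the block's vocabulary also occurs in the reference.

    `threshold` is a hypothetical cut-off and would need tuning on real data.
    """
    block_tokens = tokens(block)
    if not block_tokens:
        return False
    overlap = len(block_tokens & tokens(reference)) / len(block_tokens)
    return overlap >= threshold

def filter_blocks(blocks, reference, threshold=0.2):
    """Keep only the blocks that look related to the reference text."""
    return [b for b in blocks if is_related(b, reference, threshold)]
```

A plain word-overlap score is only a baseline; blocks that share little vocabulary with the reference (such as reference lists or appendices) would probably need a separate rule.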
Normal Paper Paragraph Segmentation (5 people a group)
2.- Extract the metadata, main title, and subtitles of a paper.
- Extract all the content under each subtitle.
- Skill Set: textual analysis, Python PDF parser
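A minimal sketch of this step, assuming the PDF parser has already produced a list of text lines, the first non-empty line is the main title, and subtitles are numbered headings such as "1 Introduction" or "2.1 Related Work". The heading pattern and the `split_sections` name are assumptions for illustration only.

```python
import re

# Assumption: subtitles are numbered and start with a capital letter.
HEADING = re.compile(r"^\d+(\.\d+)*\.?\s+[A-Z].*$")

def split_sections(lines):
    """Split text lines into title, front matter, and per-subtitle content."""
    lines = [ln.strip() for ln in lines if ln.strip()]
    result = {
        "title": lines[0] if lines else "",
        "front_matter": [],   # lines between the title and the first subtitle
        "sections": {},       # subtitle -> list of content lines
    }
    current = None
    for line in lines[1:]:
        if HEADING.match(line):
            current = line
            result["sections"][current] = []
        elif current is None:
            result["front_matter"].append(line)
        else:
            result["sections"][current].append(line)
    return result
```

Body lines that happen to start with a number and a capitalized word would be mistaken for subtitles, so real papers would likely need font-size or layout cues from the parser as well.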
Component Object Detection (4 people a group)
3.- Extract the figures, tables, and charts from a paper. The output is image data.
- Skill Set: multiple object detection, image processing, machine learning, Python PDF parser, and maybe some textual analysis
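The detection itself needs a trained model and is not sketched here. As one possible use of the textual-analysis skill, the example below finds caption-like lines in the parsed text, which could serve as a cross-check on how many figures and tables the detector should find. The caption pattern and the `find_captions` name are assumptions for this example.

```python
import re

# Assumption: captions start with "Figure 3:", "Fig. 3.", or "Table 2:".
CAPTION = re.compile(r"^(Figure|Fig\.|Table)\s+(\d+)\s*[:.]", re.IGNORECASE)

def find_captions(lines):
    """Return (kind, number, line_index) for each caption-like line."""
    found = []
    for index, line in enumerate(lines):
        match = CAPTION.match(line.strip())
        if match:
            kind = "table" if match.group(1).lower() == "table" else "figure"
            found.append((kind, int(match.group(2)), index))
    return found
```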
Img Paper Paragraph Segmentation (5 people a group)
4.- Split each full-page image into blocks, one per paragraph.
- Separate the subtitle from the content in each block.
- Skill Set: image segmentation, image processing, machine learning, Python PDF parser, and maybe some textual analysis
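One simple, non-learned baseline for this step is a horizontal projection profile: rows that contain ink are grouped into a block, and a run of blank rows ends the block. The sketch below assumes the page is already binarized into rows of 0/1 values (1 = ink); the `split_into_blocks` name and the `min_gap` parameter are assumptions for illustration, and a trained segmentation model may well work better on real pages.

```python
def split_into_blocks(page, min_gap=2):
    """Split a binary page into (top, bottom) row ranges, both inclusive.

    A block ends once `min_gap` consecutive blank rows are seen, so smaller
    gaps (such as the space between lines) stay inside the block.
    """
    blocks, start, blank_run = [], None, 0
    for y, row in enumerate(page):
        if any(row):
            if start is None:
                start = y          # first inked row of a new block
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                blocks.append((start, y - blank_run))
                start, blank_run = None, 0
    if start is not None:
        # Page ended inside a block; drop any trailing blank rows.
        blocks.append((start, len(page) - 1 - blank_run))
    return blocks
```

This handles single-column pages only; multi-column layouts would need a vertical split first.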
OCR, lang_trans (3 people a group)
5.- Pre-process the image data, then extract the text from it with OCR.
- Skill Set: OCR (Tesseract), image processing, machine learning, Python PDF parser, and maybe some textual analysis
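A common pre-processing step before OCR is binarization. The sketch below shows Otsu thresholding in plain Python on a grayscale image given as rows of 0-255 integers; the function names are assumptions for this example, and the choice of Otsu's method is an illustration rather than something the project prescribes. The OCR call itself (for example through Tesseract) is left out because it depends on an external installation.

```python
def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]          # pixels with value <= t
        sum_dark += t * hist[t]
        w_light = total - w_dark   # pixels with value > t
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def binarize(gray):
    """Map dark pixels (ink) to 1 and light pixels (paper) to 0."""
    t = otsu_threshold(gray)
    return [[1 if value <= t else 0 for value in row] for row in gray]
```

The resulting binary image uses 1 for ink and 0 for background, the same convention a row-based block splitter would expect.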