FigureSeer: Parsing Result-Figures in Research Papers

Introduction

FigureSeer is a system for parsing result-figures in research papers. It automatically localizes figures, classifies them, and analyses their content.

Citing FigureSeer

If you find FigureSeer useful in your research, please consider citing:

@inproceedings{siegelnECCV16figureseer,
    Author = {Noah Siegel and Zachary Horvitz and Roie Levin and Santosh Divvala and Ali Farhadi},
    Title = {FigureSeer: Parsing Result-Figures in Research Papers},
    Booktitle = {European Conference on Computer Vision ({ECCV})},
    Year = {2016}
}

Requirements: Software

Caffe and its Matlab interface (http://caffe.berkeleyvision.org/installation.html)
JSONlab (https://www.mathworks.com/matlabcentral/fileexchange/33381-jsonlab--a-toolbox-to-encode-decode-json-files)

Requirements: Hardware

The default configuration for FigureSeer runs entirely on the CPU. The CNN patch-embedding feature used for data tracing is computationally expensive and is disabled by default. If you have a GPU, you can enable it by setting "conf.useGPU = true" and "conf.usePatchCnn = true" in setConf.m.
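If you prefer to flip both flags from a shell, a minimal sketch with sed follows. It assumes (this README does not confirm it) that setConf.m assigns each flag on its own line, e.g. conf.useGPU = false;. The helper name enable_patch_cnn is hypothetical.

```shell
# Hypothetical helper: set conf.useGPU and conf.usePatchCnn to true in setConf.m.
# Assumes each flag is assigned on its own line, e.g. "conf.useGPU = false;".
# Everything after the '=' on those lines (including a trailing comment) is replaced.
enable_patch_cnn() {
  local conf_file=${1:-setConf.m}
  # Write to a temp file and move it back, avoiding GNU/BSD differences in `sed -i`.
  sed -E 's/^([[:space:]]*conf\.(useGPU|usePatchCnn)[[:space:]]*=).*/\1 true;/' \
    "$conf_file" > "$conf_file.tmp" && mv "$conf_file.tmp" "$conf_file"
}
```

Run it as enable_patch_cnn setConf.m from the FigureSeer root directory. Other assignments in the file are left untouched.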

Installation

1. Clone the repository, including its submodules: git clone --recursive https://github.com/allenai/figureseer
2. Download the model weights. From the FigureSeer root directory, run: aws s3 cp --recursive s3://ai2-website/data/figureseer/neural-networks/ data/models/neural-networks/
3. Compile pdffigures (included in the dependencies directory).
4. In setConf.m, set 'conf.caffeRoot' to the path of your Caffe installation.
5. Run 'main.m' in MATLAB.
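The installation steps above can be scripted roughly as follows. This is a sketch under assumptions this README does not state: pdffigures lives in dependencies/pdffigures and builds with a plain make, setConf.m assigns conf.caffeRoot on a single line, and the AWS CLI and MATLAB are on your PATH. Both function names are hypothetical.

```shell
# Hypothetical helper: point conf.caffeRoot in setConf.m at the given Caffe directory.
# The path must not contain '|', '&' or backslashes (they are sed metacharacters here).
set_caffe_root() {
  local conf_file=$1 caffe_root=$2
  sed -E "s|^([[:space:]]*conf\.caffeRoot[[:space:]]*=).*|\1 '${caffe_root}';|" \
    "$conf_file" > "$conf_file.tmp" && mv "$conf_file.tmp" "$conf_file"
}

# Hypothetical wrapper for the five steps. The body runs in a subshell so that
# `set -e` and `cd` do not leak into the calling shell.
install_figureseer() (
  set -e
  caffe_root=$1
  git clone --recursive https://github.com/allenai/figureseer
  cd figureseer
  aws s3 cp --recursive s3://ai2-website/data/figureseer/neural-networks/ \
    data/models/neural-networks/
  make -C dependencies/pdffigures   # assumed location and build command
  set_caffe_root setConf.m "$caffe_root"
  # Exit MATLAB with a non-zero status if main.m throws, instead of hanging.
  matlab -nodisplay -nosplash \
    -r "try, main; catch err, disp(getReport(err)); exit(1); end; exit"
)
```

For example, install_figureseer /opt/caffe, where /opt/caffe is only a placeholder for your own Caffe path. Check the pdffigures build instructions before relying on the make step.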

To process your own PDFs, add them to figureseer/data/pdfs and run main.m.
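A minimal sketch of that workflow as a shell function follows. The name run_figureseer_on and the MATLAB_CMD override are assumptions of this sketch, not part of FigureSeer.

```shell
# Hypothetical helper: copy PDFs into <figureseer root>/data/pdfs and run main.m.
# MATLAB_CMD (an assumption of this sketch) selects the MATLAB executable; default "matlab".
run_figureseer_on() {
  local root=$1
  shift
  mkdir -p "$root/data/pdfs" || return 1
  cp -- "$@" "$root/data/pdfs/" || return 1
  # Launch from the repository root, where main.m lives.
  (cd "$root" && "${MATLAB_CMD:-matlab}" -nodisplay -nosplash \
    -r "try, main; catch err, disp(getReport(err)); exit(1); end; exit")
}
```

For example: run_figureseer_on ~/figureseer ~/papers/*.pdf. Copying rather than moving leaves your original files where they are.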

Data

Data used for training models is available at the project webpage: http://allenai.org/plato/figureseer/.

License

FigureSeer is released under the GPLv2 License.
