Open Extraction Works

A set of services to spider and extract metadata (abstracts, titles, PDF links) from URLs.

Most services may be run individually as command-line applications or together as an integrated pipeline with a REST API.

Command-line Applications

Integrated Pipeline Usage

Production machine setup and Deployment (for Integrated Pipeline)

Requirements

Docker

Postgres

Notes on Development setup

Adding/Modifying Field Extraction Rules