DRIP Knowledge Extraction

Methods, materials and notes related to extracting dam removal attributes from published text documents for the Dam Removal Information Portal (DRIP).

Components

app

A command-line application for text mining over documents.

document-classification

Contains a web application that serves a document classification model. The model assigns each document to one of three categories: abiotic, biotic, or abiotic & biotic.

citations

Notes for extracting structured citations from PDFs using GROBID.
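
As a rough illustration of what "structured citations" can look like downstream, the sketch below pulls a title and year out of TEI XML of the kind GROBID returns for a reference list. It is not taken from the notes in this component: the function name `parse_tei_references` and the `SAMPLE_TEI` string are made up for illustration, the sample reference is not a real citation, and the call to a running GROBID service is deliberately left out.

```python
import xml.etree.ElementTree as ET

# TEI namespace used in GROBID's XML output.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample standing in for GROBID output.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><back><div><listBibl>
<biblStruct>
  <analytic><title level="a" type="main">An example article title</title></analytic>
  <monogr>
    <title level="j">Example Journal</title>
    <imprint><date type="published" when="2015-03"/></imprint>
  </monogr>
</biblStruct>
</listBibl></div></back></text></TEI>"""


def parse_tei_references(tei_xml):
    """Return one dict per <biblStruct> with its first title and its year."""
    root = ET.fromstring(tei_xml)
    references = []
    for bibl in root.iter("{http://www.tei-c.org/ns/1.0}biblStruct"):
        # The first <title> in document order (the article title when present).
        title = bibl.find(".//tei:title", TEI_NS)
        date = bibl.find(".//tei:date", TEI_NS)

        title_text = None
        if title is not None and title.text:
            title_text = title.text.strip()

        year = None
        if date is not None and date.get("when"):
            # "when" holds an ISO-style date; keep only the year part.
            year = date.get("when")[:4]

        references.append({"title": title_text, "year": year})
    return references


if __name__ == "__main__":
    for reference in parse_tei_references(SAMPLE_TEI):
        print(reference)
```

Missing fields are returned as `None` instead of raising, since extracted reference lists are often incomplete.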

preprocessing

Notes and documentation for batch converting PDFs to text for development using pdftotext.
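
A minimal sketch of such a batch conversion, assuming the `pdftotext` command-line tool is installed and on the `PATH`. The function names, the directory names in the usage lines, and the choice of the `-layout` option are illustrative assumptions, not the settings documented in this component.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, out_dir, keep_layout=True):
    """Return the pdftotext argument list for one PDF without running it."""
    pdf_path = Path(pdf_path)
    txt_path = Path(out_dir) / (pdf_path.stem + ".txt")
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to preserve the physical page layout.
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def convert_directory(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir and return the text file paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        command = build_pdftotext_command(pdf_path, out_dir)
        # check=True raises CalledProcessError if pdftotext fails on a file.
        subprocess.run(command, check=True)
        written.append(Path(command[-1]))
    return written


if __name__ == "__main__":
    # Hypothetical directory names, chosen only for this example.
    for txt_file in convert_directory("pdfs", "text"):
        print(txt_file)
```

Separating command construction from execution makes it possible to inspect or log the exact command for each file before anything is run.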

Provisional Software Disclaimer

Under the USGS Software Release Policy, the software code here is considered preliminary, has not been officially released, and is posted to this repository for informal sharing among colleagues.

This software is preliminary or provisional and is subject to revision. It is being provided to meet the need for timely best science. The software has not received final approval by the U.S. Geological Survey (USGS). No warranty, expressed or implied, is made by the USGS or the U.S. Government as to the functionality of the software and related material nor shall the fact of release constitute any such warranty. The software is provided on the condition that neither the USGS nor the U.S. Government shall be held liable for any damages resulting from the authorized or unauthorized use of the software.