ChemDataExtractor Version 2.0

ChemDataExtractor

ChemDataExtractor v2 is a toolkit for extracting chemical information from the scientific literature. It supports Python 3.5 through 3.8.

Installation

pip install chemdataextractor2

Features

  • HTML, XML and PDF document readers
  • Chemistry-aware natural language processing pipeline
  • Chemical named entity recognition
  • Rule-based parsing grammars for property and spectra extraction
  • Table parser for extracting tabulated data
  • Document processing to resolve data interdependencies
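To give a feel for what rule-based property extraction means, here is a minimal, self-contained sketch using only the Python standard library. It does not use ChemDataExtractor's own API: the rule `MELTING_POINT_RULE`, the function `extract_melting_points` and the sample sentences are all hypothetical, and a single regular expression stands in for a real parsing grammar.

```python
import re

# Hypothetical rule: "<Compound> ... melting point of <value> <units>".
# A single regex stands in for a parsing grammar here; this is NOT
# ChemDataExtractor's API, only an illustration of the idea.
MELTING_POINT_RULE = re.compile(
    r"(?P<compound>[A-Z][\w\-]*)"        # a capitalised compound name
    r"[^.]*?"                            # filler text that contains no full stop
    r"melting point of\s+"
    r"(?P<value>-?\d+(?:\.\d+)?)\s*"     # integer or decimal value
    r"(?P<units>°C|K)"                   # only two unit spellings are handled
)


def extract_melting_points(text):
    """Return one record (a dict) per melting point statement found in text."""
    records = []
    for match in MELTING_POINT_RULE.finditer(text):
        records.append({
            "compound": match.group("compound"),
            "value": float(match.group("value")),
            "units": match.group("units"),
        })
    return records


sample = (
    "Benzophenone has a melting point of 48.5 °C. "
    "Naphthalene showed a melting point of 353 K."
)
records = extract_melting_points(sample)
for record in records:
    print(record)
```

A toy rule like this breaks on anything beyond the simplest phrasing, which is why a dedicated toolkit combines grammars with chemical named entity recognition and document-level processing.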

Documentation & Development

Please read the documentation for instructions on contributing to the project.

https://cambridgemolecularengineering-chemdataextractor-development.readthedocs-hosted.com/en/latest/

License

ChemDataExtractor v2 is licensed under the MIT license, a permissive, business-friendly license for open source software.

MIT license: https://github.com/CambridgeMolecularEngineering/ChemDataExtractor/blob/master/LICENSE