Pinned Repositories
ami
cm-crawlerd
ContentMine crawler daemon - finds the latest articles in the journals we mine and stores them in our scraping queue
FutureTDM
Materials of FutureTDM project
getpapers
Get metadata, fulltexts or fulltext URLs of papers matching a search query
journal-scrapers
Journal scraper definitions for the ContentMine framework
norma
Convert XML/SVG/PDF into normalised, sectioned, scholarly HTML
quickscrape
A scraping command line tool for the modern web
scraperJSON
The scraperJSON standard for defining web scrapers as JSON objects
thresher
Headless scraperJSON scraping for Node.js
workshop-resources
This repository contains material to help you set up a ContentMine workshop. It also includes tutorials for learning the ContentMine tools on your own.
The ContentMine's Repositories
ContentMine/getpapers
Get metadata, fulltexts or fulltext URLs of papers matching a search query
ContentMine/journal-scrapers
Journal scraper definitions for the ContentMine framework
ContentMine/norma
Convert XML/SVG/PDF into normalised, sectioned, scholarly HTML
ContentMine/canary
Canary is a UI to the ContentMine tools getpapers, quickscrape, norma, and ami.
ContentMine/NCBI2wikidata
ContentMine/canary-perch
ES Academic paper fact extraction - backend for canary
ContentMine/vms
ContentMine virtual machines
ContentMine/sciencesource-wikibase-docker
🐳 Docker images and compose file for Wikibase and the query service
ContentMine/wikibase
Simple golang library for interfacing with wikibase.
ContentMine/cephis
Document processing including support libraries and PDFBox2
ContentMine/cm-uclii
Data and progress tracking for table extraction and semantically guided content enhancement
ContentMine/CMServices
Web services layer for ContentMine text and data mining tools and utilities
ContentMine/contentmine-gui
GUI for executing ContentMine commands - browser SPA for running locally on user's machine.
ContentMine/dictionaries
Dictionaries for use with `ami`, including some management software
ContentMine/imageanalysis
ContentMine Fork of the WWMM imageanalysis Package
ContentMine/pdf2svg
ContentMine Fork of the WWMM pdf2svg Package
ContentMine/ScienceSourceReview
ContentMine/ahocorasick
A Golang implementation of the Aho-Corasick string matching algorithm
ContentMine/cm-pom
Parent POM for ContentMine Java/MVN stack
ContentMine/CMForestPlots
Things for managing the ContentMine forest plot functionality in normal
ContentMine/cproject
ArgProcessor and files for basic CMDirectories. Often subclassed. Needs to be separate from euclid and norma
ContentMine/euclid
ContentMine Fork of the WWMM Euclid Package
ContentMine/go-europmc
Simple Go library for working with openXML papers from EuroPMC
ContentMine/junk
analysis of documents containing forest plots in Stata format
ContentMine/normami
A tool to convert a variety of inputs into normalized, tagged, XHTML (with embedded/linked SVG and PNG where appropriate).
ContentMine/ScienceSourceIngest
Tool for importing openXML format papers into ScienceSource
ContentMine/stataforestplots
documents and tests relating to ForestPlots in Stata format
ContentMine/svg2xml
ContentMine Fork of the WWMM svg2xml Package
ContentMine/svghtml
Combined SVG and HTML repos and building functionality
ContentMine/UCL-ForestPlots