Eager

Biomedical text mining and question answering.

Primary language: Groovy. License: Apache License 2.0.

Modules

  • api
    Interface definitions.
    This package will likely be removed.
  • core
    Common classes and utilities.
  • docs
    Documentation.
  • elasticsearch (not used)
    A placeholder project for eventual indexing and searching with Elasticsearch.
  • error
    Error logging service. Services send their logging messages here so that error messages are consolidated in a common location.
  • indexer
    Standalone program for creating the Solr index of PubMed and PubMed Central.
  • nlp
    Standalone service that uses Stanford CoreNLP to perform sentence splitting, tokenization, lemmatization, and part-of-speech tagging.
  • preprocess
    Processes PMC documents to:
    1. Extract just the text content to a separate file.
    2. Create LIF versions with sentence, token, lemma, and pos annotations.
    3. Create text versions with stop words, punctuation, numbers, and symbols removed, ready to be processed with word2vec or doc2vec.
  • query
    Query processors. These accept natural language input from the user and convert it into a search engine query.
  • rabbitmq
    RabbitMQ messaging services.
  • ranking
    Document ranking algorithms.
  • retrieval
    Standalone service for retrieving PubMed or PubMed Central documents.
  • scraper-pubmedmedline
    Python script used to download and extract PubMed documents from the NIH FTP server.
  • solr
    Solr configuration files.
  • test (To be removed)
    Experimental programming. This module has nothing to do with actual testing.
  • upload
    Upload service for loading JSON into Galaxy.
  • web
    Spring Boot application that provides a web user interface and REST API.
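
The third preprocess step (removing stop words, punctuation, numbers, and symbols before word2vec or doc2vec) can be illustrated with a minimal Python sketch. This is not the code used by the preprocess module: the function name `clean_for_embedding`, the tiny stop-word list, and the whitespace tokenization are all assumptions made for illustration.

```python
import string

# Illustrative stop-word list only (an assumption); the list actually used
# by the preprocess module is not described in this README.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"})


def clean_for_embedding(text):
    """Return lower-cased text with stop words, punctuation, numbers, and symbols removed."""
    kept = []
    for raw in text.lower().split():
        # Drop leading and trailing punctuation or symbols, e.g. "(breast," -> "breast".
        token = raw.strip(string.punctuation)
        if not token or token in STOP_WORDS:
            continue
        # Skip tokens without any letter, i.e. pure numbers such as "2" or "3.5".
        if not any(ch.isalpha() for ch in token):
            continue
        # Skip tokens that still contain embedded symbols, e.g. "p<0.05".
        if not all(ch.isalnum() or ch == "-" for ch in token):
            continue
        kept.append(token)
    return " ".join(kept)


if __name__ == "__main__":
    sentence = "The BRCA1 gene is associated with 2 types of cancer (breast, ovarian)."
    print(clean_for_embedding(sentence))
```

Tokens that mix letters and digits, such as gene names, are kept in this sketch. Whether the real module keeps them is not stated here.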

Building

Running mvn install in the top-level project directory builds all of the Java/Groovy modules. Not all modules are Maven projects, so some are not covered by this command.

Building The Web Application

The web project includes a Makefile that can be used to generate the Docker image and push the image to docker.lappsgrid.org.

$> make clean
$> make 
$> make docker
$> make push

Since the web project is a Spring Boot application, simply run the jar file to start it:

$> java -Xmx8G -jar eager.jar

Note: In the (near) future JMX capabilities will be added, which means the startup procedure will change considerably. Check for the presence of a startup.sh script in the root directory of the project.

Services

See the README.md file in each project for instructions on running that module.

The following modules are intended to be run as standalone services:

  1. error: Error logging service used to collect error messages in a single location.
  2. nlp: Stanford CoreNLP processing service.
  3. retrieval: Document retrieval service.
  4. upload: Galaxy upload service.

All of the above services use RabbitMQ as a message broker. The nlp project has an example Groovy script for submitting documents to the Stanford NLP service for processing.

Applications

The following modules contain standalone programs that are intended to be run from the command line.

  1. indexer: Creates the Solr index(es).
  2. preprocess: See preprocess/README.md.