Textpresso Central is a platform to perform full text literature searches, view and curate research papers, train and apply machine learning (ML) and text mining (TM) algorithm for semantic analysis and curation purposes. The user is supported in this task by giving him capabilities to select, edit and store lists of papers, sentences, term and categories in order to perform training and mining. The system is designed with the intent to empower the user to perform as many operations on a literature corpus or a particular paper as possible. It uses state-of-the-art software packages and frameworks such as the Unstructured Information Management Architecture (UIMA), Lucene and Wt. The corpus of papers can be build from fulltext articles that are available in PDF or NXML format.
libtpc is the core library of Textpresso Central, and includes functions to convert documents, annotate, index and search them.
To compile libtpc, the following libraries and programs are needed:
- cmake
- lucene++
- xerces-c
- apr
- aprutil
- icu
- boost
- podofo
- pqxx
- lighttp
- fcgi++
- magick
- wt
- curl
- eigen3
- postgresql
- cimg
- GTest
- bzip2
- wget
- unzip
- python-software-properties
- autoconf
- git
- uima
NOTE
cmake version >= 3.5 is required.
To compile and install libtpc, run the following commands from the root directory of the repository:
$ mkdir cmake-build-release && cd cmake-build-release
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make && make install
This will install libtpc in its default location (/usr/local/lib).
libtpc can be also compiled and installed in debug mode, with the following commands:
$ mkdir cmake-build-debug && cd cmake-build-debug
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make && make install
The users "www-data" and "textpresso" must be created, along with "www-data" and "textpresso" databases. To do so, install postgresql-client, and set "trust" method for local access to all users in /etc/postgresql/9.6/main/pg_hba.conf. Then, run the following commands to create the users:
$ sudo -u postgres createuser "www-data"
$ sudo -u postgres createuser textpresso
Enter the psql console by running:
$ sudo -u postgres psql
Inside the console, run the following commands to create the databases:
postgres=# CREATE DATABASE "www-data";
postgres=# CREATE DATABASE "textpresso";
NOTE
If you want to debug libtpc, create a user and a database with your username (the user that will launch the server instance) and grant all privileges on all tables, by running the following command in the psql console launched as user postgres:
postgres=# GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO "<your_username>";
postgres=# GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO "<your_username>";
The easiest way to populate the database is to restore it from a dump file, with the following command:
sudo -u postgres pg_restore -U postgres -d www-data www-data.20170523.tar
A Docker image based on Ubuntu 16.04 and with all the libraries required to compile and run libtpc and the other Textpresso projects (textpressocentral and tpctools) is vailable on Docker hub. To pull it, run the following command:
$ sudo docker pull valearna/ubuntu-tpc-buildtest:16.04
To run the image and connect to an interactive shell:
$ sudo docker run -p80:80 -it valearna/ubuntu-tpc-buildtest:16.04
libtpc and the other Textpresso projects can be directly compiled and installed on the image.
The repository contains a Dockerfile that can be used to generate a Docker image: Dockerfile-tpc
To build the image, run the following command:
$ sudo docker build -t <image_name>:<tag> -f Dockerfile-tpc <repository_root>
NOTE The image comes with libtpc pre-installed, but the other Textpresso project have to be manually installed from its console.
To connect to a running Docker container, execute the following command:
$ sudo docker exec -i -t <container_name_or_id> /bin/bash
After installing Textpressocentral, the literature data must be populated. To do so, copy the LuceneIndex folder from an existing Textpressocentral installation. For example, to copy the C. Elegans literature, run the following commands:
$ sudo scp -r <username>@<host>:/usr/local/textpresso/luceneindex/C.\\\ elegans_0/ /usr/local/textpresso/luceneindex/
$ scp -r <username>@<host>:/usr/local/textpresso/tpcas/C.\\\ elegans /usr/local/textpresso/tpcas/
$ sudo echo "C. elegans" > /usr/local/textpresso/luceneindex/subindex.config
$ sudo ln -s /usr/lib/cgi-bin/tc/images /usr/local/textpresso/tpcas/