Syncs the OMOP concept table with a Solr instance
- OMOP concept
  - load the OMOP concept table from the ATHENA CSV
  - extend the model with external information
- Solr cloud
  - install and configure Apache Solr
- Spark
  - install and configure Apache Spark
- ETL postgres -> Solr
  - install the requirements
  - install pyLivy from pyLivy
  - use python3.6+
To let ZooKeeper ingest large configuration files such as synonym lists:
- add SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=0x9fffff" to '$SOLR_HOME/bin/solr.in.sh'
- define the $SPARK_HOME Linux environment variable
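The jute.maxbuffer value given above for ZooKeeper is written in hexadecimal, which hides its actual size. The short sketch below decodes it into bytes and compares it with ZooKeeper's built-in default of 0xfffff. The helper name `to_bytes` is only illustrative.

```python
# Decode the hexadecimal buffer size passed through -Djute.maxbuffer.
JUTE_MAXBUFFER = "0x9fffff"  # value set in solr.in.sh
ZK_DEFAULT = "0xfffff"       # ZooKeeper's default when the property is unset


def to_bytes(hex_value: str) -> int:
    """Convert a hex string such as '0x9fffff' into a byte count."""
    return int(hex_value, 16)


for label, value in (("configured", JUTE_MAXBUFFER), ("default", ZK_DEFAULT)):
    size = to_bytes(value)
    print(f"{label}: {size} bytes ({size / 2**20:.1f} MiB)")
```

The configured value is roughly ten times the default, which is what leaves room for large synonym files.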
The Livy configuration can be found at $LIVY_HOME/conf/livy.conf
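Before running the ETL, it can help to confirm that the environment variables are set and that the Livy configuration file is where it is expected. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the list of required variables is an assumption drawn from this README.

```python
import os
from pathlib import Path


def missing_variables(env=os.environ, required=("SPARK_HOME", "LIVY_HOME")):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def livy_conf_path(env=os.environ):
    """Return the expected livy.conf path, or None if LIVY_HOME is unset."""
    livy_home = env.get("LIVY_HOME")
    return Path(livy_home) / "conf" / "livy.conf" if livy_home else None


missing = missing_variables()
if missing:
    print("Missing environment variables:", ", ".join(missing))

conf = livy_conf_path()
if conf is not None:
    print(f"{conf} exists: {conf.is_file()}")
```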
You will need an open-file limit of at least 65000 for Spark and Solr to work properly.
sudo bash
ulimit -n 8192
sudo -
Edit the /etc/security/limits.conf file and add:
* hard nofile 65000
* soft nofile 65000
root hard nofile 65000
root soft nofile 65000
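After logging in again, the effective limit can be checked from Python. This is a small sketch using the standard `resource` module, which is available on Unix only. The threshold of 65000 comes from the requirement stated above, and `limit_is_sufficient` is a hypothetical helper name.

```python
import resource

REQUIRED_OPEN_FILES = 65000  # requirement stated in this README


def limit_is_sufficient(soft: int, required: int = REQUIRED_OPEN_FILES) -> bool:
    """True if the soft limit is unlimited or at least `required`."""
    return soft == resource.RLIM_INFINITY or soft >= required


# getrlimit returns a (soft, hard) tuple for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft={soft} hard={hard} sufficient={limit_is_sufficient(soft)}")
```

The soft limit is the one processes actually run into, so that is the value compared against the threshold.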
The Spark libraries are loaded through Apache Livy. They are specified in a YAML file and loaded by the pylivy library.
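To illustrate how a list of libraries ends up in a Livy session, the sketch below builds the JSON body that Livy's REST API accepts on `POST /sessions`, where `kind`, `jars`, and `conf` are documented fields. It does not reproduce this project's actual YAML layout or pylivy's own interface: the dictionary stands in for the parsed YAML file, and its keys, the jar path, and the function name are all assumptions.

```python
import json


def build_session_payload(config: dict, kind: str = "pyspark") -> dict:
    """Build a Livy session-creation body from a parsed config mapping."""
    payload = {"kind": kind}
    if config.get("jars"):
        payload["jars"] = list(config["jars"])  # jars added to the session
    if config.get("conf"):
        payload["conf"] = dict(config["conf"])  # Spark configuration properties
    return payload


# Stand-in for the parsed YAML file (hypothetical keys and values).
example_config = {
    "jars": ["/path/to/spark-postgres-shaded.jar"],
    "conf": {"spark.executor.memory": "2g"},
}

print(json.dumps(build_session_payload(example_config), indent=2))
```

Keeping the library list in a file rather than in code means the jars can change without touching the ETL script.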
Clone the spark-postgres repository, compile it, and move the shaded jar to a location of your choice.