Prototype for the GeneriX project.
Each step in the installation is described below. If you already have a prerequisite step installed, you can skip it.
install package managers pip3 and npm, and setuptools:
apt-get install python3-pip npm nodejs python3-setuptools
upgrade pip:
pip3 install --upgrade pip
pip3 install pandas pyArango dumper xarray
apt-get install apache2
This installation guide assumes you use SSL. On a public server, you can get an SSL certificate from letsencrypt.org. On a private development server, you can generate your own certificate (in /etc/ssl/certs) using these directions from letsencrypt.org:
openssl req -x509 -out localhost.crt -keyout localhost.key \
-newkey rsa:2048 -nodes -sha256 \
-subj '/CN=localhost' -extensions EXT -config <( \
printf "[dn]\nCN=localhost\n[req]\ndistinguished_name = dn\n[EXT]\nsubjectAltName=DNS:localhost\nkeyUsage=digitalSignature\nextendedKeyUsage=serverAuth")
mv localhost.key ../private
(based on https://jupyterhub.readthedocs.io/en/stable/quickstart.html)
install jupyter:
pip3 install jupyter
install jupyterhub and dependencies:
pip3 install jupyterhub
npm install -g configurable-http-proxy
pip3 install notebook
useradd jupyterhub
remember to set the jupyterhub user's shell to nologin, and add the user to the shadow group
set up /etc/jupyterhub and /srv/jupyterhub files as described in docs above, or copy from another installation.
make these files owned by the jupyterhub user, and delete any old sqlite database and jupyterhub_cookie_secret file
the ssl certs need to be readable by the jupyterhub user
pip3 install sudospawner
apt-get install sudo
note: spawning jupyterhub can fail with "OSError: [Errno 99] Cannot assign requested address"
this is because the ip 127.0.0.1 needs to be explicitly specified
to work around it, create the file /usr/local/bin/sudospawner-singleuser:
#!/bin/bash -l
exec "/usr/local/bin/jupyterhub-singleuser" --ip 127.0.0.1 "$@"
In /etc/sudoers:
Cmnd_Alias JUPYTER_CMD = /usr/local/bin/sudospawner
# actually give the Hub user permission to run the above command on behalf
# of the clearinghouse users without prompting for a password:
%jupyterhub ALL=(jupyterhub) /bin/sudo
jupyterhub ALL=(%clearinghouse) NOPASSWD:JUPYTER_CMD
add the jupyterhub user, and all users who use the clearinghouse, to the linux group clearinghouse
set up new base directory for clearinghouse
mkdir /home/clearinghouse/
cd /home/clearinghouse
chown jupyterhub .
chgrp clearinghouse .
chmod -R g+w .
chmod -R g+s .
setfacl -dm g:clearinghouse:rw .
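The permission steps above are easy to get partly wrong, so a quick check can help. The following is a minimal sketch, assuming a hypothetical helper named check_shared_dir (it is not part of GeneriX). It only tests the group-write and setgid bits on the directory itself; it does not inspect the default ACL set by setfacl.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not part of GeneriX): report whether a
# directory is group-writable and has the setgid bit set.
check_shared_dir() {
    local dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # -g is true when the setgid bit is set, so new files inherit the group
    [ -g "$dir" ] || { echo "setgid bit missing on $dir" >&2; return 1; }
    # find prints the path only if the group-write bit (020) is set
    [ -n "$(find "$dir" -maxdepth 0 -perm -020)" ] || {
        echo "not group-writable: $dir" >&2
        return 1
    }
    echo "ok: $dir"
}
```

For example, check_shared_dir /home/clearinghouse should print an "ok" line once the chmod commands above have been run.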
Follow directions from ArangoDB website; something like this:
cd /tmp
curl -OL https://download.arangodb.com/arangodb36/DEBIAN/Release.key
apt-key add - < Release.key
echo 'deb https://download.arangodb.com/arangodb36/DEBIAN/ /' | tee /etc/apt/sources.list.d/arangodb.list
apt-get install apt-transport-https
apt-get update
apt-get install arangodb3=3.6.1-1
in /etc/systemd/system, make jupyterhub.service:
[Unit]
Description=Jupyterhub
After=network-online.target
[Service]
User=jupyterhub
ExecStart=/usr/local/bin/jupyterhub --JupyterHub.spawner_class=sudospawner.SudoSpawner
WorkingDirectory=/etc/jupyterhub
[Install]
WantedBy=multi-user.target
make it start automatically:
systemctl enable jupyterhub
service jupyterhub start
to view output when testing:
service jupyterhub status
enable all options needed by apache:
a2enmod ssl rewrite proxy proxy_http proxy_wstunnel
in apache conf (/etc/apache2/sites-enabled/000-default.conf):
SSLProxyEngine on
SSLProxyVerify none
SSLProxyCheckPeerCN off
SSLProxyCheckPeerName off
SSLProxyCheckPeerExpire off
ProxyPreserveHost On
ProxyRequests off
TraceEnable Off
<Location /jupyterhub>
ProxyPass https://localhost:8000/jupyterhub
ProxyPassReverse https://localhost:8000/jupyterhub
ProxyPassReverseCookieDomain localhost YOUR_FULL_DOMAIN_NAME_HERE
</Location>
<LocationMatch "/jupyterhub/(user/[^/]*)/(api/kernels/[^/]+/channels|terminals/websocket)(.*)">
ProxyPassMatch wss://localhost:8000/jupyterhub/$1/$2$3
ProxyPassReverse wss://localhost:8000/jupyterhub/$1/$2$3
</LocationMatch>
<Location /arangodb/>
ProxyPass http://localhost:8529/
ProxyPassReverse http://localhost:8529/
ProxyPreserveHost On
AuthType Basic
AuthName "Restricted Content"
AuthUserFile /etc/apache2/htpasswd
Require valid-user
</Location>
<Location /_db/>
ProxyPass http://localhost:8529/_db/
ProxyPassReverse http://localhost:8529/_db/
ProxyPreserveHost On
</Location>
<Location /_api/>
ProxyPass http://localhost:8529/_api/
ProxyPassReverse http://localhost:8529/_api/
ProxyPreserveHost On
</Location>
Set up Data Clearinghouse front end links
e.g., in /home/httpd/html/dc/index.html:
<html>
<body>
Data Clearinghouse
<ul>
<li><a href="/generix-ui/">CORAL UI</a>
<li><a href="/jupyterhub/">JupyterHub</a>
<li><a href="/arangodb/">ArangoDB</a>
</ul>
</body>
</html>
Now that you can get at ArangoDB through Apache, log in and create databases. You need at least a "production" database, but you could have "test" or other versions for development. They can be called whatever you want.
The following steps assume you have only a "prod" environment for production. As mentioned above, you might want more than one, for testing and development.
cd /home/clearinghouse
mkdir prod
cd prod
mkdir images
mkdir data_import
mkdir data_store
mkdir data_store/tmp
mkdir notebooks
mkdir modules
cd modules
git clone git@github.com:jmchandonia/generix_prototype.git
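If you set up more than one environment, the directory layout above can be created in one step per environment. This is a minimal sketch, assuming a hypothetical helper named make_env_dirs (not part of GeneriX); the base path and environment name are passed as arguments, and the git clone still has to be done separately in each modules directory.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not part of GeneriX): create the
# per-environment directory tree described above under BASE/ENV.
make_env_dirs() {
    local base="$1" env="$2" sub
    for sub in images data_import data_store/tmp notebooks modules; do
        # -p creates missing parents and is harmless if the directory exists
        mkdir -p "$base/$env/$sub" || return 1
    done
}
```

For example, make_env_dirs /home/clearinghouse test would create a "test" tree alongside "prod".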
load in the data from a jupyter notebook:
rsync the data_import, notebooks, and images directories from a server that already has the data
define your process type and all other static types in var/typedef.json
set up var/upload_config.json with all the filenames of all ontologies, bricks, entities, and processes that you want to load.
set up any predefined brick type templates in var/brick_type_templates.json
make var/config.json based on the following template:
{
"Import":{
"ontology_dir": "/home/clearinghouse/prod/data_import/ontologies/",
"entity_dir": "/home/clearinghouse/prod/data_import/data/",
"process_dir": "/home/clearinghouse/prod/data_import/data/",
"brick_dir": "/home/clearinghouse/prod/data_import/data/"
},
"Workspace":{
"data_dir": "/home/clearinghouse/prod/data_store/"
},
"ArangoDB": {
"url": "http://127.0.0.1:8529",
"user": "YOUR_USER_NAME",
"password": "YOUR_PASSWORD",
"db": "YOUR_PRODUCTION_DATABASE_NAME"
},
"WebService": {
"port": 8082,
"https": true,
"cert_pem": "PATH_TO_FULLCHAIN.PEM file",
"key_pem": "PATH_TO_PRIVKEY.PEM file",
"plot_types_file": "plot_types.json"
}
}
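Because the template above contains placeholder values, it can be useful to check that none were left in place before starting anything that reads the file. This is a minimal sketch, assuming a hypothetical helper named check_placeholders (not part of GeneriX) and assuming that every placeholder starts with YOUR_ or PATH_TO_, as in the template.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not part of GeneriX): fail if the
# config file still contains template placeholders.
check_placeholders() {
    local config="$1"
    [ -r "$config" ] || { echo "cannot read $config" >&2; return 2; }
    # grep -n prints each offending line with its line number
    if grep -nE 'YOUR_|PATH_TO_' "$config"; then
        echo "placeholders still present in $config" >&2
        return 1
    fi
    echo "no template placeholders left in $config"
}
```

For example, check_placeholders var/config.json lists any lines that still need editing. It does not check that the file is valid JSON; python3 -m json.tool var/config.json can be used for that.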
make a "reload_data" notebook to load and set everything up, then run it.
sample "reload data" notebook contents (e.g., in /home/clearinghouse/prod/notebooks/reload_data.ipynb):
from generix.dataprovider import DataProvider
from generix import toolx
toolx.init_system()
this will set up tables required for web services to start.
The web services run in a virtualenv, so set that up first:
pip3 install virtualenv
python3 -m virtualenv /home/clearinghouse/env/
source /home/clearinghouse/env/bin/activate
pip3 install flask flask_cors pandas simplepam pyjwt pyArango dumper xarray openpyxl
note: Be careful to install pyjwt, NOT jwt! Or login will fail!
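One way to confirm which of the two packages ended up in the virtualenv is to inspect the output of pip3 freeze. This is a minimal sketch, assuming a hypothetical helper named check_jwt_packages (not part of GeneriX) that reads the freeze output on standard input and assumes the usual name==version line format.

```bash
#!/bin/bash
# Hypothetical helper (an assumption, not part of GeneriX): read "pip3 freeze"
# output on stdin and check that PyJWT is present and jwt is absent.
check_jwt_packages() {
    local frozen
    frozen=$(cat)
    # Package names are compared case-insensitively (pip reports "PyJWT")
    if printf '%s\n' "$frozen" | grep -qiE '^jwt=='; then
        echo "conflicting 'jwt' package is installed" >&2
        return 1
    fi
    if ! printf '%s\n' "$frozen" | grep -qiE '^pyjwt=='; then
        echo "PyJWT is not installed" >&2
        return 1
    fi
    echo "PyJWT installed, no conflicting jwt package"
}
```

With the virtualenv activated, run it as: pip3 freeze | check_jwt_packages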
create /etc/systemd/system/generix-web-services.service:
[Unit]
Description=Generix Web Services
After=network.target
[Service]
User=root
EnvironmentFile=/etc/sysconfig/generix-web-services
ExecStart=/home/clearinghouse/env/bin/python -m generix.web_services
WorkingDirectory=/home/clearinghouse/prod/modules/generix_prototype
Restart=always
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
create /etc/sysconfig/generix-web-services:
PATH=/home/clearinghouse/env/bin:/usr/local/bin:/usr/bin:/bin
PYTHONIOENCODING=utf-8
PYTHONPATH=/home/clearinghouse/env/
VIRTUAL_ENV=/home/clearinghouse/env/
to start web services:
service generix-web-services start
debug by looking in /var/log/daemon.log