AmCAT - Amsterdam Content Analysis Toolkit
Note: the following instructions are for the unstable development version. To install stable releases, please see the readme file for those releases:
Installation instructions for 3.4 (stable)
Installation and Configuration for development version
Prerequisites
Most of the Python prerequisites for AmCAT are installed automatically using pip (see below). To install the non-Python requirements on Ubuntu 15.10 or 16.04, you can use:
sudo apt-get install antiword unrtf rabbitmq-server python3-pip postgresql postgresql-contrib python3-venv git postgresql-server-dev-9.4 python3-dev libxml2-dev libxslt-dev graphviz pspp
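Before continuing, you can check that the tools from these packages ended up on your PATH. The sketch below is a small POSIX shell function. The function name and the mapping from packages to command names in the example (for instance dot for graphviz and psql for postgresql) are our own assumptions, not part of AmCAT.

```shell
# Hypothetical helper: print every named command that is not on PATH.
# Returns 0 when all commands are found, 1 otherwise.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# Example (command names assumed for the apt packages above):
#   missing_commands antiword unrtf git psql pip3 dot pspp || echo "install the missing packages first"
```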
If you want to compile lxml and psycopg2 yourself (through pip), you need to install:
sudo apt-get build-dep python3-psycopg2 python3-lxml
You can avoid compiling libraries by installing some dependencies through apt:
sudo apt-get install python3-lxml python3-amqplib python3-psycopg2 python3-requests python3-pygments
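To check whether these modules can actually be imported, you can try importing them with the interpreter you will use for AmCAT. This is a sketch under our own assumptions: the helper name and the PYTHON variable are hypothetical, and lxml, psycopg2, requests and pygments are the import names of the packages above. Keep in mind that a virtual environment only sees apt-installed packages if it was created with --system-site-packages.

```shell
# Hypothetical helper: report Python modules that the interpreter in $PYTHON
# (default: python3) cannot import. Returns 1 if any import fails.
missing_py_modules() {
    py=${PYTHON:-python3}
    status=0
    for mod in "$@"; do
        if ! "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "missing: $mod"
            status=1
        fi
    done
    return "$status"
}

# Example, run inside the environment you will install AmCAT into:
#   missing_py_modules lxml psycopg2 requests pygments
```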
It is probably best to install AmCAT in a virtual environment. Run the following commands to set up and activate a virtual environment for AmCAT (on Ubuntu):
python3 -m venv env
source env/bin/activate
If you use a virtual environment, you need to repeat the source line every time you start working with AmCAT, to load the environment. If you don't use a virtual environment, you will need to run most of the pip commands below using sudo.
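Because the environment has to be activated again in every new shell, a script that runs pip can first check that an environment is active. The activate script created by venv exports VIRTUAL_ENV, and this sketch relies on that; venv_active is a hypothetical name.

```shell
# Hypothetical helper: succeed only if a virtual environment is active.
# The activate script created by venv/pyvenv exports VIRTUAL_ENV.
venv_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Example: stop before installing into the system Python by accident
#   venv_active || { echo "run 'source env/bin/activate' first" >&2; exit 1; }
```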
Database
AmCAT requires a database to store its documents. The default settings look for a PostgreSQL database named 'amcat' on localhost. To make the current user a PostgreSQL superuser and create the database, run:
sudo -u postgres createuser -s $USER
createdb amcat
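If you run the setup more than once, createdb fails because the database already exists. The guard below is a sketch that assumes the usual format of psql -lqt output (one database per line, with the name in the first |-separated column); db_listed is a hypothetical name.

```shell
# Hypothetical helper: read `psql -lqt` output on stdin and succeed if a
# database with exactly the given name is listed (first '|' column, trimmed).
db_listed() {
    awk -F'|' -v name="$1" '
        { gsub(/^[ \t]+|[ \t]+$/, "", $1); if ($1 == name) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Example: create the 'amcat' database only if it is missing
#   psql -lqt | db_listed amcat || createdb amcat
```

Comparing the trimmed name exactly avoids false matches such as amcat_test, which a plain grep for "amcat" would accept.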
Elastic
AmCAT uses Elasticsearch for searching articles. Elasticsearch is provided as a Debian package, but it needs some extra plugins before it is ready for AmCAT.
wget https://download.elasticsearch.org/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.1.1/elasticsearch-2.1.1.deb
sudo dpkg -i elasticsearch-2.1.1.deb
rm elasticsearch-2.1.1.deb
# Install plugins
cd /usr/share/elasticsearch
sudo bin/plugin install mobz/elasticsearch-head
sudo bin/plugin install analysis-icu
sudo bin/plugin install amcat/hitcount
# Enable hitcount as default similarity provider, and enable groovy scripting
cat <<EOT | sudo tee --append /etc/elasticsearch/elasticsearch.yml
index.similarity.default.type: hitcountsimilarity
script.inline: on
script.update: on
EOT
# Restart elastic
sudo systemctl stop elasticsearch
sudo systemctl start elasticsearch
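Elasticsearch may need a few seconds after starting before it answers requests, so a script that continues with the next steps can poll it first. A sketch, assuming curl is installed and Elasticsearch listens on its default address http://localhost:9200; the function name and retry parameters are our own.

```shell
# Hypothetical helper: poll URL until it answers successfully or the retries run out.
# Arguments: URL [TRIES (default 30)] [DELAY in seconds between tries (default 1)]
wait_for_http() {
    url=$1
    tries=${2:-30}
    delay=${3:-1}
    while [ "$tries" -gt 0 ]; do
        # -s: quiet, -f: treat HTTP errors as failure, --max-time: never hang
        if curl -sf --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Example: wait for Elasticsearch, then list the installed plugins
#   wait_for_http http://localhost:9200/ && curl -s 'http://localhost:9200/_cat/plugins?v'
```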
Warning: We enabled a non-sandboxed scripting language (Groovy). Make sure to restrict access to the Elasticsearch instance from untrusted parties, as this allows them to execute arbitrary code as the elasticsearch user.
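The tee --append step adds the three settings again each time it is run. If you might repeat the installation, a guard like the following appends a line only when that exact line is not yet in the file. This is a sketch: append_once and the SUDO variable are hypothetical names, and SUDO exists only because /etc/elasticsearch/elasticsearch.yml is owned by root.

```shell
# Hypothetical helper: append LINE to FILE unless FILE already contains that exact line.
# Set SUDO=sudo beforehand when FILE can only be read or written by root.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: whole-line match, -F: fixed string (no regex)
    if ! ${SUDO:-} grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" | ${SUDO:-} tee -a "$file" >/dev/null
    fi
}

# Example: the Elasticsearch settings from above, safe to run repeatedly
#   SUDO=sudo
#   for setting in 'index.similarity.default.type: hitcountsimilarity' 'script.inline: on' 'script.update: on'; do
#       append_once /etc/elasticsearch/elasticsearch.yml "$setting"
#   done
```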
Installing AmCAT
Clone the project from GitHub and install the requirements with pip:
git clone -b release-3.4 https://github.com/amcat/amcat.git
pip install -r amcat/requirements.txt
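Be sure to add the new directory to the PYTHONPATH and set AMCAT_ES_LEGACY_HASH in the environment. The lines below assume you cloned AmCAT into your home directory. If you add them to your virtual environment's activate script (env/bin/activate if you created the environment as described above), they will be set automatically whenever you activate it.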
export PYTHONPATH=$PYTHONPATH:$HOME/amcat
export AMCAT_ES_LEGACY_HASH=N
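When PYTHONPATH is empty, the export above produces a value starting with ':', and Python treats that empty entry as the current directory; sourcing the line twice also lists the directory twice. A slightly more careful sketch (add_pythonpath is a hypothetical helper name):

```shell
# Hypothetical helper: append DIR to PYTHONPATH unless it is already listed.
# ${PYTHONPATH:+$PYTHONPATH:} expands to nothing when PYTHONPATH is unset or
# empty, so no empty entry (a leading ':') is created.
add_pythonpath() {
    case ":${PYTHONPATH:-}:" in
        *":$1:"*) ;;  # already on the path: nothing to do
        *) PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$1" ;;
    esac
    export PYTHONPATH
}

# Example for the activate script (assumes AmCAT was cloned into $HOME/amcat):
#   add_pythonpath "$HOME/amcat"
#   export AMCAT_ES_LEGACY_HASH=N
```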
Collecting static files
AmCAT uses bower to install JavaScript/CSS libraries. On Ubuntu, you first need to install the legacy version of nodejs, and then install bower using npm:
sudo apt-get install nodejs-legacy npm
sudo npm install -g bower
Then, in the top-level directory of AmCAT itself, run:
bower install
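By default, bower install reads bower.json from the directory it is started in, which is why it has to run from AmCAT's top-level directory. If you are somewhere else inside the checkout, a sketch like this can locate that directory first. It assumes the top-level directory is the nearest one containing bower.json, and find_bower_root is a hypothetical name.

```shell
# Hypothetical helper: starting from DIR (default: the current directory),
# walk up until a directory containing bower.json is found and print its path.
find_bower_root() {
    dir=$(CDPATH= cd -- "${1:-.}" && pwd) || return 1
    while :; do
        if [ -f "$dir/bower.json" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        if [ "$dir" = / ]; then
            return 1  # reached the filesystem root without finding bower.json
        fi
        dir=$(dirname "$dir")
    done
}

# Example: run bower from the top of the checkout, wherever you are inside it
#   root=$(find_bower_root) && (cd "$root" && bower install)
```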
Setting up the database
Whichever way you installed AmCAT, you need to call the migrate command to populate the database and set the elasticsearch mapping:
python -m amcat.manage migrate
You can create a superuser by running:
python -m amcat.manage createsuperuser
Start AmCAT web server
For debugging, it is easiest to start AmCAT using runserver:
python -m amcat.manage runserver
Start celery worker
Finally, to use the query screen you need to start a celery worker. In a new terminal, type:
DJANGO_SETTINGS_MODULE=settings celery -A amcat.amcatcelery worker -l info -Q amcat
(If you are using a virtual environment, make sure to activate it first.)
Configuring AmCAT
The main configuration parameters for AmCAT reside in the settings folder. In many places, these settings are defaults that can be overridden with environment variables.
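An environment variable can be set either for a single command or for the rest of the shell session, and both forms appear above: DJANGO_SETTINGS_MODULE=settings in front of the celery command, and export AMCAT_ES_LEGACY_HASH=N. The sketch below only demonstrates the shell mechanics, using AMCAT_ES_LEGACY_HASH because it appears above; which variable each setting actually reads is defined in the settings folder and is not shown here.

```shell
# Start from a clean state so the demonstration is predictable.
unset AMCAT_ES_LEGACY_HASH

# One-off: the assignment applies only to the command it prefixes.
AMCAT_ES_LEGACY_HASH=N sh -c 'echo "one-off: ${AMCAT_ES_LEGACY_HASH:-unset}"'   # prints: one-off: N
sh -c 'echo "next command: ${AMCAT_ES_LEGACY_HASH:-unset}"'                     # prints: next command: unset

# Exported: every later command started from this shell inherits the value.
export AMCAT_ES_LEGACY_HASH=N
sh -c 'echo "after export: ${AMCAT_ES_LEGACY_HASH:-unset}"'                     # prints: after export: N
```

The one-off form is handy for trying an override without changing your activate script; the exported form suits values every AmCAT process should see.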