Introduction
Aggie is a web application for using social media and other resources to track incidents around real-time events such as elections or natural disasters.
Aggie can retrieve data from several sources:
- Twitter (tweets matching a keyword search)
- Telegram
- CrowdTangle (Facebook, Instagram, and Reddit posts from publicly accessible groups and pages)
- RSS (article titles and descriptions)
- ELMO (answers to survey questions)
Items (called reports) from all sources are streamed into the application. Monitors can quickly triage incoming reports by marking them as relevant or irrelevant.
Relevant reports can be grouped into incidents for further monitoring and follow-up.
Reports are fully searchable and filterable via a fast web interface.
Report queries can be saved and tracked over time via a series of visual analytics.
Aggie is built for scalability and can handle hundreds of incoming reports per second. The backend fetching and analytics systems feature a modular design well-suited to parallelism and multi-core architectures.
Users can be assigned to admin, manager, monitor, and viewer roles, each with appropriate permissions.
Aggie is built using Angular.js and Express.js, both state-of-the-art development frameworks.
Contact mikeb@cc.gatech.edu for more information on the Aggie project.
Sassafras Tech Collective offers managed instances of Aggie, along with development and support services.
Table of Contents
- Using the Application
- Source Installation
- Maintenance
- Project Configuration
- Architecture
- Building and Publishing Aggie's documentation
Using the Application
Extensive documentation about using the application can be found on the ReadTheDocs page.
Source Installation
We recommend the semi-automated installation script below to install the required components on Ubuntu.
System requirements
Again, see below for automated installation.
- node.js (v12.16 LTS)
  - Use Node Version Manager.
    - Node Version Manager (nvm) allows multiple versions of node.js to be used on your system and manages the versions within each project.
    - After installing nvm:
      - Navigate to the aggie project directory: `cd aggie`.
      - Run `nvm install` to install the version specified in `.nvmrc`.
- MongoDB (requires >= 4.2.0)
  - Follow the installation instructions for your operating system.
  - Make sure MongoDB is running:
    - On Linux, run `sudo systemctl status mongod` to see whether the `mongod` daemon started MongoDB successfully. If there are any errors, check the logs in `/var/log/mongodb`.
  - Note: You do not need to create a user or database for Aggie in MongoDB. These will be generated during the installation process below.
- (optional) SMTP email server
  - Required in production for adding new users.
- (optional) JRE
  - Java is only required for running end-to-end tests with protractor. Installing Java can be safely skipped if these tests are not needed.
  - Install the Java SE Runtime Environment (JRE) from Oracle or your package manager.
- (optional) Python (requires >= 2.7)
  - Python 2.7 is required to use the hate speech classifier (presently only available for Burmese).
  - Python 2.7 is required because one of the dependencies (Burmese Language Tools) for segmenting Burmese text is written in Python 2.7.
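Before installing, it can help to confirm that the versions listed above are met. The following is a minimal sketch and not part of Aggie: `version_ge` and `check_prereqs` are hypothetical helper names, and the comparison assumes a `sort` that supports `-V` (GNU coreutils). The exact node.js version is pinned by `.nvmrc`, so the node check here is only a lower bound.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with Aggie) for checking prerequisites.

# Succeeds when dotted version $1 is greater than or equal to $2.
# Assumes `sort -V` is available (GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints one status line per prerequisite; it never exits the shell.
check_prereqs() {
  local node_v mongo_v

  if command -v node >/dev/null 2>&1; then
    node_v="$(node --version)"   # e.g. v12.16.3
    node_v="${node_v#v}"         # drop the leading "v"
    if version_ge "$node_v" "12.16.0"; then
      echo "node $node_v: ok"
    else
      echo "node $node_v: older than 12.16"
    fi
  else
    echo "node: not found (install it with nvm)"
  fi

  if command -v mongod >/dev/null 2>&1; then
    # `mongod --version` prints a line such as "db version v4.2.8".
    mongo_v="$(mongod --version | sed -n 's/^db version v//p')"
    if [ -z "$mongo_v" ]; then
      echo "mongod: could not parse version"
    elif version_ge "$mongo_v" "4.2.0"; then
      echo "mongod $mongo_v: ok"
    else
      echo "mongod $mongo_v: older than 4.2.0"
    fi
  else
    echo "mongod: not found"
  fi
}
```

After sourcing the file, run `check_prereqs` to print the status lines.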
Installation notes
Again, see below for automated installation.
- Clone the aggie repo.
  - In your terminal, navigate to your main projects folder (e.g. Documents).
  - Use this command: `git clone https://github.com/TID-Lab/aggie.git`.
  - Enter the project directory: `cd aggie`.
- Copy `config/secrets.json.example` to `config/secrets.json`.
  - Set `adminPassword` to the default password you want to use for the `admin` user during installation.
  - For production, set the `log_user_activity` flag to `true`. For testing, set it to `false` (the default value).
  - If using hate speech indication icons, set `hateSpeechThreshold` to the threshold at which the icon appears (0.0 to 1) and set `enable` to `true`.
- (optional, rarely needed) To make HTTPS work, copy your SSL certificate information to the `config` folder (two files named `key.pem` and `cert.pem`).
  - If you do not have a certificate, you can create a new self-signed certificate with the following command: `openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -days 365`
  - This will allow you to start the server, but browsers will show security warnings. You will need a real trusted certificate for production use.
  - Adding the `-nodes` flag generates an unencrypted private key, allowing you to run tests without a passphrase prompt.
- Hate speech detection is available for Burmese. Setup steps are listed in the Semi-automated installation script. In `config/secrets.json`, set the `detectHateSpeech` parameter to `true`. The API will run on http://localhost:5000. Users never interact with this API directly.
- Run `npm install` from the project directory.
  - This installs all dependencies and concatenates the Angular application.
- (optional) Run `npm install -g gulp mocha karma-cli protractor migrate`.
  - This installs some tools globally, which can then be run from the command line for testing.
  - You will most likely need Google Chrome installed on your computer for the protractor tests to run.
  - This is optional, as `npx` provides easy access to the local copies of these tools that are installed by `npm install`.
- To start the server in production mode, run `npm start`. Use `npm run dev` for development.
  - A user and password are generated and printed in your terminal. Use these credentials to log into the application. Example: `"admin" user created with password "password"`.
- Navigate to `https://localhost:3000` in your browser.
  - This will show you the running site. Log in with the user name and password from your terminal mentioned above.
  - If you did not set up the SSL certificate, use `http://localhost:3000` instead.
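The choice between the two URLs above depends only on whether the certificate files were copied into `config`. As a rough illustration, the sketch below picks the URL to try and probes it. `aggie_url` and `probe_aggie` are hypothetical names, not part of Aggie, and the file check is an assumption based on the steps above rather than on how the server itself decides.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Aggie).

# Prints the local URL to try, based on whether both certificate files
# exist in the given config directory (default: ./config).
aggie_url() {
  local config_dir="${1:-config}"
  if [ -f "$config_dir/key.pem" ] && [ -f "$config_dir/cert.pem" ]; then
    echo "https://localhost:3000"
  else
    echo "http://localhost:3000"
  fi
}

# Prints the HTTP status code returned by the running server.
# -k skips certificate verification, which a self-signed certificate needs.
probe_aggie() {
  curl -k -s -o /dev/null -w '%{http_code}\n' "$(aggie_url "${1:-config}")"
}
```

Run `probe_aggie` from the `aggie` directory while the server is up to print the status code.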
Semi-automated installation script
This is intended for setup on a fresh Ubuntu v18.04 system. Setup may need to be modified for other Linux systems.
Where a comment says "User input", stop pasting at that point and deal with the input before continuing.
# Set up system
export EDITOR=vim # Option 1
export EDITOR=nano # Option 2
sudo apt update
sudo apt install -y ntp nginx software-properties-common
sudo systemctl enable ntp
sudo snap install certbot --classic
sudo ln -s /snap/bin/certbot /usr/bin/certbot
# Nginx server and SSL. Source: https://certbot.eff.org/lets-encrypt/ubuntubionic-nginx
sudo curl -o /etc/nginx/sites-available/aggie.conf https://raw.githubusercontent.com/TID-Lab/aggie/develop/docs/content/aggie-nginx
sudo ln -s /etc/nginx/sites-available/aggie.conf /etc/nginx/sites-enabled/
sudo rm /etc/nginx/sites-enabled/default
# User input: Customize nginx settings with your domain name.
sudo $EDITOR /etc/nginx/sites-available/aggie.conf
# User input: Set up SSL with a couple of prompts.
sudo certbot --nginx
# User input: Set up SSL certificate auto-renewal.
crontab -e
# Paste the following line in crontab, replacing `X` with the current minutes + 1
# (e.g. if it's 12:15pm, write `16` instead of `X`):
X * * * * PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && sudo /usr/bin/certbot renew --no-self-upgrade > ${HOME}/certbot-cron.log 2>&1
# Then wait until that time occurs, and verify that it logged a renewal attempt:
cat ~/certbot-cron.log
# You should see something like "Cert not yet due for renewal / No renewals were attempted."
# which means the certificate is valid and the cron job is running.
# If you make any config changes later, always run this afterward:
sudo systemctl restart nginx
# Mongo DB. Source: https://docs.mongodb.com/v4.2/tutorial/install-mongodb-on-ubuntu/
wget -qO - https://www.mongodb.org/static/pgp/server-4.2.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.2.list
sudo apt update
sudo apt install -y mongodb-org zip
sudo systemctl enable mongod
# Optional: Increase ulimits via https://docs.mongodb.com/manual/reference/ulimit/.
# This will affect DB performance in some cases.
# Finally:
sudo systemctl restart mongod
# Node version manager (nvm). Source: https://github.com/nvm-sh/nvm#installing-and-updating
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | bash
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
# Set up Aggie
git clone https://github.com/TID-Lab/aggie.git
cd aggie
nvm install && npm install
cp config/secrets.json.example config/secrets.json
# User input: Customize Aggie settings per the README instructions.
# This includes adding your SMTP email server credentials, detectHateSpeech option etc.
$EDITOR config/secrets.json
# User input: Get CrowdTangle sources per the README instructions, if using them.
# Otherwise stub it:
echo "{}" > config/crowdtangle_list.json
# Follow these steps for setting up hate speech API for Burmese
python --version # check if you have python 2 installed.
cd hate-speech-api
npm install forever -g # install `forever` npm module.
pip install virtualenv # install virtual environment.
virtualenv venv # create virtual environment.
source venv/bin/activate # activate virtual environment
pip install -r requirements.txt # install dependencies
# User input: Set `detectHateSpeech: true`.
cd .. # Return to the aggie directory, where config/ lives.
$EDITOR config/secrets.json
# User input: Set the script to run on startup.
crontab -e
# Paste the following line in crontab:
@reboot bash -c 'source $HOME/.nvm/nvm.sh; forever start -o $HOME/aggie/logs/hate-speech-out.log -e $HOME/aggie/logs/hate-speech-err.log -c python $HOME/aggie/hate-speech-api/hate_speech_clf_api.py > hate-cron.log 2>&1'
# Reboot the machine and make sure the Hate Speech API is available on port 5000:
curl localhost:5000
# Ready! Test run:
npm start
# Now verify Aggie is online at your URL, then kill this process (ctrl+c) when you're done.
# Optional troubleshooting if it doesn't work:
curl localhost:3000
# This should return an HTML response starting with something like <html lang="en" ng-app="Aggie">
# If this works but you can't access Aggie publicly, check your network config to make sure ports 80 and 443 are exposed.
# Final steps
# User input: Print a script to run that will enable Aggie on startup.
npx pm2 startup
# Copy/paste the last line of output as instructed.
# Start (or restart) Aggie in the background; save the PM2 state for startup.
npm run serve
npx pm2 save
# If you ever modify secrets.json, restart the app by running (in the `aggie` directory):
npx pm2 restart aggie
# OPTIONAL User input: Restart Aggie every 6 hours if you have high traffic. Memory leaks are in the process of being addressed.
crontab -e
# Paste the following line in crontab:
0 */6 * * * bash -c 'source $HOME/.nvm/nvm.sh && cd $HOME/aggie && npx pm2 restart aggie > $HOME/restart-cron.log 2>&1'
# User input: Enable log rotation.
sudo $EDITOR /etc/logrotate.conf
# Paste the following, changing `/home/my_user` to the location of the `aggie` folder.
/home/my_user/aggie/logs/*.log
/home/my_user/.pm2/logs/*.log
/var/log/mongodb/*.log
{
daily
missingok
rotate 12
compress
delaycompress
notifempty
copytruncate
}
# Whenever you need to, you can view app logs by running (in the `aggie` directory):
npx pm2 logs
Semi-automated upgrade
Save backup:
# Back up your database.
export DATE=`date -u +"%Y-%m-%d"`; mongodump -o "mongodump-$DATE"
# OR authenticated (will prompt for your password):
export DATE=`date -u +"%Y-%m-%d"`; mongodump -o "mongodump-$DATE" -d aggie -u admin
# Compress the data to save disk space.
zip -r "mongodump-$DATE.zip" "mongodump-$DATE"
rm -rf "mongodump-$DATE"
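The compress-and-delete step above removes the dump directory even if the archive turned out to be unreadable. The sketch below shows a slightly more cautious variant: it deletes the directory only after `unzip -t` has verified the archive. `backup_name` and `archive_dump` are hypothetical helper names, not part of Aggie.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Aggie) for the backup step.

# Prints the dump directory name used above, e.g. mongodump-2020-06-01.
backup_name() {
  echo "mongodump-$(date -u +%Y-%m-%d)"
}

# Zips the dump directory, verifies the archive, and only then deletes
# the uncompressed copy. Leaves the directory in place on any failure.
archive_dump() {
  local dir="$1"
  zip -qr "$dir.zip" "$dir" \
    && unzip -tq "$dir.zip" >/dev/null \
    && rm -rf "$dir"
}
```

For example: `dir="$(backup_name)"; mongodump -o "$dir" && archive_dump "$dir"`.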
Quick upgrade:
cd aggie # Go to where you originally saved Aggie.
alias assertClean='git diff --exit-code && git diff --cached --exit-code' # Check for dirty files.
assertClean && (git pull && npm install && npx pm2 restart aggie) || echo "Dirty." # Serve the new version.
Full upgrade if the above fails:
cd aggie # Go to where you originally saved Aggie.
git status # Check if anything is modified (this should be rare).
git add -A; git add -u; git stash # Save any files you may have changed.
git branch # Make sure you're on 'develop' (or whatever you need to be on).
git pull # Get upstream changes.
git stash pop # Only if you had changes saved earlier.
# ! Make sure to resolve any conflicts if there are any.
git status # Check if it looks right.
npm install # Make sure dependencies are up to date.
npx pm2 restart aggie # Serve the new version.
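The quick upgrade one-liner works when pasted into an interactive shell, but Bash does not expand aliases in scripts by default, and its `&& ... || echo "Dirty."` form also prints "Dirty." when `git pull` or `npm install` fails. If you want to script the upgrade, a function-based sketch like the following separates the two cases. `assert_clean` and `quick_upgrade` are illustrative names; like the original check, this ignores untracked files.

```shell
#!/usr/bin/env bash
# Hypothetical script form of the quick upgrade (names are illustrative).

# Succeeds only when the repository has no unstaged or staged changes.
assert_clean() {
  local repo="${1:-.}"
  git -C "$repo" diff --quiet && git -C "$repo" diff --cached --quiet
}

# Runs the quick upgrade in the given repository (default: current directory).
quick_upgrade() {
  local repo="${1:-.}"
  if ! assert_clean "$repo"; then
    echo "Dirty: commit or stash local changes first." >&2
    return 1
  fi
  # Same steps as the one-liner; a failure here is reported separately.
  if ! (cd "$repo" && git pull && npm install && npx pm2 restart aggie); then
    echo "Upgrade failed; try the full upgrade." >&2
    return 1
  fi
}
```

Call `quick_upgrade "$HOME/aggie"` (or `quick_upgrade` from inside the `aggie` directory).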
Maintenance
- To run migrations, run `npx migrate`.
- To run unit tests, run `npm test`.
  - Leave your HTTPS certificate files unencrypted for testing. If necessary, re-run `openssl` with the `-nodes` option as described above.
  - Calling `npm run mocha` will run just the backend tests.
  - Calling `npm run karma` will run just the frontend tests.
- To monitor code while developing, run `npx gulp`. You can pass an optional `--file=[test/filename]` parameter to only test a specific file.
- To run end-to-end tests:
  - First start Aggie on the test database with `npm run testrun`.
  - Then run protractor with `npm run protractor`.
- To run end-to-end tests with external APIs:
  - Set up the appropriate keys in `secrets.json` (e.g. Twitter).
  - Start Aggie on the test database with `npm run testrun`.
  - Run protractor with `npm run protractor-with-apis`.
- To verify that the cron job for updating Account Ids <-> CrowdTangle List Names and Saved Searches works:
  - Empty the contents of `config/crowdtangle_list.json`.
  - Let the cron job run at midnight UTC and check whether `config/crowdtangle_list.json` has been updated with Account Ids <-> CrowdTangle List Names and Saved Searches.
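The last check in the list can be scripted. The sketch below treats the file as "not yet updated" when it is missing, empty, or still contains only the `{}` stub. `ct_list_updated` is a hypothetical helper name, and the sketch does not inspect what the updated file contains.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Aggie).

# Succeeds when the CrowdTangle list file holds more than an empty stub.
# Argument: path to the file (default: config/crowdtangle_list.json).
ct_list_updated() {
  local file="${1:-config/crowdtangle_list.json}"
  local compact

  # Missing or zero-length file: not updated.
  [ -s "$file" ] || return 1

  # Strip all whitespace so "{ }" and "{}\n" compare equal to the stub.
  compact="$(tr -d '[:space:]' < "$file")"
  [ -n "$compact" ] && [ "$compact" != "{}" ]
}
```

Run it from the `aggie` directory after midnight UTC: `ct_list_updated && echo "updated" || echo "not updated yet"`.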
Project Configuration
You can adjust the settings in the `config/secrets.json` file to configure the application.
Tests
Set `config.adminParty=true` if you want to run tests.
Social Media and Feeds
Twitter
- Follow these instructions to generate tokens to use the Twitter API.
- Go to Settings > Configuration and edit the Twitter settings. Remember to toggle the switch on once you have saved the settings.
CrowdTangle
- Create a dashboard on CrowdTangle and generate the dashboard token.
- Add your CT API token to `config/secrets.json`.
- Run `npm run update-ct-lists` to fetch data.
  - This will update `config/crowdtangle_list.json`.
  - This also happens automatically every night at midnight while Aggie is running.
Note: To have git ignore changes, run `git update-index --skip-worktree config/crowdtangle_list.json`.
WhatsApp
The WhatsApp feature is documented in a conference paper. As WhatsApp does not currently offer an API, a Firefox extension on Linux is used to redirect notifications from web.whatsapp.com to the Aggie server. You therefore need a Linux computer accessing WhatsApp through Firefox for this to work. Follow these steps to set it up.
- Install Firefox on Linux using your distribution's preferred method.
- Install the GNotifier add-on in Firefox.
- Configure the add-on in about:addons:
  - Set Notification Engine to Custom command.
  - Set the custom command to `curl --data-urlencode "keyword=<your own keyword>" --data-urlencode "from=%title" --data-urlencode "text=%text" http://<IP address|domain name>:2222/whatsapp`
    - We suggest setting your `keyword` to a unique string of text without spaces or symbols, e.g., the phone number of the WhatsApp account used for Aggie. This keyword must match the one specified in the Aggie application when creating the WhatsApp Aggie source.
    - Replace `IP address|domain name` with the address or domain where Aggie is installed (e.g., `localhost` for testing).
- Visit web.whatsapp.com, follow the instructions, and enable browser notifications.
- Notifications will not be sent to Aggie when browser focus is on the WhatsApp tab, so move away from that tab if not replying to anyone.
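To check the endpoint without waiting for a real WhatsApp notification, you can send the same form fields by hand. The sketch below mirrors the custom command configured in GNotifier. `whatsapp_endpoint` and `send_test_notification` are hypothetical helper names, the sample values are made up, and the server's response is not documented here, so inspect it rather than expecting a particular body.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for exercising the WhatsApp endpoint by hand.

# Builds the endpoint URL used in the GNotifier custom command.
# Argument: host or IP where Aggie is installed (default: localhost).
whatsapp_endpoint() {
  echo "http://${1:-localhost}:2222/whatsapp"
}

# Sends one fake notification.
# Arguments: keyword, sender (stands in for %title), text, optional host.
send_test_notification() {
  curl --data-urlencode "keyword=$1" \
       --data-urlencode "from=$2" \
       --data-urlencode "text=$3" \
       "$(whatsapp_endpoint "${4:-localhost}")"
}
```

For example: `send_test_notification 15551234567 "Test sender" "hello from curl"`. The keyword must match the one set on the WhatsApp source in Aggie.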
ELMO
- Log in to your ELMO instance with an account having coordinator or higher privileges on the mission you want to track.
- In your ELMO instance, mark one or more forms as public (via the Edit Form page). Note the Form ID in the URL bar (e.g. if the URL ends in `/m/mymission/forms/123`, the ID is `123`).
). - Visit your profile page (click the icon bearing your username in the top-right corner) and copy your API key (click 'Regenerate' if necessary).
- Go to Settings > Configuration and edit the ELMO settings. Remember to toggle the switch on, once you have saved the settings.
Google Places
Aggie uses Google Places for guessing locations in the application. To make it work:
- You will need to get an API key from Google API console for Google Places API.
- Read about Google API usage limits and consider whitelisting your Aggie deployment to avoid surprises.
- Go to Settings > Configuration and edit the Google Places settings and add the key.
Emails
Email service is required to create new users.
- `fromEmail` is the email address from which system emails come. It is also used for the default admin user.
- `email.from` is the address from which application emails will come.
- `email.transport` is the set of parameters that will be passed to NodeMailer. Valid transport method values are: 'SES', 'sendgrid' and 'SMTP'.
  - If you are using SES for sending emails, make sure `config.fromEmail` has been authorized in your Amazon SES configuration.
Fetching
- Set the `fetching` value to enable or disable fetching for all sources at the global level.
  - This is also changed at runtime based on user choice.
Logging
Set various logging options in the `logger` section.
- The `console` section is for console logging. For the available options, see winston (https://github.com/winstonjs/winston#transports).
- The `file` section is for file logging. For the available options, see winston (https://github.com/winstonjs/winston#transports).
- The `SES` section is for email notifications.
  - Set appropriate AWS key and secret values.
  - Set the `to` and `from` email ids. Make sure `from` has been authorized in your Amazon SES configuration.
- The `Slack` section is for Slack messages.
  - Set the webhook URL to send logs to a specific Slack channel.
  - DO NOT set `level` to debug. The recommended value is error.
Only the `console` and `file` transports are enabled by default. Transports can be disabled using the `"disabled"` field included in each section in the `config/secrets.json` file.
Remote access
See the first part of the Tableau docs in BI Connector setup.
Data visualization using Tableau
Setting up and viewing Tableau visualizations in Aggie requires installing Tableau's MongoDB BI Connector on the server that acts as a bridge between Tableau and MongoDB. To set up the BI Connector, follow these steps: BI Connector setup.
Architecture
Aggie consists of two largely separate frontend and backend apps. Some model code (in `/shared`) is shared between them.
Backend
The backend is a Node.js/Express app responsible for fetching and analyzing data and servicing API requests. There are three main modules, each of which runs in its own process:
- API module
- Fetching module
- Analytics module
See the README files in the `lib` subdirectories for more info on each module.
The model layer (in `/models`) is shared among all three modules.
Frontend
The frontend is a single-page Angular.js app that runs in the browser and interfaces with the API, via both pull (REST) and push (WebSockets) modalities. It is contained in `/public/angular`.
Building and Publishing Aggie's documentation
The documentation is in the `docs` directory. It is automatically built and pushed on each commit to the `master` and `develop` branches on GitHub.
To build the docs locally, do the following: