A machine learning API to analyze tweets during disasters.

Tweedr: measuring disaster damage with tweets

Tweedr makes information from social media more accessible to providers of disaster relief. There are two aspects to the application:

  1. An API / pipeline for applying machine learning techniques and natural language processing tools to analyze social media produced in response to a disaster.
  2. A user interface for manipulating, filtering, and aggregating this enhanced social media data.

Tweedr is a Data Science for Social Good project, through a partnership with the Qatar Computational Research Institute.

Problem, solution, data

web app screenshot

Project layout

  • doc/ contains various presentations, along with accompanying slides and poster.
    • doc/report/ contains a more technical and extensive write-up of this project. In progress.
  • ext/ is created by a complete install; external data sources and libraries are downloaded to this folder.
  • static/ contains static (non-JavaScript) files used by the web app.
  • templates/ contains templates (both server-side and client-side) used by the web app.
  • tests/ contains unittest-like tests. Use python setup.py test to run these.
  • tools/ holds tools to aid development (currently, only a test-running git-hook).
  • tweedr/ contains the main Python app and functions as a Python package (e.g., import tweedr).

Installation guide

git clone https://github.com/dssg/tweedr.git
cd tweedr
python setup.py develop download_ext

If you want to jump straight to development, see the Contributing wiki page.

Dependencies

Tweedr uses a number of external libraries and resources. This is the dependency tree:

crfsuite and liblbfgs are the only components that can't be installed directly with Python via setuptools. If you have trouble installing any of the other dependencies, you might have better luck looking for them in your operating system's package manager or as binaries on the projects' websites.

Installation steps

1. Installing libLBFGS

The source code can be downloaded from the maintainer's webpage, though this GitHub fork (used in the commands below) attempts to simplify the install process.

git clone https://github.com/chbrown/liblbfgs.git
cd liblbfgs
./configure
make
sudo make install

2. Installing CRFsuite

As with libLBFGS, a tarball can be downloaded from the original website, though the accompanying fork on GitHub attempts to document the installation process and make compilation more automatic on both Linux and Mac OS X.

git clone https://github.com/chbrown/crfsuite.git
cd crfsuite
./configure
make
sudo make install

That installs the library, but not the Python wrapper, which takes a few more steps:

cd swig/python
python setup.py build_ext
sudo python setup.py install_lib

To test whether it installed correctly, you can run the following at your terminal, which should print out the current CRFsuite version:

python -c 'import crfsuite; print(crfsuite.version())'
> 0.12.2

The GitHub repository documents a few more options that might come in handy if the process above does not work for your operating system.

3. Configuring environment variables

Tweedr also connects to a number of remote resources when running live; see the Environment page of the wiki for instructions on setting those up.
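Before starting the app, it can be useful to check that the required variables are actually set in your shell. The following is a minimal sketch using only the standard library. The helper `missing_variables` and the variable names `EXAMPLE_DATABASE_URL` and `EXAMPLE_API_KEY` are hypothetical placeholders, not names that Tweedr defines; substitute the names listed on the Environment page.

```python
import os


def missing_variables(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    # Placeholder names only (assumption): replace them with the variables
    # documented on the Environment wiki page.
    required = ["EXAMPLE_DATABASE_URL", "EXAMPLE_API_KEY"]
    missing = missing_variables(required)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary as `environ` makes the check easy to try out without touching your real environment.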

4. Installing Tweedr

After installing crfsuite and liblbfgs, everything else should be installable via setuptools / distutils:

git clone https://github.com/dssg/tweedr.git
cd tweedr
python setup.py install

And then to download external data requirements:

python setup.py download_ext

The download_ext command will download external data, which currently includes the following packages / sources:

You may get an error, "IOError: cmu.arktweetnlp.RunTagger error", if you try to use some parts of Tweedr before installing this component.

5. Instantiating the database

While we are not currently able to release our data, you can easily recreate the structure of our database by running the following command:

tweedr-database create

This simply uses SQLAlchemy to create the tables from the schema declared in the code (the reverse of reflecting an existing database), by running metadata.create_all().

Running Tweedr

At this point, you should have tools like tweedr-ui and tweedr-pipeline on your PATH, and you can run each of those with the --help flag to view the usage messages.
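If either command is not found, a quick way to confirm what your shell can see is to look the names up on PATH. This is a rough sketch, assuming Python 3.3 or later for `shutil.which`; the helper `missing_tools` is hypothetical and not part of Tweedr.

```python
import shutil


def missing_tools(names):
    """Return the command names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # tweedr-ui and tweedr-pipeline are the two tools named in this README.
    missing = missing_tools(["tweedr-ui", "tweedr-pipeline"])
    if missing:
        print("Not on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

A missing tool usually means the install step did not complete, or that the directory where setuptools places scripts is not on your PATH.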

See the API section of the wiki for a description of some of the fields that tweedr-pipeline adds.

Troubleshooting

If your installation is still missing packages, see the manually installing page of the wiki.

Team

Contributing to the project

Want to get in touch? Found a bug? Open up a new issue or email us at dssg-qcri@googlegroups.com.

License

Copyright © 2013 The University of Chicago. MIT Licensed.