Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

edx-analytics-pipeline

The Hadoop-based data pipeline.

Requirements

Your machine will need the following in order to run the code in this repository:

All of the components above can be installed with your preferred package manager (e.g. apt, yum, brew).

The requirements in requirements/default.txt and requirements/test.txt can be installed with pip:

pip install -U -r requirements/default.txt
pip install -U -r requirements/test.txt

Known Issues on Mac OS X

If you are running the code on Mac OS X, you may encounter a couple of issues when installing numpy. If pip complains about being unable to compile Fortran, ensure that you have GCC installed. The easiest way to install GCC is with Homebrew: brew install gcc. If, after installing GCC, you see an error along the lines of "cannot link a simple C program", run the following command so that the compiler does not fail when it encounters unused command-line arguments:

export ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future

Note: If you frequently need to re-install or upgrade requirements, you may find it convenient to add the export statement above to your .bashrc or .bash_profile file so that it runs whenever you open a new shell.
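As a minimal sketch of the note above, the helper below appends a line to a profile file only when that exact line is not already present, so repeated runs do not pile up duplicates. The function name append_once is hypothetical and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository): append a line to a file
# only if that exact line is not already in it.
append_once() {
    line="$1"
    file="$2"
    # grep -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string.
    # If the file is missing or the line is absent, grep fails and the line is appended.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example, append_once 'export ARCHFLAGS=-Wno-error=unused-command-line-argument-hard-error-in-future' "$HOME/.bash_profile" would add the statement once. Open a new shell or source the file for it to take effect.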

Wheel

Wheel can help cut down the time it takes to install requirements. The Makefile is set up to use the environment variables WHEEL_URL and WHEEL_PYVER to find the Wheel server. You can set these variables using the commands below. If you want to set these variables every time you open a shell, add them to your .bashrc or .bash_profile file.

export WHEEL_PYVER=2.7
export WHEEL_URL=http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/<OPERATING SYSTEM>/<OS VARIANT>

Values for <OPERATING SYSTEM>/<OS VARIANT>:

  • Ubuntu/precise
  • MacOSX/lion
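To make the placeholder substitution concrete, here is a sketch that fills in the Ubuntu/precise pair from the list above. WHEEL_OS and WHEEL_VARIANT are illustrative helper variables assumed for this example; as far as this README states, the Makefile reads only WHEEL_URL and WHEEL_PYVER.

```shell
# Illustrative helper variables (assumed names, not read by the Makefile).
WHEEL_OS="Ubuntu"
WHEEL_VARIANT="precise"

# The two variables the Makefile is documented to use.
export WHEEL_PYVER=2.7
export WHEEL_URL="http://edx-wheelhouse.s3-website-us-east-1.amazonaws.com/${WHEEL_OS}/${WHEEL_VARIANT}"

# Print the resulting URL so it can be checked before running make.
echo "$WHEEL_URL"
```

On Mac OS X, the same sketch would use MacOSX and lion for the two helper variables.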

Running the Tests

Run make test to install the Python requirements and run the unit tests.

Some of the tests rely on AWS. If you encounter an error such as "NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials", you need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. The values do not need to be valid credentials for the tests to pass, so the commands below should fix the failures.

export AWS_ACCESS_KEY_ID='AK123'
export AWS_SECRET_ACCESS_KEY='abc123'
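If real credentials may already be exported in your shell, a slightly more careful variant is sketched below. It assigns the placeholder values only when the variables are unset or empty, so existing credentials are left untouched. This is an optional convenience assumed for illustration, not something the test suite requires.

```shell
# Assign placeholder values only if the variables are unset or empty.
# ":" is the shell no-op; the ${VAR:=default} expansion performs the assignment.
: "${AWS_ACCESS_KEY_ID:=AK123}"
: "${AWS_SECRET_ACCESS_KEY:=abc123}"

# Export both so that child processes (such as the test runner) can see them.
export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
```

After this, make test can be run from the same shell.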

How to Contribute

Contributions are very welcome, but for legal reasons, you must submit a signed individual contributor's agreement before we can accept your contribution. See our CONTRIBUTING file for more information -- it also contains guidelines for how to maintain high code quality, which will make your contribution more likely to be accepted.