Sample data collection/display Django site

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

ml

Simple ML platform. The current iteration only collects data from public APIs, then stores and displays it for local use.

The project skeleton, including this README, was originally generated with https://github.com/pydanny/cookiecutter-django

LICENSE: BSD

Settings

ml relies extensively on environment variables for its settings, which will not work with Apache/mod_wsgi setups. It has been deployed successfully with both Gunicorn/Nginx and uWSGI/Nginx.

For configuration purposes, the following table maps the 'ml' environment variables to their Django setting:

| Environment Variable | Django Setting | Development Default | Production Default |
| --- | --- | --- | --- |
| DJANGO_CACHES | CACHES (default) | locmem | redis |
| DJANGO_DATABASES | DATABASES (default) | See code | See code |
| DJANGO_DEBUG | DEBUG | True | False |
| DJANGO_SECRET_KEY | SECRET_KEY | CHANGEME!!! | raises error |
| DJANGO_SECURE_BROWSER_XSS_FILTER | SECURE_BROWSER_XSS_FILTER | n/a | True |
| DJANGO_SECURE_SSL_REDIRECT | SECURE_SSL_REDIRECT | n/a | True |
| DJANGO_SECURE_CONTENT_TYPE_NOSNIFF | SECURE_CONTENT_TYPE_NOSNIFF | n/a | True |
| DJANGO_SECURE_FRAME_DENY | SECURE_FRAME_DENY | n/a | True |
| DJANGO_SECURE_HSTS_INCLUDE_SUBDOMAINS | SECURE_HSTS_INCLUDE_SUBDOMAINS | n/a | True |
| DJANGO_SESSION_COOKIE_HTTPONLY | SESSION_COOKIE_HTTPONLY | n/a | True |
| DJANGO_SESSION_COOKIE_SECURE | SESSION_COOKIE_SECURE | n/a | False |
| DJANGO_DEFAULT_FROM_EMAIL | DEFAULT_FROM_EMAIL | n/a | "ml <noreply@example.com>" |
| DJANGO_SERVER_EMAIL | SERVER_EMAIL | n/a | "ml <noreply@example.com>" |
| DJANGO_EMAIL_SUBJECT_PREFIX | EMAIL_SUBJECT_PREFIX | n/a | "[ml] " |
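
As a rough illustration of the mapping above, the sketch below reads a variable from the environment, falls back to a default where the table lists one, and raises where the table says "raises error". The helper names `env_str` and `env_bool` and the `SettingsError` exception are hypothetical; they are not taken from this project's settings code, which remains the reference for the actual behaviour.

```python
import os

_MISSING = object()  # sentinel: distinguishes "no default given" from default=None


class SettingsError(Exception):
    """Raised when a required environment variable is absent (hypothetical name)."""


def env_str(name, default=_MISSING, environ=None):
    """Return the variable's value, else the default, else raise SettingsError."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if default is _MISSING:
        raise SettingsError(f"Set the {name} environment variable")
    return default


def env_bool(name, default=_MISSING, environ=None):
    """Treat common truthy spellings as True; any other string counts as False."""
    value = env_str(name, default, environ)
    if isinstance(value, bool):
        return value
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Development-style lookups against an empty environment, using table defaults.
dev_env = {}
DEBUG = env_bool("DJANGO_DEBUG", default=True, environ=dev_env)
SECRET_KEY = env_str("DJANGO_SECRET_KEY", default="CHANGEME!!!", environ=dev_env)
```

Omitting the default, as a production settings module might do for `DJANGO_SECRET_KEY`, makes a missing variable fail loudly at startup instead of silently running with a placeholder.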

The following table lists settings and their defaults for third-party applications:

| Environment Variable | Django Setting | Development Default | Production Default |
| --- | --- | --- | --- |
| DJANGO_AWS_ACCESS_KEY_ID | AWS_ACCESS_KEY_ID | n/a | raises error |
| DJANGO_AWS_SECRET_ACCESS_KEY | AWS_SECRET_ACCESS_KEY | n/a | raises error |
| DJANGO_AWS_STORAGE_BUCKET_NAME | AWS_STORAGE_BUCKET_NAME | n/a | raises error |
| DJANGO_SENTRY_DSN | SENTRY_DSN | n/a | raises error |
| DJANGO_SENTRY_CLIENT | SENTRY_CLIENT | n/a | raven.contrib.django.raven_compat.DjangoClient |
| DJANGO_SENTRY_LOG_LEVEL | SENTRY_LOG_LEVEL | n/a | logging.INFO |
| DJANGO_MAILGUN_API_KEY | MAILGUN_ACCESS_KEY | n/a | raises error |
| DJANGO_MAILGUN_SERVER_NAME | MAILGUN_SERVER_NAME | n/a | raises error |
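
Before a production deploy it can help to list every variable the tables mark as "raises error" that is still unset. The following is a minimal sketch of such a pre-flight check; the function name `missing_production_vars` is hypothetical, and the tuple simply repeats the names from the two tables above.

```python
import os

# Variables whose production default is "raises error" in the tables above.
REQUIRED_IN_PRODUCTION = (
    "DJANGO_SECRET_KEY",
    "DJANGO_AWS_ACCESS_KEY_ID",
    "DJANGO_AWS_SECRET_ACCESS_KEY",
    "DJANGO_AWS_STORAGE_BUCKET_NAME",
    "DJANGO_SENTRY_DSN",
    "DJANGO_MAILGUN_API_KEY",
    "DJANGO_MAILGUN_SERVER_NAME",
)


def missing_production_vars(environ=None):
    """Return required variable names that are unset or empty, in table order."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_IN_PRODUCTION if not environ.get(name)]
```

Calling `missing_production_vars()` with no argument inspects the current process environment; an empty list means every required name has a non-empty value.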

Getting up and running

Basics

The steps below will get you up and running with a local development environment. We assume you have the following installed:

  • pip
  • virtualenv
  • PostgreSQL
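
One way to confirm these prerequisites is to look for their executables on the PATH. This is only a sketch: the helper name `missing_tools` is hypothetical, and using `createdb` as the marker for a PostgreSQL installation is an assumption based on the command used later in this guide.

```python
import shutil

# Executables assumed to correspond to the prerequisites listed above.
# createdb ships with PostgreSQL and is used in a later step.
PREREQUISITES = ("pip", "virtualenv", "createdb")


def missing_tools(tools=PREREQUISITES, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty result from `missing_tools()` suggests the command-line tools are available; it does not verify that a PostgreSQL server is actually running.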

First make sure to create and activate a virtualenv, then open a terminal at the project root and install the requirements for local development:

$ pip install -r requirements/local.txt

Create a local PostgreSQL database:

$ createdb ml

Run migrate on your new database:

$ python manage.py migrate

You can now run the runserver_plus command:

$ python manage.py runserver_plus

Open up your browser to http://127.0.0.1:8000/ to see the site running locally.

Setting Up Your Users

To create a normal user account, just go to Sign Up and fill out the form. Once you submit it, you'll see a "Verify Your E-mail Address" page. Go to your console to see a simulated email verification message. Copy the link into your browser. Now the user's email should be verified and ready to go.

To create a superuser account, use this command:

$ python manage.py createsuperuser

For convenience, you can keep your normal user logged in on Chrome and your superuser logged in on Firefox (or similar), so that you can see how the site behaves for both kinds of users.

Test coverage

To run the tests, check your test coverage, and generate an HTML coverage report:

$ coverage run manage.py test
$ coverage html
$ open htmlcov/index.html
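
The `open` command above is specific to macOS. On other platforms, a small script using Python's standard `webbrowser` module can do the same job. The function names below are hypothetical, and the `htmlcov` directory is taken from the commands above.

```python
import webbrowser
from pathlib import Path


def report_uri(report_dir="htmlcov"):
    """Build a file:// URI for the index page of the HTML coverage report."""
    return (Path(report_dir) / "index.html").resolve().as_uri()


def open_report(report_dir="htmlcov"):
    """Open the report in the default browser; returns False if none is found."""
    return webbrowser.open(report_uri(report_dir))
```

Run `open_report()` from the project root after `coverage html` has finished.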

Live reloading and Sass CSS compilation

If you'd like to take advantage of live reloading and Sass / Compass CSS compilation you can do so with a little bit of prep work.

Make sure that nodejs is installed. Then in the project root run:

$ npm install

If you don't already have it, install compass (doesn't hurt if you run this command twice):

gem install compass

Now you just need:

$ grunt serve

The base app will now run as it would with the usual manage.py runserver but with live reloading and Sass compilation enabled.

To get live reloading to work you'll probably need to install an appropriate browser extension.

Celery

This app comes with Celery.

To run a celery worker:

cd ml
celery -A ml.taskapp worker -l info

Please note: For Celery's import magic to work, it matters where the celery commands are run. If you are in the same folder as manage.py, you should be fine.

Email Server

In development, it is often useful to see the emails your application sends. For this purpose, a Grunt task starts an instance of maildump, a local SMTP server with a web interface.

Make sure you have nodejs installed, and then type the following:

$ grunt start-email-server

This will start an email server. The project is set up to deliver to the email server by default. To view messages that are sent by your application, open your browser to http://127.0.0.1:1080

To stop the email server:

$ grunt stop-email-server

The email server listens on 127.0.0.1:1025
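
To check that the local email server accepts mail independently of Django, a message can be sent to it directly with the standard library. This is a sketch under assumptions: the addresses and subject are placeholders, the function names are hypothetical, and the host and port are the ones stated above.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender="noreply@example.com", recipient="dev@example.com"):
    """Compose a small plain-text message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "[ml] maildump smoke test"
    msg.set_content("If this shows up in the web interface, delivery works.")
    return msg


def send_test_message(host="127.0.0.1", port=1025):
    """Deliver the test message to the local SMTP server started by Grunt."""
    with smtplib.SMTP(host, port, timeout=5) as smtp:
        smtp.send_message(build_test_message())
```

After calling `send_test_message()` while the email server is running, the message should appear in the web interface on port 1080.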

Sentry

Sentry is an error logging aggregator service. You can sign up for a free account at http://getsentry.com or download and host it yourself. The system is set up with reasonable defaults, including 404 logging and integration with the WSGI application.

You must set the DSN URL (DJANGO_SENTRY_DSN) in production.

It's time to write the code!!!

Running end to end integration tests

N.B. The integration tests will not run on Windows.

To install the test runner:

$ pip install hitch

To run the tests, enter the ml/tests directory and run the following commands:

$ hitch init

Then run the stub test:

$ hitch test stub.test

This will download and compile Python, Postgres and Redis and install all Python requirements, so the first run may take a while.

Subsequent test runs will be much quicker.

The testing framework runs Django, Celery (if enabled), Postgres, HitchSMTP (a mock SMTP server), Firefox/Selenium and Redis.

Deployment

It is possible to deploy to Heroku, to your own server using Dokku (an open source Heroku clone), or with docker-compose.

Heroku

Run these commands to deploy the project to Heroku:

heroku create --buildpack https://github.com/heroku/heroku-buildpack-python

heroku addons:create heroku-postgresql:hobby-dev
heroku pg:backups schedule --at '02:00 America/Los_Angeles' DATABASE_URL
heroku pg:promote DATABASE_URL

heroku addons:create heroku-redis:hobby-dev
heroku addons:create mailgun

heroku config:set DJANGO_SECRET_KEY=`openssl rand -base64 32`
heroku config:set DJANGO_SETTINGS_MODULE='config.settings.production'

heroku config:set DJANGO_AWS_ACCESS_KEY_ID=YOUR_AWS_ID_HERE
heroku config:set DJANGO_AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY_HERE
heroku config:set DJANGO_AWS_STORAGE_BUCKET_NAME=YOUR_AWS_S3_BUCKET_NAME_HERE

heroku config:set DJANGO_MAILGUN_SERVER_NAME=YOUR_MAILGUN_SERVER
heroku config:set DJANGO_MAILGUN_API_KEY=YOUR_MAILGUN_API_KEY

heroku config:set PYTHONHASHSEED=random

git push heroku master
heroku run python manage.py migrate
heroku run python manage.py check --deploy
heroku run python manage.py createsuperuser
heroku open
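
The `openssl rand -base64 32` call above produces 32 random bytes encoded as base64. Where openssl is not available, the same kind of value can be generated with Python's standard library. The function name `generate_secret_key` is hypothetical.

```python
import base64
import secrets


def generate_secret_key(num_bytes=32):
    """Return num_bytes of OS-provided randomness, base64-encoded as ASCII text."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")
```

The returned string can be passed as the value of `DJANGO_SECRET_KEY` in the `heroku config:set` command.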

Dokku

You need a server running Dokku with at least 1GB of RAM. Backing services are added just like in Heroku; however, you must ensure you have the relevant Dokku plugins installed.

cd /var/lib/dokku/plugins
git clone https://github.com/rlaneve/dokku-link.git link
git clone https://github.com/luxifer/dokku-redis-plugin redis
git clone https://github.com/jezdez/dokku-postgres-plugin postgres
dokku plugins-install

You can specify the buildpack you wish to use by creating a file named .env containing the following:

export BUILDPACK_URL=<repository>

You can then deploy by running the following commands.

git remote add dokku dokku@yourservername.com:ml
git push dokku master
ssh -t dokku@yourservername.com dokku redis:create ml-redis
ssh -t dokku@yourservername.com dokku redis:link ml-redis ml
ssh -t dokku@yourservername.com dokku postgres:create ml-postgres
ssh -t dokku@yourservername.com dokku postgres:link ml-postgres ml
ssh -t dokku@yourservername.com dokku config:set ml DJANGO_SECRET_KEY=RANDOM_SECRET_KEY_HERE
ssh -t dokku@yourservername.com dokku config:set ml DJANGO_SETTINGS_MODULE='config.settings.production'
ssh -t dokku@yourservername.com dokku config:set ml DJANGO_AWS_ACCESS_KEY_ID=YOUR_AWS_ID_HERE
ssh -t dokku@yourservername.com dokku config:set ml DJANGO_AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET_ACCESS_KEY_HERE
ssh -t dokku@yourservername.com dokku config:set ml DJANGO_AWS_STORAGE_BUCKET_NAME=YOUR_AWS_S3_BUCKET_NAME_HERE
ssh -t dokku@yourservername.com dokku config:set ml DJANGO_MAILGUN_API_KEY=YOUR_MAILGUN_API_KEY
ssh -t dokku@yourservername.com dokku config:set ml DJANGO_MAILGUN_SERVER_NAME=YOUR_MAILGUN_SERVER
ssh -t dokku@yourservername.com dokku run ml python manage.py migrate
ssh -t dokku@yourservername.com dokku run ml python manage.py createsuperuser

When deploying via Dokku, make sure you back up your database in some fashion, as this is NOT done automatically.
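
One possible approach to backups is a scheduled `pg_dump`. The sketch below only builds the command line. It assumes that `pg_dump` is installed where the script runs and that the database is reachable through a connection URL, neither of which this README guarantees for a Dokku plugin setup. The function name, the `backups` directory and the file naming scheme are all hypothetical.

```python
from datetime import datetime, timezone


def pg_dump_command(database_url, backup_dir="backups", now=None):
    """Build an argument list for pg_dump writing a timestamped custom-format dump."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    outfile = f"{backup_dir}/ml-{stamp}.dump"
    return [
        "pg_dump",
        "--format=custom",            # compressed archive readable by pg_restore
        f"--file={outfile}",
        f"--dbname={database_url}",   # pg_dump accepts a connection string here
    ]
```

The resulting list can be handed to `subprocess.run(..., check=True)` from a cron job or similar scheduler once the backup directory exists.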

Docker

Warning

Docker is evolving extremely fast, but it still has some rough edges here and there. Compose is currently (as of version 1.4) not considered production ready. That means you won't be able to scale to multiple servers and you won't be able to run zero downtime deployments out of the box. Consider all this experimental until you understand the implications of running Docker (with Compose) in production.

Run your app with docker-compose

Prerequisites:

  • docker (tested with 1.8)
  • docker-compose (tested with 0.4)

Before you start, check out the docker-compose.yml file in the root of this project. This is where each component of this application gets its configuration from. It consists of a postgres service that runs the database, redis for caching, nginx as a reverse proxy and, last but not least, the django application run by gunicorn. Since this application also runs Celery, there are two more services: celeryworker, which runs the celery worker process, and celerybeat, which runs the celery beat process.

All of these services except redis rely on environment variables set by you. There is an env.example file in the root directory of this project as a starting point. Add your own variables to the file and rename it to .env. This file is not tracked by git by default, so if you deploy solely through git you will need some other mechanism to copy your secrets to the server.
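
After renaming the file, it is easy to leave a variable from env.example blank or to drop it by accident. The following sketch compares the two files; it assumes both use plain KEY=VALUE lines, and the function names are hypothetical.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def unset_keys(example_text, env_text):
    """Keys present in the example file but absent or empty in the real file."""
    env = parse_env_file(env_text)
    return [key for key in parse_env_file(example_text) if not env.get(key)]
```

A typical call would read both files with `pathlib.Path(...).read_text()` and print whatever `unset_keys` returns before running `docker-compose up`.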

By default, the application is configured to listen on all interfaces on port 80. If you want to change that, open the docker-compose.yml file and replace 0.0.0.0 with your own IP address. If you are using nginx-proxy to run multiple application stacks on one host, remove the port setting entirely and add VIRTUAL_HOST=example.com to your env file. This passes all incoming requests on nginx-proxy to the nginx service your application is using.

Postgres saves its database files to /data/ml/postgres by default. Change that if you want something else, and make sure to make backups since this is not done automatically.

To get started, pull your code from source control (don't forget the .env file) and change to your project's root directory.

You'll need to build the stack first. To do that, run:

docker-compose build

Once this is ready, you can run it with:

docker-compose up

To run a migration, open up a second terminal and run:

docker-compose run django python manage.py migrate

To create a superuser, run:

docker-compose run django python manage.py createsuperuser

If you need a shell, run:

docker-compose run django python manage.py shell_plus

Once your initial setup is complete, make sure that your application is run by a process manager so that it survives reboots and restarts automatically after an error. You can use the process manager you are most familiar with. All it needs to do is run docker-compose up in your project's root directory.

If you are using supervisor, you can use this file as a starting point:

[program:ml]
command=docker-compose up
directory=/path/to/ml
redirect_stderr=true
autostart=true
autorestart=true
priority=10

Place it in /etc/supervisor/conf.d/ml.conf and run:

supervisorctl reread
supervisorctl start ml

To get the status, run:

supervisorctl status
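
The supervisor file uses INI syntax, so it can be sanity-checked with Python's `configparser` before it is installed. This sketch only checks the keys that appear in the starting-point file above; the function name is hypothetical, and supervisor itself accepts other values (for example for `autorestart`) that this check would flag.

```python
import configparser


def check_program_section(config_text, program="ml"):
    """Return a list of problems found in the [program:<name>] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = f"program:{program}"
    if not parser.has_section(section):
        return [f"missing [{section}] section"]
    problems = []
    # command and directory must be present and non-empty.
    for key in ("command", "directory"):
        if not parser.get(section, key, fallback="").strip():
            problems.append(f"{key} is not set")
    # The starting-point file sets both of these to true.
    for key in ("autostart", "autorestart"):
        if parser.get(section, key, fallback="").strip().lower() != "true":
            problems.append(f"{key} is not true")
    return problems
```

An empty list means the section matches the shape of the example; any strings returned describe what differs.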

If you have errors, you can always check your stack with docker-compose. Switch to your project's root directory and run:

docker-compose ps

to get an output of all running containers.

To check your logs, run:

docker-compose logs

If you want to scale your application, run:

docker-compose scale django=4
docker-compose scale celeryworker=2

Don't run the scale command on postgres or celerybeat.