Media Cloud Headline Gender Mentions Pipeline

A workflow that fetches stories from Media Cloud, finds the people mentioned in each headline, and estimates the gender of those people. All results are saved to a MongoDB database.

Dev Installation

Install the dependencies: `pip install -r requirements.txt`

Copy `.env.template` to `.env`, then edit `.env` to fill in your settings.
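For reference, a `.env` file is a list of `KEY=VALUE` lines. The project may well load it with a library such as python-dotenv; the loader below is only a stdlib sketch of that idea, and the keys used in the example (`MONGO_URL`, `MC_API_KEY`) are hypothetical, not taken from `.env.template`.

```python
# Minimal .env loader sketch (stdlib only); the real scripts may use python-dotenv.
import os


def load_env(text: str, environ=None) -> dict:
    """Parse KEY=VALUE lines and copy them into environ (os.environ by default).

    Blank lines and '#' comments are skipped, one pair of matching quotes
    around a value is stripped, and keys already set in environ are kept.
    """
    if environ is None:
        environ = os.environ
    parsed = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        parsed[key] = value
        environ.setdefault(key, value)  # do not override real environment vars
    return parsed
```

Usage: `with open(".env") as f: load_env(f.read())`, then read settings with `os.environ["MONGO_URL"]` (hypothetical key).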

First, run `python 1-fetch-stories.py` to fill the database with stories from Media Cloud.
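Fetching a large set of stories means paging through results. The sketch below shows one way to do that, assuming Media Cloud-style paging where each story carries `stories_id` and `processed_stories_id` and the next page is requested with the last `processed_stories_id` seen; check these field names against the Media Cloud API docs. `fetch_page` is a hypothetical stand-in for the real API call, and a dict stands in for the MongoDB collection.

```python
from typing import Callable, Dict, List


def fetch_all_stories(
    fetch_page: Callable[[int, int], List[dict]],
    store: Dict[int, dict],
    rows: int = 100,
) -> int:
    """Page through stories and upsert each into store, keyed by stories_id.

    fetch_page(last_processed_stories_id, rows) is hypothetical; it must
    return an empty list once there are no more stories. Returns the number
    of stories seen.
    """
    last_id = 0
    seen = 0
    while True:
        page = fetch_page(last_id, rows)
        if not page:
            break
        for story in page:
            # Keyed upsert: re-running the fetch does not create duplicates.
            store[story["stories_id"]] = story
            seen += 1
        last_id = page[-1]["processed_stories_id"]
    return seen
```

With pymongo, the dict assignment would become something like `collection.update_one({"stories_id": story["stories_id"]}, {"$set": story}, upsert=True)`, which keeps the script safe to re-run.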

Next, `2-add-headline-entities.py` uses our Cliff-Clavin server to identify any people mentioned in each headline.
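Cliff-Clavin returns the entities it finds as JSON. A minimal sketch of pulling the person names out of one response, assuming a shape like `{"status": "ok", "results": {"people": [{"name": ..., "count": ...}]}}`; verify this against what your Cliff-Clavin server actually returns.

```python
def people_from_cliff(response: dict) -> list:
    """Return unique person names from a CLIFF-style response, most-mentioned first.

    The response shape is an assumption (see lead-in); anything other than
    status "ok" yields an empty list.
    """
    if response.get("status") != "ok":
        return []
    people = response.get("results", {}).get("people", [])
    # Stable sort: ties keep the order the server returned them in.
    ranked = sorted(people, key=lambda p: p.get("count", 0), reverse=True)
    names = []
    for person in ranked:
        name = person.get("name", "").strip()
        if name and name not in names:
            names.append(name)
    return names
```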

Open one terminal window and start the Celery workers so they are waiting for tasks: `celery worker -A worker -l info`. Watch the log to confirm that stories are being processed.

Then, in a second window, run `python 2-add-headline-entities.py` to fill the queue with tasks.
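The two-window setup is a producer/consumer pattern: the script enqueues one task per story, and the waiting workers pick tasks off the queue. The sketch below illustrates that pattern with the standard library's `queue.Queue` and threads in place of Celery and its message broker; it is a swapped-in illustration, not how this project runs, and `handle` is a hypothetical task function.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_queue(story_ids: Iterable[int], handle: Callable[[int], str], workers: int = 2) -> Dict[int, str]:
    """Enqueue story IDs (producer) and drain them with worker threads (consumers)."""
    tasks: "queue.Queue" = queue.Queue()
    results: Dict[int, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            story_id = tasks.get()
            if story_id is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            value = handle(story_id)
            with lock:
                results[story_id] = value
            tasks.task_done()

    # Start the workers first, like the first terminal window.
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    # Producer: fill the queue, like the script in the second window.
    for story_id in story_ids:
        tasks.put(story_id)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Celery adds what this sketch lacks: a persistent broker, workers in separate processes or machines, and retries.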

Finally, `3-add-entity-gender.py` uses the Genderize.io API to estimate the gender of each person from their first name.
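Genderize.io takes one or more first names as query parameters (`name=...`, or `name[]=...` repeated for a batch) and returns JSON with `name`, `gender`, and `probability` fields, as a single object for one name or a list for a batch. The sketch below builds a batch URL and parses a response without making a network call. Taking the first token of a full name as the given name, and the 0.8 probability cut-off, are illustrative assumptions, not choices documented for this pipeline.

```python
import json
from urllib.parse import urlencode

GENDERIZE_URL = "https://api.genderize.io"


def first_name(full_name: str) -> str:
    """Take the first whitespace-separated token as the given name (rough heuristic)."""
    parts = full_name.split()
    return parts[0] if parts else ""


def genderize_url(names: list) -> str:
    """Build a batch query of the form ?name[]=a&name[]=b (brackets get URL-encoded)."""
    return GENDERIZE_URL + "?" + urlencode([("name[]", n) for n in names])


def parse_genderize(body: str, min_probability: float = 0.8) -> dict:
    """Map each name to 'male', 'female', or None (unknown or below min_probability)."""
    data = json.loads(body)
    if isinstance(data, dict):  # a single-name response is a bare object
        data = [data]
    out = {}
    for item in data:
        gender = item.get("gender")
        if gender is not None and item.get("probability", 0) < min_probability:
            gender = None  # treat low-confidence guesses as unknown
        out[item["name"]] = gender
    return out
```

The URL can be fetched with any HTTP client (for example `urllib.request.urlopen`), then passed to `parse_genderize`.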

As in the previous step, open one terminal window and start the Celery workers so they are waiting for tasks: `celery worker -A worker -l info`. Watch the log to confirm that stories are being processed.

Then, in a second window, run `python 3-add-entity-gender.py` to fill the queue with tasks.