noodles

We eat our data in a messy fashion, thank you.

Requirements

Flask, http://flask.pocoo.org/
Elasticsearch, http://www.elasticsearch.org/
Elasticsearch.js, http://www.elasticsearch.org/guide/en/elasticsearch/client/javascript-api/current/
Facetview, https://github.com/okfn/facetview
d3.js, http://d3js.org/

Data

Canada

Source: http://www.tmxmoney.com/en/sector_profiles/energy.html
Type: XLS File

Manual Processing:

Download the Oil & Gas Companies file from the source as an XLS file
Create two separate files from the two sheets in the file, each with the columns "Name" and "HQ Location"
Rename: "HQ Location" -> "Country"
Save the two files as canada1.csv and canada2.csv, using , as the delimiter (quote company names that contain a comma)
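If you prefer to script the last two steps, the following sketch uses only Python's standard library. It takes one sheet's rows, renames "HQ Location" to "Country", and writes a comma-delimited CSV in which company names containing commas are quoted. It assumes each sheet has already been read into dictionaries keyed by its column headers. Reading the XLS itself would need a spreadsheet library. The helper name `write_canada_csv` is hypothetical and not part of noodles.

```python
import csv
from typing import Iterable, Mapping


def write_canada_csv(rows: Iterable[Mapping[str, str]], path: str) -> int:
    """Write one sheet's companies as a Name,Country CSV.

    Hypothetical helper: `rows` is assumed to be dictionaries keyed by the
    sheet's column headers ("Name", "HQ Location", ...). Returns the number
    of data rows written.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        # QUOTE_MINIMAL quotes only fields that contain the delimiter,
        # the quote character or a newline, e.g. "Acme Energy, Inc.".
        writer = csv.writer(f, delimiter=",", quoting=csv.QUOTE_MINIMAL)
        writer.writerow(["Name", "Country"])  # "HQ Location" -> "Country"
        for row in rows:
            writer.writerow([row["Name"].strip(), row["HQ Location"].strip()])
            count += 1
    return count
```

Call it once per sheet, e.g. write_canada_csv(sheet1_rows, "canada1.csv") and write_canada_csv(sheet2_rows, "canada2.csv").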

Australia

Source: http://www.asx.com.au/asx/research/ASXListedCompanies.csv
Type: CSV File

Processing with OpenRefine (formerly Google Refine):

Download file from source
Create a new project from within OpenRefine (http://openrefine.org)
Create a text facet (filter) on column3 (GICS industry group) and select the groups "Energy" and "Materials"
Export the project as a comma-separated CSV file
Open the file, remove the 2nd and 3rd columns so that only the company name remains, rename the column header to "Name" and add a second, empty column "Country"
Save as data/australia_YYYY-MM-DD.csv
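The facet-and-trim steps above can also be reproduced without OpenRefine. The sketch below assumes the column order described above: company name in the first column and GICS industry group in the third (OpenRefine's "column3"). It also assumes the downloaded file is parsed with Python's csv module. Both function names are hypothetical.

```python
import csv
import os
from datetime import date
from typing import Iterable, List, Sequence

# GICS industry groups kept by the text facet described above.
WANTED_GROUPS = {"Energy", "Materials"}


def filter_asx_rows(rows: Iterable[Sequence[str]]) -> List[List[str]]:
    """Return [Name, Country] rows for Energy and Materials companies.

    Assumes the company name is in the first column and the GICS industry
    group in the third. Short rows (blank lines, a preamble line) and rows
    in other groups, including the header row, are skipped.
    """
    out = []
    for row in rows:
        if len(row) < 3:
            continue
        name, group = row[0].strip(), row[2].strip()
        if group in WANTED_GROUPS:
            out.append([name, ""])  # empty "Country" column, filled in later
    return out


def write_australia_csv(rows: Iterable[Sequence[str]], data_dir: str = "data") -> str:
    """Write the filtered rows to <data_dir>/australia_YYYY-MM-DD.csv and return the path."""
    path = os.path.join(data_dir, f"australia_{date.today().isoformat()}.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Country"])
        writer.writerows(filter_asx_rows(rows))
    return path
```

For example: with open("ASXListedCompanies.csv", newline="") as f: write_australia_csv(csv.reader(f)).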

Concession Data - OpenOil

Source: http://repository.openoil.net/wiki/Concession_Layer_Methodology#Sourcing
Type: CSV File(s)

Processing with OpenRefine (formerly Google Refine):

Download the TOTAL file with all concessions if it is available; otherwise download the per-country concession files and concatenate them into one file with the csvstack command from csvkit
Load the file into OpenRefine
Delete all columns except "ConcessionContractor"
Split "ConcessionContractor" values and distribute to different rows ("Edit Cells" -> "Split multi-valued cells...")
Export as a CSV file, rename the "ConcessionContractor" column to "Name" and add an empty "Country" column
Save as data/concession-companies_YYYY-MM-DD.csv
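The split-and-rename steps above can be approximated in plain Python once the files have been stacked into one CSV. This sketch mirrors OpenRefine's "Split multi-valued cells". The separator is deliberately a required argument, because which separator the OpenOil files use is an assumption you need to check. The names `split_contractors` and `convert` are hypothetical.

```python
import csv
from typing import Iterable, Iterator, Mapping


def split_contractors(rows: Iterable[Mapping[str, str]], separator: str) -> Iterator[str]:
    """Yield one contractor name per contractor listed in each row.

    Each ConcessionContractor value is split on `separator` (an assumption:
    check what the OpenOil files actually use) and blank pieces are dropped.
    """
    for row in rows:
        value = row.get("ConcessionContractor") or ""
        for part in value.split(separator):
            name = part.strip()
            if name:
                yield name


def convert(in_path: str, out_path: str, separator: str) -> int:
    """Read a stacked OpenOil CSV, write Name,Country rows and return the row count."""
    count = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["Name", "Country"])  # "Country" left empty, as above
        for name in split_contractors(csv.DictReader(src), separator):
            writer.writerow([name, ""])
            count += 1
    return count
```

Like the OpenRefine steps, this keeps duplicate company names. Deduplicating is a separate decision.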

SEC

Source: -
Type: -

Installation

Getting Started:

$ cd /opt
$ git clone https://github.com/uf6/noodles

Meteor Frontend

Install Meteor (https://www.meteor.com/install) and run noodles with it:

$ curl https://install.meteor.com/ | sh
$ cd noodles/frontend
$ meteor

You should be able to access the application from http://localhost:3000

Load data into Meteor's MongoDB

The default local settings point to an external MongoDB database. To run the importer against Meteor's MongoDB instead, change local_settings to:

MONGO_URL = 'localhost'
MONGO_PORT = 3002
MONGO_DATABASE = 'meteor'
MONGO_COLLECTION = 'documents'
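Before running load_mongo, it can help to confirm that Meteor's MongoDB is listening on the configured port. Meteor must be running for its bundled database to be up. This minimal standard-library sketch builds a mongodb:// connection string from the settings above and checks that the port accepts TCP connections. The helper names are hypothetical.

```python
import socket

# Values copied from the local_settings example above.
MONGO_URL = "localhost"
MONGO_PORT = 3002
MONGO_DATABASE = "meteor"
MONGO_COLLECTION = "documents"


def mongo_uri(host: str, port: int, database: str) -> str:
    """Build a standard mongodb://host:port/database connection string."""
    return f"mongodb://{host}:{port}/{database}"


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(mongo_uri(MONGO_URL, MONGO_PORT, MONGO_DATABASE))
    print("listening" if port_is_open(MONGO_URL, MONGO_PORT) else "not reachable")
```

The printed URI can also be passed to the mongo shell or to pymongo's MongoClient to inspect the documents collection by hand.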

Then, from within a Python virtualenv, run:

$ python noodles/manage.py load_mongo

Docker for Elasticsearch (optional):

$ docker pull dockerfile/elasticsearch
$ mkdir /opt/noodles-elastic
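$ vi /opt/noodles-elastic/elasticsearch.yml

Contents of elasticsearch.yml (the paths refer to the container, where /opt/noodles-elastic is mounted as /data):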

path:
  logs: /data/log
  data: /data/data

$ mkdir /opt/noodles-elastic/data
$ mkdir /opt/noodles-elastic/log
$ docker run -d -p 9200:9200 -p 9300:9300 -v /opt/noodles-elastic/:/data dockerfile/elasticsearch /elasticsearch/bin/elasticsearch -Des.config=/data/elasticsearch.yml
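
The following command starts the iilab/static image with the noodles checkout (/opt/noodles) mounted at /src, publishing container port 80 on host port 8089:

$ /usr/bin/docker run -p 8089:80 -v /opt/noodles:/src -e VIRTUAL_HOST=noodles.iilab.org --privileged -d -t --name noodles iilab/static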
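Once the Elasticsearch container is up, its _cluster/health endpoint is a quick way to confirm it is answering. This minimal standard-library sketch assumes the 9200:9200 port mapping used in the docker run command for Elasticsearch. The function names are illustrative.

```python
import json
from urllib.request import urlopen

# Host port published by `docker run -p 9200:9200` (assumption: run locally).
ES_URL = "http://localhost:9200"


def cluster_status(health_json: str) -> str:
    """Extract the "status" field (green, yellow or red) from a _cluster/health reply."""
    return json.loads(health_json)["status"]


def fetch_cluster_status(base_url: str = ES_URL, timeout: float = 5.0) -> str:
    """Query Elasticsearch's _cluster/health endpoint and return its status."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return cluster_status(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_cluster_status())
```

On a single-node setup like this one, "yellow" is normal once indices with replicas exist, because the replicas have no second node to be allocated to.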

Setup:

$ python setup.py install
$ python noodles/manage.py ingest edgar
$ python noodles/manage.py index edgar

Run server

$ python noodles/manage.py runserver -p 7777