Donight is a framework for indexing events from the web and making them easily accessible. It aims to be easily extensible, so that an open source community can form around it.
To install, simply run:
> git clone https://github.com/ehudhala/donight.git
> cd donight/src
> python setup.py develop
(Note: Donight is still under heavy development, so for now installation is intended for development purposes only.)
Additional setup is required to scrape Facebook users:
- Make sure Firefox is installed on the executing computer.
- Since we're using Selenium to interact with the browser, there may be compatibility issues between versions. Donight has been tested with Firefox 46.0.1 and selenium 2.53.2. Use `pip install selenium==2.53.2` to install that version.
- Configure the scraping in `src/donight/config/facebook_scraping_config.py`. You'll need to specify the scraped user's email and password, the scraped pages' URLs, etc. - it's all documented in that file. (A hypothetical sketch of such a file appears after this list.)
- Configure the scraped user:
  - Manually log in as the scraped user.
  - Enter the Graph API Explorer page.
  - Click 'Get Token' → 'Get User Access Token'. Make sure the `user_events` option is enabled and click 'Get Access Token'. If Facebook requires that you permit the app to access your account, do so.
  - Enter the user's language settings page and set Facebook to be shown in English (US).
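For reference, here is a minimal sketch of what that config file might look like. All names and values below are hypothetical placeholders; the real, documented option names live in `src/donight/config/facebook_scraping_config.py` itself:

```python
# Hypothetical sketch of src/donight/config/facebook_scraping_config.py.
# The actual option names are documented in the real file; these are
# illustrative placeholders only.
SCRAPED_USER_EMAIL = "scraper.account@example.com"  # Facebook login email
SCRAPED_USER_PASSWORD = "s3cret-password"           # Facebook login password
SCRAPED_PAGES_URLS = [                              # pages whose events we scrape
    "https://www.facebook.com/some.venue.page",
    "https://www.facebook.com/another.venue.page",
]
```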
In order to develop the web server, the following should be done:
- Install node.js.
- In `src/donight/web/client`, run `npm install`.
- Run `gulp` in order to compile the sources for the web frontend. (You can run `gulp watch` in order to automatically recompile.)
- Start the web server by running `python src/donight/web/app.py`.
In order to deploy to Heroku, the following should be done:
- Create a free Heroku account.
- Download the Heroku Toolbelt.
- Log in to Heroku:
> heroku login
Enter your Heroku credentials.
Email: python@example.com
Password:
- Create the Heroku app (from the project root directory):
> heroku create <app_name>
- Set the configuration for deployment and database access:
> heroku config:set DEBUG=false
> heroku config:set DB_ADDRESS=<Address of the database>
> heroku config:set DB_NAME=<Name of the database>
> heroku config:set DB_USERNAME=<Username of the database>
> heroku config:set DB_PASSWORD=<Password of the database>
- Optional: If the database is not PostgreSQL:
> heroku config:set DB_ENGINE=<Name of the sqlalchemy engine for your DB>
- Define the buildpacks for heroku (python for the app, and node for compiling the client):
> heroku buildpacks:set heroku/python
> heroku buildpacks:add heroku/nodejs
- Deploy:
> git push heroku master
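Heroku launches the web process according to the project's `Procfile`. In case you need to create or adjust it, a plausible sketch (assuming the same entry point used for local development above) could be:

```
web: python src/donight/web/app.py
```

The exact command here is an assumption; match it to however the repository actually starts its web server.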
Donight is simple to use. To index all events once installed:
```python
from donight.event_finder import EventFinder

EventFinder().index_events()
```
Then to create an excel spreadsheet from the indexed events for easy viewing:
```python
from donight.applications.events_to_excel import EventsExcel
from donight.events import Session, Event

events = Session().query(Event).all()
EventsExcel().create_excel(events, "events.xlsx")
```
As the example shows, Donight is split into two parts:
- EventFinder: This module is responsible for finding events, and indexing them to the db.
- applications: This package holds anything we want to do with our db full of events.
These two parts interface with the DB. The DB is currently an SQLite3 database, wrapped with SQLAlchemy. The EventFinder uploads events to the DB, and the applications can read data from the DB.
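For instance, an application can read events with an ordinary SQLAlchemy query. A small sketch, assuming the `start_time` column shown in the scraper example below:

```python
from datetime import datetime

from donight.events import Session, Event

# Read only upcoming events, ordered by their start time.
session = Session()
upcoming_events = (session.query(Event)
                   .filter(Event.start_time >= datetime.now())
                   .order_by(Event.start_time)
                   .all())
```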
You can dig further into the documentation to find more ways of using Donight!
Donight's event finder gets a list of scrapers when it is initialized, which defaults to all the scrapers. It then uses every scraper's scrape() method in order to scrape events. After it collects all the events, it either uploads them to the DB for applications to use, or updates the information of events that already exist in the DB.
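For example, running the finder with an explicit subset of scrapers might look like the sketch below. The `scrapers` keyword is an assumption about the constructor's signature; check `EventFinder` itself for the exact parameter name:

```python
from donight.event_finder import EventFinder
from donight.event_finder.scrapers import ALL_SCRAPERS

# Default behavior: index with every available scraper.
EventFinder().index_events()

# Hypothetical: index with only the first scraper in ALL_SCRAPERS.
# The keyword name 'scrapers' is an assumption, not confirmed by the docs.
EventFinder(scrapers=ALL_SCRAPERS[:1]).index_events()
```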
In order to scrape events from a new source, all you have to do is create a new scraper class in the scrapers package, for example:
`donight/event_finder/scrapers/birthday.py`:

```python
import datetime

from donight.event_finder.scrapers.base_scraper import Scraper
from donight.events import Event


class BirthDayScraper(Scraper):
    BIRTHDAYS = {
        'Ehud': datetime.datetime(1996, 7, 23),
        'Or': datetime.datetime(1994, 2, 1)
    }

    def scrape(self):
        return [
            Event(title=name + "'s birthday", start_time=date)
            for name, date in self.BIRTHDAYS.iteritems()
        ]
```
Then, if you want to enable it in any default event finder, you should add it to `ALL_SCRAPERS` in `donight/event_finder/scrapers/__init__.py`.
The scraper you create can implement the scrape method however you wish, but it has to return a list of `Event` items.
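To sanity-check a new scraper, you can also run it directly, without going through the EventFinder. A quick sketch using the birthday example above:

```python
from donight.event_finder.scrapers.birthday import BirthDayScraper

# Run the scraper on its own and inspect the Events it returns.
for event in BirthDayScraper().scrape():
    print(event.title, event.start_time)
```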
For more detailed examples you should look at the scrapers already implemented.
After implementing a scraper, every call to `EventFinder().index_events()` will also upload events scraped by your scraper to the DB :)
- We can give much more value if we add information that can be inferred in a generic way for any event, for example:
- Add a link to youtube videos of artists.
- Show percentage of free space for an event.
- Add a link to the Bandcamp/Facebook pages of artists.
- We can probably think of many more features in this area...
- Many many more scrapers, including:
- Rothschild 12
- The Container
- The Yellow Submarine
- Pasaz
- Leeba
- Bet Avihay
- Poetry Slam
- Theatre shows (Habima, the Cameri)
- We should think of more...
- Nicer views for the data. Currently we only support exporting the events data to Excel, but we should support:
- Sending an organized mail with the data (for a weekly newsletter).
- A website.
- An application.