An exploratory analysis of technology communities across the UK using data from Meetup.com.
Data is crawled from the Meetup API. See below for a description of the data collection process.
The results of the analysis can be found in the report in the presentation directory. A typeset version can be viewed here. Data analysis was carried out in the IPython Notebook analysis/meetup_analysis.ipynb.
Data collection scripts are located in crawl.
The configuration file config.json should be structured as follows:
{
"meetup_api_key": "<32-character meetup API key>",
"mongo_port": <mongodb-port>,
"mongo_host": "<mongodb-hostname>"
}
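A minimal sketch of how a script might load and check this file. The `load_config` helper and its error messages are hypothetical and not taken from the crawl scripts, which may read the file differently.

```python
import json

# Keys listed in the configuration example above.
REQUIRED_KEYS = ("meetup_api_key", "mongo_port", "mongo_host")


def load_config(path="config.json"):
    """Read the JSON configuration and check that the expected keys exist.

    Hypothetical helper for illustration only.
    """
    with open(path) as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config is missing: " + ", ".join(missing))
    # The port is an unquoted number in the example, so JSON yields an int.
    if not isinstance(config["mongo_port"], int):
        raise TypeError("mongo_port must be an integer, e.g. 27017")
    return config
```

Checking the keys up front lets a long crawl fail immediately on a malformed file instead of partway through.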
The data collection pipeline is as follows:
- extract_geonames_top_cities.py: Extracts the top cities (by population) from the Geonames gazetteer. The gazetteer is held in dat/geonames_cities/. cities1000.txt is obtained from the Geonames export. The script outputs geonames_top_cities.json to dat/.
  - You may wish to edit dat/geonames_top_cities.json to remove some redundant cities in the Geonames data, depending on the chosen countries.
- crawl_groups.py: Carries out a proximity crawl using the cities obtained from the gazetteer. Using the processed gazetteer (from the previous step), it retrieves Meetup groups within proximity of each POI (i.e., city). Outputs to dat/groups_crawl/. This is an initial, very broad crawl of groups, which is filtered in subsequent steps.
- collect_city_groups.py: Collects the groups from the proximity crawl, removing duplicates as necessary. Outputs to dat/city_meetup_groups.
- crawl_group_activity.py: Using the sanitised and de-duplicated groups obtained from the previous steps, this script crawls a range of additional group attributes and stores the results (including the Meetup groups) in a MongoDB datastore. The additional attributes include group events, group membership, attendance at events, and any users encountered along the way. This crawl can take a while (around 5 hours for three years of UK tech groups). If the script is halted prematurely, it resumes from where it left off when run again.
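The first and third steps can be sketched as follows. This is an illustration under stated assumptions, not the code of the scripts themselves: the column positions follow the documented tab-separated layout of the Geonames export, while the function names, the output dictionary keys, and the use of an "id" field to identify a group are all hypothetical.

```python
import csv
import heapq

# Column positions in the tab-separated Geonames export.
NAME, LAT, LON, COUNTRY, POPULATION = 1, 4, 5, 8, 14


def top_cities(lines, country_code, limit):
    """Return up to `limit` most populous cities for one country code."""
    rows = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    cities = (
        {
            "name": row[NAME],
            "lat": float(row[LAT]),
            "lon": float(row[LON]),
            "population": int(row[POPULATION]),
        }
        for row in rows
        if len(row) > POPULATION and row[COUNTRY] == country_code
    )
    return heapq.nlargest(limit, cities, key=lambda city: city["population"])


def dedupe_groups(groups):
    """Keep the first occurrence of each group, keyed on its "id" field.

    Assumes every group record carries an "id"; nearby cities in a
    proximity crawl will typically return overlapping sets of groups.
    """
    seen = {}
    for group in groups:
        seen.setdefault(group["id"], group)
    return list(seen.values())
```

With the gazetteer on disk, `top_cities(open("dat/geonames_cities/cities1000.txt", encoding="utf-8"), "GB", 50)` would give the 50 largest UK cities; the limit of 50 is an arbitrary example value.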
The resulting MongoDB database, meetupdotcom, consists of the following collections:
- users: Each document is a Meetup member. Crawled from the members endpoint.
- groups: Each document is a Meetup group. Crawled from the groups endpoint, supplemented with a list of the group's events crawled from the events endpoint.
- event_attendance: Each document describes member attendance at a particular Meetup event. The document specifies an event id (event_id) and a list of member ids (attendee_ids).
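As an example of working with the event_attendance documents, the sketch below counts how many events each member attended. Only the event_id and attendee_ids fields come from the description above; the function name and the sample ids are made up for illustration.

```python
from collections import Counter


def attendance_counts(event_docs):
    """Count how many events each member attended."""
    counts = Counter()
    for doc in event_docs:
        # set() guards against an id being listed twice for one event.
        counts.update(set(doc["attendee_ids"]))
    return counts


# Illustrative documents only; the ids are invented.
sample = [
    {"event_id": "e1", "attendee_ids": [101, 102]},
    {"event_id": "e2", "attendee_ids": [102, 103, 102]},
]
print(attendance_counts(sample).most_common(1))
```

Assuming the database is accessed through pymongo, the same function could be fed the cursor returned by `find()` on the event_attendance collection instead of the sample list.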
As noted on the Meetup developer page, the Python API client is now quite out of date. A very simple alternative client, with rate limiting, is implemented in crawl_tools.py (see class AltMeetup).
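For readers unfamiliar with client-side rate limiting, here is one simple way it can be done: enforce a minimum interval between successive requests. This is a generic sketch and not the code of AltMeetup; the `RateLimiter` class, its parameters, and the interval values are hypothetical.

```python
import time


class RateLimiter:
    """Space calls at least `min_interval` seconds apart.

    Illustrative only; not the implementation used in crawl_tools.py.
    The clock and sleep functions are injectable to ease testing.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

A client would call `limiter.wait()` immediately before each HTTP request, so a burst of requests is stretched out to at most one per interval.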
- Meetup Python API Client. At: https://github.com/meetup/python-api-client.