Ever wanted to extract data from Svenskalag to build custom reports and perform analysis which is not supported out of the box?
This repository contains code to extract that data and transform it into analytics-ready tables.
This project is not affiliated with the company behind Svenskalag.se.
This is the result of a weekend project. As you might expect from such projects, corners have been cut to quickly solve problems and reach the desired result. The motivation behind the project was to analyze training and game attendance, and to learn a bit about web scraping and DuckDB.
The scraper is built using Python and Scrapy. It crawls your team page and extracts information about activities, members and attendance. The data is persisted into a DuckDB database. Scraping is quite brittle and could stop working at any time (hopefully such breakage is caught by the error handling in Python or by the tests defined in DBT).
DBT is used to transform the raw data into analytics-ready tables. The resulting data models are stored locally as DuckDB databases.
You need the following software installed. The project has only been tested on macOS:
- Python (install via Pyenv)
- Pipenv
- DuckDB
- Git
- Make. Instead of installing Make, you can also look at the individual commands in the Makefile to see how each step is performed.
Clone this repository by running `git clone https://github.com/calleo/svenskalag-analytics.git` from the terminal. Ensure the correct Python version is installed, then run `pipenv install` to install the needed packages.
Set proper values for the environment variables by copying `.env.template` and naming the copy `.env`:
- SVENSKALAG_USER: Your Svenskalag username
- SVENSKALAG_PASSWORD: Your password to Svenskalag
- SVENSKALAG_START_DATE: The date from which you would like to fetch data, for example "2021-01-01"
- SVENSKA_LAG_DOMAIN: The domain where your Svenskalag site is hosted, for example "www.difinnebandy.se"
- SVENSKALAG_TEAM_SLUG: The part of the URL that points to your team, for example "difibs-herr" (full URL: https://www.difinnebandy.se/difibs-herr)
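For reference, a filled-in `.env` using the example values above might look like this (the username and password are placeholders):

```
SVENSKALAG_USER=your-username
SVENSKALAG_PASSWORD=your-password
SVENSKALAG_START_DATE=2021-01-01
SVENSKA_LAG_DOMAIN=www.difinnebandy.se
SVENSKALAG_TEAM_SLUG=difibs-herr
```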
Start the scraper by running `make scrape` from within the newly cloned directory.
Running `make dbt_build` runs DBT and produces the tables used for the analysis.
Running `make query` starts the DuckDB client against the database DBT created. Run a query, for example `SELECT * FROM member LIMIT 10`.
If you prefer to import the data into a spreadsheet application (say Microsoft Excel), run `make csv_export` to get a CSV file (stored in the `./data` folder) which you can then import into the application.
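If you would rather explore the export in Python than in a spreadsheet, the CSV can be read with the standard library alone. A minimal sketch (the file name `data/member.csv` is an assumption; the actual name depends on which table was exported):

```python
import csv

def load_rows(path):
    """Read a CSV export into a list of dicts; the header row becomes the keys."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Hypothetical usage -- adjust the path to the file csv_export actually produced:
# rows = load_rows("data/member.csv")
# print(len(rows), "rows")
```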