Future-demand
Description
- Use Python to write a script that crawls events (date, time, location, title, artists, works and image link) from the following link: https://www.lucernefestival.ch/en/program/summer-festival-22
- Create an endpoint that lists all the events happening (title and date)
- Insert data into database (PostgreSQL) - propose a schema that makes sense for the use case
- dockerize your solution so we can run it with docker compose
Quickstart
docker-compose up
visit: http://0.0.0.0:3000/docs#/
Make sure the scraper has finished before querying the endpoint.
Preparation
- Try to keep it simple (KISS principle)
- Make it easy to adapt the scraper quickly when the HTML changes
- Be fancy and try out psycopg3 and sqlmodel
Steps
- Explore the website to gain insight for the data model
- Create the data model
- Write the scraper
  - Download the HTML for local testing
  - Write scraper functions
  - Rewrite to a class due to shared state
- Write the backend
  - fastapi
  - alembic
  - Make the data model shared
- Write the full docker-compose
Schema
EVENT
Column | Datatype | Info |
---|---|---|
id (pk) | int | |
date | date | |
time | time | (HH:MM:SS) |
location | string | |
title | string | |
artists | array(string) | |
work | array(string) | artist - song (with separator ("-")) |
Artists and works could be split into two separate tables because of their higher cardinality, but that seems a little much for this use case, so a single table is used.
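As a rough illustration of the single-table layout, the sketch below mirrors the EVENT columns in a plain Python dataclass. It is not the project's sqlmodel model: the class name `Event`, the helpers `join_work` and `split_work`, and the space-padded separator `" - "` are assumptions made for this example.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: the separator is padded with spaces ("artist - song"), so that
# hyphens inside names do not split an entry.
WORK_SEPARATOR = " - "


@dataclass
class Event:
    """Plain-Python mirror of the EVENT table (hypothetical, not the sqlmodel class)."""

    date: dt.date
    time: dt.time
    location: str
    title: str
    artists: List[str] = field(default_factory=list)
    work: List[str] = field(default_factory=list)
    id: Optional[int] = None  # primary key, assigned by the database


def join_work(artist: str, song: str) -> str:
    """Build one entry of the `work` column."""
    return f"{artist}{WORK_SEPARATOR}{song}"


def split_work(entry: str) -> Tuple[str, str]:
    """Split a `work` entry at the first separator into (artist, song)."""
    artist, _, song = entry.partition(WORK_SEPARATOR)
    return artist, song
```

Keeping artist and song in one string keeps the table flat; splitting at the first separator recovers both parts when needed.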
In case of html change
The scraper will probably detect the change and raise an exception on line 47. To troubleshoot, diff the live page against the test HTML file and come up with new CSS selectors. The new CSS selectors can be passed as environment variables to an already deployed scraper.
For the next change, replace the test HTML file and overwrite the default values of the scraper's environment variables.
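The idea of overridable selectors can be sketched as follows. This is a minimal example and not the scraper's actual code: the variable names (`SELECTOR_EVENT`, `SELECTOR_TITLE`, `SELECTOR_DATE`), the default selector values, and the helper functions are all hypothetical.

```python
import os
from typing import Dict, Mapping, Optional, Sequence

# Hypothetical variable names and default selectors; the real ones depend on
# the scraper's code and the page's markup.
DEFAULT_SELECTORS: Dict[str, str] = {
    "SELECTOR_EVENT": "div.event",
    "SELECTOR_TITLE": ".event-title",
    "SELECTOR_DATE": ".event-date",
}


def load_selectors(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the selectors, letting environment variables override defaults."""
    env = os.environ if environ is None else environ
    return {name: env.get(name, default) for name, default in DEFAULT_SELECTORS.items()}


def require_matches(name: str, matches: Sequence) -> Sequence:
    """Fail loudly when a selector finds nothing, which hints at an HTML change."""
    if not matches:
        raise ValueError(f"Selector {name!r} matched nothing; the HTML may have changed")
    return matches
```

With this shape, a deployed scraper picks up a new selector from its environment without a code change, while the defaults in the code are updated later alongside the test HTML file.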
Tests
python -m unittest tests/test_scraper.py
Encountered problems
The plan was to use the latest psycopg3 in combination with sqlmodel, but this is not possible until SQLAlchemy releases 2.0, so the project went back to psycopg2.