Future-demand

Description

Use Python to write a script to crawl events (date, time, location, title, artists, works and image link) from the following link: https://www.lucernefestival.ch/en/program/summer-festival-22
Create an endpoint that lists all the events happening (title and date)
Insert data into database (PostgreSQL) - propose a schema that makes sense for the use case
Dockerize your solution so we can run it with docker compose

Quickstart

docker-compose up

visit: http://0.0.0.0:3000/docs#/

Make sure the scraper has finished first

Preparation

Try to keep it simple (KISS principle)
Make it easy to adapt the scraper quickly when the HTML changes
Be fancy and try out psycopg3 and sqlmodel

Steps

Explore the website to gain insight for the data model
Create the data model
Write the scraper

Download HTML for local testing
Write scraper functions
Rewrite to class due to shared state

Write the backend

fastapi
alembic

Make the data model shared
Write the full docker-compose

Schema

EVENT

Column	Datatype	Info
id (pk)	int
date	date
time	time	(HH:MM:SS)
location	string
title	string
artists	array(string)
work	array(string)	each entry formatted as "artist - song" (separator: "-")

Artists and works could be split into two separate tables because of their higher cardinality, but that seems a little much for this use case, so a single table is used.
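As a rough illustration of the single-table schema, the sketch below mirrors the columns in a plain dataclass and shows how a work entry could be split back into artist and song. The class name `Event`, the helper `split_work`, and the assumption that the separator is written with surrounding spaces (" - ") are illustrative only and not taken from the repository.

```python
import datetime as dt
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical in-memory mirror of one row of the EVENT table."""

    id: int
    date: dt.date
    time: dt.time
    location: str
    title: str
    artists: list[str] = field(default_factory=list)
    work: list[str] = field(default_factory=list)  # "artist - song" strings


def split_work(entry: str, sep: str = " - ") -> tuple[str, str]:
    """Split an "artist - song" entry at the first separator.

    Assumes the separator has surrounding spaces, so hyphenated
    names such as "Rimsky-Korsakov" stay intact.
    """
    artist, found, song = entry.partition(sep)
    if not found:
        raise ValueError(f"no separator {sep!r} in {entry!r}")
    return artist.strip(), song.strip()
```

Keeping the artist and song in one string keeps the table flat; the cost is that every reader of the column has to agree on the separator.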

In case of html change

An HTML change will probably be detected by the scraper, which raises an exception on line 47. The user can troubleshoot by diffing against the test HTML file and coming up with new CSS selectors. The new CSS selectors can be passed as environment variables to an already deployed scraper.

For the next change, replace the test HTML file and overwrite the default values of the scraper's environment variables.
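The override mechanism could look roughly like the following sketch. The variable names (`SCRAPER_SELECTOR_TITLE` and so on), the default selectors, and the `SelectorError` exception are all hypothetical; the actual scraper may name and structure them differently.

```python
import os

# Hypothetical defaults; the real selectors live in the scraper.
DEFAULT_SELECTORS = {
    "title": "p.event-title",
    "date": "p.date",
    "location": "p.location",
}
ENV_PREFIX = "SCRAPER_SELECTOR_"  # assumed naming convention


class SelectorError(RuntimeError):
    """Raised when a selector matches nothing, hinting at an HTML change."""


def load_selectors(env=None):
    """Return the selectors, letting environment variables override defaults."""
    env = os.environ if env is None else env
    return {
        name: env.get(ENV_PREFIX + name.upper(), default)
        for name, default in DEFAULT_SELECTORS.items()
    }


def require_match(name, selector, matches):
    """Fail loudly if a selector found no elements."""
    if not matches:
        raise SelectorError(
            f"selector for {name!r} ({selector!r}) matched nothing"
        )
    return matches
```

With this shape, a deployed scraper could be pointed at new markup by setting a variable such as `SCRAPER_SELECTOR_TITLE` in its environment, while the defaults in the code are updated for the next release.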

Tests

python -m unittest tests/test_scraper.py
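A test against saved HTML might be shaped like the sketch below. It is not the content of tests/test_scraper.py: the inline `FIXTURE` stands in for the downloaded test HTML file, the markup and the `event-title` class are invented, and the standard-library `HTMLParser` is used only to keep the example self-contained.

```python
import unittest
from html.parser import HTMLParser

# Stand-in for the saved test HTML file; the markup is invented.
FIXTURE = """
<div class="event"><p class="event-title">Opening Concert</p></div>
<div class="event"><p class="event-title">Recital</p></div>
"""


class TitleParser(HTMLParser):
    """Collect the text of every element carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.titles = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # Start capturing text only if this tag has the wanted class.
        classes = (dict(attrs).get("class") or "").split()
        self._capture = self.css_class in classes

    def handle_endtag(self, tag):
        self._capture = False

    def handle_data(self, data):
        if self._capture and data.strip():
            self.titles.append(data.strip())


class TestScraper(unittest.TestCase):
    def test_titles_are_extracted(self):
        parser = TitleParser("event-title")
        parser.feed(FIXTURE)
        self.assertEqual(parser.titles, ["Opening Concert", "Recital"])
```

Running the scraper against a frozen copy of the page keeps the test independent of the live website, so a failure points at the parsing code and not at the network.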

Encountered problems

Wanted to use the latest and greatest psycopg3 in combination with sqlmodel, but this is not possible until sqlalchemy releases 2.0, so went back to psycopg2.

DylanBartels/future-demand