Scrape France 2 and TF1 TV news to analyse humanity's biggest challenge: fossil fuels and climate change.
Data source: HTML pages:
- https://www.francetvinfo.fr/replay-jt/france-2/20-heures/jt-de-20h-du-jeudi-30-decembre-2021_4876025.html
- https://www.tf1info.fr/emission/le-20h-11001/extraits/
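Collecting replay links from such a listing page can be sketched in Python with the standard library's html.parser (the project's own parser is written in Scala). This is a minimal sketch under assumptions: the "/replay-jt/" path fragment used to recognise replay links is a hypothetical filter inferred from the France 2 URL above, and `LinkCollector` / `extract_links` are illustrative names, not part of this repository.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> tags whose href contains a given fragment."""

    def __init__(self, base_url: str, path_fragment: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.path_fragment = path_fragment  # hypothetical filter, e.g. "/replay-jt/"
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.path_fragment in href:
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:  # keep page order, drop duplicates
                self.links.append(absolute)


def extract_links(html: str, base_url: str, path_fragment: str) -> list[str]:
    parser = LinkCollector(base_url, path_fragment)
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, feeding it the France 2 listing HTML with base URL https://www.francetvinfo.fr/ returns absolute replay URLs in page order, without duplicates.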
Data sink: JSON data stored in PostgreSQL and displayed on a Metabase dashboard.
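Each scraped news item travels as JSON before being stored in the database. A minimal Python sketch of such a record and its JSON round trip follows; the `NewsRecord` fields are hypothetical (the real schema is defined in the project's Scala code), and the date is serialised as an ISO-8601 string because JSON has no date type.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class NewsRecord:
    # Hypothetical fields for illustration; the real schema is in the Scala code.
    title: str
    url: str
    media: str
    published: date
    contains_global_warming: bool


def to_json(record: NewsRecord) -> str:
    payload = asdict(record)
    # JSON has no date type, so store the day as an ISO-8601 string
    payload["published"] = record.published.isoformat()
    # ensure_ascii=False keeps French accents readable in the output
    return json.dumps(payload, ensure_ascii=False)


def from_json(text: str) -> NewsRecord:
    payload = json.loads(text)
    payload["published"] = date.fromisoformat(payload["published"])
    return NewsRecord(**payload)
```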
Every day, the latest France 2 and TF1 replays (with their URLs) are analysed with GitHub Actions to find those that mention "global warming".
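The "contains global warming" check can be sketched as a case- and accent-insensitive phrase search over text extracted from a replay page. In this Python sketch the keyword list is an assumption (the French phrases are guesses, since the replays are in French; the analyser's real list lives in its Scala source), and `mentions_global_warming` is an illustrative name.

```python
import unicodedata

# Hypothetical keyword list: the French phrases are an assumption, since the
# replays are in French; the analyser's real list is in its Scala source.
KEYWORDS = ("réchauffement climatique", "changement climatique", "global warming")


def normalize(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace (including line breaks)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.split())


def mentions_global_warming(text: str, keywords=KEYWORDS) -> bool:
    """True if any keyword appears in text, ignoring case, accents and line breaks."""
    haystack = normalize(text)
    return any(normalize(keyword) in haystack for keyword in keywords)
```

Normalising both sides means "RÉCHAUFFEMENT\n climatique" still matches "réchauffement climatique".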
Some results are available on this repository's website: https://polomarcus.github.io/television-news-analyser/website/
You can also check the raw data in the GitHub Actions workflows:
- Click here: https://github.com/polomarcus/television-news-analyser/actions/workflows/save-data.yml
- Click on the latest workflow run, then on "click-here-to-see-data"
- Click on "List France 2 news urls containing global warming (see end)" to see France 2's URLs
- Click on "List TF1 news urls containing global warming (see end)" to see TF1's URLs
Requirements:
- Docker Compose
- Optional: the Scala build tool (sbt), if you want to work on the code
To start the stack:
# with Docker Compose (no sbt needed)
docker-compose -f src/test/docker/docker-compose.yml up -d
# OR with the Scala build tool (sbt)
./init-stack-with-data.sh
Then go to http://localhost:8080/index.html
The website sources are inside the website folder.
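Once the containers are started they may need a few seconds before answering. A small standard-library Python helper can poll the local URLs given in this README (the website on port 8080, Metabase on port 3000) until they respond; `wait_for_http` and its default timeouts are illustrative assumptions, not part of the repository. Any HTTP status below 500 counts as "up", since a 404 still proves the server is listening.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll url until the server answers with a status below 500; False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True  # urlopen only returns for successful responses
        except urllib.error.HTTPError as err:
            # urlopen raises for 4xx/5xx; a 404 still means the server is listening
            if err.code < 500:
                return True
        except OSError:
            # connection refused, reset or timed out: the service is not ready yet
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_http("http://localhost:3000/")` returns True once Metabase responds, or False after 60 seconds.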
Metabase is available here:
- http://localhost:3000/
- create an account
- configure a PostgreSQL data source (credentials: user/password, host: postgres, database name: metabase)
- you're good to go: click "Ask a simple question", then select your data source and the "News" table
To run the Scala main programs directly with sbt:
sbt "runMain com.github.polomarcus.main.TelevisionNewsAnalyser 3"
sbt "runMain com.github.polomarcus.main.SaveTVNewsToPostgres"
sbt "runMain com.github.polomarcus.main.UpdateNews"
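SaveTVNewsToPostgres writes the scraped news to PostgreSQL. One way to make such a step safe to re-run every day is to key rows on the news URL and skip URLs already stored. The Python sketch below illustrates that technique with the standard library's sqlite3 as a stand-in database (it needs SQLite 3.24+ for `ON CONFLICT ... DO NOTHING`, a clause PostgreSQL also supports; a PostgreSQL driver such as psycopg2 would use `%s` placeholders instead of `?`). The `news` table and its columns are hypothetical, and this is not necessarily what the Scala job does.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical table: the real one is created by the project's Scala code.
SCHEMA = """
CREATE TABLE IF NOT EXISTS news (
    url   TEXT PRIMARY KEY,
    media TEXT NOT NULL,
    title TEXT NOT NULL,
    contains_global_warming BOOLEAN NOT NULL
)
"""


def save_news(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, str, bool]]) -> int:
    """Insert (url, media, title, contains_global_warming) rows, skipping known URLs.

    Returns the number of rows actually inserted.
    """
    conn.execute(SCHEMA)
    before = conn.total_changes
    conn.executemany(
        "INSERT INTO news (url, media, title, contains_global_warming) "
        "VALUES (?, ?, ?, ?) ON CONFLICT (url) DO NOTHING",
        rows,
    )
    conn.commit()
    # rows skipped by DO NOTHING are not counted as changes
    return conn.total_changes - before
```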
Some examples are in example.ipynb, but I preferred to use a Metabase dashboard and SQL-based visualisations.
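A typical dashboard question is how often each channel mentions global warming per month. The aggregation behind such a chart, roughly a GROUP BY on media and month in SQL, is sketched here in plain Python; the (media, day, mentions) tuple layout and the `monthly_share` name are assumptions for illustration.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def monthly_share(
    news: Iterable[Tuple[str, date, bool]],
) -> Dict[Tuple[str, str], float]:
    """Share of news mentioning global warming, per (media, "YYYY-MM")."""
    total: Counter = Counter()
    matching: Counter = Counter()
    for media, day, mentions in news:
        key = (media, day.strftime("%Y-%m"))
        total[key] += 1
        if mentions:
            matching[key] += 1
    # Counter returns 0 for missing keys, so months without any match give 0.0
    return {key: matching[key] / total[key] for key in total}
```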
To run the tests:
# first: ./init-stack-with-data.sh
sbt test # it will parse some localhost pages from test/resources/
# run a single test from the sbt shell:
sbt> testOnly ParserTest -- -z parseFrance2Home