/television-news-analyser

Scrape France 2 and TF1 TV news to analyse humanity's biggest challenge: fossil energies and climate change.

Primary language: HTML. License: GNU General Public License v2.0 (GPL-2.0).

TV news analyser 📺 🔬 🛢️

List news urls containing global warming everyday at 5am



Data source: HTML pages.

Data sink: JSON data, to be stored in PostgreSQL and displayed on a Metabase dashboard.

Can I have a look at the results ?

Every day, the latest replays from France 2 and TF1 are analysed with GitHub Actions, and the URLs of those that mention "global warming" are listed.

Some results can be found on this repo's website : https://polomarcus.github.io/television-news-analyser/website/

You can also check the raw data in the GitHub Actions workflows:

  1. Open https://github.com/polomarcus/television-news-analyser/actions/workflows/save-data.yml
  2. Click on the most recent workflow run, then on "click-here-to-see-data"
  3. Click on "List France 2 news urls containing global warming (see end)" to see France 2's URLs
  4. Click on "List TF1 news urls containing global warming (see end)" to see TF1's URLs
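The daily check in this section boils down to keeping only the news items that mention a keyword. The sketch below illustrates that idea in Python. It is not the project's own Scala code, and the field names `title`, `description` and `url`, as well as the sample items, are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

KEYWORD = "global warming"  # the keyword quoted in this README


def urls_mentioning(news: Iterable[Dict[str, str]], keyword: str = KEYWORD) -> List[str]:
    """Return the URL of every item whose title or description mentions the keyword."""
    needle = keyword.casefold()  # case-insensitive comparison
    return [
        item["url"]
        for item in news
        # "title", "description" and "url" are hypothetical field names
        if any(needle in item.get(field, "").casefold() for field in ("title", "description"))
    ]


# Made-up items, only here to exercise the function
sample = [
    {
        "title": "Global warming: record heat in Europe",
        "description": "",
        "url": "https://example.org/a",
    },
    {
        "title": "Football results",
        "description": "Weekend scores",
        "url": "https://example.org/b",
    },
]

for url in urls_mentioning(sample):
    print(url)
```

Each field is checked separately so that a match cannot straddle the end of the title and the start of the description.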


Requirements

Run

Spin up one Postgres, one Metabase and one nginx instance, and load data into Postgres

Docker Compose

# with Docker Compose - sbt is not needed
docker-compose -f src/test/docker/docker-compose.yml up -d

SBT

# OR with the Scala build tool, sbt
./init-stack-with-data.sh

Check out the project website locally

Go to http://localhost:8080/index.html. The sources are in the website folder.

Init Metabase

You can check Metabase here:

  • http://localhost:3000/
  • configure an account
  • configure the PostgreSQL data source (user/password; host: postgres; database name: metabase)
  • You're good to go: "Ask a simple question", then select your data source and the "News" table
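After the stack has started, a quick reachability check can save a round trip to the browser. Below is a minimal sketch using only the Python standard library. The two URLs come from this README, while the `is_up` helper and its 2-second timeout are assumptions for illustration.

```python
import urllib.request

# URLs taken from this README
SERVICES = {
    "website": "http://localhost:8080/index.html",
    "metabase": "http://localhost:3000/",
}


def is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with a successful HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # connection refused, timeout, HTTP error status
        return False


for name, url in SERVICES.items():
    print(f"{name}: {'up' if is_up(url) else 'not reachable'}")
```

Metabase can take a while to start, so a "not reachable" result right after startup may simply mean the container is still booting.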

To scrape data from 3 pages of the France 2 website

sbt "runMain com.github.polomarcus.main.TelevisionNewsAnalyser 3"

To store the JSON data in Postgres and explore it with Metabase

sbt "runMain com.github.polomarcus.main.SaveTVNewsToPostgres"

To update data for the website alone

sbt "runMain com.github.polomarcus.main.UpdateNews"

Jupyter Notebook

Some examples are inside example.ipynb, but I preferred to use a Metabase dashboard and SQL-based visualisations.
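To illustrate the kind of SQL aggregate that can back a dashboard chart, here is a self-contained sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL. The `news` table columns (`title`, `media`, `contains_climate`) and the rows are invented for the example, and the real schema may differ.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the project's real "News" table may differ
conn.execute("CREATE TABLE news (title TEXT, media TEXT, contains_climate INTEGER)")

# Made-up rows; contains_climate is 1 when the item mentions global warming
rows = [
    ("A", "France 2", 1),
    ("B", "France 2", 0),
    ("C", "TF1", 0),
    ("D", "TF1", 1),
    ("E", "TF1", 1),
]
conn.executemany("INSERT INTO news VALUES (?, ?, ?)", rows)

# Total number of items and number of climate-related items per channel
query = """
    SELECT media,
           COUNT(*) AS total_news,
           SUM(contains_climate) AS climate_news
    FROM news
    GROUP BY media
    ORDER BY media
"""
result = conn.execute(query).fetchall()

for media, total_news, climate_news in result:
    print(f"{media}: {climate_news} of {total_news} items mention global warming")

conn.close()
```

The same `GROUP BY` query can be pasted into a Metabase SQL question once the table and column names are adapted to the actual schema.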

Test

# ./init-stack-with-data.sh
sbt test # it parses some localhost pages from test/resources/

Test only one method

sbt> testOnly ParserTest -- -z parseFrance2Home

Libraries documentation