rodekruis/social-media-listening

[Task] Migrate to new pipeline

Closed this issue · 4 comments

  • Clone Argilla VM.
  • Test locally that new/refactored pipeline works and correctly pushes to cloned Argilla VM.
  • Create scripts to run new pipeline for each country, and push to cloned Argilla VM.
  • Create new logic apps to run new pipeline for each country (or is there a smarter way?).
  • Set scraping day to TUESDAY, to avoid getting banned (cannot run two scrapes at the same time).
  • Verify in next scrape (4-5 March) that pipeline runs and data is in the cloned Argilla VM.
  • Delete old logic apps and make the new pipeline point to production Argilla VM.

As of db687d0, the following still needs to be done before continuing with point 3 of the OP's to-do list:

  • Define Context() to fit country settings
  • Update push_to_argilla() to match smm's latest commit @Wessel93
  • If we decide to continue with Docker for CI/CD, find a way to optimise Docker + Poetry to reduce the image size (currently 13 GB). See best practices, e.g. multi-stage builds
  • Update the Argilla version in Poetry

The latest commit has been tested successfully.

The Docker image for the new pipeline has been pushed to Azure Container Registry (ACR). Image name: social-media-listening. A GitHub workflow is set up to push an updated image to ACR whenever a push is made.

New logic apps are set up, one per country. Their start times are as follows:

  • Bulgaria: 1 AM
  • Poland: 2 AM
  • Slovakia: 3 AM
  • Ukraine: 4 AM
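
Since two scrapes cannot run at the same time, it may be worth checking that the one-hour gaps are wide enough. Below is a minimal sketch of such a check. The start times are taken from the list above; the names COUNTRY_START_TIMES and overlapping_runs and the max_runtime_minutes parameter are hypothetical, and the actual runtime of a scrape is not stated in this thread.

```python
from datetime import time

# Start times copied from the list above; the dict name is hypothetical.
COUNTRY_START_TIMES = {
    "Bulgaria": time(1, 0),
    "Poland": time(2, 0),
    "Slovakia": time(3, 0),
    "Ukraine": time(4, 0),
}


def overlapping_runs(start_times, max_runtime_minutes):
    """Return consecutive country pairs whose runs could overlap.

    Assumes every run takes at most max_runtime_minutes (an assumed
    bound, not a measured value) and that all runs start on the same
    day, so wrap-around past midnight is not handled.
    """
    # Sort countries by start time so only neighbours need comparing.
    ordered = sorted(start_times.items(), key=lambda item: item[1])
    clashes = []
    for (name_a, start_a), (name_b, start_b) in zip(ordered, ordered[1:]):
        gap = (start_b.hour * 60 + start_b.minute) - (
            start_a.hour * 60 + start_a.minute
        )
        # A run that may last longer than the gap can collide with the next one.
        if gap < max_runtime_minutes:
            clashes.append((name_a, name_b))
    return clashes
```

An empty result for a given runtime bound means that no consecutive runs would collide under that assumption.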

Notes:

  • The image is still heavy, at about 11.3 GB. I don't think this is a top priority right now, but we can consider minimising it when needed. See my comment above.
  • Do we still need to work on Context()?
  • We still need to work a bit on the code for reading Azure secrets. At the moment, the code looks for the secret variable names (e.g. API_HASH, API_ID), not the Azure secret name (telegram-secrets).
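
One possible way to close the gap in the last point is to fetch the secret by its Azure name and read the individual variables from its value. Below is a minimal sketch, assuming (this is not confirmed in this thread) that the value of telegram-secrets is a JSON object with API_ID and API_HASH keys. The function telegram_credentials and its get_secret_value argument are hypothetical names, not existing code in the repo.

```python
import json


def telegram_credentials(get_secret_value, secret_name="telegram-secrets"):
    """Look up one Azure secret by name and extract the Telegram variables.

    get_secret_value is a hypothetical callable that takes a secret name
    and returns the secret value as a string.
    ASSUMPTION: the secret value is a JSON object holding API_ID and API_HASH.
    """
    raw = get_secret_value(secret_name)
    payload = json.loads(raw)
    # Fail early with a clear message if an expected variable is absent.
    missing = [key for key in ("API_ID", "API_HASH") if key not in payload]
    if missing:
        raise KeyError(f"{secret_name} is missing: {', '.join(missing)}")
    return {"API_ID": payload["API_ID"], "API_HASH": payload["API_HASH"]}
```

With the azure-keyvault-secrets package, get_secret_value could be something like `lambda name: client.get_secret(name).value`, where client is a SecretClient for the vault. If the secret is stored in a different format, the parsing step would need to change accordingly.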

@p-phung I'll add new issues for the open points; let's finish the test with whatever we have.