[Task] Migrate to new pipeline
Closed this issue · 4 comments
jmargutt commented
- Clone Argilla VM.
- Test locally that new/refactored pipeline works and correctly pushes to cloned Argilla VM.
- Create scripts to run new pipeline for each country, and push to cloned Argilla VM.
- Create new logic apps to run new pipeline for each country (or is there a smarter way?).
- Set scraping day to TUESDAY, to avoid getting banned (cannot run two scrapes at the same time).
- Verify in next scrape (4-5 March) that pipeline runs and data is in the cloned Argilla VM.
- Delete old logic apps and make the new pipeline point to production Argilla VM.
jmargutt commented
- New VM sml-argilla-vm-dev
- URL: http://52.166.251.117
- Credentials same as sml-argilla-vm
p-phung commented
As of db687d0, the following need to be done to continue with point 3 of the OP's to-do list:
- Define `Context()` to fit country settings.
- Update `push_to_argilla()` to the `smm`'s latest commit @Wessel93.
- If we decide to continue with Docker for CI/CD, find a way to optimise Docker + Poetry to reduce the image size (it is currently 13 GB). See best practices, multi-stage builds.
- Update the `argilla` version in `poetry`.
p-phung commented
The latest commit has been tested successfully.
The Docker image for the new pipeline is pushed to Azure CR. Image name: `social-media-listening`. A GitHub workflow is set up to push new updates to ACR whenever a push is made.
New logic apps are set up per country. Their start times are as follows:
- Bulgaria: 1 AM
- Poland: 2 AM
- Slovakia: 3 AM
- Ukraine: 4 AM
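The staggered start times matter because two scrapes cannot run at the same time. A minimal sketch of a sanity check for that constraint follows. `CountryRun`, `SCHEDULE`, `find_overlaps` and the one-hour runtime limit are all hypothetical names and assumptions for illustration, not part of the pipeline:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CountryRun:
    """Hypothetical record: one logic app run per country."""
    country: str
    start_hour: int  # hour of day on the scraping day


# Start hours as listed in this comment.
SCHEDULE = [
    CountryRun("Bulgaria", 1),
    CountryRun("Poland", 2),
    CountryRun("Slovakia", 3),
    CountryRun("Ukraine", 4),
]


def find_overlaps(
    schedule: List[CountryRun], max_runtime_hours: float = 1.0
) -> List[Tuple[str, str]]:
    """Return consecutive pairs of countries whose runs could overlap.

    Assumes every run takes at most `max_runtime_hours` (an assumption,
    the real runtime is not stated in this thread).
    """
    ordered = sorted(schedule, key=lambda run: run.start_hour)
    overlaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        # The earlier run may still be going when the later one starts.
        if earlier.start_hour + max_runtime_hours > later.start_hour:
            overlaps.append((earlier.country, later.country))
    return overlaps
```

If a scrape can take longer than an hour, calling `find_overlaps(SCHEDULE, max_runtime_hours=1.5)` would flag every consecutive pair, which would suggest spacing the logic apps further apart.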
Notes:
- The image is still heavy, at about 11.3 GB. I don't think it is a top priority now, but we can consider minimising it when needed. See my comment above.
- We still need to work on the `Context()`?
- We still need to work a bit on the code for reading Azure secrets. Atm, the code looks for the secret variable names (e.g. `API_HASH`, `API_ID`), not the Azure secret name (`telegram-secrets`).
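One possible way to bridge that gap is to fetch the secret by its Azure name and then unpack the individual variables from its value. This is only a sketch: the format of the `telegram-secrets` value is not stated here, so the code assumes either a JSON object or `KEY=VALUE` lines. `parse_secret_bundle`, `load_variables` and the `fetch_secret` callable are hypothetical names.

```python
import json
from typing import Callable, Dict, Iterable


def parse_secret_bundle(raw: str) -> Dict[str, str]:
    """Split one bundled secret value into individual variables.

    Assumed formats (not confirmed): a JSON object, or KEY=VALUE lines.
    """
    raw = raw.strip()
    if raw.startswith("{"):
        return {str(key): str(value) for key, value in json.loads(raw).items()}
    variables = {}
    for number, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, separator, value = line.partition("=")
        if not separator:
            # Report only the line number so secret content is not leaked.
            raise ValueError(f"Line {number} of the secret is not KEY=VALUE")
        variables[key.strip()] = value.strip()
    return variables


def load_variables(
    fetch_secret: Callable[[str], str],
    secret_name: str,
    required: Iterable[str],
) -> Dict[str, str]:
    """Fetch `secret_name` and return only the `required` variables."""
    required = list(required)
    variables = parse_secret_bundle(fetch_secret(secret_name))
    missing = [name for name in required if name not in variables]
    if missing:
        raise KeyError(f"Secret {secret_name!r} is missing: {missing}")
    return {name: variables[name] for name in required}
```

If the secrets live in Azure Key Vault, `fetch_secret` could be a thin wrapper such as `lambda name: client.get_secret(name).value`, where `client` is a `SecretClient` from `azure-keyvault-secrets`. The pipeline would then call `load_variables(fetch_secret, "telegram-secrets", ["API_ID", "API_HASH"])`.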