openzim/zimfarm

Review all input validations

Opened this issue · 6 comments

To prevent incorrect data from being entered into the Zimfarm, we want to review all recipe inputs and update their constraints.

All constraints are in https://github.com/openzim/zimfarm/blob/main/dispatcher/backend/src/common/schemas/fields.py and https://github.com/openzim/zimfarm/tree/main/dispatcher/backend/src/common/schemas/offliners

Besides Offliner flags, recipe inputs are:

  • Recipe Name
  • Language
  • Tags
  • Category
  • Warehouse Path
  • Status
  • Periodicity
  • Offliner
  • Platform
  • Image Name
  • Image Tag
  • Monitoring
  • CPU
  • Memory
  • Disk
  • RAM fs

Please list below the changes you'd like to make to the constraints on those fields or Offliner flags. If a change is to be applied to all offliners (albeit using their own scraper-specific names), please say so.

My proposal:

  • Recipe Name should enforce the Naming Convention
  • All fields setting Name metadata should enforce the Naming Convention
  • All fields setting Title metadata should be restricted to 30 chars max (as per our recommendations)
  • All fields setting the Description metadata should be restricted to 80 chars max
  • All fields setting the LongDescription metadata should be restricted to 400 chars max
  • All fields setting the Language metadata should be checked for ISO-639-3 validity (as a comma-separated list).
  • All fields setting the ZIM filename should enforce {Name}_{period}.zim
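
To make the proposal concrete, here is a minimal sketch of what such checks could look like in plain Python. The function names and the `MAX_LENGTHS` table are hypothetical and are not taken from `fields.py`. The language check only verifies the shape of each code (three lowercase letters); real ISO-639-3 validation would need a lookup against the actual code list. The Naming Convention itself is not encoded here.

```python
import re

# Limits taken from the proposal above (hypothetical field keys).
MAX_LENGTHS = {"title": 30, "description": 80, "long_description": 400}

# Shape check only: three lowercase ASCII letters.
# This does NOT prove the code exists in ISO-639-3.
ISO_639_3_SHAPE = re.compile(r"[a-z]{3}")


def check_length(field, value):
    """Return a list of errors if value exceeds the limit for field."""
    limit = MAX_LENGTHS[field]
    if len(value) > limit:
        return [f"{field} is {len(value)} chars, max is {limit}"]
    return []


def check_languages(value):
    """Check a comma-separated list of language codes, one error per bad code."""
    errors = []
    for code in value.split(","):
        if not ISO_639_3_SHAPE.fullmatch(code):
            errors.append(f"{code!r} does not look like an ISO-639-3 code")
    return errors


def check_filename(name, filename):
    """Check that filename is exactly {Name}_{period}.zim with a literal {period}."""
    expected = f"{name}_{{period}}.zim"
    if filename != expected:
        return [f"filename should be {expected!r}, got {filename!r}"]
    return []
```

Each helper returns a list of error strings rather than raising, so that all problems of a recipe could be reported at once. Note that `len()` counts code points, which may differ from what a reader perceives as characters.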

The devdocs.ios scenario is failing because of this weakness: https://farm.openzim.org/pipeline/cb6a074a-e33a-4552-9c45-932486a1dde9/debug

I believe it's a bit different: either we allow arbitrary text in a choice field or (more likely) a previously valid choice is no longer valid and we did not update the recipe.

I think that #910 proves that we also need to regularly check that flags are still valid.

We should check:

  • does the flag still exist for the given offliner?
  • is the flag value still valid?
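
As a rough sketch of these two checks, assume a simplified schema represented as a dict mapping each flag name to its list of allowed choices, or `None` for free-form flags. This representation and the `audit_flags` name are hypothetical; the real offliner schemas are structured differently.

```python
def audit_flags(recipe_flags, schema):
    """Return a list of problems for flags stored in a recipe.

    recipe_flags: dict of flag name -> stored value
    schema: dict of flag name -> list of allowed choices, or None if free-form
            (hypothetical simplified form of an offliner schema)
    """
    problems = []
    for flag, value in recipe_flags.items():
        # Check 1: does the flag still exist for this offliner?
        if flag not in schema:
            problems.append(f"{flag}: flag no longer exists")
            continue
        # Check 2: is the stored value still one of the allowed choices?
        choices = schema[flag]
        if choices is not None and value not in choices:
            problems.append(f"{flag}: {value!r} is not a valid choice")
    return problems
```

Running something like this periodically over all recipes would surface values that were valid when entered but have since been removed from the offliner.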

I've opened a distinct issue #911 since this is a bit different from just ensuring that the constraints of all offliners are appropriate.

kiwix/operations#192 just showed how important it is to avoid unwanted chars in the ZIM name / filename.

Removed from zimit2 project unfortunately