owid/etl

📜 docs: improve section on `private` steps

One-liner

The guideline on private steps is not up to date. We should review and update it to match our current data flow.

Context & details

In #2947, we worked on improving the sections of our docs that cover private steps. However, some fragments remain unclear, in particular those relating to the metadata fields isPrivate and nonRedistributable and to whether datasets are published to GitHub. These doubts were raised by Pablo in his review (here, or here).

The current text in our docs was based on the following issue-closing comment in #2631 (comment):

Turns out this was a false alarm due to a misunderstanding of the meaning of something being "private".

In the ETL, private means that the general public cannot access those files, except when they are published as indicators in the grapher:// step. At that stage, anything private should be marked as nonRedistributable in the metadata.

In Grapher, datasets marked as !isPrivate && !nonRedistributable are automatically re-published to GitHub. If something is !nonRedistributable, it means CSV download is available in Grapher.

This means !isPrivate should probably be renamed publishToGithub, and it should be false any time nonRedistributable is true.

Originally posted by @larsyencken in #2631 (comment)
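
The rules in the quoted comment can be sketched as plain boolean logic. This is only an illustration of the comment as written, not the actual Grapher or ETL code: the function names below are hypothetical, and the last check encodes the suggested invariant (publishToGithub should be false whenever nonRedistributable is true).

```python
def publishes_to_github(is_private: bool, non_redistributable: bool) -> bool:
    """Per the comment: re-published only if !isPrivate && !nonRedistributable."""
    return not is_private and not non_redistributable


def csv_download_available(non_redistributable: bool) -> bool:
    """Per the comment: CSV download is available when !nonRedistributable."""
    return not non_redistributable


def flags_are_consistent(is_private: bool, non_redistributable: bool) -> bool:
    """Suggested invariant (hypothetical check): !isPrivate must be false
    whenever nonRedistributable is true, i.e. non-redistributable data
    must also be marked private."""
    return is_private or not non_redistributable


if __name__ == "__main__":
    # Print the outcome of each rule for every flag combination.
    for is_private in (False, True):
        for non_redistributable in (False, True):
            print(
                f"isPrivate={is_private}, nonRedistributable={non_redistributable}: "
                f"github={publishes_to_github(is_private, non_redistributable)}, "
                f"csv={csv_download_available(non_redistributable)}, "
                f"consistent={flags_are_consistent(is_private, non_redistributable)}"
            )
```

Under this reading, the only combination the invariant rejects is isPrivate false with nonRedistributable true.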

TODO

  • Public relies on a private snapshot?
  • Snapshot has the flag, but doesn't have the prefix in the DAG.
  • non_redistributable vs. private (create a 2x2 matrix)
    • true true: all private, makes sense
    • true false: private in Grapher, public in ETL. doesn't make sense?
    • false false: all public, makes sense
    • false true: we only allow for a slice of the data to be downloadable, makes sense
  • making data private in staging. does it make sense?
  • isPrivate flag in Grapher dataset table.
  • Use Enum instead of boolean flags ('fully public', 'partly private', etc.)
  • Tooling public → private
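
As a starting point for the Enum item, here is a minimal sketch of how the 2x2 matrix above could map onto a single value. It assumes the pairs in the matrix are ordered (non_redistributable, private). The labels 'fully public' and 'partly private' come from the TODO; `Visibility`, `classify`, and the other two members are hypothetical names, not existing ETL or Grapher code.

```python
from enum import Enum


class Visibility(Enum):
    FULLY_PUBLIC = "fully public"
    PARTLY_PRIVATE = "partly private"
    FULLY_PRIVATE = "fully private"      # hypothetical label
    QUESTIONABLE = "questionable"        # hypothetical label for the unclear case


def classify(non_redistributable: bool, private: bool) -> Visibility:
    """Map the two boolean flags to one value, following the 2x2 matrix."""
    if non_redistributable and private:
        return Visibility.FULLY_PRIVATE   # true true: all private
    if non_redistributable and not private:
        return Visibility.QUESTIONABLE    # true false: may not make sense
    if not non_redistributable and private:
        return Visibility.PARTLY_PRIVATE  # false true: only a slice is downloadable
    return Visibility.FULLY_PUBLIC        # false false: all public


if __name__ == "__main__":
    for non_redistributable in (True, False):
        for private in (True, False):
            print(non_redistributable, private, classify(non_redistributable, private).value)
```

A single enum would make the questionable combination impossible to express once the flags are replaced, rather than something to validate after the fact.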

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.