Allowing public ETL steps to depend on private steps
We have a single case where a public dataset (data://garden/covid/latest/combined, and hence our full covid dataset) depends on a private dataset, data-private://garden/covid/latest/sequence.
data://garden/covid/latest/combined:
- data://garden/covid/latest/testing
- data://garden/covid/latest/cases_deaths
- data-private://garden/covid/latest/sequence
- data://garden/demography/2024-07-15/population
An error is raised when you try to run the ETL without the --private flag, so running the full ETL with etl run fails with:
ValueError: Public step data://garden/covid/latest/combined depends on private step data-private://garden/covid/latest/sequence. Use --private flag.
This is a bit annoying, as we have to exclude the covid dataset from nightly builds. It would also be confusing for anyone trying to build it.
Should we exclude steps depending on private steps by default and raise a warning instead of failing?
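For discussion, the exclude-and-warn behaviour proposed above could look roughly like the sketch below. It is not the actual ETL code: the `DAG` dictionary, `is_private`, and `exclude_private_dependents` are hypothetical names, the DAG only mirrors the fragment quoted in this issue (upstream dependencies of the leaf steps are omitted), and it assumes private steps can be recognised by the `data-private://` prefix.

```python
import warnings

# Hypothetical in-memory DAG: step URI -> set of direct dependencies.
# Mirrors the fragment quoted in this issue; leaf dependencies are omitted.
DAG = {
    "data://garden/covid/latest/combined": {
        "data://garden/covid/latest/testing",
        "data://garden/covid/latest/cases_deaths",
        "data-private://garden/covid/latest/sequence",
        "data://garden/demography/2024-07-15/population",
    },
    "data://garden/covid/latest/testing": set(),
    "data://garden/covid/latest/cases_deaths": set(),
    "data-private://garden/covid/latest/sequence": set(),
    "data://garden/demography/2024-07-15/population": set(),
}

PRIVATE_PREFIX = "data-private://"  # assumption: private steps use this scheme


def is_private(step: str) -> bool:
    """Return True if the step URI uses the private scheme."""
    return step.startswith(PRIVATE_PREFIX)


def exclude_private_dependents(dag: dict) -> dict:
    """Return a copy of `dag` without private steps and without any step
    that depends on one, directly or transitively. A warning is emitted
    for each public step that gets skipped, instead of raising an error."""
    # Start from every private step, whether it appears as a key or only as a dependency.
    excluded = {s for s in dag if is_private(s)}
    excluded |= {d for deps in dag.values() for d in deps if is_private(d)}

    # Propagate the exclusion downstream until nothing changes.
    changed = True
    while changed:
        changed = False
        for step, deps in dag.items():
            if step not in excluded and deps & excluded:
                warnings.warn(
                    f"Skipping public step {step}: it depends on a private step. "
                    "Use --private to build it."
                )
                excluded.add(step)
                changed = True

    return {step: deps for step, deps in dag.items() if step not in excluded}


public_dag = exclude_private_dependents(DAG)
print(sorted(public_dag))
```

With something like this, a plain etl run would still build every step that does not touch private data, and the covid dataset would be skipped with a warning rather than aborting the whole run.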
@lucasrodes why is data-private://garden/covid/latest/sequence private? Maybe the solution would be to make it public (given that it's used by a public step).
why is data-private://garden/covid/latest/sequence private?
It must be private, as requested by the data provider (GISAID), which has a very restrictive license.
Maybe the solution would be to make it public (given that it's used by a public step).
That's not possible; we cannot share this data publicly. The data://garden/covid/latest/combined step processes and aggregates a private indicator to compute a ratio, and that ratio is fine to make public.