VERBATIM_TO_IDENTIFIER runs small datasets on Spark
Closed this issue · 2 comments
timrobertson100 commented
The VERBATIM_TO_IDENTIFIER stage is running everything on Spark (Yarn), even for tiny datasets such as this one.
We should either fix the config so that Spark is only used above a reasonable threshold (e.g. more than 1M records or more than 1GB uncompressed size or so), or rework this stage so that it doesn't require distributed computing.
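A minimal sketch of the threshold idea, assuming a non-distributed runner were available for this stage. The names `DatasetStats`, `choose_runner`, `MAX_STANDALONE_RECORDS` and `MAX_STANDALONE_BYTES` are hypothetical and are not existing config keys or classes; the limits are just the rough figures suggested above.

```python
from dataclasses import dataclass

# Hypothetical limits, taken from the rough figures in the comment above.
# These are illustrative names, not real configuration keys.
MAX_STANDALONE_RECORDS = 1_000_000
MAX_STANDALONE_BYTES = 1024 ** 3  # 1 GB uncompressed


@dataclass(frozen=True)
class DatasetStats:
    """Size information assumed to be known before the stage starts."""
    record_count: int
    uncompressed_bytes: int


def choose_runner(stats: DatasetStats) -> str:
    """Return "DISTRIBUTED" if either limit is exceeded, else "STANDALONE"."""
    is_large = (
        stats.record_count > MAX_STANDALONE_RECORDS
        or stats.uncompressed_bytes > MAX_STANDALONE_BYTES
    )
    return "DISTRIBUTED" if is_large else "STANDALONE"
```

With these assumed limits, a dataset with a few hundred records would be routed to the standalone path, and only datasets exceeding either limit would go to Spark.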
muttcg commented
There is only one implementation of that workflow: Yarn/Beam.
muttcg commented
Deployed to PROD