gbif/pipelines

VERBATIM_TO_IDENTIFIER runs small datasets on Spark

Closed this issue · 2 comments

The VERBATIM_TO_IDENTIFIER stage runs everything on Spark (YARN), even for tiny datasets such as this one.

We should either fix the config so that Spark is only used above a reasonable threshold (e.g. 1M records or >1GB uncompressed size) or rework this stage so that it doesn't require distributed computing at all.
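For illustration, a threshold-based switch could look something like the sketch below. This is only a hypothetical config fragment — the key names (`spark-switch`, `records-threshold`, `file-size-threshold-mb`) are made up for the example and do not correspond to actual pipelines configuration keys:

```yaml
# Hypothetical sketch: route small datasets to a standalone runner,
# large ones to Spark-on-YARN. Key names are illustrative only.
verbatim-to-identifier:
  spark-switch:
    records-threshold: 1000000      # use Spark only above ~1M records
    file-size-threshold-mb: 1024    # ...or above ~1GB uncompressed
  small-runner: STANDALONE          # in-process run for tiny datasets
  large-runner: DISTRIBUTED         # Spark (YARN) run for big datasets
```

The same idea is already used elsewhere in the pipelines for choosing between standalone and distributed interpretation, so the fix may be mostly a matter of wiring this stage into that mechanism.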

There is currently only one implementation of that workflow: YARN/Beam.

Deployed to PROD