Handling Decimal / Numeric Data - Singer.Decimal Standards
s7clarke10 opened this issue · 1 comments
There has been some discussion around how to handle Floats, Decimal and Numeric data.
We were finding that decimal data was ending up as floats in target-snowflake from various database taps. The taps were opinionated using the MultipleOf etc, but the results were in the wrong data type and we were finding there was truncation / rounding. Part of the issue was to do with the Python Language not supporting large numeric data leading to truncation.
We have learned about the singer.decimal format which doesn't take an opinion on the output format, leaving the target to determine how to treat the string representation of the numeric data. The problem with this is you loose a lot of automation, like the automated creation of tables or addition of columns with the correct datatype.
The problem is there needs to be further information provided to allow the correct creation of datatypes by the target as part of the schema message i.e. information about the precision and scale if it is available at source.
We have elected to add some AdditionalProperties - providing the Scale and Precision to help with decimal data. See the slack message below.
While we have taken an opinionate view on how to implement this, it would be great if there was some standards around how to navigate the challenges of emitting numeric data without data loss. We would also be interested in views around our approach to dealing with this issue.
Just to add to @s7clarke10's post, as a colleague working on the same platforms. The way we have our stack configured, singer.decimal
treats values at each stage in the following ways:
- Source - Large numeric data value
- Tap
- SCHEMA message:
format=singer.decimal
,type=['string']
- RECORD message: value is a
str
- SCHEMA message:
- Target - Sets up a numeric column, overriding the
type
value in the schema because the format issinger.decimal
. If present theadditionalProperties
value is used to give the target column scale and precision - Destination - receives str value as part of an
INSERT INTO ... VALUES ...
and parses it into the data type set by thetarget
for that particular column