apache/arrow-rs

More consideration to `Decimal(precision = 0, ..)`

Closed this issue · 9 comments

Which part is this question about

The definition of the Decimal type.

Describe your question

The current definition of the decimal type is Decimal(u8, u8), which allows the precision to be zero.
In the Datafusion, we meet as error which the zero precision leads the system to panic: apache/datafusion#3665.

My questions are:

  1. Does precision = 0 make sense?
  2. If it is, what is the value of Decimal(0, ..)?
  3. If it is not, how should we avoid the zero precision? Here are 2 ways I think:
    a. Add runtime check to all decimal functions when creating a new decimal type.
    b. Using the NonZeroU8 to represent precision: https://doc.rust-lang.org/std/num/struct.NonZeroU8.html. Actually, NonZeroU8 also does runtime checking when creating a value. But the benefit is that the type system help us find where to add the checking. The disadvantage it that we will change the API.

Additional context

cc @tustvold @liukun4515 @viirya. PTAL. Thank you!

We already have validation that checks if the precision is too high, I think we should just extend that logic. I don't think encoding it in the type system is necessary

In the datafusion, we can check the value of precision in the SQL level, and make sure the input precision and scale is valid.

I don't see any system use the precision = 0

In the datafusion, we can check the value of precision in the SQL level, and make sure the input precision and scale is valid.

If all precision = 0 cases can be detected during the constant folding stage in Datafusion, then it is great. 👍

However, as this is a bug in arrow-rs, we should also add the zero checking here, because arrow-rs is open to all projects, not only the Datafusion.

My suggestion is to use NonZeroU8, which can avoid duplicated checking. (avoid checking twice in Datafusion and arrow-rs)

psvri commented

Hello,

Should we rather refactor the function validate_precision_scale into DecimalType . This way we can avoid code duplication in datafusion and expose it to others as well.

That sounds like a good idea to me 👍

psvri commented

Okay, I will do the refactor in another PR if others are okay with it as well.

Closed by #3162