datafaker-net/datafaker

`number_between` for integers returns doubles from time to time

valfirst opened this issue ยท 27 comments

Describe the bug
This is a regression issue. It's not reproduced on 2.4.11, but reproduced from time to time on 2.4.2. Most likely it is caused by this commit: f073184.

To Reproduce

import net.datafaker.Faker;

public class Main
{
    public static void main(String[] args)
    {
        Faker faker = new Faker();
        String result = faker.expression("#{number.number_between '1000','1000'}");
        System.out.printf(result);
    }
}

Expected behavior
The result should be '1000', but it is '1000.0' from time to time.

Versions:

  • OS: any
  • JDK: no correlation detected
  • Faker Version: 2.4.2

Additional context
The methods order is preserved when they are returned from class:

Method[] classMethods = clazz.getMethods();

But then they are stored in HashMap with unpredictable order:
classMethods.length == 0 ? emptyMap() : new HashMap<>(classMethods.length);

I see 2 possible solutions:

  • rely on method order in the class and use ordered collections to keep sequence - fragile
  • improve the algorithm to distinguish double-s vs long-s vs int-s, and search for the method in priority starting from best matching type: e.g. 10 -> int, 10000000000 -> long, 10.0 -> double

Doubles are numbers right?

Doubles are numbers right?

yes, and integers are numbers

Ok, so no issue?

I mean I guess we (or someone) could introduce integerBetween, longBetween, doubleBetween if that helps.

I mean I guess we (or someone) could introduce integerBetween, longBetween, doubleBetween if that helps.

May I ask why it's needed?
We already have such methods:

int a = faker.number().numberBetween(int, int);
double b = faker.number().numberBetween(double, double);
long c = faker.number().numberBetween(long, long);

I guess it isn't, I hadn't looked ๐Ÿ™„

I mean I guess we (or someone) could introduce integerBetween, longBetween, doubleBetween if that helps.

May I ask why it's needed? We already have such methods:

int a = faker.number().numberBetween(int, int);
double b = faker.number().numberBetween(double, double);
long c = faker.number().numberBetween(long, long);

When string representation of expression is evaluated, the incorrect method from this list is invoked from time to time, e.g.:

faker.expression("#{number.number_between '1','10'}");

may result in 2.9189383404296914.
This is a regression in 2.4.2.

@valfirst I don't understand why you call it incorrect. 2.9189383404296914 is a number between 1 and 10.

i guess the reason is changed behavior
logically it might be still correct since a number is between boundaries
however it might be a breaking change.

The root cause seems to be that
expressions like

faker.expression("#{number.number_between '1','10'}");

under the hood use reflections. So from this #{number.number_between '1','10'} it finds a class and a method based on name and a number of args.

The problem is that if before the mentioned change there were 2 methods matching this like

int a = faker.number().numberBetween(int, int);
long c = faker.number().numberBetween(long, long);

now there is a new one since it doesn't check for types.

IMHO it seems to be a breaking change.

Probably the fastest/easiest way to fix it is renaming of method for doubles with some javadoc why it has different name

Hi all, I did this on purpose since decimals are numbers too, and the contract for the original method was never integers only. If you want that, make a new method which returns whole numbers only.

I don't see this as a breaking change, though I can see that this might give some unexpected results, but the behavior of the method was never well defined.

I don't want to introduce a method which has a deviating name for doubles, but I see no reason why number_between (1,4) could not return 2.5, but I have no issue with "whole number between (1,4)" which doesn't return decimals.

improve the algorithm to distinguish double-s vs long-s vs int-s, and search for the method in priority starting from best matching type: e.g. 10 -> int, 10000000000 -> long, 10.0 -> double

@bodiam what about this proposal?

I don't see this as a breaking change,

here it is a definition of breaking change from wiktionary https://en.wiktionary.org/wiki/breaking_change

A change in one part of a software system that potentially causes other components to fail; occurs most often in shared libraries of code used by multiple applications.

and it matches this case.

Moreover we have this case covered in tests, however seems while review it was missed that the tests were changed as well. So the behavior of expression was well defined in tests in this line f073184#diff-d867ad35f7992df4ac0dfe8d5e0195ec106d0cf5dd951c36a56fc3022faddc8eR165.

regarding

improve the algorithm to distinguish double-s vs long-s vs int-s, and search for the method in priority starting from best matching type: e.g. 10 -> int, 10000000000 -> long, 10.0 -> double

I have a slightly different suggestion.
Need to go into code to be more precised however IIRC the args should be parsed first before passing to the method. It means that if we parse them then we know their type. If we know their type we could use this knowledge to filter the required method no

improve the algorithm to distinguish double-s vs long-s vs int-s, and search for the method in priority starting from best matching type: e.g. 10 -> int, 10000000000 -> long, 10.0 -> double

@bodiam what about this proposal?

Sorry, I don't understand what you mean.

I don't see this as a breaking change,

here it is a definition of breaking change from wiktionary https://en.wiktionary.org/wiki/breaking_change

A change in one part of a software system that potentially causes other components to fail; occurs most often in shared libraries of code used by multiple applications.

and it matches this case.

It's the most ambiguous definition I've seen. If we add a method to Datafaker, and my system is depending on the number of methods in Datafaker, so it "potentially" causes other components to fail, it would make adding a method a breaking change.

The method here was always supposed to return numbers. It's still returning numbers. So, no breaking change. Maybe the original method was returning only numbers between -100 and +100, and I changed it to -1000 to +1000. It might break a system here or there, but that's not really the fault of Datafaker: it's a bad idea to depend on the implementation of the method instead of the contract, and no contracts have changed.

Moreover we have this case covered in tests, however seems while review it was missed that the tests were changed as well. So the behavior of expression was well defined in tests in this line f073184#diff-d867ad35f7992df4ac0dfe8d5e0195ec106d0cf5dd951c36a56fc3022faddc8eR165.

I don't see what is missed here. I explicitly added this. The behaviour was defined without taking into account decimals, I added decimals, so I updated the test cases.

I was hoping we could all agree that 1.5 is a number, just as 2 or 3 is, so not really sure why we're having this discussion. I understand the behaviour changed, but the behaviour was never well defined, but I have no issue with introducing a few more methods which are not ambiguous if that helps you.

faker.expression("#{number.number_between '1','10'}");

Introduce faker.expression("#{number.whole_number_between '1','10'}"); instead?

what about faker.expression("#{number.number_between '1','10', '2'}"); <- where '2' is the
precision - the number of digits after the floating point?
in the same way for java:

int i = faker.number().numberBetween(int, int);
double d1 = faker.number().numberBetween(double, double);
double d2 = faker.number().numberBetween(double, double, int);
long l = faker.number().numberBetween(long, long);

how do you make it returning only 2 digits for double ?

how do you make it returning only 2 digits for double ?

Convert it to string? ๐Ÿคท


@valfirst
If you're going to do something like that I think you may as well use the regexify functionality.

how do you make it returning only 2 digits for double ?

For now faker.expression("#{number.number_between '1','10'}"); may generate the number like 2.9189383404296914, while with precision '2' it could be 2.91

@valfirst
If you're going to do something like that I think you may as well use the regexify functionality.

could you please share example, how regexify can be used to generate a number from 1 to 100?

For now faker.expression("#{number.number_between '1','10'}"); may generate the number like 2.9189383404296914, while with precision '2' it could be 2.91

faker.expression just invokes the method it found in expression and returns the toString result. It doesn't have any knowledge about it's return type of the method it invokes since for different use cases it could be different.

could you please share example, how regexify can be used to generate a number from 1 to 100?

Sorry I should have said numerify not regexify, but anyway:

    @RepeatedTest(100)
    void temp() {
        System.out.println(faker.numerify("#".repeat(faker.number().numberBetween(1, 3))));
    }

If you want to ensure no leading zero, or simply zero:

    @RepeatedTest(100)
    void temp() {
        int digits = faker.number().numberBetween(1,34);
        String pattern = digits == 1 ? "ร˜" : "ร˜" + "#".repeat(digits - 1);
        System.out.println(faker.numerify(pattern));
    }

@valfirst
I think this should made every participant of this issue discussion more or less happy
#1465

can you please test your case with this update?

@valfirst I think this should made every participant of this issue discussion more or less happy #1465

can you please test your case with this update?

I've tested the changes locally, everything works fine now. I greatly appreciate your help and dedication.

I'm not convinced about this. I haven't changed my fact that 1.5 is a number between 1 and 2. This PR is not changing that.

I forgot why I added it for the expression test, it actually may have been a side effect of adding the double method, so I doubt it's going to affect me (or anyone) in reality.

So I'll go with the majority vote :)

I haven't changed my fact that 1.5 is a number between 1 and 2.

nobody was trying to make you changing this

however apart from regular math there is also type system (not only java specific)
if you try to execute the code

for (int i = 0; i < $PUT_YOUR_NUMBER_HERE$; i++) {
    System.out.println(faker.number().numberBetween(1, 2));
}

you will never get 1.5 or any other number with fractional part
this happens because of Narrowing Primitive Conversion mentioned in chapter 5 of JLS

If you want to have anything with fractional part you need to say explicitly that your args are double like

for (int i = 0; i < $PUT_YOUR_NUMBER_HERE$; i++) {
    System.out.println(faker.number().numberBetween(1.0, 2));
}

similar approach is applied in this PR