SciProgCentre/kmath

Slow multiplication of BigIntegers

Opened this issue · 21 comments

The used multiplication method is naive. Its time complexity is O(n2). Even Karatsuba (that is used in java.util.math.BigInteger) multiplication takes O(nlog23). FFT algorithm (since year 2019 approach by David Harvey) can help to make it exactly O(n log n).

Thanks for reporting! I am also very interested what is your use case for big arithmetics. This information will help us to prioritize issues.

I would like to have some multiplatform BigInteger's in stdlib in the future. My current desire is to have some interfaces that would allow me to return the real size of some progressions without reducing it to some MAX_VALUE when an overflow happens. The interesting fact is that the size of T.MIN_VALUE..T.MAX_VALUE is out of bounds of T for all bounded T. That is why I need unbounded BigIntegers.

Hi. Thanks for the issue. I've created a benchmark for simple BigInt operations here.

Currently results for JVM BigInteger and KM BigInt seem to be the same both for addition and for multiplication (I've checked on GraalVM 11 and Liberica-15). On Liberica KM multiplication seems to be even slightly (like 10%) faster.

Still, the BigInt/BigDecimal part still needs a lot of work both in terms of code (we need at least #129) and in terms of documentation. Contributions are quite welcome.

@altavir That is because java.util.math.BigInteger is also slow. Maybe, performing benchmarks on bigger numbers wil show the difference even with java.util implementation. I'll check it a bit later.

It would be interesting if you could contribute something. I do not think that we have multiplatform FFT right now (see #42). So if you need it, you will have to implement it as well or rely on JVM Commons-Math implementation in kmath-commons.

When I cloned the repo, I got several problems:

  • I couldn't sync the master branch
    image

  • Trying to launch benchmarks of BigInts in dev branch I faced another problem:
    image
    The reason is that this method is available since Java 9.
    I tried to specify java version 11 in several places but failed to make it work. Is there any recipe?

The minimal required JDK version is 11, so you need it to be present both in IDEA and gradle settings.

As for master, it was not updated for some time. I need to check. It could still point to bintray, which is not available anymore.

I've updated the readme to make it clear about JDK. I've also merged dev branch to master to cleanup old dependencies

Calling benchmark task caused the following:
image

All other benchmarks pass successfully.

There is a some kind of incompatibility in latest kotlinx-benchmarks or gradle 7. I will try to resolve that. By the way, the discussion is available at kotl.in/slack in mathematics channel.

Thank you! I will join it now then!

I have just written the tests. They show the slowness of the current implementation.

Solution:

  • Fast power algorithm must be used to calculate power.
  • At least Karatsuba or (better) FFT-based algorithms must be used to calculate multiplication.

Current results on my laptop:

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmAdd

Warm-up 1: 16623818.766 ops/s
Iteration 1: 18948124.083 ops/s
Iteration 2: 24940179.011 ops/s
Iteration 3: 32620909.027 ops/s

25503070.707 ±(99.9%) 125037925.223 ops/s [Average]
  (min, avg, max) = (18948124.083, 25503070.707, 32620909.027), stdev = 6853750.603
  CI (99.9%): [≈ 0, 150540995.930] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiply

Warm-up 1: 19940315.907 ops/s
Iteration 1: 22572181.957 ops/s
Iteration 2: 31369627.025 ops/s
Iteration 3: 35495187.932 ops/s

29812332.305 ±(99.9%) 120422246.074 ops/s [Average]
  (min, avg, max) = (22572181.957, 29812332.305, 35495187.932), stdev = 6600749.654
  CI (99.9%): [≈ 0, 150234578.379] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiplyLarge

Warm-up 1: 224431.420 ops/s
Iteration 1: 290739.693 ops/s
Iteration 2: 319635.001 ops/s
Iteration 3: 332487.569 ops/s

314287.421 ±(99.9%) 390078.240 ops/s [Average]
  (min, avg, max) = (290739.693, 314287.421, 332487.569), stdev = 21381.505
  CI (99.9%): [≈ 0, 704365.661] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmPower

Warm-up 1: 4.629 ops/s
Iteration 1: 9.461 ops/s
Iteration 2: 12.534 ops/s
Iteration 3: 15.228 ops/s

12.407 ±(99.9%) 52.642 ops/s [Average]
  (min, avg, max) = (9.461, 12.407, 15.228), stdev = 2.885
  CI (99.9%): [≈ 0, 65.050] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmAdd

Warm-up 1: 9645898.765 ops/s
Iteration 1: 13034943.473 ops/s
Iteration 2: 13226740.829 ops/s
Iteration 3: 15728034.121 ops/s

13996572.807 ±(99.9%) 27412158.694 ops/s [Average]
  (min, avg, max) = (13034943.473, 13996572.807, 15728034.121), stdev = 1502552.916
  CI (99.9%): [≈ 0, 41408731.501] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiply

Warm-up 1: 20185148.904 ops/s
Iteration 1: 21782694.098 ops/s
Iteration 2: 27228890.681 ops/s
Iteration 3: 28195685.354 ops/s

25735756.711 ±(99.9%) 63076074.286 ops/s [Average]
  (min, avg, max) = (21782694.098, 25735756.711, 28195685.354), stdev = 3457412.472
  CI (99.9%): [≈ 0, 88811830.997] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiplyLarge

Warm-up 1: 19304.845 ops/s
Iteration 1: 23215.614 ops/s
Iteration 2: 24009.905 ops/s
Iteration 3: 24774.657 ops/s

24000.059 ±(99.9%) 14222.227 ops/s [Average]
  (min, avg, max) = (23215.614, 24000.059, 24774.657), stdev = 779.568
  CI (99.9%): [9777.831, 38222.286] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmPower

Warm-up 1: 0.335 ops/s
Iteration 1: 0.257 ops/s
Iteration 2: 0.252 ops/s
Iteration 3: 0.251 ops/s

0.254 ±(99.9%) 0.056 ops/s [Average]
  (min, avg, max) = (0.251, 0.254, 0.257), stdev = 0.003
  CI (99.9%): [0.198, 0.309] (assumes normal distribution)

Thanks, I will merge it as soon as I have time. I had to rework benchmarks to make them work with current versions. Your tests show extremely bad performance only for power. We will see what the problem is.

Well, big multiplications are about 20 times slower... I suppose that to be slow. Am I right?

390078.24 for JVM per 14222.227 for KM

Indeed. It probably could be improved.

I have just increased the big numbers to multiply, the difference is now about 100 times. And that is not the limit.

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmAdd

Warm-up 1: 17638432.849 ops/s
Iteration 1: 23641677.865 ops/s
Iteration 2: 25754833.328 ops/s
Iteration 3: 26222370.772 ops/s

25206293.988 ±(99.9%) 25085387.513 ops/s [Average]
  (min, avg, max) = (23641677.865, 25206293.988, 26222370.772), stdev = 1375014.736
  CI (99.9%): [120906.475, 50291681.501] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiply

Warm-up 1: 8455599.190 ops/s
Iteration 1: 15061868.694 ops/s
Iteration 2: 21897871.651 ops/s
Iteration 3: 22512031.738 ops/s

19823924.028 ±(99.9%) 75446509.562 ops/s [Average]
  (min, avg, max) = (15061868.694, 19823924.028, 22512031.738), stdev = 4135477.772
  CI (99.9%): [≈ 0, 95270433.590] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiplyLarge

Warm-up 1: 78.250 ops/s
Iteration 1: 110.630 ops/s
Iteration 2: 129.524 ops/s
Iteration 3: 140.550 ops/s

126.901 ±(99.9%) 276.054 ops/s [Average]
  (min, avg, max) = (110.630, 126.901, 140.550), stdev = 15.131
  CI (99.9%): [≈ 0, 402.955] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmPower

Warm-up 1: 5.665 ops/s
Iteration 1: 8.819 ops/s
Iteration 2: 15.286 ops/s
Iteration 3: 17.343 ops/s

13.816 ±(99.9%) 81.141 ops/s [Average]
  (min, avg, max) = (8.819, 13.816, 17.343), stdev = 4.448
  CI (99.9%): [≈ 0, 94.957] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmAdd

Warm-up 1: 12743199.982 ops/s
Iteration 1: 15636619.941 ops/s
Iteration 2: 14940632.235 ops/s
Iteration 3: 15145173.819 ops/s

15240808.665 ±(99.9%) 6526033.325 ops/s [Average]
  (min, avg, max) = (14940632.235, 15240808.665, 15636619.941), stdev = 357713.908
  CI (99.9%): [8714775.340, 21766841.990] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiply

Warm-up 1: 16431023.370 ops/s
Iteration 1: 22855376.562 ops/s
Iteration 2: 25031011.891 ops/s
Iteration 3: 30259404.752 ops/s

26048597.735 ±(99.9%) 69425740.706 ops/s [Average]
  (min, avg, max) = (22855376.562, 26048597.735, 30259404.752), stdev = 3805459.115
  CI (99.9%): [≈ 0, 95474338.441] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiplyLarge

Warm-up 1: 1.171 ops/s
Iteration 1: 1.229 ops/s
Iteration 2: 1.219 ops/s
Iteration 3: 1.201 ops/s

1.216 ±(99.9%) 0.260 ops/s [Average]
  (min, avg, max) = (1.201, 1.216, 1.229), stdev = 0.014
  CI (99.9%): [0.956, 1.476] (assumes normal distribution)

benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmPower

Warm-up 1: 0.286 ops/s
Iteration 1: 0.205 ops/s
Iteration 2: 0.204 ops/s
Iteration 3: 0.205 ops/s

0.205 ±(99.9%) 0.013 ops/s [Average]
  (min, avg, max) = (0.204, 0.205, 0.205), stdev = 0.001
  CI (99.9%): [0.192, 0.218] (assumes normal distribution)

By the way, FFT-based multiplication is implemented in Haskell:

zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print(10**1000000)" | python) >/dev/null

real    0m14,191s
user    0m14,178s
sys     0m0,015s
zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print(10.toBigInteger().pow(1000000))" | kotlinc) >/dev/null
Apr 16, 2021 11:53:54 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)

real    0m8,600s
user    0m13,922s
sys     0m4,507s
zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print((10::Integer)^1000000)" | ghci) >/dev/null

real    0m1,804s
user    0m1,244s
sys     0m0,544s
zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print((10::Integer)^10000000)" | ghci) >/dev/null

real    0m15,859s
user    0m11,540s
sys     0m4,328s

(The second Haskell attempt has one more 0)

I test with 100000 in the KMath because even it is very slow for both KMath and java.util.BigInteger. kotlinc initialization takes some time, but not 14 seconds. GHCI is an interpreter by the way so compiling would even fasten the Haskell code.

Karatsuba is added by me, some bugs are fixed. However, Fourier based method is still not implemented.

Fourier based method is still not implemented

Wow, really interesting article, thanks :)
But based on the article, the FFT-based algorithm currently has no practical use, it is "only" theoretically significant:

... The new algorithm is not really practical in its current form ... we are hopeful that with further refinements, the algorithm might become practical for numbers with merely billions or trillions of digits ...