Slow multiplication of BigIntegers
Opened this issue · 21 comments
The used multiplication method is naive. Its time complexity is O(n2). Even Karatsuba (that is used in java.util.math.BigInteger
) multiplication takes O(nlog23). FFT algorithm (since year 2019 approach by David Harvey) can help to make it exactly O(n log n).
Thanks for reporting! I am also very interested what is your use case for big arithmetics. This information will help us to prioritize issues.
I would like to have some multiplatform BigInteger's in stdlib in the future. My current desire is to have some interfaces that would allow me to return the real size of some progressions without reducing it to some MAX_VALUE
when an overflow happens. The interesting fact is that the size of T.MIN_VALUE..T.MAX_VALUE
is out of bounds of T
for all bounded T
. That is why I need unbounded BigIntegers
.
Hi. Thanks for the issue. I've created a benchmark for simple BigInt operations here.
Currently results for JVM BigInteger
and KM BigInt
seem to be the same both for addition and for multiplication (I've checked on GraalVM 11 and Liberica-15). On Liberica KM multiplication seems to be even slightly (like 10%) faster.
Still, the BigInt/BigDecimal part still needs a lot of work both in terms of code (we need at least #129) and in terms of documentation. Contributions are quite welcome.
@altavir That is because java.util.math.BigInteger
is also slow. Maybe, performing benchmarks on bigger numbers wil show the difference even with java.util
implementation. I'll check it a bit later.
It would be interesting if you could contribute something. I do not think that we have multiplatform FFT right now (see #42). So if you need it, you will have to implement it as well or rely on JVM Commons-Math implementation in kmath-commons.
When I cloned the repo, I got several problems:
The minimal required JDK version is 11, so you need it to be present both in IDEA and gradle settings.
As for master, it was not updated for some time. I need to check. It could still point to bintray, which is not available anymore.
I've updated the readme to make it clear about JDK. I've also merged dev branch to master to cleanup old dependencies
All other benchmarks pass successfully.
There is a some kind of incompatibility in latest kotlinx-benchmarks or gradle 7. I will try to resolve that. By the way, the discussion is available at kotl.in/slack in mathematics channel.
Thank you! I will join it now then!
I have just written the tests. They show the slowness of the current implementation.
Solution:
- Fast power algorithm must be used to calculate power.
- At least Karatsuba or (better) FFT-based algorithms must be used to calculate multiplication.
Current results on my laptop:
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmAdd
Warm-up 1: 16623818.766 ops/s
Iteration 1: 18948124.083 ops/s
Iteration 2: 24940179.011 ops/s
Iteration 3: 32620909.027 ops/s
25503070.707 ±(99.9%) 125037925.223 ops/s [Average]
(min, avg, max) = (18948124.083, 25503070.707, 32620909.027), stdev = 6853750.603
CI (99.9%): [≈ 0, 150540995.930] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiply
Warm-up 1: 19940315.907 ops/s
Iteration 1: 22572181.957 ops/s
Iteration 2: 31369627.025 ops/s
Iteration 3: 35495187.932 ops/s
29812332.305 ±(99.9%) 120422246.074 ops/s [Average]
(min, avg, max) = (22572181.957, 29812332.305, 35495187.932), stdev = 6600749.654
CI (99.9%): [≈ 0, 150234578.379] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiplyLarge
Warm-up 1: 224431.420 ops/s
Iteration 1: 290739.693 ops/s
Iteration 2: 319635.001 ops/s
Iteration 3: 332487.569 ops/s
314287.421 ±(99.9%) 390078.240 ops/s [Average]
(min, avg, max) = (290739.693, 314287.421, 332487.569), stdev = 21381.505
CI (99.9%): [≈ 0, 704365.661] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmPower
Warm-up 1: 4.629 ops/s
Iteration 1: 9.461 ops/s
Iteration 2: 12.534 ops/s
Iteration 3: 15.228 ops/s
12.407 ±(99.9%) 52.642 ops/s [Average]
(min, avg, max) = (9.461, 12.407, 15.228), stdev = 2.885
CI (99.9%): [≈ 0, 65.050] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmAdd
Warm-up 1: 9645898.765 ops/s
Iteration 1: 13034943.473 ops/s
Iteration 2: 13226740.829 ops/s
Iteration 3: 15728034.121 ops/s
13996572.807 ±(99.9%) 27412158.694 ops/s [Average]
(min, avg, max) = (13034943.473, 13996572.807, 15728034.121), stdev = 1502552.916
CI (99.9%): [≈ 0, 41408731.501] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiply
Warm-up 1: 20185148.904 ops/s
Iteration 1: 21782694.098 ops/s
Iteration 2: 27228890.681 ops/s
Iteration 3: 28195685.354 ops/s
25735756.711 ±(99.9%) 63076074.286 ops/s [Average]
(min, avg, max) = (21782694.098, 25735756.711, 28195685.354), stdev = 3457412.472
CI (99.9%): [≈ 0, 88811830.997] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiplyLarge
Warm-up 1: 19304.845 ops/s
Iteration 1: 23215.614 ops/s
Iteration 2: 24009.905 ops/s
Iteration 3: 24774.657 ops/s
24000.059 ±(99.9%) 14222.227 ops/s [Average]
(min, avg, max) = (23215.614, 24000.059, 24774.657), stdev = 779.568
CI (99.9%): [9777.831, 38222.286] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmPower
Warm-up 1: 0.335 ops/s
Iteration 1: 0.257 ops/s
Iteration 2: 0.252 ops/s
Iteration 3: 0.251 ops/s
0.254 ±(99.9%) 0.056 ops/s [Average]
(min, avg, max) = (0.251, 0.254, 0.257), stdev = 0.003
CI (99.9%): [0.198, 0.309] (assumes normal distribution)
Thanks, I will merge it as soon as I have time. I had to rework benchmarks to make them work with current versions. Your tests show extremely bad performance only for power. We will see what the problem is.
Well, big multiplications are about 20 times slower... I suppose that to be slow. Am I right?
390078.24 for JVM per 14222.227 for KM
Indeed. It probably could be improved.
I have just increased the big numbers to multiply, the difference is now about 100 times. And that is not the limit.
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmAdd
Warm-up 1: 17638432.849 ops/s
Iteration 1: 23641677.865 ops/s
Iteration 2: 25754833.328 ops/s
Iteration 3: 26222370.772 ops/s
25206293.988 ±(99.9%) 25085387.513 ops/s [Average]
(min, avg, max) = (23641677.865, 25206293.988, 26222370.772), stdev = 1375014.736
CI (99.9%): [120906.475, 50291681.501] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiply
Warm-up 1: 8455599.190 ops/s
Iteration 1: 15061868.694 ops/s
Iteration 2: 21897871.651 ops/s
Iteration 3: 22512031.738 ops/s
19823924.028 ±(99.9%) 75446509.562 ops/s [Average]
(min, avg, max) = (15061868.694, 19823924.028, 22512031.738), stdev = 4135477.772
CI (99.9%): [≈ 0, 95270433.590] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmMultiplyLarge
Warm-up 1: 78.250 ops/s
Iteration 1: 110.630 ops/s
Iteration 2: 129.524 ops/s
Iteration 3: 140.550 ops/s
126.901 ±(99.9%) 276.054 ops/s [Average]
(min, avg, max) = (110.630, 126.901, 140.550), stdev = 15.131
CI (99.9%): [≈ 0, 402.955] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.jvmPower
Warm-up 1: 5.665 ops/s
Iteration 1: 8.819 ops/s
Iteration 2: 15.286 ops/s
Iteration 3: 17.343 ops/s
13.816 ±(99.9%) 81.141 ops/s [Average]
(min, avg, max) = (8.819, 13.816, 17.343), stdev = 4.448
CI (99.9%): [≈ 0, 94.957] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmAdd
Warm-up 1: 12743199.982 ops/s
Iteration 1: 15636619.941 ops/s
Iteration 2: 14940632.235 ops/s
Iteration 3: 15145173.819 ops/s
15240808.665 ±(99.9%) 6526033.325 ops/s [Average]
(min, avg, max) = (14940632.235, 15240808.665, 15636619.941), stdev = 357713.908
CI (99.9%): [8714775.340, 21766841.990] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiply
Warm-up 1: 16431023.370 ops/s
Iteration 1: 22855376.562 ops/s
Iteration 2: 25031011.891 ops/s
Iteration 3: 30259404.752 ops/s
26048597.735 ±(99.9%) 69425740.706 ops/s [Average]
(min, avg, max) = (22855376.562, 26048597.735, 30259404.752), stdev = 3805459.115
CI (99.9%): [≈ 0, 95474338.441] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmMultiplyLarge
Warm-up 1: 1.171 ops/s
Iteration 1: 1.229 ops/s
Iteration 2: 1.219 ops/s
Iteration 3: 1.201 ops/s
1.216 ±(99.9%) 0.260 ops/s [Average]
(min, avg, max) = (1.201, 1.216, 1.229), stdev = 0.014
CI (99.9%): [0.956, 1.476] (assumes normal distribution)
benchmarks: space.kscience.kmath.benchmarks.BigIntBenchmark.kmPower
Warm-up 1: 0.286 ops/s
Iteration 1: 0.205 ops/s
Iteration 2: 0.204 ops/s
Iteration 3: 0.205 ops/s
0.205 ±(99.9%) 0.013 ops/s [Average]
(min, avg, max) = (0.204, 0.205, 0.205), stdev = 0.001
CI (99.9%): [0.192, 0.218] (assumes normal distribution)
By the way, FFT-based multiplication is implemented in Haskell:
zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print(10**1000000)" | python) >/dev/null
real 0m14,191s
user 0m14,178s
sys 0m0,015s
zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print(10.toBigInteger().pow(1000000))" | kotlinc) >/dev/null
Apr 16, 2021 11:53:54 PM org.jline.utils.Log logr
WARNING: Unable to create a system terminal, creating a dumb terminal (enable debug logging for more information)
real 0m8,600s
user 0m13,922s
sys 0m4,507s
zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print((10::Integer)^1000000)" | ghci) >/dev/null
real 0m1,804s
user 0m1,244s
sys 0m0,544s
zhelenskiy@zhelenskiy-Swift-SF315-52G:~$ time (echo "print((10::Integer)^10000000)" | ghci) >/dev/null
real 0m15,859s
user 0m11,540s
sys 0m4,328s
(The second Haskell attempt has one more 0)
I test with 100000 in the KMath
because even it is very slow for both KMath and java.util.BigInteger
. kotlinc
initialization takes some time, but not 14 seconds. GHCI is an interpreter by the way so compiling would even fasten the Haskell code.
Karatsuba is added by me, some bugs are fixed. However, Fourier based method is still not implemented.
Fourier based method is still not implemented
Wow, really interesting article, thanks :)
But based on the article, the FFT-based algorithm currently has no practical use, it is "only" theoretically significant:
... The new algorithm is not really practical in its current form ... we are hopeful that with further refinements, the algorithm might become practical for numbers with merely billions or trillions of digits ...