performance: A Python repository from EvgenyNerush

Overview

Performance tests (for Linux) of programming languages (and not only languages) for scientific calculations, namely Python, Julia, R, C#, Rust and Haskell, compared with C and C++.

Performance of math, file output and file input is measured as follows. First, random numbers uniform in (0, 1) are generated. Pairs of these random numbers are used to generate floats whose probability density on (0, 1) is proportional to arcsin(sqrt(x)), and these floats are dumped to the file generated_rand_numbers. Then the numbers are read back from the file and the center of mass (mean) of the distribution is computed. The answer should be 5/8 = 0.625.
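The README describes the pipeline but not how a pair of uniform numbers becomes one sample. A minimal Python sketch of the same steps, assuming rejection sampling from a pair (x, y): x is uniform in (0, 1), y is uniform in (0, π/2), and x is kept when y < arcsin(√x). The function names are hypothetical, not the repository's. The expected answer follows from the substitution x = sin²θ: ∫₀¹ arcsin(√x) dx = π/4 and ∫₀¹ x·arcsin(√x) dx = 5π/32, whose ratio is 5/8.

```python
import math
import random


def generate(n, rng):
    """Return n floats whose density on (0, 1) is proportional to asin(sqrt(x)).

    Assumption: rejection sampling from pairs (x, y), x ~ U(0, 1) and
    y ~ U(0, pi/2); x is kept when y < asin(sqrt(x)). Since the area under
    asin(sqrt(x)) is pi/4, about half of the pairs are accepted.
    """
    out = []
    while len(out) < n:
        x = rng.random()
        y = rng.random() * (math.pi / 2)
        if y < math.asin(math.sqrt(x)):
            out.append(x)
    return out


def write_numbers(path, xs):
    # Text output, one number per line; repr() round-trips floats exactly.
    with open(path, "w") as f:
        f.writelines(f"{x!r}\n" for x in xs)


def read_numbers(path):
    with open(path) as f:
        return [float(line) for line in f]


def center_of_mass(xs):
    return sum(xs) / len(xs)


def run(path, n, seed=0):
    # Generate -> write -> read back -> average, as in the benchmark.
    rng = random.Random(seed)
    write_numbers(path, generate(n, rng))
    return center_of_mass(read_numbers(path))


if __name__ == "__main__":
    print(run("generated_rand_numbers", 1_000_000))  # should be close to 0.625
```

Timing each of the three steps separately (generation, write, read) gives the three kinds of measurements the tests report.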

Running all the tests takes about a minute. The Rust test may take extra time to install some crates, and the Haskell test may require manual installation of some packages (see Haskell/io.cabal).

How to use

To run all the tests, execute
$ ./runMe.sh
Comment out the lines containing ./run.sh in runMe.sh for the languages you do not want to test.

Test results are stored in report files. To see them, run
$ ./showAll.sh
Report files are overwritten every time you run the tests.

Test results can then be added manually to results.py and plotted with plotResults.py.

First results

First, 'naive' tests were written. Their results, obtained on an Intel(R) Xeon(R) X5550 CPU @ 2.67GHz, are shown in the figure below.

Here the marker shape corresponds to the test type ('o' for random-number generation, '<' for output and '>' for input), and the marker color corresponds to the language: blue for C, light blue for C++, pink for Julia, violet for Haskell, yellow for Python, green for Pypy, gray for R, light gray-purple for C#, orange for Rust. Note that the subplots differ only in the scale of the y axis.

The figure shows that random-number generation in R and Python is extremely slow, Rust is about two times slower than C, and Haskell is 1.5 times slower than Rust (about three times slower than C). Haskell input is also extremely slow.

With Pypy, random-number generation becomes 30 times faster than with the standard Python interpreter, and binary output makes I/O orders of magnitude faster. However, Pypy cannot run all Python code, and files written in binary mode are not human-readable.
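The README does not show its Python I/O code. A minimal sketch of the text-versus-binary trade-off, assuming the standard-library array module for binary output (the repository's actual code may differ; file names are hypothetical). It runs under both CPython and Pypy.

```python
import array
import random
import time


def write_text(path, xs):
    # Human-readable: one decimal number per line.
    with open(path, "w") as f:
        f.writelines(f"{x!r}\n" for x in xs)


def read_text(path):
    with open(path) as f:
        return array.array("d", (float(line) for line in f))


def write_binary(path, xs):
    # Raw 8-byte doubles in the machine's byte order: compact and fast,
    # but not human-readable and not portable across endianness.
    with open(path, "wb") as f:
        array.array("d", xs).tofile(f)


def read_binary(path):
    data = array.array("d")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data


if __name__ == "__main__":
    xs = [random.random() for _ in range(1_000_000)]
    for name, write, read in (("text", write_text, read_text),
                              ("binary", write_binary, read_binary)):
        path = f"numbers.{name}"  # hypothetical file names
        t0 = time.perf_counter()
        write(path, xs)
        ys = read(path)
        elapsed = time.perf_counter() - t0
        print(f"{name}: {elapsed:.3f} s, round trip exact: {list(ys) == xs}")
```

Binary output skips float-to-decimal formatting and parsing entirely, which is where most of the text I/O time goes.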

Compiling the C code with clang (which, like rustc, is based on LLVM) gives the same performance as gcc.

Slightly better code

To improve the Rust code, the Rust RNG was replaced by a simple linear congruential generator, similar to the one used by the C and C++ codes. This made the Rust code as fast as the C++ code.
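The README does not list the generator's constants. A Python sketch of a linear congruential generator, x_{k+1} = (a·x_k + c) mod m, assuming the widely quoted Numerical Recipes constants a = 1664525, c = 1013904223, m = 2^32; the repository's C, C++ and Rust codes may use different ones, and the class name Lcg is hypothetical.

```python
class Lcg:
    """Linear congruential generator x_{k+1} = (a * x_k + c) mod m.

    Assumed constants: a = 1664525, c = 1013904223, m = 2**32.
    """

    A = 1664525
    C = 1013904223
    M = 2**32

    def __init__(self, seed=1):
        self.state = seed % self.M

    def next_u32(self):
        # One multiply, one add and one modulo per number: very cheap.
        self.state = (self.A * self.state + self.C) % self.M
        return self.state

    def next_float(self):
        # Map to [0, 1); dividing by m means the result never reaches 1.
        return self.next_u32() / self.M


if __name__ == "__main__":
    g = Lcg(seed=42)
    print([round(g.next_float(), 4) for _ in range(5)])
```

Such a generator has weaker statistical properties than a library RNG, but its per-number cost is a handful of integer operations, which is why it levels the playing field with C and C++.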

Since for loops in R and Python are too slow, alternative, slightly tricky methods of random-number generation were provided. These methods are based on 'vector' operations and leave much less work to the interpreter than the previous methods.
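The README does not describe the vectorized trick itself. A sketch of the general idea for Python, assuming NumPy and assuming the samples are produced by rejection from pairs (x, y) with x uniform in (0, 1) and y uniform in (0, π/2), keeping x when y < arcsin(√x). The whole batch is filtered by one boolean mask instead of a Python-level loop; the function name is hypothetical.

```python
import numpy as np


def generate_vectorized(n, rng):
    """Return n floats with density proportional to asin(sqrt(x)) on (0, 1).

    Assumption: rejection sampling from pairs (x, y), x ~ U(0, 1),
    y ~ U(0, pi/2), done on whole arrays instead of in a Python loop.
    About half of the pairs are accepted, so each round draws roughly
    twice as many pairs as are still missing.
    """
    chunks = []
    have = 0
    while have < n:
        m = 2 * (n - have) + 16
        x = rng.random(m)
        y = rng.random(m) * (np.pi / 2)
        accepted = x[y < np.arcsin(np.sqrt(x))]  # boolean-mask selection
        chunks.append(accepted)
        have += accepted.size
    return np.concatenate(chunks)[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = generate_vectorized(1_000_000, rng)
    print(xs.mean())  # should be close to 0.625
```

The loop runs only a few times in total, so the interpreter does almost no per-number work; the arithmetic happens inside NumPy's compiled routines.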

The Haskell code was also improved by using Data.Vector.Unboxed and binary I/O.

The results are shown in the figure below.

Update: Kotlin

Kotlin generates the random numbers in 1.2 s (using nextDouble from kotlin.random.Random on openjdk-11.0.12), whereas C++ does it in 0.25 s, about five times faster. This is the first variant.

To check whether Kotlin's Random is the bottleneck, a second variant with a hand-written Park-Miller RNG was written; it takes 1.3 s, so Random is not slow. A third variant, which also uses the Park-Miller RNG but does not store the random numbers and computes the sum on the fly, takes almost the same time, 1.2 s. Thus no tricks are needed if you are fine with JVM speed (which, in any case, is much faster than plain Python or R code). All these variants use ArrayList<Double>, which stores boxed values.
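The Kotlin code is not shown in the README. A Python sketch of the two ideas in the second and third variants: the Park-Miller "minimal standard" generator, x_{k+1} = 16807·x_k mod (2^31 − 1), and averaging without storing the numbers. A standard check for this generator is that, starting from seed 1, the 10 000th value is 1043618065. The sampling step assumes rejection from pairs (x, y), x uniform in (0, 1), y uniform in (0, π/2), keeping x when y < arcsin(√x); class and function names are hypothetical.

```python
import math


class ParkMiller:
    """Park-Miller "minimal standard" generator: x_{k+1} = 16807 * x_k mod (2**31 - 1)."""

    A = 16807
    M = 2**31 - 1

    def __init__(self, seed=1):
        if not 0 < seed < self.M:
            raise ValueError("seed must be in 1 .. 2**31 - 2")
        self.state = seed

    def next_int(self):
        self.state = (self.A * self.state) % self.M
        return self.state

    def next_float(self):
        # The state stays in 1 .. M - 1, so the result is strictly inside (0, 1).
        return self.next_int() / self.M


def mean_on_the_fly(n, rng):
    """Average n samples with density proportional to asin(sqrt(x)) without storing them.

    Assumption: each sample comes from a rejection test on a pair (x, y),
    x ~ U(0, 1), y ~ U(0, pi/2); only a running sum and a count are kept.
    """
    total = 0.0
    accepted = 0
    while accepted < n:
        x = rng.next_float()
        y = rng.next_float() * (math.pi / 2)
        if y < math.asin(math.sqrt(x)):
            total += x
            accepted += 1
    return total / n


if __name__ == "__main__":
    print(mean_on_the_fly(1_000_000, ParkMiller(42)))  # should be close to 0.625
```

Keeping only a running sum avoids allocating a container of n values, which on the JVM means avoiding n boxed Double objects.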

With Kotlin Native (1.4), the first, second and third variants take 3.5, 1.15 and 0.43 s, respectively. This is somewhat confusing: in the first variant, the Kotlin Native compiler looks less efficient than the JVM's JIT. However, with an array that stores unboxed values (DoubleArray), the results are much better: 2.6, 0.45 and 0.43 s. Thus, by using unboxed values and compiling to native (machine) code, one can get code that is only about two times slower than the C++ code, comparable to C#, Pypy or Haskell.
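To make the comparison explicit, the slowdown factors relative to C++ can be computed from the timings quoted in this section alone (no new measurements; the 0.25 s C++ time is the reference). The dictionary keys are just labels.

```python
# Timings quoted in this section, in seconds (first, second, third variant).
CPP_TIME = 0.25  # C++ random-number generation

kotlin_times = {
    "JVM, ArrayList<Double>": (1.2, 1.3, 1.2),
    "Native, ArrayList<Double>": (3.5, 1.15, 0.43),
    "Native, DoubleArray": (2.6, 0.45, 0.43),
}


def slowdowns(times, reference=CPP_TIME):
    """How many times slower than C++ each variant is, rounded to 0.1."""
    return tuple(round(t / reference, 1) for t in times)


if __name__ == "__main__":
    for setup, times in kotlin_times.items():
        print(f"{setup:27s} {slowdowns(times)}")
```

For the native DoubleArray build, variants two and three come out at 0.45 / 0.25 = 1.8 and 0.43 / 0.25 ≈ 1.7, which is the "about two times slower" of the text, while all JVM variants are close to 5.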

Conclusion