Python Optimization Demonstration Project, showcasing various optimization techniques such as Cython, Numba, Numpy, etc., and providing benchmark references in C# & Java.
The goal of this project is to calculate all prime numbers between 2 and 2^20. PS. All code were generated by ChatGPT.
Source File | Description |
---|---|
test.cs | Benchmark Reference, C# version |
Test.java | Benchmark Reference, Java version |
test1.py | Pure Python |
test2.py | Cython test entry |
test2.pyx | Pure Python clone, without any modification |
test2_1.pyx | Add parameter, return value, and local variable type information. |
test2_2.pyx | Using C version math.sqrt |
test2_3.pyx | Convert the list to a NumPy array and use memory views to directly access array elements via pointers. |
test3.py | Use Numba to optimize only the "is_prime" function. |
test3_1.py | Optimize the "find_primes" function using Numba and replace Python lists with Numba's built-in typed list. |
setup.py | Cython build script. With "annotate=True" to enable annotation mode, the compilation also generates an HTML file that helps developers quickly identify performance bottlenecks. |
Version | Execution Time | Benchmark ratio |
---|---|---|
C# | ~0.9s | 100% |
Java | ~0.93s | 103% |
Pure Python | ~24s | 2667% |
Python to Cython clone | ~15s | 1667% |
Cython type added | ~1.72s | 191% |
Cython C version sqrt |
~0.96s | 107% |
Cython Numpy & MemoryView | ~0.87s | 97% |
Numba jit on is_prime only |
~2.7s | 300% |
Numba use type-list | ~1.04s | 115% |
- The JIT (Just-In-Time) virtual machines of C# and Java are quite powerful and are on par with Python optimized to the extreme.
- Without changing a single line of code, simply transitioning from interpreting Python source code to compiling it can improve performance by at least 20% and enable code encryption. There is no reason not to include this treatment in the release version.
- Cython enables progressive and localized optimization. With a small amount of new knowledge built upon Python, significant performance improvements on the order of magnitude can be achieved. Unlike many other languages' native optimization techniques, it doesn't require learning a new language or syntax.
- After analyzing performance bottlenecks using cProfile, you can directly optimize the most impactful parts by simply adding parameter, return value, and local variable type annotations, which can yield excellent results.
- One of the main bottlenecks lies in the conversion between Python objects and C types. Minimizing such conversions by using C functions can maximize performance gains.
- Numpy, with its memory management taken care of, allows for optimal balance between performance and complexity when combined with Numpy arrays and memory views.
- In this example, the size of the resulting list data is relatively small and requires no processing, which doesn't showcase the full power and performance of Numpy. Numpy is the killer framework for Python.
- Numba has certain limitations and incurs a startup cost upon first invocation. However, it doesn't require compilation and can retain the interpretive execution approach, making it highly flexible and suitable for specific scenarios.