RustyNum is a high-performance numerical computation library written in Rust, created to demonstrate the potential of Rust's SIMD (Single Instruction, Multiple Data) capabilities using the nightly portable_simd
feature, and serving as a fast alternative to Numpy.
- High Performance: Utilizes Rust's
portable_simd
for accelerated numerical operations across various hardware platforms, achieving up to 2.86x faster computations for certain operations compared to Numpy. - Python Bindings: Seamless integration with Python, providing a familiar Numpy-like interface.
- Lightweight: Minimal dependencies (no external crates are used), ensuring a small footprint and easy deployment. RustyNum Python wheels are only 300kBytes (50x smaller than Numpy wheels).
Supported Python versions: 3.8
, 3.9
, 3.10
, 3.11
, 3.12
Supported operating systems: Windows x86
, Linux x86
, MacOS x86 & ARM
You can install RustyNum directly from PyPI:
pip install rustynum
If that does not work for you please create an issue with the operating system and Python version you're using!
If you're familiar with Numpy, you'll quickly get used to RustyNum!
import numpy as np
import rustynum as rnp
# Using Numpy
a = np.array([1.0, 2.0, 3.0, 4.0], dtype="float32")
a = a + 2
print(a.mean()) # 4.5
# Using RustyNum
b = rnp.NumArray([1.0, 2.0, 3.0, 4.0], dtype="float32")
b = b + 2
print(b.mean().item()) # 4.5
You can perform advanced operations such as matrix-vector and matrix-matrix multiplications:
# Matrix-vector dot product using Numpy
import numpy as np
import rustynum as rnp
a = np.random.rand(4 * 4).astype(np.float32)
b = np.random.rand(4).astype(np.float32)
result_numpy = np.dot(a.reshape((4, 4)), b)
# Matrix-vector dot product using RustyNum
a_rnp = rnp.NumArray(a.tolist())
b_rnp = rnp.NumArray(b.tolist())
result_rust = a_rnp.reshape([4, 4]).dot(b_rnp).tolist()
print(result_numpy) # Example Output: [0.8383043, 1.678406, 1.4153088, 0.7959367]
print(result_rust) # Example Output: [0.8383043, 1.678406, 1.4153088, 0.7959367]
RustyNum offers a variety of numerical operations and data types, with more features planned for the future.
- float64
- float32
- uint8 (experimental)
- int32 (Planned)
- int64 (Planned)
Operation | NumPy Equivalent | RustyNum Equivalent |
---|---|---|
Zeros Array | np.zeros((2, 3)) |
rnp.zeros((2, 3)) |
Ones Array | np.ones((2, 3)) |
rnp.ones((2, 3)) |
Arange | np.arange(start, stop, step) |
rnp.arange(start, stop, step) |
Linspace | np.linspace(start, stop, num) |
rnp.linspace(start, stop, num) |
Mean | np.mean(a) |
rnp.mean(a) |
Min | np.min(a) |
rnp.min(a) |
Max | np.max(a) |
rnp.max(a) |
Exp | np.exp(a) |
rnp.exp(a) |
Log | np.log(a) |
rnp.log(a) |
Sigmoid | 1 / (1 + np.exp(-a)) |
rnp.sigmoid(a) |
Dot Product | np.dot(a, b) |
rnp.dot(a, b) |
Reshape | a.reshape((2, 3)) |
a.reshape([2, 3]) |
Concatenate | np.concatenate([a,b], axis=0) |
rnp.concatenate([a,b], axis=0) |
Element-wise Add | a + b |
a + b |
Element-wise Sub | a - b |
a - b |
Element-wise Mul | a * b |
a * b |
Element-wise Div | a / b |
a / b |
Fancy indexing | np.ones((2,3))[0, :] |
rnp.ones((2,3))[0, :] |
Fancy flipping | np.array([1,2,3])[::-1] |
rnp.array([1,2,3])[::-1] |
Initialization
from rustynum import NumArray
# From a list
a = NumArray([1.0, 2.0, 3.0], dtype="float32")
# From another NumArray
b = NumArray(a)
# From nested lists (2D array)
c = NumArray([[1.0, 2.0], [3.0, 4.0]], dtype="float64")
Methods
reshape(shape: List[int]) -> NumArray
Reshapes the array to the specified shape.
reshaped = a.reshape([3, 1])
matmul(other: NumArray) -> NumArray
Performs matrix multiplication with another NumArray.
result = a.matmul(b)
# or
result = a @ b
dot(other: NumArray) -> NumArray
Computes the dot product with another NumArray.
dot_product = a.dot(b)
mean(axes: Union[None, int, Sequence[int]] = None) -> Union[NumArray, float]
Computes the mean along specified axes.
average = a.mean()
average_axis0 = a.mean(axes=0)
min() -> float
Returns the minimum value in the array.
minimum = a.min()
max() -> float
Returns the maximum value in the array.
maximum = a.max()
tolist() -> Union[List[float], List[List[float]]]
Converts the NumArray to a Python list.
list_representation = a.tolist()
- Matrix-vector dot product
- Matrix-matrix dot product
Planned Features:
- N-dimensional arrays
- Useful for filters, image processing, and machine learning
- Additional operations: median, argmin, argmax, sort, std, var, zeros, cumsum, interp
- Integer support
- Extended shaping and reshaping capabilities
- C++ and WASM bindings
Not Planned:
- Random number generation (use the rand crate)
RustyNum is built on four core principles:
- No 3rd Party Dependencies: Ensuring transparency and control over the codebase.
- Leverage Portable SIMD: Utilizing Rust's nightly SIMD feature for high-performance operations across platforms.
- First-Class Language Bindings: Providing robust support for Python, with plans for WebAssembly and C++.
- Numpy-like Interface: Offering a familiar and intuitive user experience for those coming from Python.
RustyNum leverages Rust's portable_simd
feature to achieve significant performance improvements in numerical computations. On a MacBook Pro M1 Pro, RustyNum outperforms Numpy in several key operations. Below are benchmark results comparing RustyNum 0.1.4
with Numpy 1.24.4
:
Operation | RustyNum (us) | Numpy (us) | Speedup Factor |
---|---|---|---|
Mean (1000 elements) | 8.8993 | 22.6300 | 2.54x |
Min (1000 elements) | 10.1423 | 28.9693 | 2.86x |
Sigmoid (1000 elems) | 10.6899 | 23.2486 | 2.17x |
Dot Product (1000 elems) | 17.0640 | 38.2958 | 2.24x |
Matrix-Vector (1000x1000) | 10,041.6093 | 24,990.2646 | 2.49x |
Matrix-Vector (10000x10000) | 2,731,092.0332 | 2,103,920.4830 | 0.77x |
Matrix-Matrix (500x500) | 7,010.6638 | 14,878.9556 | 2.12x |
Matrix-Matrix (2000x2000) | 225,595.8832 | 257,832.6334 | 1.14x |
Operation | RustyNum (us) | Numpy (us) | Speedup Factor |
---|---|---|---|
Mean (1000 elements) | 9.1026 | 24.0636 | 2.64x |
Min (1000 elements) | 18.2651 | 24.8170 | 1.36x |
Dot Product (1000 elems) | 16.6583 | 38.8000 | 2.33x |
Matrix-Vector (1000x1000) | 9,941.3305 | 23,788.9570 | 2.39x |
Matrix-Vector (10000x10000) | 3,635,297.4664 | 4,962,900.9084 | 1.37x |
Matrix-Matrix (500x500) | 9,683.3815 | 15,866.6376 | 1.64x |
Matrix-Matrix (2000x2000) | 412,333.8586 | 365,047.5000 | 0.89x |
- RustyNum significantly outperforms Numpy in basic operations such as mean, min, and dot product, with speedup factors over 2x.
- For larger operations, especially matrix-vector and matrix-matrix multiplications, Numpy currently performs better, which highlights areas for potential optimization in RustyNum.
These results demonstrate RustyNum's potential for high-performance numerical computations, particularly in operations where SIMD instructions can be fully leveraged.
In addition to the Python bindings, RustyNum’s core library is implemented in Rust. Below is a comparison of RustyNum (rustynum_rs) with two popular Rust numerical libraries: nalgebra 0.33.0
and ndarray 0.16.1
. The benchmarks were conducted using the Criterion crate to measure performance across various basic operations.
Input Size | RustyNum | nalgebra | ndarray |
---|---|---|---|
Addition (10k elements) | 760.53 ns | 695.73 ns | 664.29 ns |
Vector mean (10k elements) | 683.83 ns | 14.602 µs | 1.2370 µs |
Vector Dot Product (10k elements) | 758.65 ns | 1.1843 µs | 1.1942 µs |
Matrix-Vector Multiplication (1k elements) | 77.851 us | 403.39 µs | 115.75 µs |
Matrix-Matrix Multiplication (500 elements) | 2.5526 ms | 2.9038 ms | 2.7847 ms |
Matrix-Matrix Multiplication (1k elements) | 17.836 ms | 21.895 ms | 22.423 ms |
- RustyNum is able to perform on par with nalgebra and ndarray in most operations and sometimes even outperforms them.
- There seems a significant overhead in the Python bindings. We're getting 10ms in Python in RustyNum for the matrix-vector multiplication with 1k elements vs 77us in Rust.
Run using
cargo test
cargo doc --open
Run using
cargo criterion
Don't use maturin. But only setup.py
cd bindings/python/ && python setup.py install
or
cd bindings/python/ && python setup.py bdist_wheel
Then run tests using
pytest tests
or benchmarks using
pytest benchmarks