Use `C` code to implement online calculation of weighted linear regression, and wrap the code as a `Python` function.
Once the weighted linear regression has been computed for `n1` samples, we want to update the result when another `n2` samples arrive later, instead of recomputing on all `n1 + n2` samples from scratch, which is costly in both time and space.
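
The idea can be illustrated with a minimal pure-Python sketch. It is not the `C` implementation from this repository (which relies on `lapack`). It assumes the simplest possible scheme: keep the running sums `X^T W X` and `X^T W y`, add each new batch to them, and re-solve the small normal equations. The names `OnlineWLS`, `update`, and `solve` are hypothetical and chosen only for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrices are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


class OnlineWLS:
    """Hypothetical helper: weighted least squares with incremental updates."""

    def __init__(self, p):
        # Only a p x p matrix and a length-p vector are stored,
        # independent of how many samples have been seen.
        self.xtwx = [[0.0] * p for _ in range(p)]
        self.xtwy = [0.0] * p

    def update(self, X, y, w):
        """Add a batch of rows X, targets y, weights w; return the coefficients."""
        for row, yi, wi in zip(X, y, w):
            for j, xj in enumerate(row):
                self.xtwy[j] += wi * xj * yi
                for k, xk in enumerate(row):
                    self.xtwx[j][k] += wi * xj * xk
        return solve(self.xtwx, self.xtwy)


# First batch (n1 = 3 samples), then a later batch (n2 = 2 samples).
X1, y1, w1 = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [1.0, 3.1, 4.9], [1.0, 2.0, 1.0]
X2, y2, w2 = [[1.0, 3.0], [1.0, 4.0]], [7.2, 8.8], [0.5, 1.5]

model = OnlineWLS(2)
model.update(X1, y1, w1)
beta_online = model.update(X2, y2, w2)

# Reference: fit on all n1 + n2 samples at once.
beta_batch = OnlineWLS(2).update(X1 + X2, y1 + y2, w1 + w2)
```

In this sketch the second `update` call only touches the `n2` new rows, so its cost does not depend on `n1`. Solving the normal equations directly is the simplest choice for illustration; it can be numerically less stable than a factorization-based approach when the design matrix is ill-conditioned.
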

- `lapack` is used. `liblapack` can be found in the package managers of Linux, Homebrew, and Windows Subsystem for Linux.
- Use `gcc calculator.c -llapack` to compile the `C` code.
- The main function of `calculator.c` contains a very large loop that tests whether all memory used in one iteration is successfully freed. Run `a.out` and watch the system memory usage.
- Use `gcc -fPIC -llapack -shared -o a.so calculator.c` to compile the `.so` for `Python`.
- `Python` separates the functions from `.so` files. Therefore, calling another `C` function from outside may lead to failures, so all functions used by `wls_iter` have to be copied inside it.
- Run `python benchmark.py` for a series of tests:
  - Check whether the `C` code agrees with the results from `numpy`.
  - Check the speed of the `statsmodels` package, my `C` code, and `numpy`, especially for online updates.
  - Check the speed and memory usage of the `statsmodels` package and my `C` code with a very large dataset.
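
As a rough illustration of how the compiled `a.so` might be opened from `Python`, the sketch below uses the standard `ctypes` module. This is an assumption: the binding mechanism actually used by `benchmark.py` is not described here, and the signature of `wls_iter` is not documented, so the example only loads the library and checks that the symbol is exported. The helper name `load_wls_library` is hypothetical.

```python
import ctypes
import os


def load_wls_library(path="./a.so"):
    """Return a ctypes handle to the shared library, or None if it has not been built."""
    if not os.path.isfile(path):
        return None
    # Pass an absolute path so the loader does not search system directories.
    return ctypes.CDLL(os.path.abspath(path))


lib = load_wls_library()
if lib is None:
    print("a.so not found; compile calculator.c as a shared library first")
else:
    # Before calling wls_iter, its argtypes and restype must be set to match
    # the declaration in calculator.c (not shown here, so left unset).
    print("wls_iter exported:", hasattr(lib, "wls_iter"))
```
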

My `C` code is much more efficient than the other methods. More details can be found in `report.ipynb`.