Very slow build of the bspline object
hlucian opened this issue · 20 comments
Hi everyone
I integrated the splinter library into an inhouse cfd software for testing with lookup-table methods.
As I have tables for various thermophysical quantities (e.g. rho = f(p,h), ....) I create an BSpline object
for each of these tables.
splinterPtr_ = make_unique<SPLINTER::BSpline>(SPLINTER::BSpline::Builder(samples).degree(order_).build());
Originally, I was using an internal implementation for table interpolation using tables with a resolution if 1000x1000,
but I had to realize this seams to be too fine for BSpline. However, even when using a relatively coarse table of 200x200,
the command shown above takes several minutes to complete. Is this a normal behavior? Any options to accelerate this
process?
Kindly
Lucian
Hi Lucian,
Glad that you may have a use case for SPLINTER.
I did a quick test and on my computer a cubic (degree 3) B-spline interpolates a 200x200 table in approximately 10 seconds. I ran this single-threaded on an Intel Core i7-8700K CPU. Note that I used the Python interface and the compiled source in develop-branch. We haven't released this version, but if I remember correctly it should have equal performance to the latest release.
Regarding interpolation of larger tables:
- You could try to switch to another Eigen solver for sparse linear systems (see https://eigen.tuxfamily.org/dox/group__TopicSparseSystems.html). Some of these support multi-threading (see https://eigen.tuxfamily.org/dox/TopicMultiThreading.html).
- If the tables do not change often, you could interpolate (with a high resolution like 1000x1000) and then store the B-splines for later use.
Please keep us posted!
Bjarne
EDIT: The above example used a version of SPLINTER compiled to debug mode. In release mode, building the B-spline takes less than 0.5 seconds.
Hi Bjarne
Thank you for the quick feedback. For me, the 200x200 tables take about 4 minutes. When using a resolution of 1000x1000, i kept it running over the night, but it did not finish. The degree order is set to 2. This is true for all thermophysical tables.
The full call to the splinter library looks as follows:
SPLINTER::DataTable samples;
SPLINTER::DenseVector x(2);
double y;
for(int i = 0; i < nX_; i++)
{
for(int j = 0; j < nY_; j++)
{
// Sample function at x
x(0) = X_coordinate_[i];
x(1) = Y_coordinate_[j];
y = Value_[i][j];
// Store sample
samples.addSample(x,y);
}
}
auto start = chrono::steady_clock::now();
if(RANK==0) cout << "Building splinter interpolation object" << endl;
splinterPtr_ = make_unique<SPLINTER::BSpline>(SPLINTER::BSpline::Builder(samples).degree(order_).build());
auto end = chrono::steady_clock::now();
auto diff = chrono::duration_cast<chrono::seconds>(end - start).count();
if(RANK==0) cout << "Finished building splinter interpolation object in " << diff << "seconds" << endl;
The output looks as follows for the table h = f(p,T):
===============================
Creating ThermoTabulated Object
pTh
===============================
=======================
reading tabular data...
Table: pTh
=======================
Entered Table
-------------------------------------
Table type = pTh
xMin_ = 1.00000000000000e+04
xMax_ = 7.00000000000000e+07
yMin_ = 1.40000000000000e+01
yMax_ = 1.00000000000000e+02
valueMin_ = -5.35947000000000e+04
valueMax_ = 1.51751000000000e+06
-------------------------------------
Building splinter interpolation object
Finished building splinter interpolation object in 231seconds
The material I am looking at is liquid hydrogen in the vicinity of the two phase region. Hence, there are some considerable jumps in the properties when passing from the liquid to the gaseous region. In case, I can also provide you the table including a small Matlab routine to visualize the data in a plot. The output looks as follows for e.g. the density = f(p,h)
You mentioned to use a different solver from the Eigen-library. How can I achieve this? Is there a function to select a different solver? Which one would you suggest?
Also the tables usually never change. So yes, it would absolutely make sense to store the bspline. Also here, how can I save and load them from within my c++ code? Although, as said above 1000x1000 is currently not working for me.
Thank you so much
Kindly
Lucian
Hi Bjarne
Sorry for the new message, but I did some further testing and it does not look like it is related to my tables. In the folder that I cloned from your git repository, there is a folder called test. I compiled the sources as shown below using two different levels of optimization.
g++ *.cpp ../src/*.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O3 -o outputO3
g++ *.cpp ../src/*.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O0 -o outputO0
The executable is then started using:
./outputO3 [bspline][general][knotinsertion] |tee 2>&1 logO3
./outputO0 [bspline][general][knotinsertion] |tee 2>&1 logO0
I modified bsplinetestingutilities.cpp so that it creates a table of 200x200 instead 20x20 and added some runtime statements as shown below:
DataTable sampleTestFunction()
{
DataTable samples;
// Sample function
auto x0_vec = linspace(0,2,200);
auto x1_vec = linspace(0,2,200);
std::cout << x1_vec.size() << std::endl;
DenseVector x(2);
double y;
for (auto x0 : x0_vec)
{
for (auto x1 : x1_vec)
{
// Sample function at x
x(0) = x0;
x(1) = x1;
y = sixHumpCamelBack(x);
// Store sample
samples.addSample(x,y);
}
}
return samples;
}
bool testKnotInsertion()
{
DataTable samples = sampleTestFunction();
// Build B-splines that interpolate the samples
std::cout << __LINE__ << std::endl;
auto start = std::chrono::steady_clock::now();
BSpline bspline1 = BSpline::Builder(samples).degree(1).build();
auto end = std::chrono::steady_clock::now();
std::cout << "Runtime 200x200 first order: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " milliseconds" << std::endl;
start = std::chrono::steady_clock::now();
std::cout << __LINE__ << std::endl;
BSpline bspline2 = BSpline::Builder(samples).degree(2).build();
end = std::chrono::steady_clock::now();
std::cout << "Runtime 200x200 second order: " << std::chrono::duration_cast<std::chrono::seconds>(end - start).count() << " seconds" << std::endl;
std::cout << __LINE__ << std::endl;
start = std::chrono::steady_clock::now();
BSpline bspline3 = BSpline::Builder(samples).degree(3).build();
end = std::chrono::steady_clock::now();
std::cout << "Runtime 200x200 third order: " << std::chrono::duration_cast<std::chrono::seconds>(end - start).count() << " seconds" << std::endl;
std::cout << __LINE__ << std::endl;
The output is the following and shows that even when using your testing function, the build commands takes around 3min.
./outputO0 [bspline][general][knotinsertion] |tee 2>&1 logO0
200
51
BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver.
Runtime 200x200 first order: 927 milliseconds
58
BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver.
Runtime 200x200 second order: 183 seconds
62
BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver.
Runtime 200x200 third order: 186 seconds
67
Can you tell me whether you encounter the same behavior on your side? My CPU is a Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
Thank you for posting your test results.
When inspecting your output, I noticed that SPLINTER prints the following line "BSpline::Builder::computeBSplineCoefficients: Computing B-spline control points using sparse solver." I think this means that SPLINTER has been compiled to debug mode. Perhaps you can try to compile to release mode by adding the flag -DNDEBUG.
Another follow-up. I went back to check if had tested SPLINTER in debug mode myself, and indeed I had. I made an edit to my previous comment to mention this. When the sources are compiled to release mode, the 200x200 grid interpolation takes less than 0.5 seconds and the 1000x1000 grid interpolation takes around 100 seconds.
Hi Bjarne
Thank you for your inputs, I recompiled with the -DNDEBUG flag, however, the change is not as you mention in my case.
The results you get, I can only achieve if I set the degree to 1. As soon as I switch to 2 or 3, it takes about 100 seconds already on the 200x200 table. Did you achieve your results on the sixHumbCamelBack function? Otherwise, can you provide me your test setup and how you compile the source?
I tried the following:
- Without DNDEBUG, no optimization (-O0)
g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O0 -o output
1.1 degree = 1, 200x200 -> 0.56 seconds
1.2 degree = 2, 200x200 -> 180 seconds - Without DNDEBUG, optimization (-O3)
g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -O3 -o output
2.1 degree = 1, 200x200 -> 0.1 seconds
2.2 degree = 2, 200x200 -> 100 seconds - With DNDEBUG, optimization (-O3)
g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -DNDEBUG -O3 -o output
3.1 degree = 1, 200x200 -> 0.09 seconds
3.2 degree = 2, 200x200 -> 100 seconds - With DNDEBUG, optimization (-O3), vectorization (-msse2)
g++ .cpp ../src/.cpp general/*.cpp -I../thirdparty/Catch -I. -I../include -I../thirdparty/Eigen/ -DNDEBUG -O3 -o output
4.1 degree = 1, 200x200 -> 0.09 seconds
4.2 degree = 2, 200x200 -> 100 seconds
Again, thank you a lot for your time and assistance
Kindly
Lucian
I am building with CMake using the configuration in CMakeLists.txt. My quick test was a modified version of the Python-example in https://github.com/bgrimstad/splinter/blob/develop/python/examples/bspline_multivariate.py
. This test function is simpler than the six-hump camel back function, but I do not think that explains the high runtime of your test.
I will try to get some assistance from @gablank, who made SPLINTERS compiler setup.
That would be great. We believe that having smooth interpolation of our table data could
help to stabilize our cavitation predictions. However, as it is currently the case, it takes about
an hour to read all the data, and I am not even sure if the 200x200 resolution is not too coarse
anyway.
Another follow up from my side. I switched to the python example you mentioned and increased the resolution to 200x200.
On my side it takes 36 seconds, did I do something wrong when compiling splinter with CMake? Here is what I changed in bspline_multivariate.py
x1 = np.arange(-5, 5, 0.05)
x2 = np.arange(-5, 5, 0.05)
And for the monitoring of time
start_time = time.time()
bspline = splinter.BSplineBuilder(x, y, degree=[1, 3], smoothing=splinter.BSplineBuilder.Smoothing.NONE).build()
print("--- %s seconds ---" % (time.time() - start_time))
Gives me the following output
python3.6 bspline_multivariate.py
Loaded SPLINTER from /usr/local/lib/libsplinter-3-0.so!
200
--- 36.179590702056885 seconds ---
At the top of the script you'll see a line that looks like this:
splinter.load("/home/username/SPLINTER/build/debug/libsplinter-3-0.so")
Have you made sure to compile to release mode? With my build setup, I have to change the path to:
splinter.load("/home/username/SPLINTER/build/release/libsplinter-3-0.so")
Last thing I can think of is to also test building from the develop-branch, like I did. But as I said earlier, I do not believe that to be the issue.
Yes, I changed this line since i installed in splinter.load("/usr/local/lib/libsplinter-3-0.so")
as you can also see from the output above. cmake ..
inside the build folder returns the following output:
-- Setting compiler options ' -m64 -std=c++11 -Werror=return-type'
Configuring SPLINTER version 3-0 in RELEASE mode for x86-64 (64 bit)
Compiler flags: -m64 -std=c++11 -Werror=return-type -O3 -DNDEBUG
This should be correct, if I am not wrong?
It looks correct to me... Do you mind testing the develop-branch? I see that the example has been modified since the version in the master branch (the degree of the spline is a bit different).
I have mentioned your issue to @gablank. He is quite busy at the moment and it may take a few days before he can look at it.
Yes, i checked out the develop branch but when running the same python command i get the following error:
Exception: Missing version file in SPLINTER directory! Please notify the developers of SPLINTER of this error.
You can try to manually add the file python/splinterpy/version
with the content "4-0".
Ok, just copied the version-file from splinter/version to splinter/python/splinterpy
I rerun the 200x200 table from bspline_multivariate.py modified as shown above and now
the output is:
It is possible that SPLINTER was not compiled for the operating system and architecture you are using. If that is the case, you can compile SPLINTER yourself and load it using 'splinterpy.load("/path/to/SPLINTER.so")'
200
--- 1.1126136779785156 seconds --
So, it looks as if this changes quite a bit
Hello,
Cool to hear you have an interesting use for our library!
Could you run file /usr/local/lib/libsplinter-3-0.so
and paste the output here?
That was quite the difference. Happy you finally got the numbers down.
I will have to take a look at the changes we have made in the develop branch to pin down the culprit. I cannot remember introducing such significant improvements to the interpolation performance.
Hopefully, @gablank and I will get around to preparing release 4.0 within next week. Please let us know if you wish to help in any way or if you have any suggestions for final improvements.
I tried to recompile my c++ sources with the development library. However, although following the docs/cpp_interface.mp
I get the error
error: ‘SPLINTER::DenseVector’ {aka ‘class Eigen::Matrix<double, -1, 1>’} has no member named ‘at’; did you mean ‘data’?
524 | returnVal = splinter().eval(x).at(0)
What would be the correct syntax here?
Ok, i believe the correct syntax should be without the at
as given below
returnVal = splinter().eval(x)(0)
For my CFD solver, the time to build the interpolator has decreased for 300x300 tables from 1150seconds to 5seconds, a factor of 230. Thank you again for all the assistance.
Hi Bjarne
The library seams to work. However, I encounter problems when switching to degree's higher than 1.
Can you tell me, that the algorithm does on the boarders of the table? Does it extrapolate or just uses
the boarder value?
Kindly
Lucian