Feedback Needed - Refactored SpMxSpM
kvaragan opened this issue · 14 comments
Single API is now split into multiple API's in order to allow the application to allocate memory rather than the library.
I have written the sample code on how to use this API.
Is this the better-way ? See the code @
sample file : https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/samples/sample-spmspm.cpp
refactor code: https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/library/blas3/clsparse-spm-spm.cpp
API's : https://github.com/kvaragan/clSPARSE/blob/kvaragan_PR150/src/include/clSPARSE.h
I'm checking it right now.
I think the way to multiply two sparse matrices here is way too complicated. I understand that the issue is related to the concept that user is responsible for buffer allocation. Here are my general comments
- Users should not see the
rowPtrCtSizeInBytes
andbuffer_d_SizeInBytes
I think they should be hidden inspgemmInfo
- I would rename
clsparseSpGEMM
toclsparseSpGEMMInfo
orclsparseSpGEMMInfoHelper
not to confuse that with function name. - I can't say that I like the names of the explicit functions like here
The general explicit call for spGemm should be done at most in two separated functions. The first one calculates the necessary memory requirements for result matrix, second is doing actual spmm.
clsparseSpGEMM spgemmInfo;
clsparseStatus status = clsparse<S/D>SpmmAnalyse(&A, &B, &spgemmInfo, control);
if (status == clsparseSuccess)
{
//check the spgemmInfo additional information, i.e. if there is enough memory to
//complete spmm operation.
//allocate the C matrix according to spgemminfo content;
//do the spmm
status = clsparse<S/D>Spmm(&A, &B, &C, &spgemmInfo, control);
}
clsparseReleaseSpGEMMInfo(&spgemmInfo);
I see that it might be difficult at the moment, but I would like to see that every iteration of this algorithm bring us to this kind of concept.
Here some more details inside of clsparse-spm-spm.cpp:
- This defines as previously should be defined in the places where they are used.
- I would rename those and those variables to Anum_rows, Anum_cols, etc. for the sake of clarity, it will be easier to maintain the code later.
- This kind of interface is unacceptable in my opinion. We will be flooded by issues from the users in cases when they will put something wrong there.
- In several places like here you were using explicit scan. I would create an internal function for that.
- I would use ``clsparse::vector` here and in any other places where you allocating cl_mem buffer instead. Here as well
- Why dont pass ref to std::vector instead of raw pointers here. In addition templetize the
statistics
funcion for various int types. - Another scan. Maybe
int nnzC = computeCsrNNZ(...)
whould be more convinient
I did not go trough the internal spmm functions deeply for now, but I saw that they need to be templetized for various types, the kernels should be feeded with additional definitions group size, tuple_queue etc. In the final optimisation stages several kernels like compute_nnz_Ct looks like spmv traversal so they might be optimised for vectorized versions.
Thanks for the feedback @jpola . I agree with all comments and your suggestions can be incorporated, but before I do that should we go ahead with this design. Basic principle of this design is, to separate memory allocations from computation and user should have control for buffer allocation.
@kvaragan
I approve of your goal here, but I have to agree with Jakub that this interface is not finished yet. If I have to ask users to change their SpM-SpM interface 1 time before v1.0, I would rather ask them to change from using this API:
CLSPARSE_EXPORT clsparseStatus
clsparseScsrSpGemm(
const clsparseCsrMatrix* sparseMatA,
const clsparseCsrMatrix* sparseMatB,
clsparseCsrMatrix* sparseMatC,
const clsparseControl control );
than to change from these API's
clstatus = clsparseSpMSpM_CreateInit(&A, &B, control,
&spgemmInfo, // output
&rowPtrCtSizeInBytes, // output
&buffer_d_SizeInBytes); // output
clstatus = clsparseSpMSpM_ComputennzCtExt(csrRowPtrCt_d,
rowPtrCtSizeInBytes,
sizeof(cl_int),
buffer_d,
buffer_d_SizeInBytes,
&nnzCt,
spgemmInfo,
control);
clstatus = clsparseSpMSpM_ScsrSpmSpmnnzC(C.rowOffsets, // ( m+1) * sizeof(int) - memory allocated
csrRowPtrCt_d, // ( m+1) * sizeof(int) - memory allocated
&csrColIndCt_d, // nnzCt * sizeof(int)
&csrValCt_d, // nnzCt * sizeof(float) - Single Precision
&nnzC, // nnz of Output matrix
spgemmInfo,
control);
clstatus = clsparseSpMSpM_FillScsrOutput(C.rowOffsets,
C.colIndices,
C.values,
csrRowPtrCt_d,
csrColIndCt_d,
csrValCt_d,
nnzC,
spgemmInfo,
control);
I agree that currently the SpM-SpM is allocating a lot of internal memory, but the user experience of the original API is more in line with what we want present to our users. I have not combed through the implementation files, but I trust that Jakub has given sufficient feedback for this iteration.
Here's your challenge: can you design a working implementation with an API like what Jakub mocked above? 1 helper clsparse<S/D>SpmmAnalyse()
, 1 worker clsparse<S/D>Spmm()
@jpola
Will it be possible to just typedef clsparse::vector
to the boost.compute vector? Same with array?
I wish but I think it would not be possible. I'll take a closer look on
that tomorrow.
2015-09-29 23:37 GMT+02:00 Kent Knox notifications@github.com:
@jpola https://github.com/jpola
Will it be possible to just typedef clsparse::vector to the boost.compute
vector? Same with array?—
Reply to this email directly or view it on GitHub
#158 (comment)
.
@jpola using clsparse::vector is same as using clCreateBuffer(),because internally that's what clsparse::vector is doing. If I use clsparse::vector then it will defeat the purpose of this design.
@kknox Because of the nature of the algorithm, 2 APIs are not enough but if we use clsparse::vector may be it can be done, but that means memory allocations inside the library and that will be as good as our first implementation.
So can I conclude we that will be going with the 1 API design which is already present in the develop branch?
@kvaragan of course the clsparse::vector is built over the opencl buffer. It was just my suggestion that I would use it because it have more benefits among many I think those are the most important:
- Code is more readible which means that it is easier to understand and maintain.
- clsparse::vector manages the memory for you. You can avoid potential memory leaks. Even if i saw that the code seems not to have any we can't be sure that in the future we do not make any mistake. The use of clsparse::vector or array will minimise the issues related to the code by enforcing future developers to adopt to the present style.
- In the near future it seems that we are moving to the boost::compute. It will indicate where the old clsparse::vector might be substituted for more advanced / user friendly data structure.
- vector know its size, We will be able to simplify the code greatly by using proper constructor of std::vector, compute::vector and compute::copy functions to eliminate clEnqueueWrite/Read/Copy functions.
@jpola If this is the case, then why don't we expose this header to application, so that application can start using clsparse::vector. I don't know whether this is currently possible or not. We will have advantages of what you mentioned and advantages of user having control over memory allocations.
If we can't expose this header, then we don't need this new design we can stick to the old one and just replace explicit device memory allocations with clsparse::vector.
Sorry Kiran, I might misunderstood you. The concept I mentioned before is
applicable only for internal code of our library. My answer was referring
to the point 5. from my list of feedbacks. Instead of cl_mem I wish to use
clsparse::vector.
When it comes to interface functions from clSPARSE.h header we are passing
cl_mem due to fact that we want that clSPARSE is having C style interface.
Library then can be easily adopted to other programming languages like
python or fortran.
However when you are implementing interface functions you can put a
clsparse::vector interface temporary for input and output parameters with
this constructor:
//create vector from preinitialized buffer.
vector (clsparseControl control, const cl::Buffer& buffer, size_t size) :
BASE::_buff(buffer), BASE::_size(size), queue(control->queue)
You may see the
https://github.com/clMathLibraries/clSPARSE/blob/master/src/library/solvers/conjugate-gradients.hpp
to check how it's done.
Concluding, I wanted to suggest that when you are in the definition of the
interface functions or in any other internal clSPARSE function use
clsparse::vector or some more use-friendly datatype than cl_mem or put the
clsparse::vector interface on incomming function parameters of type cl_mem.
Kuba.
2015-09-30 10:42 GMT+02:00 Kiran notifications@github.com:
@jpola https://github.com/jpola If this is the case, then why don't we
expose this header to application, so that application can start using
clsparse::vector. I don't know whether this is currently possible or not.
We will have advantages of what you mentioned and advantages of user having
control over memory allocations.If we can't expose this header, then we don't need this new design we can
stick to the old one and just replace explicit device memory allocations
with clsparse::vector.—
Reply to this email directly or view it on GitHub
#158 (comment)
.
Understood, I missed the point C-style interface. So we will stick to the 1 API design + your suggestions about memory allocations (clsparse::vector). Thanks for clarifying.
@kvaragan I looks like @jpola answered the majority of your questions, but to close this out I think we need to ship Beta2 with the 1 API version. It's not optimal, but its easy to document and use. I think a good approach to use is to try to simplify the guts of the SpM-SpM routine first (starting with the recommendations above), and once the current codepaths are simpler (possibly leveraging boost.compute) it may be easier to refactor the memory allocations in a cleaner way.
In the ideal world (I like Jakub's proposed names), the *Analyze function would 'simulate' what would happen end to end in the SpM-SpM call, and can record/store all required temp buffers needed for the algorithm. The user would then be free to allocate those buffers as they see fit and put the buffer handles into the spmminfo structure to be passed into the *spmm compute routine.
Everything included in or included by clsparse.h should be extern C
. We can use C++ inside the library and user can use c++ in the application. Users are able to use boost.compute on their own if they wish, but they need to pass handles into the library. As a future enhancement request, a c++ wrapper could be provided for the library to provide convenience objects and functions.