Implement CSR hierarchy
JohanMabille opened this issue · 7 comments
The CSR hierarchy should implement sparse tensors (N-D) with CSR format. This should follow the same pattern as xmap_container classes:
xcsr_container
is a CRTP base class that gathers common implementation and API of inheriting classesxcsr_array
inherits fromxcsr_container
and implements the case where the number of dimensions is dynamicxcsr_tensor
inherits fromxcsr_container
and implements the case where the number of dimensions is static
I'm ready to take this on. However, I need some explanation of the current code structure
@tdegeus as @JohanMabille said the xcsr_container
implements all the common methods which means the size
, dimension
, ... and the reshape
, resize
and access operator.
xcsr_array
and xcsr_tensor
define the storage and how to get access. You have to define xtensor_inner_types
to describe the types used for both of them.
So I think it's a copy-paste of xmap_container
with the access operator end the `update_entries modified.
@JohanMabille adds the xsparse_reference
to be able to return 0 if the index doesn't exist in the matrix. We can modify this value and if we assign this value to the same index, then we create an entry. In this way, you don't have to create a zero-entry each time you try to access an index.
@tdegeus I don't know if you had time to make progress in the implementation of CSR format. I wrote a data structure to be able to manage the COO and CSR (and CSF for ND) at the same time. You can see the code here
https://gist.github.com/gouarin/f5f3952ff494b326e77e1af01e3219c3
Finally, it will be easier to split the implementation for COO (where pos vector doesn't exist) and for CSR/CSF (when pos vector exists) and to implement find_element
, inseert_element
and remove_element
for each of them.
It's just an idea. Feel free to comment or add your own implementation.
@gouarin Nice. Can you explain your implementation a bit? In particular, what are pos
and coord
?
A different point of discussion is that I'm a bit worried about just created an n-d array. I think there should maybe be a dedicated 2-d matrix as it would be easier to interface with solvers if the storage is classically in three contiguous arrays.
Also because many of the storage strategies are there to do efficient linear algebra, much more than saving on storage.
In more broader sense I am having difficulties understanding the current implementation, it would help if you could explain a bit, before I start 'reinventing the wheel'. E.g. what should be the relation between a specific implementation (e.g. CSR) and xmap_container
?
The pos
and coord
arrays have exactly the same rule given in this article mentioned by @JohanMabille in this issue #21. And the examples given in the main function are those that you find in the article for the 2d and 3d cases.
If I resume, the COO format only needs the size of the coordinate arrays for each dimension and loop over this coord
array to find if the entry exists. For the CSR or CSF format you need an extra array called pos
which gives you for an element i
what are the segment in coord
corresponding to this element. It's generally called row_ptr
for CSR and the array col
is the coord
array. But this data structure is easily recursive and can manage N-D tensors.
These two implementations should work with third-party software since you have the right information. For example, for the CSR: pos[1] = row_ptr
and coord[1]=col
.
The only relation between CSR and xmap_conatiner
is the way to manage the shape and the strides to update the index entries when you make a reshape. But they don't have the same storage to store the index in each dimension and the non zero values.
Hope that I help you to better understand my implementation.
I think there should maybe be a dedicated 2-d matrix as it would be easier to interface with solvers if the storage is classically in three contiguous arrays.
If the solvers work with the storage itself rather than with APIs of the storage, then 2-D matrix should indeed be a specialization of the N-D case so it's easy to interface it with existing solvers.