Safety implications of unsorted indices
mulimoen opened this issue · 5 comments
This is a discussion issue for how to handle unsorted indices in arrays and matrices. The situation as it is today is relaxed, we try to avoid it, having unsorted indices won't cause safety issues, but might produce unexpected results/panics in some algorithms.
The primary motivation for bringing this up is to allow/disallow future optimisations to take advantage of the sorted property, and how the library should behave when these invariants are broken.
Should we formalise this behaviour and document it or should we disallow creation of unsorted structures?
As you pointed out, there are two separate concerns here:
- actual memory safety. We want to be able to rely on
CsMat
invariants to accelerate the more critical algorithms with eg unchecked array accesses. For the moment, I have not seen an algorithm that relies on the sorted indices property to be memory safe, the important property is having the indices in bounds. - correctness. Some algorithms rely on the sorted property to work, for instance finding an existing non-zero using dichotomic search.
- performance. I've always assumed the sorted property could help some algorithms to perform better, but I'm not sure I've seen
that many algorithms.
Here are the possible ways forward in my opinion, including the options you mentioned:
- Allow unsorted indices, but document algorithms that won't work in that case. Pros: less checks when creating a sparse matrix. Cons: unexpected failures, hard to debug.
- Disallow the creation of unsorted structures. Pros: this is the closest to the current situation, except bugs like #231. Cons: makes the library less flexible, adds some runtime checks that can be costly. Though as you showed in #228 it is way less expensive to check than to sort (O(n) vs O(nln(n)), and we already need O(n) checks for safety).
- Track the sorted state, either dynamically, or statically (#16). Pros: maximum flexibility. Cons: less ergonomic library, particularly if we use the type system to track this state. Maybe adding an
has_sorted_indices: bool
field to the matrix would be enough. Then algorithms that need sorted indices would be able to fail, or sort, when they get unsorted indices.
Do you see other alternatives?
The usage of unsafe
in this crate is minimal, which is great for memory safety. We should not have problems with this, as long as all indices are in bounds when passing to external bindings.
The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache.
I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing an unsafe
constructor. This would also be the least amount of breakage and API change as you mention.
In #35 you mention the need for unsorted indices. Has this changed?
[*]: If we introduce a new slicetype with an offset field
The sorted indices are very nice for so many algorithms, dot-product, kronecker product, arrays from a matrix slice*, more efficient use of cache.
I think option 2 would be preferable, as most algorithms that works on sorted indices will probably also produce sorted indices. We can avoid the runtime checks by allowing anunsafe
constructor. This would also be the least amount of breakage and API change as you mention.
Yes I think I agree, the sorted indices is a very convenient property in many algorithms. The unsafe
constructor is the way to go.
In #35 you mention the need for unsorted indices. Has this changed?
There are some decompositions that can only produce unsorted indices as far as I know. But the case is quite rare, and they don't need that many operations, so when these decompositions get implemented we can define a limited sparse matrix type with unsorted indices to help in these decompositions.
It seems to me like some algorithms requiring sorted indices and some not could be handled with a private is_sorted
Field, which is assumed to be correctly set. When creating a matrix from raw, there could be three options:
from_raw
Which checks if the array is sorted and setsis_sorted
accordingly.from_unsorted_raw
which performs no check and setsis_sorted
to falseunsafe from_sorted_raw_unchecked
which performs no check and setsis_sorted
to true
Additionally, a method to sort if needed would be useful. Possibly sorted_view
And sort
Methods?
Just for reference the C++ lib eigen has similar questions: https://gitlab.com/libeigen/eigen/-/issues/694#note_254554694