Question regarding the API
liamsi opened this issue · 4 comments
I'm writing a Go wrapper for this implementation; BTW, the C wrapper makes that really convenient, thanks for that!
I struggle to completely understand the API, though. What I'm trying to do is quite basic: given n original data buffers or symbols, I want to generate n "parity symbols" such that I can lose any n (of the now 2*n total) symbols and still recover the data.
I mainly experimented with the API in benchmarks.cpp. OK, sounds like I would need to call leo_encode with original_count == recovery_count for that, right? leo_encode_work_count returns 2*original_count for these params, and calling leo_encode yields encode_work_data of size 2*original_count, as expected. So far so good. What is surprising, though: I would have expected encode_work_data to contain the original data too; instead it seems to contain 2*original_count recovery or parity symbols. So is there no way to achieve what I described above using the public C API, or am I using recovery_count wrong?
Thanks in advance :-)
Looking through the paper that describes the encoding, and at ReedSolomonEncode, it seems like the code reads mostly like a straightforward implementation of the encoding algorithm given there (source: https://www.citi.sinica.edu.tw/papers/whc/3450-F.pdf).
So with that notation, n = 2k, and I want to hand over k message symbols and get k parity symbols back.
What would be the correct parameters for that? As described above, leo_encode returns 2k symbols instead (which do not clearly contain the original symbols, at least not in unencoded form).
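Put differently, here is how I would expect the paper's notation to map onto the API parameters (my reading, so possibly wrong):

```c
#include <stdio.h>
#include "leopard.h"

// Paper: n = 2k total symbols = k message symbols + k parity symbols.
// My expected mapping onto the C API:
//   original_count = k  (message symbols handed in)
//   recovery_count = k  (parity symbols wanted back)
void show_work_count(void)
{
    const unsigned k = 8;
    // I would have expected k work buffers holding exactly the k parity
    // symbols, but:
    printf("%u\n", leo_encode_work_count(k, k)); // prints 16, i.e. 2*k
}
```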
I think we (or mainly @musalbas) figured it out: if the number k is <= 256, leopard behaves exactly as described above. If k > 256, the recovery bytes will be 2k instead, which seems to be a property of ff16 ...
Well, actually this is not the case (it is independent of ff8 vs. ff16): leo_encode_work_count always multiplies the next power of two of the recovery count by 2:
Line 102 in b58d1ea
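Paraphrasing what that line effectively computes (this is my own rewrite, not the verbatim source; next_pow2 stands in for the library's internal helper):

```c
// My paraphrase of leo_encode_work_count(), as I read the referenced line.
static unsigned next_pow2(unsigned x)
{
    unsigned p = 1;
    while (p < x)
        p <<= 1;
    return p;
}

unsigned encode_work_count(unsigned original_count, unsigned recovery_count)
{
    (void)original_count; // doesn't factor into the result here
    const unsigned m = next_pow2(recovery_count);
    return m * 2; // always doubled, even if recovery_count is a power of two
}
```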
So e.g. in the case of orig_count == recovery_count == some power of two, this means that half the arrays in encode_work won't be used (the 2nd half will always be just buffer_bytes-sized arrays that are all zeroes). It seems superfluous to always double the data here. Is there any particular reason for this? leo_encode also requires this behaviour, although there are edge cases (e.g. work_count < 2*recovery_count && work_count == m) where work_count wouldn't need to be 2*m:
Lines 162 to 166 in b58d1ea
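If I read those lines right, they boil down to a hard requirement like this (next_pow2 as in the sketch above; the exact error constant is from memory and may differ):

```c
// My paraphrase of the check that leo_encode() performs on work_count.
LeopardResult check_work_count(unsigned recovery_count, unsigned work_count)
{
    const unsigned m = next_pow2(recovery_count);
    if (work_count != m * 2)
        return Leopard_InvalidCounts; // rejected even in the edge cases above
    return Leopard_Success;
}
```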
The workaround for my use case was dropping the superfluous chunks (the 2nd half) on encode and decode.
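Concretely, the encode-side workaround looks roughly like this (keep_parity is a made-up helper name; that the real parity sits in the first recovery_count work buffers is based on my observation above, not on anything the library documents):

```c
#include <stdlib.h>

// Keep only the first recovery_count work buffers as the actual parity
// symbols and free the all-zero second half.
static void keep_parity(void** work_data, unsigned work_count,
                        unsigned recovery_count, void** parity_out)
{
    for (unsigned i = 0; i < recovery_count; ++i)
        parity_out[i] = work_data[i]; // first half: real parity data

    for (unsigned i = recovery_count; i < work_count; ++i)
        free(work_data[i]); // second half: unused, all-zero buffers
}
```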
Sorry, just noticed your work here. I hope you were able to get through the issue.