Check potential bugs in compressed_files.py in lasse-py when n = 1 and n = 3 bits
I tested it and it worked in most cases. It is not working for n = 1 and n = 3 in the code below:
import numpy as np
from numpy.random import randint
from compressed_files import compact_bytes, decompact_bytes  # import path assumed

# it works for n = 2, 4, 5, 6 and 7,
# but it does not work for n = 1 and 3
n = 7  # Minimum number of bits to represent the numbers, n < 8
x = randint(low=0, high=2**n, size=100, dtype=np.uint8)
compressed = compact_bytes(x, n)
filename = "compressed.bin"
compressed.tofile(filename)
uncompressed = decompact_bytes(compressed, n)
filename = "uncompressed.bin"
uncompressed.tofile(filename)
uncompressed2 = np.fromfile(filename, dtype=np.uint8, count=-1)
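
For completeness, here is a hypothetical check (not in the original snippet) that makes the failure visible right after the round trip; x and uncompressed are the variables defined above, nothing else is assumed:

# Hypothetical sanity check appended to the snippet above.
print(len(x), len(uncompressed))                     # lengths differ for n = 1 and n = 3
assert len(uncompressed) == len(x), "round-trip changed the array length"
assert np.array_equal(uncompressed, x), "round-trip changed the array contents"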
I debugged this code, and the problem seems to be in the logic that figures out how many numbers were in the original list.
def decompact_bytes(input_array, num_bits):
    if num_bits >= 8:
        raise ValueError("This function is meant to work with less than 8 bits!")
    output_arr_len = (
        len(input_array)
        * 8  # Figure out how many numbers were on the original array, which should be the uncompressed output
    ) // num_bits
A quick example: when we use 1 as the number of bits, the compressed array has length 13. Multiplying that length by 8 and dividing by 1 gives a result greater than 100, while the generated input array has length 100, so the output array size won't match the original:
(len(input_array) * 8) // 1 = 104 > 100
Naturally, the same problem occurs when num_bits equals 3, except that there the output length is smaller than the original. For the other cases, the result is either exactly 100 or close enough that the floor division still gives 100.
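
To make the arithmetic above concrete, here is the n = 1 case written out as a tiny standalone sketch; the 13-byte compressed length is the one mentioned above, and nothing from lasse-py is needed to run it:

compressed_len = 13   # bytes reported above for 100 one-bit values
num_bits = 1
recovered_len = (compressed_len * 8) // num_bits  # same formula as in decompact_bytes
print(recovered_len)  # 104: the four padding bits are counted as four extra numbers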
@claudio966 That's right, I have encountered the same. I could reproduce the error for arrays with length N*lcm - 1, where lcm is the least common multiple of num_bits and 8.
I could "fix" this problem by pre-calculating the error in that formula, saving it in the first byte of the array, and using that info to correct the formula's output. Since this takes up more space I haven't made a pull request yet, but I couldn't think of another way of consistently correcting the error.
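
In case it helps the discussion, a rough sketch of that header-byte idea might look like the code below. compact_bytes and decompact_bytes are the existing functions from compressed_files.py (the import path is my assumption), the wrapper names are made up, and the sketch assumes the length formula only ever overestimates, as in the n = 1 example above:

import numpy as np
from compressed_files import compact_bytes, decompact_bytes  # import path assumed

def compact_bytes_with_header(values, num_bits):
    # Pack as usual, then prepend one byte holding the error of the naive
    # length formula so the decoder can trim the phantom elements later.
    packed = compact_bytes(values, num_bits)
    naive_len = (len(packed) * 8) // num_bits
    error = naive_len - len(values)               # assumed to be >= 0 here
    return np.concatenate(([error], packed)).astype(np.uint8)

def decompact_bytes_with_header(stored, num_bits):
    # First byte is the correction; the rest is the normal compressed payload.
    error = int(stored[0])
    recovered = decompact_bytes(stored[1:], num_bits)
    return recovered[:len(recovered) - error] if error else recovered

This costs one extra byte per file, and as written it only covers the case where the formula overestimates; the n = 3 behaviour described above (output shorter than the input) would need separate investigation, since trimming cannot add elements back.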