Check potential bugs in compressed_files.py in lasse-py when n = 1 and n = 3 bits
I tested it and it worked in most cases. It is not working for n = 1 and n = 3 in the code below:
import numpy as np
from numpy.random import randint
from compressed_files import compact_bytes, decompact_bytes  # import path assumed

# it works for n = 2, 4, 5, 6 and 7,
# but it does not work for n = 1 and 3
n = 7  # Minimum number of bits to represent the numbers, n < 8
x = randint(low=0, high=2**n, size=100, dtype=np.uint8)
compressed = compact_bytes(x, n)
filename = "compressed.bin"
compressed.tofile(filename)
uncompressed = decompact_bytes(compressed, n)
filename = "uncompressed.bin"
uncompressed.tofile(filename)
uncompressed2 = np.fromfile(filename, dtype=np.uint8, count=-1)
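
For completeness, here is a hypothetical check (not in the original snippet) that makes the failure visible right after the round trip; x and uncompressed are the variables defined above, nothing else is assumed:

# Hypothetical sanity check appended to the snippet above.
print(len(x), len(uncompressed))                     # lengths differ for n = 1 and n = 3
assert len(uncompressed) == len(x), "round-trip changed the array length"
assert np.array_equal(uncompressed, x), "round-trip changed the array contents"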
I debugged this code, and the problem seems to be in the logic that figures out how many numbers were in the original list.
def decompact_bytes(input_array, num_bits):
    if num_bits >= 8:
        raise ValueError("This function is meant to work with less than 8 bits!")
    output_arr_len = (
        len(input_array)
        * 8  # Figure out how many numbers were on the original array, which should be the uncompressed output
    ) // num_bits
A quick example: when we use 1 as the number of bits, the compressed array has length 13. Multiplying that length by 8 and dividing by 1 gives a result greater than 100, while the generated input array has length 100, so the output array size won't match the original:
(len(input_array) * 8) // 1 = 104 > 100
Naturally, the same problem occurs when num_bits equals 3, except that there the output length is smaller than the original. For the other cases, the result is either exactly 100 or close enough that the floor division still gives 100.
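
To make the arithmetic above concrete, here is the n = 1 case written out as a tiny standalone sketch; the 13-byte compressed length is the one mentioned above, and nothing from lasse-py is needed to run it:

compressed_len = 13   # bytes reported above for 100 one-bit values
num_bits = 1
recovered_len = (compressed_len * 8) // num_bits  # same formula as in decompact_bytes
print(recovered_len)  # 104: the four padding bits are counted as four extra numbers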
@claudio966 That's right, I have encountered the same. I could reproduce the error for arrays with length N*lcm - 1, where lcm is the least common multiple of num_bits and 8.
I could "fix" this problem by pre-calculating the error in that formula, saving it in the first byte of the array, and using that info to correct the formula's output. Since this takes up more space I haven't made a pull request yet, but I couldn't think of another way of consistently correcting the error.
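
In case it helps the discussion, a rough sketch of that header-byte idea might look like the code below. compact_bytes and decompact_bytes are the existing functions from compressed_files.py (the import path is my assumption), the wrapper names are made up, and the sketch assumes the length formula only ever overestimates, as in the n = 1 example above:

import numpy as np
from compressed_files import compact_bytes, decompact_bytes  # import path assumed

def compact_bytes_with_header(values, num_bits):
    # Pack as usual, then prepend one byte holding the error of the naive
    # length formula so the decoder can trim the phantom elements later.
    packed = compact_bytes(values, num_bits)
    naive_len = (len(packed) * 8) // num_bits
    error = naive_len - len(values)               # assumed to be >= 0 here
    return np.concatenate(([error], packed)).astype(np.uint8)

def decompact_bytes_with_header(stored, num_bits):
    # First byte is the correction; the rest is the normal compressed payload.
    error = int(stored[0])
    recovered = decompact_bytes(stored[1:], num_bits)
    return recovered[:len(recovered) - error] if error else recovered

This costs one extra byte per file, and as written it only covers the case where the formula overestimates; the n = 3 behaviour described above (output shorter than the input) would need separate investigation, since trimming cannot add elements back.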