asdf-format/asdf

Block checksums are only checked for first block if a block index is present


When reading a file that has a block index, only the first block's checksum is checked, even if validate_checksums is True.

The following script generates files with incorrect block checksums:

import io
import asdf
import numpy

n_arrays = 3
with_block_index = True
buff = io.BytesIO()
arrs = [numpy.zeros(1, dtype='uint8') + i for i in range(n_arrays)]
asdf.AsdfFile({'arrs': arrs}).write_to(buff, include_block_index=with_block_index)

for checksum_index in range(n_arrays):
    fn = f'bad_checksum_{checksum_index}.asdf'
    with open(fn, 'wb') as f:
        print(f"{checksum_index=}")
        buff.seek(0)
        block_offset = 0
        while block_line := buff.readline():
            if len(block_line) > 4 and block_line[:4] == b'\xd3BLK':
                end_offset = buff.tell()
                break
            block_offset = buff.tell()
        buff.seek(0)
        # copy over the tree
        f.write(buff.read(block_offset))
        checksum_offset = (4 + 2 + 4 + 4 + 8 + 8 + 8)
        nbytes_per_block = checksum_offset + 16 + 1
        modification_index = nbytes_per_block * checksum_index + checksum_offset
        values = list(block_line)
        # write an invalid checksum
        values[modification_index:modification_index+16] = b'\1' * 16
        print(bytes(values))
        f.write(bytes(values))
        buff.seek(end_offset)
        f.write(buff.read())
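The offset arithmetic in the script can be cross-checked with the struct module. This is a minimal sketch, assuming the big-endian block header layout implied by the sum above (magic, header size, flags, compression, allocated size, used size, data size, then a 16-byte checksum). The helper name checksum_offsets is hypothetical and not part of asdf.

```python
import struct

# Assumed layout of the header fields that precede the checksum (big-endian):
# magic(4) header_size(2) flags(4) compression(4) allocated(8) used(8) data_size(8)
FIELDS_BEFORE_CHECKSUM = ">4sHI4sQQQ"
CHECKSUM_SIZE = 16


def checksum_offsets(n_blocks, data_size=1):
    """Offset of each block's checksum, relative to the start of the first block.

    Hypothetical helper: assumes equally sized blocks stored back to back.
    """
    checksum_offset = struct.calcsize(FIELDS_BEFORE_CHECKSUM)
    nbytes_per_block = checksum_offset + CHECKSUM_SIZE + data_size
    return [nbytes_per_block * i + checksum_offset for i in range(n_blocks)]


print(checksum_offsets(3))  # [38, 93, 148]
```

With 1-byte arrays each block occupies 55 bytes, so the script overwrites bytes 38-53, 93-108, or 148-163 of the block region depending on checksum_index.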

The script produces 3 files, each containing 3 arrays, and each file has an invalid checksum for one of its 3 blocks. Here is the file with an incorrect checksum for the last block (please excuse the poor formatting of the binary block contents, but note the ^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A in place of the checksum of the final block):

#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev265+g0c44742c}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev265+g0c44742c}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev265+g0c44742c}
arrs:
- !core/ndarray-1.0.0
  source: 0
  datatype: uint8
  byteorder: big
  shape: [1]
- !core/ndarray-1.0.0
  source: 1
  datatype: uint8
  byteorder: big
  shape: [1]
- !core/ndarray-1.0.0
  source: 2
  datatype: uint8
  byteorder: big
  shape: [1]
...
<D3>BLK^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A<93><B8><85><AD><FE>^M<A0><89><CD><F6>4<90>O՟q^@<D3>BLK^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^AU<A5><AD>ESC<A5><89><AA>!^M&)<C1><DF>A^A<D3>BLK^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^B#ASDF BLOCK INDEX
%YAML 1.1
---
- 948
- 1003
- 1058
...
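The block index entries above are 55 bytes apart, which matches a 54-byte header plus 1 byte of data. The first block's stored checksum (93 b8 85 ad ...) appears to be the MD5 of a single zero byte. Under those assumptions, the checksums can be verified independently of asdf. The sketch below builds three synthetic blocks in memory rather than reading a real file, and make_block and bad_checksum_offsets are hypothetical helpers, not asdf functions.

```python
import hashlib
import struct

BLOCK_MAGIC = b"\xd3BLK"
# Assumed fields after the 2-byte header_size:
# flags, compression, allocated_size, used_size, data_size, checksum
HEADER_FMT = ">I4sQQQ16s"


def make_block(data, checksum=None):
    """Build a minimal uncompressed block (hypothetical helper for this demo)."""
    if checksum is None:
        checksum = hashlib.md5(data).digest()
    header = struct.pack(
        HEADER_FMT, 0, b"\0" * 4, len(data), len(data), len(data), checksum
    )
    return BLOCK_MAGIC + struct.pack(">H", len(header)) + header + data


def bad_checksum_offsets(buf, offsets):
    """Return the offsets whose stored MD5 does not match the block data."""
    bad = []
    for offset in offsets:
        if buf[offset:offset + 4] != BLOCK_MAGIC:
            raise ValueError(f"No block magic at offset {offset}")
        (header_size,) = struct.unpack_from(">H", buf, offset + 4)
        _flags, _comp, _alloc, used, _size, checksum = struct.unpack_from(
            HEADER_FMT, buf, offset + 6
        )
        start = offset + 6 + header_size
        if hashlib.md5(buf[start:start + used]).digest() != checksum:
            bad.append(offset)
    return bad


# Three 1-byte blocks like the ones above; the last has its checksum overwritten.
blocks = [make_block(b"\x00"), make_block(b"\x01"), make_block(b"\x02", b"\x01" * 16)]
buf = b"".join(blocks)
offsets = [sum(len(b) for b in blocks[:i]) for i in range(len(blocks))]
print(offsets)                             # [0, 55, 110]
print(bad_checksum_offsets(buf, offsets))  # [110]
```

Every block is checked here regardless of how its offset was found, which is the behavior one would expect from validate_checksums=True.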

The following test script will load each of the 3 files generated above (trying both lazy and non-lazy loading):

import asdf

n_arrays = 3
for lazy_load in (True, False):
    print(f"{lazy_load=}")
    for index in range(n_arrays):
        fn = f'bad_checksum_{index}.asdf'
        print(f"{fn=}")
        try:
            af = asdf.open(fn, validate_checksums=True, lazy_load=lazy_load)
            s = sum([a.sum() for a in af['arrs']])
            print(f"\tOpened file with no error: {s}")
        except ValueError as e:
            print(f"!!!!Failed to open with {e}")

and outputs the following:

lazy_load=True
fn='bad_checksum_0.asdf'
!!!!Failed to open with Block at 948 does not match given checksum
fn='bad_checksum_1.asdf'
	Opened file with no error: 3.0
fn='bad_checksum_2.asdf'
	Opened file with no error: 3.0
lazy_load=False
fn='bad_checksum_0.asdf'
!!!!Failed to open with Block at 948 does not match given checksum
fn='bad_checksum_1.asdf'
	Opened file with no error: 3.0
fn='bad_checksum_2.asdf'
	Opened file with no error: 3.0

If with_block_index is set to False in the generation script above, all files fail to open with sensible checksum errors. The above was tested with the current development branch of ASDF. The bug appears to be related to UnloadedBlock (used for files with a block index), which ends up calling Block.read without overriding the default validate_checksum=False:

self.read(self._fd)
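To illustrate the suspected pattern, here is a toy model. The class bodies are hypothetical stand-ins that only borrow the names Block and UnloadedBlock, and asdf's real implementation differs. It shows how a call that omits the keyword silently skips validation, and one possible way to forward the flag.

```python
import hashlib
import io


class Block:
    """Toy stand-in for a block reader; not asdf's actual class."""

    def __init__(self, checksum):
        self.checksum = checksum
        self.data = None

    def read(self, fd, validate_checksum=False):
        self.data = fd.read()
        if validate_checksum and hashlib.md5(self.data).digest() != self.checksum:
            raise ValueError("Block does not match given checksum")
        return self


class UnloadedBlock(Block):
    """Toy lazily loaded block that remembers its file object."""

    def __init__(self, fd, checksum, validate_checksum=False):
        super().__init__(checksum)
        self._fd = fd
        self._validate_checksum = validate_checksum  # assumed attribute

    def load_buggy(self):
        # Same shape as the quoted call: the flag falls back to False.
        return self.read(self._fd)

    def load_fixed(self):
        # Forward the caller's choice instead of relying on the default.
        return self.read(self._fd, validate_checksum=self._validate_checksum)


bad = b"\x01" * 16
UnloadedBlock(io.BytesIO(b"\x02"), bad, validate_checksum=True).load_buggy()
print("buggy path: no error raised")
try:
    UnloadedBlock(io.BytesIO(b"\x02"), bad, validate_checksum=True).load_fixed()
except ValueError as e:
    print(f"fixed path: {e}")
```

Whether the flag is best stored on the block or passed in at load time is a design choice for the maintainers; the sketch only shows that the keyword has to reach read somehow.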