Block checksums are only checked for the first block if a block index is present
When reading, even if `validate_checksums` is `True`, only the first block checksum is checked.
I used the following script to generate files with incorrect block checksums:
```python
import io

import asdf
import numpy

n_arrays = 3
with_block_index = True

# Write a valid file with n_arrays single-byte blocks into memory
buff = io.BytesIO()
arrs = [numpy.zeros(1, dtype='uint8') + i for i in range(n_arrays)]
asdf.AsdfFile({'arrs': arrs}).write_to(buff, include_block_index=with_block_index)

for checksum_index in range(n_arrays):
    fn = f'bad_checksum_{checksum_index}.asdf'
    with open(fn, 'wb') as f:
        print(f"{checksum_index=}")
        buff.seek(0)
        # Find the first line starting with the block magic:
        # block_offset is where that line begins, end_offset where it ends
        block_offset = 0
        while block_line := buff.readline():
            if len(block_line) > 4 and block_line[:4] == b'\xd3BLK':
                end_offset = buff.tell()
                break
            block_offset = buff.tell()
        buff.seek(0)
        # copy over the tree
        f.write(buff.read(block_offset))
        # magic (4) + header_size (2) + flags (4) + compression (4)
        # + allocated/used/data sizes (8 each), followed by the 16-byte checksum
        checksum_offset = (4 + 2 + 4 + 4 + 8 + 8 + 8)
        # each block also holds 1 byte of data
        nbytes_per_block = checksum_offset + 16 + 1
        modification_index = nbytes_per_block * checksum_index + checksum_offset
        values = list(block_line)
        # write an invalid checksum
        values[modification_index:modification_index+16] = b'\1' * 16
        print(bytes(values))
        f.write(bytes(values))
        # copy the rest of the file (the block index)
        buff.seek(end_offset)
        f.write(buff.read())
```
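The offsets in the script follow the block header layout of the ASDF Standard: a 4-byte magic `\xd3BLK`, a big-endian `uint16` header size, then a 48-byte header holding a `uint32` flags field, a 4-byte compression code, three `uint64` sizes (allocated, used, data) and a 16-byte MD5 checksum of the used data. The minimal sketch below, assuming an uncompressed block with no padding, packs one block with the standard library's `struct` and `hashlib` to confirm the script's `checksum_offset` (38) and `nbytes_per_block` (55) arithmetic. `pack_block` and `CHECKSUM_OFFSET` are hypothetical names for this illustration, not part of the asdf API.

```python
import hashlib
import struct

BLOCK_MAGIC = b"\xd3BLK"
# Header after the magic and the uint16 header_size (all big-endian):
# flags (uint32), compression (4 bytes), allocated/used/data sizes (uint64 x3),
# checksum (16 bytes).
HEADER_FMT = ">I4sQQQ16s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 48 bytes

# Offset of the checksum measured from the start of the magic.
CHECKSUM_OFFSET = len(BLOCK_MAGIC) + 2 + struct.calcsize(">I4sQQQ")


def pack_block(data: bytes) -> bytes:
    """Hypothetical helper: build one uncompressed block with an MD5 checksum."""
    checksum = hashlib.md5(data).digest()
    n = len(data)
    header = struct.pack(HEADER_FMT, 0, b"\0\0\0\0", n, n, n, checksum)
    return BLOCK_MAGIC + struct.pack(">H", HEADER_SIZE) + header + data


block = pack_block(b"\x02")
print(HEADER_SIZE, CHECKSUM_OFFSET, len(block))  # 48 38 55
```

The 55-byte result matches the 55-byte spacing of the offsets in the block index shown below.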
The script produces three files, each containing three arrays; each file has an invalid checksum for one of the three arrays. Here is the file with an incorrect checksum for the last block. Please excuse the poor formatting of the binary block contents, but note the sixteen `^A` bytes (`^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A`) that make up the checksum of the final block:
```
#ASDF 1.0.0
#ASDF_STANDARD 1.5.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.1.0
asdf_library: !core/software-1.0.0 {author: The ASDF Developers, homepage: 'http://github.com/asdf-format/asdf',
  name: asdf, version: 3.0.0.dev265+g0c44742c}
history:
  extensions:
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension._manifest.ManifestExtension
    extension_uri: asdf://asdf-format.org/core/extensions/core-1.5.0
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev265+g0c44742c}
  - !core/extension_metadata-1.0.0
    extension_class: asdf.extension.BuiltinExtension
    software: !core/software-1.0.0 {name: asdf, version: 3.0.0.dev265+g0c44742c}
arrs:
- !core/ndarray-1.0.0
  source: 0
  datatype: uint8
  byteorder: big
  shape: [1]
- !core/ndarray-1.0.0
  source: 1
  datatype: uint8
  byteorder: big
  shape: [1]
- !core/ndarray-1.0.0
  source: 2
  datatype: uint8
  byteorder: big
  shape: [1]
...
<D3>BLK^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A<93><B8><85><AD><FE>^M<A0><89><CD><F6>4<90>O՟q^@<D3>BLK^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^AU<A5><AD>ESC<A5><89><AA>!^M&)<C1><DF>A^A<D3>BLK^@0^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^@^@^@^@^@^@^@^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^A^B#ASDF BLOCK INDEX
%YAML 1.1
---
- 948
- 1003
- 1058
...
```
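With a block index, a reader can jump straight to each offset listed after `#ASDF BLOCK INDEX`; here 948, 1003 and 1058 are 55 bytes apart (54 bytes of magic and header plus one byte of data). The behaviour expected in this report is that every block reached through the index is checked when checksum validation is requested, not only the first one. The minimal sketch below shows that using only `struct` and `hashlib`, assuming uncompressed blocks. `pack_block`, `read_block` and `read_indexed_blocks` are hypothetical helpers written for this illustration, not asdf functions, and the in-memory buffer uses offsets 0, 55 and 110 rather than the real file's offsets.

```python
from __future__ import annotations

import hashlib
import struct

BLOCK_MAGIC = b"\xd3BLK"
HEADER_FMT = ">I4sQQQ16s"  # flags, compression, allocated, used, data size, checksum


def pack_block(data: bytes, checksum: bytes | None = None) -> bytes:
    """Hypothetical helper: one uncompressed block; pass `checksum` to corrupt it."""
    if checksum is None:
        checksum = hashlib.md5(data).digest()
    n = len(data)
    header = struct.pack(HEADER_FMT, 0, b"\0" * 4, n, n, n, checksum)
    return BLOCK_MAGIC + struct.pack(">H", len(header)) + header + data


def read_block(buf: bytes, offset: int, validate_checksum: bool) -> bytes:
    """Hypothetical reader: parse the block at `offset`, optionally checking its MD5."""
    if buf[offset:offset + 4] != BLOCK_MAGIC:
        raise ValueError(f"No block magic at {offset}")
    (header_size,) = struct.unpack_from(">H", buf, offset + 4)
    start = offset + 6
    _flags, _comp, _alloc, used, _size, checksum = struct.unpack_from(HEADER_FMT, buf, start)
    data = buf[start + header_size:start + header_size + used]
    # An all-zero checksum means "no checksum" in the ASDF Standard.
    if validate_checksum and checksum != b"\0" * 16:
        if hashlib.md5(data).digest() != checksum:
            raise ValueError(f"Block at {offset} does not match given checksum")
    return data


def read_indexed_blocks(buf: bytes, index: list[int], validate_checksums: bool) -> list[bytes]:
    # The flag must reach every per-block read, not only the first one.
    return [read_block(buf, off, validate_checksum=validate_checksums) for off in index]


# Three blocks holding 0, 1, 2; the last one gets sixteen 0x01 checksum bytes.
blocks = [pack_block(bytes([i])) for i in range(2)]
blocks.append(pack_block(b"\x02", checksum=b"\x01" * 16))
buf = b"".join(blocks)
try:
    read_indexed_blocks(buf, [0, 55, 110], validate_checksums=True)
except ValueError as e:
    print(e)  # Block at 110 does not match given checksum
```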
The following test script loads each of the three files generated above, trying both lazy and non-lazy loading:
```python
import asdf

n_arrays = 3
for lazy_load in (True, False):
    print(f"{lazy_load=}")
    for index in range(n_arrays):
        fn = f'bad_checksum_{index}.asdf'
        print(f"{fn=}")
        try:
            af = asdf.open(fn, validate_checksums=True, lazy_load=lazy_load)
            s = sum([a.sum() for a in af['arrs']])
            print(f"\tOpened file with no error: {s}")
        except ValueError as e:
            print(f"!!!!Failed to open with {e}")
```
It outputs the following:

```
lazy_load=True
fn='bad_checksum_0.asdf'
!!!!Failed to open with Block at 948 does not match given checksum
fn='bad_checksum_1.asdf'
	Opened file with no error: 3.0
fn='bad_checksum_2.asdf'
	Opened file with no error: 3.0
lazy_load=False
fn='bad_checksum_0.asdf'
!!!!Failed to open with Block at 948 does not match given checksum
fn='bad_checksum_1.asdf'
	Opened file with no error: 3.0
fn='bad_checksum_2.asdf'
	Opened file with no error: 3.0
```
If `with_block_index` is set to `False` in the generation script above, all three files fail to open with sensible checksum errors. The above was tested with the current development branch of ASDF, but it appears to be a bug related to `UnloadedBlock` (used for files with a block index), which ends up calling `Block.read` without overriding the default `validate_checksum=False`:
Line 1367 in 587542e
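The pattern behind the bug can be shown without asdf itself. In the minimal sketch below, the `Block` and `LazyBlock` classes are hypothetical stand-ins, not asdf's `Block` or `UnloadedBlock`: a lazily loaded block has to remember the caller's `validate_checksums` setting and pass it through when the deferred read finally happens. Calling `read()` with no arguments would fall back to the default `validate_checksum=False`, which matches the behaviour described above.

```python
import hashlib


class Block:
    """Hypothetical stand-in for a block reader, not asdf's Block class."""

    def __init__(self, data: bytes, checksum: bytes):
        self.data = data
        self.checksum = checksum

    def read(self, validate_checksum: bool = False) -> bytes:
        # Only verify the MD5 when explicitly asked to.
        if validate_checksum and hashlib.md5(self.data).digest() != self.checksum:
            raise ValueError("Block does not match given checksum")
        return self.data


class LazyBlock:
    """Hypothetical lazy wrapper that defers Block.read until first access."""

    def __init__(self, block: Block, validate_checksums: bool):
        self._block = block
        # Remember the caller's setting so the deferred read can honour it.
        self._validate = validate_checksums
        self._data = None

    @property
    def data(self) -> bytes:
        if self._data is None:
            # Buggy variant: self._block.read() would silently use the default False.
            self._data = self._block.read(validate_checksum=self._validate)
        return self._data


bad = Block(b"\x02", b"\x01" * 16)
try:
    LazyBlock(bad, validate_checksums=True).data
except ValueError as e:
    print(e)  # Block does not match given checksum
```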