WizardMac/ReadStat

Error reading SAS data set row with many numeric variables and character compression

mnizol opened this issue · 3 comments

When reading a certain SAS data set using pyreadstat.read_sas7bdat(), which uses ReadStat v1.1.7, I get the following error:

"A row in the file was not the expected length."

The data set in question uses character compression. This may be related to the closed issue #35.

I narrowed the problem down to a single row, which I've attached below with variable names and contents obfuscated. The obfuscated row still exhibits the error.

compression_bug.sas7bdat.zip

@mnizol It would be useful to know whether this file opens successfully in other packages, e.g. the Python sas7bdat package.

Technical notes for myself:

Decompression trips on the control byte 0x68, which decodes as a blank insertion of length 256*8 + 17 + (value of the next byte), i.e. at least 2065 bytes, which exceeds the length of the output buffer. This control code is not necessarily the problem in itself, since other codes also produce decompression lengths longer than 256. If I recall correctly, there is (or was) some disagreement about whether the length multiplier should be 256 or something else (64?).
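To make the arithmetic in the note above concrete, here is a minimal Python sketch of the blank-insertion length for 0x68 under both candidate multipliers. It is not ReadStat's code: the function names are hypothetical, and it assumes the "8" in 256*8 is the low nibble of the control byte.

```python
def blank_insert_length(control: int, next_byte: int, multiplier: int = 256) -> int:
    """Length of the blank run encoded by a control byte and the byte after it.

    Assumption: the factor applied to the multiplier is the low nibble of the
    control byte (8 for 0x68), matching the formula quoted in the note.
    """
    low_nibble = control & 0x0F
    return multiplier * low_nibble + 17 + next_byte


def overflows(control: int, next_byte: int, out_pos: int, out_len: int,
              multiplier: int = 256) -> bool:
    """True if the blank run would write past the end of the output buffer."""
    return out_pos + blank_insert_length(control, next_byte, multiplier) > out_len


if __name__ == "__main__":
    # Compare the two multipliers mentioned in the note (64 is only a guess there).
    for m in (256, 64):
        shortest = blank_insert_length(0x68, 0x00, m)
        longest = blank_insert_length(0x68, 0xFF, m)
        print(f"multiplier {m}: 0x68 inserts {shortest}..{longest} blanks")
```

With a multiplier of 256 the run is 2065 to 2320 blanks; with 64 it is 529 to 784. Comparing those ranges against the row length of the attached file would show whether either multiplier can fit.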

Likely duplicate of #245

Further notes: both files have unrecognized control codes 1, 2, and 3. These will need investigation.