WizardMac/ReadStat

Incompatibility of SAS v8 XPORT files with SAS v9.0401M8

Issue:

When creating an XPORT file using 'write_xport' with the option 'file_format_version=8', the resulting file is not read correctly by SAS v9.0401M8. SAS reads the filename and all variable name information correctly but reports 0 observations. The issue does not occur with 'file_format_version=5'.

Identified Cause:

Using the same source SAS dataset, I created v8 XPORT files with both readstat and SAS v9.0401M8. The two files had only one meaningful difference, which was in the observation header. The SAS-written file includes an observation count in that header, similar to how the NAMESTR and LABEL headers include variable counts. This is not spelled out clearly in the SAS documentation at https://support.sas.com/content/dam/SAS/support/en/technical-papers/record-layout-of-a-sas-version-8-or-9-data-set-in-sas-transport-format.pdf.

The description on Page 7 for the header simply shows:

HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!000000000000000000000000000000

However, examples on Page 8 and following show:

HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!              1

Here '1' is the number of observations in the example. I have verified this using SAS datasets containing 3, 1_000_000 (more than 5 decimal digits to store), and 10_000_000_000 (more than 10 decimal digits to store) observations. Based on this, I believe that the 15 characters following the '!!!!!!!' store the number of observations, right-justified. Example hexdump output for the 10_000_000_000 observation XPORT file created by SAS is provided below.

000003c0  48 45 41 44 45 52 20 52  45 43 4f 52 44 2a 2a 2a  |HEADER RECORD***|
000003d0  2a 2a 2a 2a 4f 42 53 56  38 20 20 20 48 45 41 44  |****OBSV8   HEAD|
000003e0  45 52 20 52 45 43 4f 52  44 21 21 21 21 21 21 21  |ER RECORD!!!!!!!|
000003f0  20 20 20 20 31 30 30 30  30 30 30 30 30 30 30 20  |    10000000000 |
00000400  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
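The layout in the hexdump can be sketched in a few lines of Python. This is only an illustration of the record format inferred above, not readstat code; the names `OBS_PREFIX`, `COUNT_WIDTH`, and `build_obsv8_header` are hypothetical, and the 15-character right-justified field is the assumption stated in this issue rather than something the SAS paper spells out.

```python
# Hypothetical sketch of the 80-byte OBSV8 header record as SAS appears to write it.
OBS_PREFIX = b"HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!"  # 48 bytes
RECORD_LEN = 80
COUNT_WIDTH = 15  # assumed width of the observation-count field


def build_obsv8_header(row_count: int) -> bytes:
    """Return an OBSV8 header record with a space-padded observation count."""
    if not 0 <= row_count < 10 ** COUNT_WIDTH:
        raise ValueError("row_count does not fit in the 15-character field")
    # Right-justify the decimal count with spaces, as seen in the SAS output.
    count_field = str(row_count).rjust(COUNT_WIDTH).encode("ascii")
    # Pad the rest of the record with spaces up to 80 bytes.
    return (OBS_PREFIX + count_field).ljust(RECORD_LEN, b" ")


header = build_obsv8_header(10_000_000_000)
```

For 10_000_000_000 observations, bytes 48 through 63 of `header` match the row at offset 000003f0 in the hexdump.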

The other item of note is that SAS pads the observation count before and after with spaces (0x20) rather than zeros (0x30). SAS seems to accept either, since leading zeros do not change the numeric value. I tested this using the 3-observation dataset: simply changing the appropriate '30' to '33' in the readstat-written file, and leaving all other readstat formatting alone, allowed SAS to read it properly.
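The manual hex edit described above could be scripted roughly as follows. This is a sketch under the same assumptions (a 15-character count field immediately after the '!!!!!!!'), and `patch_obs_count` is a hypothetical helper, not part of readstat or any SAS tool. It takes the first occurrence of the header prefix in the file, which assumes the prefix does not appear earlier by coincidence.

```python
# Hypothetical helper that overwrites the observation-count field in an XPORT v8 file.
OBS_PREFIX = b"HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!"
COUNT_WIDTH = 15  # assumed width of the observation-count field


def patch_obs_count(xpt: bytes, row_count: int, pad: str = "0") -> bytes:
    """Return a copy of xpt with row_count written into the OBSV8 header.

    pad="0" keeps readstat's zero fill (as in the manual test);
    pad=" " mimics the space padding that SAS itself writes.
    """
    start = xpt.find(OBS_PREFIX)
    if start < 0:
        raise ValueError("no OBSV8 header record found")
    digits = str(row_count)
    if row_count < 0 or len(digits) > COUNT_WIDTH:
        raise ValueError("row_count does not fit in the 15-character field")
    field_start = start + len(OBS_PREFIX)
    field = digits.rjust(COUNT_WIDTH, pad).encode("ascii")
    # Replace exactly COUNT_WIDTH bytes so the file length is unchanged.
    return xpt[:field_start] + field + xpt[field_start + COUNT_WIDTH:]


# A zero-filled header record, laid out like the Page 7 description.
zero_filled = OBS_PREFIX + b"0" * 30 + b"  "
patched = patch_obs_count(zero_filled, 3)
```

With the default zero padding, `patched` differs from `zero_filled` in a single byte (0x30 becomes 0x33), which mirrors the one-byte change that made SAS accept the file.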

Code Location:

I believe the row_count variable exists and just needs to be added to the xport_write_obs_header_record function in src/sas/readstat_xport_write.c. It may also be possible to read in the observation count data from the header, when it exists, in the readstat_xport_read module.
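On the reading side, the idea can be sketched in Python as below. This is not the readstat_xport_read code and makes no claim about how that module is structured; `read_obs_count` is a hypothetical name, and the field position is the assumption made throughout this issue. Note that a zero-filled field parses as 0, which may mean "count not recorded" rather than an empty dataset, so a reader would probably want to treat 0 as unknown.

```python
# Hypothetical sketch of parsing the observation count from an OBSV8 header record.
from typing import Optional

OBS_PREFIX = b"HEADER RECORD*******OBSV8   HEADER RECORD!!!!!!!"
COUNT_WIDTH = 15  # assumed width of the observation-count field


def read_obs_count(record: bytes) -> Optional[int]:
    """Return the observation count from an 80-byte record, or None if absent."""
    if not record.startswith(OBS_PREFIX):
        return None  # not an OBSV8 header record
    field = record[len(OBS_PREFIX):len(OBS_PREFIX) + COUNT_WIDTH].strip()
    if not field.isdigit():
        return None  # blank or non-numeric field
    # Leading zeros and leading spaces both yield the same integer.
    return int(field.decode("ascii"))


sas_record = (OBS_PREFIX + b"    10000000000").ljust(80, b" ")
count = read_obs_count(sas_record)
```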