TomHarte/CLK

Changed Apple II Disk II sectors sometimes write to disk image as nulls

ryandesign opened this issue · 6 comments

Using this patch to work around #1218...

--- a/Components/DiskII/DiskII.cpp
+++ b/Components/DiskII/DiskII.cpp
@@ -141,6 +141,10 @@ void DiskII::run_for(const Cycles cycles) {
 }
 
 void DiskII::decide_clocking_preference() {
+	clocking_preference_ = ClockingHint::Preference::RealTime;
+	update_clocking_observer();
+	return;
+
 	ClockingHint::Preference prior_preference = clocking_preference_;
 
 	// If in read mode, clocking is either:

...I can boot an Apple IIe and save some information to disk, for example let's save the ROM:

] BSAVE ROM,A$D000,L$3000

I can then load it back into RAM:

] BLOAD ROM,A$2000

And see that they're the same because this command produces no output:

] CALL -151
* 2000<D000.FFFFV

But if I then close the emulator, which writes the changes to the disk image, and boot up another emulated machine and load it in from disk again:

] BLOAD ROM,A$2000

They're not the same now:

] CALL -151
* 2000<D000.FFFFV

It shows large portions replaced with nulls:

DBFC-85 (00)
DBFD-87 (00)
DBFE-84 (00)
DBFF-88 (00)

...and so on.

I originally reported this to AppleCommander before I realized it was not a bug with reading the disk image; the problem had occurred when writing it.

With a little logging...

--- a/Storage/Disk/Encodings/AppleGCR/SegmentParser.cpp
+++ b/Storage/Disk/Encodings/AppleGCR/SegmentParser.cpp
@@ -239,6 +239,9 @@ std::map<std::size_t, Sector> Storage::Encodings::AppleGCR::sectors_from_segment
 
 	uint_fast8_t shift_register = 0;
 
+	int i = 0;
+	fprintf(stderr, "sectors_from_segment\n");
+
 	const std::size_t scanning_sentinel = std::numeric_limits<std::size_t>::max();
 	std::unique_ptr<Sector> new_sector;
 	std::size_t sector_location = 0;
@@ -317,6 +321,7 @@ std::map<std::size_t, Sector> Storage::Encodings::AppleGCR::sectors_from_segment
 					// Potentially this is a Macintosh sector.
 					auto macintosh_sector = decode_macintosh_sector(had_header ? &header : nullptr, sector);
 					if(macintosh_sector) {
+						fprintf(stderr, "%2d. %5ld: macintosh sector\n", ++i, sector_location);
 						result.insert(std::make_pair(sector_location, std::move(*macintosh_sector)));
 						continue;
 					}
@@ -324,8 +329,12 @@ std::map<std::size_t, Sector> Storage::Encodings::AppleGCR::sectors_from_segment
 					// Apple II then?
 					auto appleii_sector = decode_appleii_sector(had_header ? &header : nullptr, sector, is_five_and_three);
 					if(appleii_sector) {
+						fprintf(stderr, "%2d. %5ld: apple ii sector\n", ++i, sector_location);
 						result.insert(std::make_pair(sector_location, std::move(*appleii_sector)));
+						continue;
 					}
+
+					fprintf(stderr, "%2d. %5ld: UNKNOWN SECTOR <----\n", ++i, sector_location);
 				} else {
 					new_sector->data.push_back(value);
 				}
@@ -341,5 +350,6 @@ std::map<std::size_t, Sector> Storage::Encodings::AppleGCR::sectors_from_segment
 		}
 	}
 
+	fprintf(stderr, "\n");
 	return result;
 }

...I see that the last sector parsed by Storage::Encodings::AppleGCR::sectors_from_segment is often not identified as either a Macintosh or Apple II sector so it's skipped:

sectors_from_segment
 1.     0: apple ii sector
 2.  3111: apple ii sector
 3.  6274: apple ii sector
 4.  9951: apple ii sector
 5. 13116: apple ii sector
 6. 16281: apple ii sector
 7. 19446: apple ii sector
 8. 22611: apple ii sector
 9. 25776: apple ii sector
10. 28941: apple ii sector
11. 32104: apple ii sector
12. 35267: apple ii sector
13. 38430: apple ii sector
14. 41593: apple ii sector
15. 44756: apple ii sector
16. 47919: apple ii sector
17. 51082: UNKNOWN SECTOR <----

sectors_from_segment
 1.   678: apple ii sector
 2.  3841: apple ii sector
 3.  7004: apple ii sector
 4. 10167: apple ii sector
 5. 13330: apple ii sector
 6. 16493: apple ii sector
 7. 19656: apple ii sector
 8. 22820: apple ii sector
 9. 26497: apple ii sector
10. 29661: apple ii sector
11. 32825: apple ii sector
12. 35989: apple ii sector
13. 39153: apple ii sector
14. 42316: apple ii sector
15. 45479: apple ii sector
16. 48642: apple ii sector

sectors_from_segment
 1.  1404: apple ii sector
 2.  4567: apple ii sector
 3.  7730: apple ii sector
 4. 10893: apple ii sector
 5. 14056: apple ii sector
 6. 17219: apple ii sector
 7. 20382: apple ii sector
 8. 23545: apple ii sector
 9. 26708: apple ii sector
10. 29871: apple ii sector
11. 33034: apple ii sector
12. 36197: apple ii sector
13. 39360: apple ii sector
14. 43036: apple ii sector
15. 46199: apple ii sector
16. 49362: UNKNOWN SECTOR <----

sectors_from_segment
 1.  1619: apple ii sector
 2.  4784: apple ii sector
 3.  8461: apple ii sector
 4. 11626: apple ii sector
 5. 14791: apple ii sector
 6. 17955: apple ii sector
 7. 21120: apple ii sector
 8. 24284: apple ii sector
 9. 27448: apple ii sector
10. 30613: apple ii sector
11. 33778: apple ii sector
12. 36942: apple ii sector
13. 40105: apple ii sector
14. 43269: apple ii sector
15. 46434: apple ii sector
16. 49599: apple ii sector

sectors_from_segment
 1.  2346: apple ii sector
 2.  5510: apple ii sector
 3.  8675: apple ii sector
 4. 11840: apple ii sector
 5. 15005: apple ii sector
 6. 18170: apple ii sector
 7. 21335: apple ii sector
 8. 25012: apple ii sector
 9. 28175: apple ii sector
10. 31339: apple ii sector
11. 34503: apple ii sector
12. 37668: apple ii sector
13. 40832: apple ii sector
14. 43996: apple ii sector
15. 47161: apple ii sector
16. 50325: UNKNOWN SECTOR <----

sectors_from_segment
 1.     0: apple ii sector
 2.  3074: apple ii sector
 3.  6237: apple ii sector
 4.  9400: apple ii sector
 5. 12563: apple ii sector
 6. 15726: apple ii sector
 7. 18889: apple ii sector
 8. 22052: apple ii sector
 9. 25215: apple ii sector
10. 28378: apple ii sector
11. 31541: apple ii sector
12. 34704: apple ii sector
13. 37867: apple ii sector
14. 41543: apple ii sector
15. 44706: apple ii sector
16. 47869: apple ii sector
17. 51032: apple ii sector

Understandable that a 17th sector would be skipped if it were garbage, but often it's the 16th sector that can't be detected, and sometimes 17 valid sectors are detected...

The reasons why decode_appleii_sector is returning nullptr are either because the data xor check failed:

// Undo the XOR step on sector contents, then check and discard the checksum.
for(std::size_t c = 1; c < sector->data.size(); ++c) {
sector->data[c] ^= sector->data[c-1];
}
if(sector->data.back()) return nullptr;

or the size check failed:

// There must be at least 411 bytes to decode a five-and-three sector from;
// there must be only 343 if this is a six-and-two sector.
const size_t data_size = is_five_and_three ? 411 : 343;
if(original->data.size() < data_size) return nullptr;

Both are because the data coming in to Storage::Encodings::AppleGCR::sectors_from_segment is corrupted (due to a missing bit or bits?).

Printing out the values that come in, I have seen data like this:

96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 9d b4 b4 b4 b4 b4 b4 b4
96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 b4 b4 b4 b4 b4 b4 b4 b4 b4 b4 b4 b4
96 96 96 96 96 96 96 96 96 96 96 96 96 ad b4 b4 b4 b4 b4 b4 b4 b4 b4 b4 b4 b4 b4
96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 95

These should all be 96's.

When it fails due to the xor check (first three examples), it's because the lost bit(s) still resulted in a valid six-and-two byte (9d, ad, b4).

When it fails due to the size check (fourth example), the bit corruption resulted in a byte that's not valid for six-and-two (95) so it aborted early.

The bit corruption happens at the point when bit % segment.data.size() rolls over:

shift_register = uint_fast8_t((shift_register << 1) | (segment.data[bit % segment.data.size()] ? 1 : 0));

Now that you've bothered to do the research, it strikes me that data validity across rollover is an incredibly under-tested part of this emulator; most platforms align disk output to the index hole, to which a writeable segment will be aligned, and therefore don't try to write anything meaningful across the fold.

I don't immediately know what the answer is, however.

I'm now using a DOS disk image that just initializes track 15 of the current disk on startup, using Pieter Lechner's "INIT - reformat a single track" program from Beneath Apple DOS (pages A-12 through A-15):

1DATA32,227,3,132,0,133,1,165,2,160,4,145,0,169,0,160,12,145,0,169,0,160,3,145,0,32,227,3,32,217,3,189,137,192,165,2,133,68,165,3,133,65,169,170,133,62,169,40,133
2DATA69,160,86,169,0,153,255,187,136,208,250,153,0,187,136,208,250,32,13,191,169,8,176,25,169,48,141,120,5,56,206,120,5,240,14,32,68,185,176,245,165,45,208,241,32
3DATA220,184,144,31,160,13,145,0,169,135,32,237,253,169,45,32,237,253,169,195,32,237,253,169,189,32,237,253,160,13,177,0,32,218,253,189,136,192,169,0,133,72,96
4FORI=768TO904:READJ:POKEI,J:NEXT
5T=15:V=PEEK(47)
6?"INITIALIZING TRACK "T" VOLUME "V
7POKE2,T:POKE3,V:CALL768

This is quicker and easier to debug than initializing the whole disk because I only have to wait for and analyze the data of a single track, and I don't have to swap in a blank disk.

To prepare such a disk, boot another emulator like OpenEmulator with DOS 3.3, then insert a blank disk image, paste in the above program, and initialize the disk to run that program at startup using:

INIT HELLO

After closing the other emulator, the disk image can be used to boot up Clock Signal, and the program can be rerun without rebooting with:

RUN 5

The data corruption is already present in the data that is sent to Storage::Encodings::AppleGCR::sectors_from_segment which was created by Storage::Disk::track_serialisation and always seems to consist of a single missing bit either at the beginning or end of the track serialization data, depending on how you want to look at it. It does not seem to make a difference whether that bit is a 0 or a 1.


I have also now seen the problem occur due to a header checksum mismatch here:

// Check the header checksum.
const uint_fast8_t checksum = (((*header)[6] << 1) | 1) & (*header)[7];
if(checksum != (sector->address.volume^sector->address.track^sector->address.sector)) return nullptr;

It's the same root cause, the missing bit just occurred within the address field this time.

When no error occurs, it's because the missing bit occurred in a sync byte, from which the disk system is designed to recover since some sync bytes are expected to be partially overwritten during disk writes.

always seems to consist of a single missing bit either at the beginning or end of the track serialization data, depending on how you want to look at it

If non-Apple systems use the index hole for synchronization and are happy with the data Clock Signal gives them following the index hole, then I guess the missing bit has to be considered to be at the end of the track serialization data.

And of course no sooner have I posted the above analysis than I find a case where everything works fine and there is no missing bit. This happens the very first time I boot the disk after starting Clock Signal. Retrying with RUN 5, there is a missing bit. Retrying with PR#6, there is sometimes a missing bit. Closing the entire emulated machine and starting a new one, there is a missing bit. Only quitting Clock Signal and relaunching it reliably returns me to not having a missing bit. This makes me think there are global or static variables involved in the disk system that are not being reinitialized correctly?