`basictool` strips embedded 0 bytes
See attached, which has a `00` byte embedded in a REM. Judging by *SPOOL on my real Master, the `00` bytes come through when LISTed.
I had a look at fixing this, but it was looking like more than a 10-minute job.
b559f8e912696dbb7a61c45a8e26d28e8b3d5130ad05b6fcfcada3c16ea37088.zip
I suspect most uses of this will be people making programs that are annoying to list. If you don't want to fix this, I think basictool should at least issue a warning or error if it finds itself in this situation, as the output isn't going to match what you'd get from a real Beeb.
Thanks Tom, I had been anticipating this bug report coming through. :-)
I've had a go at fixing it on https://github.com/ZornsLemma/basictool/tree/issue-11, please give that a try and let me know how you get on. The pending output is still tracked internally as a mostly C-style string, but we do track its length. For --ascii output via LIST only, we allow NULs into the pending output, and when we output it we use the tracked length instead of the NUL terminator.
This branch also includes the issue 4 fix, by the way.
Yes, that fixed it. Thanks!
This fixed the majority of the discrepancies between BBCBasicToText and basictool --ascii. The 10 cases then remaining were due to BBCBasicToText not matching the LIST logic for detecting line endings, which I've now fixed.
I've put the script I've been using on GitHub here: https://github.com/tom-seddon/beeb/tree/master/tests/BBCBasic
It's a bit off topic, but, for the record, there were 2 issues with BBCBasicToText not matching LIST, and here are 2 interesting examples:
Embedded `0d` in line - see `CITMENU` in http://bbcmicro.co.uk/explore.php?id=1265. Line 80 has an embedded CR. And it seems to run correctly, so I guess `max_choice%` does get set even though it's on the same line as a star command.
Embedded end-of-program marker - see `HANGMN2` in http://bbcmicro.co.uk/explore.php?id=1357. One of the initial REMs has an `0d db` in it, and LIST just stops. It seems to run fine though!
That HANGMN2 example is weird!
Am I missing the point with CITMENU? When I load it into an emulated (BBC B) machine, the carriage return appears to introduce a line 85, rather than line 80 actually having an embedded CR. Here's a snippet from basictool's output, though this seems to match what I see on the emulated machine with LIST (it's just that I can't copy and paste from the emulated screen):
80*TuneObj
85max_choice%=3:REM
90REPEAT
Line 80 starts with `0d 00 50 20`, so it's supposedly 32 bytes long! Viewed that way, it's got an embedded CR.
On the other hand, maybe the line length is wrong, and it's just coincidence that it works and isn't a bad program...
(My copy is slightly different for some reason, and the extra line is line 110. RENUMBER ignores it.)
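As a rough illustration of the header reading above, here is a minimal Python sketch that decodes the four bytes at the start of a tokenised line. It assumes the usual Acorn layout (a CR, the line number high byte, the low byte, then a length byte that counts the header itself). It also treats a high byte with its top bit set as end of program, which is an assumption consistent with the `0d db` observation rather than something confirmed in this thread. `decode_line_header` is a hypothetical helper, not part of basictool or BBCBasicToText.

```python
CR = 0x0D


def decode_line_header(data, offset=0):
    """Return (line_number, length) for the header at offset, or None at end of program."""
    if data[offset] != CR:
        raise ValueError("line header must start with a CR (0x0d)")
    high = data[offset + 1]
    # Assumption: any high byte with bit 7 set ends the program (0xff is the usual marker).
    if high & 0x80:
        return None
    low = data[offset + 2]
    length = data[offset + 3]  # counted from the leading CR, so it includes these 4 bytes
    return (high << 8) | low, length


print(decode_line_header(bytes([0x0D, 0x00, 0x50, 0x20])))  # (80, 32)
print(decode_line_header(bytes([0x0D, 0xDB])))  # None
```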
Thanks, that ~~makes sense now~~ still doesn't make sense as such, but I see what you're pointing out. :-) It's almost as if BASIC sometimes uses the line length byte and sometimes just iterates through until it hits a CR. I wonder if this was done deliberately?! I suppose there's no real way to know.
I assume it's just whatever turned out to be convenient while writing the BASIC interpreter! - BBCBasicToText.py can now work through the code both ways, either following the line lengths or just looking for CRs as it goes.
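To make the two approaches concrete, here is a small Python sketch of both walks over a synthetic program. This is not the code from BBCBasicToText.py: `make_line`, `walk_by_length` and `walk_by_cr` are hypothetical names, the program bytes are made up to resemble the CITMENU case (line bodies are left as plain ASCII rather than tokenised), and the end-of-program test (top bit of the high byte set) is an assumption.

```python
CR = 0x0D


def make_line(number, body):
    """Build one line: CR, number high, number low, length (header included), body."""
    return bytes([CR, number >> 8, number & 0xFF, len(body) + 4]) + body


def at_line_start(prog, pos):
    """True if a complete, non-terminating line header starts at pos."""
    return pos + 3 < len(prog) and prog[pos] == CR and not prog[pos + 1] & 0x80


def walk_by_length(prog):
    """Follow each line's length byte to find the next line."""
    lines, pos = [], 0
    while at_line_start(prog, pos):
        number = (prog[pos + 1] << 8) | prog[pos + 2]
        length = prog[pos + 3]
        if length < 4:  # a bogus length would never advance; give up
            break
        lines.append((number, prog[pos + 4:pos + length]))
        pos += length
    return lines


def walk_by_cr(prog):
    """Ignore the length byte and scan forward to the next CR instead."""
    lines, pos = [], 0
    while at_line_start(prog, pos):
        number = (prog[pos + 1] << 8) | prog[pos + 2]
        end = prog.find(CR, pos + 4)  # skip the header, which may itself contain 0x0d
        if end < 0:
            break
        lines.append((number, prog[pos + 4:end]))
        pos = end
    return lines


# Line 80's length byte covers an embedded CR plus what looks like a line 85.
hidden = make_line(85, b"max_choice%=3:")
prog = make_line(80, b"*TuneObj" + hidden) + make_line(90, b"REPEAT") + bytes([CR, 0xFF])

print([n for n, _ in walk_by_length(prog)])  # [80, 90]
print([n for n, _ in walk_by_cr(prog)])      # [80, 85, 90]
```

Following the lengths sees only lines 80 and 90, while scanning for CRs also turns up the extra line 85, which matches the LIST output quoted earlier.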
I guess that makes sense; there's no point making the interpreter code slower and/or longer in a quest for perfect consistency on technically broken input, after all.
Thanks for closing these issues off, I've tagged up a v0.10-pre3 and posted to stardot about it.