MEGA65/mega65-user-guide

PETSCII $60...$7F and $E0-$FE are undefined and MUST NOT be listed

Closed this issue · 12 comments

Describe where we can find the problematic topic
The appendix appendix-petsciicodes.tex

Describe the solution you'd like
The table should list the defined values only and not the undefined values.
Currently, the list includes the range $60...$7F (96...127) and lacks $C0...$DF (192...223).

I'm quoting from Discord:

[21:12] dddaaannn: Does anyone have an informed opinion about when an ASCII-to-PETSCII conversion of an uppercase letter should use the $C1-$DA PETSCII range vs. the $61-$7A range? It seems like a bunch of tools (e.g. cross assemblers) prefer the higher range but I can't figure out why, or if it matters.
[22:57] Rhialto: Because it is based on older, uppercase-only ASCII, PETSCII is undefined in the range where ASCII has its lower case letters. And those values +128.
[22:59] Rhialto: The ROM will give you the values with asc("A") -> 193
[23:01] Rhialto: Do not rely on print chr$($61) because print chr$($c1) will seemingly give you the same. If you try both on a PET (with decimal values instead of hex of course), both print statements will give you different output.
[23:02] Rhialto: The Wikipedia page for PETSCII gets this spectacularly wrong. See the Discussion sub-page with already very old comments from yours truly.
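
As a rough illustration of the conversion discussed in the quote, here is a minimal Python sketch that maps ASCII uppercase letters to the $C1-$DA range, matching what the ROM returns for asc("A"). The function name `ascii_to_petscii` is hypothetical, and the sketch assumes the lowercase/uppercase text mode. For simplicity it only handles letters and the $20-$3F punctuation/digit range; it is not taken from any particular cross assembler.

```python
def ascii_to_petscii(text: str) -> bytes:
    """Convert ASCII text to PETSCII, using the canonical ranges only.

    Hypothetical helper, assuming the lowercase/uppercase text mode:
    ASCII uppercase goes to $C1-$DA, ASCII lowercase to $41-$5A.
    """
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if 0x41 <= code <= 0x5A:        # ASCII 'A'..'Z'
            out.append(code + 0x80)     # -> $C1..$DA (never $61..$7A)
        elif 0x61 <= code <= 0x7A:      # ASCII 'a'..'z'
            out.append(code - 0x20)     # -> $41..$5A
        elif 0x20 <= code <= 0x3F:      # space, punctuation, digits
            out.append(code)            # same value in both encodings
        else:
            # Other characters are deliberately left out of this sketch.
            raise ValueError(f"unsupported character: {ch!r}")
    return bytes(out)


if __name__ == "__main__":
    print(ascii_to_petscii("A")[0])     # the value asc("A") gives in the quote
```

With this mapping the output never contains a value in $60-$7F, so it does not depend on how a given machine prints that range.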

To expand on this a bit:
Valid ranges are

  • 00-1F: control codes (0-31)
  • 20-3F: punctuation and digits (32-63)
  • 40-5F: letters (uppercase or lowercase) (64-95)
  • undefined: 60-7F (96-127)
  • 80-9F: control codes (128-159)
  • A0-BF: graphic characters corresponding to shifted (+128) punctuation and digits (160-191)
  • C0-DF except DE/222: graphics or uppercase letters (192-223)
  • undefined: E0-FE (224-254)
  • FF: pi, based on screen codes it would be expected to be shift-^ -> 222
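
The ranges above can be sketched as a small Python classifier. This is only an illustration of the list as written; the function name `classify_petscii` and the label strings are made up for this example. Treating $DE (222) as pi follows the note that 222 is the value expected from shift-^.

```python
def classify_petscii(code: int) -> str:
    """Return the range a PETSCII code falls into, per the list above.

    Hypothetical helper; the label strings are illustrative only.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("PETSCII codes are 0..255")
    if code in (0xDE, 0xFF):
        return "pi"                      # $FF, and $DE where shift-^ is expected
    if code <= 0x1F:
        return "control"                 # $00-$1F
    if code <= 0x3F:
        return "punctuation/digits"      # $20-$3F
    if code <= 0x5F:
        return "letters"                 # $40-$5F
    if code <= 0x7F:
        return "undefined"               # $60-$7F
    if code <= 0x9F:
        return "control"                 # $80-$9F
    if code <= 0xBF:
        return "shifted graphics"        # $A0-$BF
    if code <= 0xDF:
        return "graphics/uppercase"      # $C0-$DF (except $DE, handled above)
    return "undefined"                   # $E0-$FE


if __name__ == "__main__":
    for value in (33, 97, 193, 255):
        print(value, classify_petscii(value))
```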

The mistake many people make, seemingly including some who wrote the Wikipedia page, is to look at what you get if you use PRINT CHR$(xx). This is not the right approach, since there are undefined ranges, and what you get from them is inconclusive. Try PRINT CHR$(97) on the PET (VICE is fine) and on the VIC/64/65. The PET gives you !. The later ones print A (or the shift-A spade graphic character). On the other hand, PRINT ASC("shift-A") will give you 193 and PRINT ASC("!") gives 33. That is the approach that should be used to determine the correspondence between PETSCII values and characters. The PET ROM code path that converts PETSCII to screen codes was apparently changed enough for the VIC-20 that the effective result for the undefined ranges changed.

A strange exception is character $FF which is pi. This is explicitly handled separately in the ROMs.

One conclusion one can draw from this is that if we need more control characters, we could use the undefined ranges. Nobody should be using those...

Change the table accordingly, and the text below it as well.

#605 attempts to improve this but it is untested (I didn't set up local formatting)

The weirdness with pi (the keyboard produces 222, but when it is printed to the screen and converted back to PETSCII you get 255) dates back to the PET 2001, the first ROM version. It is weird but "not a bug".

Can you cite a resource that describes ranges of PETSCII codes as undefined? (I don't see Wikipedia talk page comments on this subject, but maybe I'm looking in the wrong place?)

I can respect that these ranges may not have been intended to be defined. Much like how people code against undefined behaviors of the C64 ROM, there are decades of precedent where CHR$(x) has had consistent behavior for undefined ranges. If Commodore were alive today, they'd have a hard time reversing this precedent, despite a documented warning against use of these character ranges. (Of course, the C65 pushed ahead with a few new definitions for the upper function keys and the like.)

Technically, we're in a position to declare behaviors of the MEGA65 ROM as defined or undefined, even if they were widely used on the C64. Out of necessity, I describe ROM internals as undefined because we're still improving the ROM and internals may shift between versions. I routinely have to disappoint someone when they admit to creating a dependency on an internal memory location that they learned on the C64 and happens to work in a given version of the MEGA65 ROM, but is not formally documented or supported. I'm concerned that trying to declare ranges of PETSCII codes as undefined in the MEGA65 ROM would generate bad feelings in the community for very little gain. We would be more likely to consider declaring the current CHR$() behavior as official for all 256 code points, citing C65 compatibility as the primary target.

One thing I think we might consider is adopting more non-English letters in the official default character set. Given how many German users there are, it seems like a miss to not have German letters in the default set. You probably know better than I do how that was implemented on German Commodores. If the undefined PETSCII ranges were involved, that might be a compelling use.

... Michael Steil's explorer is informative with regard to the non-English PETSCII implementations: https://www.pagetable.com/c64ref/charset/

If we adopted the "undefined" layout you suggest, then technically we could make $ED-$FE match the German C128 charset. German also changes $00 and $40 to ß, which I don't think we should do in the default charset. The German lettering use case is probably best met with a custom charset, which is trivial for programs to provide. We could provide a built-in but non-default FONT D that matches the German C128.

The Wikipedia discussion page is at https://en.wikipedia.org/wiki/Talk:PETSCII.

As it turns out, finding a Commodore document that explicitly says these ranges are undefined is kind of hard.

I tried going back to the very first manual,
https://www.zimmers.net/anonftp/pub/cbm/manuals/pet/PET_User_Manual_(2001-8).pdf page 7 and 8. (All page numbers as numbered on the page)
However it has a stubborn tendency to list a normal ASCII table instead of PETSCII. (Aside: On page 11 it even claims you get lower case characters if you don't use shift... which was totally not true on the 2001, even in lowercase mode. I suspect the documentation team was working with outdated information.)

That sloppy tendency seems to recur in other manuals too, even combining ASCII with screen codes in one table, which makes no sense at all (PETSCII has no { or } characters). That is in https://www.zimmers.net/anonftp/pub/cbm/manuals/pet/bedienungshandbuch_cbm_2001_3001.pdf page 38. PDF page 32 of https://www.zimmers.net/anonftp/pub/cbm/manuals/pet/C=GermanTechDocs.pdf lists all values 0-255, including PET-style duplicates.

A better table to argue that the values are undefined is https://www.zimmers.net/anonftp/pub/cbm/manuals/pet/bedienungshandbuch_cbm_4032.pdf page 44. This table lists 0-95 and 128-223 only, and leaves the rest unmentioned. (It does, however, have another confusion in the characters 91-95/219-223, where it swaps the upper/lowercase versions when switching between text mode and graphic mode.)

[Image: PETSCII table from the CBM 4032 manual, page 44]

What I would find a very good argument for the undefinedness of 96-127 and 224-254 is that Commodore actually changed the mapping when printing them. I made some screenshots from a PET and a 64, but the change occurred first on the VIC-20 (CBM-II behaves like the PET).

[Screenshots from VICE: output on a PET and on a 64]

It could be that Commodore changed the mapping on purpose, and that the new version is supposed to be defined, maybe to get closer to outputting ASCII. But that doesn't really work, since upper case and lower case are then still swapped. In any case, https://archive.org/details/c64-programmer-ref/page/n401/mode/2up lists the characters like the MEGA65 table did. I guess it was copied from there initially.

The undefinedness would also be supported by the fact that you don't get the values 96-127 or 224-255 from the keyboard.

It's all a bit indirect "proof", I'm afraid :-(

But I would strongly argue that even if we don't call these ranges "undefined", at least we leave them out of the table, and mention separately that they are duplicates, simply because they are not the canonical values. That is essentially the change in my merge request.
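
To make the "duplicates" idea concrete, here is a minimal Python sketch that folds the non-canonical values onto canonical ones, following the VIC-20/64 printing behavior. The function name `canonical_petscii` is hypothetical. The 96-127 to 192-223 mapping matches the CHR$(97) example in this thread, and 255 to 222 follows the pi note. The 224-254 to 160-190 mapping is an assumption based on commonly published C64 tables and is not stated in this thread. The PET prints these ranges differently, so this sketch does not apply to it.

```python
def canonical_petscii(code: int) -> int:
    """Fold duplicate PETSCII values onto their canonical equivalents.

    Hypothetical helper modelling VIC-20/64-style printing only.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("PETSCII codes are 0..255")
    if 0x60 <= code <= 0x7F:
        return code + 0x60   # 96-127 prints like 192-223
    if 0xE0 <= code <= 0xFE:
        return code - 0x40   # assumption: 224-254 prints like 160-190
    if code == 0xFF:
        return 0xDE          # pi: 255 on screen readback, 222 from the keyboard
    return code              # everything else is already canonical


if __name__ == "__main__":
    print(canonical_petscii(97))    # same value as ASC("shift-A")
```

A documentation table built this way would list only the values for which `canonical_petscii(code) == code` and mention the rest as duplicates.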

I kind of like the idea to use these ranges for extra printable characters (such as German ones, but others would work too). It would be less surprising for people who use these values unintentionally, since they have always been printable, and not control characters.

The way German characters were done on the PET/64 etc. was, I think, by replacing some other characters, typically [ ] and @ (maybe more).

If I could find an online version of the PET editor disassembly, I could find and show the place where the shift key adds 128 to the PETSCII code. Any table that doesn't have this exact difference of 128 between the list of lower and higher values is misleading for that reason too. The original keyboard here or here shows this nicely.

I'm on board with making it clear in our documentation which PETSCII values are return values from BASIC GET/GETKEY, KERNAL BASIN, and hardware PETSCIIKEY, and describe how the others display redundant glyphs when printed.

I'm not on board with declaring code points as undefined, as if we might change their printing behavior in a future ROM. I believe it would provide too little value and cause too much grief to change them, or even just to declare them possible to change. If there's a fun historical note, we can include it, but they should continue to print what they currently print as officially supported behavior.

For fun, here are the German C128 "DIN" charsets as rendered by a MEGA65. Now that you have me thinking about it, I'm wondering about maybe adding this (or something like it) in as an alternate FONT D. :)

[Images: German C128 "DIN" charsets rendered on a MEGA65]

For more fun, here's my write-up of PETSCII and the PET typeface from a while ago: https://dansanderson.com/mega65/petscii-codes/

Not saying the values are undefined is fine with me.

I found a disassembly of one version of the PET editor ROM ($E000-$E7FF, it's the one that exists in the most different versions).

http://www.zimmers.net/anonftp/pub/cbm/firmware/computers/pet/edit-4-40-n-60hz-901499-01.dis.txt
Look for label FDX.
$E4F0: A9 01 LDA #$01
$E4F2: 85 98 STA SFDX ; Flag: Print Shifted Chars.
This sets the flag when the keyboard decode table contains a 0 (for shift).
Further down, the flag is used:

$E563: 46 98     LSR SFDX   	; Flag: Print Shifted Chars.
$E565: 90 13     BCC L_E57A
$E567: EA        NOP
$E568: EA        NOP
$E569: EA        NOP
$E56A: EA        NOP
$E56B: EA        NOP
$E56C: EA        NOP
$E56D: EA        NOP
$E56E: EA        NOP
$E56F: EA        NOP
$E570: EA        NOP
$E571: EA        NOP
$E572: EA        NOP
$E573: EA        NOP
$E574: EA        NOP
$E575: EA        NOP
$E576: EA        NOP
$E577: EA        NOP
$E578: 09 80     ORA #$80
L_E57A:
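
In higher-level terms, the fragment above can be read as the following Python sketch. The function name `apply_shift` is hypothetical; it only models the effect of the LSR/BCC/ORA sequence on the key value and ignores the rest of the editor ROM.

```python
def apply_shift(unshifted: int, shift_flag: bool) -> int:
    """Model the effect of LSR SFDX / BCC L_E57A / ORA #$80.

    LSR moves bit 0 of the shift flag into carry; BCC skips the ORA when
    the flag was clear; otherwise ORA #$80 sets the high bit, adding 128.
    """
    if not 0 <= unshifted <= 0x7F:
        raise ValueError("expected an unshifted key value in 0..127")
    if shift_flag:
        return unshifted | 0x80   # shifted key = unshifted value + 128
    return unshifted


if __name__ == "__main__":
    print(apply_shift(0x41, True))   # shift-A
```

This is why a table of shifted values should differ from the unshifted ones by exactly 128.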

I attached a proposed revision to the pull request, let me know what you think.

@dansanderson That looks nice!

Implemented here: 815db62

Thanks very much!