Support for code page 437
codengine opened this issue · 10 comments
It could theoretically be fixed. The problem is actually perfectly understood.
Like all European releases, SQ4 was made for DOS and uses code page 437. If you compare the font resources to CP437, you'll find they line up; the box-drawing characters are left out, for obvious reasons.
SCI Companion is a Windows application. Specifically, it's an ANSI application, not a Unicode one. This means it assumes code page 1252 on most Western systems. Win-1252 matches the first 256 Unicode code points, except that 27 of the 32 positions from 0x80 to 0x9F hold punctuation and symbols instead of control codes, and the remaining five are unassigned.
And that's why, if you look at the font resources, you'll find the ü in "Natürlich" is in position 0x81, matching DOS-437, but it's unassigned in Win-1252, so nothing shows up. In Win-1252 and Unicode, ü would be 0xFC.
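The mismatch can be reproduced with Python's built-in `cp437` and `cp1252` codecs. This is only an illustration of the byte values described above, not how SCI Companion (a C++/MFC program) handles them; `decode_byte` is a hypothetical helper.

```python
from typing import Optional


def decode_byte(value: int, codec: str) -> Optional[str]:
    """Decode one byte; return None if the code page leaves it unassigned."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


# The byte stored for the "ü" in "Natürlich":
print(decode_byte(0x81, "cp437"))   # ü
print(decode_byte(0x81, "cp1252"))  # None: unassigned in Win-1252
# In Win-1252 (and Unicode), ü lives at 0xFC instead.
print(hex("ü".encode("cp1252")[0]), hex(ord("ü")))  # 0xfc 0xfc
```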
I could add a translation step from 437 to 1252 and back for editing, but that would require some way to tell which of the two a game already uses, especially since any new game made with SCI Companion would naturally be in Win-1252, or even UTF-8 if you target my SCI11+. That's also one reason why I have a bunch of font resources mapped to Win-1252 on my server.
Either way, the font resources themselves don't care which symbol goes where.
One SUPER SILLY idea would be to make the message grid and text entry fields use the actual font resources (just #0?), but then you still wouldn't be able to type accented characters.
I used TraDuSCI for translations, and it handles this by converting the messages to Windows-1252 first and back to 437 when saving.
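A minimal sketch of that convert-for-editing, convert-back-on-save round trip, again using Python's codecs purely for illustration (the function names are hypothetical, and this is not TraDuSCI's actual code). One thing any such round trip has to deal with: CP437 has characters with no Win-1252 equivalent (box drawing, Greek letters) and Win-1252 has characters with no CP437 equivalent (such as €), so a strict conversion can fail in either direction.

```python
def dos437_to_win1252(data: bytes) -> bytes:
    """CP437 resource bytes -> Win-1252 bytes for editing in an ANSI program."""
    return data.decode("cp437").encode("cp1252")


def win1252_to_dos437(data: bytes) -> bytes:
    """Edited Win-1252 bytes -> CP437 bytes for saving back into the game."""
    return data.decode("cp1252").encode("cp437")


stored = b"Nat\x81rlich"              # as found in the German SQ4 resources
editable = dos437_to_win1252(stored)  # b"Nat\xfcrlich"
assert win1252_to_dos437(editable) == stored

# Neither direction is total: strict codecs raise instead of silently corrupting.
for convert, sample in ((dos437_to_win1252, b"\xc4"),   # CP437 box-drawing "─"
                        (win1252_to_dos437, b"\x80")):  # Win-1252 "€"
    try:
        convert(sample)
    except UnicodeError as err:
        print(type(err).__name__)  # UnicodeEncodeError
```

What to do when a character can't be converted (refuse to save, warn, or substitute) is a design choice this sketch deliberately leaves open.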
Maybe you could add a prompt when a game is opened for the first time, or an entry in the game.ini that tells which charset to use (default windows-1252 for example).
I am pretty sure that it is always going to be 437 for old games in the original resources, even for German versions. As far as I understood it, in the messages and texts I can use 437 just fine. In contrast, in the source files I have to use the characters as they are defined in the fonts, like \9E.
I see you had basically the same idea as I have.
I was thinking of a dropdown list, with the value saved in game.ini, to say either DOS-437 or Windows-1252. If it's the former, script (de)compilation converts string literals between Win-1252 and DOS-437, and the text and message editors convert on load/save. If it's the latter, do nothing: the game's codepage and the program already agree.
If it's one of the Japanese versions (except for SQ4, which has a bespoke character set), set it to Win-1252 to skip translation, and run SCI Companion with AppLocale to make it use Shift-JIS instead of Win-1252.
Now, I don't like MFC at all, so I was thinking of bringing back the script language dropdown and turning it into a codepage dropdown, since the SCI Studio dialect isn't really supposed to be used anymore.
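The load/save side of that design could dispatch on the setting roughly like this. The setting strings follow the dropdown labels above, but how the value is actually spelled in game.ini, and the names `GAME_CODECS`, `on_load`, and `on_save`, are assumptions for illustration.

```python
# Python codec for each (assumed) game.ini setting; None means "no translation".
GAME_CODECS = {"DOS-437": "cp437", "Win-1252": None}
EDITOR_CODEC = "cp1252"  # what an ANSI program sees on a Western Windows system


def on_load(data: bytes, setting: str) -> bytes:
    """Bytes from a text/message resource or string literal -> editor bytes."""
    game_codec = GAME_CODECS[setting]  # KeyError for an unknown setting
    if game_codec is None:
        return data  # fast path: game and editor already agree
    return data.decode(game_codec).encode(EDITOR_CODEC)


def on_save(data: bytes, setting: str) -> bytes:
    """Editor bytes -> bytes to store in the game's resources."""
    game_codec = GAME_CODECS[setting]
    if game_codec is None:
        return data
    return data.decode(EDITOR_CODEC).encode(game_codec)
```

Because the Win-1252 branch never decodes anything, Shift-JIS bytes in a Japanese game set to Win-1252 (with AppLocale handling display) pass through byte for byte.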
As far as I understood it, in the messages and texts I can use 437 just fine.
I think you've misunderstood. What you're seeing in messages and text is DOS-437 data misinterpreted as Win-1252. If you replaced that Natrlich with Natürlich, saved, and ran, you'd see... Natrlich again, or perhaps Natßrlich, depending on how the engine handles an 0xFC (ü in Win-1252) when the font resource only goes up to 0xE1 (ß).
With the above change, you'd see Natürlich in the text/message editor, stored in Win-1252 with an 0xFC, but it would save Natürlich in DOS-437 with an 0x81.
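To make that failure mode concrete, here is a sketch that flags which bytes of an encoded string fall past the last glyph of a font. The 0xE1 upper bound comes from the comment above; `missing_glyphs` is a hypothetical helper, and real SCI font resources are not parsed here.

```python
from typing import List

FONT_LAST_GLYPH = 0xE1  # the font discussed above ends at ß (0xE1 in CP437)


def missing_glyphs(text: str, codec: str, last_glyph: int = FONT_LAST_GLYPH) -> List[int]:
    """Byte values of `text`, encoded with `codec`, that the font cannot draw."""
    return [b for b in text.encode(codec) if b > last_glyph]


# Typing ü without conversion stores Win-1252 0xFC, past the end of the font.
print([hex(b) for b in missing_glyphs("Natürlich", "cp1252")])  # ['0xfc']
# With conversion, the same text is saved as CP437, and 0x81 is within range.
print(missing_glyphs("Natürlich", "cp437"))  # []
```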
By the way, I just wanted to say how great it is that you've taken over maintenance of this project.
I'm currently using it for a German translation of Space Quest 4; that's where this "issue" comes from. There will probably be more of those in the future.
Status update: the dropdown box thing is handled, and the choice is correctly recorded in game.ini. Newly created games default to Win-1252 (a game.ini is made on the spot) and opening a game defaults to DOS-437 (because there is no game.ini or it has no Codepage value).
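The defaulting rule in that update can be sketched with Python's `configparser`. Only the key name `Codepage` comes from the thread; the section name `General`, the value spellings, and the function names are hypothetical, and SCI Companion's real game.ini handling is C++ code not shown here.

```python
import configparser
from pathlib import Path

SECTION = "General"  # hypothetical section name
KEY = "Codepage"     # key name mentioned in the status update


def read_codepage(ini_path: Path) -> str:
    """Codepage of an existing game: DOS-437 if game.ini or the key is missing."""
    config = configparser.ConfigParser()
    config.read(ini_path)  # a missing file is silently skipped
    return config.get(SECTION, KEY, fallback="DOS-437")


def create_game_ini(ini_path: Path) -> None:
    """Write a fresh game.ini for a newly created game, defaulting to Win-1252."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep "Codepage" capitalised when writing
    config[SECTION] = {KEY: "Win-1252"}
    with open(ini_path, "w") as f:
        config.write(f)
```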
Next up: writing a translation function that exits quickly when the game's codepage is set to Win-1252, so there's something to call.
Awesome, you can't imagine what a relief it will be not to have to export the messages just to edit them without messing up the umlauts.
I've been stuck offline for a bit and spent that time tracing through the decompilation process to find a good spot to put these translation calls.
It seems to work on SCI 1.1, with the separate heap resources. It doesn't work right yet on the older format, where the strings are stored in the script resource itself.
Just a quick update.
Very nice, big thank you! ❤️ I'm going to test it later today
It works as intended, thanks again.


