BGforgeNet/Fallout2_Unofficial_Patch

Convert to ascii?


The original text uses different characters for the same punctuation. For example, "…" (a single character) vs "..." (three dots). It'd be good to standardize it.

However, this is not just punctuation; there are other non-ASCII characters too:

{325}{}{(Shakes his head slightly.) Your naiveté has…no place here. Words are not

I'm not sure if they are displayed correctly in the game; I need to check. If they are, it'd be a waste to drop them.

It seems both are displayed correctly in the game.

I guess it's better to keep the letters. As for punctuation, I'm not sure.
The main variations are:

  • … vs ...
  • ’ vs '
  • – vs -

There were some discussions about punctuation on NMA, but they didn't reach a conclusion. Personally, I prefer using ASCII variants where possible (except for the dash). Maybe we could use the vanilla (1.02d) text as the base: check whether the single ellipsis character, typographic apostrophes/quotation marks, etc. or the simple typewriter/keyboard variants are used more, and standardize the rest to match (like a simple majority vote)?
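As a rough sketch of that majority vote, the hypothetical shell helpers below (`count_occurrences` and `majority_variant` are illustrative names, not existing tools) count how often each variant appears under a directory. They assume the .msg files are UTF-8 encoded. Because `grep -o` prints one line per match, this counts occurrences rather than matching lines, so a line with two ellipses counts twice.

```shell
# Hypothetical helper: count_occurrences DIR STRING
# Counts every occurrence of the fixed string STRING in files under DIR.
# -o prints one match per line, so wc -l counts matches, not lines.
count_occurrences() {
  grep -RoF -- "$2" "$1" | wc -l | tr -d ' '
}

# Hypothetical helper: majority_variant DIR TYPOGRAPHIC ASCII
# Prints whichever variant occurs more often under DIR.
# Ties go to the ASCII variant.
majority_variant() {
  typo=$(count_occurrences "$1" "$2")
  ascii=$(count_occurrences "$1" "$3")
  if [ "$typo" -gt "$ascii" ]; then
    printf '%s\n' "$2"
  else
    printf '%s\n' "$3"
  fi
}
```

For example, `majority_variant english '…' '...'` would print the winning ellipsis form for the `english` tree; pointing it at a copy of the vanilla 1.02d text instead would match the "vanilla as base" idea.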

I think the punctuation inconsistency got worse in RP because different parts were written by different writers.

Some lines may use both types of characters, plus there are other corner cases, but in general this should be representative:

$ grep -R '…' english | wc -l
2434
$ grep -R '\.\.\.' english | wc -l
3533

Ellipsis and triple dot are about equal. My problem with it is that I don't know how people even type an ellipsis. Do they create msg files in Word?

$ grep -R '’' english | wc -l
3746
$ grep -R "'" english | wc -l
17637

The ASCII quote is much more prevalent, as expected.

$ grep -R '\—' english/ | wc -l
46
$ grep -R '-' english | wc -l
4514

It appears that I was mistaken about the em dash: it's almost unused in the original, and in UP as well. But there are also en dashes, and they are also few:

$ grep -R '\–' english/ | wc -l
28

In fact, almost all em/en dashes seem to be coupled with hyphens, as in
{130}{}{I’m so bored -— none of these gauges work, anyhow.}
No idea how these ended up combined...
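One possible way to list every such hybrid is sketched below as a hypothetical function (`find_dash_hybrids` is an illustrative name). It looks for a hyphen glued to an em dash (—) or en dash (–) on either side. The pattern uses alternation rather than a bracket expression so it still matches correctly if grep runs in the byte-oriented C locale; it assumes UTF-8 encoded files.

```shell
# Hypothetical helper: find_dash_hybrids DIR
# Prints file:line:text for each line where a hyphen touches an
# em dash or en dash, e.g. "bored -— none".
# Alternation keeps each multibyte dash intact even in LC_ALL=C.
find_dash_hybrids() {
  grep -RnE -- '-—|—-|-–|–-' "$1"
}
```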

Next, there are some cases of double hyphens (which, as far as I understand, are usually used to represent an em dash when it's not available):

$ grep -R '\-\-' english | wc -l
308

Lastly, here are all the other non-ASCII lines (in UP):

./english/dialog/hcmarcus.msg:76:{159}{}{How ‘bout more questions while I think it over.}
./english/dialog/hcmarcus.msg:93:{172}{mcs20}{Somebody busted two lunatics out of the jail in the bank. Find ‘em. Tell me and I'll stuff ‘em back in.}
./english/dialog/hcmarcus.msg:98:{176}{mcs21}{What exactly do you want? Okay. We've got some folks missing. No idea what's happened. Find ‘em, get $500.}
./english/dialog/fcgudpea.msg:53:{152}{}{I hear that the Emperor was only a façade for Ken Lee. }
./english/dialog/fcbadpea.msg:51:{150}{}{I hear that the Emperor was only a façade for Ken Lee. }
./english/dialog/hcfrank.msg:33: expected to forget about the past and live peaceful with ‘em. Not me! Live free or die!}
./english/dialog/hcfrank.msg:39: it to ‘em. Marcus is the ringleader of that whole circus.}
./english/dialog/hcfrank.msg:51:{133}{}{The law ‘round here is based on mutant love. Do you hate mutants?}
./english/dialog/kcggcust.msg:59:{162}{}{‘Bout time, tribal.}
./english/dialog/hcencha.msg:9:{105}{}{Me work ‘yere?}
./english/dialog/ncbarten.msg:21:{215}{}{You want two ½ ounce bags, or a canister?}
./english/dialog/gcfolk.msg:106:{282}{}{Nuke Vault City ‘til it glows!}
./english/dialog/ncmcgee.msg:110:# 13. GIVE McGEE BACK ½ (12)
./english/dialog/dcsheila.msg:20:{167}{}{I like ‘em big and dumb.}
./english/dialog/nccorbro.msg:461:{938}{}{::Sigh:: With the cube gone, maybe I should take up macramé.}
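A listing in this `file:line:text` format can be produced with something along these lines (a sketch; `list_non_ascii` is a hypothetical name). Forcing the C locale makes grep compare raw bytes, so any byte from 0x80 to 0xFF, which covers every UTF-8 multibyte character, marks a line as non-ASCII. It lists every such line, punctuation included, so getting only the "other" lines would still mean filtering out the ellipsis, quote and dash cases.

```shell
# Hypothetical helper: list_non_ascii DIR
# Prints file:line:text for every line containing a byte >= 0x80.
# printf's octal escapes build the bracket expression [\x80-\xFF]
# portably; LC_ALL=C makes grep treat the range as raw byte values.
list_non_ascii() {
  LC_ALL=C grep -Rn -- "$(printf '[\200-\377]')" "$1"
}
```

Running `list_non_ascii .` from the repository root would print paths in the same `./english/dialog/...` form as above.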

My problem with it is that I don't know how people even type an ellipsis. Do they create msg files in Word?

Dunno, maybe the word processor they used had autocorrection. That could also explain the mixed em/en dash + hyphen combinations (software not smart enough to do a full correction, or something like that).

Judging from the results, I think ASCII variants should be used instead of typographic ones, including the ones used in abbreviations (like ‘em).
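A minimal sketch of that conversion, assuming the .msg files are UTF-8 encoded (otherwise the byte sequences in the patterns won't match): the hypothetical `asciify_punctuation` function rewrites the ellipsis and curly quotes in place with sed, keeping a `.bak` copy of each file. Curly double quotes are included as an assumption, since the thread only counted apostrophes. Accented letters and symbols such as é, ç and ½ are deliberately left alone.

```shell
# Hypothetical helper: asciify_punctuation FILE...
# Assumes UTF-8 input. Replaces typographic punctuation with ASCII
# in place; -i.bak (suffix attached) works with GNU and BSD sed.
asciify_punctuation() {
  sed -i.bak \
    -e 's/…/.../g' \
    -e "s/’/'/g" \
    -e "s/‘/'/g" \
    -e 's/“/"/g' \
    -e 's/”/"/g' \
    "$@"
}
```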

What about the hyphen+dash hybrids: should we replace them with double hyphens?
They do look a little shorter:
[Screenshot from 2019-07-04 15-18-06]

In game, the hyphen is 6 px wide, the en dash 5 px, and the em dash 8 px.
I'd vote yes for replacing hyphen+dash hybrids with two hyphens. Alternatively, keep em dashes and replace all double hyphens and hyphen+dash hybrids with em dashes (while still replacing single en dashes with hyphens).
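The first option could be sketched as another hypothetical sed helper (`normalize_dashes`, same UTF-8 assumption as before). Hybrids are collapsed to a double hyphen before single dashes are handled, so the dash inside a hybrid isn't converted twice. Turning a lone em dash into `--` is an extra assumption here, not something the thread settled.

```shell
# Hypothetical helper: normalize_dashes FILE...
# Assumes UTF-8 input; edits in place and keeps a .bak copy.
# 1) hyphen+em/en dash hybrids (either order) -> "--"
# 2) lone em dash -> "--"   (assumption, not decided in the thread)
# 3) lone en dash -> "-"
normalize_dashes() {
  sed -i.bak \
    -e 's/-—/--/g' -e 's/—-/--/g' \
    -e 's/-–/--/g' -e 's/–-/--/g' \
    -e 's/—/--/g' \
    -e 's/–/-/g' \
    "$@"
}
```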

OK, let's try double hyphens. It's easy to change if needed.
In fact, thinking about it again, in all these cases I'd prioritize whichever variant is easier on the eyes, even if it's contrary to the original.

So maybe we can revisit this later if there are complaints.