OBD:Text encoding: Difference between revisions

Revision as of 01:39, 5 January 2022

655 bytes added , 5 January 2022

→‎(A1,A0) issue: ok, this is as much as I know

Geyser

@@ Line 874: / Line 874: @@
 Unlike other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats high-bit characters as part of a two-byte control sequence (a provision for Asian encodings), and therefore fails to render any character from the extended ASCII range. This happens twice in English Oni, because the ellipsis character (…), encoded as 0xC9, was accidentally used in [[Quotes/Consoles/level_19d|These]] [[Quotes/Consoles/level_19e|Two]] text consoles in place of three consecutive dots (probably auto-substituted by a text editor). The result is that the two lines using a "…" are cut off at the offending character.
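The truncation described above can be modeled with a minimal Python sketch. It only reproduces the observed symptom (the line is cut at the first high-bit byte) and makes no claim about how the US engine is actually implemented; the function name and the sample string are hypothetical.

```python
def render_us_engine(raw: bytes) -> str:
    """Model the observed symptom only: keep the text up to the first high-bit byte.

    Hypothetical helper, not taken from the engine.
    """
    for index, byte in enumerate(raw):
        if byte >= 0x80:
            # The byte is treated as the start of a two-byte sequence,
            # and the rest of the line is lost.
            return raw[:index].decode("ascii")
    return raw.decode("ascii")


# Hypothetical console line where an editor replaced "..." with 0xC9.
broken_line = b"Wait" + bytes([0xC9]) + b" more text"
safe_line = b"Wait... more text"

print(render_us_engine(broken_line))
print(render_us_engine(safe_line))
```

Replacing the 0xC9 byte with three plain dots, as in the second sample, keeps the whole line in the 7-bit range.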
-===(A1,A0) issue===
+===Invalid EUC-CN input===
-Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE (single US-ASCII characters are also allowed in theory).
+Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE (single US-ASCII characters are also allowed in theory), and anything else is technically illegal.
-The text strings in the Chinese version mostly conform to the EUC-CN scheme, except for the rare occurrence of the (A1,A0) sequence. This is not a valid control sequence under any common extension of EUC-CN, and in any case it does not correspond to any pixel data within xf_font.dat, which only covers the standard 94x94 ''quwei'' plane, corresponding to a strict 0xA1-0xFE range for the two encoding bytes. Any text string including the (A1,A0) sequence is broken off at the offending character: this is known to occur for
+The text strings in the Chinese version mostly conform to the EUC-CN scheme, except for the (A1,A0) sequence, which occurs in a few subtitles and is rendered with a blank glyph (i.e., a space between valid glyphs, indistinguishable from an ordinary ideographic space), apparently due to some kind of wraparound. At the time of writing it is not known what was meant by the (A1,A0) sequence, as it doesn't seem to be a valid control sequence under any common extension of EUC-CN.
-Another illegal sequence is (0xA3,0x89), which occurs only in the SUBTmessages entry xdash1 (five identical glyphs at the end of the string).
+Another illegal sequence is (0xA3,0x89), which occurs only in the SUBTmessages entry xdash1 (five identical glyphs at the end of the string) and is somehow rendered as a ㈢, which would normally be encoded with (A2,E7). Such an improbable substitution is likely unintentional, and it is not known what the intended glyph was.
+At the time of writing the apparent wraparound behavior has not been investigated thoroughly, but it is established that some illegal code points are not mapped to a valid glyph at all, and instead result in garbled text or a crash. Invalid EUC-CN input may be what causes most Chapters of the Chinese Oni version to crash on modern Windows systems (through varying degrees of memory corruption), although this has not been investigated thoroughly either.
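The validity rule above (both bytes in 0xA1-0xFE, single US-ASCII bytes allowed) can be checked with a short Python sketch. This is only an illustration for scanning text strings for illegal sequences. It does not reproduce the engine's wraparound behavior, and it assumes that every high-bit byte starts a two-byte sequence. The function names and the sample bytes are hypothetical.

```python
LOW, HIGH = 0xA1, 0xFE  # strict EUC-CN range for both bytes

# 94 x 94 code points, matching the 8,836 figure given above.
CODE_POINTS = (HIGH - LOW + 1) ** 2


def is_valid_pair(lead: int, trail: int) -> bool:
    """Return True if (lead, trail) is a proper two-byte EUC-CN sequence."""
    return LOW <= lead <= HIGH and LOW <= trail <= HIGH


def find_illegal_sequences(raw: bytes) -> list[tuple[int, bytes]]:
    """List (offset, bytes) for every sequence that breaks the rule.

    Assumption: any high-bit byte is the lead of a two-byte sequence.
    """
    issues = []
    i = 0
    while i < len(raw):
        if raw[i] < 0x80:
            i += 1  # single US-ASCII byte, allowed
            continue
        pair = raw[i:i + 2]
        if len(pair) < 2 or not is_valid_pair(pair[0], pair[1]):
            issues.append((i, pair))
        i += 2
    return issues


# Hypothetical sample: a valid (A2,E7) followed by the two illegal sequences.
sample = b"ab" + bytes([0xA2, 0xE7, 0xA1, 0xA0]) + b"c" + bytes([0xA3, 0x89])
for offset, seq in find_illegal_sequences(sample):
    print(offset, seq.hex())
```

On this sample the scan reports (A1,A0) and (A3,89) and leaves (A2,E7) alone.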
 ===Over-tall text===