OBD:Text encoding: Difference between revisions

m
Line 877: Line 877:
Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE and anything else is technically illegal (single US-ASCII characters could occur in theory, but are not handled properly by the custom text engine, xfhsm_oni.dll).
Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE and anything else is technically illegal (single US-ASCII characters could occur in theory, but are not handled properly by the custom text engine, xfhsm_oni.dll).


The text strings in the Chinese version mostly conform to the EUC-CN scheme, except for the (A1,A0) sequence, which occurs in a few subtitles and is rendered with a blank glyph (i.e., a space btween valid glyphs, undistinguishable from an ordinary ideographic space), apparently due to some kind of wraparound. At the time of writing it is not known what was meant by the (A1,A0) sequence, as is doesn't seem to be a valid control sequence under any common extension of EUC-CN.
The text strings in the Chinese version mostly conform to the EUC-CN scheme. A notable exception is the (A1,A0) sequence, which occurs in a few subtitles and is rendered with a blank glyph (i.e., a space between valid glyphs, undistinguishable from an ordinary ideographic space), apparently due to some kind of wraparound. At the time of writing it is not known what was meant by the (A1,A0) sequence, as it doesn't seem to be a valid control sequence under any common extension of EUC-CN.


Another illegal sequence is (0xA3,0x89), which occurs only in the SUBTmessages entry xdash1 (five identical glyphs at the end of the string) and is somehow rendered as a ㈢, which would normally be encoded with (A2,E7). Such an improbable substitution is likely unintentional, and it is not known what the intended glyph was.
Another illegal sequence is (0xA3,0x89), which occurs only in the SUBTmessages entry xdash1 (five identical glyphs at the end of the string) and is somehow rendered as a ㈢, which would normally be encoded with (A2,E7). Such an improbable substitution is likely unintentional, and it is not known what the intended glyph was.