OBD:Text encoding: Difference between revisions


Revision as of 00:54, 5 January 2022

63 bytes added, 5 January 2022

m

→‎Japanese: disambig

Geyser


@@ Line 863: / Line 863: @@
 It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable.
-*The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for all of the vanilla text strings, which only contain double-byte control codes. Thus, under normal conditions, TSFFTahoma remains completely unused in the Japanese version.
+*The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for all of the vanilla text strings, which only contain double-byte control codes. Thus, under normal conditions, TSFFTahoma remains completely unused in the Japanese version, and would only be used for (artificially added) US-ASCII input.
 *If the US engine is used on the Japanese game data, then the .fnt files are ignored (obviously), and the incomplete TSFFTahoma is used to render the Japanese text strings as well as the few English strings supplied by the EXE. Due to the limited character set, many strings end up broken.
@@ Line 869: / Line 869: @@
 At the time of writing, the code points and pixel data in the Japanese .fnt files have not been thoroughly analyzed and compared with JIS X 0208. We know that 1,357 glyphs are implemented, across 27 "lead bytes" (roughly 50 ''kuten'' rows). This is much smaller than the full ''kuten'' plane, and makes sense in terms of space efficiency. We also know that some code points are non-standard (rearranged) as compared to regular Shift JIS, although we do not yet know if this rearrangement is consistent with any common variation of Shift JIS. As long as Japanese game data contains text strings that match the game's encoding, non-standard code points are not a problem (but should be kept in mind).
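The relation between lead bytes and ''kuten'' rows can be illustrated with a minimal sketch, assuming regular (unmodified) Shift JIS. In that scheme each lead byte covers two consecutive ''kuten'' rows, so 27 lead bytes can address at most 54 rows. The function name <code>sjis_to_kuten</code> is hypothetical, and since some of the game's code points are known to be rearranged, the result is only a baseline for comparison, not a description of the .fnt files.

```python
def sjis_to_kuten(lead: int, trail: int) -> tuple[int, int]:
    """Map a standard Shift JIS double-byte code to a JIS X 0208 (row, cell) pair.

    Assumes regular Shift JIS; the game's rearranged code points are not handled.
    """
    # Lead bytes come in two ranges that together index 47 row pairs.
    if 0x81 <= lead <= 0x9F:
        pair = lead - 0x81
    elif 0xE0 <= lead <= 0xEF:
        pair = lead - 0xC1
    else:
        raise ValueError(f"not a Shift JIS lead byte: {lead:#04x}")

    # The trail byte selects the odd or even row of the pair, plus the cell.
    if 0x40 <= trail <= 0x7E:
        return 2 * pair + 1, trail - 0x3F
    if 0x80 <= trail <= 0x9E:  # 0x7F is skipped in Shift JIS
        return 2 * pair + 1, trail - 0x40
    if 0x9F <= trail <= 0xFC:
        return 2 * pair + 2, trail - 0x9E
    raise ValueError(f"not a Shift JIS trail byte: {trail:#04x}")
```

For example, <code>sjis_to_kuten(0x82, 0xA0)</code> gives <code>(4, 2)</code>, which is hiragana あ in JIS X 0208. Running every code point found in the .fnt files through such a mapping would be one way to count the rows actually used and to spot the rearranged entries.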
 ==Text anomalies==