OBD:Text encoding: Difference between revisions

→‎Japanese: actually there is a lot of ASCII in Japanese text
m (→‎Chinese: minor touch-up)
(→‎Japanese: actually there is a lot of ASCII in Japanese text)
Line 869: Line 869:
As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., is limited to [[wp:US-ASCII|US-ASCII]]. This is consistent with the simplified logic used by the Japanese engine, where any high-bit byte (in the 0x80-0xFF range) is treated as the start of a two-byte sequence. (In actual Shift JIS some high-bit bytes are interpreted as half-width kana, a feature that isn't supported by Oni's engine.)
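
The simplified lead-byte rule described above can be illustrated with a short Python sketch. This is not Oni's actual code: the function name <code>split_oni_jp</code> and the handling of a truncated trailing byte are assumptions made for illustration only.

```python
def split_oni_jp(data: bytes) -> list[bytes]:
    """Split a byte string into one- and two-byte units.

    Hypothetical helper: mimics the simplified rule described in the text,
    where ANY byte in 0x80-0xFF starts a two-byte sequence. Real Shift JIS
    would instead treat 0xA1-0xDF as single-byte half-width kana.
    """
    units = []
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            # High-bit byte: consume it together with the following byte.
            # (Assumption: a truncated final byte is kept as a 1-byte unit.)
            units.append(data[i:i + 2])
            i += 2
        else:
            # 0x00-0x7F: a single-byte (US-ASCII) unit.
            units.append(data[i:i + 1])
            i += 1
    return units
```

For example, the bytes <code>4F 6E 93 FA 96 7B</code> split into two single-byte units followed by two double-byte units, whereas <code>B1 41</code> (a half-width kana followed by "A" in real Shift JIS) is swallowed as a single two-byte unit under this rule.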


It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable.
Note that, compared to the separate .fnt files, the Japanese TSFFTahoma provides only a rudimentary implementation of JIS X 0208 (154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially unusable except for its US-ASCII part.
*The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for all of the vanilla text strings, which only contain double-byte control codes. Thus, under normal conditions, TSFFTahoma remains completely unused in the Japanese version, and would only be used for (artificially added) US-ASCII input.
*The Japanese engine requires all four .fnt files to be present (it bails out if any of them is missing) and uses them for all double-byte code points, resorting to TSFFTahoma only for the rare occurrences of US-ASCII (resolution strings, the "On" labels in the Options menu, etc.).
*If the US engine is used on the Japanese game data, then the .fnt files are ignored (obviously), and the incomplete TSFFTahoma is used to render the Japanese text strings as well as the few English strings supplied by the EXE. Due to the limited character set, many strings end up broken.  
*If the US engine is used on the Japanese game data, then the .fnt files are ignored, and the incomplete TSFFTahoma is used to render both US-ASCII and Japanese glyphs. Because of the limited character set (154 glyphs instead of 1,357), many strings end up broken in this situation.
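
The two cases listed above amount to a simple choice of glyph source per text unit. The following Python sketch only restates that choice; the function name <code>glyph_source</code> and the <code>"jp"</code>/<code>"us"</code> engine labels are hypothetical and do not correspond to anything in the engine itself.

```python
def glyph_source(engine: str, unit: bytes) -> str:
    """Return which font resource would render a one- or two-byte unit.

    Hypothetical illustration of the behavior described in the list above:
    - Japanese engine ("jp"): double-byte units come from the .fnt files,
      single-byte US-ASCII units come from TSFFTahoma.
    - US engine ("us"): the .fnt files are ignored, so everything is
      looked up in TSFFTahoma (which only has 154 double-byte glyphs).
    """
    if engine == "jp" and len(unit) == 2:
        return ".fnt"
    return "TSFFTahoma"
```

Under this sketch, a double-byte unit resolves to the .fnt files only when the Japanese engine is running; in every other case the lookup falls to TSFFTahoma, which is why Japanese strings break under the US engine.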


It appears that the Japanese localization team initially tried to put Oni's code page system to use, and to fill in all the required JIS glyphs into TSFT and TSGA. As the number of kanji increased, supposedly, the TSFT grew prohibitively large due to the use of 8-bit grayscale storage for the pixel data, and the size taken up by the sparsely populated TSGA also increased out of proportion with the rest of the game data. At some point the engine switched to separate .fnt files, and somehow no one bothered to clean up the incomplete code pages in TSFFTahoma.