
OBD:Text encoding

----
===Chinese===
The Chinese version of Oni is unique in that its main game code resides in '''Oni.dat''', a renamed copy of the original Oni.exe from the US version, which is executed indirectly by a wrapper app called '''oni.exe''' alongside a custom text engine, '''xfhsm_oni.dll'''. This DLL intercepts any text that Oni.dat is about to display, first breaking it down into a series of two-byte sequences and then (if all goes well) mapping those to custom glyphs whose pixel data comes from an external font file, '''xf_font.dat'''.


Unlike the original US engine or the Japanese one, xfhsm_oni.dll does not expect any single-byte characters in the input, does not interpret US-ASCII strings in any meaningful way, and never falls back on level0_Final's TSFFTahoma for text display. The pixel data comes exclusively from xf_font.dat, and the input is expected to consist exclusively of two-byte code points. This includes string termination: instead of a single null byte, xfhsm_oni.dll expects a string to end with a pair of null bytes.
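To illustrate the two-byte termination rule, here is a minimal sketch that splits a raw string into code points the way xfhsm_oni.dll is described as reading it. It assumes (this is not verified against the DLL) that the string is consumed in aligned 2-byte steps from its first byte, so a lone null byte that is not part of an aligned 0x00 0x00 pair does not end the string. The function name <code>split_code_points</code> is hypothetical.

```python
def split_code_points(buf: bytes) -> list[tuple[int, int]]:
    """Split a Chinese-Oni string into two-byte code points.

    Assumption: the string is read in aligned 2-byte steps starting at
    offset 0 and ends at the first pair that is 0x00 0x00.
    """
    pairs = []
    for i in range(0, len(buf) - 1, 2):
        hi, lo = buf[i], buf[i + 1]
        if hi == 0 and lo == 0:
            return pairs  # double-null terminator reached
        pairs.append((hi, lo))
    # A single trailing 0x00 (US-style termination) is not enough.
    raise ValueError("no two-byte null terminator found")
```

For example, <code>split_code_points("你好".encode("gb2312") + b"\x00\x00")</code> yields the two EUC-CN pairs of 你 and 好, while a US-style string ending in a single null byte raises an error in this sketch.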


Unlike the fonts of other versions of Oni, the Chinese font has no table listing the valid code points along with their "glyph descriptors" (i.e., instructions on how to extract a glyph from the raw pixel data). Instead, all glyphs are stored as fixed-size bitmaps (16x16 pixels each), and each font contains exactly 94x94 = 8,836 of them, filling a complete [[wp:GB_2312|GB 2312]] plane (''qūwèi''). The glyphs are indexed through a compact encoding scheme known as [[wp:Extended_Unix_Code#EUC-CN|EUC-CN]]: each of the 94x94 code points is represented by a pair of bytes, both in the 0xA1-0xFE range. Code points that are not assigned under GB 2312 (e.g., rows 10-15 and 88-94) simply have blank pixel data in the corresponding regions of xf_font.dat.
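Since there are no glyph descriptors, the position of a glyph follows directly from its code point. The sketch below converts an EUC-CN byte pair into a 1-based GB 2312 row/cell (''qūwèi'') code by subtracting 0xA0 from each byte, then into a 0-based glyph index. It assumes, without having checked the file byte by byte, that xf_font.dat stores the 94x94 plane row by row in ''qūwèi'' order; the function names are hypothetical.

```python
def euc_cn_to_quwei(hi: int, lo: int) -> tuple[int, int]:
    """Map an EUC-CN byte pair to a 1-based GB 2312 (row, cell) code."""
    if not (0xA1 <= hi <= 0xFE and 0xA1 <= lo <= 0xFE):
        raise ValueError(f"bytes {hi:#04x} {lo:#04x} are outside 0xA1-0xFE")
    return hi - 0xA0, lo - 0xA0


def quwei_to_glyph_index(row: int, cell: int) -> int:
    """0-based position in a 94x94 grid, assuming row-major storage."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"row {row} / cell {cell} outside 1-94")
    return (row - 1) * 94 + (cell - 1)
```

For example, 你 is encoded as 0xC4 0xE3, i.e. row 36, cell 67, which this sketch places at glyph index 35 × 94 + 66 = 3356. The first and last slots of the plane (0xA1 0xA1 and 0xFE 0xFE) map to indices 0 and 8,835.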


Two glyph sizes are available: 16x16 glyphs are stored in the first half of xf_font.dat, and 12x12 glyphs in the second half. Each 12x12 glyph is stored in the top-left corner of a 16x16 bitmap, so the row and glyph alignment is the same in both halves: 2 bytes per pixel row and 32 bytes per glyph. The pixels are packed at 1 bit each (black and white, without antialiasing), which is much more space-efficient than the 8-bit grayscale storage used in Oni's [[TSFT]]. Further savings come from not needing any glyph descriptors ([[TSGA]]s), and from having only two fonts instead of Oni's usual 15.
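The layout above implies 8,836 × 32 = 282,752 bytes per half. The sketch below computes a glyph's byte offset and unpacks its bitmap into 0/1 pixels, cropping the 12x12 glyphs to their top-left corner. The bit order is an assumption (most significant bit = leftmost pixel) and has not been confirmed against the actual file; the constant and function names are hypothetical.

```python
GLYPHS_PER_FONT = 94 * 94             # one full GB 2312 plane
BYTES_PER_ROW = 2                     # 16 pixels at 1 bit each
BYTES_PER_GLYPH = 16 * BYTES_PER_ROW  # 32 bytes per glyph
FONT_BYTES = GLYPHS_PER_FONT * BYTES_PER_GLYPH  # size of one half


def glyph_offset(index: int, small: bool = False) -> int:
    """Byte offset of a glyph; small=True selects the 12x12 (second) half."""
    if not 0 <= index < GLYPHS_PER_FONT:
        raise IndexError(index)
    return (FONT_BYTES if small else 0) + index * BYTES_PER_GLYPH


def unpack_glyph(data: bytes, index: int, small: bool = False) -> list[list[int]]:
    """Return a size x size grid of 0/1 pixels.

    Assumption: within each 2-byte row, the most significant bit is the
    leftmost pixel.
    """
    size = 12 if small else 16
    start = glyph_offset(index, small)
    raw = data[start:start + BYTES_PER_GLYPH]
    if len(raw) != BYTES_PER_GLYPH:
        raise ValueError("font data is too short for this glyph")
    rows = []
    for y in range(size):
        bits = int.from_bytes(raw[2 * y:2 * y + 2], "big")
        # 12x12 glyphs sit in the top-left corner, so keep the first `size` columns.
        rows.append([(bits >> (15 - x)) & 1 for x in range(size)])
    return rows
```

Combined with a code-point-to-index mapping, <code>unpack_glyph(data, index)</code> would give the 16x16 bitmap and <code>unpack_glyph(data, index, small=True)</code> the 12x12 one, if the file matches these assumptions.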


All the GB 2312 glyphs listed in [[wp:GB_2312#Non-Hanzi_rows|Wikipedia's table of non-hanzi rows]] and in [[wikt:Appendix:Chinese_hanzi_by_GB_2312_quwei_code|Wiktionary's list of hanzi by qūwèi code]] are implemented, except for the euro sign and the ten glyphs from [[wp:Vertical_Forms|Vertical Forms]].
 
Unlike in other versions of Oni, an invalid code point does not stop xfhsm_oni.dll from interpreting and rendering the rest of the string, which can lead to a wide range of unexpected behavior: at best, a blank or otherwise unintended glyph is displayed; at worst, the rendered text is garbled (most likely due to memory corruption) or the game simply [[Blam|crash]]es.
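Because the engine itself does not reject bad input, it may be worth checking modified strings before putting them into the game data. The following is a hypothetical pre-check, not part of Oni or of any existing modding tool. It flags every pair before the double-null terminator that has a byte outside 0xA1-0xFE (e.g. plain US-ASCII) or that falls in rows 10-15 or 88-94, which carry no characters in GB 2312. It does not catch blank cells inside partially filled rows.

```python
# Rows of the GB 2312 plane with no assigned characters.
UNASSIGNED_ROWS = set(range(10, 16)) | set(range(88, 95))


def find_suspect_pairs(buf: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, hi, lo) for each pair before the 0x00 0x00 terminator
    that is outside 0xA1-0xFE or lies in an unassigned GB 2312 row."""
    suspects = []
    for i in range(0, len(buf) - 1, 2):
        hi, lo = buf[i], buf[i + 1]
        if hi == 0 and lo == 0:
            return suspects  # terminator reached, string fully checked
        in_range = 0xA1 <= hi <= 0xFE and 0xA1 <= lo <= 0xFE
        if not in_range or (hi - 0xA0) in UNASSIGNED_ROWS:
            suspects.append((i, hi, lo))
    raise ValueError("no two-byte null terminator found")
```

An empty result only means that every pair passes these coarse checks; it is not a guarantee that the string renders correctly.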




----
===Japanese===
Japanese Oni uses a custom two-byte encoding that is mostly consistent with [[wp:Shift_JIS|Shift JIS]], but with some of the byte sequences rearranged in apparently non-standard ways. As in Chinese Oni, the glyph data is stored in new external files, in this case .fnt files in GameDataFolder. Three font sizes are available, with glyph sizes of 11x11 ('''JPN_SMALL.fnt'''), 12x12 ('''JPN_MIDDLE.fnt''') and 14x14 ('''JPN_BIG.fnt''') pixels. The 14x14 font also has a bold variant ('''JPN_BOLD.fnt'''). All four fonts are fixed-width, i.e. every glyph occupies the same square cell.