
OBD:Text encoding: Difference between revisions

The current understanding is that xfhsm_oni.dll simply turns any two-byte code point QQ WW into the offset [(QQ-0xA1)*0x5E + (WW-0xA1)]*0x20, relative either to the start of the xf_font.dat data (for the 16x16 font) or to the middle of the data (for the small 12x12 font). Both components of the offset, (QQ-0xA1) and (WW-0xA1), can fall outside the intended 0-93 range, with values as high as 94 and as low as -161, depending on the values of QQ and WW, and there doesn't seem to be any sanity check. The only special case is QQ==00, in which case WW is ignored and the string is terminated.
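The offset rule described above can be sketched in Python as follows. This is only an illustration of the arithmetic, not a disassembly of xfhsm_oni.dll: the name <code>glyph_offset</code> and the constant names are hypothetical, and plain signed integer arithmetic is assumed.

```python
BASE = 0xA1        # lowest valid EUC-CN byte value
ROW_LENGTH = 0x5E  # 94 code points per row
GLYPH_SIZE = 0x20  # bytes per glyph, as in the formula above


def glyph_offset(qq, ww):
    """Return the byte offset for the code point QQ WW.

    Returns None when QQ is 00 (string terminator, WW ignored).
    No range check is made, mirroring the behavior described in the text.
    """
    if qq == 0x00:
        return None
    return ((qq - BASE) * ROW_LENGTH + (ww - BASE)) * GLYPH_SIZE


print(glyph_offset(0xA1, 0xA1))  # 0 (first glyph of the font)
print(glyph_offset(0xFE, 0xFE))  # 282720 (last valid glyph of the font)
```

The result is relative to the start of the data for the 16x16 font, or to the middle of the data for the 12x12 font.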


A valid EUC-CN code point (with both bytes in the 0xA1-0xFE range) results in a valid offset pointing to an actual glyph of the relevant font, whereas illegal bytes or byte pairs may point to a different glyph within the same font, to a glyph of the other font, or to a completely unrelated memory region. In the worst-case scenario, pixel data will be read 486,432 bytes (~475 kB) before the start of the actual pixel data (when displaying the code point 01,00 with the large font) or 3008-3040 bytes (~3 kB) past the end of the actual pixel data (when displaying the code point FF,FF with the small font).
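The two worst-case figures can be reproduced with a few lines of Python. The sketch assumes that each font occupies a full table of 94x94 glyphs of 0x20 bytes each, and that the small font's table is the last thing in the pixel data; both assumptions are consistent with the numbers quoted above but are not confirmed independently. All names are hypothetical.

```python
BASE, ROW_LENGTH, GLYPH_SIZE = 0xA1, 0x5E, 0x20

# Assumed size of one font's table: 94 rows x 94 glyphs x 32 bytes.
FONT_BYTES = ROW_LENGTH * ROW_LENGTH * GLYPH_SIZE


def glyph_offset(qq, ww):
    """Unchecked offset of code point QQ WW, relative to its font's start."""
    return ((qq - BASE) * ROW_LENGTH + (ww - BASE)) * GLYPH_SIZE


# Lowest reachable offset: QQ=01 (QQ=00 terminates the string), WW=00.
underrun = -glyph_offset(0x01, 0x00)

# Highest reachable offset: QQ=FF, WW=FF; the glyph spans GLYPH_SIZE bytes.
overrun_start = glyph_offset(0xFF, 0xFF) - FONT_BYTES
overrun_end = overrun_start + GLYPH_SIZE

print(underrun)                    # 486432
print(overrun_start, overrun_end)  # 3008 3040
```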


Reading garbage pixel data should not cause memory corruption in itself (merely nonsensical or garbled text), but if similar out-of-bounds pointers occur when glyphs are rendered, then xfhsm_oni.dll may occasionally overwrite its own memory or even Oni's. This has not been thoroughly investigated, but it seems advisable to ensure that all text consists of valid EUC-CN code points (which is unfortunately not the case, see [[#Invalid EUC-CN input|"Invalid EUC-CN input"]] below).
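A check of the kind suggested above could look like the following Python sketch. It assumes that the text is a plain sequence of two-byte pairs ending at the first pair whose first byte is 00, as described for xfhsm_oni.dll; how the game's text is actually stored is not covered here, and the function name <code>find_invalid_pairs</code> is hypothetical.

```python
def find_invalid_pairs(data):
    """List (index, QQ, WW) for each byte pair outside the 0xA1-0xFE range.

    Scanning stops at the first pair whose first byte is 00 (terminator).
    A trailing unpaired byte, if any, is ignored.
    """
    bad = []
    for i in range(0, len(data) - 1, 2):
        qq, ww = data[i], data[i + 1]
        if qq == 0x00:
            break
        if not (0xA1 <= qq <= 0xFE and 0xA1 <= ww <= 0xFE):
            bad.append((i, qq, ww))
    return bad


sample = bytes([0xB0, 0xA1, 0x41, 0x42, 0x00, 0x00, 0xFF, 0xFF])
print(find_invalid_pairs(sample))  # [(2, 65, 66)]
```

In the sample, the pair 41,42 is reported, while FF,FF is never reached because it follows the terminator.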


----
===Japanese===
Japanese Oni uses a custom two-byte encoding that is mostly consistent with [[wp:Shift_JIS|Shift JIS]] but with some of the control sequences rearranged in seemingly non-standard ways. Like Chinese Oni, the glyph data is stored in new, external files; in this case they are .fnt files stored in GameDataFolder. Three font sizes are available, with pixel sizes 11x11 ('''JPN_SMALL.fnt'''), 12x12 ('''JPN_MIDDLE.fnt''') and 14x14 ('''JPN_BIG.fnt'''). The 14x14 font has a bold-face variant ('''JPN_BOLD.fnt'''). All four fonts are fixed-width, i.e. all glyphs have a square bounding box.