OBD:Text encoding: Difference between revisions

m (tweaked layout of lede)
m (added a little detail on the level-load crash)
 
Line 341: Line 341:


;N.B.
;N.B.
Unlike in other versions of Oni, an invalid code point does not interrupt the interpretation/rendering of a text string by xfhsm_oni.dll and can lead to a wide range of unexpected behavior: at best, a blank or otherwise unintended glyph will be displayed; at worst, the rendered text will be garbled (memory corruption most likely), or the game may simply [[Blam|crash]].
Unlike in other versions of Oni, an invalid code point does not interrupt the interpretation/rendering of a text string by xfhsm_oni.dll and can lead to a wide range of unexpected behavior: at best, a blank or otherwise unintended glyph will be displayed; at worst, the rendered text will be garbled (memory corruption most likely), or the game may simply crash with a [[Blam!]] message.


The current understanding is that xfhsm_oni.dll simply turns any two-byte code point QQ WW into the offset [(QQ-0xA1)*0x5E + (WW-0xA1)]*0x20, relative either to the start of the xf_font.dat data (for the 16x16 font) or to the middle of the data (for the small 12x12 font). Depending on the values of QQ and WW, both components of the offset can fall outside the intended 0-93 range, with values as high as 94 and as low as -161. There doesn't seem to be any sanity check, and the only special handling is for QQ=00 (in this case WW is ignored and the string is terminated).
The current understanding is that xfhsm_oni.dll simply turns any two-byte code point QQ WW into the offset [(QQ-0xA1)*0x5E + (WW-0xA1)]*0x20, relative either to the start of the xf_font.dat data (for the 16x16 font) or to the middle of the data (for the small 12x12 font). Depending on the values of QQ and WW, both components of the offset can fall outside the intended 0-93 range, with values as high as 94 and as low as -161. There doesn't seem to be any sanity check, and the only special handling is for QQ=00 (in this case WW is ignored and the string is terminated).
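
The offset computation described above can be sketched in a few lines of Python. This is only an illustration of the current understanding, not code recovered from xfhsm_oni.dll: the names <code>glyph_offset</code> and <code>in_intended_range</code> are hypothetical, and the range check is one the DLL does ''not'' appear to perform.

```python
GLYPH_BYTES = 0x20  # bytes per glyph slot, per the formula above
ROW_SLOTS = 0x5E    # 94 slots per lead-byte row
BASE_BYTE = 0xA1    # byte value that maps to index 0


def glyph_offset(qq, ww):
    """Byte offset derived from code point QQ WW; None for QQ=00 (terminator)."""
    if qq == 0x00:
        return None  # WW is ignored and the string ends here
    return ((qq - BASE_BYTE) * ROW_SLOTS + (ww - BASE_BYTE)) * GLYPH_BYTES


def in_intended_range(qq, ww):
    """Hypothetical sanity check: both components must lie in 0-93."""
    return (0 <= qq - BASE_BYTE < ROW_SLOTS) and (0 <= ww - BASE_BYTE < ROW_SLOTS)


# A valid code point, the two invalid ones discussed in this article,
# and an extreme case with both components out of range.
for qq, ww in [(0xA1, 0xA1), (0xA3, 0xA0), (0xA3, 0x89), (0xFF, 0x00)]:
    print(f"{qq:02X} {ww:02X}: offset={glyph_offset(qq, ww)}, "
          f"in range={in_intended_range(qq, ww)}")
```

The offset is relative to whichever base the font uses (start of the data for the 16x16 font, middle of the data for the 12x12 font), so the sketch deliberately returns a signed number rather than an absolute address.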
Line 1,180: Line 1,180:
|}
|}


Without a proper sanity check, some illegal code points will clearly result in pixel data being loaded not from a valid glyph region, but from irrelevant memory that belongs either to xfhsm_oni.dll or to the main Oni engine, resulting in garbled text. Memory corruption or a segmentation fault (access violation) may occur if similar out-of-bounds pointers are used when rendering glyph textures. Invalid EUC-CN input is possibly what causes most Chapters of the Chinese Oni version to crash on modern Windows systems, although this has not been investigated thoroughly.
Without a proper sanity check, some illegal code points will clearly result in pixel data being loaded not from a valid glyph region, but from irrelevant memory that belongs either to xfhsm_oni.dll or to the main Oni engine, resulting in garbled text. Memory corruption or a segmentation fault (access violation) may occur if similar out-of-bounds pointers are used when rendering glyph textures. Invalid EUC-CN input is possibly what causes most Chapters of the Chinese Oni version to crash on modern Windows systems. However, that crash differs in that it occurs without the Blam! dialog appearing, and it can be avoided by turning the graphics quality down to Superlow. This points to an issue related to the amount of memory being used, but the crash may also be text-related; the cause has yet to be determined.


====Non-translated US-ASCII====
====Non-translated US-ASCII====
ASCII strings are much more harmful when handled by xfhsm_oni.dll than the two invalid code points (0xA3,0xA0) and (0xA3,0x89), because pairs of US-ASCII bytes, misinterpreted as EUC-CN code points, end up referencing unrelated memory regions (outside the region occupied by xf_font.dat). Unfortunately, there are a few ASCII strings that xfhsm_oni.dll can come across even during regular gameplay, and many more can arise once modding is involved.
ASCII strings are much more harmful when handled by xfhsm_oni.dll than the two invalid code points (0xA3,0xA0) and (0xA3,0x89), because pairs of US-ASCII bytes, misinterpreted as EUC-CN code points, end up referencing unrelated memory regions (outside the region occupied by xf_font.dat). Unfortunately, there are a few ASCII strings that xfhsm_oni.dll can come across even during regular gameplay, and many more can arise once modding is involved.
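
To illustrate why ASCII is so problematic, the following sketch walks through a string two bytes at a time and applies the offset formula given earlier in this article. The sample string <code>b"Konoko"</code> is an arbitrary example, not a string taken from the game data, and the function name <code>ascii_pair_offsets</code> is hypothetical. How the DLL handles an odd trailing byte is not covered here; the sketch simply ignores it.

```python
def ascii_pair_offsets(data):
    """Interpret consecutive byte pairs as code points QQ WW and
    return (qq, ww, offset) tuples, stopping at QQ=00."""
    results = []
    for i in range(0, len(data) - 1, 2):
        qq, ww = data[i], data[i + 1]
        if qq == 0x00:
            break  # terminator: WW is ignored
        offset = ((qq - 0xA1) * 0x5E + (ww - 0xA1)) * 0x20
        results.append((qq, ww, offset))
    return results


# Arbitrary sample string (an assumption, not actual game data).
for qq, ww, offset in ascii_pair_offsets(b"Konoko"):
    print(f"{chr(qq)}{chr(ww)} -> {qq:02X} {ww:02X} -> offset {offset}")
```

Because every 7-bit byte is smaller than 0xA1, both components are negative for any ASCII pair, so the resulting offset always points before the base address used for the font.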
=====Count on it=====
=====Count on it=====
The following string in SUBTsubtitles has not been translated into Chinese:
The following string in SUBTsubtitles has not been translated into Chinese: