OBD:Text encoding: Difference between revisions

From OniGalore
(→‎(A1,A0) issue: ok, this is as much as I know)
m (link fix)
 
(31 intermediate revisions by 3 users not shown)
Line 3: Line 3:
:(An overview of the known language versions can be found [[OBD:Versions|HERE]], whereas localized content is detailed [[OBD:Localization|HERE]].)
:(An overview of the known language versions can be found [[OBD:Versions|HERE]], whereas localized content is detailed [[OBD:Localization|HERE]].)
Depending on the language version, vanilla Oni uses one of the following five encodings to render text:
Depending on the language version, vanilla Oni uses one of the following five encodings to render text:
*The original US version uses a trimmed-down [[wp:Mac_OS_Roman|Mac OS Roman]] code page that is effectively limited to US-ASCII (96 code points).
*The original US version uses a trimmed-down [[wp:Mac_OS_Roman|Mac OS Roman]] code page that is effectively limited to [[wp:ASCII|US-ASCII]] (96 code points used, 256 available).
*European localizations (UK English, French, Italian, Spanish, German) use a custom version of Mac OS Roman (192 code points).
*European localizations (UK English, French, Italian, Spanish, German) use a custom version of Mac OS Roman (192 code points used, 256 available).
*The Russian localization uses a full implementation of the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page (224 code points).
*The Russian localization uses a (nearly) full implementation of the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page (224 code points used, 256 available).
*The Chinese localization uses the [[wp:Extended_Unix_Code#EUC-CN|EUC-CN]] implementation of [[wp:GB_2312|GB 2312]] (8,836 code points).
*The Chinese localization uses the [[wp:Extended_Unix_Code#EUC-CN|EUC-CN]] implementation of [[wp:GB_2312|GB 2312]] (7,668 code points used, 8,836 available).
*The Japanese localization uses 1,357 code points mostly conforming to the [[wp:Shift_JIS|Shift JIS]] implementation of [[wp:JIS_X_0208|JIS X 0208]].
*The Japanese localization uses 1,357 code points mostly conforming to the [[wp:Shift_JIS|Shift JIS]] implementation of [[wp:JIS_X_0208|JIS X 0208]].
Properties of the fonts that are eventually used to render the text (via the encoding) are briefly described throughout the page.
Properties of the fonts that are eventually used to render the text (via the encoding) are briefly described throughout the page.
:(A more thorough overview of the glyphs can be found [[/Fonts|HERE]].)
:(A more thorough overview of the glyphs can be found [[/Fonts|HERE]].)


==Encodings==
==Encodings==
Line 115: Line 114:
|-bgcolor=orange
|-bgcolor=orange
!bgcolor=silver|0xF...
!bgcolor=silver|0xF...
!bgcolor=black|[[File:Platform-Mac.png|12px]]
!bgcolor=black|[[Image:Platform-Mac.png|12px]]
!Ò
!Ò
!bgcolor=black|<span style="color:darkslategray">&#218;</span>
!bgcolor=black|<span style="color:darkslategray">&#218;</span>
Line 130: Line 129:
|}
|}
;Minor notes
;Minor notes
*The MacRoman layout was apparently "borrowed" before 1998, when Mac OS 8.5 came out and the [[wp:Currency sign (typography)|international currency sign]] a.k.a. scarab (¤), at 0xDB, was replaced with the euro symbol (€).
*The MacRoman layout was apparently "borrowed" before 1998, when Mac OS 8.5 came out and the [[wp:Currency sign (generic)|international currency sign]] a.k.a. scarab (¤), at 0xDB, was replaced with the euro symbol (€).
*The actual font (see [[/Fonts|HERE]]) has some unusual typographical features, such as a single-stroke Yen/Yuan symbol (Ұ) and a vertical-stroke cent symbol (¢).
*The actual font (see [[/Fonts|HERE]]) has some unusual typographical features, such as a single-stroke Yen/Yuan symbol (Ұ) and a vertical-stroke cent symbol similar to Unicode's Fullwidth Cent Sign (¢) character as seen in Windows Arial (note to Mac users: don't be confused, as this character will appear with a diagonal stroke on your system like the regular '¢' character).
;Major notes
;Major notes
*Some of the removed glyphs (most importantly ß, ù and û, but also Ê, Ú and ú) occur in [[wp:Languages of the European Union#Knowledge|common European languages]]. This made the US TSFFTahoma unsuitable for [[wikt:EFIGS|EFIGS]] localizations, requiring the creation of a new version (see below).  
*Some of the removed glyphs (most importantly ß, ù and û, but also Ê, Ú and ú) occur in [[wp:Languages of the European Union#Knowledge|common European languages]]. This made the US TSFFTahoma unsuitable for [[wikt:EFIGS|EFIGS]] localizations, requiring the creation of a new version (see below).  
*The US engine actually cannot interpret any code points beyond the US-ASCII range (first 6 rows, white background), notably failing on "…" (see [[#Ellipsis_issue|"Ellipsis issue"]] below). This is because of a provision for Asian encoding systems (EUC-CN and Shift JIS), which use two-byte sequences starting with a high-bit byte.
*The US engine actually cannot interpret any code points beyond the US-ASCII range (first 6 rows, white background), notably failing on 0xC9's "…". This is because of a nominal but unused provision for Asian text encodings. See "[[#Ellipsis_issue|Ellipsis issue]]" below for details.




----
----
===European===
===European===
The code page used by the five Western European versions (UK English, French, German, Spanish and Italian) is slightly different from the trimmed-down Mac OS Roman.
The code page used by the five Western European versions (UK English, French, German, Spanish and Italian) is slightly different from the trimmed-down Mac OS Roman.
*It tends to the needs of European localizations by adding back the following characters:<br>German ß; French Ê and û; French/Italian ù; Spanish/Italian Ú and ú (relatively rare).
*It tends to the needs of European localizations by adding back the following characters:<br>German ß; French Ê and û; French/Italian ù; Spanish/Italian Ú and ú (relatively rare).
:'''N.B.''' The characters Æ and ÿ are not reinstated, despite their (very rare) occurrence in French script.
:'''N.B.''' The characters Æ and ÿ are not reinstated, despite their (very rare) occurrence in French script.
*Awkwardly enough, the six characters are not restored in their original positions (grey-on-black), but take the place of math symbols.<br/>Four more "math" positions are inexplicably filled with three duplicate characters (œ, ¡ and ª) and a truly enigmatic ʖ̇ , which doesn't seem to occur in any known language and has no dedicated code point in Unicode.
*Awkwardly enough, the six characters are not restored in their original positions (grey-on-black), but take the place of math symbols.<br/>Four more "math" positions are inexplicably filled with three duplicate characters (œ, ¡ and ª) and a truly enigmatic ʖ̇ , which doesn't seem to occur in any known language and has no dedicated code point in Unicode (the character you see here was constructed from Unicode's U+0296 Latin Letter Inverted Glottal Stop (ʖ) plus U+0307 Combining Dot Above).
:'''N.B.''' The broken italic font variants (see [[/Fonts#Italic|HERE]]) do not fully implement the 10 new glyphs and use a regular question mark instead of the  ʖ̇.
:'''N.B.''' The broken italic font variants (see [[/Fonts#Italic|HERE]]) do not fully implement the 10 new glyphs and use a regular question mark instead of the  ʖ̇.
{|border=1 cellpadding=3 cellspacing=0
{|border=1 cellpadding=3 cellspacing=0
Line 260: Line 260:


----
----
===Cyrillic===
===Cyrillic===
In the Russian version of Oni, TSFFTahoma implements the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page, with some deviations.
In the Russian version of Oni, TSFFTahoma implements the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page, with some deviations.
Line 319: Line 320:
;Italic fonts
;Italic fonts
:The Russian version only provides an implementation of Windows-1251 for regular and bold fonts. The five italic fonts (7pt, 9pt, 10pt, 12pt and 14pt) have exactly the same data (pixels and glyph descriptors) as for the European iteration of Mac OS Roman. This makes sense because italic fonts are inherently broken (see [[/Fonts#Italic|HERE]]) and thus not used by any text in vanilla Oni.  
:The Russian version only provides an implementation of Windows-1251 for regular and bold fonts. The five italic fonts (7pt, 9pt, 10pt, 12pt and 14pt) have exactly the same data (pixels and glyph descriptors) as for the European iteration of Mac OS Roman. This makes sense because italic fonts are inherently broken (see [[/Fonts#Italic|HERE]]) and thus not used by any text in vanilla Oni.  
;Bold 14 font
;14pt bold font
:Somewhat surprisingly, the size-14 TSFT in the Russian version of TSFFTahoma does not have a complete Windows-1251 code page either. Instead it is limited to the US-ASCII character set (including the "printable delete" box at code point 0x7F), i.e., the upper section of the above table (white background). This causes no issue in vanilla Oni, but only because there is no text that uses bold 14.   
:Somewhat surprisingly, the 14pt bold TSFT in the Russian version of TSFFTahoma does not have a complete Windows-1251 code page either. Instead it is limited to the US-ASCII character set (including the "printable delete" box at code point 0x7F), i.e., the upper section of the above table (white background). This causes no issue in vanilla Oni, but only because there is no text that uses 14pt bold.   
;Incomplete transparency
;Incomplete transparency
:A unique "feature" of the Russian/Cyrillic TSFFTahoma is that all the characters in the extended ASCII range (0x80-0xFF) have a slightly opaque background (about 3% opacity) in the regular (non-bold) font variant. This isn't visible in-game, but only because the engine (re)posterizes all the glyphs into 4-bit grayscale when rendering (so that only opacities above 6% are visible).
:A unique "feature" of the Russian/Cyrillic TSFFTahoma is that all the characters in the extended ASCII range (0x80-0xFF) have a slightly opaque background (about 3% opacity) in the regular (non-bold) font variant. This isn't visible in-game, but only because the engine (re)posterizes all the glyphs into 4-bit grayscale when rendering (so that only opacities above 6% are visible).
;Glyph alignment and spacing
;Glyph alignment and spacing
:Last but not least, some fonts in the Russian TSFFTahoma have inconsistent vertical alignment, the most blatant example being 12 bold: some glyphs are one pixel shorter or taller than the full line height (ascender+descender), without a properly compensated vertical glyph offset; others simply have pixels that are not properly aligned within a glyph's rectangle. Besides, many glyphs have excessive padding to the left and/or right of a character, which affects readability.<br />'''N.B.''' There are other examples of poor alignment, e.g. for 12 bold, the character 0x9C (њ) has its right side cut off and is thus unusable (luckily it doesn't occur in Russian script).
:Last but not least, some fonts in the Russian TSFFTahoma have inconsistent vertical alignment, the most blatant example being 12pt bold: some glyphs are one pixel shorter or taller than the full line height (ascender+descender), without a properly compensated vertical glyph offset; others simply have pixels that are not properly aligned within a glyph's rectangle. Besides, many glyphs have excessive padding to the left and/or right of a character, which affects readability.<br />'''N.B.''' There are other examples of poor alignment, e.g., for 12pt bold, the character 0x9C (њ) has its right side cut off and is thus unusable (luckily it doesn't occur in Russian script).
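The 6% threshold mentioned under "Incomplete transparency" can be checked with a little arithmetic. The sketch below assumes that the engine reduces 8-bit opacity to 4 bits by plain truncation (dropping the low four bits). This is an unconfirmed assumption, but it is consistent with the figures above, since the smallest value that survives truncation is 16/255, or about 6.3%. The function name is hypothetical.

```python
def survives_posterization(alpha8: int) -> bool:
    """True if an 8-bit opacity is still non-zero after truncation to 4 bits.

    ASSUMPTION: the engine truncates (alpha8 >> 4) rather than rounding.
    """
    if not 0 <= alpha8 <= 255:
        raise ValueError("expected an 8-bit opacity value")
    return (alpha8 >> 4) > 0


# About 3% opacity corresponds to an 8-bit value of 8.
background = round(0.03 * 255)
print(background, survives_posterization(background))
print(16, survives_posterization(16))
```

Under this model the faint background (value 8) is flattened to fully transparent, whereas any value of 16 or more stays visible.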




----
----
===Chinese===
===Chinese===
The Chinese version of Oni is unique in how the main game code resides in '''Oni.dat''', a renamed copy of the original Oni.exe from the US version that is executed indirectly by a wrapper app called '''oni.exe''', alongside a custom text engine, '''xfhsm_oni.dll'''. The latter DLL intercepts any text about to be displayed by "Oni.dat", first reducing it to a set of two-byte control sequences, and then (if all goes well) to a set of custom glyphs, with pixel data coming from an external font file, '''xf_font.dat'''.
The Chinese version of Oni is unique in how the main game code resides in '''Oni.dat''', a renamed copy of the original Oni.exe from the US version that is executed indirectly by a wrapper app called '''oni.exe''', alongside a custom text engine, '''xfhsm_oni.dll'''. The latter DLL intercepts any text about to be displayed by "Oni.dat", first reducing it to a set of two-byte control sequences, and then (if all goes well) to a set of custom glyphs, with pixel data coming from an external font file, '''xf_font.dat'''.
Line 337: Line 339:
Two glyph sizes are available: 16x16 glyphs are stored in the first half of xf_font.dat, and 12x12 glyphs in the second half. Each 12x12 glyph is stored in the top left corner of a 16x16 bitmap, so the row/glyph alignment is the same in both cases: 2 bytes per pixel row and 32 bytes per glyph. The pixel packing is 1-bit black-and-white (i.e., without antialiasing), much more space-efficient than the 8-bit grayscale storage used in Oni's [[TSFT]]. Another gain comes from not having any glyph descriptors ([[TSGA]]s), and from having only two fonts instead of Oni's typical 15.
Two glyph sizes are available: 16x16 glyphs are stored in the first half of xf_font.dat, and 12x12 glyphs in the second half. Each 12x12 glyph is stored in the top left corner of a 16x16 bitmap, so the row/glyph alignment is the same in both cases: 2 bytes per pixel row and 32 bytes per glyph. The pixel packing is 1-bit black-and-white (i.e., without antialiasing), much more space-efficient than the 8-bit grayscale storage used in Oni's [[TSFT]]. Another gain comes from not having any glyph descriptors ([[TSGA]]s), and from having only two fonts instead of Oni's typical 15.
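The storage layout described above (2 bytes per pixel row, 32 bytes per glyph, 12x12 glyphs in the top left corner of a 16x16 bitmap) can be illustrated with a short sketch. The bit order within a row is not documented here, so the code assumes that the leftmost pixel is the most significant bit of the first byte; the function name is hypothetical.

```python
GLYPH_BYTES = 32   # 16 rows x 2 bytes per row, as described above
ROW_BYTES = 2


def glyph_rows(glyph: bytes, size: int = 16) -> list:
    """Return one text row per pixel row ('#' = set pixel, '.' = clear pixel).

    ASSUMPTION: the leftmost pixel is the most significant bit of the first byte.
    """
    if len(glyph) != GLYPH_BYTES:
        raise ValueError("a glyph record is exactly 32 bytes")
    if size not in (12, 16):
        raise ValueError("only 12x12 and 16x16 glyphs exist")
    rows = []
    for y in range(size):
        first, second = glyph[y * ROW_BYTES], glyph[y * ROW_BYTES + 1]
        bits = f"{first:08b}{second:08b}"
        # For 12x12 glyphs only the leftmost 12 columns and top 12 rows are used.
        rows.append(bits[:size].replace("1", "#").replace("0", "."))
    return rows
```

For a 12x12 glyph, call <code>glyph_rows(record, 12)</code>: the same 32-byte record is read, but the four rightmost columns and four bottom rows are ignored.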


All the GB 2312 glyphs listed [[wp:GB_2312#Non-Hanzi_rows|HERE]] and [[wikt:Appendix:Chinese_hanzi_by_GB_2312_quwei_code|HERE]] are implemented, except for the euro sign and the ten glyphs from [[wp:Vertical_Forms|Vertical Forms]].
All the GB 2312 glyphs listed [[wp:GB_2312#Non-Hanzi_rows|HERE]] and [[wikt:Appendix:Chinese_hanzi_by_GB_2312_quwei_code|HERE]] are implemented, except for the euro sign (row 2) and the ten glyphs from [[wp:Vertical_Forms|Vertical Forms]] (row 6). Thus, of the 8,836 available code points, only 7,668 (including the ideographic space A1,A1) correspond to actual glyphs, whereas the other 1,168 correspond to blank pixel data (indistinguishable from a space). In terms of space efficiency, only 74,752 bytes are wasted on blank pixel data (2 x 1,168 x 32), whereas trimmed-down pixel data would require at least 2 x 7,668 + 2 = 15,338 bytes for an index of the available glyphs, as well as additional lookup logic.
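The figures in the paragraph above can be re-derived as follows. This is only a check of the arithmetic; the constant names are made up for illustration.

```python
CELLS_PER_ROW = 94
ROWS = 94
GLYPH_BYTES = 32      # one 16x16 1-bit bitmap
FONT_COUNT = 2        # 16x16 and 12x12

available = ROWS * CELLS_PER_ROW            # code points in the 94x94 grid
used = 7_668                                # glyphs actually implemented
blank = available - used                    # slots holding blank pixel data

wasted_bytes = FONT_COUNT * blank * GLYPH_BYTES
index_bytes = 2 * used + 2                  # the index size estimated in the text

print(available, blank, wasted_bytes, index_bytes)
```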


;N.B.
Unlike for other versions of Oni, an invalid code point does not interrupt the interpretation/rendering of a text string by xfhsm_oni.dll and can lead to a wide range of unexpected behavior: at best, a blank or otherwise unintended glyph will be displayed; at worst the rendered text will be garbled (memory corruption most likely), or the game may simply [[Blam|crash]].
Unlike for other versions of Oni, an invalid code point does not interrupt the interpretation/rendering of a text string by xfhsm_oni.dll and can lead to a wide range of unexpected behavior: at best, a blank or otherwise unintended glyph will be displayed; at worst the rendered text will be garbled (memory corruption most likely), or the game may simply [[Blam|crash]].
The current understanding is that xfhsm_oni.dll simply turns any two-byte code point QQ WW into the offset [(QQ-0xA1)*0x5E + (WW-0xA1)]*0x20, relative either to the start of the xf_font.dat data (for the 16x16 font) or to the middle of the data (for the small 12x12 font). Depending on the values of QQ and WW, both components of the offset can fall outside the intended 0-93 range, with values as high as 94 and as low as -161. There doesn't seem to be any sanity check, and the only special handling is for QQ=00 (in this case WW is ignored and the string is terminated).
A valid EUC-CN code point (with both bytes in the 0xA1-0xFE range) results in a valid offset pointing to an actual glyph for the relevant font, whereas illegal bytes or byte pairs may point to a different glyph within the same font, or to a glyph of the other font, or to a completely unrelated memory region. In the worst-case scenario, pixel data will be read 486,432 bytes (~475 kB) before the start of the actual pixel data (if displaying the code point 01,00 for the large font) or 3,008-3,040 bytes (~3 kB) past the end of the actual pixel data (if displaying the code point FF,FF for the small font).
Reading garbage pixel data should not cause memory corruption per se (merely nonsensical/garbled text), but if similar out-of-bounds pointers occur for glyph rendering, then xfhsm_oni.dll may occasionally overwrite its own memory or even Oni's. This has not been thoroughly investigated, but it seems advisable to ensure that all text consists of valid EUC-CN code points (which is unfortunately not the case, see [[#Invalid EUC-CN input|"Invalid EUC-CN input"]] below).
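The offset formula and the two worst-case figures above can be reproduced with a minimal sketch. It models the current understanding of xfhsm_oni.dll as stated in the text, not the DLL's actual code; the function and constant names are hypothetical.

```python
CELLS_PER_ROW = 0x5E                     # 94 cells per row
GLYPH_BYTES = 0x20                       # 32 bytes per glyph
FONT_BYTES = CELLS_PER_ROW * CELLS_PER_ROW * GLYPH_BYTES   # size of one font


def glyph_offset(qq: int, ww: int) -> int:
    """Byte offset of code point (QQ,WW) relative to the start of its font.

    No range check is applied, mirroring the behavior described in the text.
    """
    return ((qq - 0xA1) * CELLS_PER_ROW + (ww - 0xA1)) * GLYPH_BYTES


def is_valid_euc_cn(qq: int, ww: int) -> bool:
    """True if both bytes are in the 0xA1-0xFE range."""
    return 0xA1 <= qq <= 0xFE and 0xA1 <= ww <= 0xFE


print(glyph_offset(0xA1, 0xA1))               # first glyph of the font
print(glyph_offset(0x01, 0x00))               # far before the start of the font
print(glyph_offset(0xFF, 0xFF) - FONT_BYTES)  # distance past the end of the font
```

A valid code point always yields an offset between 0 and <code>FONT_BYTES - 32</code>; anything else lands outside the font or on an unintended slot within it.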




Line 860: Line 869:
{{divhide|end}}
{{divhide|end}}


As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., is limited to [[wp:US-ASCII|US-ASCII]]. This is consistent with the simplified logic used by the Japanese engine, where any high-bit byte (in the 0x80-0xFF range) is treated as the start of a two-byte sequence. (In actual Shift JIS some high-bit bytes are interpreted as half-width kana, a feature that isn't supported by Oni's engine.)
As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., is limited to US-ASCII. This is consistent with the simplified logic used by the Japanese engine, where any high-bit byte (in the 0x80-0xFF range) is treated as the start of a two-byte sequence. (In actual Shift JIS some high-bit bytes are interpreted as half-width kana, a feature that isn't supported by Oni's engine.)
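The simplified byte classification described above can be sketched as follows. This only illustrates the rule "any byte in the 0x80-0xFF range starts a two-byte sequence"; it is not taken from the engine, the function name is hypothetical, and the handling of a lone high-bit byte at the end of a string is an assumption.

```python
def split_code_points(data: bytes) -> list:
    """Split a byte string into one-byte (US-ASCII) and two-byte tokens."""
    tokens = []
    i = 0
    while i < len(data):
        # Any high-bit byte is treated as the lead byte of a two-byte sequence.
        step = 2 if data[i] >= 0x80 else 1
        # ASSUMPTION: a trailing lone high-bit byte is kept as a short token.
        tokens.append(data[i:i + step])
        i += step
    return tokens


print(split_code_points(b"On\x82\xa0!"))
```

Under this rule a half-width kana byte from actual Shift JIS would swallow the byte that follows it, which is why such input is not supported.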


It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable.
It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable except for its US-ASCII part.
*The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for all of the vanilla text strings, which only contain double-byte control codes. Thus, under normal conditions, TSFFTahoma remains completely unused in the Japanese version, and would only be used for (artificially added) US-ASCII input.
*The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for any double-byte code points, resorting to TSFFTahoma only for the rare occurrences of US-ASCII (resolution strings, the "On" labels in the Options menu, etc.).
*If the US engine is used on the Japanese game data, then the .fnt files are ignored (obviously), and the incomplete TSFFTahoma is used to render the Japanese text strings as well as the few English strings supplied by the EXE. Due to the limited character set, many strings end up broken.  
*If the US engine is used on the Japanese game data, then the .fnt files are ignored (obviously), and the incomplete TSFFTahoma is used to render both US-ASCII and Japanese glyphs. Due to the limited character set (154 glyphs instead of 1,357), many strings end up broken in this situation.  


It appears that the Japanese localization team initially tried to put Oni's code page system to use, and to fill in all the required JIS glyphs into TSFT and TSGA. As the number of kanji increased, supposedly, the TSFT grew prohibitively large due to the use of 8-bit grayscale storage for the pixel data, and the size taken up by the sparsely populated TSGA also increased out of proportion with the rest of the game data. At some point the engine switched to separate .fnt files, and somehow no one bothered to clean up the incomplete code pages in TSFFTahoma.
It appears that the Japanese localization team initially tried to put Oni's code page system to use, and to fill in all the required JIS glyphs into TSFT and TSGA. As the number of kanji increased, supposedly, the TSFT grew prohibitively large due to the use of 8-bit grayscale storage for the pixel data, and the size taken up by the sparsely populated TSGA also increased out of proportion with the rest of the game data. At some point the engine switched to separate .fnt files, and somehow no one bothered to clean up the incomplete code pages in TSFFTahoma.
Line 872: Line 881:
==Text anomalies==
==Text anomalies==
===Ellipsis issue===
===Ellipsis issue===
Unlike other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats high-bit characters as part of a two-byte control sequence (a provision for Asian encodings), and therefore fails to render any character from the extended ASCII range. This happens twice in English Oni, because the ellipsis character (…), encoded as 0xC9, was accidentally used in [[Quotes/Consoles/level_19d|These]] [[Quotes/Consoles/level_19e|Two]] text consoles in place of three consecutive dots (probably auto-substituted by a text editor). The result is that the two lines using a "…" are cut off at the offending character.
Unlike other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats characters above 0x7F as part of a two-byte control sequence (an unused provision for Asian encodings), and therefore fails to render any character from the extended ASCII range. This happens twice in English Oni because the ellipsis character (…), encoded as 0xC9, was accidentally used in <u>[[Quotes/Consoles/level_19d|these]]</u> <u>[[Quotes/Consoles/level_19e|two]]</u> text consoles instead of three consecutive periods (probably auto-substituted by a text editor). The result is that the two lines using a "…" are cut off at the offending character.
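The effect can be modeled with a short sketch. It assumes the text is stored in Mac OS Roman (Python's <code>mac_roman</code> codec) and treats "cut off at the offending character" as plain truncation at the first byte above 0x7F; the function name is hypothetical and the real engine logic may differ in detail.

```python
def us_engine_visible_text(line: str) -> str:
    """Return the part of a line that precedes the first high-bit byte."""
    raw = line.encode("mac_roman")
    for i, byte in enumerate(raw):
        if byte > 0x7F:
            # Everything from the offending byte onwards is lost.
            return raw[:i].decode("ascii")
    return raw.decode("ascii")


print("\u2026".encode("mac_roman"))                # the ellipsis is the single byte 0xC9
print(us_engine_visible_text("Wait\u2026 what"))   # truncated at the ellipsis
print(us_engine_visible_text("Wait... what"))      # three periods render in full
```

Replacing the ellipsis character with three periods keeps the whole line within the US-ASCII range.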


===Invalid EUC-CN input===
===Invalid EUC-CN input===
Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE (single US-ASCII characters are also allowed in theory), and anything else is technically illegal.
Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE, and anything else is technically illegal (single US-ASCII characters could occur in theory, but are not handled properly by the custom text engine, xfhsm_oni.dll).
 
The text strings in the Chinese version mostly conform to the EUC-CN scheme, but there are two recurrent invalid characters, as well as some instances of non-translated US-ASCII (!!!).
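A simple scan for invalid byte pairs, in the spirit of the occurrence lists further down, could look like the sketch below. It assumes that a string is read strictly two bytes at a time and ends at the first pair whose lead byte is 00; the function name is hypothetical, and this is not the tool that was used to compile the lists.

```python
def find_invalid_pairs(data: bytes) -> list:
    """Return (byte position, QQ, WW) for every pair outside 0xA1-0xFE."""
    invalid = []
    for pos in range(0, len(data) - 1, 2):
        qq, ww = data[pos], data[pos + 1]
        if qq == 0x00:
            break                      # string terminator
        if not (0xA1 <= qq <= 0xFE and 0xA1 <= ww <= 0xFE):
            invalid.append((pos, qq, ww))
    return invalid


sample = bytes([0xA3, 0xA0, 0xB0, 0xA1, 0xA3, 0x89, 0x00, 0x00])
for pos, qq, ww in find_invalid_pairs(sample):
    print(f"bytes {pos}-{pos + 1}: ({qq:02X},{ww:02X})")
```

Non-translated US-ASCII pairs are also reported by this scan, since their bytes fall below 0xA1.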
 
====(A3,89)====
The illegal sequence (A3,89) occurs only in the SUBTmessages entry '''xdash1''', the original English text being "Face the center of the room and [c.tap the forward key just before pressing and holding it down again (tap W then press and hold W)]."
 
There are five identical (A3,89) sequences at the end of the string, just before the (double) null. All of them end up rendered as ㈢. What happens under the hood is that xfhsm_oni.dll simply subtracts 161 (0xA1) from both bytes, ending up with (2,-24), which is equivalent to (1,70) and produces the GB 2312 glyph ㈢. The correct EUC-CN code for ㈢ would be (A2,E7), although it is unlikely that this is what the translator meant to write. It is not currently known what the intended glyph was, as (A3,89) doesn't seem to be a valid control sequence under any common extension of EUC-CN.
 
====(A3,A0)====
The illegal sequence (A3,A0) is much more common than (A3,89). It occurs in SUBT entries (both in actual subtitles and in "messages"), as well as in the [[OBD:IGSt|IGSt]] resources of multiple [[OBD:TxtC|TxtC]] (text consoles), two [[OBD:WPge|WPge]] (weapon pages) and one [[OBD:OPge|OPge]] (objective page). Lists of occurrences are provided below.
 
As with (A3,89), the pixel data addressed by the invalid code point remains within the same font, in this case at the (A2,FE) slot, which happens to be blank (and thus indistinguishable from an intentional space glyph).
 
Unlike for (A3,89), there are multiple examples to look at, so we can make an informed guess as to what the intended glyph was: either an ordinary ideographic space, (A1,A1), or some variant thereof (such as a non-breaking space).
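The slots that the two invalid sequences end up addressing can be worked out by flattening the (row, cell) pair into a linear glyph index and splitting it again. The sketch below follows the subtraction of 161 (0xA1) described above; the function name is hypothetical.

```python
CELLS_PER_ROW = 94


def aliased_code_point(qq: int, ww: int) -> tuple:
    """Return the valid-looking code point whose glyph slot (QQ,WW) lands on."""
    index = (qq - 0xA1) * CELLS_PER_ROW + (ww - 0xA1)   # linear glyph index
    row, cell = divmod(index, CELLS_PER_ROW)            # normalize cell to 0-93
    return (0xA1 + row, 0xA1 + cell)


for qq, ww in [(0xA3, 0x89), (0xA3, 0xA0)]:
    a, b = aliased_code_point(qq, ww)
    print(f"({qq:02X},{ww:02X}) -> ({a:02X},{b:02X})")
```

The first sequence resolves to (A2,E7) and the second to (A2,FE), matching the slots named in the text.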
 
{|
|
{{divhide|&nbsp;List of Chinese SUBTmessages entries containing the (A3,A0) code point|align=left}}
{|border=1 cellspacing=0 cellpadding=3
!Handle
!Original text
!(A3,A0) location
|-valign=top
!xcombo
|To move diagonally, [c.use key combinations like (W+A) or (S+D)].°
|bytes 30-31,34-35
|-valign=top
!c01_50_11
|To perform a somersault escape move, [c.begin running (W,A,D,S) and then press (SHIFT)].°
|bytes 42-43
|-valign=top
!autoprompt_hypo
|Press [c.Q] to pick up HYPO SPRAY.°
|bytes 12-13
|-valign=top
!autoprompt_cell
|Press [c.Q] to pick up [b.ENERGY CELL].°
|bytes 16-17
|-valign=top
!xtabhypo
|Press [c.(TAB)] to use a hypo.°
|bytes 0-1
|}
{{divhide|end}}
|}
 
{|
|
{{divhide|&nbsp;List of Chinese SUBTsubtitles entries containing the (A3,A0) code point|align=left}}
{|border=1 cellspacing=0 cellpadding=3
!Handle
!Original text
!(A3,A0) location
|-valign=top
!01_01_08
|Shinatama:&nbsp;&nbsp;Daodan latency holding at twenty seven point one. Bioplasmic waveforms stable. A class three adrenal spike when you gave the order, but nothing out of the ordinary.°
|bytes 0-1
|-valign=top
!01_01_09
|Kerr:&nbsp;&nbsp;What are you sending her into?°
|bytes 0-1
|-valign=top
!01_01_10
|Griffin:&nbsp;&nbsp;It's a simple bust: in and out. She needs a trial run.°
|bytes 0-1
|-valign=top
!01_03_02
|Griffin:&nbsp;&nbsp;Well done Konoko. Fall back, I'll have you picked up.°
|bytes 0-1
|-valign=top
!01_03_04
|Griffin:&nbsp;&nbsp;Negative. Fall back.°
|bytes 0-1
|-valign=top
!02_05_03
|Barabas:&nbsp;&nbsp;You know it. They aren't getting out of here alive.°
|bytes 0-1
|-valign=top
!02_05_08
|Barabas:&nbsp;&nbsp;I'm ready for anything. You made sure of that.°
|bytes 0-1
|-valign=top
!02_05_09
|Muro:&nbsp;&nbsp;There is always someone stronger. Have you forgotten?°
|bytes 0-1
|-valign=top
!02_05_10
|Barabas:&nbsp;&nbsp;No. I haven't. I'll be careful.°
|bytes 0-1
|-valign=top
!02_05_11
|Muro:&nbsp;&nbsp;See that you are. You know the consequences of failure.°
|bytes 0-1
|-valign=top
!02_05_12
|Receptionist:&nbsp;&nbsp;Please have a seat, someone will be right with you.°
|bytes 0-1
|-valign=top
!14_54_02
|Civilian:&nbsp;&nbsp;Konoko, please don't hurt me. They made me do it, I swear.°
|bytes 44-45
|-valign=top
!15_59_01
|Muro:&nbsp;&nbsp;Welcome sister. I am very impressed with what you have been able to accomplish without drawing on the full power of your Chrysalis. You are capable of so much more. Let me show you...°
|bytes 32-33,36-37
|}
{{divhide|end}}
|}
 
{|
|
{{divhide|&nbsp;List of Chinese [[IGSt]] containing the (A3,A0) code point|align=left}}
{|border=1 cellspacing=0 cellpadding=3
!Owner
!Page
!Original text
!(A3,A0) location
|-valign=top
![[Quotes/Objectives#CHAPTER_08_._AN_INNOCENT_LIFE|OPgelevel_10]]
!align=center|3
|&nbsp;There's no one left to trust.°
|bytes 0-1
|-valign=top
![[Quotes/Weapons#vdg|WPgew6_vdg]]
!
|Hint:&nbsp;Shots disable one or more enemies at close range. Attack or escape while victims are disoriented.°
|bytes 6-7
|-valign=top
![[Quotes/Weapons#scream|WPgew9_scr]]
!
|Hint:&nbsp;The cannon masks its wielder's lifeforce from the entity, but any life that ventures too near it will be drained.°
|bytes 6-7
|-
|colspan=4 bgcolor=silver|
|-valign=top
!TxtClevel_1f
!align=center|1
|Reload............................R (or LEFT MOUSE BUTTON)°
|bytes 16-17
|-valign=top
!TxtClevel_1f
!align=center|2
|Reload............................R (or LEFT MOUSE BUTTON)°
|bytes 16-17
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_2a|TxtClevel_2a]]
!align=center|1
|ENCRYPT SEQUENCE TaL0315-68 seq. 1°
|bytes 28-29,38-39
|-valign=top
![[Quotes/Consoles/level_2b|TxtClevel_2b]]
!align=center|1
|ENCRYPT SEQUENCE TaL0315-68 seq. 2°
|bytes 28-29,38-39
|-valign=top
![[Quotes/Consoles/level_2c|TxtClevel_2c]]
!align=center|1
|ENCRYPT SEQUENCE TaL0315-68 seq. 3°
|bytes 28-29,38-39
|-valign=top
![[Quotes/Consoles/level_2d|TxtClevel_2d]]
!align=center|1
|VOICE ENCRYPT 01.967.23 <Dr. Singh, Earnest M.>°<br />"Dr. Kafelnikov and I have just completed Test Part 483 in the ESS (Environmental Stress Simulator) with the settings prescribed by protocol AT-MOK 64.°
|bytes 26-27<br />bytes 56-57
|-valign=top
![[Quotes/Consoles/level_2e|TxtClevel_2e]]
!align=center|1
|VOICE ENCRYPT 01.965.04 <Dr. Kafelnikov, Roland V.>°
|bytes 26-27
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_3a|TxtClevel_3a]]
!align=center|1
|(ref.TANKER&nbsp;v1.6&nbsp;-&nbsp;1.9)°
|bytes 18-19,28-29,32-33
|-valign=top
![[Quotes/Consoles/level_3d|TxtClevel_3d]]
!align=center|1
|VAGO BIOTECH - Life is for Everyone°<br />(ref.BIOTECHNOLOGY TODAY vol.XXI)°
|bytes 8-9,24-25<br />bytes 24-25
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_4b|TxtClevel_4b]]
!align=center|1
|WCG.subref.AirCOn&nbsp;Region&nbsp;7Dispatch>°
|bytes 34-35,40-41
|-valign=top
![[Quotes/Consoles/level_4c|TxtClevel_4c]]
!align=center|1
|WCG.subref.AirCOn&nbsp;Environmental Update&nbsp;>°
|bytes 34-35,46-47
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_6a|TxtClevel_6a]]
!align=center|1
|WCG.subref.AirCon&nbsp;General Alert&nbsp;>°
|bytes 34-35,44-45
|-valign=top
![[Quotes/Consoles/level_6b|TxtClevel_6b]]
!align=center|1
|WCG.subref.NavCom&nbsp;Advisory >°
|bytes 44-45
|-valign=top
![[Quotes/Consoles/level_6c|TxtClevel_6c]]
!align=center|1
|WCG.subref.Custodial Heads-Up&nbsp;>°
|bytes 44-45
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_8a|TxtClevel_8a]]
!align=center|1
|CLASSIFIED - Clearance Gamma S16 and Above Only>°
|bytes 10-11,22-23,30-31
|-valign=top
![[Quotes/Consoles/level_8b|TxtClevel_8b]]
!align=center|1
|GENERAL ACCESS - Clearance Alpha G1>°
|bytes 24-25
|-valign=top
![[Quotes/Consoles/level_8f|TxtClevel_8f]]
!align=center|1
|GENERAL ACCESS - Clearance Alpha A1>°
|bytes 24-25
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_10b|TxtClevel_10b]]
!align=center|1
|<-> security mainframe&nbsp;<->OVERRIDE°
|bytes 14-15
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_14a|TxtClevel_14a]]
!align=center|1
|Project: 14&nbsp;(1.3.51)°
|bytes 10-11
|-valign=top
![[Quotes/Consoles/level_14b|TxtClevel_14b]]
!align=center|1
|Project: 14&nbsp;(9.1.28)°
|bytes 10-11
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_18a|TxtClevel_18a]]
!align=center|2
|TCTFdb88\sld\zZ1 Update: Omega Security Vault Retrofit°
|bytes 32-33
|-valign=top
![[Quotes/Consoles/level_18b|TxtClevel_18b]]
!align=center|1<br/><br/>2<br/>3<br/>4
|<<Clearance Theta K12 and Above Only>>°<br/>TCTF32\sld\taL15 Shinatama/Konoko Relationship Analysis°<br />TCTF32\sld\taL15 Shinatama/Konoko Relationship Analysis [cont]°<br />TCTF32\sld\taL15 Shinatama/Konoko Relationship Analysis [cont]°<br />TCTF32\sld\taL15 Shinatama/Konoko Relationship Analysis [cont]°
|bytes 8-9,20-21,28-29<br/>bytes 32-33<br/>bytes 32-33<br/>bytes 32-33<br/>bytes 32-33
|-valign=top
![[Quotes/Consoles/level_18c|TxtClevel_18c]]
!align=center|1
| -Internal car park facilities closed.&nbsp;&nbsp;Traffic redirected to security kiosk A as per protocol Theta K12.°
|bytes 34-35
|-valign=top
![[Quotes/Consoles/level_18d|TxtClevel_18d]]
!align=center|1
|SECURITY ALERT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;19:35:06°
|bytes 8-9,10-11,12-13,14-15,16-17,18-19,20-21,22-23
|-
|colspan=4 bgcolor=silver|
|-valign=top
![[Quotes/Consoles/level_19b|TxtClevel_19b]]
!align=center|1
|5)&nbsp;&nbsp;Low Orbit satellite control signal burrow established°<br />6)&nbsp;&nbsp;STURMANDERUNG mountain compound construction°<br />7)&nbsp;&nbsp;Daodan core technology (ref.TITAN\ssob)°
|bytes 4-5,6-7<br />bytes 4-5,6-7<br />bytes 4-5,6-7
|-valign=top
![[Quotes/Consoles/level_19c|TxtClevel_19c]]
!align=center|1
|9)&nbsp;&nbsp;&nbsp;Daodan core technology (ref.TITAN\uwlb)°<br />10)&nbsp;&nbsp;STURMANDERUNG mountain compound construction°<br />11)&nbsp;&nbsp;Symbiote candidate selection and implantation°
|bytes 4-5,6-7,8-9<br />bytes 6-7,8-9<br />bytes 6-7,8-9
|-valign=top
![[Quotes/Consoles/level_19d|TxtClevel_19d]]
!align=center|1<br/><br/><br/>2<br/><br/><br/><br/>3
|13)&nbsp;&nbsp;ACC installation modification COMPLETE°<br />14)&nbsp;&nbsp;STURMANDERUNG mountain compound COMPLETE°<br />15)&nbsp;&nbsp;STURMANDERUNG transmitter array COMPLETE°<br />1)&nbsp;Initialize°<br />2)&nbsp;Test with current settings°<br />3)&nbsp;Edit current settings°<br />4)&nbsp;Abort current process°<br />Frequency: 1002&nbsp;&nbsp;&nbsp;Amplitude: 233&nbsp;&nbsp;Mode: 1°<br />input>Frequency&nbsp;=&nbsp;9999,&nbsp;Amplitude&nbsp;=&nbsp;9999,&nbsp;Mode:&nbsp;9999°
|bytes 6-7,8-9<br />bytes 6-7,8-9<br />bytes 6-7,8-9<br />bytes 4-5<br />bytes 4-5<br />bytes 4-5<br />bytes 4-5<br />bytes 14-15,16-17,18-19,32-33,34-35<br />bytes 10-11,14-15,26-27,32-33,36-37,48-49,56-57
|-valign=top
![[Quotes/Consoles/level_19e|TxtClevel_19e]]
!align=center|1
|1)&nbsp;Stop Point A operators must coordinate Blue tunnel for two-way traffic.°<br />2)&nbsp;Reprimand all personnel for sloppy operating behavior.°<br />3)&nbsp;Replace all remaining doors with Musashi DX1000.°<br />
|bytes 4-5<br />bytes 4-5<br />bytes 4-5
|}
{{divhide|end}}
|}
 
Without a proper sanity check, some illegal code points will clearly result in pixel data being loaded not from a valid glyph region, but from irrelevant memory that belongs either to xfhsm_oni.dll or to the main Oni engine, resulting in garbled text. Memory corruption or segmentation fault (access violation) may occur if similar out-of-bounds pointers are used when rendering glyph textures. Possibly invalid EUC-CN input is what is causing most Chapters of the Chinese Oni version to crash on modern Windows systems, although this has not been investigated thoroughly.
 
====Non-translated US-ASCII====
ASCII strings are much more harmful when handled by xfhsm_oni.dll, as compared to the two invalid code points (A3,A0) and (A3,0x89), because pairs of US-ASCII bytes, misinterpreted as EUC-CN code points, end up referencing completely strange memory regions (outside the region occupied by xf_font.dat). Unfortunately, there are a few ASCII strings that xfhsm_oni.dll can come across even during regular gameplay, and many more arise if one allows for modding.
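As a rough illustration of why ASCII pairs end up so far from the glyph data, the sketch below assumes that a byte pair is turned into a row-major index over the 94×94 GB 2312 grid without any range check. Both the formula and the function names are assumptions made for this example; the actual lookup performed by xfhsm_oni.dll has not been confirmed.

```python
# Hypothetical, unchecked EUC-CN lookup (an assumption, not taken from
# xfhsm_oni.dll): row-major index into a 94 x 94 grid of glyphs.
ROWS = COLS = 94   # GB 2312 grid: 94 * 94 = 8,836 code points
FIRST = 0xA1       # lowest valid lead/trail byte in EUC-CN


def glyph_index(lead, trail):
    """Row-major glyph index with no range check on either byte."""
    return (lead - FIRST) * COLS + (trail - FIRST)


def in_table(index):
    """True if the index points at one of the 8,836 grid cells."""
    return 0 <= index < ROWS * COLS


# Under this model, every pair of printable ASCII bytes (0x20..0x7E)
# yields a negative index, i.e. a position before the start of the grid.
ascii_indices = [glyph_index(a, b)
                 for a in range(0x20, 0x7F)
                 for b in range(0x20, 0x7F)]
print(min(ascii_indices), max(ascii_indices))    # -12255 -3325
print(any(in_table(i) for i in ascii_indices))   # False
```

If the real lookup resembles this at all, no pair of printable ASCII bytes can ever select a glyph from the intended table.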
=====Count on it=====
The following string in SUBTsubtitles has not been translated into Chinese:
:Barabas:&nbsp;&nbsp;Count on it. When I get through with them they're...
Being encoded as plain US-ASCII, this string is entirely illegal under the limited implementation of EUC-CN in xfhsm_oni.dll, which does not recognize US-ASCII bytes as single-byte code points and instead keeps interpreting pairs of ASCII bytes as (invalid) quwei indices. By lucky coincidence, the string has an even number of printable bytes, so the null character is still in a suitable place for terminating the string (the EUC-CN parser sees it as a null lead byte and does not read any further). However, the string still consists of 31 invalid two-byte code points (not counting the null). As a further lucky coincidence, this string is never read by Oni's engine, because the subtitle's handle (02_05_05) is one of those that have been clobbered by the spurious double-null (see [[#Chinese_SUBT_issues|"Chinese SUBT issues"]] below). Were it not for the clobbering, the game would crash upon displaying this subtitle.
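The pairing behaviour described above can be modelled with a few lines of Python. This is a simplified sketch, not the actual xfhsm_oni.dll code: the function name is made up, the bytes after the terminator are arbitrary filler, and the string is assumed to be stored with two plain ASCII spaces after the colon.

```python
def split_into_pairs(memory):
    """Model of a parser that only understands two-byte code points.

    Reading stops when the byte in the lead position is null; a null in
    the trail position would simply be consumed as part of a pair.
    """
    pairs = []
    pos = 0
    while pos + 1 < len(memory) and memory[pos] != 0:
        pairs.append((memory[pos], memory[pos + 1]))
        pos += 2
    return pairs


text = b"Barabas:  Count on it. When I get through with them they're..."
memory = text + b"\x00" + b"unrelated data"   # terminator, then filler

pairs = split_into_pairs(memory)
print(len(text), len(pairs))   # 62 31
```

With 62 printable bytes the terminator sits at an even offset, so the model stops after exactly 31 pairs and never touches the filler.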
 
=====Pre-beta ONLDs=====
The "level definitions" ([[ONLD]]s) of [[Pre-beta_content#Cut_levels|pre-beta levels]] are never seen in vanilla Oni, but would appear in the "Load Game" dialog if a valid level#_Final.dat were to be supplied at startup (e.g. by a mod) and unlocked in persist.dat. Since xfhsm_oni.dll does not actually support US-ASCII, any untranslated ONLDs are potentially disruptive.
 
The following 8 pre-beta ONLDs were fully translated: "The Airport Part Deux" (level_05), "Obsolete" (level_07), "The Arena of Pain" (level_30), "Crossing Zone" (level_31), "Pit" (level_32), "Crossing Zone Too" (level_33), "Capture" (level_34), "Territories" (level_35).
 
The following 8 pre-beta ONLDs remained as US-ASCII: "Test_Stuff" (level_36), "AlexTestSite" (level_55), "Experimental_II" (level_66), "MARTY'S SOUND CORRIDOR" (level_68), "FiringRange" (level_71), "One Room" (level_77), "One Room 2" (level_88) and "Test Barn II" (level_99).
 
The most awkward case is that of the string "BGI HQ" (ONLDlevel_16), which was translated only partly: "HQ" was replaced with a pair of GB 2312 glyphs, but the first four characters "BGI " remained as plain ASCII (i.e., as two illegal EUC-CN code points).


The text strings in the Chinese version mostly conform to the EUC-CN scheme, except for the (A1,A0) sequence, which occurs in a few subtitles and is rendered with a blank glyph (i.e., a space between valid glyphs, indistinguishable from an ordinary ideographic space), apparently due to some kind of wraparound. At the time of writing it is not known what was meant by the (A1,A0) sequence, as it doesn't seem to be a valid control sequence under any common extension of EUC-CN.
=====Cheat messages=====
None of the 38 cheat messages was translated into Chinese (!!!), which means 38 more strings made up entirely of illegal EUC-CN code points. Whenever a cheat is entered, xfhsm_oni.dll attempts to display one of the following strings, which almost always causes a crash on modern Windows systems. Note how the null byte does not interrupt the input if it occurs in a trail-byte position.
{|
|
{{divhide|&nbsp;List of invalid EUC-CN strings triggered by cheats|align=left}}
{|border=1 cellspacing=0 cellpadding=3
!Cheat
!Invalid double-byte arrays (ASCII)
|-valign=top
!shapeshifter
|<tt>Ch<u>an</u>ge<u> C</u>ha<u>ra</u>ct<u>er</u>s <u>En</u>ab<u>le</u>d°<br /><u>Ch</u>an<u>ge</u> C<u>ha</u>ra<u>ct</u>er<u>s </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!liveforever
|<tt>In<u>vi</u>nc<u>ib</u>il<u>it</u>y <u>En</u>ab<u>le</u>d°<br /><u>In</u>vi<u>nc</u>ib<u>il</u>it<u>y </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!touchofdeath
|<tt>Om<u>ni</u>po<u>te</u>nc<u>e </u>En<u>ab</u>le<u>d°</u>to<u>uc</u>ho<u>fd</u>ea<u>th</u><br /><u>Om</u>ni<u>po</u>te<u>nc</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!canttouchthis
|<tt>Un<u>st</u>op<u>pa</u>bl<u>e </u>En<u>ab</u>le<u>d°</u>ca<u>nt</u>to<u>uc</u>ht<u>hi</u>s°<br /><u>Un</u>st<u>op</u>pa<u>bl</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!fatloot
|<tt>Fa<u>t </u>Lo<u>ot</u> R<u>ec</u>ei<u>ve</u>d°</tt>
|-valign=top
!glassworld
|<tt>Gl<u>as</u>s <u>Fu</u>rn<u>it</u>ur<u>e </u>En<u>ab</u>le<u>d°</u>gl<u>as</u>sw<u>or</u>ld<br /><u>Gl</u>as<u>s </u>Fu<u>rn</u>it<u>ur</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!winlevel
|<tt>In<u>st</u>an<u>tl</u>y <u>Wi</u>n <u>Le</u>ve<u>l°</u>wi<u>nl</u>ev<u>el</u></tt>
|-valign=top
!loselevel
|<tt>In<u>st</u>an<u>tl</u>y <u>Lo</u>se<u> L</u>ev<u>el</u></tt>
|-valign=top
!bighead
|<tt>Bi<u>g </u>He<u>ad</u> E<u>na</u>bl<u>ed</u><br /><u>Bi</u>g <u>He</u>ad<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!minime
|<tt>Mi<u>ni</u> M<u>od</u>e <u>En</u>ab<u>le</u>d°<br /><u>Mi</u>ni<u> M</u>od<u>e </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!superammo
|<tt>Su<u>pe</u>r <u>Am</u>mo<u> M</u>od<u>e </u>En<u>ab</u>le<u>d°</u>su<u>pe</u>ra<u>mm</u>o°<br /><u>Su</u>pe<u>r </u>Am<u>mo</u> M<u>od</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!reservoirdogs
|<tt>La<u>st</u> M<u>an</u> S<u>ta</u>nd<u>in</u>g <u>En</u>ab<u>le</u>d°<br /><u>La</u>st<u> M</u>an<u> S</u>ta<u>nd</u>in<u>g </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!roughjustice
|<tt>Ga<u>tl</u>in<u>g </u>Gu<u>ns</u> E<u>na</u>bl<u>ed</u><br /><u>Ga</u>tl<u>in</u>g <u>Gu</u>ns<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!chenille
|<tt>Da<u>od</u>an<u> P</u>ow<u>er</u> E<u>na</u>bl<u>ed</u><br /><u>Da</u>od<u>an</u> P<u>ow</u>er<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!behemoth
|<tt>Go<u>dz</u>il<u>la</u> M<u>od</u>e <u>En</u>ab<u>le</u>d°<br /><u>Go</u>dz<u>il</u>la<u> M</u>od<u>e </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!elderrune
|<tt>Re<u>ge</u>ne<u>ra</u>ti<u>on</u> E<u>na</u>bl<u>ed</u><br /><u>Re</u>ge<u>ne</u>ra<u>ti</u>on<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!moonshadow
|<tt>Ph<u>as</u>e <u>Cl</u>oa<u>k </u>En<u>ab</u>le<u>d°</u>mo<u>on</u>sh<u>ad</u>ow<br /><u>Ph</u>as<u>e </u>Cl<u>oa</u>k <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!munitionfrenzy
|<tt>We<u>ap</u>on<u>s </u>Lo<u>ck</u>er<u> C</u>re<u>at</u>ed</tt>
|-valign=top
!fistsoflegend
|<tt>Fi<u>st</u>s <u>Of</u> L<u>eg</u>en<u>d </u>En<u>ab</u>le<u>d°</u>fi<u>st</u>so<u>fl</u>eg<u>en</u>d°<br /><u>Fi</u>st<u>s </u>Of<u> L</u>eg<u>en</u>d <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!killmequick
|<tt>Ul<u>tr</u>a <u>Mo</u>de<u> E</u>na<u>bl</u>ed<br /><u>Ul</u>tr<u>a </u>Mo<u>de</u> D<u>is</u>ab<u>le</u>d°<u>Ul</u>tr<u>a </u>Mo<u>de</u> E<u>na</u>bl<u>ed</u></tt>
|-valign=top
!carousel
|<tt>Sl<u>ow</u> M<u>ot</u>io<u>n </u>En<u>ab</u>le<u>d°</u>ca<u>ro</u>us<u>el</u><br /><u>Sl</u>ow<u> M</u>ot<u>io</u>n <u>Di</u>sa<u>bl</u>ed</tt>
|}
{{divhide|end}}
|}
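Whether a cheat message swallows its own terminator follows from its length alone: with an odd number of bytes, the null lands in a trail-byte position and reading continues into whatever follows in memory. The check below applies this rule to a few messages copied from the table; the helper name is illustrative only.

```python
def terminator_is_swallowed(message):
    """True if the null after an ASCII message falls in a trail-byte slot."""
    return len(message.encode("ascii")) % 2 == 1


samples = (
    "Omnipotence Enabled",      # 19 bytes
    "Omnipotence Disabled",     # 20 bytes
    "Instantly Win Level",      # 19 bytes
    "Instantly Lose Level",     # 20 bytes
    "Weapons Locker Created",   # 22 bytes
)
for msg in samples:
    verdict = "overruns" if terminator_is_swallowed(msg) else "stops at null"
    print(f"{msg!r}: {verdict}")
```

The odd-length samples are the ones whose table entries run on past the ° into the text that follows, while the even-length ones end cleanly.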


Another illegal sequence is (0xA3,0x89), which occurs only in the SUBTmessages entry xdash1 (five identical glyphs at the end of the string) and is somehow rendered as a ㈢, which would normally be encoded with (A2,E7). Such an improbable substitution is likely unintentional, and it is not known what the intended glyph was.
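One possible reading of this substitution, offered as a hypothesis only: if the glyph lookup is a row-major index with 94 glyphs per row and the trail byte is not range-checked, then (A3,89) lands on the same index as (A2,E7). The formula below is the conventional EUC-CN row/cell arithmetic and has not been confirmed against xfhsm_oni.dll.

```python
def row_major_index(lead, trail):
    """Conventional EUC-CN index: 94 cells per row, both bytes offset by 0xA1."""
    return (lead - 0xA1) * 94 + (trail - 0xA1)


illegal = row_major_index(0xA3, 0x89)   # trail byte below the valid range
regular = row_major_index(0xA2, 0xE7)   # code point of the glyph actually shown

print(illegal, regular, illegal == regular)   # 164 164 True
print(bytes([0xA2, 0xE7]).decode("gb2312"))   # ㈢
```

The too-small trail byte makes the index slip back by one row, which would account for a glyph from row 2 appearing in place of one from row 3.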
=====Debug printout and console=====
Oni has a well-hidden [[Developer Mode]] in which it can print informational output directly to the screen instead of writing to a text file. There are fully automatic warnings from the engine (e.g. about too many visible polygons or too many particles), or more or less regular printout (e.g., about a character's current animation status) that can be toggled through [[BSL:Variables|script variables]], or custom "dprint" messages that the developers used for visual feedback while testing [[BSL|scripts]]. Dev mode also has a togglable command line ("CMD: ") for entering script commands in real time. Both the debug printout and the command line use the main glyph-rendering pipeline (intercepted by xfhsm_oni.dll), with a small font size. This makes Dev mode essentially unusable in Chinese Oni, as most if not all of the debug printout or console output will be plain ASCII.  


At the time of writing the apparent wraparound behavior has not been investigated thoroughly, but it is established that some illegal code points are not recovered to a valid glyph at all, and instead result in garbled text or a crash. Possibly invalid EUC-CN input is what is causing most Chapters of the Chinese Oni version to crash on modern Windows systems (through varying degrees of memory corruption), although this has not been investigated thoroughly either.
Interestingly, Oni ''does'' have some primitive debug printout that is not intercepted by xfhsm_oni.dll and thus is displayed normally using the smallest-sized TSFT from level0_Final's TSFFTahoma. All (most?) of the primitive printout is available without Dev mode. There is the Ctrl+Shift+Y hotkey (FPS display), some HUD-like overlays toggled by [[BSL:Variables|script variables]] (e.g., chr_debug_characters), and finally some 3D sprites added to the game (e.g., health indicators or name labels displayed above a character's head).


===Over-tall text===
Although not strictly speaking a font issue, some of Oni's text fails to render because it doesn't fit vertically into a fixed-size frame (such as a [[:Image:DATA_CONSOLE.png|text console]]). This is known to happen for [[Quotes/Consoles/level_1e|These]] [[Quotes/Consoles/level_8b|Two]] consoles in the English version, and possibly for other screens in other language versions.


===Over-long text===
====Screen resolutions in Chinese Oni====
Although Chinese text strings typically have a much smaller number of glyphs than English originals, this is not always the case. The Chinese glyphs are also much wider on average, with each glyph taking up 16x16 pixels, and so there are situations where the rendered Chinese line is much wider than the English original, no longer fitting on one line as intended by the context.


This is known to cause a problem for the "resolution" item in the Options menu (a WMM_ generated at runtime). The actual dropdown list is wide enough to accommodate even the longest resolution strings, but the currently selected resolution appears in a small window that is only 150 pixels wide, too narrow even for the shortest resolution string "640×480×16位" (which needs 176 pixels). As a result the active resolution is always displayed on two lines, no longer fitting into the frame vertically and thus unreadable.
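The 176-pixel figure is consistent with every glyph in the string, digits and multiplication signs included, occupying a full 16-pixel cell. A quick check (the constant names are illustrative):

```python
GLYPH_WIDTH = 16     # px, fixed cell width of the Chinese font
WINDOW_WIDTH = 150   # px, width of the "current resolution" window

label = "640×480×16位"
needed = len(label) * GLYPH_WIDTH

print(len(label), needed, needed > WINDOW_WIDTH)   # 11 176 True
```

Even this shortest label is 26 pixels too wide for the window, which is why the line always wraps.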
 
The Japanese version displays screen resolutions as US-ASCII using TSFFTahoma (variable-width).
 
====Long subtitles in Chinese Oni====
Some of Oni's subtitles are tirades that are too long to fit on the screen horizontally. In such situations the Chinese engine will display the subtitle on multiple lines, placing line breaks at arbitrary positions instead of at ideographic spaces or after punctuation (if any). For not-so-clear reasons, automatic line breaks will often appear near the ''start'' of a long subtitle string, right in the middle of a speaker's name, which looks particularly awkward since there is usually a space nearby (after the colon).
 
The Japanese Oni displays long subtitles with arbitrary breaks as well, but consistently aligns the beginning of the string to the left of the screen, achieving an ordinary/intuitive paragraph-like look. (Since the Japanese engine has support for US-ASCII, an ASCII space could theoretically be inserted into Japanese text, allowing for line breaks at specific positions, e.g., after punctuation or between semantic groups.)
 
====Long UI text in Chinese Oni====
Oni's ingame UI is stylized as a futuristic computer screen (it is supposed to be Konoko's "Data Comlink") and has fixed-width frames reserved for text display. Large amounts of text can appear in the text console frame, or in the upper and lower sections of the Help menu (F1). It turns out that these frames have enough width to accommodate 26.5 16x16 glyphs (text console frame) or 19.5 glyphs (Help menu frame), therefore one would expect the Chinese text renderer to wrap lines around at 26 or 19 characters, respectively. Unfortunately the lines are wrapped around at 27 and 20, so the right half of the last glyph on every long line is cut off.
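The off-by-one can be reproduced with simple arithmetic. The frame widths below are derived from the figures above (26.5 and 19.5 glyphs of 16 pixels each); the function is only a sketch of what a correct wrap limit would look like, not the engine's code.

```python
GLYPH = 16  # px, fixed glyph width


def glyphs_that_fit(frame_width_px):
    """Largest number of whole glyphs that fit into the frame."""
    return frame_width_px // GLYPH


# name: (frame width in px, wrap position observed in Chinese Oni)
frames = {
    "text console": (424, 27),   # 26.5 glyphs * 16 px
    "Help menu": (312, 20),      # 19.5 glyphs * 16 px
}

for name, (width, observed_wrap) in frames.items():
    expected_wrap = glyphs_that_fit(width)
    overflow_px = observed_wrap * GLYPH - width
    print(name, expected_wrap, observed_wrap, overflow_px)
```

In both frames the observed wrap position is one glyph too late, leaving 8 pixels (half a glyph) outside the frame.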
 
The Japanese Oni consistently adjusts the carriage return depending on the glyph dimensions (font size), so that the last glyph in a wrapped-around line always fits into the frame and is displayed completely. (The Japanese engine also allows for variable-width US-ASCII characters, and seems to correctly handle the carriage return for any mix of JIS and ASCII.)


===Chinese SUBT issues===
Line 906: Line 1,315:
!Original of culprit text (null char ° included)
!Victim handle
!Original of victim text (null char ° included)
|-
!01_01_11
|Kerr:&nbsp;&nbsp;Good luck Konoko.°
!<strike>0</strike>1_02_01
|Konoko:&nbsp;&nbsp;[[Chung]]...°
|-
!01_03_07
|Griffin:&nbsp;&nbsp;All right Konoko. I'm giving you a shot at this.°
!<strike>0</strike>1_03_08
|Konoko:&nbsp;&nbsp;Thank you Sir. I won't let you down.°
|-
!02_05_04
|Muro:&nbsp;&nbsp;Let me know when things start to get messy.°
!<strike>0</strike>2_05_05
|Barabas:&nbsp;&nbsp;Count on it. When I get through with them they're...°
|-
!02_06_02
|Griffin:&nbsp;&nbsp;Explain.°
!<strike>0</strike>2_06_03
|Konoko:&nbsp;&nbsp;This whole place - it's a trap. They planted that datapad.°
|-
!02_09_03
|Griffin:&nbsp;&nbsp;So she's still stable?°
!<strike>0</strike>2_09_04
|Kerr:&nbsp;&nbsp;So far as we can tell - yes, but prolonged stress could be dangerous.°
|-
!03_10_01
|Barabas:&nbsp;&nbsp;Let's get it on!°
!<strike>0</strike>3_10_02
|Barabas:&nbsp;&nbsp;You're strong, but this isn't over. Not by a long shot!°
|-
!03_11_01
|Barabas:&nbsp;&nbsp;She's with them.°
!<strike>0</strike>3_11_02
|Muro:&nbsp;&nbsp;Who is with them?°
|-
!04_17_03
|Muro:&nbsp;&nbsp;I can't allow that.°
!<strike>0</strike>4_17_04
|Civilian:&nbsp;&nbsp;You are with the TCTF, right? I wish there was something I could do to help.°
|-
!07_22_01
|Konoko:&nbsp;&nbsp;Showtime...°
!<strike>0</strike>7_23_01
|Syndicate Henchman:&nbsp;&nbsp;We've disabled the power substations. Static defenses are down. All units converge.°
|-
!07_26_15
|Konoko:&nbsp;&nbsp;Thanks.°
!<strike>0</strike>7_26_16
|Konoko:&nbsp;&nbsp;Things aren't looking good.°
|-
!07_26_17
|Cop:&nbsp;&nbsp;No, we haven't secured a single area -- not even our armory.°
!<strike>0</strike>7_26_18
|Civilian:&nbsp;&nbsp;I told them we'd just make things worse if we tried to use these guns.  You should be the one using this!°
|-
!08_27_03
|Konoko:&nbsp;&nbsp;This is personal.°
!<strike>0</strike>8_27_04
|Griffin:&nbsp;&nbsp;Which is precisely why you should have nothing to do with it.°
|-
!09_31_02
|Shinatama:&nbsp;&nbsp;You are not who you think you are.°
!<strike>0</strike>9_31_03
|Konoko:&nbsp;&nbsp;What?°
|-
!09_31_03
|Konoko:&nbsp;&nbsp;What?°
!<strike>0</strike>9_31_04
|Shinatama:&nbsp;&nbsp;You are Mai Hasegawa. Your father...°
|-
!09_31_24
|Konoko:&nbsp;&nbsp;No!°
!<strike>0</strike>9_31_25
|Shinatama:&nbsp;&nbsp;Detonation in twenty, nineteen, eighteen...°
|-
!11_40_07
|Mukade:&nbsp;&nbsp;We shall see...°
!<strike>c</strike>11_41_01konoko
|Konoko:&nbsp;&nbsp;My father's file. Griffin encouraged me not to look too deeply into my past. It seems like there's a lot he didn't want me to know... I could feel the Ninja and I knew he could feel me. Why? What am I becoming? Are we the same?°
|-
!12_46_02
|Konoko:&nbsp;&nbsp;What?°
!<strike>1</strike>2_46_03
|Konoko:&nbsp;&nbsp;Mai Hasegawa. That's your name, isn't it?°
|-
!12_46_06
|Konoko:&nbsp;&nbsp;Leave me alone...°
!<strike>1</strike>2_46_07
|Konoko:&nbsp;&nbsp;Can't do that. You are a Rogue Agent. I'm under orders to bring you in.°
|-
!13_65_05
|Kerr:&nbsp;&nbsp;This may sting a bit...°
!<strike>1</strike>3_65_06
|Konoko:&nbsp;&nbsp;What are you doing?°
|-
!13_65_20
|Konoko:&nbsp;&nbsp;Griffin? But why?°
!<strike>1</strike>3_65_21
|Kerr:&nbsp;&nbsp;Your father and I were criminals, funded by the Syndicate. We couldn't get backing from any legitimate source. They left us alone for the most part. We didn't think they were interested in our work. We were wrong.°
|-
!13_65_25
|Kerr:&nbsp;&nbsp;Muro.°
!<strike>1</strike>3_65_26
|Konoko:&nbsp;&nbsp;You've got to be kidding.°
|-
!13_65_36
|Konoko:&nbsp;&nbsp;What?°
!<strike>1</strike>3_65_37
|Kerr:&nbsp;&nbsp;You are changing...into a more powerful, resilient version of yourself. But whatever your final form, it is an expression of your true nature...°
|-
!13_66_03
|Konoko:&nbsp;&nbsp;The crane controls...°
!<strike>1</strike>3_66_04
|Konoko:&nbsp;&nbsp;I just hope the Chrysalis can keep me alive...°
|-
!14_52_02
|Konoko:&nbsp;&nbsp;Gotcha.°
!<strike>1</strike>4_52_03
|Griffin:&nbsp;&nbsp;This is a bad idea. Put the gun down.°
|-
!14_52_06
|Konoko:&nbsp;&nbsp;For you? Badly?°
!<strike>1</strike>4_52_07
|Griffin:&nbsp;&nbsp;That's your call. You can pull the trigger or you can walk away. It's up to you.°
|-
!00_01_09
|Shinatama:&nbsp;&nbsp;Super!°
!<strike>c</strike>00_01_10Shinatama
|Shinatama:&nbsp;&nbsp;Great!°
|-
!c00_01_10Shinatama
|Shinatama:&nbsp;&nbsp;Great!°
!<strike>0</strike>0_01_11
|Shinatama:&nbsp;&nbsp;O.K.!°
|-
!civmale3_trigger
|Civilian:&nbsp;&nbsp;Hi there!°
!<strike>c</strike>00_01_100shinatama
|Shinatama:&nbsp;Kill me Konoko...please!°
|-
!c00_01_101shinatama
|Shinatama:&nbsp;I'm sorry...so sorry! °
!<strike>c</strike>00_01_102shinatama
|Shinatama:&nbsp;Forgive me...°
|}
|}
{{divhide|end}}
{{divhide|end}}
|}
|}
The systematic nature of this anomaly suggests that the Chinese team were careful not to exceed the string length of the original, and merely overlooked the extra null char (and of course didn't check the ingame rendition of the subtitles all that thoroughly).
 
The good news (for anyone who cares about Chinese subtitles) is that the double-null-char is actually not needed, and strings terminate just fine if the affected handles are restored. Of course this still leaves the issue of invalid EUC-CN code points.




{{OBD}}

Latest revision as of 23:41, 26 March 2024

Originally created in English, Oni has been translated into the following seven languages: French, Italian, Spanish, German, Russian, Japanese and Chinese.

(An overview of the known language versions can be found HERE, whereas localized content is detailed HERE.)

Depending on the language version, vanilla Oni uses one of the following five encodings to render text:

  • The original US version uses a trimmed-down Mac OS Roman code page that is effectively limited to US-ASCII (96 code points used, 256 available).
  • European localizations (UK English, French, Italian, Spanish, German) use a custom version of Mac OS Roman (192 code points used, 256 available).
  • The Russian localization uses a (nearly) full implementation of the Windows-1251 (Cyrillic) code page (224 code points used, 256 available).
  • The Chinese localization uses the EUC-CN implementation of GB 2312 (7,668 code points used, 8,836 available).
  • The Japanese localization uses 1,357 code points mostly conforming to the Shift JIS implementation of JIS X 0208.

Properties of the fonts that are eventually used to render the text (via the encoding) are briefly described throughout the page.

(A more thorough overview of the glyphs can be found HERE.)

Encodings

US English

Below is the code page implemented by TSFFTahoma in the US English version of Oni. It is based on Mac OS Roman ("MacRoman" for short), but with two differences:

  • Of the 223 printable glyphs provided by MacRoman, 42 are not implemented in TSFFTahoma (shown as grey-on-black).
  • Code point 0x7F (normally the non-printable "delete" control character) has a visible box-like glyph (◻) in this implementation.
  ...0 ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...A ...B ...C ...D ...E ...F
0x2... SP ! " # $ % & ' ( ) * + , - . /
0x3... 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0x4... @ A B C D E F G H I J K L M N O
0x5... P Q R S T U V W X Y Z [ \ ] ^ _
0x6... ` a b c d e f g h i j k l m n o
0x7... p q r s t u v w x y z { | } ~
0x8... Ä Å Ç É Ñ Ö Ü á à â ä ã å ç é è
0x9... ê ë í ì î ï ñ ó ò ô ö õ ú ù û ü
0xA... ° £ § ß ® © ´ ¨ Æ Ø
0xB... ± Ұ µ π ª º Ω æ ø
0xC... ¿ ¡ ¬ ƒ « » NBSP À Ã Õ Œ œ
0xD... ÷ ÿ Ÿ ¤
0xE... · Â Ê Á Ë È Í Î Ï Ì Ó Ô
0xF... Platform-Mac.png Ò Ú Û Ù ı ˆ ˜ ¯ ̆ ̇ ̊ ̧ ̋ ̨ ̌
Minor notes
  • The MacRoman layout was apparently "borrowed" before 1998, when Mac OS 8.5 came out and the international currency sign a.k.a. scarab (¤), at 0xDB, was replaced with the euro symbol (€).
  • The actual font (see HERE) has some unusual typographical features, such as a single-stroke Yen/Yuan symbol (Ұ) and a vertical-stroke cent symbol similar to Unicode's Fullwidth Cent Sign (¢) character as seen in Windows Arial (note to Mac users: don't be confused, as this character will appear with a diagonal stroke on your system like the regular '¢' character).
Major notes
  • Some of the removed glyphs (most importantly ß, ù and û, but also Ê, Ú and ú) occur in common European languages. This made the US TSFFTahoma unsuitable for EFIGS localizations, requiring the creation of a new version (see below).
  • The US engine actually cannot interpret any code points beyond the US-ASCII range (first 6 rows, white background), notably failing on 0xC9's "…". This is because of a nominal but unused provision for Asian text encodings. See "Ellipsis issue" below for details.



European

The code page used by the five Western European versions (UK English, French, German, Spanish and Italian) is slightly different from the trimmed-down Mac OS Roman.

  • It tends to the needs of European localizations by adding back the following characters:
    German ß; French Ê and û; French/Italian ù; Spanish/Italian Ú and ú (relatively rare).
N.B. The characters Æ and ÿ are not reinstated, despite their (very rare) occurrence in French script.
  • Awkwardly enough, the six characters are not restored in their original positions (grey-on-black), but take the place of math symbols.
    Four more "math" positions are inexplicably filled with three duplicate characters (œ, ¡ and ª) and a truly enigmatic ʖ̇ , which doesn't seem to occur in any known language and has no dedicated code point in Unicode (the character you see here was constructed from Unicode's U+0296 Latin Letter Inverted Glottal Stop (ʖ) plus U+0307 Combining Dot Above).
N.B. The broken italic font variants (see HERE) do not fully implement the 10 new glyphs and use a regular question mark instead of the ʖ̇.
  ...0 ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...A ...B ...C ...D ...E ...F
0x2... SP ! " # $ % & ' ( ) * + , - . /
0x3... 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0x4... @ A B C D E F G H I J K L M N O
0x5... P Q R S T U V W X Y Z [ \ ] ^ _
0x6... ` a b c d e f g h i j k l m n o
0x7... p q r s t u v w x y z { | } ~
0x8... Ä Ç É Ñ Ö Ü á à â ä ã å ç é è
0x9... ê ë í ì î ï ñ ó ò ô ö ú ù û ü
0xA... £ § ß ® © ´ ¨ Ø
0xB... ± Ұ µ Ê Ú ù ú û ª ß œ æ ø
0xC... ¿ ¡ ¬ ¡ ƒ ʖ̇ ª « » À Õ Œ œ
0xD... ÷ Ÿ ¤
0xE... Â Ê Á Ë È Í Î Ï Ì Ó Ô
0xF... Ò Ú Û Ù ˆ ˜ ¯

Coincidentally, with the 10 new glyphs, the European code page has exactly 96 glyphs in the US-ASCII half and 96 in the extension half (blue).

N.B. Unlike the US version, all five Western European versions (including UK English) are able to render the full extended ASCII set.



Cyrillic

In the Russian version of Oni, TSFFTahoma implements the Windows-1251 (Cyrillic) code page, with some deviations.

  • The character 0x98, normally non-printable, is implemented as a visible box glyph (☐), slightly larger than 0x7F.
  • The character 0x81, normally a "Ѓ" glyph, is replaced with a thin space of inconsistent size (2px wide for all fonts, 3px for 13pt regular and 16pt regular).
  • The character 0xA0, normally a non-breaking space, is a space of not-so-consistent size (anywhere from single to triple width, depending on the font).
  • The character 0xAD, normally a soft hyphen, is a visible hyphen (similar to the hyphen-minus, 0x2D) for 7pt fonts, and an inconsistently sized space for other fonts.
    (Oni's engine could in theory reserve a special treatment for soft hyphens and non-breaking spaces, specified in TSFLRoman, but in practice there is no such functionality.)
  ...0 ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...A ...B ...C ...D ...E ...F
0x2... SP ! " # $ % & ' ( ) * + , - . /
0x3... 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0x4... @ A B C D E F G H I J K L M N O
0x5... P Q R S T U V W X Y Z [ \ ] ^ _
0x6... ` a b c d e f g h i j k l m n o
0x7... p q r s t u v w x y z { | } ~
0x8... Ђ (thin SP) ѓ Љ Њ Ќ Ћ Џ
0x9... ђ љ њ ќ ћ џ
0xA... NBSP Ў ў Ј ¤ Ґ ¦ § Ё © Є « ¬ ® Ї
0xB... ° ± І і ґ µ · ё є » ј Ѕ ѕ ї
0xC... А Б В Г Д Е Ж З И Й К Л М Н О П
0xD... Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
0xE... а б в г д е ж з и й к л м н о п
0xF... р с т у ф х ц ч ш щ ъ ы ь э ю я
Italic fonts
The Russian version only provides an implementation of Windows-1251 for regular and bold fonts. The five italic fonts (7pt, 9pt, 10pt, 12pt and 14pt) have exactly the same data (pixels and glyph descriptors) as in the European implementation of Mac OS Roman. This makes sense because italic fonts are inherently broken (see HERE) and thus not used by any text in vanilla Oni.
14pt bold font
Somewhat surprisingly, the 14pt bold TSFT in the Russian version of TSFFTahoma does not have a complete Windows-1251 code page either. Instead it is limited to the US-ASCII character set (including the "printable delete" box at code point 0x7F), i.e., the upper section of the above table (white background). This causes no issue in vanilla Oni, but only because there is no text that uses 14pt bold.
Incomplete transparency
A unique "feature" of the Russian/Cyrillic TSFFTahoma is that all the characters in the extended ASCII range (0x80-0xFF) have a slightly opaque background (about 3% opacity) in the regular (non-bold) font variant. This isn't visible ingame, but only because the engine (re)posterizes all the glyphs into 4-bit grayscale when rendering (so that only opacities above 6% are visible).
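As a rough illustration of why a background of about 3% opacity vanishes, the sketch below assumes that opacity is stored as an 8-bit value and that the engine posterizes by keeping only the upper four bits. The truncation rule (and the function name) is an assumption chosen because it reproduces the ~6% threshold quoted above; it is not a confirmed detail of Oni's renderer.

```python
def posterize_4bit(alpha8: int) -> int:
    """Reduce an 8-bit opacity (0-255) to a 4-bit level (0-15) by truncation (assumed rule)."""
    return alpha8 >> 4

background = round(0.03 * 255)        # about 3% opacity, i.e. 8 out of 255
print(posterize_4bit(background))     # 0: the faint background disappears
print(posterize_4bit(15))             # 0: still below the threshold
print(posterize_4bit(16))             # 1: 16/255 is about 6.3%, the first visible value
```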
Glyph alignment and spacing
Last but not least, some fonts in the Russian TSFFTahoma have inconsistent vertical alignment, the most blatant example being 12pt bold: some glyphs are one pixel shorter or taller than the full line height (ascender+descender), without a properly compensated vertical glyph offset; others simply have pixels that are not properly aligned within a glyph's rectangle. Besides, many glyphs have excessive padding to the left and/or right of a character, which affects readability.
N.B. There are other examples of poor alignment, e.g., for 12pt bold, the character 0x9C (њ) has its right side cut off and is thus unusable (luckily it doesn't occur in Russian script).



Chinese

The Chinese version of Oni is unique in how the main game code resides in Oni.dat, a renamed copy of the original Oni.exe from the US version that is executed indirectly by a wrapper app called oni.exe, alongside a custom text engine, xfhsm_oni.dll. The latter DLL intercepts any text about to be displayed by "Oni.dat", first reducing it to a set of two-byte control sequences, and then (if all goes well) to a set of custom glyphs, with pixel data coming from an external font file, xf_font.dat.

Unlike the original US engine or the Japanese one, xfhsm_oni.dll does not expect any single-byte characters in the input, does not interpret US-ASCII strings in any meaningful way and never resorts to level0_Final's TSFFTahoma for text display. The pixel data comes exclusively from xf_font.dat and the expected control sequences are exclusively two-byte code points (this includes string termination: instead of a single null char, xfhsm_oni.dll expects a string to end with a pair of null chars).

Unlike for other versions of Oni, the Chinese font does not have a table listing the valid code points along with their "glyph descriptors" (i.e., instructions on how to extract a glyph from the raw pixel data). Instead all the glyphs are stored as fixed-size bitmaps (16x16 pixels each) and there are exactly 94x94=8,836 glyphs, filling up a standard GB 2312 plane (qūwèi), indexed through a compact numbering scheme known as EUC-CN: each of the 94x94 code points is indexed by a pair of bytes that are both in the 0xA1-0xFE range. Code points that are not assigned under GB 2312 (e.g., rows 10-15 and 90-94) simply have blank pixel data in the corresponding regions of xf_font.dat.

Two glyph sizes are available: 16x16 glyphs are stored in the first half of xf_font.dat, and 12x12 glyphs in the second half. Each 12x12 glyph is stored in the top left corner of a 16x16 bitmap, so the row/glyph alignment is the same in both cases: 2 bytes per pixel row and 32 bytes per glyph. The pixel packing is 1-bit black-and-white (i.e., without antialiasing), much more space-efficient than the 8-bit grayscale storage used in Oni's TSFT. Another gain comes from not having any glyph descriptors (TSGAs), and from having only two fonts instead of Oni's typical 15.
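The storage layout described above (2 bytes per pixel row, 32 bytes per glyph, 1 bit per pixel) can be sketched as follows. The bit order within a row is an assumption (most significant bit taken as the leftmost pixel), and `unpack_glyph` is a hypothetical helper, not something found in xfhsm_oni.dll.

```python
GLYPH_BYTES = 32   # 16 rows x 2 bytes per row
ROW_BYTES = 2

def unpack_glyph(data: bytes, size: int = 16) -> list[str]:
    """Turn one 32-byte glyph record into `size` text rows of '#' and '.'.

    Assumption: within each row the most significant bit is the leftmost pixel.
    For the 12x12 font, pass size=12 to read only the top left corner.
    """
    if len(data) != GLYPH_BYTES:
        raise ValueError("expected exactly 32 bytes of glyph data")
    rows = []
    for y in range(size):
        word = int.from_bytes(data[y * ROW_BYTES:(y + 1) * ROW_BYTES], "big")
        rows.append("".join("#" if word & (0x8000 >> x) else "." for x in range(size)))
    return rows

# Synthetic record (not real font data): only the top pixel row is set.
sample = bytes([0xFF, 0xFF]) + bytes(30)
print("\n".join(unpack_glyph(sample, size=12)))
```

Because both fonts share the same 32-byte record, the same routine serves the 16x16 and the 12x12 glyphs; only the number of rows and columns read differs.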

All the GB 2312 glyphs listed HERE and HERE are implemented, except for the euro sign (row 2) and the ten glyphs from Vertical Forms (row 6). Thus of the 8,836 available code points only 7,668 (including the ideographic space A1,A1) correspond to actual glyphs, whereas the other 1,168 correspond to blank pixel data (indistinguishable from a space). In terms of space efficiency, only 74,752 bytes are thus wasted on blank pixel data (2 x 1,168 x 32), whereas trimmed-down pixel data would require at least 2x7,668 + 2 = 15,338 bytes for an indexation of the available glyphs, as well as additional lookup logic.
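The figures in the previous paragraph can be checked with a few lines of arithmetic. The constants come from the text; the total file size is only what the layout implies if xf_font.dat has no header, which is an assumption.

```python
PLANE = 94 * 94      # 8,836 slots in a full GB 2312 plane
USED = 7_668         # slots holding actual glyphs (per the text)
GLYPH_BYTES = 32     # one 16x16 1-bit bitmap
FONTS = 2            # 16x16 and 12x12

blank = PLANE - USED                      # blank slots
wasted = FONTS * blank * GLYPH_BYTES      # bytes spent on blank pixel data
index_table = 2 * USED + 2                # the "2x7,668 + 2" estimate from the text
implied_size = FONTS * PLANE * GLYPH_BYTES  # assumes no header in xf_font.dat

print(blank, wasted, index_table, implied_size)
```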

N.B.

Unlike for other versions of Oni, an invalid code point does not interrupt the interpretation/rendering of a text string by xfhsm_oni.dll and can lead to a wide range of unexpected behavior: at best, a blank or otherwise unintended glyph will be displayed; at worst the rendered text will be garbled (memory corruption most likely), or the game may simply crash.

The current understanding is that xfhsm_oni.dll simply turns any two-byte code point QQ WW into the offset [(QQ-A1)*5E + (WW-A1)]*0x20, relative either to the start of the xf_font.dat data (for the 16x16 font) or to the middle of the data (for the small 12x12 font). Depending on the values of QQ and WW, both components of the offset can fall outside the intended 0-93 range, with values as high as 94 and as low as -161. There doesn't seem to be any sanity check, and the only special handling is for QQ=00 (in this case WW is ignored and the string is terminated).

A valid EUC-CN code point (with both bytes in the 0xA1-0xFE range) results in a valid offset pointing to an actual glyph for the relevant font, whereas illegal bytes or byte pairs may point to a different glyph within the same font, or to a glyph of the other font, or to a completely unrelated memory region. In the worst-case scenario, pixel data will be read 486,432 bytes (~475 kB) before the start of the actual pixel data (if displaying the code point 01,00 for the large font) or 3008-3040 bytes (~3 kB) past the end of the actual pixel data (if displaying the code point FF,FF for the small font).
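The offset formula and the two worst-case figures can be reproduced with a minimal sketch. This only restates the current understanding described above; `glyph_offset` and `in_bounds` are hypothetical names, and `in_bounds` is the kind of sanity check that appears to be absent from xfhsm_oni.dll.

```python
GLYPH_BYTES = 0x20
FONT_BYTES = 94 * 94 * GLYPH_BYTES   # pixel data of one font: 282,752 bytes

def glyph_offset(qq: int, ww: int) -> int:
    """Byte offset of code point (qq, ww) relative to the start of a font's data."""
    return ((qq - 0xA1) * 0x5E + (ww - 0xA1)) * GLYPH_BYTES

def in_bounds(qq: int, ww: int) -> bool:
    """True only for proper EUC-CN pairs (both bytes in 0xA1-0xFE)."""
    return 0xA1 <= qq <= 0xFE and 0xA1 <= ww <= 0xFE

print(glyph_offset(0xA1, 0xA1))                # 0: first slot (ideographic space)
print(glyph_offset(0x01, 0x00))                # -486432: far before the font data
print(glyph_offset(0xFF, 0xFF) - FONT_BYTES)   # 3008: past the end of the font data
print(in_bounds(0x01, 0x00), in_bounds(0xFF, 0xFF))   # False False
```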

Reading garbage pixel data shouldn't be causing memory corruption per se (merely nonsensical/garbled text), but if similar out-of-bounds pointers occur for glyph rendering, then xfhsm_oni.dll may occasionally overwrite its own memory or even Oni's. This has not been thoroughly investigated, but it seems advisable to ensure that all text consists of valid EUC-CN code points (which is unfortunately not the case, see "Invalid EUC-CN input" below).



Japanese

Japanese Oni uses a custom two-byte encoding that is mostly consistent with Shift JIS but with some of the control sequences rearranged in seemingly non-standard ways. As in Chinese Oni, the glyph data is stored in new, external files; in this case they are .fnt files stored in GameDataFolder. Three font sizes are available, with pixel sizes 11x11 (JPN_SMALL.fnt), 12x12 (JPN_MIDDLE.fnt) and 14x14 (JPN_BIG.fnt). The 14x14 font has a bold-face variant (JPN_BOLD.fnt). All four fonts are fixed-width, i.e. all glyphs have a square bounding box.

Unlike the Chinese version, the TSFFTahoma contained in the Japanese game data is not limited to the ASCII code page. There are a total of 154 double-byte code points (Romaji, punctuation, kana and kanji) across 19 code pages (TSGA) each corresponding to a different "lead byte" (0x81, 0x82, 0x83, 0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x8D, 0x8E, 0x8F, 0x90, 0x91, 0x92, 0x93, 0x95, 0x96, 0x97 and 0x98).

As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., is limited to US-ASCII. This is consistent with the simplified logic used by the Japanese engine, where any high-bit byte (in the 0x80-0xFF range) is treated as the start of a two-byte sequence. (In actual Shift JIS some high-bit bytes are interpreted as half-width kana, a feature that isn't supported by Oni's engine.)
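The simplified lead-byte rule can be illustrated with a short tokenizer. This is a sketch of the rule as described above, not a reconstruction of the engine's code; `split_code_points` is a hypothetical name and the sample string is made up.

```python
def split_code_points(data: bytes) -> list[bytes]:
    """Split a null-terminated byte string into one- and two-byte code points.

    Any byte in 0x80-0xFF is taken as a lead byte and consumes the next byte too.
    """
    out = []
    i = 0
    while i < len(data) and data[i] != 0:
        step = 2 if data[i] >= 0x80 else 1
        out.append(data[i:i + step])
        i += step
    return out

# "On", then the Shift JIS pair 82 A0, then "!" and the terminating null.
sample = b"On" + bytes([0x82, 0xA0]) + b"!\x00"
print(split_code_points(sample))
```

Under this rule a single byte such as 0xB1 (a half-width kana in actual Shift JIS) would swallow the byte that follows it, which is why half-width kana cannot be supported.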

It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable except for its US-ASCII part.

  • The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for any double-byte code points, resorting to TSFFTahoma only for the rare occurrences of US-ASCII (resolution strings, the "On" labels in the Options menu, etc).
  • If the US engine is used on the Japanese game data, then the .fnt files are ignored (obviously), and the incomplete TSFFTahoma is used to render both US-ASCII and Japanese glyphs. Due to the limited character set (154 glyphs instead of 1,357), many strings end up broken in this situation.

It appears that the Japanese localization team initially tried to put Oni's code page system to use, and to fill in all the required JIS glyphs into TSFT and TSGA. As the number of kanji increased, supposedly, the TSFT grew prohibitively large due to the use of 8-bit grayscale storage for the pixel data, and the size taken up by the sparsely populated TSGA also increased out of proportion with the rest of the game data. At some point the engine switched to separate .fnt files, and somehow no one bothered to clean up the incomplete code pages in TSFFTahoma.

At the time of writing, the code points and pixel data in the Japanese .fnt files have not been thoroughly analyzed and compared with JIS X 0208. We know that 1,357 glyphs are implemented, across 27 "lead bytes" (roughly 50 kuten rows). This is much smaller than the full kuten plane, and makes sense in terms of space efficiency. We also know that some code points are non-standard (rearranged) as compared to regular Shift JIS, although we do not yet know if this rearrangement is consistent with any common variation of Shift JIS. As long as Japanese game data contains text strings that match the game's encoding, non-standard code points are not a problem (but should be kept in mind).

Text anomalies

Ellipsis issue

Unlike other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats characters above 0x7F as part of a two-byte control sequence (an unused provision for Asian encodings), and therefore fails to render any character from the extended ASCII range. This happens twice in English Oni because the ellipsis character (…), encoded as 0xC9, was accidentally used in these two text consoles instead of three consecutive periods (probably auto-substituted by a text editor). The result is that the two lines using a "…" are cut off at the offending character.
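The truncation can be mimicked in a few lines. The model below is deliberately crude (it simply stops at the first byte above 0x7F, which matches the observed result but not necessarily the engine's internal logic), `us_render` is a hypothetical name, and the sample line is made up rather than taken from the game.

```python
def us_render(raw: bytes) -> str:
    """Crude model of the US engine: output stops at the first byte above 0x7F."""
    for i, b in enumerate(raw):
        if b > 0x7F:
            return raw[:i].decode("ascii")
    return raw.decode("ascii")

line = "Wait… there is more".encode("mac_roman")
print(hex(line[4]))          # 0xc9: Mac OS Roman stores the ellipsis as one byte
print(us_render(line))       # Wait
print(us_render(b"Wait... there is more"))   # three periods render in full
```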

Invalid EUC-CN input

Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE and anything else is technically illegal (single US-ASCII characters could occur in theory, but are not handled properly by the custom text engine, xfhsm_oni.dll).

The text strings in the Chinese version mostly conform to the EUC-CN scheme, but there are two recurrent invalid characters, as well as some instances of non-translated US-ASCII (!!!).

(A3,89)

The illegal sequence (A3,89) occurs only in the SUBTmessages entry xdash1, the original English text being "Face the center of the room and [c.tap the forward key just before pressing and holding it down again (tap W then press and hold W)].".

There are five identical (A3,89) glyphs at the end of the string, just before the (double) null. All of them end up rendered as ㈢. What happens under the hood is that xfhsm_oni.dll simply subtracts 161 (0xA1) from both bytes, ending up with (2,-24), which is equivalent to (1,70) since 2x94-24 = 1x94+70, and thus produces the GB 2312 glyph ㈢. The correct EUC-CN code for ㈢ would be (A2,E7), although it is unlikely that this is what the translator meant to write. It is not currently known what the intended glyph was, as (A3,89) doesn't seem to be a valid control sequence under any common extension of EUC-CN.

(A3,A0)

The illegal sequence (A3,A0) is much more common than (A3,89). It occurs in SUBT entries (both in actual subtitles and in "messages"), as well as in the IGSt resources of multiple TxtC (text consoles), two WPge (weapon pages) and one OPge (objective page). Lists of occurrences are provided below.

As with (A3,89), the pixel data addressed by the invalid code point remains within the same font, in this case at the (A2,FE) slot, which happens to be blank (and thus indistinguishable from an intentional space glyph).
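Both aliasing results can be verified by folding the linear glyph index back into a row and column. The sketch assumes the offset computation described earlier in this article; `aliased_code_point` is a hypothetical helper.

```python
def aliased_code_point(qq: int, ww: int) -> tuple[int, int]:
    """Return the valid EUC-CN byte pair whose glyph slot (qq, ww) lands on."""
    index = (qq - 0xA1) * 94 + (ww - 0xA1)   # linear glyph index within one font
    if not 0 <= index < 94 * 94:
        raise ValueError("slot lies outside this font's pixel data")
    row, col = divmod(index, 94)
    return 0xA1 + row, 0xA1 + col

for pair in [(0xA3, 0x89), (0xA3, 0xA0)]:
    q, w = aliased_code_point(*pair)
    print(f"({pair[0]:02X},{pair[1]:02X}) -> ({q:02X},{w:02X})")

# Python's gb2312 codec confirms which glyph sits at (A2,E7).
print(bytes([0xA2, 0xE7]).decode("gb2312"))
```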

Unlike for (A3,89), there are multiple examples to look at, so we can make an informed guess as to what the intended glyph was: either an ordinary ideographic space, (A1,A1), or some variant thereof (such as a non-breaking space).

Without a proper sanity check, some illegal code points will clearly result in pixel data being loaded not from a valid glyph region, but from irrelevant memory that belongs either to xfhsm_oni.dll or to the main Oni engine, resulting in garbled text. Memory corruption or segmentation fault (access violation) may occur if similar out-of-bounds pointers are used when rendering glyph textures. Possibly invalid EUC-CN input is what is causing most Chapters of the Chinese Oni version to crash on modern Windows systems, although this has not been investigated thoroughly.

Non-translated US-ASCII

ASCII strings are much more harmful when handled by xfhsm_oni.dll, as compared to the two invalid code points (A3,A0) and (A3,89), because pairs of US-ASCII bytes, misinterpreted as EUC-CN code points, end up referencing completely unrelated memory regions (outside the region occupied by xf_font.dat). Unfortunately, there are a few ASCII strings that xfhsm_oni.dll can come across even during regular gameplay, and many more arise if one allows for modding.

Count on it

The following string in SUBTsubtitles has not been translated into Chinese:

Barabas:  Count on it. When I get through with them they're...

Being encoded as plain US-ASCII, this string is entirely illegal considering the limited implementation of EUC-CN by xfhsm_oni.dll, which does not detect US-ASCII as single-byte code points and keeps interpreting pairs of ASCII bytes as (invalid) quwei indices. By lucky coincidence, the string has an even number of printable bytes, so that the null character is still in a suitable place for terminating the string (the EUC-CN parser will see it as a null lead byte and will not keep reading further data). However, the string still consists of 31 invalid two-byte code points (not counting the null). As a further lucky coincidence, this string is never read by Oni's engine, because the subtitle's handle (02_05_05) is one of those that have been clobbered by the spurious double null (see "Chinese SUBT issues" below). If it weren't for the clobbering, the game would crash upon displaying this subtitle.
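The byte count is easy to confirm. The snippet below takes the string exactly as quoted (with two spaces after the colon) and splits it into the byte pairs that a two-byte parser would see.

```python
text = b"Barabas:  Count on it. When I get through with them they're..."
pairs = [text[i:i + 2] for i in range(0, len(text), 2)]

print(len(text), len(pairs))          # 62 31
print(len(text) % 2 == 0)             # True: the null falls in a lead-byte position
print(all(b < 0xA1 for b in text))    # True: no byte is in the valid 0xA1-0xFE range
```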

Pre-beta ONLDs

The "level definitions" (ONLDs) of pre-beta levels are never seen in vanilla Oni, but would appear in the "Load Game" dialog if a valid level#_Final.dat were to be supplied at startup (e.g. by a mod) and unlocked in persist.dat. Since xfhsm_oni.dll does not actually support US-ASCII, any untranslated ONLDs are potentially disruptive.

The following 8 pre-beta ONLDs were fully translated: "The Airport Part Deux" (level_05), "Obsolete" (level_07), "The Arena of Pain" (level_30), "Crossing Zone" (level_31), "Pit" (level_32), "Crossing Zone Too" (level_33), "Capture" (level_34), "Territories" (level_35).

The following 8 pre-beta ONLDs remained as US-ASCII: "Test_Stuff" (level_36), "AlexTestSite" (level_55), "Experimental_II" (level_66), "MARTY'S SOUND CORRIDOR" (level_68), "FiringRange" (level_71), "One Room" (level_77), "One Room 2" (level_88) and "Test Barn II" (level_99).

The most awkward case is that of the string "BGI HQ" (ONLDlevel_16), which was translated only partly: "HQ" was replaced with a pair of GB 2312 glyphs, but the first four characters "BGI " remained as plain ASCII (i.e., as two illegal EUC-CN code points).

Cheat messages

None of the 38 cheat messages was translated into Chinese (!!!), so that means 38 more strings entirely made of illegal EUC-CN code points. Any time a cheat is entered, xfhsm_oni.dll attempts to display one of the following strings, which almost always causes a crash on modern Windows systems. Note how the null byte does not interrupt the input if it occurs in a trail-byte position.
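The effect of a null byte in a trail-byte position can be shown with a toy scanner that, like the behavior described above, only tests the first byte of each pair for null. The function name and the sample buffers are made up; they are not actual cheat messages.

```python
def scan(buffer: bytes) -> list[tuple[int, int]]:
    """Read two-byte code points until a pair whose first byte is null."""
    pairs = []
    i = 0
    while i + 1 < len(buffer) and buffer[i] != 0:
        pairs.append((buffer[i], buffer[i + 1]))
        i += 2
    return pairs

# Even-length ASCII string: the null is a lead byte, so scanning stops on time.
print(len(scan(b"Even\x00more\x00\x00")))   # 2
# Odd-length ASCII string: the null is swallowed as a trail byte and the
# scanner runs on into whatever follows in memory.
print(len(scan(b"Odd\x00more\x00\x00")))    # 4
```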

Debug printout and console

Oni has a well-hidden Developer Mode in which it can print informational output directly to the screen instead of writing to a text file. There are fully automatic warnings from the engine (e.g. about too many visible polygons or too many particles), or more or less regular printout (e.g., about a character's current animation status) that can be toggled through script variables, or custom "dprint" messages that the developers used for visual feedback while testing scripts. Dev mode also has a togglable command line ("CMD: ") for entering script commands in real time. Both the debug printout and the command line use the main glyph-rendering pipeline (intercepted by xfhsm_oni.dll), with a small font size. This makes Dev mode essentially unusable in Chinese Oni, as most if not all of the debug printout or console output will be plain ASCII.

Interestingly, Oni does have some primitive debug printout that is not intercepted by xfhsm_oni.dll and thus is displayed normally using the smallest-sized TSFT from level0_Final's TSFFTahoma. All (most?) of the primitive printout is available without Dev mode. There is the Ctrl+Shift+Y hotkey (FPS display), some HUD-like overlays toggled by script variables (e.g., chr_debug_characters), and finally some 3D sprites added to the game (e.g., health indicators or name labels displayed above a character's head).

Over-tall text

Although not strictly speaking a font issue, some of Oni's text fails to render because it doesn't fit vertically into a fixed-size frame (such as a text console). This is known to happen for these two consoles in the English version, and possibly for other screens in other language versions.

Over-long text

Screen resolutions in Chinese Oni

Although Chinese text strings typically have a much smaller number of glyphs than English originals, this is not always the case. The Chinese glyphs are also much wider on average, with each glyph taking up 16x16 pixels, and so there are situations where the rendered Chinese line is much wider than the English original, no longer fitting on one line as intended by the context.

This is known to cause a problem for the "resolution" item in the Options menu (a WMM_ generated at runtime). The actual dropdown list is wide enough to accommodate even the longest resolution strings, but the currently selected resolution appears in a small window that is only 150 pixels wide, too narrow even for the shortest resolution string "640×480×16位" (which needs 176 pixels). As a result the active resolution is always displayed on two lines, no longer fitting into the frame vertically and thus unreadable.
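The 176-pixel figure follows directly from the fixed glyph width, assuming that every character of the label (digits included) is drawn as a full 16-pixel glyph:

```python
GLYPH_WIDTH = 16     # every glyph advances by a full 16-pixel cell
FRAME_WIDTH = 150    # width of the field showing the selected resolution

label = "640×480×16位"
needed = len(label) * GLYPH_WIDTH
print(len(label), needed, needed > FRAME_WIDTH)   # 11 176 True
```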

The Japanese version displays screen resolutions as US-ASCII using TSFFTahoma (variable-width).

Long subtitles in Chinese Oni

Some of Oni's subtitles are tirades that are too long to fit on the screen horizontally. In such situations the Chinese engine will display the subtitle on multiple lines, placing line breaks at arbitrary positions instead of at ideographic spaces or after punctuation (if any). For not-so-clear reasons, automatic line breaks will often appear near the start of a long subtitle string, right in the middle of a speaker's name, which looks particularly awkward since there is usually a space nearby (after the colon).

The Japanese Oni displays long subtitles with arbitrary breaks as well, but consistently aligns the beginning of the string to the left of the screen, achieving an ordinary/intuitive paragraph-like look. (Since the Japanese engine has support for US-ASCII, an ASCII space could theoretically be inserted into Japanese text, allowing for line breaks at specific positions, e.g., after punctuation or between semantic groups.)

Long UI text in Chinese Oni

Oni's ingame UI is stylized as a futuristic computer screen (it is supposed to be Konoko's "Data Comlink") and has fixed-width frames reserved for text display. Large amounts of text can appear in the text console frame, or in the upper and lower sections of the Help menu (F1). It turns out that these frames have enough width to accommodate 26.5 16x16 glyphs (text console frame) or 19.5 glyphs (Help menu frame), therefore one would expect the Chinese text renderer to wrap lines around at 26 or 19 characters, respectively. Unfortunately the lines are wrapped around at 27 and 20, so the right half of the last glyph on every long line is cut off.
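The half-glyph clipping can be expressed as a small calculation. Frame widths are given in glyph units as in the text; `clipped_pixels` is a hypothetical helper.

```python
GLYPH_WIDTH = 16

def clipped_pixels(frame_glyphs: float, wrap_at: int) -> int:
    """Pixels of the last glyph on a full line that fall outside the frame."""
    frame_px = int(frame_glyphs * GLYPH_WIDTH)
    return max(0, wrap_at * GLYPH_WIDTH - frame_px)

print(clipped_pixels(26.5, 27), clipped_pixels(19.5, 20))   # 8 8: half a glyph is cut off
print(clipped_pixels(26.5, 26), clipped_pixels(19.5, 19))   # 0 0: the expected wrap points
```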

The Japanese Oni consistently adjusts the carriage return depending on the glyph dimensions (font size), so that the last glyph in a wrapped-around line always fits into the frame and is displayed completely. (The Japanese engine also allows for variable-width US-ASCII characters, and seems to correctly handle the carriage return for any mix of JIS and ASCII.)

Chinese SUBT issues

The Chinese (Windows) version of Oni is unique in that no game content was actually localized except for text. Because of the relative simplicity of the task, the Chinese team did not build a new set of game data files, and merely modified the original .dat and .raw from the US version. WMDD, WMM_ and IGSt instances were patched inside each level's .dat, whereas the two SUBT files were patched in level0_Final.raw. In the case of an IGSt, text is stored in a fixed-size array (384 bytes), which has more than enough space for any translated text. WMDD and WMM_ also have fixed-size arrays (256 and 64 bytes, respectively) with at least some spare space. SUBT files, however, have a much more compact storage.

The text strings of a SUBT file (stored in level0_Final.raw and indexed from the .dat part of the SUBT) are typically packed right next to each other, separated only by a single null char. Chinese text typically uses fewer glyphs, but each glyph takes up two bytes instead of one, including punctuation and the trailing null. Thus for short sentences or interjections it is possible for a Chinese translation to completely fill up the space used by the original string and even extend into the next entry.

None of the Chinese translations in SUBTmessages or SUBTsubtitles are actually longer than the original English text, and it is only the extra null byte that intrudes on the next entry's handle on several occasions. The affected handle essentially becomes a null string, and the corresponding subtitle is never found and displayed.
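The clobbering mechanism can be reproduced on a toy buffer. The entries below are made up for illustration (they are not actual SUBT data), and `patch_in_place` and `c_string` are hypothetical helpers that mimic patching a translation over the original bytes.

```python
def patch_in_place(raw: bytearray, start: int, translated: bytes) -> None:
    """Overwrite the string at `start` with `translated` plus a two-byte null."""
    end = start + len(translated) + 2
    raw[start:end] = translated + b"\x00\x00"

def c_string(raw: bytearray, start: int) -> bytes:
    """Read a single-null-terminated string starting at `start`."""
    return bytes(raw[start:raw.index(0, start)])

# Two packed entries separated by a single null.
raw = bytearray(b"Hi!!\x00next_handle\x00")
next_entry = 5
print(c_string(raw, next_entry))            # b'next_handle'

translation = "你好".encode("gb2312")         # 4 bytes, no longer than "Hi!!"
patch_in_place(raw, 0, translation)
print(c_string(raw, next_entry))            # b'': the second null hit the next entry
```

The translation itself fits exactly; only the second byte of the terminator spills over, which matches the pattern described above.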

In SUBTmessages this happens only once (the message corresponding to "xf1" overwrites the first character of "xreload", so Konoko is never prompted to reload her gun in the last training room). In SUBTsubtitles there are as many as 29 anomalies, summed up in the following table.

The systematic nature of this anomaly suggests that the Chinese team were careful not to exceed the string length of the original, and merely overlooked the extra null char (and of course didn't check the ingame rendition of the subtitles all that thoroughly).

The good news (for anyone who cares about Chinese subtitles) is that the double-null-char is actually not needed, and strings terminate just fine if the affected handles are restored. Of course this still leaves the issue of invalid EUC-CN code points.