OBD:Text encoding

Originally created in English, Oni has been translated into the following seven languages: French, Italian, Spanish, German, Russian, Japanese and Chinese.

(An overview of the known language versions can be found HERE, whereas localized content is detailed HERE.)

Depending on the language version, vanilla Oni uses one of the following five encodings to render text:

  • The original US version uses a trimmed-down Mac OS Roman code page that is effectively limited to US-ASCII (96 code points).
  • European localizations (UK English, French, Italian, Spanish, German) use a custom version of Mac OS Roman (192 code points).
  • The Russian localization uses a full implementation of the Windows-1251 (Cyrillic) code page (224 code points).
  • The Chinese localization uses the EUC-CN implementation of GB 2312 (8,836 code points).
  • The Japanese localization uses 1,357 code points mostly conforming to the Shift JIS implementation of JIS X 0208.

Properties of the fonts that are eventually used to render the text (via the encoding) are briefly described throughout the page.

(A more thorough overview of the glyphs can be found HERE.)


Encodings

US English

Below is the code page implemented by TSFFTahoma in the US English version of Oni. It is based on Mac OS Roman ("MacRoman" for short), but with two differences:

  • Of the 223 printable glyphs provided by MacRoman, 42 are not implemented in TSFFTahoma (shown as grey-on-black).
  • Control point 0x7F (a typically non-printable "delete" character) has a visible box-like glyph (◻) in this implementation.
  ...0 ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...A ...B ...C ...D ...E ...F
0x2... SP ! " # $ % & ' ( ) * + , - . /
0x3... 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0x4... @ A B C D E F G H I J K L M N O
0x5... P Q R S T U V W X Y Z [ \ ] ^ _
0x6... ` a b c d e f g h i j k l m n o
0x7... p q r s t u v w x y z { | } ~
0x8... Ä Å Ç É Ñ Ö Ü á à â ä ã å ç é è
0x9... ê ë í ì î ï ñ ó ò ô ö õ ú ù û ü
0xA... ° £ § ß ® © ´ ¨ Æ Ø
0xB... ± Ұ µ π ª º Ω æ ø
0xC... ¿ ¡ ¬ ƒ « » NBSP À Ã Õ Œ œ
0xD... ÷ ÿ Ÿ ¤
0xE... · Â Ê Á Ë È Í Î Ï Ì Ó Ô
0xF...   Ò Ú Û Ù ı ˆ ˜ ¯ ̆ ̇ ̊ ̧ ̋ ̨ ̌
Minor notes
  • The MacRoman layout was apparently "borrowed" before 1998, when Mac OS 8.5 came out and the international currency sign a.k.a. scarab (¤), at 0xDB, was replaced with the euro symbol (€).
  • The actual font (see HERE) has some unusual typographical features, such as a single-stroke Yen/Yuan symbol (Ұ) and a vertical-stroke cent symbol (¢).
Major notes
  • Some of the removed glyphs (most importantly ß, ù and û, but also Ê, Ú and ú) occur in common European languages. This made the US TSFFTahoma unsuitable for EFIGS localizations, requiring the creation of a new version (see below).
  • The US engine actually cannot interpret any code points beyond the US-ASCII range (first 6 rows, white background), notably failing on "…" (see "Ellipsis issue" below). This is because of a provision for Asian encoding systems (EUC-CN and Shift JIS), which use two-byte sequences starting with a high-bit byte.



European

The code page used by the five Western European versions (UK English, French, German, Spanish and Italian) is slightly different from the trimmed-down Mac OS Roman.

  • It tends to the needs of European localizations by adding back the following characters:
    German ß; French Ê and û; French/Italian ù; Spanish/Italian Ú and ú (relatively rare).
N.B. The characters Æ and ÿ are not reinstated, despite their (very rare) occurrence in French script.
  • Awkwardly enough, the six characters are not restored in their original positions (grey-on-black), but take the place of math symbols.
    Four more "math" positions are inexplicably filled with three duplicate characters (œ, ¡ and ª) and a truly enigmatic ʖ̇ , which doesn't seem to occur in any known language and has no dedicated code point in Unicode.
N.B. The broken italic font variants (see HERE) do not fully implement the 10 new glyphs and use a regular question mark instead of the ʖ̇.
  ...0 ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...A ...B ...C ...D ...E ...F
0x2... SP ! " # $ % & ' ( ) * + , - . /
0x3... 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0x4... @ A B C D E F G H I J K L M N O
0x5... P Q R S T U V W X Y Z [ \ ] ^ _
0x6... ` a b c d e f g h i j k l m n o
0x7... p q r s t u v w x y z { | } ~
0x8... Ä Ç É Ñ Ö Ü á à â ä ã å ç é è
0x9... ê ë í ì î ï ñ ó ò ô ö ú ù û ü
0xA... £ § ß ® © ´ ¨ Ø
0xB... ± Ұ µ Ê Ú ù ú û ª ß œ æ ø
0xC... ¿ ¡ ¬ ¡ ƒ ʖ̇ ª « » À Õ Œ œ
0xD... ÷ Ÿ ¤
0xE... Â Ê Á Ë È Í Î Ï Ì Ó Ô
0xF... Ò Ú Û Ù ˆ ˜ ¯

Coincidentally, with the 10 new glyphs, the European code page has exactly 96 glyphs in the US-ASCII half and 96 in the extension half (blue).

N.B. Unlike the US version, all five Western European versions (including UK English) are able to render the full extended ASCII set.



Cyrillic

In the Russian version of Oni, TSFFTahoma implements the Windows-1251 (Cyrillic) code page, with some deviations.

  • The character 0x98, normally non-printable, is implemented as a visible box glyph (☐), slightly larger than 0x7F.
  • The character 0x81, normally a "Ѓ" glyph, is replaced with a thin space of inconsistent size (2px wide for most fonts, 3px for 13pt regular and 16pt regular).
  • The character 0xA0, normally a non-breaking space, is a space of not-so-consistent size (anywhere from single to triple width, depending on the font).
  • The character 0xAD, normally a soft hyphen, is a visible hyphen (similar to the hyphen-minus, 0x2D) for 7pt fonts, and an inconsistently sized space for other fonts.
    (Oni's engine could in theory reserve a special treatment for soft hyphens and non-breaking spaces, specified in TSFLRoman, but in practice there is no such functionality.)
  ...0 ...1 ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...A ...B ...C ...D ...E ...F
0x2... SP ! " # $ % & ' ( ) * + , - . /
0x3... 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
0x4... @ A B C D E F G H I J K L M N O
0x5... P Q R S T U V W X Y Z [ \ ] ^ _
0x6... ` a b c d e f g h i j k l m n o
0x7... p q r s t u v w x y z { | } ~
0x8... Ђ

 P
ѓ Љ Њ Ќ Ћ Џ
0x9... ђ љ њ ќ ћ џ
0xA... NBSP Ў ў Ј ¤ Ґ ¦ § Ё © Є « ¬ ® Ї
0xB... ° ± І і ґ µ · ё є » ј Ѕ ѕ ї
0xC... А Б В Г Д Е Ж З И Й К Л М Н О П
0xD... Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я
0xE... а б в г д е ж з и й к л м н о п
0xF... р с т у ф х ц ч ш щ ъ ы ь э ю я
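The four deviations listed before the table can be compared against a stock Windows-1251 table. The sketch below uses Python's built-in "cp1251" codec purely as a reference for the standard code page; it says nothing about Oni's own font data, and the helper name standard_cp1251 is made up for illustration.

```python
# Reference lookup in the standard Windows-1251 table (Python's "cp1251" codec).
# It only shows what the code page normally assigns; Oni's glyphs differ as described above.

def standard_cp1251(byte_value):
    """Return the standard character for a byte, or None if cp1251 leaves it undefined."""
    try:
        return bytes([byte_value]).decode("cp1251")
    except UnicodeDecodeError:
        return None

for code in (0x81, 0x98, 0xA0, 0xAD):
    ch = standard_cp1251(code)
    label = "undefined" if ch is None else f"U+{ord(ch):04X}"
    print(f"0x{code:02X} -> {label}")
```

In the standard table 0x98 is the only one of the four bytes without an assignment, which is why a visible box glyph at that position counts as a deviation.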
Italic fonts
The Russian version only provides an implementation of Windows-1251 for regular and bold fonts. The five italic fonts (7pt, 9pt, 10pt, 12pt and 14pt) have exactly the same data (pixels and glyph descriptors) as for the European iteration of Mac OS Roman. This makes sense because italic fonts are inherently broken (see HERE) and thus not used by any text in vanilla Oni.
Bold 14 font
Somewhat surprisingly, the size-14 TSFT in the Russian version of TSFFTahoma does not have a complete Windows-1251 code page either. Instead it is limited to the US-ASCII character set (including the "printable delete" box at code point 0x7F), i.e., the upper section of the above table (white background). This causes no issue in vanilla Oni, but only because there is no text that uses bold 14.
Incomplete transparency
A unique "feature" of the Russian/Cyrillic TSFFTahoma is that all the characters in the extended ASCII range (0x80-0xFF) have a slightly opaque background (about 3% opacity) in the regular (non-bold) font variant. This isn't visible in-game, but only because the engine (re)posterizes all the glyphs into 4-bit grayscale when rendering (so that only opacities above 6% are visible).
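One way to read these numbers: if the engine reduces 8-bit opacity to 4 bits by dropping the low four bits (an assumption; the exact rounding used by Oni is not established here), every value below 16/255, roughly 6.3%, collapses to zero. The function name posterize_4bit is hypothetical.

```python
# Assumed posterization: keep only the top 4 bits of an 8-bit opacity value.
def posterize_4bit(alpha8):
    return alpha8 >> 4  # maps 0..255 onto 0..15

faint = round(0.03 * 255)                       # about 3% opacity
print(faint, posterize_4bit(faint))             # 8 0
print(posterize_4bit(15), posterize_4bit(16))   # 0 1
```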
Glyph alignment and spacing
Last but not least, some fonts in the Russian TSFFTahoma have inconsistent vertical alignment, the most blatant example being 12 bold: some glyphs are one pixel shorter or taller than the full line height (ascender+descender), without a properly compensated vertical glyph offset; others simply have pixels that are not properly aligned within a glyph's rectangle. Besides, many glyphs have excessive padding to the left and/or right of a character, which affects readability.
N.B. There are other examples of poor alignment, e.g. for 12 bold, the character 0x9C (њ) has its right side cut off and is thus unusable (luckily it doesn't occur in Russian script).



Chinese

The Chinese version of Oni is unique in how the main game code resides in Oni.dat, a renamed copy of the original Oni.exe from the US version that is executed indirectly by a wrapper app called oni.exe, alongside a custom text engine, xfhsm_oni.dll. The latter DLL intercepts any text about to be displayed by "Oni.dat", first reducing it to a set of two-byte control sequences, and then (if all goes well) to a set of custom glyphs, with pixel data coming from an external font file, xf_font.dat.

Unlike the original US engine or the Japanese one, xfhsm_oni.dll does not expect any single-byte characters in the input, does not interpret US-ASCII strings in any meaningful way and never falls back on level0_Final's TSFFTahoma for text display. The pixel data comes exclusively from xf_font.dat, and the expected control sequences are exclusively two-byte code points. This includes string termination: instead of a single null char, xfhsm_oni.dll expects a string to end with a pair of null chars.

Unlike for other versions of Oni, the Chinese font does not have a table listing the valid code points along with their "glyph descriptors" (i.e., instructions on how to extract a glyph from the raw pixel data). Instead all the glyphs are stored as fixed-size bitmaps (16x16 pixels each) and there are exactly 94x94=8,836 glyphs, filling up a standard GB 2312 plane (qūwèi), indexed through a compact numbering scheme known as EUC-CN: each of the 94x94 code points is indexed by a pair of bytes that are both in the 0xA1-0xFE range. Code points that are not assigned under GB 2312 (e.g., rows 10-15 and 90-94) simply have blank pixel data in the corresponding regions of xf_font.dat.

Two glyph sizes are available: 16x16 glyphs are stored in the first half of xf_font.dat, and 12x12 glyphs in the second half. Each 12x12 glyph is stored in the top left corner of a 16x16 bitmap, so the row/glyph alignment is the same in both cases: 2 bytes per pixel row and 32 bytes per glyph. The pixel packing is 1-bit black-and-white (i.e., without antialiasing), much more space-efficient than the 8-bit grayscale storage used in Oni's TSFT. Another gain comes from not having any glyph descriptors (TSGAs), and from having only two fonts instead of Oni's typical 15.
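The layout described above suggests a simple way to locate a glyph. The following is a minimal sketch, assuming that xf_font.dat has no header, that glyphs are stored in row-major order (first byte, then second byte), that the 12x12 block starts immediately after the 16x16 block, and that the leftmost pixel is the most significant bit. None of these assumptions is confirmed by the text, and the function names are hypothetical.

```python
GLYPHS_PER_ROW = 94
GLYPH_BYTES = 32                      # 16 pixel rows x 2 bytes per row
PLANE_BYTES = 94 * 94 * GLYPH_BYTES   # one full GB 2312 plane of glyphs

def glyph_offset(lead, trail, small=False):
    """Assumed byte offset of a glyph in xf_font.dat for an EUC-CN byte pair."""
    if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
        raise ValueError("not a valid EUC-CN pair")
    index = (lead - 0xA1) * GLYPHS_PER_ROW + (trail - 0xA1)
    # 12x12 glyphs are assumed to live in the second half of the file.
    return index * GLYPH_BYTES + (PLANE_BYTES if small else 0)

def unpack_glyph(data, size=16):
    """Turn 32 bytes of 1-bit pixel data into text rows of '#' and '.' (MSB-first assumed)."""
    rows = []
    for y in range(size):
        bits = (data[2 * y] << 8) | data[2 * y + 1]
        rows.append("".join("#" if bits & (0x8000 >> x) else "." for x in range(size)))
    return rows
```

With size=12, unpack_glyph reads only the top-left 12x12 corner of the same 32-byte cell, which matches the shared row and glyph alignment of the two sizes.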

All the GB 2312 glyphs listed HERE and HERE are implemented, except for the euro sign and the ten glyphs from Vertical Forms.

Unlike for other versions of Oni, an invalid code point does not interrupt the interpretation/rendering of a text string by xfhsm_oni.dll and can lead to a wide range of unexpected behavior: at best, a blank or otherwise unintended glyph will be displayed; at worst the rendered text will be garbled (memory corruption most likely), or the game may simply crash.



Japanese

Japanese Oni uses a custom two-byte encoding that is mostly consistent with Shift JIS but with some of the control sequences rearranged in seemingly non-standard ways. As in Chinese Oni, the glyph data is stored in new, external files; in this case they are .fnt files stored in GameDataFolder. Three font sizes are available, with pixel sizes 11x11 (JPN_SMALL.fnt), 12x12 (JPN_MIDDLE.fnt) and 14x14 (JPN_BIG.fnt). The 14x14 font has a bold-face variant (JPN_BOLD.fnt). All four fonts are fixed-width, i.e. all glyphs have a square bounding box.

Unlike the Chinese version, the TSFFTahoma contained in the Japanese game data is not limited to the ASCII code page. There are a total of 154 double-byte code points (Romaji, punctuation, kana and kanji) across 19 code pages (TSGA) each corresponding to a different "lead byte" (0x81, 0x82, 0x83, 0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x8D, 0x8E, 0x8F, 0x90, 0x91, 0x92, 0x93, 0x95, 0x96, 0x97 and 0x98).

 0x81 (8 glyphs, 1 anomaly) - punctuation
N.B. 0x8130 is not a legal Shift JIS sequence. The standard code for a prolonged sound mark is 0x815B.
Glyph Shift JIS Unicode Designation
0x8130 U+30FC prolonged sound mark
2SP 0x8140 U+3000 ideographic space
0x8141 U+3001 ideographic comma
0x8142 U+3002 ideographic full stop
0x8145 U+30FB katakana middle point
0x8146 U+003A colon
0x8149 U+0021 exclamation mark
0x8193 U+0025 percent sign
 0x82 (42 glyphs) - numbers, letters, hiragana
N.B. There is no clear reason why numerals are limited to 2 and 6, and Roman letters are limited to A, C, D, F, S, T, W - or why these glyphs are needed at all, seeing as US-ASCII is still available. But for what it's worth, the 9 redundant glyphs come from a serifed font, whereas the US-ASCII font is sans serif.
N.B. It is also not clear why (in this font/encoding) the TU hiragana has a "lowercase" version while many other hiragana are missing.
Glyph Shift JIS Unicode Designation
2 0x8251 U+0032 digit 2
6 0x8255 U+0036 digit 6
A 0x8260 U+0041 letter A
C 0x8262 U+0043 letter C
D 0x8263 U+0044 letter D
F 0x8265 U+0046 letter F
S 0x8272 U+0053 letter S
T 0x8273 U+0054 letter T
W 0x8276 U+0057 letter W
0x82A0 U+3042 hiragana A
0x82A2 U+3044 hiragana I
0x82A4 U+3046 hiragana U
0x82A6 U+3048 hiragana E
0x82A9 U+304B hiragana KA
0x82AA U+304C hiragana GA
0x82AB U+304D hiragana KI
0x82AD U+304F hiragana KU
0x82B1 U+3053 hiragana KO
0x82B3 U+3055 hiragana SA
0x82B5 U+3057 hiragana SI
0x82B6 U+3058 hiragana ZI
0x82B7 U+3059 hiragana SU
0x82BD U+305F hiragana TA
0x82BE U+3060 hiragana DA
0x82BF U+3061 hiragana TI
0x82C1 U+3063 hiragana tu
0x82C2 U+3064 hiragana TU
0x82C4 U+3066 hiragana TE
0x82C6 U+3068 hiragana TO
0x82C8 U+306A hiragana NA
0x82C9 U+306B hiragana NI
0x82CC U+306E hiragana NO
0x82CD U+306F hiragana HA
0x82DC U+307E hiragana MA
0x82DF U+3081 hiragana ME
0x82E1 U+3083 hiragana ya
0x82E6 U+3088 hiragana YO
0x82E9 U+308B hiragana RU
0x82EA U+308C hiragana RE
0x82ED U+308F hiragana WA
0x82F0 U+3092 hiragana WO
0x82F1 U+3093 hiragana N
 0x83 (41 glyphs, 2 anomalies) - katakana
N.B. 0x8332 is not a legal Shift JIS sequence. The standard code for the BO katakana ボ is 0x837B.
N.B. 0x8333 is not a legal Shift JIS sequence. The standard code for the MA katakana マ is 0x837D.
N.B. It is not clear why (in this font/encoding) the I and O katakana have "lowercase" versions while many other katakana are missing. Also, the TU, YA, YU and YO katakana have only a lowercase version.
Glyph Shift JIS Unicode Designation
0x8332 U+30DC katakana BO
0x8333 U+30DE katakana MA
0x8342 U+30A3 katakana i
0x8343 U+30A4 katakana I
0x8345 U+30A6 katakana U
0x8348 U+30A9 katakana o
0x8349 U+30AA katakana O
0x834C U+30AD katakana KI
0x834E U+30AF katakana KU
0x834F U+30B0 katakana GU
0x8350 U+30B1 katakana KE
0x8351 U+30B2 katakana GE
0x8352 U+30B3 katakana KO
0x8354 U+30B5 katakana SA
0x8356 U+30B7 katakana SI
0x8357 U+30B8 katakana ZI
0x8358 U+30B9 katakana SU
0x835A U+30BB katakana SE
0x835E U+30BF katakana TA
0x8362 U+30C3 katakana tu
0x8365 U+30C6 katakana TE
0x8367 U+30C8 katakana TO
0x8368 U+30C9 katakana DO
0x8369 U+30CA katakana NA
0x836A U+30CB katakana NI
0x836D U+30CE katakana NO
0x8374 U+30D5 katakana HU
0x8375 U+30D6 katakana BU
0x8376 U+30D7 katakana PU
0x8378 U+30D9 katakana BE
0x837C U+30DD katakana PO
0x8380 U+30E0 katakana MU
0x8383 U+30E3 katakana ya
0x8385 U+30E5 katakana yu
0x8387 U+30E7 katakana yo
0x8389 U+30E9 katakana RA
0x838A U+30EA katakana RI
0x838B U+30EB katakana RU
0x838C U+30EC katakana RE
0x838D U+30ED katakana RO
0x8393 U+30F3 katakana N
 0x88 (4 glyphs), 0x89 (7 glyphs), 0x8A (3 glyphs), 0x8B (1 glyph) - kanji
Glyph Shift JIS Unicode Designation
0x88C3 U+6697 kanji AN
0x88D5 U+6613 kanji EKI
0x88DA U+79FB kanji I
0x88F3 U+5370 kanji IN
0x899F U+62BC kanji Ō
0x89A9 U+2EE9, U+9EC4 kanji KI
0x89BA U+4E0B kanji SHITA
0x89C2 U+53EF kanji KA
0x89E6 U+753B, U+FAA3 kanji GA
0x89F0 U+89E3 kanji KAI
0x89F1 U+56DE kanji KAI2
0x8A65 U+5404 kanji ONOONO
0x8AAF U+5B98 kanji KAN
0x8AEE U+57FA kanji MOTO
0x8B96 U+8A31 kanji MOTO2
 0x8C (3 glyphs), 0x8D (4 glyphs), 0x8E (8 glyphs), 0x8F (3 glyphs) - kanji
Glyph Shift JIS Unicode Designation
0x8C6F U+7D4C kanji KYŌ
0x8CC9 U+5EAB kanji KO
0x8CFC U+5411 kanji MU
0x8D73 U+2F8F, U+884C, U+FA08 kanji GYŌ
0x8D82 U+2FBC, U+9AD8 kanji TAKA
0x8D87 U+5408 kanji
0x8DEC U+4F5C kanji SAKU
使 0x8E67 U+4F7F kanji SHI
0x8E69 U+53F8 kanji TSUKASA
0x8E6E U+59CB kanji SHI
0x8E84 U+79C1 kanji WATASHI
0x8E8E U+8A66 kanji SHI2
0x8E9A U+5B57 kanji JI
0x8E9E U+6642 kanji TOKI
0x8ECE U+659C kanji SHA
0x8F49 U+7D42 kanji TSUI
0x8F8A U+6240 kanji SHO
0x8FEA U+5834 kanji BA
 0x90 (3 glyphs), 0x91 (7 glyphs), 0x92 (2 glyphs), 0x93 (5 glyphs) - kanji
Glyph Shift JIS Unicode Designation
0x9046 U+2F8A, U+8272 kanji IRO
0x9056 U+65B0 kanji SHIN
0x905F U+795E, U+FA19 kanji KAMI
0x914F U+524D kanji MAE
0x9171 U+5009 kanji KURA
0x918B U+7A93 kanji MADO
0x919C U+50CF, U+2F80B kanji
0x91B1 U+7D9A kanji ZOKU
0x91CC U+4F53 kanji TAI
0x91D6 U+66FF kanji TEI
0x9286 U+4E2D kanji CHU
0x92E1 U+4F4E kanji HIKU
0x9378 U+5EA6 kanji TABI
0x93AE U+52D5 kanji DO
0x93AF U+540C kanji DO2
0x93EF U+96E3, U+FA68, U+FAC7 kanji NAN
0x93FC U+2F0A, U+5165 kanji JU
 0x95 (3 glyphs), 0x96 (5 glyphs, 1 anomaly), 0x97 (4 glyphs), 0x98 (1 glyph) - kanji
N.B. 0x9632 is not a legal Shift JIS sequence. The standard code for the MOTO kanji 本 is 0x967B.
Glyph Shift JIS Unicode Designation
0x95C2 U+9589 kanji HEI
0x95CF U+5909 kanji HEN
0x95E0 U+6B69 kanji HO
0x9632 U+672C kanji MOTO
0x968B U+5E55 kanji MAKU
0x96BE U+660E kanji MEI
0x96CA U+2FAF, U+9762 kanji MEN
0x96DA U+2F6C, U+76EE kanji MOKU
0x9770 U+2F64, U+7528 kanji
0x97A7 U+2F74, U+7ACB kanji RITSU
0x97B9 U+4E86, U+F9BA kanji RYŌ
0x97DF U+4EE4, U+F9A8 kanji REI
0x9848 U+8DEF kanji JI
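
The anomalies flagged in the N.B. lines of the tables above can be cross-checked against a standard Shift JIS table. The sketch below uses Python's built-in "shift_jis" codec as that reference; it does not read Oni's font data, and the helper name decode_sjis is made up for illustration.

```python
# Check two-byte codes against Python's standard "shift_jis" codec.
def decode_sjis(code):
    """Return the standard Shift JIS character for a 16-bit code, or None if it is illegal."""
    try:
        return code.to_bytes(2, "big").decode("shift_jis")
    except UnicodeDecodeError:
        return None

# (code used by Oni, standard code for the same character), as listed in the notes above
pairs = ((0x8130, 0x815B), (0x8332, 0x837B), (0x8333, 0x837D), (0x9632, 0x967B))

for oni_code, standard_code in pairs:
    ch = decode_sjis(standard_code)
    print(f"0x{oni_code:04X} legal: {decode_sjis(oni_code) is not None}; "
          f"standard 0x{standard_code:04X} = U+{ord(ch):04X}")
```

All four of Oni's non-standard codes use a second byte below 0x40, which standard Shift JIS never allows as a trail byte.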

As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., is limited to US-ASCII. This is consistent with the simplified logic used by the Japanese engine, where any high-bit byte (in the 0x80-0xFF range) is treated as the start of a two-byte sequence. (In actual Shift JIS some high-bit bytes are interpreted as half-width kana, a feature that isn't supported by Oni's engine.)

It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable.

  • The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for all of the vanilla text strings, which only contain double-byte control codes. Thus, under normal conditions, TSFFTahoma remains completely unused in the Japanese version, and would only be used for (artificially added) US-ASCII input.
  • If the US engine is used on the Japanese game data, then the .fnt files are ignored (obviously), and the incomplete TSFFTahoma is used to render the Japanese text strings as well as the few English strings supplied by the EXE. Due to the limited character set, many strings end up broken.

It appears that the Japanese localization team initially tried to put Oni's code page system to use, and to fill in all the required JIS glyphs into TSFT and TSGA. As the number of kanji increased, supposedly, the TSFT grew prohibitively large due to the use of 8-bit grayscale storage for the pixel data, and the size taken up by the sparsely populated TSGA also increased out of proportion with the rest of the game data. At some point the engine switched to separate .fnt files, and somehow no one bothered to clean up the incomplete code pages in TSFFTahoma.

At the time of writing, the code points and pixel data in the Japanese .fnt files have not been thoroughly analyzed and compared with JIS X 0208. We know that 1,357 glyphs are implemented, across 27 "lead bytes" (roughly 50 kuten rows). This is much smaller than the full kuten plane, and makes sense in terms of space efficiency. We also know that some code points are non-standard (rearranged) as compared to regular Shift JIS, although we do not yet know if this rearrangement is consistent with any common variation of Shift JIS. As long as Japanese game data contains text strings that match the game's encoding, non-standard code points are not a problem (but should be kept in mind).

Text anomalies

Ellipsis issue

Unlike other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats high-bit characters as part of a two-byte control sequence (a provision for Asian encodings), and therefore fails to render any character from the extended ASCII range. This happens twice in English Oni, because the ellipsis character (…), encoded as 0xC9, was accidentally used in these two text consoles in place of three consecutive dots (probably auto-substituted by a text editor). The result is that the two lines using a "…" are cut off at the offending character.
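
The cut-off can be illustrated with a toy model. This is only a sketch of the observed behavior, assuming that rendering simply stops at the first high-bit byte; the function render_us and the sample sentence are made up, and the real engine logic may differ.

```python
def render_us(raw):
    """Hypothetical model of the US engine: ASCII renders, a high-bit byte ends the line."""
    out = []
    for b in raw:
        if b >= 0x80:   # taken as the start of a two-byte sequence with no glyph to show
            break
        out.append(chr(b))
    return "".join(out)

line = "Wait… what?".encode("mac_roman")   # made-up sample; "…" becomes the single byte 0xC9
print(line.hex())
print(render_us(line))                     # Wait
print(render_us(b"Wait... what?"))         # Wait... what?
```

Replacing the single ellipsis character with three ASCII dots keeps the whole line within the range the US engine can render.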

Invalid EUC-CN input

Unlike the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8,836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE and anything else is technically illegal (single US-ASCII characters could occur in theory, but are not handled properly by the custom text engine, xfhsm_oni.dll).

The text strings in the Chinese version mostly conform to the EUC-CN scheme, except for the (A1,A0) sequence, which occurs in a few subtitles and is rendered with a blank glyph (i.e., a space between valid glyphs, indistinguishable from an ordinary ideographic space), apparently due to some kind of wraparound. At the time of writing it is not known what was meant by the (A1,A0) sequence, as it doesn't seem to be a valid control sequence under any common extension of EUC-CN.

Another illegal sequence is (A3,89), which occurs only in the SUBTmessages entry xdash1 (five identical glyphs at the end of the string) and is somehow rendered as ㈢, which would normally be encoded as (A2,E7). Such an improbable substitution is likely unintentional, and it is not known what the intended glyph was.

At the time of writing, the apparent wraparound behavior has not been investigated thoroughly, but it is established that some illegal code points are not mapped to a valid glyph at all, and instead result in garbled text or a crash. Invalid EUC-CN input may be what causes most chapters of the Chinese version of Oni to crash on modern Windows systems (through varying degrees of memory corruption), although this has not been investigated thoroughly either.
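
A string can be screened for such sequences before it reaches the text engine. The sketch below is a hypothetical checker, not part of xfhsm_oni.dll: it walks a string two bytes at a time, stops at the double-null terminator and reports every pair outside the 0xA1-0xFE range. The sample bytes are made up, apart from the two illegal pairs mentioned above.

```python
def invalid_pairs(raw):
    """List (offset, lead, trail) for every byte pair that is not valid EUC-CN."""
    bad = []
    for i in range(0, len(raw) - 1, 2):
        lead, trail = raw[i], raw[i + 1]
        if (lead, trail) == (0, 0):   # pair of null chars ends the string
            break
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            bad.append((i, lead, trail))
    return bad

# Two valid glyphs interleaved with the (A1,A0) and (A3,89) pairs, then the terminator.
sample = bytes([0xC4, 0xE3, 0xA1, 0xA0, 0xBA, 0xC3, 0xA3, 0x89, 0x00, 0x00])
print([(off, hex(a), hex(b)) for off, a, b in invalid_pairs(sample)])
```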

Over-tall text

Although not strictly speaking a font issue, some of Oni's text fails to render because it doesn't fit vertically into a fixed-size frame (such as a text console). This is known to happen for these two consoles in the English version, and possibly for other screens in other language versions.

Over-long text

Although Chinese text strings typically have a much smaller number of glyphs than English originals, this is not always the case. The Chinese glyphs are also much wider on average, with each glyph taking up 16x16 pixels, and so there are situations where the rendered Chinese line is much wider than the English original, no longer fitting on one line as intended by the context.

This is only known to cause a problem for the "resolution" item in the Options menu (a WMM_ generated at runtime). The actual dropdown list is wide enough to accommodate even the longest resolution strings, but the currently selected resolution appears in a small window that is only 150 pixels wide, too narrow even for the shortest resolution string "640×480×16位" (which needs 176 pixels). As a result the active resolution is always displayed on two lines, no longer fitting into the frame vertically and thus unreadable.
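
The 176-pixel figure follows directly from the fixed glyph width. A quick check, assuming every character of the string (digits and "×" included) is drawn as a full 16-pixel-wide glyph:

```python
GLYPH_WIDTH = 16     # each Chinese-version glyph occupies a 16x16 cell
FRAME_WIDTH = 150    # width of the selected-resolution window, in pixels

def line_width(text):
    return len(text) * GLYPH_WIDTH

label = "640×480×16位"
print(len(label), line_width(label), line_width(label) <= FRAME_WIDTH)   # 11 176 False
```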

Chinese SUBT issues

The Chinese (Windows) version of Oni is unique in that no game content was actually localized except for text. Because of the relative simplicity of the task, the Chinese team did not build a new set of game data files, and merely modified the original .dat and .raw from the US version. WMDD, WMM_ and IGSt instances were patched inside each level's .dat, whereas the two SUBT files were patched in level0_Final.raw. In the case of an IGSt, text is stored in a fixed-size array (384 bytes), which has more than enough space for any translated text. WMDD and WMM_ also have fixed-size arrays (256 and 64 bytes, respectively) with at least some spare space. SUBT files, however, have a much more compact storage.

The text strings of a SUBT file (stored in level0_Final.raw and indexed from the .dat part of the SUBT) are typically packed right next to each other, separated only by a single null char. Chinese text typically uses fewer glyphs, but each glyph takes up two bytes instead of one, and so do punctuation and the trailing null. Thus for short sentences or interjections it is possible for a Chinese translation to completely fill up the space used by the original string and even extend into the next entry.

None of the Chinese translations in SUBTmessages or SUBTsubtitles are actually longer than the original English text, and it is only the extra null byte that intrudes on the next entry's handle on several occasions. The affected handle essentially becomes a null string, and the corresponding subtitle is never found and displayed.
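
The mechanism can be reproduced on a toy buffer. This is a simplified model, assuming entries are packed as handle, null, text, null; the handles "first" and "second", the sample text and both helper functions are made up for illustration and do not reflect the actual SUBT layout in detail.

```python
def patch_in_place(blob, offset, translated):
    """Overwrite the string at `offset` with a translation plus a two-byte terminator."""
    patched = bytearray(blob)
    payload = translated + b"\x00\x00"
    patched[offset:offset + len(payload)] = payload
    return bytes(patched)

def read_entry(blob, offset):
    """Read a string that ends at the first null byte, as a single-byte lookup would."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end]

original = b"first\x00Okay\x00second\x00Next line\x00"   # made-up packed entries
text_offset = original.index(b"Okay")
victim_offset = original.index(b"second")

translation = "好的".encode("gb2312")   # 4 bytes, same length as the original text
patched = patch_in_place(original, text_offset, translation)

print(read_entry(original, victim_offset))   # b'second'
print(read_entry(patched, victim_offset))    # b''
```

The translation itself fits exactly, but the second null byte lands on the first character of the next handle, which then reads as an empty string.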

In SUBTmessages this happens only once (the message corresponding to "xf1" overwrites the first character of "xreload", so Konoko is never prompted to reload her gun in the last training room). In SUBTsubtitles there are as many as 29 anomalies, summed up in the following table.

 List of corrupt subtitle handles in the Chinese version
Culprit handle Original of culprit text (null char ° included) Victim handle
01_01_11 Kerr:  Good luck Konoko.° 01_02_01
01_03_07 Griffin:  All right Konoko. I'm giving you a shot at this.° 01_03_07
02_05_04 Muro:  Let me know when things start to get messy.° 02_05_05
02_06_02 Griffin:  Explain.° 02_06_03
02_09_03 Griffin:  So she's still stable?° 02_09_04
03_10_01 Barabas:  Let's get it on!° 03_10_02
03_11_01 Barabas:  She's with them.° 03_11_02
04_17_03 Muro:  I can't allow that.° 04_17_04
07_22_01 Konoko:  Showtime...° 07_23_01
07_26_15 Konoko:  Thanks.° 07_26_16
07_26_17 Cop:  No, we haven't secured a single area -- not even our armory.° 07_26_18
08_27_03 Konoko:  This is personal.° 08_27_04
09_31_02 Shinatama:  You are not who you think you are.° 09_31_03
09_31_03 Konoko:  What?° 09_31_04
09_31_24 Konoko:  No!° 09_31_25
11_40_07 Mukade:  We shall see...° c11_41_01konoko
12_46_02 Konoko:  What?° 12_46_03
12_46_06 Konoko:  Leave me alone...° 12_46_07
13_65_05 Kerr:  This may sting a bit...° 13_65_06
13_65_20 Konoko:  Griffin? But why?° 13_65_21
13_65_25 Kerr:  Muro.° 13_65_26
13_65_36 Konoko:  What?° 13_65_37
13_66_03 Konoko:  The crane controls...° 13_66_04
14_52_02 Konoko:  Gotcha.° 14_52_03
14_52_06 Konoko:  For you? Badly?° 14_52_07
00_01_09 Shinatama:  Super!° c00_01_10Shinatama
c00_01_10Shinatama Shinatama:  Great!° 00_01_11
civmale3_trigger Civilian:  Hi there!° c00_01_100shinatama
c00_01_101shinatama Shinatama: I'm sorry...so sorry! ° c00_01_102shinatama

The systematic nature of this anomaly suggests that the Chinese team were careful not to exceed the string length of the original, and merely overlooked the extra null char (and of course didn't check the in-game rendition of the subtitles all that thoroughly).