OBD:Text encoding
Revision as of 00:58, 29 December 2021
Depending on the language version, Vanilla Oni uses one of the following five encodings to render text.
- The original US version uses a trimmed-down Mac OS Roman code page that is effectively limited to US-ASCII (96 code points).
- European localizations (UK English, French, Italian, Spanish, German) use a custom version of Mac OS Roman (192 code points).
- The Russian localization uses a full implementation of the Windows-1251 (Cyrillic) code page (224 code points).
- The Chinese localization uses the EUC-CN implementation of GB 2312 (8836 code points).
- The Japanese localization uses 1357 code points mostly conforming to the Shift JIS implementation of JIS X 0208.
Properties of the fonts that are eventually used to render the text (via the encoding) are briefly described throughout the page.
Encoding
US English
Below is the code page implemented by TSFFTahoma in the US English version of Oni.
It is based on the Mac OS Roman code page, but with two differences:
- Of the 223 printable glyphs provided by Mac OS Roman, 42 are missing (shown as grey-on-black).
- Code point 0x7F (normally the non-printable "delete" control character) is available as a box-like glyph ◻.
- Minor notes
- The layout was apparently "borrowed" before Mac OS 8.5, so the glyph at 0xDB is a "currency sign" ¤, not a euro sign €.
- Non-standard features of the actual font include a single-stroke Yen/Yuan symbol, Ұ, and a vertical-stroke cent symbol, ¢.
- The five glyphs marked in orange (¢, £, ©, ± and µ) are in coincidental agreement with the Windows-1252 code page.
- Major notes
- Some of the removed glyphs (most importantly ß, Ê, ù and û, but also Ú and ú) occur in common European languages.
This made the US font/encoding unsuitable for EFIGS localizations, and prompted the creation of a new version (see below).
- The US engine actually cannot interpret any code points beyond the US-ASCII range (first 6 rows, white background), such as "…" (see below).
This is because of a provision for Asian encoding systems (EUC-CN and Shift JIS), which use two-byte sequences starting with a high-bit byte.
European
The code page used by the five Western European versions (UK English, French, German, Spanish and Italian) is slightly different from the trimmed-down Mac OS Roman.
- It caters to the needs of European localizations by adding back the following characters:
German ß; French Ê and û; French/Italian ù; Spanish/Italian Ú and ú (relatively rare).
- N.B. The characters Æ and ÿ are not reinstated, despite their (very rare) occurrence in French script.
- Awkwardly enough, the six characters are not restored in their original positions (grey), but take the place of math symbols.
Four more "math" positions are inexplicably filled with three duplicate characters (œ, ¡ and ª) and the very exotic, non-Unicode ʖ̇ .
- N.B. The broken italic font variants do not fully implement the 10 new glyphs and use, e.g., a regular question mark instead of the ʖ̇ .
...0 | ...1 | ...2 | ...3 | ...4 | ...5 | ...6 | ...7 | ...8 | ...9 | ...A | ...B | ...C | ...D | ...E | ...F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x2... | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
0x3... | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
0x4... | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
0x5... | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
0x6... | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
0x7... | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | ◻ |
0x8... | Ä | Ç | É | Ñ | Ö | Ü | á | à | â | ä | ã | å | ç | é | è | |
0x9... | ê | ë | í | ì | î | ï | ñ | ó | ò | ô | ö | ú | ù | û | ü | |
0xA... | † | ¢ | £ | § | • | ß | ® | © | ™ | ´ | ¨ | Ø | ||||
0xB... | ± | Ұ | µ | Ê | Ú | ù | ú | û | ª | ß | œ | æ | ø | |||
0xC... | ¿ | ¡ | ¬ | ¡ | ƒ | ʖ̇ | ª | « | » | … | À | Õ | Œ | œ | ||
0xD... | – | — | ‟ | ” | ‛ | ’ | ÷ | Ÿ | ¤ | ‹ | › | |||||
0xE... | ‡ | ‚ | „ | ‰ | Â | Ê | Á | Ë | È | Í | Î | Ï | Ì | Ó | Ô | |
0xF... | Ò | Ú | Û | Ù | ˆ | ˜ | ¯ |
Coincidentally, with the 10 new glyphs the European code page has exactly 96 glyphs in the US-ASCII half, and 96 in the extension half (blue).
- N.B. Unlike the US version, all five Western European versions (including UK English) are able to render the full extended ASCII set.
Cyrillic
In the Russian version of Oni, TSFFTahoma fully implements the Windows-1251 (Cyrillic) code page.
- All the Windows-1251 characters are present, although only 66 (purple) are used by Russian script.
- The character 0x98 is normally non-printable, but in this font has a box-like glyph ☐ (not unlike 0x7F).
- Apart from 0x20, there are two whitespace characters: the non-breakable space ⍽ and the soft hyphen (–).
...0 | ...1 | ...2 | ...3 | ...4 | ...5 | ...6 | ...7 | ...8 | ...9 | ...A | ...B | ...C | ...D | ...E | ...F | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x2... | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
0x3... | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
0x4... | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
0x5... | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
0x6... | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
0x7... | p | q | r | s | t | u | v | w | x | y | z | { | | | } | ~ | ◻ |
0x8... | Ђ | Ѓ | ‚ | ѓ | „ | … | † | ‡ | € | ‰ | Љ | ‹ | Њ | Ќ | Ћ | Џ |
0x9... | ђ | ‘ | ’ | “ | ” | • | – | — | ☐ | ™ | љ | › | њ | ќ | ћ | џ |
0xA... | ⍽ | Ў | ў | Ј | ¤ | Ґ | ¦ | § | Ё | © | Є | « | ¬ | (–) | ® | Ї |
0xB... | ° | ± | І | і | ґ | µ | ¶ | · | ё | № | є | » | ј | Ѕ | ѕ | ї |
0xC... | А | Б | В | Г | Д | Е | Ж | З | И | Й | К | Л | М | Н | О | П |
0xD... | Р | С | Т | У | Ф | Х | Ц | Ч | Ш | Щ | Ъ | Ы | Ь | Э | Ю | Я |
0xE... | а | б | в | г | д | е | ж | з | и | й | к | л | м | н | о | п |
0xF... | р | с | т | у | ф | х | ц | ч | ш | щ | ъ | ы | ь | э | ю | я |
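The Cyrillic table above can be reproduced with a stock Windows-1251 codec plus the two font-specific box glyphs. The following is a minimal sketch, not the engine's actual code: the function name `decode_oni_cyrillic` is hypothetical, and the choice of U+25FB and U+2610 as stand-ins for the box glyphs simply mirrors the symbols used in the table. Python's `cp1251` codec leaves 0x98 undefined, which is why that byte is handled separately.

```python
# Sketch: decode a Russian Oni string for display, byte by byte.
# ASSUMPTION: the two box-like glyphs are represented here by
# U+25FB (for 0x7F) and U+2610 (for 0x98), as in the table above.
BOX_GLYPHS = {0x7F: "\u25fb", 0x98: "\u2610"}

def decode_oni_cyrillic(data: bytes) -> str:
    out = []
    for b in data:
        if b in BOX_GLYPHS:
            # Non-printable in Windows-1251, but drawn as a box by this font.
            out.append(BOX_GLYPHS[b])
        else:
            # Every other byte follows standard Windows-1251.
            out.append(bytes([b]).decode("cp1251"))
    return "".join(out)

if __name__ == "__main__":
    print(decode_oni_cyrillic(b"\xcf\xf0\xe8\xe2\xe5\xf2"))
```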
Chinese
The Chinese version of Oni has the same TSFFTahoma as the original US version (trimmed-down Mac OS Roman), but the engine cannot interpret the extended ASCII range, and in fact does not use TSFFTahoma at all. Instead the wrapper mini-app called "oni.exe" loads "Oni.dat" (the game itself, a duplicate of the original Oni.exe from the US version), along with a custom "text engine" xfhsm_oni.dll and a font file xf_font.dat. Text strings loaded by "Oni.dat" are then intercepted by xfhsm_oni.dll, interpreted/rendered using pixel data from xf_font.dat, and injected into Oni.dat's OpenGL context.
Unlike for other versions of Oni, the Chinese font doesn't have a table listing the valid code points along with their "glyph descriptors" (i.e., instructions on how to extract a glyph from the raw pixel data). Instead all the glyphs have a standard size of 16x16 pixels and there are exactly 94x94=8836 glyphs, filling up a standard GB 2312 plane (kuten), indexed through a compact numbering scheme known as EUC-CN: each of the 94x94 code points is indexed by a pair of bytes that are both in the 0xA1-0xFE range. Code points that are not assigned under GB 2312 (e.g. rows 10-15 and 90-94) simply have blank pixel data in the corresponding regions of xf_font.dat.
The pixel packing used by xf_font.dat is 1-bit black-and-white (i.e., without antialiasing), which is much more space-efficient than the 8-bit grayscale storage used in Oni's TSFT. Another gain comes from not having any glyph descriptors (TSGAs). Both a regular and a bold typeface are available (but in one size only, fixed-width 16x16).
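The indexing scheme described above can be sketched as follows. This is only an illustration under stated assumptions, since the layout of xf_font.dat has not been fully analyzed: it assumes that glyphs are stored contiguously in kuten order (row by row, cell by cell), that a typeface starts at a known `base` offset, that each pixel row occupies two bytes, and that the most significant bit is the leftmost pixel. All function names are hypothetical.

```python
# Sketch: locate and unpack a 16x16 1-bit glyph from an EUC-CN byte pair.
GLYPH_SIDE = 16
BYTES_PER_GLYPH = GLYPH_SIDE * GLYPH_SIDE // 8  # 256 bits = 32 bytes
CELLS_PER_ROW = 94

def glyph_index(b1: int, b2: int) -> int:
    """Map an EUC-CN pair (both bytes in 0xA1-0xFE) to 0..8835."""
    if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
        raise ValueError("not a valid EUC-CN pair: (%02X,%02X)" % (b1, b2))
    return (b1 - 0xA1) * CELLS_PER_ROW + (b2 - 0xA1)

def glyph_offset(b1: int, b2: int, base: int = 0) -> int:
    # ASSUMPTION: contiguous storage in kuten order, starting at `base`.
    return base + glyph_index(b1, b2) * BYTES_PER_GLYPH

def unpack_glyph(raw: bytes):
    """Expand 32 packed bytes into 16 rows of 16 pixel values (0 or 1)."""
    # ASSUMPTION: two bytes per pixel row, most significant bit leftmost.
    rows = []
    for y in range(GLYPH_SIDE):
        bits = int.from_bytes(raw[2 * y:2 * y + 2], "big")
        rows.append([(bits >> (15 - x)) & 1 for x in range(GLYPH_SIDE)])
    return rows

if __name__ == "__main__":
    b1, b2 = "\u554a".encode("gb2312")  # first hanzi of GB 2312, row 16 cell 1
    print(hex(b1), hex(b2), glyph_index(b1, b2), glyph_offset(b1, b2))
```

Under these assumptions one typeface takes 8836 x 32 = 282,752 bytes, compared with 8836 x 256 bytes for the same glyphs in 8-bit grayscale.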
At the time of writing, the pixel data in xf_font.dat has not been thoroughly analyzed and compared with GB 2312, so we do not know for sure if all the GB 2312 glyphs are implemented or if there are some additional blanks. The encoding may also be one of several extensions of EUC-CN, although it should be kept in mind that control bytes need to remain inside the 0xA1-0xFE range for the raw 94x94 kuten layout to work.
In theory, EUC-CN allows for single-byte control codes, which are interpreted as US-ASCII (and would be rendered using Oni's own TSFFTahoma). In practice, all of the strings in Vanilla game data (Chinese version) use only two-byte control sequences.
Japanese
Japanese Oni uses a custom two-byte encoding that is mostly consistent with Shift JIS but with some rearranged control sequences. Three font sizes are available, with pixel sizes 11x11 (JP_SMALL.fnt), 12x12 (JPN_MIDDLE.fnt) and 14x14 (JPN_BIG.fnt). The 14x14 font has a bold-faced variant (JPN_BOLD.fnt). All four fonts are fixed-width, i.e. all glyphs have a square bounding box.
Unlike for the Chinese version, the TSFFTahoma contained in Japanese game data (level0_Final) is not limited to the ASCII code page. There is a total of 154 double-byte code points (Romaji, punctuation, kana and kanji) across 19 code pages (TSGA) each corresponding to a different "lead byte" (0x81, 0x82, 0x83, 0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x8D, 0x8E, 0x8F, 0x90, 0x91, 0x92, 0x93, 0x95, 0x96, 0x97 and 0x98).
- N.B. 0x8130 is not a legal Shift JIS sequence. The standard code for a prolonged sound mark is 0x815B.
Glyph | Shift JIS | Unicode | Designation |
---|---|---|---|
ー | 0x8130 | U+30FC | prolonged sound mark |
2SP | 0x8140 | U+3000 | ideographic space |
、 | 0x8141 | U+3001 | ideographic comma |
。 | 0x8142 | U+3002 | ideographic full stop |
・ | 0x8145 | U+30FB | katakana middle point |
: | 0x8146 | U+003A | colon |
! | 0x8149 | U+0021 | exclamation mark |
% | 0x8193 | U+0025 | percent sign |
- N.B. There is no clear reason why numerals are limited to 2 and 6, and Roman letters are limited to A, C, D, F, S, T, W - or why these glyphs are needed at all, seeing as US-ASCII is still available.
- (For what it's worth, the 9 redundant glyphs come from a serifed font, whereas the US-ASCII font is sans serif.)
- N.B. It is also not clear why (in this font/encoding) the TU hiragana has a "lowercase" version while many other hiragana are missing.
Glyph | Shift JIS | Unicode | Designation |
---|---|---|---|
2 | 0x8251 | U+0032 | digit 2 |
6 | 0x8255 | U+0036 | digit 6 |
A | 0x8260 | U+0041 | letter A |
C | 0x8262 | U+0043 | letter C |
D | 0x8263 | U+0044 | letter D |
F | 0x8265 | U+0046 | letter F |
S | 0x8272 | U+0053 | letter S |
T | 0x8273 | U+0054 | letter T |
W | 0x8276 | U+0057 | letter W |
あ | 0x82A0 | U+3042 | hiragana A |
い | 0x82A2 | U+3044 | hiragana I |
う | 0x82A4 | U+3046 | hiragana U |
え | 0x82A6 | U+3048 | hiragana E |
か | 0x82A9 | U+304B | hiragana KA |
が | 0x82AA | U+304C | hiragana GA |
き | 0x82AB | U+304D | hiragana KI |
く | 0x82AD | U+304F | hiragana KU |
こ | 0x82B1 | U+3053 | hiragana KO |
さ | 0x82B3 | U+3055 | hiragana SA |
し | 0x82B5 | U+3057 | hiragana SI |
じ | 0x82B6 | U+3058 | hiragana ZI |
す | 0x82B7 | U+3059 | hiragana SU |
た | 0x82BD | U+305F | hiragana TA |
だ | 0x82BE | U+3060 | hiragana DA |
ち | 0x82BF | U+3061 | hiragana TI |
っ | 0x82C1 | U+3063 | hiragana tu |
つ | 0x82C2 | U+3064 | hiragana TU |
て | 0x82C4 | U+3066 | hiragana TE |
と | 0x82C6 | U+3068 | hiragana TO |
な | 0x82C8 | U+306A | hiragana NA |
に | 0x82C9 | U+306B | hiragana NI |
の | 0x82CC | U+306E | hiragana NO |
は | 0x82CD | U+306F | hiragana HA |
ま | 0x82DC | U+307E | hiragana MA |
め | 0x82DF | U+3081 | hiragana ME |
ゃ | 0x82E1 | U+3083 | hiragana ya |
よ | 0x82E6 | U+3088 | hiragana YO |
る | 0x82E9 | U+308B | hiragana RU |
れ | 0x82EA | U+308C | hiragana RE |
わ | 0x82ED | U+308F | hiragana WA |
を | 0x82F0 | U+3092 | hiragana WO |
ん | 0x82F1 | U+3093 | hiragana N |
- N.B. 0x8332 is not a legal Shift JIS sequence. The standard code for the BO katakana ボ is 0x837B.
- N.B. 0x8333 is not a legal Shift JIS sequence. The standard code for the MA katakana マ is 0x837D.
- N.B. It is not clear why (in this font/encoding) the I and O katakana have "lowercase" versions while many other katakana are missing. Also, the TU, YA, YU and YO katakana have only a lowercase version.
Glyph | Shift JIS | Unicode | Designation |
---|---|---|---|
ボ | 0x8332 | U+30DC | katakana BO |
マ | 0x8333 | U+30DE | katakana MA |
ィ | 0x8342 | U+30A3 | katakana i |
イ | 0x8343 | U+30A4 | katakana I |
ウ | 0x8345 | U+30A6 | katakana U |
ォ | 0x8348 | U+30A9 | katakana o |
オ | 0x8349 | U+30AA | katakana O |
キ | 0x834C | U+30AD | katakana KI |
ク | 0x834E | U+30AF | katakana KU |
グ | 0x834F | U+30B0 | katakana GU |
ケ | 0x8350 | U+30B1 | katakana KE |
ゲ | 0x8351 | U+30B2 | katakana GE |
コ | 0x8352 | U+30B3 | katakana KO |
サ | 0x8354 | U+30B5 | katakana SA |
シ | 0x8356 | U+30B7 | katakana SI |
ジ | 0x8357 | U+30B8 | katakana ZI |
ス | 0x8358 | U+30B9 | katakana SU |
セ | 0x835A | U+30BB | katakana SE |
タ | 0x835E | U+30BF | katakana TA |
ッ | 0x8362 | U+30C3 | katakana tu |
テ | 0x8365 | U+30C6 | katakana TE |
ト | 0x8367 | U+30C8 | katakana TO |
ド | 0x8368 | U+30C9 | katakana DO |
ナ | 0x8369 | U+30CA | katakana NA |
ニ | 0x836A | U+30CB | katakana NI |
ノ | 0x836D | U+30CE | katakana NO |
フ | 0x8374 | U+30D5 | katakana HU |
ブ | 0x8375 | U+30D6 | katakana BU |
プ | 0x8376 | U+30D7 | katakana PU |
ベ | 0x8378 | U+30D9 | katakana BE |
ポ | 0x837C | U+30DD | katakana PO |
ム | 0x8380 | U+30E0 | katakana MU |
ャ | 0x8383 | U+30E3 | katakana ya |
ュ | 0x8385 | U+30E5 | katakana yu |
ョ | 0x8387 | U+30E7 | katakana yo |
ラ | 0x8389 | U+30E9 | katakana RA |
リ | 0x838A | U+30EA | katakana RI |
ル | 0x838B | U+30EB | katakana RU |
レ | 0x838C | U+30EC | katakana RE |
ロ | 0x838D | U+30ED | katakana RO |
ン | 0x8393 | U+30F3 | katakana N |
Glyph | Shift JIS | Unicode | Designation |
---|---|---|---|
暗 | 0x88C3 | U+6697 | kanji AN |
易 | 0x88D5 | U+6613 | kanji EKI |
移 | 0x88DA | U+79FB | kanji I |
印 | 0x88F3 | U+5370 | kanji IN |
押 | 0x899F | U+62BC | kanji Ō |
黄 | 0x89A9 | U+2EE9 U+9EC4 | kanji KI |
下 | 0x89BA | U+4E0B | kanji SHITA |
可 | 0x89C2 | U+53EF | kanji KA |
画 | 0x89E6 | U+753B U+FAA3 | kanji GA |
解 | 0x89F0 | U+89E3 | kanji KAI |
回 | 0x89F1 | U+56DE | kanji KAI2 |
各 | 0x8A65 | U+5404 | kanji ONOONO |
官 | 0x8AAF | U+5B98 | kanji KAN |
基 | 0x8AEE | U+57FA | kanji MOTO |
許 | 0x8B96 | U+8A31 | kanji MOTO2 |
Glyph | Shift JIS | Unicode | Designation |
---|---|---|---|
経 | 0x8C6F | U+7D4C | kanji KYŌ |
庫 | 0x8CC9 | U+5EAB | kanji KO |
向 | 0x8CFC | U+5411 | kanji MU |
行 | 0x8D73 | U+2F8F U+884C U+FA08 | kanji GYŌ |
高 | 0x8D82 | U+2FBC U+9AD8 | kanji TAKA |
合 | 0x8D87 | U+5408 | kanji GŌ |
作 | 0x8DEC | U+4F5C | kanji SAKU |
使 | 0x8E67 | U+4F7F | kanji SHI |
司 | 0x8E69 | U+53F8 | kanji TSUKASA |
始 | 0x8E6E | U+59CB | kanji SHI |
私 | 0x8E84 | U+79C1 | kanji WATASHI |
試 | 0x8E8E | U+8A66 | kanji SHI2 |
字 | 0x8E9A | U+5B57 | kanji JI |
時 | 0x8E9E | U+6642 | kanji TOKI |
斜 | 0x8ECE | U+659C | kanji SHA |
終 | 0x8F49 | U+7D42 | kanji TSUI |
所 | 0x8F8A | U+6240 | kanji SHO |
場 | 0x8FEA | U+5834 | kanji BA |
Glyph | Shift JIS | Unicode | Designation |
---|---|---|---|
色 | 0x9046 | U+2F8A U+8272 | kanji IRO |
新 | 0x9056 | U+65B0 | kanji SHIN |
神 | 0x905F | U+795E U+FA19 | kanji KAMI |
前 | 0x914F | U+524D | kanji MAE |
倉 | 0x9171 | U+5009 | kanji KURA |
窓 | 0x918B | U+7A93 | kanji MADO |
像 | 0x919C | U+50CF U+2F80B | kanji ZŌ |
続 | 0x91B1 | U+7D9A | kanji ZOKU |
体 | 0x91CC | U+4F53 | kanji TAI |
替 | 0x91D6 | U+66FF | kanji TEI |
中 | 0x9286 | U+4E2D | kanji CHU |
低 | 0x92E1 | U+4F4E | kanji HIKU |
度 | 0x9378 | U+5EA6 | kanji TABI |
動 | 0x93AE | U+52D5 | kanji DO |
同 | 0x93AF | U+540C | kanji DO2 |
難 | 0x93EF | U+96E3 U+FA68 U+FAC7 | kanji NAN |
入 | 0x93FC | U+2F0A U+5165 | kanji JU |
- N.B. 0x9632 is not a legal Shift JIS sequence. The standard code for the MOTO kanji 本 is 0x967B.
Glyph | Shift JIS | Unicode | Designation |
---|---|---|---|
閉 | 0x95C2 | U+9589 | kanji HEI |
変 | 0x95CF | U+5909 | kanji HEN |
歩 | 0x95E0 | U+6B69 | kanji HO |
本 | 0x9632 | U+672C | kanji MOTO |
幕 | 0x968B | U+5E55 | kanji MAKU |
明 | 0x96BE | U+660E | kanji MEI |
面 | 0x96CA | U+2FAF U+9762 | kanji MEN |
目 | 0x96DA | U+2F6C U+76EE | kanji MOKU |
用 | 0x9770 | U+2F64 U+7528 | kanji YŌ |
立 | 0x97A7 | U+2F74 U+7ACB | kanji RITSU |
了 | 0x97B9 | U+4E86 U+F9BA | kanji RYŌ |
令 | 0x97DF | U+4EE4 U+F9A8 | kanji REI |
路 | 0x9848 | U+8DEF | kanji JI |
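The four non-standard codes noted in the tables above (0x8130, 0x8332, 0x8333 and 0x9632) can be folded back onto their standard Shift JIS equivalents before decoding. The following is a minimal sketch: the names `NONSTANDARD` and `decode_code` are hypothetical, the remapping covers only the four codes listed for TSFFTahoma, and the .fnt files may contain other rearranged code points that are not handled here.

```python
# Sketch: decode one two-byte code from the tables above to Unicode.
# Only the four documented non-standard codes are remapped.
NONSTANDARD = {
    0x8130: 0x815B,  # prolonged sound mark
    0x8332: 0x837B,  # katakana BO
    0x8333: 0x837D,  # katakana MA
    0x9632: 0x967B,  # kanji MOTO
}

def decode_code(code: int) -> str:
    """Return the Unicode character for a two-byte code such as 0x82A0."""
    standard = NONSTANDARD.get(code, code)
    # Python's stock codec handles everything once the code is standard.
    return standard.to_bytes(2, "big").decode("shift_jis")

if __name__ == "__main__":
    for code in (0x8130, 0x8332, 0x8333, 0x9632, 0x82A0):
        print("0x%04X" % code, decode_code(code))
```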
As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., is limited to US-ASCII. This is consistent with the simplified logic used by the Japanese engine, where any high-bit byte (in the 0x80-0xFF range) is treated as the start of a two-byte sequence (in actual Shift JIS some high-bit bytes are interpreted as half-width kana).
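The simplified lead-byte rule can be sketched as a small tokenizer. This only illustrates the rule as described above and is not taken from the engine; the function name `split_oni_jp` is hypothetical, and how the engine handles a dangling lead byte at the end of a string is not known (here it is simply emitted as a one-byte token).

```python
# Sketch: split a byte string the way the simplified rule describes.
# Any byte in 0x80-0xFF starts a two-byte sequence; anything else is
# a single US-ASCII byte.
def split_oni_jp(data: bytes):
    tokens = []
    i = 0
    while i < len(data):
        width = 2 if data[i] >= 0x80 else 1
        tokens.append(data[i:i + width])
        i += width
    return tokens

if __name__ == "__main__":
    # 'A', hiragana A (0x82A0), the non-standard 0x8130, then 'B'
    print(split_oni_jp(b"A\x82\xa0\x81\x30B"))
    # In standard Shift JIS, 0xB1 alone is a half-width kana; under the
    # simplified rule it swallows the following byte instead.
    print(split_oni_jp(b"\xb1A"))
```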
It must be noted that, as compared to the separate .fnt files, TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1357) and is essentially useless/unusable.
- The Japanese engine requires all four .fnt files to be present (bails out if any of them are missing) and uses them for all of the Vanilla text strings, which only contain double-byte control codes. Thus, under normal conditions, TSFFTahoma remains completely unused in the Japanese version.
- If the US engine is used on the Japanese game data, then the .fnt files are ignored (obviously), and the incomplete TSFFTahoma is used to render the Japanese text strings as well as the few English strings supplied by the EXE. Due to the limited character set, many strings end up broken.
Possibly the incomplete Shift JIS code pages present in the Japanese TSFFTahoma correspond to an early attempt to implement all the glyphs at Oni level. As the number of kanji increased, supposedly, the TSFT grew prohibitively large due to the use of 8-bit grayscale storage for the pixel data, and the size taken up by the sparsely populated TSGA also increased out of proportion with the rest of the game data. It is not clear why TSFFTahoma wasn't cleaned up after the engine switched to separate .fnt files.
At the time of writing, the code points and pixel data in the Japanese .fnt files have not been thoroughly analyzed and compared with JIS X 0208. We know that 1357 glyphs are implemented, across 27 "lead bytes" (roughly 50 kuten rows). This is much smaller than the full kuten plane, and makes sense in terms of space efficiency. We also know that some code points are non-standard (rearranged) as compared to regular Shift JIS, although we do not yet know if this rearrangement is consistent with any common variation of Shift JIS. As long as Japanese game data contains text strings that match the encoding, non-standard code points are not a problem (but should be kept in mind).
Text anomalies
Ellipsis issue
Unlike the other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats high-bit characters as part of a two-byte control sequence (a provision for Asian encodings), and therefore fails to render any character from the extended ASCII range. The Vanilla English Oni only has one such character, the ellipsis "…" (0xC9), accidentally used in these two text consoles in place of three consecutive dots. The two lines including a "…" are cut off at the offending character.
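A string can be checked for this problem by looking for any byte with the high bit set. The sketch below also shows one possible workaround, substituting three ASCII dots for 0xC9, which matches what the text was presumably meant to contain. Both function names are hypothetical, and the patch is an illustration rather than an existing fix.

```python
# Sketch: find bytes the US engine cannot render, and patch the ellipsis.
def high_bit_offsets(data: bytes):
    """Offsets of all bytes outside the US-ASCII range."""
    return [i for i, b in enumerate(data) if b >= 0x80]

def patch_ellipsis(data: bytes) -> bytes:
    """Replace the Mac OS Roman ellipsis (0xC9) with three plain dots."""
    return data.replace(b"\xc9", b"...")

if __name__ == "__main__":
    line = b"Wait\xc9 what?"
    print(high_bit_offsets(line))
    print(patch_ellipsis(line))
```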
(A1,A0) issue
Unlike for the Japanese version, where non-standard Shift JIS sequences are explicitly allowed in the .fnt files, the Chinese version does not have a code table and relies on a standard EUC-CN encoding, with exactly 8836 code points (94x94). A proper EUC-CN control sequence consists of two bytes that are both in the range 0xA1-0xFE (single US-ASCII characters are also allowed in theory, but do not occur in Vanilla game data).
The text strings in the Vanilla game data of the Chinese version mostly conform to the EUC-CN scheme, except for the rare occurrence of the (A1,A0) sequence. This is not a valid control sequence under any common extensions of EUC-CN, and in any case it does not correspond to any pixel data within xf_font.dat, which only covers the standard 94x94 kuten plane, corresponding to a strict 0xA1-0xFE range for the two encoding bytes.
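Sequences such as (A1,A0) can be located with a simple scan. The following is a minimal sketch under the assumption that strings are plain EUC-CN, with single bytes below 0x80 treated as US-ASCII; the function name `find_invalid_pairs` is hypothetical and this is not how xfhsm_oni.dll is known to work.

```python
# Sketch: report byte pairs that fall outside the strict EUC-CN range.
def _in_range(b):
    return b is not None and 0xA1 <= b <= 0xFE

def find_invalid_pairs(data: bytes):
    """Return a list of (offset, (byte1, byte2)) for every invalid pair."""
    bad = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 < 0x80:
            i += 1          # single US-ASCII byte
            continue
        # A high-bit byte must be followed by a second byte in range.
        b2 = data[i + 1] if i + 1 < len(data) else None
        if not (_in_range(b1) and _in_range(b2)):
            bad.append((i, (b1, b2)))
        i += 2
    return bad

if __name__ == "__main__":
    sample = b"\xb0\xa1\xa1\xa0\xa1\xa1"
    for offset, (b1, b2) in find_invalid_pairs(sample):
        print("offset %d: (%02X,%02X)" % (offset, b1, b2))
```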
Overtall text
Although not strictly speaking a font issue, some of Oni's text fails to render because it doesn't fit vertically into a fixed-size frame (such as a text console). This is known to happen for these two consoles in the English version, and possibly for other screens in other language versions.
Overlong text
Chinese glyphs have a fixed size of 16x16 pixels and do not fit horizontally into the drop-down lists (Vanilla Oni has only two such lists, in the Options menu, for Resolution and Difficulty).