(→Over-long text: the more i study the Chinese Oni, the more I like Japanese ^_^) |
m (applied SectionLink template) |
||
(13 intermediate revisions by 3 users not shown) | |||
Line 3: | Line 3: | ||
:(An overview of the known language versions can be found [[OBD:Versions|HERE]], whereas localized content is detailed [[OBD:Localization|HERE]].) | :(An overview of the known language versions can be found [[OBD:Versions|HERE]], whereas localized content is detailed [[OBD:Localization|HERE]].) | ||
Depending on the language version, vanilla Oni uses one of the following five encodings to render text: | Depending on the language version, vanilla Oni uses one of the following five encodings to render text: | ||
*The original US version uses a trimmed-down [[wp:Mac_OS_Roman|Mac OS Roman]] code page that is effectively limited to US-ASCII (96 code points used, 256 available). | *The original US version uses a trimmed-down [[wp:Mac_OS_Roman|Mac OS Roman]] code page that is effectively limited to [[wp:ASCII|US-ASCII]] (96 code points used, 256 available). | ||
*European localizations (UK English, French, Italian, Spanish, German) use a custom version of Mac OS Roman (192 code points used, 256 available). | *European localizations (UK English, French, Italian, Spanish, German) use a custom version of Mac OS Roman (192 code points used, 256 available). | ||
*The Russian localization uses a (nearly) full implementation of the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page (224 code points used, 256 available). | *The Russian localization uses a (nearly) full implementation of the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page (224 code points used, 256 available). | ||
Line 10: | Line 10: | ||
Properties of the fonts that are eventually used to render the text (via the encoding) are briefly described throughout the page. | Properties of the fonts that are eventually used to render the text (via the encoding) are briefly described throughout the page. | ||
:(A more thorough overview of the glyphs can be found [[/Fonts|HERE]].) | :(A more thorough overview of the glyphs can be found [[/Fonts|HERE]].) | ||
==Encodings== | ==Encodings== | ||
Line 115: | Line 114: | ||
|-bgcolor=orange | |-bgcolor=orange | ||
!bgcolor=silver|0xF... | !bgcolor=silver|0xF... | ||
!bgcolor=black|[[ | !bgcolor=black|[[Image:Platform-Mac.png|12px]] | ||
!Ò | !Ò | ||
!bgcolor=black|<span style="color:darkslategray">Ú</span> | !bgcolor=black|<span style="color:darkslategray">Ú</span> | ||
Line 130: | Line 129: | ||
|} | |} | ||
;Minor notes | ;Minor notes | ||
*The MacRoman layout was apparently "borrowed" before 1998, when Mac OS 8.5 came out and the [[wp:Currency sign ( | *The MacRoman layout was apparently "borrowed" before 1998, when Mac OS 8.5 came out and the [[wp:Currency sign (generic)|international currency sign]] a.k.a. scarab (¤), at 0xDB, was replaced with the euro symbol (€). | ||
*The actual font (see [[/Fonts|HERE]]) has some unusual typographical features, such as a single-stroke Yen/Yuan symbol (Ұ) and a vertical-stroke cent symbol (¢). | *The actual font (see [[/Fonts|HERE]]) has some unusual typographical features, such as a single-stroke Yen/Yuan symbol (Ұ) and a vertical-stroke cent symbol similar to Unicode's Fullwidth Cent Sign (¢) character as seen in Windows Arial (note to Mac users: don't be confused, as this character will appear with a diagonal stroke on your system like the regular '¢' character). | ||
;Major notes | ;Major notes | ||
*Some of the removed glyphs (most importantly ß, ù and û, but also Ê, Ú and ú) occur in [[wp:Languages of the European Union#Knowledge|common European languages]]. This made the US TSFFTahoma unsuitable for [[wikt:EFIGS|EFIGS]] localizations, requiring the creation of a new version (see below). | *Some of the removed glyphs (most importantly ß, ù and û, but also Ê, Ú and ú) occur in [[wp:Languages of the European Union#Knowledge|common European languages]]. This made the US TSFFTahoma unsuitable for [[wikt:EFIGS|EFIGS]] localizations, requiring the creation of a new version (see below). | ||
*The US engine actually cannot interpret any code points beyond the US-ASCII range (first 6 rows, white background), notably failing on "…" | *The US engine actually cannot interpret any code points beyond the US-ASCII range (first 6 rows, white background), notably failing on 0xC9's "…". This is because of a nominal but unused provision for Asian text encodings. See {{SectionLink||Ellipsis issue}} for details. | ||
---- | ---- | ||
===European=== | ===European=== | ||
The code page used by the five Western European versions (UK English, French, German, Spanish and Italian) is slightly different from the trimmed-down Mac OS Roman. | The code page used by the five Western European versions (UK English, French, German, Spanish and Italian) is slightly different from the trimmed-down Mac OS Roman. | ||
*It tends to the needs of European localizations by adding back the following characters:<br>German ß; French Ê and û; French/Italian ù; Spanish/Italian Ú and ú (relatively rare). | *It tends to the needs of European localizations by adding back the following characters:<br>German ß; French Ê and û; French/Italian ù; Spanish/Italian Ú and ú (relatively rare). | ||
:'''N.B.''' The characters Æ and ÿ are not reinstated, despite their (very rare) occurrence in French script. | :'''N.B.''' The characters Æ and ÿ are not reinstated, despite their (very rare) occurrence in French script. | ||
*Awkwardly enough, the six characters are not restored in their original positions (grey-on-black), but take the place of math symbols.<br/>Four more "math" positions are inexplicably filled with three duplicate characters (œ, ¡ and ª) and a truly enigmatic ʖ̇ , which doesn't seem to occur in any known language and has no dedicated code point in Unicode. | *Awkwardly enough, the six characters are not restored in their original positions (grey-on-black), but take the place of math symbols.<br/>Four more "math" positions are inexplicably filled with three duplicate characters (œ, ¡ and ª) and a truly enigmatic ʖ̇ , which doesn't seem to occur in any known language and has no dedicated code point in Unicode (the character you see here was constructed from Unicode's U+0296 Latin Letter Inverted Glottal Stop (ʖ) plus U+0307 Combining Dot Above). | ||
:'''N.B.''' The broken italic font variants (see [[/Fonts | :'''N.B.''' The broken italic font variants (see "Italic" section of [[/Fonts]] once it exists) do not fully implement the 10 new glyphs and use a regular question mark instead of the ʖ̇. | ||
{|border=1 cellpadding=3 cellspacing=0 | {|border=1 cellpadding=3 cellspacing=0 | ||
|-bgcolor=silver | |-bgcolor=silver | ||
Line 260: | Line 260: | ||
---- | ---- | ||
===Cyrillic=== | ===Cyrillic=== | ||
In the Russian version of Oni, TSFFTahoma implements the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page, with some deviations. | In the Russian version of Oni, TSFFTahoma implements the [[wp:Windows-1251|Windows-1251]] (Cyrillic) code page, with some deviations. | ||
Line 318: | Line 319: | ||
|} | |} | ||
;Italic fonts | ;Italic fonts | ||
:The Russian version only provides an implementation of Windows-1251 for regular and bold fonts. The five italic fonts (7pt, 9pt, 10pt, 12pt and 14pt) have exactly the same data (pixels and glyph descriptors) as for the European iteration of Mac OS Roman. This makes sense because italic fonts are inherently broken (see [[/Fonts | :The Russian version only provides an implementation of Windows-1251 for regular and bold fonts. The five italic fonts (7pt, 9pt, 10pt, 12pt and 14pt) have exactly the same data (pixels and glyph descriptors) as for the European iteration of Mac OS Roman. This makes sense because italic fonts are inherently broken (see "Italic" section of [[/Fonts]] once it exists) and thus not used by any text in vanilla Oni. | ||
; | ;14pt bold font | ||
:Somewhat surprisingly, the | :Somewhat surprisingly, the 14pt bold TSFT in the Russian version of TSFFTahoma does not have a complete Windows-1251 code page either. Instead it is limited to the US-ASCII character set (including the "printable delete" box at code point 0x7F), i.e., the upper section of the above table (white background). This causes no issue in vanilla Oni, but only because there is no text that uses 14pt bold. | ||
;Incomplete transparency | ;Incomplete transparency | ||
:A unique "feature" of the Russian/Cyrillic TSFFTahoma is that all the characters in the extended ASCII range (0x80-0xFF) have a slightly opaque background (about 3% opacity) in the regular (non-bold) font variant. This isn't visible ingame, but only because the engine (re)posterizes all the glyphs into 4-bit grayscale when rendering (so that only opacities above 6% are visible). | :A unique "feature" of the Russian/Cyrillic TSFFTahoma is that all the characters in the extended ASCII range (0x80-0xFF) have a slightly opaque background (about 3% opacity) in the regular (non-bold) font variant. This isn't visible ingame, but only because the engine (re)posterizes all the glyphs into 4-bit grayscale when rendering (so that only opacities above 6% are visible). | ||
;Glyph alignment and spacing | ;Glyph alignment and spacing | ||
:Last but not least, some fonts in the Russian TSFFTahoma have inconsistent vertical alignment, the most blatant example being | :Last but not least, some fonts in the Russian TSFFTahoma have inconsistent vertical alignment, the most blatant example being 12pt bold: some glyphs are one pixel shorter or taller than the full line height (ascender+descender), without a properly compensated vertical glyph offset; others simply have pixels that are not properly aligned within a glyph's rectangle. Besides, many glyphs have excessive padding to the left and/or right of a character, which affects readability.<br />'''N.B.''' There are other examples of poor alignment, e.g., for 12pt bold, the character 0x9C (њ) has its right side cut off and is thus unusable (luckily it doesn't occur in Russian script). | ||
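:The posterization described under "Incomplete transparency" can be illustrated with a minimal sketch. It assumes that the engine reduces 8-bit opacity to 4 bits by simple truncation (a right shift); the source only states that glyphs end up in 4-bit grayscale and that opacities above roughly 6% remain visible, so the function names and the truncation rule are hypothetical.

```python
def posterize_alpha_4bit(alpha8: int) -> int:
    """Reduce an 8-bit opacity (0-255) to a 4-bit level (0-15).

    Assumption: the reduction is a plain truncation (drop the low 4 bits).
    """
    if not 0 <= alpha8 <= 255:
        raise ValueError("alpha8 must be in the range 0..255")
    return alpha8 >> 4


def is_visible(opacity_percent: float) -> bool:
    """Return True if the given opacity survives 4-bit posterization."""
    alpha8 = round(opacity_percent / 100 * 255)
    return posterize_alpha_4bit(alpha8) > 0


if __name__ == "__main__":
    for pct in (3, 6, 7):
        print(pct, "% visible:", is_visible(pct))
```

:Under this assumption the smallest visible 8-bit opacity is 16 out of 255 (about 6.3%), which agrees with the ~3% background being invisible ingame.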
---- | ---- | ||
===Chinese=== | ===Chinese=== | ||
The Chinese version of Oni is unique in how the main game code resides in '''Oni.dat''', a renamed copy of the original Oni.exe from the US version that is executed indirectly by a wrapper app called '''oni.exe''', alongside a custom text engine, '''xfhsm_oni.dll'''. The latter DLL intercepts any text about to be displayed by "Oni.dat", first reducing it to a set of two-byte control sequences, and then (if all goes well) to a set of custom glyphs, with pixel data coming from an external font file, '''xf_font.dat'''. | The Chinese version of Oni is unique in how the main game code resides in '''Oni.dat''', a renamed copy of the original Oni.exe from the US version that is executed indirectly by a wrapper app called '''oni.exe''', alongside a custom text engine, '''xfhsm_oni.dll'''. The latter DLL intercepts any text about to be displayed by "Oni.dat", first reducing it to a set of two-byte control sequences, and then (if all goes well) to a set of custom glyphs, with pixel data coming from an external font file, '''xf_font.dat'''. | ||
Line 346: | Line 348: | ||
A valid EUC-CN code point (with both bytes in the 0xA1-0xFE range) results in a valid offset pointing to an actual glyph for the relevant font, whereas illegal bytes or byte pairs may point to a different glyph within the same font, or to a glyph of the other font, or to a completely unrelated memory region. In the worst case scenario, pixel data will be read at 486,432 bytes (~475 kB) ahead of the actual pixel data (if displaying the code point 01,00 for the large font) or at 3008-3040 bytes (~3 kB) past the actual pixel data (if displaying the code point FF,FF for the small font). | A valid EUC-CN code point (with both bytes in the 0xA1-0xFE range) results in a valid offset pointing to an actual glyph for the relevant font, whereas illegal bytes or byte pairs may point to a different glyph within the same font, or to a glyph of the other font, or to a completely unrelated memory region. In the worst case scenario, pixel data will be read at 486,432 bytes (~475 kB) ahead of the actual pixel data (if displaying the code point 01,00 for the large font) or at 3008-3040 bytes (~3 kB) past the actual pixel data (if displaying the code point FF,FF for the small font). | ||
Reading garbage pixel data shouldn't be causing memory corruption per se (merely nonsensical/garbled text), but if similar out-of-bounds pointers occur for glyph rendering, then xfhsm_oni.dll may occasionally overwrite its own memory or even Oni's. This has not been thoroughly investigated, but it seems advisable to ensure that all text consists of valid EUC-CN code points (which is unfortunately not the case, see | Reading garbage pixel data shouldn't be causing memory corruption per se (merely nonsensical/garbled text), but if similar out-of-bounds pointers occur for glyph rendering, then xfhsm_oni.dll may occasionally overwrite its own memory or even Oni's. This has not been thoroughly investigated, but it seems advisable to ensure that all text consists of valid EUC-CN code points (which is unfortunately not the case, see {{SectionLink||Invalid EUC-CN input}}). | ||
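A check for valid EUC-CN code points, as recommended above, could look like the following sketch. It only relies on the rule stated in the text (both bytes in the 0xA1-0xFE range) and on the assumption that text is consumed strictly two bytes at a time until a null lead byte; the function names are hypothetical and this is not the actual logic of xfhsm_oni.dll.

```python
def is_valid_euc_cn_pair(lead: int, trail: int) -> bool:
    """True if both bytes fall in the 0xA1-0xFE range."""
    return 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE


def quwei(lead: int, trail: int) -> tuple[int, int]:
    """Convert a valid byte pair to its (qu, wei) indices, each in 1..94."""
    if not is_valid_euc_cn_pair(lead, trail):
        raise ValueError("not a valid EUC-CN byte pair")
    return lead - 0xA0, trail - 0xA0


def find_invalid_pairs(data: bytes) -> list[int]:
    """Return the byte offsets of all invalid pairs before the null lead byte."""
    bad = []
    i = 0
    while i < len(data) and data[i] != 0:
        # A missing trail byte at the end of the buffer counts as invalid.
        trail = data[i + 1] if i + 1 < len(data) else 0
        if not is_valid_euc_cn_pair(data[i], trail):
            bad.append(i)
        i += 2
    return bad


if __name__ == "__main__":
    sample = "中文".encode("gb2312") + b"Hi\x00"
    print(find_invalid_pairs(sample))  # offsets of pairs that are not valid EUC-CN
```

Running such a check over all strings of a localized data file would flag every pair that could lead to an out-of-bounds glyph offset.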
Line 867: | Line 869: | ||
{{divhide|end}} | {{divhide|end}} | ||
As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., it is limited to | As for the first code page of the Japanese TSFFTahoma, it implements only the 0x20-0x7F range of characters, i.e., it is limited to US-ASCII. This is consistent with the simplified logic used by the Japanese engine, where any high-bit byte (in the 0x80-0xFF range) is treated as the start of a two-byte sequence. (In actual Shift JIS some high-bit bytes are interpreted as half-width kana, a feature that isn't supported by Oni's engine.) | ||
It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable except for its US-ASCII part. | It must be noted that, as compared to the separate .fnt files, the Japanese TSFFTahoma provides a very rudimentary implementation of JIS X 0208 (only coding for 154 double-byte glyphs, whereas the .fnt files implement 1,357) and is essentially useless/unusable except for its US-ASCII part. | ||
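The simplified byte-splitting rule described above (any byte in the 0x80-0xFF range starts a two-byte sequence) can be sketched as follows. This is an illustration of the rule as stated, not the engine's actual code; the function name is hypothetical.

```python
def split_simplified_sjis(data: bytes) -> list[bytes]:
    """Split a byte string into one-byte and two-byte tokens.

    Rule (as described for the Japanese engine): a byte with the high bit
    set always starts a two-byte sequence; anything else is one byte.
    """
    tokens = []
    i = 0
    while i < len(data):
        if data[i] >= 0x80:
            tokens.append(data[i:i + 2])  # lead byte plus whatever follows
            i += 2
        else:
            tokens.append(data[i:i + 1])  # plain US-ASCII byte
            i += 1
    return tokens


if __name__ == "__main__":
    mixed = "A鬼B".encode("shift_jis")
    print([len(t) for t in split_simplified_sjis(mixed)])
    # In real Shift JIS, 0xB1 alone is a half-width kana; under the
    # simplified rule it swallows the following byte instead.
    print(split_simplified_sjis(b"\xb1A"))
```

The second call shows why half-width kana cannot work under this rule: the single-byte kana is merged with the next character into one (invalid) two-byte token.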
Line 879: | Line 881: | ||
==Text anomalies== | ==Text anomalies== | ||
===Ellipsis issue=== | ===Ellipsis issue=== | ||
Unlike other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats | Unlike other Western versions (UK English, French, German, Italian, Spanish, Russian), the US engine treats characters above 0x7F as part of a two-byte control sequence (an unused provision for Asian encodings), and therefore fails to render any character from the extended ASCII range. This happens twice in English Oni because the ellipsis character (…), encoded as 0xC9, was accidentally used in <u>[[Quotes/Consoles/level_19d|these]]</u> <u>[[Quotes/Consoles/level_19e|two]]</u> text consoles instead of three consecutive periods (probably auto-substituted by a text editor). The result is that the two lines using a "…" are cut off at the offending character. | ||
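The cut-off and its obvious workaround can be modelled with a short sketch. It assumes that the visible part of a line simply ends at the first byte above 0x7F, as observed for the two consoles; the function names are hypothetical and the model does not claim to reproduce the engine's internal handling of two-byte sequences.

```python
ELLIPSIS_BYTE = 0xC9  # "…" in Mac OS Roman


def us_engine_visible_text(raw: bytes) -> bytes:
    """Model the observed behaviour: the line ends at the first byte > 0x7F."""
    for i, b in enumerate(raw):
        if b > 0x7F:
            return raw[:i]
    return raw


def replace_ellipsis(raw: bytes) -> bytes:
    """Replace every 0xC9 ellipsis with three consecutive periods."""
    return raw.replace(bytes([ELLIPSIS_BYTE]), b"...")


if __name__ == "__main__":
    line = "Wait…now".encode("mac_roman")
    print(us_engine_visible_text(line))                    # truncated line
    print(us_engine_visible_text(replace_ellipsis(line)))  # full line
```

Applying the replacement to the affected strings would keep them within the US-ASCII range that the US engine can render.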
===Invalid EUC-CN input=== | ===Invalid EUC-CN input=== | ||
Line 1,006: | Line 1,008: | ||
|bytes 0-1 | |bytes 0-1 | ||
|-valign=top | |-valign=top | ||
![[Quotes/Weapons# | ![[Quotes/Weapons#vdg|WPgew6_vdg]] | ||
! | ! | ||
|Hint: Shots disable one or more enemies at close range. Attack or escape while victims are disoriented.° | |Hint: Shots disable one or more enemies at close range. Attack or escape while victims are disoriented.° | ||
|bytes 6-7 | |bytes 6-7 | ||
|-valign=top | |-valign=top | ||
![[Quotes/Weapons# | ![[Quotes/Weapons#scream|WPgew9_scr]] | ||
! | ! | ||
|Hint: The cannon masks its wielder's lifeforce from the entity, but any life that ventures too near it will be drained.° | |Hint: The cannon masks its wielder's lifeforce from the entity, but any life that ventures too near it will be drained.° | ||
Line 1,186: | Line 1,188: | ||
The following string in SUBTsubtitles has not been translated into Chinese: | The following string in SUBTsubtitles has not been translated into Chinese: | ||
:Barabas: Count on it. When I get through with them they're... | :Barabas: Count on it. When I get through with them they're... | ||
Being encoded as plain US-ASCII, this string is entirely illegal considering the limited implementation of EUC-CN by xfhsm_oni.dll, which does not detect US-ASCII as single-byte code points and keeps interpreting pairs of ASCII bytes as (invalid) quwei indices. By lucky coincidence, the string has an even number of printable bytes, so that the null character is still in a suitable place for terminating the string (the EUC-CN parser will see it as a null lead-byte and will not keep reading further data). However, the string still consists of 31 invalid two-byte code points (not counting the null). As a further lucky coincidence, this string is never read by Oni's engine, because the subtitle's handle (02_05_05) is one of those that have been clobbered by the spurious double-null (see | Being encoded as plain US-ASCII, this string is entirely illegal considering the limited implementation of EUC-CN by xfhsm_oni.dll, which does not detect US-ASCII as single-byte code points and keeps interpreting pairs of ASCII bytes as (invalid) quwei indices. By lucky coincidence, the string has an even number of printable bytes, so that the null character is still in a suitable place for terminating the string (the EUC-CN parser will see it as a null lead-byte and will not keep reading further data). However, the string still consists of 31 invalid two-byte code points (not counting the null). As a further lucky coincidence, this string is never read by Oni's engine, because the subtitle's handle (02_05_05) is one of those that have been clobbered by the spurious double-null (see {{SectionLink||Chinese SUBT issues}}). If it weren't for the clobbering, the game would crash upon displaying this subtitle. | ||
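Why the parity of the byte count matters can be shown with a small sketch of a strictly two-byte parser. It assumes, as described above, that the parser stops only when it meets a null in the lead-byte position; the function name and the sample buffers are hypothetical.

```python
def scan_pairs(buf: bytes) -> tuple[list[tuple[int, int]], bool]:
    """Read (lead, trail) pairs until a null lead byte is found.

    Returns the pairs read and a flag telling whether the scan stopped
    at a null lead byte (True) or ran off the end of the buffer (False).
    """
    pairs = []
    i = 0
    while i < len(buf):
        if buf[i] == 0:
            return pairs, True      # null in lead position: clean stop
        if i + 1 >= len(buf):
            break                   # dangling lead byte at end of buffer
        pairs.append((buf[i], buf[i + 1]))
        i += 2
    return pairs, False


if __name__ == "__main__":
    # Even number of printable bytes: the null is seen as a lead byte.
    print(scan_pairs(b"ABCD\x00XY"))
    # Odd number: the null is swallowed as a trail byte and the scan
    # continues into whatever data follows the string.
    print(scan_pairs(b"ABC\x00XY"))
```

With an odd number of printable bytes the terminator is consumed as a trail byte, so the parser keeps reading past the end of the string.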
=====Pre-beta ONLDs===== | =====Pre-beta ONLDs===== | ||
The "level definitions" ([[ONLD]]s) of [[Pre-beta_content#Cut_levels|pre-beta levels]] are never seen in vanilla Oni, but would appear in the "Load Game" dialog if a valid level#_Final.dat were to be supplied at startup (e.g. by a mod). Since xfhsm_oni.dll does not actually support US-ASCII, any untranslated ONLDs are potentially disruptive. | The "level definitions" ([[ONLD]]s) of [[Pre-beta_content#Cut_levels|pre-beta levels]] are never seen in vanilla Oni, but would appear in the "Load Game" dialog if a valid level#_Final.dat were to be supplied at startup (e.g. by a mod) and unlocked in persist.dat. Since xfhsm_oni.dll does not actually support US-ASCII, any untranslated ONLDs are potentially disruptive. | ||
The following 8 pre-beta ONLDs were fully translated: "The Airport Part Deux" (level_05), "Obsolete" (level_07), "The Arena of Pain" (level_30), "Crossing Zone" (level_31), "Pit" (level_32), "Crossing Zone Too" (level_33), "Capture" (level_34), "Territories" (level_35). | The following 8 pre-beta ONLDs were fully translated: "The Airport Part Deux" (level_05), "Obsolete" (level_07), "The Arena of Pain" (level_30), "Crossing Zone" (level_31), "Pit" (level_32), "Crossing Zone Too" (level_33), "Capture" (level_34), "Territories" (level_35). | ||
Line 1,278: | Line 1,280: | ||
===Over-tall text=== | ===Over-tall text=== | ||
Although not strictly speaking a font issue, some of Oni's text fails to render because it doesn't fit vertically into a fixed-size frame (such as a [[: | Although not strictly speaking a font issue, some of Oni's text fails to render because it doesn't fit vertically into a fixed-size frame (such as a [[:Image:DATA_CONSOLE.png|text console]]). This is known to happen for [[Quotes/Consoles/level_1e|these]] [[Quotes/Consoles/level_8b|two]] consoles in the English version, and possibly for other screens in other language versions. | ||
===Over-long text=== | ===Over-long text=== | ||
Line 1,294: | Line 1,296: | ||
====Long UI text in Chinese Oni==== | ====Long UI text in Chinese Oni==== | ||
Oni's ingame UI is stylized as a futuristic computer screen (it is supposed to be Konoko's "Data Comlink") and has fixed-width frames reserved for text display. Large amounts of text can appear in the text console frame, or in the upper and lower sections of the Help menu (F1). It turns out that these frames have enough width to accommodate 26.5 16x16 glyphs (text console frame) or 19.5 glyphs (Help menu frame), | Oni's ingame UI is stylized as a futuristic computer screen (it is supposed to be Konoko's "Data Comlink") and has fixed-width frames reserved for text display. Large amounts of text can appear in the text console frame, or in the upper and lower sections of the Help menu (F1). It turns out that these frames have enough width to accommodate 26.5 16x16 glyphs (text console frame) or 19.5 glyphs (Help menu frame); one would therefore expect the Chinese text renderer to wrap lines around at 26 or 19 characters, respectively. Unfortunately, the lines are wrapped around at 27 and 20, so the right half of the last glyph on every long line is cut off. | ||
The Japanese Oni consistently adjusts the carriage return depending on the glyph dimensions (font size), so that the last glyph in a wrapped-around line always fits into the frame and is displayed completely. (The Japanese engine also allows for variable-width US-ASCII characters, and seems to correctly handle the carriage return for any mix of JIS and ASCII.) | The Japanese Oni consistently adjusts the carriage return depending on the glyph dimensions (font size), so that the last glyph in a wrapped-around line always fits into the frame and is displayed completely. (The Japanese engine also allows for variable-width US-ASCII characters, and seems to correctly handle the carriage return for any mix of JIS and ASCII.) |
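The wrap limits discussed for the Chinese renderer follow from simple arithmetic, sketched below. The pixel widths are derived from the figures above (26.5 and 19.5 glyphs of 16 pixels each) and are therefore an assumption, as are the function names. The observed limits of 27 and 20 happen to equal the rounded-up values, but whether the renderer actually rounds up is not known.

```python
import math

GLYPH_WIDTH_PX = 16  # width of a 16x16 Chinese glyph


def frame_width_px(glyphs_that_fit: float, glyph_px: int = GLYPH_WIDTH_PX) -> int:
    """Derive a frame width in pixels from the number of glyphs that fit."""
    return int(glyphs_that_fit * glyph_px)


def safe_wrap_limit(frame_px: int, glyph_px: int = GLYPH_WIDTH_PX) -> int:
    """Largest number of whole glyphs that fit into the frame."""
    return frame_px // glyph_px


def rounded_up_limit(frame_px: int, glyph_px: int = GLYPH_WIDTH_PX) -> int:
    """Limit obtained by rounding up; the last glyph may be cut off."""
    return math.ceil(frame_px / glyph_px)


if __name__ == "__main__":
    for name, fit in (("text console", 26.5), ("Help menu", 19.5)):
        width = frame_width_px(fit)
        print(name, width, safe_wrap_limit(width), rounded_up_limit(width))
```

A renderer that wraps at the value returned by `safe_wrap_limit` would always display the last glyph of a line completely, which is the behaviour described for the Japanese version.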