
OBD:Text encoding

The text strings in the Chinese version mostly conform to the EUC-CN scheme, but there are two recurrent invalid characters, as well as several instances of non-translated US-ASCII (!!!).
====Non-translated US-ASCII====
ASCII strings are much more harmful when handled by xfhsm_oni.dll than the two invalid code points (A3,A0) and (A3,89), because pairs of US-ASCII bytes, misinterpreted as EUC-CN code points, end up referencing completely unrelated memory regions (outside the region occupied by xf_font.dat). Unfortunately, there are a few ASCII strings that xfhsm_oni.dll can come across even during regular gameplay, and many more if one allows for modding.
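The difference in harm can be illustrated with a small Python sketch. EUC-CN stores a GB 2312 character as two bytes, each equal to its quwei (row/cell) index plus 0xA0, so a parser recovers row and cell by subtracting 0xA0. The linear glyph position below assumes a hypothetical row-major 94×94 table; the actual internal layout of xf_font.dat is not documented here, and the helper names <tt>quwei</tt>, <tt>glyph_index</tt> and <tt>in_table</tt> are made up for illustration.

```python
# Sketch only: the 94x94 row-major glyph table is an ASSUMPTION used for
# illustration; the real internal layout of xf_font.dat may differ.
ROWS = CELLS = 94  # GB 2312 has 94 rows (qu) x 94 cells (wei), both 1-based


def quwei(lead: int, trail: int) -> tuple:
    """Undo the EUC-CN offset: each byte stores its quwei index plus 0xA0."""
    return lead - 0xA0, trail - 0xA0


def glyph_index(lead: int, trail: int) -> int:
    """Position of the code point in the hypothetical row-major glyph table."""
    row, cell = quwei(lead, trail)
    return (row - 1) * CELLS + (cell - 1)


def in_table(lead: int, trail: int) -> bool:
    """True if the computed position falls inside the 94x94 table."""
    return 0 <= glyph_index(lead, trail) < ROWS * CELLS


samples = {
    "valid (B0,A1)": (0xB0, 0xA1),
    "invalid (A3,A0)": (0xA3, 0xA0),
    "invalid (A3,89)": (0xA3, 0x89),
    'ASCII "Co"': (ord("C"), ord("o")),
}
for name, (lead, trail) in samples.items():
    print(name, quwei(lead, trail), glyph_index(lead, trail), in_table(lead, trail))
```

Under this assumed layout, (A3,A0) maps to position 187 and (A3,89) to position 164, both still inside the table, whereas the ASCII pair "Co" yields negative quwei indices (-93, -49) and position -8886, far before the start of the table.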
=====Count on it=====
The following string in SUBTsubtitles has not been translated into Chinese:
:Barabas:  Count on it. When I get through with them they're...
Being encoded as plain US-ASCII, this string is entirely illegal given the limited implementation of EUC-CN in xfhsm_oni.dll, which does not recognize US-ASCII bytes as single-byte code points and keeps interpreting pairs of ASCII bytes as (invalid) quwei indices. By lucky coincidence, the string has an even number of printable bytes, so the null character still falls in a position where it terminates the string (the EUC-CN parser sees it as a null lead-byte and stops reading). However, the string still consists of 31 invalid two-byte code points (not counting the null). As a further lucky coincidence, this string is never read by Oni's engine, because the subtitle's handle (02_05_05) is one of those clobbered by the spurious double-null (see [[#Chinese_SUBT_issues|"Chinese_SUBT_issues"]] below). If it weren't for the clobbering, this subtitle would likely cause a crash.
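The role of the string's length can be sketched in Python as follows. The reader consumes two bytes at a time and stops only when a null byte appears in lead-byte position, which is how the limited parser is described above; the function names are hypothetical and the byte strings are toy data, not actual SUBT content.

```python
def is_gb2312_pair(lead: int, trail: int) -> bool:
    """Well-formed EUC-CN double-byte code point: both bytes in 0xA1..0xFE."""
    return 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE


def naive_pairs(buf: bytes) -> list:
    """Read `buf` two bytes at a time, never treating ASCII as single-byte.

    Stops only at a null *lead* byte (or at the end of this toy buffer,
    where real code would simply keep reading adjacent memory).
    """
    pairs = []
    i = 0
    while i + 1 < len(buf) and buf[i] != 0:
        pairs.append((buf[i], buf[i + 1]))
        i += 2
    return pairs


even = naive_pairs(b"Hi\x00NEXT\x00")   # null lands in lead position
odd = naive_pairs(b"Hey\x00NEXT\x00")   # null lands in trail position

print(len(even), sum(not is_gb2312_pair(*p) for p in even))  # 1 pair, 1 invalid
print(len(odd), sum(not is_gb2312_pair(*p) for p in odd))    # 4 pairs, 4 invalid
```

With an even length, the null is seen as a lead byte and reading stops right after the string; with an odd length, the null is swallowed as a trail byte and the reader runs on into the following data ("NEXT").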
=====Pre-beta ONLDs=====
The "level definitions" ([[ONLD]]s) of [[Pre-beta_content#Cut_levels|pre-beta levels]] are never seen in vanilla Oni, but would appear in the "Load Game" dialog if a valid level#_Final.dat were to be supplied at startup (e.g. by a mod). Since xfhsm_oni.dll does not actually support US-ASCII, any untranslated ONLDs are potentially disruptive.
 
The following 8 pre-beta ONLDs were fully translated: "The Airport Part Deux" (level_05), "Obsolete" (level_07), "The Arena of Pain" (level_30), "Crossing Zone" (level_31), "Pit" (level_32), "Crossing Zone Too" (level_33), "Capture" (level_34) and "Territories" (level_35).
 
The following 8 pre-beta ONLDs remained as US-ASCII: "Test_Stuff" (level_36), "AlexTestSite" (level_55), "Experimental_II" (level_66), "MARTY'S SOUND CORRIDOR" (level_68), "FiringRange" (level_71), "One Room" (level_77), "One Room 2" (level_88) and "Test Barn II" (level_99).
 
The most awkward case is that of the string "BGI HQ" ([[ONLD]]level_16), which was translated only partly: "HQ" was replaced with a pair of GB 2312 glyphs, but the first four characters "BGI " remained as plain ASCII (i.e., as two illegal EUC-CN code points).
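The contrast between a conforming EUC-CN decoder and the pairwise reading can be sketched in Python. The actual glyphs that replaced "HQ" are not listed here, so the sketch uses two hypothetical stand-ins (0xB0A1 and 0xB0A2, the first two level-1 hanzi of GB 2312); the helper names are also made up for illustration.

```python
# Hypothetical stand-in glyphs: 0xB0A1 and 0xB0A2 are simply the first two
# level-1 hanzi of GB 2312, NOT the glyphs actually used for "HQ".
GLYPHS = b"\xb0\xa1\xb0\xa2"


def split_pairs(buf: bytes) -> list:
    """Cut a byte string into consecutive two-byte chunks (last one may be short)."""
    return [buf[i:i + 2] for i in range(0, len(buf), 2)]


def is_gb2312_pair(pair: bytes) -> bool:
    """Well-formed EUC-CN double-byte code point: both bytes in 0xA1..0xFE."""
    return len(pair) == 2 and all(0xA1 <= b <= 0xFE for b in pair)


label = b"BGI " + GLYPHS

# Python's gb2312 codec is a conforming EUC-CN decoder: bytes below 0x80 are ASCII.
decoded = label.decode("gb2312")
print(decoded)

# Pairwise reading: "BG" and "I " become two illegal code points, but since the
# ASCII prefix is 4 bytes long, the two glyphs stay aligned and remain valid.
print([is_gb2312_pair(p) for p in split_pairs(label)])            # [False, False, True, True]

# With an odd-length prefix, the glyph bytes would be misaligned.
print([is_gb2312_pair(p) for p in split_pairs(b"BGI" + GLYPHS)])  # [False, False, True, False]
```

Note that in the misaligned case the middle chunk (A1,B0) is still well-formed, so it would silently render as a wrong glyph rather than fail.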
=====Cheat messages=====
None of the 38 cheat messages were translated into Chinese (!!!), which means 38 more strings made entirely of illegal EUC-CN code points. Whenever a cheat is entered, xfhsm_oni.dll attempts to display one of the following strings, which almost always causes a crash on modern Windows systems. Note how a null byte does not interrupt the input if it occurs in a trail-byte position (marked ° below): parsing then continues into the data that follows, e.g. the cheat code itself.
{|
|
{{divhide| List of invalid EUC-CN strings triggered by cheats|align=left}}
{|border=1 cellspacing=0 cellpadding=3
!Cheat
!Invalid double-byte arrays (ASCII; pairs alternately underlined, ° = null byte)
|-valign=top
!shapeshifter
|<tt>Ch<u>an</u>ge<u> C</u>ha<u>ra</u>ct<u>er</u>s <u>En</u>ab<u>le</u>d°<br /><u>Ch</u>an<u>ge</u> C<u>ha</u>ra<u>ct</u>er<u>s </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!liveforever
|<tt>In<u>vi</u>nc<u>ib</u>il<u>it</u>y <u>En</u>ab<u>le</u>d°<br /><u>In</u>vi<u>nc</u>ib<u>il</u>it<u>y </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!touchofdeath
|<tt>Om<u>ni</u>po<u>te</u>nc<u>e </u>En<u>ab</u>le<u>d°</u>to<u>uc</u>ho<u>fd</u>ea<u>th</u><br /><u>Om</u>ni<u>po</u>te<u>nc</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!canttouchthis
|<tt>Un<u>st</u>op<u>pa</u>bl<u>e </u>En<u>ab</u>le<u>d°</u>ca<u>nt</u>to<u>uc</u>ht<u>hi</u>s°<br /><u>Un</u>st<u>op</u>pa<u>bl</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!fatloot
|<tt>Fa<u>t </u>Lo<u>ot</u> R<u>ec</u>ei<u>ve</u>d°</tt>
|-valign=top
!glassworld
|<tt>Gl<u>as</u>s <u>Fu</u>rn<u>it</u>ur<u>e </u>En<u>ab</u>le<u>d°</u>gl<u>as</u>sw<u>or</u>ld<br /><u>Gl</u>as<u>s </u>Fu<u>rn</u>it<u>ur</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!winlevel
|<tt>In<u>st</u>an<u>tl</u>y <u>Wi</u>n <u>Le</u>ve<u>l°</u>wi<u>nl</u>ev<u>el</u></tt>
|-valign=top
!loselevel
|<tt>In<u>st</u>an<u>tl</u>y <u>Lo</u>se<u> L</u>ev<u>el</u></tt>
|-valign=top
!bighead
|<tt>Bi<u>g </u>He<u>ad</u> E<u>na</u>bl<u>ed</u><br /><u>Bi</u>g <u>He</u>ad<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!minime
|<tt>Mi<u>ni</u> M<u>od</u>e <u>En</u>ab<u>le</u>d°<br /><u>Mi</u>ni<u> M</u>od<u>e </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!superammo
|<tt>Su<u>pe</u>r <u>Am</u>mo<u> M</u>od<u>e </u>En<u>ab</u>le<u>d°</u>su<u>pe</u>ra<u>mm</u>o°<br /><u>Su</u>pe<u>r </u>Am<u>mo</u> M<u>od</u>e <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!reservoirdogs
|<tt>La<u>st</u> M<u>an</u> S<u>ta</u>nd<u>in</u>g <u>En</u>ab<u>le</u>d°<br /><u>La</u>st<u> M</u>an<u> S</u>ta<u>nd</u>in<u>g </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!roughjustice
|<tt>Ga<u>tl</u>in<u>g </u>Gu<u>ns</u> E<u>na</u>bl<u>ed</u><br /><u>Ga</u>tl<u>in</u>g <u>Gu</u>ns<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!chenille
|<tt>Da<u>od</u>an<u> P</u>ow<u>er</u> E<u>na</u>bl<u>ed</u><br /><u>Da</u>od<u>an</u> P<u>ow</u>er<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!behemoth
|<tt>Go<u>dz</u>il<u>la</u> M<u>od</u>e <u>En</u>ab<u>le</u>d°<br /><u>Go</u>dz<u>il</u>la<u> M</u>od<u>e </u>Di<u>sa</u>bl<u>ed</u></tt>
|-valign=top
!elderrune
|<tt>Re<u>ge</u>ne<u>ra</u>ti<u>on</u> E<u>na</u>bl<u>ed</u><br /><u>Re</u>ge<u>ne</u>ra<u>ti</u>on<u> D</u>is<u>ab</u>le<u>d°</u></tt>
|-valign=top
!moonshadow
|<tt>Ph<u>as</u>e <u>Cl</u>oa<u>k </u>En<u>ab</u>le<u>d°</u>mo<u>on</u>sh<u>ad</u>ow<br /><u>Ph</u>as<u>e </u>Cl<u>oa</u>k <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!munitionfrenzy
|<tt>We<u>ap</u>on<u>s </u>Lo<u>ck</u>er<u> C</u>re<u>at</u>ed</tt>
|-valign=top
!fistsoflegend
|<tt>Fi<u>st</u>s <u>Of</u> L<u>eg</u>en<u>d </u>En<u>ab</u>le<u>d°</u>fi<u>st</u>so<u>fl</u>eg<u>en</u>d°<br /><u>Fi</u>st<u>s </u>Of<u> L</u>eg<u>en</u>d <u>Di</u>sa<u>bl</u>ed</tt>
|-valign=top
!killmequick
|<tt>Ul<u>tr</u>a <u>Mo</u>de<u> E</u>na<u>bl</u>ed<br /><u>Ul</u>tr<u>a </u>Mo<u>de</u> D<u>is</u>ab<u>le</u>d°<u>Ul</u>tr<u>a </u>Mo<u>de</u> E<u>na</u>bl<u>ed</u></tt>
|-valign=top
!carousel
|<tt>Sl<u>ow</u> M<u>ot</u>io<u>n </u>En<u>ab</u>le<u>d°</u>ca<u>ro</u>us<u>el</u><br /><u>Sl</u>ow<u> M</u>ot<u>io</u>n <u>Di</u>sa<u>bl</u>ed</tt>
|}
{{divhide|end}}
|}
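The ° entries in the table can be reproduced with a minimal pairwise reader in Python. As the "touchofdeath" row suggests, the sketch assumes that the enable message, the cheat code and the disable message sit back to back in memory, each followed by a null; the function name <tt>naive_read</tt> is hypothetical.

```python
def naive_read(buf: bytes) -> bytes:
    """Collect bytes two at a time until a null byte appears in lead position.

    A null in trail position is swallowed as half of a (bogus) code point,
    so reading continues into whatever follows in memory.
    """
    out = bytearray()
    i = 0
    while i + 1 < len(buf) and buf[i] != 0:
        out += buf[i:i + 2]
        i += 2
    return bytes(out)


# Adjacent strings as suggested by the "touchofdeath" row of the table.
memory = b"Omnipotence Enabled\x00touchofdeath\x00Omnipotence Disabled\x00"

enabled = naive_read(memory)
disabled = naive_read(memory[memory.index(b"Omnipotence Disabled"):])

# Render the swallowed null as the table does.
print(enabled.decode("ascii").replace("\x00", "°"))   # Omnipotence Enabled°touchofdeath
print(disabled.decode("ascii"))                       # Omnipotence Disabled
```

"Omnipotence Enabled" has 19 bytes, so its null lands in trail position and the reader runs on through "touchofdeath"; "Omnipotence Disabled" has 20 bytes, so its null is a lead byte and reading stops cleanly.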
 
=====Debug printout and console=====
Oni has a well-hidden [[Developer Mode]] in which it can print informational output directly to the screen instead of writing it to a text file. This printout includes fully automatic warnings from the engine (e.g., about too many visible polygons or particles), more or less regular printout that can be toggled through [[BSL:Variables|script variables]] (e.g., about a character's current animation status), and custom "dprint" messages that the developers used for visual feedback while testing [[BSL|scripts]]. Dev mode also has a togglable command line ("CMD: ") for entering script commands in real time. Both the debug printout and the command line use the main glyph-rendering pipeline (intercepted by xfhsm_oni.dll) with a small font size. This makes Dev mode essentially unusable in Chinese Oni, as most if not all of the debug printout and console output is plain ASCII.
 
Interestingly, there are ''some'' types of primitive debug printout that are not intercepted by xfhsm_oni.dll and are thus displayed normally, using the smallest-sized TSFT from level0_Final's TSFFTahoma. All (most?) of this primitive printout is available without Dev mode. It includes the Ctrl+Shift+Y hotkey (FPS display), some HUD-like overlays toggled by [[BSL:Variables|script variables]] (e.g., chr_debug_characters), and finally some 3D sprites added to the game (e.g., health indicators or name labels displayed above a character's head).


====(A3,89)====