OBD:Instance file format: Difference between revisions

Revision as of 03:32, 19 July 2014

105 bytes removed, 19 July 2014

major wording

Iritscen

@@ Line 1: / Line 1: @@
 {{OBD Home}}
 :''For other files ending in ".dat", see [[Oni (folder)]].''
-:''You should read the [[OBD:Terminology]] page before this one.''
+:''You should read the [[OBD:Terminology|OBD Terminology]] page before this one.''
 Files in GameDataFolder/ named "level[0-19]_Final.dat", together with ".raw" and sometimes ".sep" counterparts, contain the game data for Oni. The [[Raw|documentation for raw and separate files]] can be read after this page.
@@ Line 9: / Line 9: @@
 ==Backwards and garbage data==
-During development, Oni had in-game editing tools. These tools presented a GUI for things like placing AIs and setting their attributes, editing particles, etc. When a developer saved his work, the contents of the level, stored in RAM, were written directly to disk. The structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory, and thus when when we read the data in the files with a hex editor, we can see various eccentricities such as blank space and garbage data that represented various RAM contents from the developer's PC.
+During development, Oni had in-game editing tools. These tools presented a GUI for things like placing AIs and setting their attributes, editing particles, etc. When a developer saved his work, the contents of the level, stored in RAM, were written directly to disk. The structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory, and thus when we read the data in the files with a hex editor, we can see eccentricities such as blank space and garbage data, which reflect leftover RAM contents from the developer's PC like padding and pointers.
-Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which looks "backwards" from the standpoint of a culture that reads left-to-right. When Macs, which were big-endian at the time due to their PowerPC architecture, read these files, they then had to flip each sequence of bytes in memory before they could be understood.
+Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which is backwards from how we typically write numbers. When Macs, which were big-endian at the time due to their PowerPC architecture, would read these files, they then had to flip each sequence of bytes in memory before they could be understood.
 An exception to this backwards-writing rule is when strings of ASCII characters were written to disk. These are not numbers and thus are not subject to endianness, so they retain their left-to-right order. Now, this may not seem to be the case as you continue reading below. The first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". The reason these four-character strings are backwards is that Oni stored them as a 32-bit integer. Any sequence of four characters can be represented as such a number. Writing the integer 1,448,227,633 to disk results in the bytes 0x31, 0x33, 0x52, and 0x56, which produce the ASCII codes for '1', '3', 'R' and 'V' (the computer would have had to be big-endian to be able to naturally write them in the left-to-right order we would prefer to see). This practice of Bungie's provided a combination of convenient storage of a tag in memory as a number, and human-readability when organizing game assets by tag.
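The tag arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are made up for this example and are not taken from Oni's code. It treats the tag as a 32-bit number whose most significant byte is the first letter, then writes that number the way a little-endian machine would.

```python
import struct

def tag_to_int(tag: str) -> int:
    """Read a four-character tag as a 32-bit number, first letter most significant."""
    return struct.unpack(">I", tag.encode("ascii"))[0]

def int_to_disk_bytes(value: int) -> bytes:
    """Write a 32-bit number the way a little-endian (Intel) machine stores it."""
    return struct.pack("<I", value)

value = tag_to_int("VR31")
on_disk = int_to_disk_bytes(value)
print(value)    # the integer that represents the tag
print(on_disk)  # the byte order seen in a hex editor
```

Running the same two steps on "SUBT" gives the bytes for "TBUS", matching what appears in the file.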
@@ Line 76: / Line 76: @@
 ===Unnamed and empty resources===
-You'll notice that the level file header lists fewer names (7,124) than instances (9,347). That's because there are 3 types of instance descriptors:
+You'll notice that the level file header lists fewer names (7,124) than instances (9,347). That's because there are 3 types of instance:
 *Unnamed and not empty - they are only referenced by other instances in the same file, generally as child data (e.g., 3D geometry elements like ABNA are "contained" by AKEV, a level's environment).
 *Named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them.
-*Named and empty - "empty" instances are used in level-specific instance files (i.e. not in '''level0_Final.dat''') to associate an instance ID with a name. For every empty resource, there's another one with a matching name in '''level0_Final.dat''' that has data in it. The empty resource in the instance file is (usually) looked up by ID, then the engine searches all the loaded files for a non-empty instance with the same name, causing it to find the actual file in the global data in '''level0_Final.dat'''.
+*Named and empty - "empty" instances are used in level-specific instance files (i.e. not in level0_Final.dat) to associate an instance ID with a name. For every empty resource, there's another one with a matching name in level0_Final.dat that has data in it. The empty resource in the instance file is (usually) looked up by ID, then the engine searches all the loaded files for a non-empty instance with the same name, causing it to find the actual file in the global data in level0_Final.dat.
 ===Peeking at instance name===
-If you want to see the name of this resource, let's look at address 0xCB01 + 0x28F240 (the file header's address for the name table) = 0x29BD41. There we find the string "SUBTsubtitles". The actual subtitle data should be found at the address 0x2230C8 + 0x03BCA0 (the file header's address for the data table) = 0x25ED68. Let's go there now....
+Before we talk about the name table in depth, we can peek ahead at the name of this resource using the offset we've just been given. Let's add the offset 0xCB01 to 0x28F240, the file header's address for the name table. This gives us the address 0x29BD41. There we find the string "SUBTsubtitles".
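The lookup just described can be sketched in Python. Two things here are assumptions made for illustration: the helper name <code>read_c_string</code> is hypothetical, and the sketch assumes that names in the name table end with a zero byte, which this page does not state outright. The demonstration uses a small stand-in buffer rather than a real level file.

```python
NAME_TABLE_ADDRESS = 0x28F240  # from the file header
NAME_OFFSET = 0xCB01           # from the instance descriptor

def read_c_string(data: bytes, address: int) -> str:
    """Read ASCII text starting at address, stopping at the first zero byte.

    Assumes the string is zero-terminated.
    """
    end = data.index(b"\x00", address)
    return data[address:end].decode("ascii")

name_address = NAME_TABLE_ADDRESS + NAME_OFFSET
print(hex(name_address))

# Stand-in buffer for demonstration only; with a real file you would pass
# the file's contents and name_address instead.
sample = b"??SUBTsubtitles\x00??"
print(read_c_string(sample, 2))
```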
 ===Peeking at instance data===
-For some reason, the addresses we calculate from the descriptor data offsets are all off by eight bytes, so we need to subtract 8 from 0x25ED68 and go to 0x25ED60. Compare what you see here to the documentation for the [[SUBT]] type. Below is the data you should actually see for the English Oni SUBT file at this address. Note that we still haven't found the actual subtitle data, because SUBT stores its data in the raw file. The princess is in another castle:
+The actual subtitle data should be found by adding the offset 0x2230C8 to 0x03BCA0, the file header's address for the data table, to get 0x25ED68. We're going to leave the full details of the data table for later, but below is the data you should actually see for the English Oni SUBT file at this address. You have to consult the [[SUBT]] page to know how to read this data.
 {{Table}}
 {{OBDth}}
-{{OBDtr| 0x00 | res_id   | | 01 F4 12 00 | 4852        | 04852-subtitles.SUBT }}
-{{OBDtr| 0x04 | lev_id   | | 01 00 00 00 | 0           | level 0 }}
 {{OBDtr| 0x08 | char[16] | | AD DE       | dead        | unused }}
 {{OBDtr| 0x18 | offset   | | 80 44 44 01 | 0x01444480  | raw file data address }}
@@ Line 96: / Line 94: @@
 |}
-The first two words, or 32-bit sequences, are the standard resource header. The second and third bytes of the first word are the '''resource ID'''. The second and third bytes of the second word are the '''level number''' where this resource is found.
+After '''padding''' of 16 unused bytes, we find that, instead of data, there's an address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.
-After a '''buffer''' of 16 unused bytes, we find the address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.
+The '''array size''' of 609 tells the part of the engine that reads SUBT data to expect a chunk of 609 subtitled lines of dialogue.
-The '''array size''' of 609 tells us that this is a chunk of 609 subtitled lines of dialogue.
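As a quick check of the addresses used in this walkthrough, here is a short Python sketch. It only redoes the arithmetic from the text and decodes the four offset bytes shown in the table; the variable names are chosen for this example.

```python
import struct

DATA_TABLE_ADDRESS = 0x03BCA0  # from the file header
DATA_OFFSET = 0x2230C8         # from the SUBT instance descriptor

# Where the SUBT instance's data sits inside the .dat file.
data_address = DATA_TABLE_ADDRESS + DATA_OFFSET
print(hex(data_address))

# The "offset" field as it appears in the hex editor, least significant byte first.
raw_offset_bytes = bytes.fromhex("80 44 44 01")
(raw_address,) = struct.unpack("<I", raw_offset_bytes)
print(hex(raw_address))  # address to jump to in level0_Final.raw
```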
 ==Name descriptors==
-The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In level 0 there are 9347 instance descriptors, so 20 * 9347 = 186940. In hex, that's 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.
+The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In this case, that means 20 * 9347 = 186940, or 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.
 The name descriptor array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to quickly find instances by name. It is also used when finding instances by type. However the addresses of these instances in memory cannot be known until the file is loaded into RAM, so a space of 32 bits is reserved for that runtime pointer.
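The start address of the name descriptors can be recomputed for any instance file from the count in its header. A minimal sketch, with a function name invented for this example:

```python
INSTANCE_DESCRIPTORS_START = 0x40  # instance descriptors begin right after the header
INSTANCE_DESCRIPTOR_SIZE = 20      # bytes per instance descriptor

def name_descriptors_start(instance_count: int) -> int:
    """Address of the first name descriptor, given the header's instance count."""
    return INSTANCE_DESCRIPTORS_START + INSTANCE_DESCRIPTOR_SIZE * instance_count

# Level 0 lists 9347 instance descriptors in its header.
print(hex(INSTANCE_DESCRIPTOR_SIZE * 9347))
print(hex(name_descriptors_start(9347)))
```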
@@ Line 114: / Line 110: @@
 ==Template descriptors==
-Likewise, the template descriptor array starts directly after the name descriptors. Since name descriptors are 8 bytes, 8 * 7124 (taken from the header) = 56992, or 0xDEA0, and adding that to the name descriptor array's start address (0x02DA7C) gives us 0x03B91C: the start of the template descriptors.
+Likewise, the template descriptor array starts directly after the name descriptors. Since name descriptors are 8 bytes, 8 * 7124 (taken from the header) = 56992, or 0xDEA0, and adding that to the name descriptor array's start address (0x02DA7C) gives us 0x03B91C as the start of the template descriptors.
-The template descriptor array contains information about all templates (that is, resource types, AKA tags), used in the file (56 in this case). So any resource occurring in this instance file has to have its type listed here.
+The template descriptor array contains information about all templates (that is, resource types, AKA tags), used in the file (56 in this case, as we learned from the file header). Any resource occurring in this instance file has to have its type listed here.
-The template checksum is used to prevent loading of instance files that are not compatible with the current engine version. The number of resources is self-explanatory. Note that "TBUS" has a usage number of 2, which corresponds to what we learned earlier about Oni having only two subtitle files, "SUBTsubtitles" and "SUBTmessages".
 {{Table}}
@@ Line 127: / Line 121: @@
 |}
-;Template checksum
+The '''template checksum''' is used to prevent loading of instance files that are not compatible with the current engine version. The '''tag''' is the same kind of number-written-as-backwards-ASCII that we discussed in the "Backwards and garbage data" section. The '''number of resources''' is self-explanatory.
-An instance can have pointers to other related instances, but since pointers are only valid in memory, they cannot be stored on disk. They must be set when the level data is loaded into memory. To be able to do this, the engine must know where pointers are kept in an instance's data, and this is done using "templates". This template info is hard-coded into the game:
-*a checksum of the data contained by the template (the checksum algorithm is unknown, but the checksum stored in Oni's code for a given tag must match the one in the template descriptors array for that tag)
+You might wonder how Oni knows how to read each type of data, such as a SUBT or an ABNA. The simple answer is that this information is hard-coded into Oni. In fact, the information on each instance type, as stored in Oni's code, is actually the real "template". The file data only gives the tag and checksum that refer to a certain template. Which types of data fields are encountered in which order is already known by Oni. These hardcoded templates also tell Oni which parts of the file data are reserved for pointers.
-*a 4-letter tag used to identify the template (ABNA, ONCC, WMDD, etc.)
-*a short description of the data structure, e.g. "BSP Tree Node Array"
+That's because an instance may have pointers to other related instances, but pointers are only valid in memory; they cannot be stored on disk. They must be set when the level data is loaded into memory and the address in RAM has been determined. So one type of data field in Oni's templates is a raw pointer; on Macs and the Windows demo, there is an additional "separate offset" type. The pointer and offset are 32 bits in length, as one must expect since Oni was compiled for 32-bit PCs.
-*a list of all the instance's data fields and their types (see [[OBD:Data types]])
-*other data that appears to be unused, like the size of the fixed part and the size of an array element for data structures that contain variable-length arrays
+Incidentally, the templates in Oni's code have not just the familiar four-character tags attached to them, but also a descriptive string, e.g. "BSP Tree Node Array". This is the source of the names on the [[OBD:File types|File types]] page.
 ==Data table==
@@ Line 140: / Line 134: @@
 The start of each instance's record, the ID number, is always 32-byte aligned. Thus, even though the template descriptors ended at 0x03BC9C, there are four empty bytes here so that the data table can begin at 0x03BCA0, which divides evenly by 32. This alignment also means that the instance-specific data will always be found at an offset like 0x0008, 0x0028, 0x0148, etc.
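The alignment rule can be sketched in Python as follows. The helper <code>align_up</code> is a generic rounding function written for this example, not something taken from Oni. The 16-byte template descriptor size is not stated on this page; it is inferred here from the 56 templates lying between 0x03B91C and 0x03BC9C.

```python
def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of alignment."""
    return (address + alignment - 1) // alignment * alignment

TEMPLATE_DESCRIPTORS_START = 0x03B91C
TEMPLATE_DESCRIPTORS_END = 0x03BC9C
TEMPLATE_COUNT = 56

# Inferred, not documented here: bytes per template descriptor.
template_descriptor_size = (
    TEMPLATE_DESCRIPTORS_END - TEMPLATE_DESCRIPTORS_START
) // TEMPLATE_COUNT

data_table_start = align_up(TEMPLATE_DESCRIPTORS_END, 32)
print(template_descriptor_size, hex(data_table_start))

# Each record starts on a 32-byte boundary and its 8-byte header comes first,
# so every data offset leaves a remainder of 8 when divided by 32.
for offset in (0x0008, 0x0028, 0x0148, 0x2230C8):
    print(hex(offset), offset % 32)
```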
-The instance ID and file ID are not actually part of the instance data. The engine always keeps pointers to the start of the type-specific data, and the instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).
+The instance ID and file ID are not actually part of the instance data, but are considered to be the resource header. The engine always keeps pointers to the start of the type-specific data itself; we saw this before when we jumped to 0x25ED68 and saw the data for the SUBT rather than the header for this data. The instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).
 {{Table}}
@@ Line 149: / Line 143: @@
 |}
-;Instance ID
+The '''instance's ID''' is computed as:
-The ID of an instance is computed as:
+  (instance_descriptor_index << 8) <nowiki>|</nowiki> 1
-  (instance_descriptor_index << 8) <nowiki>|</nowiki> 1.
+The 1 allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte aligned, so it will never have bit 0 set).
-The 1 allows the engine to know which IDs have already been converted to pointers (a instance pointer will always be 8 byte aligned so it can never have the bit 0 set).
-;File ID
+The '''file ID''' is computed from the name of the instance file. For "_Final" files the file ID is computed as:
-The file ID is computed from the name of the instance file. For "_Final" files the file ID is computed as:
   (level_number << 25) <nowiki>|</nowiki> 1
-Again the 1 allows the engine to know which file IDs have already been converted to pointers.
+Again, the 1 allows the engine to know which file IDs have already been converted to pointers.
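Both formulas are easy to try out in Python. The sketch below uses function names made up for this example. The index 4852 and the bytes 01 F4 12 00 are the SUBT resource's values from the walkthrough earlier on this page.

```python
import struct

def instance_id(descriptor_index: int) -> int:
    """Instance ID as stored on disk: the descriptor index shifted up, low bit set."""
    return (descriptor_index << 8) | 1

def file_id(level_number: int) -> int:
    """File ID for a "_Final" instance file."""
    return (level_number << 25) | 1

def is_unconverted_id(value: int) -> bool:
    """True if the low bit is set, meaning the value is still an ID, not a pointer."""
    return (value & 1) == 1

subt_id = instance_id(4852)
print(hex(subt_id), struct.pack("<I", subt_id).hex(" "))
print(hex(file_id(0)), hex(file_id(1)))
print(is_unconverted_id(subt_id))
```

An aligned address such as 0x25ED68 has its low bit clear, so the same test reports it as a pointer rather than an ID.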
-As you can see, the size of a given instance's data can be almost anything. Thus, we cannot compute the end of this table in any simple way. That's why the instance file header explicitly gives us the address of the name table.
+As you can see, after the header, the size of the actual instance data can be almost anything. Thus, we cannot compute the end of the data table in any simple way. That's why the instance file header explicitly gives us the address of the name table.
-By the way, how do we know which resource's data we're looking at in the table? Let's look at the very first data, at 0x03BCA0. Noting that the first two numbers, the instance and file ID, do not count as data, and knowing that the instance descriptor gives the offset into the data table for the start of each instance's data, that means that there must be a resource with a data offset of 0x08, the lowest offset possible into the table. We can find this right at the start of the instance descriptor array:
+By the way, how do we know which resource's data we're looking at in the data table? Let's look at the very first data, at 0x03BCA0. Noting that the first two numbers, the instance and file ID, do not count as data, there must be a resource with a data offset of 0x08, the lowest offset possible into the table. We can find this offset listed right at the start of the instance descriptor array:
 {{Table}}
@@ Line 183: / Line 174: @@
 |}
-These names can be up to 63 characters long, counting the tag.
+These names can be up to 63 characters long, counting the tag. The instance file concludes with the end of the name table.
 {{OBD}}