Jump to content

OBD:Instance file format: Difference between revisions

more explaining
m (→‎Terminology: sometimes we mean the whole shebang, as on OniSplit)
(more explaining)
Line 1: Line 1:
:''For other files ending in ".dat", see [[Oni (folder)]].''
:''For other files ending in ".dat", see [[Oni (folder)]].''
Files named "level[0-19]_Final.dat", together with ".raw" and sometimes ".sep" counterparts, contain the game data for Oni. The same format was used for the tools files, named level0_Tools.dat/.raw/.sep; for the story behind the tools files, see [[Big Blue Box|HERE]].
Files named "level[0-19]_Final.dat", together with ".raw" and sometimes ".sep" counterparts, contain the game data for Oni. The same format was used for the tools files, named level0_Tools.dat/.raw/.sep; for the story behind the tools files, see [[Big Blue Box|HERE]].
==Terminology==
==Terminology==
Oni's level data is broken into two kinds of files in Windows retail Oni. One type ends in ".dat" and is called an instance file. An "instance" is essentially a resource, such as a texture. Initially, all resources would have been stored in the levelX_Final.dat file, so it was rightfully called an "instance file". However eventually much of the resource data was moved to a new type of file ending in ".raw", simply called a raw file. By the time that Oni for the Mac was finalized for release, some of the raw data was moved to a third file type which ends in ".sep", short for "separate". You can read about raw and separate files [[Raw|HERE]].
Oni's level data is broken into two kinds of files in Windows retail Oni. One type ends in ".dat" and is called an instance file. An "instance" is essentially a resource, such as a texture. Initially, all resources would have been stored in the levelX_Final.dat file, so it was rightfully called an "instance file". However eventually much of the resource data was moved to a new type of file ending in ".raw", simply called a raw file. By the time that Oni for the Mac was finalized for release, some of the raw data was moved to a third file type which ends in ".sep", short for "separate". You can read about raw and separate files [[Raw|HERE]].
Line 7: Line 8:


==Introduction==
==Introduction==
Instance files are the "main" type of data file in the sense that, when loading a level, Oni reads the instance file first, and this file serves as an index that allows it to find resources which are packed back-to-back into the raw and separate files. All instance files begin with a 64 byte header followed by 3 "descriptor" arrays, a data table and a name table. Among other things, the header contains the number of descriptors in each of the 3 arrays and the offset of the data and name tables (relative to the start of the file).
Instance files are the "main" type of data file in the sense that, when loading a level, Oni reads the instance file first, and this file serves as an index that allows it to find resources which are packed back-to-back into the raw and separate files.


During development, Oni had in-game editing tools. These tools presented a GUI for things like placing AIs and setting their attributes, editing particles, etc. When a developer saved his work, the contents of the level, stored in RAM, were written directly to disk. The structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory, and thus when when we read the data in the files with a hex editor, we can see various eccentricities such as blank space and garbage data that represented unused memory on the development machine.
During development, Oni had in-game editing tools. These tools presented a GUI for things like placing AIs and setting their attributes, editing particles, etc. When a developer saved his work, the contents of the level, stored in RAM, were written directly to disk. The structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory, and thus when when we read the data in the files with a hex editor, we can see various eccentricities such as blank space and garbage data that represented unused memory on the development machine.
Line 13: Line 14:
Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which looks "backwards" from the standpoint of a culture that reads left-to-right. When Macs, which were big-endian at the time due to their PowerPC architecture, read these files, they then had to flip each sequence of bytes in memory before they could be understood.
Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which looks "backwards" from the standpoint of a culture that reads left-to-right. When Macs, which were big-endian at the time due to their PowerPC architecture, read these files, they then had to flip each sequence of bytes in memory before they could be understood.


An exception to this backwards-writing rule is when strings of ASCII characters were written to disk. These are not numbers and thus are not subject to endianness, so they retain their left-to-right order. Now, this may not seem to be the case as you continue reading below. The first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". The reason these four-character strings are backwards is that Oni stored them as a number. You see, writing the number 1,448,227,633 to disk in little-endian order results in the bytes 0x31, 0x33, 0x52, and 0x56, which happens to produce the ASCII codes for '1', '3', 'R' and 'V'. This practice of Bungie's provided a combination of more convenient storage in memory as a number, and human-readability when on disk.  
An exception to this backwards-writing rule is when strings of ASCII characters were written to disk. These are not numbers and thus are not subject to endianness, so they retain their left-to-right order. Now, this may not seem to be the case as you continue reading below. The first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". The reason these four-character strings are backwards is that Oni stored them as a 32-bit integer. Any sequence of four characters can be represented as such a number. Writing the integer 1,448,227,633 to disk results in the bytes 0x31, 0x33, 0x52, and 0x56, which produce the ASCII codes for '1', '3', 'R' and 'V' (the computer would have had to be big-endian to be able to naturally write them in the left-to-right order we would prefer to see). This practice of Bungie's provided a combination of convenient storage of a tag in memory as a number, and human-readability when organizing game assets by tag.


==Walkthrough==
==Walkthrough==
Line 27: Line 28:
{{OBDtr| 0x1C | int32  | | 38 00 00 00 | 56        | template descriptor count }}
{{OBDtr| 0x1C | int32  | | 38 00 00 00 | 56        | template descriptor count }}
{{OBDtr| 0x20 | int32  | | A0 BC 03 00 | 0x03BCA0  | data table offset }}
{{OBDtr| 0x20 | int32  | | A0 BC 03 00 | 0x03BCA0  | data table offset }}
{{OBDtr| 0x24 | int32  | | A0 35 25 00 | 0x2535A0  | data table size }}
{{OBDtr| 0x24 | int32  | | A0 35 25 00 | 2438560  | data table size }}
{{OBDtr| 0x28 | int32  | | 40 F2 28 00 | 0x28F240  | name table offset }}
{{OBDtr| 0x28 | int32  | | 40 F2 28 00 | 0x28F240  | name table offset }}
{{OBDtr| 0x2C | int32  | | 04 4F 02 00 | 0x024F04  | name table size }}
{{OBDtr| 0x2C | int32  | | 04 4F 02 00 | 151300    | name table size }}
{{OBDtr| 0x30 | int32  | | 00 00 00 00 |          | used by OniSplit only: raw table offset }}
{{OBDtr| 0x30 | int32  | | 00 00 00 00 |          | used by OniSplit only: raw table offset }}
{{OBDtr| 0x34 | int32  | | 00 00 00 00 |          | used by OniSplit only: raw table size }}
{{OBDtr| 0x34 | int32  | | 00 00 00 00 |          | used by OniSplit only: raw table size }}
Line 44: Line 45:
The '''descriptor counts''' are the sizes of some arrays which are coming up soon in this file: the instance, name and template descriptors. For instance, the size of the instance descriptor array will be 0x2483, or 9,347 items, in length.
The '''descriptor counts''' are the sizes of some arrays which are coming up soon in this file: the instance, name and template descriptors. For instance, the size of the instance descriptor array will be 0x2483, or 9,347 items, in length.


Next we are told the addresses and sizes of the '''data and name tables''' in this file. The name table simply follows the data table, as you'll see if you add the data table offset plus the data table size, but that doesn't mean the name table offset is redundant; if its start was not 32-bit-aligned, it probably would be moved down to start at the next 32-bit word, but this is unnecessary because it just happens to be aligned already.
Next we are told the addresses and sizes of the '''data and name tables''' in this file. The name table simply follows the data table, as you'll see if you add the data table offset plus the data table size, but that doesn't mean the name table offset is redundant; if its start was not 32-bit-aligned, it probably would be moved down to start at the next 32-bit word, but this is unnecessary because it happens to be aligned already.


After this comes four "int"s of '''zeroes'''. Empty space like this is common in the data files, and indicates that something stored in memory at this relative position was not written to disk (probably pointers, or sometimes a space reserved for possible future use in a resource type).
After this comes four "int"s of '''zeroes'''. Empty space like this is common in the data files, and indicates that something stored in memory at this relative position was not written to disk (probably pointers, or sometimes a space reserved for possible future use in a resource type).
Line 51: Line 52:


===Instance descriptors===
===Instance descriptors===
The instance descriptors array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. The descriptors start at 0x40 in the .dat file, but below is a descriptor found at 0x017B50 in the file which makes a better example. In the table below, we use offsets relative to the start of this descriptor.
The instance descriptor array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. The descriptors start at 0x40 in the .dat file, but below is a descriptor found at 0x017B50 in the file which makes a better example. In the table below, we use offsets relative to the start of this descriptor.


{{Table}}
{{Table}}
Line 88: Line 89:
After a '''buffer''' of 16 unused bytes, we find the address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.
After a '''buffer''' of 16 unused bytes, we find the address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.


The '''array size''' of 609 tells us that there are 609 subtitled lines of dialogue to be found in the raw file as part of this resource.
The '''array size''' of 609 tells us that this is a chunk of 609 subtitled lines of dialogue.


===Name descriptors===
===Name descriptors===
The "name descriptors" array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to quickly find instances by name. It is also used when finding instances by type.
The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In level 0 there are 9347 instance descriptors, so 20 * 9347 = 186940. In hex, that's 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.
 
The name descriptor array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to quickly find instances by name. It is also used when finding instances by type.


{{Table}}
{{Table}}
Line 99: Line 102:
|}
|}


The "template descriptor" array contains information about all templates used in the file. The template checksum is used to prevent loading of instance files that are not compatible with the current engine version.
===Template descriptors===
Likewise, the template descriptor array starts directly after the name descriptors. Since name descriptors are 8 bytes, 8 * 7124 (taken from the header) = 56992, or 0xDEA0, and adding that to the name descriptor array's start address (0x02DA7C) gives us 0x03B91C: the start of the template descriptors.
 
The template descriptor array contains information about all templates (that is, resource types, AKA tags), used in the file (56 in this case). So any resource occurring in this instance file has to have its type listed here.
 
The template checksum is used to prevent loading of instance files that are not compatible with the current engine version. The number of resources is self-explanatory. Note that "TBUS" has a usage number of 2, which corresponds to what we learned earlier about Oni having only two subtitle files, "SUBTsubtitles" and "SUBTmessages".


{{Table}}
{{Table}}
Line 105: Line 113:
{{OBDtr| 0x00 | int64  | | A0 6D 12 00 00 00 00 00 | 0x126DA0    | template checksum }}
{{OBDtr| 0x00 | int64  | | A0 6D 12 00 00 00 00 00 | 0x126DA0    | template checksum }}
{{OBDtr| 0x0C | tag    | | 41 4E 42 41            | 'ABNA'      | template tag }}
{{OBDtr| 0x0C | tag    | | 41 4E 42 41            | 'ABNA'      | template tag }}
{{OBDtr| 0x08 | int32  | | 01 00 00 00            | 1          | number of instances that use this template }}
{{OBDtr| 0x08 | int32  | | 01 00 00 00            | 1          | number of resources in file that use this template }}
|}
|}


 
===Data table===
 
The data table stores all the instance data. Instance ID is always stored 32 byte aligned (thus the instance specific data will always be found at an offset like 0x0008, 0x0028, 0x0148 etc.). Instance ID and file ID are not actually part of the instance data. The engine always has pointers to "instance specific data" and instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance give a pointer to it).
The data table stores all the instance data. Instance ID is always stored 32 byte aligned (thus the instance specific data will always be found at an offset like 0x0008, 0x0028, 0x0148 etc.). Instance ID and file ID are not actually part of the instance data. The engine always has pointers to "instance specific data" and instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance give a pointer to it).