:''For other files ending in ".dat", see [[Oni (folder)]].''
Files named "level[0-19]_Final.dat", together with ".raw" and sometimes ".sep" counterparts, contain the game data for Oni. The same format was used for the tools files, named level0_Tools.dat/.raw/.sep; however, the retail Oni game application does not load tools files. For the story behind the tools files, see [[Big Blue Box|HERE]].
 
The level 0 files do not actually contain a level, but instances (resources) shared across all levels. Level 0 is loaded when the game starts, and never unloaded. All other level files are only loaded when the corresponding level starts and unloaded when it ends.


==Terminology==


==Introduction==
Instance files are the "main" type of data file in the sense that, when loading a level, Oni reads the instance file first, and this file serves as an index that allows it to find resources packed into the binary files. The binary files are the files ending in .raw and, on Mac retail/demo and Windows demo Oni, .sep. ([[OniSplit]]-generated .oni files are Windows-format .dat files with all the data from the .raw/.sep files appended at the end.)
 
The binary files are used mainly for large, unstructured data such as textures and sounds. They have no file header, since the instance file serves as their table of contents. The only rules for binary files are that every data part is stored 32-byte aligned and that the first 32 bytes of the file are always zero (reserved to represent null pointers). At load time, the offsets given in the instance file are converted to pointers to the data in the binary files.
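The two binary-file rules can be expressed as a short check. The sketch below is a minimal Python illustration, assuming the whole .raw or .sep file has been read into a <code>bytes</code> object; the function names are made up for this example, and returning a slice of the file only stands in for the real pointer the engine produces at load time.

```python
ALIGNMENT = 32  # data parts start on 32-byte boundaries; bytes 0..31 are reserved

def check_binary_file(blob: bytes) -> None:
    """Raise ValueError if the reserved all-zero block at the start is missing."""
    # Offset 0 stands for a null pointer, so the first 32 bytes hold no data.
    if len(blob) < ALIGNMENT or any(blob[:ALIGNMENT]):
        raise ValueError("first 32 bytes must exist and be zero")

def resolve_offset(blob: bytes, offset: int, size: int):
    """Map an offset stored in an instance to the bytes it refers to.

    Returns None for offset 0 (the null pointer) and raises ValueError for
    offsets that break the alignment rule or run past the end of the file.
    """
    if offset == 0:
        return None
    if offset % ALIGNMENT != 0:
        raise ValueError(f"offset {offset:#x} is not 32-byte aligned")
    if offset + size > len(blob):
        raise ValueError("data runs past the end of the file")
    return memoryview(blob)[offset:offset + size]

# A tiny fake .raw file: the reserved zero block, then one aligned data part.
raw = bytes(32) + b"texture bytes".ljust(32, b"\0")
check_binary_file(raw)
print(resolve_offset(raw, 0, 4))          # None
print(bytes(resolve_offset(raw, 32, 7)))  # b'texture'
```

The raw file data address in the SUBT example further down, 0x01444480, passes the same alignment test.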


During development, Oni had in-game editing tools. These tools presented a GUI for things like placing AIs and setting their attributes, editing particles, etc. When a developer saved his work, the contents of the level, stored in RAM, were written directly to disk. The structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory, and thus when we read the data in the files with a hex editor, we can see various eccentricities such as blank space and garbage data that represented unused memory on the development machine.
An exception to this backwards-writing rule is when strings of ASCII characters were written to disk. These are not numbers and thus are not subject to endianness, so they retain their left-to-right order. Now, this may not seem to be the case as you continue reading below. The first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". The reason these four-character strings are backwards is that Oni stored them as a 32-bit integer. Any sequence of four characters can be represented as such a number. Writing the integer 1,448,227,633 to disk results in the bytes 0x31, 0x33, 0x52, and 0x56, which produce the ASCII codes for '1', '3', 'R' and 'V' (the computer would have had to be big-endian to be able to naturally write them in the left-to-right order we would prefer to see). This practice of Bungie's provided a combination of convenient storage of a tag in memory as a number, and human-readability when organizing game assets by tag.
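The byte order described above can be checked with Python's standard <code>struct</code> module. This is a small illustration, not code from Oni or [[OniSplit]]: it reads four bytes as a little-endian 32-bit integer, as a little-endian machine would, and then writes that integer back out most-significant byte first to recover the tag in its intended reading order.

```python
import struct

def tag_from_disk(raw: bytes) -> str:
    """Decode a 4-character tag stored as a little-endian 32-bit integer."""
    (value,) = struct.unpack("<I", raw)              # 0x31 0x33 0x52 0x56 -> 1448227633
    return struct.pack(">I", value).decode("ascii")  # big-endian order reads "VR31"

print(struct.unpack("<I", b"13RV")[0])  # 1448227633
print(tag_from_disk(b"13RV"))           # VR31
print(tag_from_disk(b"TBUS"))           # SUBT
```

Simply reversing the four bytes (<code>raw[::-1]</code>) gives the same result; going through the integer just mirrors how Oni stored the tag.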


==File limits==
*Max level number: 127
*Max number of instance files in GameDataFolder: 512 (Windows), 16 (Windows demo, Mac)
*Max number of simultaneously loaded instance files: 64
*Max number of instances in a file: 131071
*Max length of an instance file name: 31
*Max length of an instance name: 63 (including the 4 character template tag)
 
==Header==
Here is a walkthrough of an instance file using the level0_Final.dat in English Windows Oni. Follow along in a hex editor for maximum learnage. Each term will be explained in-depth when we fully consider the related data. First, here is how the file begins:
{{Table}}
That concludes the header of the instance file. Immediately after this header, we find the instance descriptors array.


==Instance descriptors==
The instance descriptor array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. There are three types of instance descriptors:
*unnamed - they are referenced by other instances in the same file, and the engine never reaches them directly
*named and not empty - they can be referenced by other instances in any file, and the engine can use their name or template tag to find them
*named and empty - the instance data is stored in a different file, and they exist only to associate an instance ID with a name; when an instance references such an instance ID, the engine searches all the loaded files for a non-empty instance with the same name

The descriptors start at 0x40 in the .dat file, but below is a descriptor found at 0x017B50 in the file, which makes a better example. In the table below, we use offsets relative to the start of this descriptor.


{{Table}}
This descriptor tells us that a resource of type SUBT (a subtitle file for Oni; there are only two of these, one for speech subtitles and one for help messages) has data that can be found 0x2230C8 bytes into the data table, which we learned from the file header starts at 0x03BCA0. Its name can be found 0xCB01 bytes into the name table that starts, according to the file header, at 0x28F240. The data is 0x09C0, or 2,496, bytes long.
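Decoding a descriptor and finding its data and name is mechanical. The Python sketch below assumes the field order shown for the ONGS descriptor in the [[#Data table|Data table]] section (tag, data offset, name offset, data size, flags), little-endian storage as in Windows Oni, and the table addresses quoted above from the level0_Final.dat header. The names are hypothetical, and the flags value in the example is only a placeholder.

```python
import struct
from dataclasses import dataclass

DESCRIPTORS_START = 0x40  # the descriptor array follows the file header
DESCRIPTOR_SIZE = 20      # five 32-bit fields
DATA_TABLE = 0x03BCA0     # from the English Windows level0_Final.dat header
NAME_TABLE = 0x28F240

@dataclass
class InstanceDescriptor:
    tag: str
    data_offset: int  # relative to the data table
    name_offset: int  # relative to the name table
    data_size: int
    flags: int

def parse_descriptor(raw: bytes) -> InstanceDescriptor:
    tag_int, data_off, name_off, size, flags = struct.unpack("<5I", raw[:DESCRIPTOR_SIZE])
    tag = struct.pack(">I", tag_int).decode("ascii")  # tag is stored as a little-endian int
    return InstanceDescriptor(tag, data_off, name_off, size, flags)

def descriptor_address(index: int) -> int:
    """File offset of the descriptor with the given 0-based index."""
    return DESCRIPTORS_START + DESCRIPTOR_SIZE * index

# The SUBT descriptor discussed above (flags set to 0 as a placeholder).
raw = b"TBUS" + struct.pack("<4I", 0x2230C8, 0xCB01, 0x09C0, 0)
d = parse_descriptor(raw)
print(d.tag, hex(DATA_TABLE + d.data_offset), hex(NAME_TABLE + d.name_offset), d.data_size)
# SUBT 0x25ed68 0x29bd41 2496
print(hex(descriptor_address(4852)))  # 0x17b50
```

Working backwards, 0x017B50 is descriptor number 4852, since (0x017B50 - 0x40) / 20 = 4852, the same number that appears as res_id in the SUBT record below.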


===Peeking at instance name===
If you want to see the name of this resource, look at address 0xCB01 + 0x28F240 (the file header's address for the name table) = 0x29BD41. There we find the string "SUBTsubtitles". The actual subtitle data should be found at the address 0x2230C8 + 0x03BCA0 (the file header's address for the data table) = 0x25ED68. Let's go there now.
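Reading the name is a matter of scanning for the terminating zero byte. A minimal sketch, assuming the .dat file is loaded into a <code>bytes</code> object and that names are plain ASCII; the 64-byte scan limit comes from the 63-character name limit plus the terminator, and the helper names are hypothetical.

```python
def read_name(buf: bytes, address: int, max_len: int = 64) -> str:
    """Read a zero-terminated instance name starting at address."""
    end = buf.find(b"\0", address, address + max_len)
    if end == -1:
        raise ValueError(f"no terminator within {max_len} bytes of {address:#x}")
    return buf[address:end].decode("ascii")

def split_name(name: str) -> tuple[str, str]:
    """Split an instance name into its 4-character template tag and the rest."""
    return name[:4], name[4:]

# Stand-in buffer: in level0_Final.dat the address would be 0x28F240 + 0xCB01.
buf = b"\xff" * 16 + b"SUBTsubtitles\0"
name = read_name(buf, 16)
print(name, split_name(name))  # SUBTsubtitles ('SUBT', 'subtitles')
```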


===Peeking at instance data===
The address we calculate from a descriptor's data offset points at the type-specific data, which is preceded by the eight bytes of the instance ID and file ID (see [[#Data table|Data table]] below). To see the whole record, we subtract 8 from 0x25ED68 and go to 0x25ED60. Compare what you see here to the documentation for the [[SUBT]] type. Below is the data you should actually see for the English Oni SUBT file at this address. Note that we still haven't found the actual subtitle data, because SUBT stores its data in the raw file. The princess is in another castle:


{{Table}}
{{OBDth}}
{{OBDtr| 0x00 | res_id  | | 01 F4 12 00 | 4852        | 04852-subtitles.SUBT }}
{{OBDtr| 0x04 | lev_id  | | 01 00 00 00 | 0          | level 0 }}
{{OBDtr| 0x08 | char[16] | | AD DE      | dead        | unused }}
{{OBDtr| 0x18 | offset  | | 80 44 44 01 | 0x01444480  | raw file data address }}
{{OBDtr| 0x1C | int32    | | 61 02 00 00 | 609        | array size }}
|}


The '''array size''' of 609 tells us that this is a chunk of 609 subtitled lines of dialogue.
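The table above can be decoded with a few <code>struct</code> calls. This sketch assumes the 32 bytes of the record (starting at 0x25ED60) have already been read, little-endian as in Windows Oni; the unused block is filled with repeated AD DE bytes only for illustration. Reading the instance number as <code>res_id >> 8</code> is an inference from these particular bytes (0x0012F401 >> 8 = 4852), not a documented rule.

```python
import struct

def parse_subt_record(record: bytes) -> dict:
    """Decode the 32-byte SUBT record laid out in the table above."""
    res_id, lev_id = struct.unpack_from("<2I", record, 0x00)
    raw_offset, array_size = struct.unpack_from("<2I", record, 0x18)
    return {
        "res_id": res_id,
        "instance_number": res_id >> 8,  # inferred: 0x0012F401 >> 8 == 4852
        "lev_id": lev_id,
        "raw_offset": raw_offset,        # where the subtitle data sits in the .raw file
        "array_size": array_size,        # number of subtitled lines
    }

record = (bytes.fromhex("01 F4 12 00 01 00 00 00")
          + b"\xad\xde" * 8              # 16 unused bytes (0x08..0x17)
          + bytes.fromhex("80 44 44 01 61 02 00 00"))
info = parse_subt_record(record)
print(info["instance_number"], hex(info["raw_offset"]), info["array_size"])
# 4852 0x1444480 609
```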


==Name descriptors==
The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In level 0 there are 9347 instance descriptors, so 20 * 9347 = 186940. In hex, that's 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.


|}


==Template descriptors==
Likewise, the template descriptor array starts directly after the name descriptors. Since name descriptors are 8 bytes, 8 * 7124 (taken from the header) = 56992, or 0xDEA0, and adding that to the name descriptor array's start address (0x02DA7C) gives us 0x03B91C: the start of the template descriptors.


|}
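The arithmetic in the name and template descriptor sections chains together: each array begins where the previous one ends. A small Python sketch using the sizes and counts given above (20-byte instance descriptors, 8-byte name descriptors, 9347 instances and 7124 names in level 0); the function name is hypothetical.

```python
HEADER_SIZE = 0x40  # instance descriptors start right after the file header
INSTANCE_DESCRIPTOR_SIZE = 20
NAME_DESCRIPTOR_SIZE = 8

def descriptor_array_starts(instance_count: int, name_count: int) -> tuple[int, int]:
    """Return the file offsets of the name and template descriptor arrays."""
    names_start = HEADER_SIZE + INSTANCE_DESCRIPTOR_SIZE * instance_count
    templates_start = names_start + NAME_DESCRIPTOR_SIZE * name_count
    return names_start, templates_start

names_start, templates_start = descriptor_array_starts(9347, 7124)
print(hex(names_start), hex(templates_start))  # 0x2da7c 0x3b91c
```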


;Template checksum
An instance can have pointers to other instances, but since pointers are only valid in memory, they are converted to instance identifiers when the file is saved and converted back to pointers when the file is loaded into memory. To be able to do this, the engine must know where the pointers are, and this is done using "templates". A template contains:
*a checksum of the data contained by the template (the checksum algorithm is unknown)
*a 4-letter tag used to identify the template (ABNA, ONCC, WMDD etc.)
*a short description of the data structure, like "BSP Tree Node Array"
*a list of all the data structure's fields and their types
*other data that appears to be unused, such as the size of the fixed part and the size of an array element for data structures that contain variable-length arrays
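To make the role of templates concrete, here is a hypothetical Python sketch of the load-time half of that conversion. It models a template only by the byte offsets of its reference fields (ignoring the checksum and field types), treats an on-disk reference as a 32-bit little-endian instance ID, and assumes that 0 means "no reference". None of these names or choices come from Oni itself.

```python
import struct
from dataclasses import dataclass, field

@dataclass
class Template:
    tag: str          # 4-letter tag, e.g. "ABNA"
    description: str  # short description of the data structure
    reference_offsets: list[int] = field(default_factory=list)  # assumed: where instance IDs live

def resolve_references(template: Template, data: bytes, loaded: dict) -> dict:
    """Swap each stored instance ID for the loaded object it names.

    Returns {field offset: object}; an ID of 0 is treated as a null reference.
    """
    resolved = {}
    for offset in template.reference_offsets:
        (instance_id,) = struct.unpack_from("<I", data, offset)
        resolved[offset] = loaded[instance_id] if instance_id else None
    return resolved

demo = Template("DEMO", "hypothetical structure with two references", [4, 8])
data = struct.pack("<3I", 12345, 0x0012F401, 0)  # an int field, then two references
print(resolve_references(demo, data, {0x0012F401: "SUBTsubtitles"}))
# {4: 'SUBTsubtitles', 8: None}
```

Saving would run the same field list in the other direction, writing each object's instance ID back in place of the pointer.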
 
==Data table==
The data table stores all the instance data. We peeked at this before when we looked at the instance descriptor for SUBTsubtitles.


The start of each instance's record, the instance ID, is always 32-byte aligned. Thus, even though the template descriptors ended at 0x03BC9C, there are four empty bytes here so that the data table can begin at 0x03BCA0, which divides evenly by 32. This alignment also means that the instance-specific data will always be found at an offset like 0x0008, 0x0028, 0x0148, etc.
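The padding rule is the usual round-up-to-a-multiple calculation. A short sketch, assuming only what is stated above (32-byte alignment and the addresses from level0_Final.dat):

```python
def align_up(address: int, alignment: int = 32) -> int:
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)

print(hex(align_up(0x03BC9C)))  # 0x3bca0: four bytes of padding
# Records start on 32-byte boundaries and the two IDs take 8 bytes, so every
# data offset is 8 more than a multiple of 32.
print([off % 32 for off in (0x0008, 0x0028, 0x0148, 0x2230C8)])  # [8, 8, 8, 8]
```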


The instance ID and file ID are not actually part of the instance data. The engine always keeps pointers to the start of the type-specific data, and the instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).
{{OBDtr| 0x08 | ...    | | ...        | ... | [[OBD:File_types|type-specific data]]... }}
|}
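The negative-offset access described above can be mimicked with file offsets standing in for pointers. A minimal sketch, assuming a little-endian (Windows) file held in a <code>bytes</code> object; <code>ids_before</code> is a hypothetical helper, not an engine function.

```python
import struct

def ids_before(buf: bytes, data_address: int) -> tuple[int, int]:
    """Return (instance ID, file ID) stored at data_address - 8 and - 4."""
    if data_address < 8:
        raise ValueError("no room for the IDs before this address")
    instance_id, file_id = struct.unpack_from("<2I", buf, data_address - 8)
    return instance_id, file_id

# Fake data table: 32 bytes of padding, then one record whose
# type-specific data starts at offset 40 (0x28).
buf = bytes(32) + bytes.fromhex("01 F4 12 00 01 00 00 00") + b"payload"
print([hex(v) for v in ids_before(buf, 40)])  # ['0x12f401', '0x1']
```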
As you can see, the size of a given instance's data can be almost anything. Thus, we cannot compute the end of this table in any simple way. That's why the instance file header explicitly gives us the address of the name table.


;Instance ID
Again the 1 allows the engine to know which file IDs have already been converted to pointers.


By the way, how do we know which resource's data we're looking at in the table? Let's look at the very first data, at 0x03BCA0. The first two numbers, the instance ID and file ID, do not count as data, and each instance descriptor gives the offset into the data table at which that instance's data starts. So there must be a resource with a data offset of 0x08, the lowest possible offset into the table. We can find it right at the start of the instance descriptor array:
{{Table}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | tag    | | 53 47 4E 4F | 'ONGS'  | template tag }}
{{OBDtr| 0x04 | int32  | | 08 00 00 00 | 0x08    | data offset (relative to data table) }}
{{OBDtr| 0x08 | int32  | | 00 00 00 00 | 0x00    | name offset (relative to name table) }}
{{OBDtr| 0x0C | int32  | | 60 0F 00 00 | 3936    | data size }}
{{OBDtr| 0x10 | int32  | | 00 00 00 00 | 0      | flags }}
|}


So this tells us that the first data in the data table belongs to the ONGS resource, and that it extends for 3,936 bytes.
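Finding which descriptor owns a given piece of the data table means scanning the descriptor array for a matching data offset. A minimal sketch, assuming the same 20-byte layout as the ONGS descriptor above and a little-endian file; only that one descriptor is used as example data, and the function name is hypothetical.

```python
import struct

DESCRIPTOR = struct.Struct("<5I")  # tag, data offset, name offset, data size, flags

def find_by_data_offset(descriptors: bytes, wanted: int):
    """Return (index, tag, data size) of the descriptor whose data offset
    equals wanted, or None if no descriptor matches."""
    for index, (tag, data_off, _name_off, size, _flags) in enumerate(
            DESCRIPTOR.iter_unpack(descriptors)):
        if data_off == wanted:
            return index, struct.pack(">I", tag).decode("ascii"), size
    return None

ongs_descriptor = bytes.fromhex("53 47 4E 4F 08 00 00 00 00 00 00 00 60 0F 00 00 00 00 00 00")
print(find_by_data_offset(ongs_descriptor, 0x08))  # (0, 'ONGS', 3936)
```

In a real file, <code>descriptors</code> would be the bytes from 0x40 up to 0x40 + 20 × (number of instance descriptors).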


==Name table==
The name table stores all the instance names as C-style strings (terminated by a zero byte). We peeked at this before when we looked at the instance descriptor for SUBTsubtitles.


{{Table}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | string  | | 53 55 ... 00 | "SUBTsubtitles" | name string (zero-terminated) }}
|}


These names can be up to 63 characters long, counting the tag.


{{OBD}}