OBD:Instance file format: Difference between revisions

m
Reverted edits by Iritscen (talk) to last revision by Neo
{{OBD Home}}
All instance files begin with a 64 byte header followed by 3 "descriptor" arrays, a data table and a name table. Among other things the header contains the number of descriptors in each of the 3 arrays and the offset of the data and name tables (relative to the start of the file).
:''".dat" redirects here; for other files ending in ".dat", see [[Oni (folder)]].''
:''You should read the [[Game data terminology]] page before this one.''
:''The [[Raw|Raw and separate file formats]] page should be read after this one.''


Files in GameDataFolder/ named "level[0-19]_Final.dat", together with ".raw" and sometimes ".sep" counterparts, contain the game data for Oni.
The same format was used for the tool files, named level0_Tools.dat/.raw/.sep; however, the retail Oni game application does not load tool files. For the story behind the tool files, see [[Big Blue Box|HERE]].
The level 0 files do not actually contain a level, but instances (resources) shared across all levels. Level 0 is loaded when the game starts, and never unloaded. All other level files, 1-19, are only loaded when their corresponding level starts, and unloaded when it ends. Since Oni can only hold these two levels in memory concurrently, resources have to be duplicated on disk whenever a character class, sound effect, etc. occurs in more than one level. For instance, although there are only 2,380 unique sounds in the game, there are 7,386 sounds stored across all level data files.
{{TOClimit}}
==Backwards and garbage data==
During development, Oni had an [[level0_Tools|in-game editor]] which presented a GUI for manipulating AIs, particles, etc. in a level. When a developer saved his work, the contents of the level, stored in RAM, were written directly to disk. Thus, the structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory. So when we read the data in the files with a hex editor, we can see eccentricities such as blank space and garbage data that represent various RAM contents from the developer's PC such as padding and pointers.
Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte. [[wikipedia:FourCC|FourCCs]] in the data are stored "backwards", such as "13RV" which is meant to be read "VR31", because Bungie defined those four bytes as a 32-bit integer, not a string, causing them to be written to disk in little-endian order.
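The byte-order effect is easy to reproduce. The following is a minimal Python sketch, not code from Oni; the helper name <code>read_tag</code> is our own invention. It reads four bytes as a little-endian 32-bit integer and then spells that integer out most-significant byte first, which recovers the tag as it was written in the source code.

```python
import struct

def read_tag(raw: bytes) -> str:
    """Interpret 4 bytes as a little-endian 32-bit integer and return the FourCC it encodes."""
    (value,) = struct.unpack("<I", raw)
    # Writing the integer back out most-significant byte first gives the readable tag.
    return value.to_bytes(4, "big").decode("ascii")

on_disk = b"13RV"          # bytes 31 33 52 56, as seen in a hex editor
print(read_tag(on_disk))   # VR31
```

In practice, simply reversing the four bytes gives the same result; the integer round trip is shown only to make the cause visible.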
==File limits==
*Max level number: 127
*Max number of instance files in GameDataFolder: 512 (Windows), 16 (Windows demo, Mac)
*Max number of simultaneously loaded instance files: 64
*Max number of instances in a file: 131071
*Max length of an instance file name: 31
*Max length of an instance name: 63 (including the 4 character template tag)
==Header==
Here is a walkthrough of an instance file using the level0_Final.dat in English Windows Oni. Follow along in a hex editor for maximum learnage. Each term will be explained in-depth when we fully consider the related data. First, here is how the file begins:
{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | int64  | | 1F 27 DC 33 DF BC 03 00 | 0x0003BCDF33DC271F | Windows level file total template checksum; Windows demo and Mac retail/demo use 0x0003BCDF23C13061 instead }}
{{OBDtr| 0x00 | int64  | | 1F 27 DC 33 DF BC 03 00 | 0x0003BCDF33DC271F | PC template checksum; PC Demo and Mac files use 0x0003BCDF23C13061 instead }}
{{OBDtr| 0x08 | int32  | | 31 33 52 56            | 'VR31'            | .dat version; OniSplit's .oni files use 'VR32' instead }}
{{OBDtr| 0x08 | int32  | | 31 33 52 56            | 'VR31'            | .dat version; .oni files use 'VR32' instead }}
{{OBDtr| 0x0C | int16  | | 40 00 14 00 10 00 08 00 | 0x0008001000140040 | signature }}
{{OBDtr| 0x0C | int64   | | 40 00 14 00 10 00 08 00 | 0x0008001000140040 | signature }}
{{OBDtr| 0x14 | int32  | | F9 05 00 00 | 1529     | instance descriptor count  }}
{{OBDtr| 0x18 | int32  | | D5 01 00 00 | 469      | name descriptor count }}
{{OBDtr| 0x1C | int32  | | 32 00 00 00 | 50       | template descriptor count }}
{{OBDtr| 0x14 | int32  | | 83 24 00 00 | 9347     | instance descriptor count  }}
{{OBDtr| 0x20 | int32  | | 80 89 00 00 | 0x008980 | data table offset }}
{{OBDtr| 0x18 | int32  | | D4 1B 00 00 | 7124      | name descriptor count }}
{{OBDtr| 0x24 | int32  | | 20 6E 55 00 | 0x556E20  | data table size }}
{{OBDtr| 0x1C | int32  | | 38 00 00 00 | 56       | template descriptor count }}
{{OBDtr| 0x28 | int32  | | A0 F7 55 00 | 0x55F7A0 | name table offset }}
{{OBDtr| 0x20 | int32  | | A0 BC 03 00 | 0x03BCA0 | data table offset }}
{{OBDtr| 0x2C | int32  | | 35 1D 00 00 | 0x001D35  | name table size }}
{{OBDtr| 0x24 | int32  | | A0 35 25 00 | 2438560  | data table size }}
{{OBDtr| 0x30 | int32  | | 00 00 00 00 |          | OniSplit only: raw table offset }}
{{OBDtr| 0x28 | int32  | | 40 F2 28 00 | 0x28F240 | name table offset }}
{{OBDtr| 0x34 | int32  | | 00 00 00 00 |          | OniSplit only: raw table size }}
{{OBDtr| 0x2C | int32  | | 04 4F 02 00 | 151300    | name table size }}
{{OBDtr| 0x30 | int32  | | 00 00 00 00 |          | used by OniSplit only: raw table offset }}
{{OBDtr| 0x34 | int32  | | 00 00 00 00 |          | used by OniSplit only: raw table size }}
{{OBDtr| 0x38 | int32  | | 00 00 00 00 |          | unused }}
{{OBDtr| 0x38 | int32  | | 00 00 00 00 |          | unused }}
{{OBDtr| 0x3C | int32  | | 00 00 00 00 |          | unused }}
{{OBDtr| 0x3C | int32  | | 00 00 00 00 |          | unused }}
|}
|}


The file's '''total template checksum''' is the sum of all the template checksums, and its value tells us that this level data is in the .dat/.raw file scheme, as opposed to the .dat/.raw/.sep file scheme used by Mac Oni and the Windows demo of Oni.


The '''version''' of the instance file is the format version. Reading it backwards, as discussed under the "Backwards and garbage data" section, we get "VR31", which is probably "version 31". This is the format version of all instance files in all releases of Oni, regardless of file scheme.


The '''signature''' is identical in all instance files.
The "instance descriptors" array stores information about every instance contained in the file.  


The '''descriptor counts''' are the sizes of arrays which are coming up soon in this file: the instance, name and template descriptors. For instance, the instance descriptor array will be 0x2483, or 9,347, items in length.
{{Table}}
 
{{OBD_Table_Header}}
Next we are told the addresses and sizes of the '''data and name tables''' in the instance file. The name table simply follows the data table, as you'll see if you add the data table size to the data table offset, but that doesn't mean the name table offset is redundant: if the end of the data table were not 32-bit-aligned, the name table would probably be moved down to start at the next 32-bit word. That is unnecessary here because the offset happens to be aligned already.
{{OBDtr| 0x00 | tag    | | 41 53 49 41 | 'AISA'   | template tag }}
{{OBDtr| 0x04 | int32  | | 08 00 00 00 | 0x0008    | data offset (relative to data table) }}
{{OBDtr| 0x08 | int32  | | 00 00 00 00 | 0x0000    | name offset (relative to name table) }}
{{OBDtr| 0x0C | int32  | | 80 01 00 00 | 0x0180    | data size }}
{{OBDtr| 0x10 | int32  | | 00 00 00 00 | 0        | flags; possible values:
:0x'''01''' 00 00 00 - unnamed
:0x'''02''' 00 00 00 - empty
:0x'''04''' 00 00 00 - never used; appears to mean "big endian" data  
:0x'''08''' 00 00 00 - shared }}
|}


After this come four "int"s of '''zeroes'''. Empty space like this is common in the data files, and indicates that something stored in memory at this relative position was not written to disk (probably pointers, or sometimes a space reserved for possible future use in a resource type).


That concludes the header of the instance file. Immediately after this header, we find the instance descriptors array.
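To tie the header fields together, here is a minimal Python sketch that unpacks the 64 bytes listed in the table for level0_Final.dat. The field names and the <code>parse_header</code> helper are our own labels, not names from Oni's code, and the sample bytes are typed in from the table rather than read from a file.

```python
import struct

# checksum (8 bytes), version (4), signature (8), then eleven 32-bit integers = 64 bytes
HEADER = struct.Struct("<Q4s8s11i")

FIELD_NAMES = ("instance_count", "name_count", "template_count",
               "data_offset", "data_size", "name_offset", "name_size",
               "raw_offset", "raw_size")

def parse_header(buf: bytes) -> dict:
    """Unpack the fixed-size header found at the start of an instance file."""
    fields = HEADER.unpack_from(buf, 0)
    header = dict(zip(FIELD_NAMES, fields[3:12]))  # the last two ints are unused
    header["checksum"] = fields[0]
    header["version"] = fields[1][::-1].decode("ascii")  # stored "backwards"
    header["signature"] = fields[2]
    return header

sample = bytes.fromhex(
    "1F27DC33DFBC0300"     # total template checksum
    "31335256"             # version
    "4000140010000800"     # signature
    "83240000"             # instance descriptor count
    "D41B0000"             # name descriptor count
    "38000000"             # template descriptor count
    "A0BC0300"             # data table offset
    "A0352500"             # data table size
    "40F22800"             # name table offset
    "044F0200"             # name table size
    "00000000" "00000000"  # raw table offset and size (OniSplit only)
    "00000000" "00000000"  # unused
)

header = parse_header(sample)
print(header["version"], header["instance_count"], header["name_count"])
# The name table directly follows the data table in this file.
print(header["data_offset"] + header["data_size"] == header["name_offset"])
```

To try this on a real file, pass the first 64 bytes of the .dat file to <code>parse_header</code> instead of <code>sample</code>.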


==Instance descriptors==
The "name descriptors" array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to find instances by name more quickly. It is also used when finding instances by type.
The instance descriptor array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. The descriptors start at 0x40 in the .dat file, but below is a descriptor found at 0x017B50 in the file which makes a better example. In the table below, we use offsets relative to the start of this descriptor.


{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | tag    | | 54 42 55 53 | 'SUBT'    | template tag }}
{{OBDtr| 0x00 | int32  | | 00 00 00 00 | 0   | instance number }}
{{OBDtr| 0x04 | int32  | | C8 30 22 00 | 0x2230C8  | data offset (relative to data table) }}
{{OBDtr| 0x04 | int32  | | 00 00 00 00 | 0   | runtime: pointer to instance name }}
{{OBDtr| 0x08 | int32  | | 01 CB 00 00 | 0xCB01   | name offset (relative to name table) }}
{{OBDtr| 0x0C | int32  | | C0 09 00 00 | 2496      | data size }}
{{OBDtr| 0x10 | int32  | | 00 00 00 00 | 0         | flags; possible values:
:0x'''01''' 00 00 00 - unnamed
:0x'''02''' 00 00 00 - empty
:0x'''04''' 00 00 00 - never used; appears to mean "big-endian" data
:0x'''08''' 00 00 00 - shared }}
|}
|}


This descriptor tells us that a resource of '''type''' SUBT (a subtitle file for Oni; there are only two of these, one containing all speech subtitles, and one for help messages) has '''data''' that can be found 0x2230C8 bytes into the data table, which we learned from the file header starts at 0x03BCA0. Its '''name''' can be found 0xCB01 bytes into the name table that starts, according to the file header, at 0x28F240. The '''data's size''' is 0x09C0, or 2,496, bytes.  The '''flags''' "unnamed" and "empty" require special explanation.


===Unnamed and empty resources===
You'll notice that the level file header lists fewer names (7,124) than instances (9,347). That's because there are 3 types of instance:
*Unnamed and not empty - they are only referenced by other instances in the same file, generally as child data (e.g., 3D geometry elements like ABNA are "contained" by AKEV, a level's environment).
*Named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them.
*Named and empty - "empty" instances are used in level-specific instance files (i.e. not in level0_Final.dat) to associate an instance ID with a name. For every empty resource, there's another one with a matching name in level0_Final.dat that has data in it. The empty resource in the instance file is (usually) looked up by ID, then the engine searches all the loaded files for a non-empty instance with the same name, causing it to find the actual file in the global data in level0_Final.dat.
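As an illustration of the descriptor layout and the flag bits described above, here is a minimal Python sketch that decodes the 20-byte SUBT descriptor from the table. The constant names and the <code>parse_descriptor</code> helper are our own; only the bit values come from the flags list.

```python
import struct

# tag, data offset, name offset, data size, flags: 20 bytes in total
DESCRIPTOR = struct.Struct("<4s4I")

FLAG_UNNAMED = 0x01
FLAG_EMPTY = 0x02
FLAG_SHARED = 0x08

def parse_descriptor(buf: bytes, pos: int = 0) -> dict:
    """Decode one instance descriptor starting at byte position pos."""
    tag, data_offset, name_offset, data_size, flags = DESCRIPTOR.unpack_from(buf, pos)
    return {
        "tag": tag[::-1].decode("ascii"),     # tags are stored "backwards"
        "data_offset": data_offset,           # relative to the data table
        "name_offset": name_offset,           # relative to the name table
        "data_size": data_size,
        "named": not (flags & FLAG_UNNAMED),
        "empty": bool(flags & FLAG_EMPTY),
        "shared": bool(flags & FLAG_SHARED),
    }

# The SUBT descriptor bytes, typed in from the table
subt = bytes.fromhex("54425553" "C8302200" "01CB0000" "C0090000" "00000000")
print(parse_descriptor(subt))
```

With flags of 0, the SUBT instance comes out as named and not empty, which is the second of the three types listed.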


===Peeking at instance name===
The "template descriptor" array contains information about all templates used in the file. The template checksum is used to prevent loading of instance files that are not compatible with the current engine version.
Before we talk about the name table in depth, we can peek ahead at the name of this resource using the offset we've just been given. Let's add the offset 0xCB01 to 0x28F240, the file header's address for the name table. This gives us the address 0x29BD41. There we find the string "SUBTsubtitles".
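The lookup can be sketched in a few lines of Python. The <code>read_name</code> helper is our own, and the buffer below is a tiny synthetic stand-in rather than real file contents; with a real level0_Final.dat loaded into <code>dat</code>, the call would be <code>read_name(dat, 0x28F240, 0xCB01)</code>.

```python
def read_name(dat: bytes, name_table_offset: int, name_offset: int) -> str:
    """Read the zero-terminated name found name_offset bytes into the name table."""
    start = name_table_offset + name_offset
    end = dat.index(b"\x00", start)  # names are C-style strings
    return dat[start:end].decode("ascii")

# The address arithmetic from the walkthrough
print(hex(0x28F240 + 0xCB01))  # 0x29bd41

# Synthetic stand-in: 16 filler bytes, then a name table holding two names
fake = b"\xAA" * 16 + b"AISAlevel1_scripts\x00SUBTsubtitles\x00"
print(read_name(fake, 16, 19))  # SUBTsubtitles
```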
 
===Peeking at instance data===
The actual subtitle data should be found by adding the offset 0x2230C8 to 0x03BCA0, the file header's address for the data table, to get 0x25ED68. We're going to leave the full details of the data table for later, but below is the data you should actually see for the English Oni SUBT file at this address. You have to consult the [[SUBT]] page to know how to read this data.


{{Table}}
{{Table}}
{{OBDth}}
{{OBD_Table_Header}}
{{OBDtr| 0x08 | char[16] | | AD DE      | dead        | unused }}
{{OBDtr| 0x00 | int64  | | A0 6D 12 00 00 00 00 00 | 0x126DA0    | template checksum }}
{{OBDtr| 0x18 | offset  | | 80 44 44 01 | 0x01444480  | raw file data address }}
{{OBDtr| 0x0C | tag    | | 41 4E 42 41            | 'ABNA'      | template tag }}
{{OBDtr| 0x1C | int32   | | 61 02 00 00 | 609        | array size }}
{{OBDtr| 0x08 | int32   | | 01 00 00 00             | 1          | number of instances that use this template }}
|}
|}


After '''padding''' of 16 unused bytes, we find that, instead of data, there's an address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.


The '''array size''' of 609 tells the part of the engine that reads SUBT data to expect a chunk of 609 subtitled lines of dialogue.


==Name descriptors==
The data table stores all the instance data. The instance ID is always stored 32-byte-aligned (thus the instance-specific data will always be found at an offset like 0x0008, 0x0028, 0x0148 etc.). The instance ID and file ID are not actually part of the instance data. The engine always has pointers to the "instance-specific data", and the instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance given a pointer to it).
The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In this case, that means 20 * 9347 = 186940, or 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.
 
The name descriptor array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to quickly find instances by name. It is also used when finding instances by type. However the addresses of these instances in memory cannot be known until the file is loaded into RAM, so a space of 32 bits is reserved for that runtime pointer.
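Why sorted names help can be shown with a short Python sketch. This is not Oni's implementation: the function names are ours, the names and instance numbers are invented for the example, and we assume a plain byte-wise comparison, which may differ from the ordering rule the engine actually uses. Because every name begins with its four-character template tag, a sorted list also keeps all instances of one type next to each other, which is presumably what makes lookup by type cheap.

```python
from bisect import bisect_left

def find_by_name(sorted_names, instance_numbers, wanted):
    """Binary-search the sorted names; return the matching instance number or None."""
    i = bisect_left(sorted_names, wanted)
    if i < len(sorted_names) and sorted_names[i] == wanted:
        return instance_numbers[i]
    return None

def find_by_tag(sorted_names, instance_numbers, tag):
    """Return the instance numbers of all names that start with the given tag."""
    i = bisect_left(sorted_names, tag)  # first name >= tag
    found = []
    while i < len(sorted_names) and sorted_names[i].startswith(tag):
        found.append(instance_numbers[i])
        i += 1
    return found

# Invented example data, already in sorted order
names = ["AISAexample_a", "SUBTexample_a", "SUBTexample_b", "TXMPexample_a"]
numbers = [7, 120, 3, 55]

print(find_by_name(names, numbers, "SUBTexample_b"))  # 3
print(find_by_tag(names, numbers, "SUBT"))            # [120, 3]
```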


{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | int32  | | 15 16 00 00 | 5653      | instance number }}
{{OBDtr| 0x00 | int32  | | 01 00 00 00 | 1          | instance id }}
{{OBDtr| 0x04 | int32  | | 60 2C 1C 0E | (garbage) | runtime: pointer to instance name }}
{{OBDtr| 0x04 | int32  | | 01 00 00 02 | 0x02000001 | file id }}
{{OBDtr| 0x08 |        | |            |           | [[OBD:File_types|instance specific data]] }}
|}
|}


==Template descriptors==
Likewise, the template descriptor array starts directly after the name descriptors. Since name descriptors are 8 bytes, 8 * 7124 (taken from the header) = 56992, or 0xDEA0, and adding that to the name descriptor array's start address (0x02DA7C) gives us 0x03B91C as the start of the template descriptors.
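The offset arithmetic for the three descriptor arrays can be checked with a short Python sketch. The function and constant names are our own; the sizes are the 64-byte header, the 20-byte instance descriptor, the 8-byte name descriptor, and the 16 bytes occupied by a template descriptor's three fields (8 + 4 + 4).

```python
HEADER_SIZE = 0x40
INSTANCE_DESC_SIZE = 20
NAME_DESC_SIZE = 8
TEMPLATE_DESC_SIZE = 16

def descriptor_layout(instance_count: int, name_count: int, template_count: int):
    """Return the start of each descriptor array and the end of the last one."""
    instances_at = HEADER_SIZE
    names_at = instances_at + INSTANCE_DESC_SIZE * instance_count
    templates_at = names_at + NAME_DESC_SIZE * name_count
    templates_end = templates_at + TEMPLATE_DESC_SIZE * template_count
    return instances_at, names_at, templates_at, templates_end

# Counts from the level0_Final.dat header
print([hex(x) for x in descriptor_layout(9347, 7124, 56)])
# ['0x40', '0x2da7c', '0x3b91c', '0x3bc9c']
```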


The template descriptor array contains information about all templates (that is, resource types, AKA tags) used in the file (56 in this case, as we learned from the file header). Any resource occurring in this instance file has to have its type listed here.
 
The name table stores all the instance names as C style strings (terminated by 0).


{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | int64  | | A0 6D 12 00 00 00 00 00 | 0x126DA0    | template checksum }}
{{OBDtr| 0x00 | string  | | 41 49 ... 00 | "AISAlevel1_scripts" | name string (0 terminated) }}
{{OBDtr| 0x08 | tag    | | 41 4E 42 41            | 'ABNA'      | template tag }}
{{OBDtr| 0x0C | int32  | | 01 00 00 00             | 1          | number of resources in file that use this template }}
|}
|}


The '''template checksum''' is used to prevent loading of instance files that are not compatible with the current engine version. The '''tag''' is the same kind of number-written-as-backwards-ASCII that we discussed in the "Backwards and garbage data" section. The '''number of resources''' is self-explanatory.


You might wonder how Oni knows how to read each type of data, such as a SUBT or an ABNA. The simple answer is that this information is hard-coded into Oni. In fact, the information on each instance type, as stored in Oni's code, is actually the real "template". The file data only gives the tag and checksum that refer to a certain template. Which types of data fields are encountered in which order is already known by Oni. These hardcoded templates also tell Oni which parts of the file data are reserved for pointers.
;Instance (.dat) file
An instance file is a dump of the engine's in-memory data structures. It is accompanied by a .raw file and a .sep file (the .sep file is only used and present in the PC Demo and Mac versions of the game), which store additional data (usually large and unstructured, like texture or sound data) needed by some instance types. [[OniSplit]]-generated .oni files are PC .dat files with all the data contained in the .raw/.sep files appended at the end.
 
 
;Binary (.raw, .sep) file
Binary files do not have any file header. The only rule about binary files is that all data parts are stored 32-byte-aligned and the first 32 bytes of the file are always 0 (reserved to represent NULL pointers). Instances store file offsets into binary files, and at load time the offsets are converted to pointers.
 


That's because an instance may have pointers to other related instances, but pointers are only valid in memory; they cannot be stored on disk. They must be set when the level data is loaded into memory and the address in RAM has been determined. So one type of data field in Oni's templates is a raw pointer; on Macs and the Windows demo, there is an additional "separate offset" type. The pointer and offset are 32 bits in length, as one would expect since Oni was compiled for 32-bit PCs.


Incidentally, the templates in Oni's code have not just the familiar four-character tags attached to them, but also a descriptive string, e.g. "BSP Tree Node Array". This is the source of the names on the [[OBD:File types|File types]] page.
;Instance file name
An instance file name has the following structure:
levelN_T.dat
where N is the level number (from 0 to 127) and T is the type of file. Known types are "Final" and "Tools". The original exe only loads "Final" files.


==Data table==
The data table stores all the instance data (or points to its actual location in a raw/separate file). We peeked at this before when we looked at the instance descriptor for SUBTsubtitles.


The start of each instance's record, the ID number, is always 32-byte-aligned. Thus, even though the template descriptors ended at 0x03BC9C, there are four empty bytes here so that the data table can begin at 0x03BCA0, which divides evenly by 32. This alignment also means that the instance-specific data will always be found at an offset like 0x0008, 0x0028, 0x0148 etc.
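The alignment rule can be verified with a small Python sketch. Both helper names are ours, and rounding up is only our reading of how the four empty bytes come about; the header states the data table offset explicitly, so a reader of the file never has to compute it.

```python
def align_up(offset: int, alignment: int = 32) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment

def record_is_aligned(data_offset: int) -> bool:
    """True if the record that holds this data (starting 8 bytes earlier) is 32-byte-aligned."""
    return (data_offset - 8) % 32 == 0

# End of the template descriptors, rounded up to the start of the data table
print(hex(align_up(0x03BC9C)))  # 0x3bca0

# The example offsets from the text, plus the SUBT data offset seen earlier
print([record_is_aligned(o) for o in (0x0008, 0x0028, 0x0148, 0x2230C8)])
# [True, True, True, True]
```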
;Level 0 file
The level 0 file does not actually contain a level, but instances shared across all levels. It is loaded first when the game starts and is never unloaded. All other level files are only loaded when the corresponding level starts and unloaded when it ends.


The instance ID and file ID are not actually part of the instance data, but are considered to be the resource header. The engine always keeps pointers to the start of the type-specific data itself; we saw this before when we jumped to 0x25ED68 and saw the data for the SUBT rather than the header for this data. The instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).
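The "negative offset" access can be sketched in Python as follows. The <code>resource_header</code> helper is our own name, and the buffer is a synthetic one-record data table rather than real file contents; the only assumption taken from the text is that the two 32-bit IDs occupy the 8 bytes immediately before the type-specific data.

```python
import struct

def resource_header(dat: bytes, data_table_offset: int, data_offset: int):
    """Return (instance_id, file_id) stored in the 8 bytes before an instance's data."""
    data_pos = data_table_offset + data_offset
    return struct.unpack_from("<II", dat, data_pos - 8)

# Synthetic data table at file offset 0: instance ID 1, file ID 1, then 24 bytes of data
fake = struct.pack("<II", 1, 1) + b"\x11" * 24

instance_id, file_id = resource_header(fake, 0, 0x08)
print(instance_id, file_id)  # 1 1
```

With a real level0_Final.dat in <code>dat</code>, the SUBT record seen earlier would be read with <code>resource_header(dat, 0x03BCA0, 0x2230C8)</code>.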


{{Table}}
;Instance descriptors
{{OBD_Table_Header}}
There are 3 types of instance descriptors:
{{OBDtr| 0x00 | res_id  | | 01 00 00 00 | 1  | instance ID }}
*unnamed - they are referenced by other instances in the same file and the engine never reaches them directly
{{OBDtr| 0x04 | lev_id  | | 01 00 00 00 | 1  | file ID }}
*named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them
{{OBDtr| 0x08 | ...    | | ...        | ... | [[OBD:File_types|type-specific data]]... }}
*named and empty - the instance data is stored in a different file and they exist only to associate an instance ID with a name; when an instance references such an instance ID, the engine searches all the loaded files for a non-empty instance with the same name
|}


The '''instance's ID''' is computed as:
(instance_descriptor_index << 8) <nowiki>|</nowiki> 1
The 1 allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte-aligned, so it will never have bit 0 set).


The '''file ID''' is computed from the name of the instance file. For "_Final" files the file ID is computed as:
;Instance ID
  (level_number << 25) <nowiki>|</nowiki> 1
The ID of an instance is computed as:
Again, the 1 allows the engine to know which file IDs have already been converted to pointers.
  (instance_descriptor_index << 8) <nowiki>|</nowiki> 1
The 1 allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte-aligned, so it can never have bit 0 set).
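Both ID formulas, and the bit-0 test they make possible, fit in a few lines of Python. The function names are our own; the file ID formula is the one given for "_Final" files.

```python
def make_instance_id(descriptor_index: int) -> int:
    """Instance ID as stored on disk: descriptor index in the upper bits, bit 0 set."""
    return (descriptor_index << 8) | 1

def make_file_id(level_number: int) -> int:
    """File ID for a "_Final" instance file: level number in the top bits, bit 0 set."""
    return (level_number << 25) | 1

def is_unconverted_id(value: int) -> bool:
    """Aligned pointers have bit 0 clear, so a set bit 0 marks an ID not yet converted."""
    return (value & 1) == 1

def descriptor_index(instance_id: int) -> int:
    """Recover the instance descriptor index from an on-disk instance ID."""
    return instance_id >> 8

print(hex(make_instance_id(0)))  # 0x1
print(hex(make_file_id(1)))      # 0x2000001
print(hex(make_file_id(127)))    # 0xfe000001
```

Note that level 127, the maximum listed under the file limits, is the largest level number whose file ID still fits in 32 bits.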


As you can see, after the header, the size of the actual instance data can be almost anything. Thus, we cannot compute the end of the data table in any simple way. That's why the instance file header explicitly gives us the address of the name table that comes after this.


By the way, how do we know which resource's data we're looking at in the data table? Let's look at the very first data, at 0x03BCA0. Noting that the first two numbers, the instance and file ID, do not count as data, there must be a resource with a data offset of 0x08, the lowest offset possible into the table. We can find this offset listed right at the start of the instance descriptor array:
;File ID
The file ID is computed from the name of the instance file. For "_Final" files the file ID is computed as:
(level_number << 25) <nowiki>|</nowiki> 1
Again the 1 allows the engine to know which file IDs have already been converted to pointers.


{{Table}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | tag    | | 53 47 4E 4F | 'ONGS'  | template tag }}
{{OBDtr| 0x04 | int32  | | 08 00 00 00 | 0x08    | data offset (relative to data table) }}
{{OBDtr| 0x08 | int32  | | 00 00 00 00 | 0x00    | name offset (relative to name table) }}
{{OBDtr| 0x0C | int32  | | 60 0F 00 00 | 3936    | data size }}
{{OBDtr| 0x10 | int32  | | 00 00 00 00 | 0      | flags }}
|}


So this tells us that the first data in the data table belongs to the ONGS resource, and that it extends for 3,936 bytes.
;Templates
An instance can have pointers to other instances, but since pointers are only valid in memory, they are converted to instance identifiers when the file is saved and converted back to pointers when the file is loaded into memory. To be able to do this, the engine must know where the pointers are, and this is done using "templates". A template contains:
*a checksum of the data contained by the template (the checksum algorithm is unknown)
*a 4-letter tag used to identify the template (ABNA, ONCC, WMDD etc.)
*a short description of the data structure like "BSP Tree Node Array"
*a list of all the data structure's fields and their types
*other data that appears to be unused, like the size of the fixed part and the size of an array element for data structures that contain variable-length arrays


==Name table==
The name table stores all the instance names as C-style strings (terminated by a zero byte). We peeked at this before when we looked at the instance descriptor for SUBTsubtitles.


{{Table}}
;Absolute limits
{{OBD_Table_Header}}
*Max level number: 127
{{OBDtr| 0x00 | string  | | 53 55 ... 00 | "SUBTsubtitles" | name string (zero-terminated) }}
*Max number of instance files in GameDataFolder: 512 (PC), 16 (PC Demo, Mac)
|}
*Max number of simultaneously loaded instance files: 64
*Max number of instances in a file: 131071
*Max length of an instance file name: 31
*Max length of an instance name: 63 (including the 4 character template tag)


These names can be up to 63 characters long, counting the tag. The instance file concludes with the end of the name table.


{{OBD}}
{{OBD}}