OBD:Instance file format: Difference between revisions

==Terminology==
{{UpdatedForOniX|1.0.0}}<!--Documentation below is waiting to be un-commented.-->
Oni's level data is broken into two kinds of files in Windows retail Oni. One type ends in ".dat" and is called an instance file. An "instance" is essentially a resource, in plain English, such as a texture. Before raw and separate files existed, all resources would have been stored in the levelX_Final.dat file, so it was rightfully called an "instance file". The second type of file ends in ".raw" and is simply called a raw file. Windows demo Oni and Mac retail/demo Oni use a third type which ends in ".sep", short for "separate". You can read about raw and separate files [[Raw|HERE]].
{{OBD Home}}
{{Hatnote|".dat" redirects here; for other files ending in ".dat", see [[Oni (folder)]].<br>
:You should read the [[Game data terminology]] page before this one.<br>
:The [[Raw|Raw and separate file formats]] page should be read after this one.}}
Files in GameDataFolder/ named "level[0-19]_Final.dat", together with ".raw" and sometimes ".sep" counterparts, contain the game data for Oni. These are called "instance files" internally, but a more common-sense name for them is level data files. The format described below was also used for the tool files which supplied the GUI for the in-game editor; however, the retail Oni application refuses to load tool files. For the story behind the tool files, see [[level0_Tools]].


Note that ".dat" is a generic suffix originally used by Oni for all kinds of data, including [[persist.dat]]. The only reason that any other suffixes exist at all is that raw and separate files were created later in development and given unique suffixes to distinguish them from the .dat files in the same folder. Therefore, the proper, specific name for the <u>level data format</u>, as opposed to the save-game format, film format, etc. is not ".dat file" or "DAT file", but "instance file". That being said, ".dat" has only been used by the community historically to refer to instance files, so you can reasonably assume that's what is meant when you see the suffix.
The level 0 files do not contain resources for a specific level but rather resources (instances) shared across all levels. Level 0 is loaded when the game starts and is never unloaded. All other level files, 1-19, are only loaded when their corresponding level starts and then unloaded when it ends. Oni can only hold two level files in memory concurrently. Thus, resources have to be duplicated on disk whenever a character class, sound effect, etc. occurs in more than one level. For instance, although there are only 2,380 unique sounds in the game, there are 7,386 sounds stored across all level data files.
{{TOClimit}}
==Backwards and garbage data==
As mentioned, the game's developers used the in-game editor to create AIs, particles, etc. in a level. When one of these developers saved his work, the contents of the level, stored in his PC's RAM, were flushed directly to disk. Thus the structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory. So when we read the data in the files with a hex editor, we can see eccentricities such as blank space (coming from unused fields and byte-alignment padding) and garbage data (such as now-meaningless pointer values). [[OBD:Raw and separate file formats#Gaps|Further gaps]], mostly representing orphaned obsolete resources, add up to about 25 MB for the whole game.


==Introduction==
Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte. [[wp:FourCC|FourCCs]] in the data are stored "backwards", such as "13RV" which is meant to be read "VR31", because Bungie defined those four bytes as a 32-bit integer, not a string, causing them to be written to disk in little-endian order.
Instance files are the "main" type of data file in the sense that, when loading a level, Oni reads the instance file first, and this file serves as an index that allows it to find resources which are packed back-to-back into the raw and separate files. All instance files begin with a 64 byte header followed by 3 "descriptor" arrays, a data table and a name table. Among other things, the header contains the number of descriptors in each of the 3 arrays and the offset of the data and name tables (relative to the start of the file).


During development, Oni had in-game editing tools. These tools presented a GUI for things like placing AIs and setting their attributes, editing particles, etc. When a developer saved his work, the contents of the level, stored in RAM, were written directly to disk. The structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory, and thus when we read the data in the files with a hex editor, we can see various eccentricities such as blank space and garbage data that represented unused memory on the development machine.
==File limits==
 
*Max level number: 127
Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which looks "backwards" from the standpoint of a culture that reads left-to-right. When Macs, which were big-endian at the time due to their PowerPC architecture, read these files, they then had to flip each sequence of bytes in memory before they could be understood.
*Max number of instance files in GameDataFolder: 512 (Windows), 16 (Windows demo, Mac)
*Max number of simultaneously loaded instance files: 64
*Max number of instances in a file: 131071
*Max length of an instance file name: 31
*Max length of an instance name: 63 (including the 4 character template tag)


An exception to this backwards-writing rule is strings of ASCII characters. These are not numbers and are not subject to endianness, so they are written left-to-right. This may not seem to be the case as you continue reading below: the first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". These four-character strings are backwards because Oni stored them as numbers. For instance, writing the number 1,448,227,633 (0x56523331) to disk in little-endian order results in the bytes 0x31, 0x33, 0x52, and 0x56, which are the ASCII codes for '1', '3', 'R' and 'V'. This provided a combination of convenient storage in memory as a number and human-readability on disk.
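The round trip between a tag's numeric value and its on-disk bytes can be reproduced with a short Python sketch. The helper names <code>tag_on_disk</code> and <code>tag_from_disk</code> are made up for this illustration; they are not engine or OniSplit functions.

```python
import struct

def tag_on_disk(value: int) -> bytes:
    """Write a 32-bit tag value the way a little-endian machine would."""
    return struct.pack("<I", value)

def tag_from_disk(raw: bytes) -> str:
    """Reverse the four on-disk bytes to recover the readable tag."""
    return raw[::-1].decode("ascii")

raw = tag_on_disk(1448227633)      # 1,448,227,633 == 0x56523331
print(raw)                         # b'13RV'
print(tag_from_disk(raw))          # VR31
print(tag_from_disk(b"TBUS"))      # SUBT
```

The same reversal applies to every four-character template tag shown in the tables on this page.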
==Header==
 
Here is a walkthrough of an instance file using the level0_Final.dat in English Windows Oni. Follow along in a hex editor for maximum educational value. Each term will be explained in-depth when we fully consider the related data. First, here is how the file begins:
==Walkthrough==
===Header===
Here is a walkthrough of an instance file using the level0_Final.dat in English Windows Oni. Follow along in a hex editor for maximum learning. Each term will be explained in-depth when we fully consider the related data. First, here is how the file begins:
{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | int64   | | 1F 27 DC 33 DF BC 03 00 | 0x0003BCDF33DC271F | Windows template checksum; Windows demo and Mac retail/demo use 0x0003BCDF23C13061 instead }}
{{OBDtr| 0x00 | uint64   | | 1F 27 DC 33 DF BC 03 00 | 0x0003BCDF33DC271F | Total template checksum (main indicator of engine compatibility):
{{OBDtr| 0x08 | int32   | | 31 33 52 56             | 'VR31'             | .dat version; .oni files use 'VR32' instead }}
*0x0003BCDF33DC271F (PC v1.0) - templates compatible with Windows retail engine
{{OBDtr| 0x0C | int64   | | 40 00 14 00 10 00 08 00 | 0x0008001000140040 | signature }}
*0x0003BCDF23C13061 (PC v1.1) - templates compatible with Windows demo and Mac engines
{{OBDtr| 0x14 | int32   | | 83 24 00 00 | 9347      | instance descriptor count  }}
*0x0003BA70A8DBAE11 (PS2) - templates compatible with PlayStation 2 engine
{{OBDtr| 0x18 | int32   | | D4 1B 00 00 | 7124      | name descriptor count }}
<!--*0x0000000000000000 (blank) - for use with [[OniX]] engine, which instead handle data versioning using the 0x3C field below-->
{{OBDtr| 0x1C | int32   | | 38 00 00 00 | 56        | template descriptor count }}
OniSplit's .oni files use PC 1.0 checksum by default and 1.1 checksums when holding data that is stored differently in the 1.1 format (SNDD, TXMP, AGQG, M3GM, IGSt, TSFT/TSGA, TRAM/TREX) }}
{{OBDtr| 0x20 | int32   | | A0 BC 03 00 | 0x03BCA0  | data table offset }}
{{OBDtr| 0x08 | uint32   | | 31 33 52 56 | '13RV'   | .dat version (meant to be read as "VR31")<br>OniSplit's .oni files use '23RV' ("VR32") instead<!--<br>OniX's [[Oni (folder)|GDFX]] uses '33RV' ("VR33") to signify that the new data versioning system is in use--> }}
{{OBDtr| 0x24 | int32   | | A0 35 25 00 | 0x2535A0  | data table size }}
{{OBDtr| 0x0C | uint16   | | 40 00       | 64        | size of this header }}
{{OBDtr| 0x28 | int32   | | 40 F2 28 00 | 0x28F240  | name table offset }}
{{OBDtr| 0x0E | uint16  | | 14 00       | 20        | size of instance descriptor (32 in Windows alpha 6) }}
{{OBDtr| 0x2C | int32   | | 04 4F 02 00 | 0x024F04  | name table size }}
{{OBDtr| 0x10 | uint16  | | 10 00       | 16        | size of template descriptor }}
{{OBDtr| 0x30 | int32   | | 00 00 00 00 |           | used by OniSplit only: raw table offset }}
{{OBDtr| 0x12 | uint16  | | 08 00       | 8        | size of name descriptor }}
{{OBDtr| 0x34 | int32   | | 00 00 00 00 |           | used by OniSplit only: raw table size }}
{{OBDtr| 0x14 | uint32   | | 83 24 00 00 | 9347      | instance descriptor count  }}
{{OBDtr| 0x38 | int32   | | 00 00 00 00 |           | unused }}
{{OBDtr| 0x18 | uint32   | | D4 1B 00 00 | 7124      | name descriptor count }}
{{OBDtr| 0x3C | int32   | | 00 00 00 00 |           | unused }}
{{OBDtr| 0x1C | uint32   | | 38 00 00 00 | 56        | template descriptor count }}
{{OBDtr| 0x20 | uint32   | | A0 BC 03 00 | 0x03BCA0  | data block offset }}
{{OBDtr| 0x24 | uint32   | | A0 35 25 00 | 2438560  | data block size }}
{{OBDtr| 0x28 | uint32   | | 40 F2 28 00 | 0x28F240  | name block offset }}
{{OBDtr| 0x2C | uint32   | | 04 4F 02 00 | 151300    | name block size }}
{{OBDtr| 0x30 | uint32   | | 99 CF 40 00 | (garbage) | used by OniSplit for raw table offset }}
{{OBDtr| 0x34 | uint32   | | 90 4F 63 00 | (garbage) | used by OniSplit for raw table size }}
{{OBDtr| 0x38 | uint32   | | F4 55 5F 00 | (garbage) | unused<!--used by OniX for data versioning; the three high bytes contains the highest data version (timestamp) found in any instance in this .dat; see instance descriptor table's 0x10 for format--> }}
{{OBDtr| 0x3C | uint32   | | 90 4F 63 00 | (garbage) | unused<!--used by OniX for content versioning; the three high bytes contain the highest content version (timestamp) found in any instance in this .dat; see instance descriptor table's 0x10 for format--> }}
|}
|}


The '''template checksum''' tells us that this level data is in the .dat/.raw file scheme, as opposed to the .dat/.raw/.sep file scheme.
The file's '''total template checksum''' is the sum of all the template checksums (see "Template descriptors" below). Oni looks at this number in order to validate that it can read this version of the game data format. In practical terms, the total checksum value given for Windows above tells us that this level data is in the .dat/.raw file scheme, and the value given for Mac Oni and the Windows demo tells us that the level data uses the .dat/.raw/.sep file scheme.


The '''version''' of the instance file is the format version. Reading it backwards, as discussed under "Introduction", we get "VR31", which is probably "version 31". This is the format version of all instance files in all releases of Oni.
The '''version''' of the instance file is the format version. Reading it backwards, as discussed under the "Backwards and garbage data" section, we get "VR31" (which probably means "version 3.1" because the engine subsystem that reads template data was in its third iteration when the game shipped). This is the format version of all instance files in all releases of Oni.


The '''signature''' is identical in all instance files.
The '''descriptor sizes''' are the sizes of the instance, template, and name descriptors which are coming up in this file (see breakdowns in later sections). For instance, each instance descriptor will be 0x14, or 20 bytes, in length.


The '''descriptor counts''' are the lengths of the arrays which are coming up in this file: the instance, name and template descriptors. For instance, the instance descriptor array will contain 0x2483, or 9,347, items.


Next we are told the addresses and sizes of the '''data and name tables''' in this file. The name table simply follows the data table, as you'll see if you add the data table offset plus the data table size, but that doesn't mean the name table offset is redundant; if its start was not 32-bit-aligned, it probably would be moved down to start at the next 32-bit word, but this is unnecessary because it just happens to fall on such an even number already.
Next we are told the offsets and sizes of the '''data and name blocks''' in the instance file. The name block simply follows the data block, as you'll see if you add the data block offset to the data block size, so the name block offset is technically redundant. The name block offset plus the name block size equals the total size of the file, since the name block is the last segment of the file.


After this comes four "int"s of '''zeroes'''. Empty space like this is common in the data files, and indicates that something stored in memory at this relative position was not written to disk (probably pointers, sometimes a buffer reserved for possible future use).
After the name block's size comes four "int"s of '''garbage'''; this is padding in order to align the start of the next segment of the file on a 32-byte boundary. The first two 32-bit fields in this space are, however, used in .oni files generated by OniSplit<!--, and the last 32-bit field is partly used by OniX for a new form of template versioning. Future usage of these fields by OniSplit and/or OniX may change (hopefully not too much)-->.


That concludes the header of the instance file. Immediately after this header we find the instance descriptor array.
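As a rough illustration, the 64-byte header can be unpacked with Python's <code>struct</code> module. This is only a sketch: the field names in the dictionary are invented for readability, the layout follows the retail header table (with the four 16-bit size fields), and <code>SAMPLE</code> is simply the "Raw Hex" column of that table typed in by hand.

```python
import struct

# uint64 checksum, 4-byte version tag, four uint16 sizes, three uint32 counts,
# four uint32 block offsets/sizes, four uint32 trailing fields: 64 bytes total.
HEADER = struct.Struct("<Q4s4H3I4I4I")

SAMPLE = bytes.fromhex(
    "1F27DC33DFBC0300" "31335256" "4000140010000800"
    "83240000" "D41B0000" "38000000"
    "A0BC0300" "A0352500" "40F22800" "044F0200"
    "99CF4000" "904F6300" "F4555F00" "904F6300"
)

def parse_header(buf: bytes) -> dict:
    """Unpack the header fields; key names are hypothetical labels."""
    f = HEADER.unpack_from(buf, 0)
    return {
        "checksum": f[0],
        "version": f[1][::-1].decode("ascii"),  # stored backwards on disk
        "header_size": f[2],
        "instance_desc_size": f[3],
        "template_desc_size": f[4],
        "name_desc_size": f[5],
        "instance_count": f[6],
        "name_count": f[7],
        "template_count": f[8],
        "data_offset": f[9],
        "data_size": f[10],
        "name_offset": f[11],
        "name_size": f[12],
    }

hdr = parse_header(SAMPLE)
print(hdr["version"], hdr["instance_count"], hdr["name_count"], hdr["template_count"])
# The name block directly follows the data block in this file.
print(hdr["data_offset"] + hdr["data_size"] == hdr["name_offset"])
```

On a real file, the same function could be fed the first 64 bytes read from level0_Final.dat.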


===Instance descriptors===
==Instance descriptors==
The "instance descriptors" array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. It starts at 0x40 in the .dat file, but below is a descriptor found at 0x17B50 in the file which makes a good example. In the table below, we use offsets relative to the start of this descriptor.
The instance descriptor array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. The descriptors start at 0x40 in the .dat file, but below is a descriptor found at 0x017B50 in the file which makes a better example. In the table below, we use offsets relative to the start of this descriptor. We also show the alternate structure in the Windows alpha 6, the oldest known version of Oni and the only one with an observed difference in the instance descriptor format.


{{Table}}
{| class="wikitable"
{{OBD_Table_Header}}
|- bgcolor="#E9E9E9"
{{OBDtr| 0x00 | tag     | | 54 42 55 53 | 'SUBT'   | template tag }}
! width=5% | Offset
{{OBDtr| 0x04 | int32   | | C8 30 22 00 | 0x2230C8 | data offset (relative to data table) }}
! width=5% | Type
{{OBDtr| 0x08 | int32   | | 01 CB 00 00 | 0xCB01   | name offset (relative to name table) }}
! width=10% | Raw Hex
{{OBDtr| 0x0C | int32   | | C0 09 00 00 | 0x09C0    | data size }}
! width=10% | Value
{{OBDtr| 0x10 | int32   | | 00 00 00 00 | 0        | flags; possible values:
! width=35% | Description (retail<!--, OniX-->)
|- align=center
| 0x00
| tag
| 54 42 55 53
| 'SUBT'
|align=left | template tag
|- align=center
| 0x04
| int32
| C8 30 22 00
| 0x2230C8
|align=left | data offset (relative to data block)
|- align=center
| 0x08
| int32
| 01 CB 00 00
| 0xCB01
|align=left | name offset (relative to name block)
|- align=center
| 0x0C
| int32
| C0 09 00 00
| 2496
|align=left | data size
|- align=center
| 0x10
| int32
| 00 00 30 00
| 0x300000
|align=left | flags
|}
 
{{Divhide|Windows alpha 6}}
{| class="wikitable"
|- bgcolor="#E9E9E9"
! width=5% | Offset
! width=5% | Type
! width=10% | Raw Hex
! width=10% | Value
! width=35% | Description (alpha 6)
|- align=center
| 0x00
| int64
| 68 6C 04 00
| 0x46C68
|align=left | template checksum
|- align=center
| 0x08
| tag
| 54 42 55 53
| 'SUBT'
|align=left | template tag
|- align=center
| 0x0C
| int32
| E8 37 18 00
| 0x1837E8
|align=left | data offset (relative to data block)
|- align=center
| 0x10
| int32
| 4E C5 00 00
| 0xC54E
|align=left | name offset (relative to name block)
|- align=center
| 0x14
| int32
| 20 08 00 00
| 2080
|align=left | data size
|- align=center
| 0x18
| int32
| 00 00 30 00
| 0x300000
|align=left | flags
|- align=center
| 0x1C
| int32
| EA 5F A6 39
| {{LocaleDate|2000|08|25}}<br>08:00:42 AM
|align=left | creation date (seconds since 1/1/1900)
|}
{{Divhide|end}}
The retail version of this instance descriptor tells us that a resource of '''type''' SUBT (a subtitle file for Oni; there are only two of these, one containing all speech subtitles, and one for help messages) has '''data''' that can be found 0x2230C8 bytes into the data block, which we learned from the file header starts at 0x03BCA0. Its '''name''' can be found 0xCB01 bytes into the name block that starts, according to the file header, at 0x28F240.
 
The data's '''size''' is given as 0x09C0, or 2,496 bytes, but it's important to clarify that this is the total size of the data counting from the resource header to the next 32-byte boundary after the end of this instance's actual data; in other words it is the true total of the space occupied on disk by this instance. This is interesting because the data offset leads you to the start of the instance-specific data which begins 8 bytes after the resource header, so if you erroneously add the data size to the data offset to find the end of the instance data then you will find yourself 8 bytes into the next instance.
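The arithmetic for this descriptor can be sketched as follows. The function <code>locate</code> and its return order are assumptions made for this example; the constants are the block offsets from the level0_Final.dat header and the 8-byte resource header described above.

```python
DATA_BLOCK = 0x03BCA0  # data block offset from the file header
NAME_BLOCK = 0x28F240  # name block offset from the file header
RES_HEADER = 8         # resource header that precedes the instance-specific data

def locate(data_offset: int, name_offset: int, data_size: int):
    """Return absolute file addresses for one instance descriptor."""
    data_start = DATA_BLOCK + data_offset    # start of instance-specific data
    record_start = data_start - RES_HEADER   # start of the resource header
    record_end = record_start + data_size    # size counts from the resource header
    name_addr = NAME_BLOCK + name_offset
    return data_start, record_start, record_end, name_addr

data_start, record_start, record_end, name_addr = locate(0x2230C8, 0xCB01, 0x09C0)
print(hex(data_start))    # 0x25ed68
print(hex(record_start))  # 0x25ed60
print(hex(record_end))    # 0x25f720
print(hex(name_addr))     # 0x29bd41
```

Note that both the record start and the record end land on 32-byte boundaries, while adding the size directly to <code>data_start</code> would overshoot by 8 bytes.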
 
Before we proceed, let's expand upon the '''flags''' field.
 
;Flags - data usage
:0x'''01''' 00 00 00 - unnamed
:0x'''01''' 00 00 00 - unnamed
:0x'''02''' 00 00 00 - empty
:0x'''02''' 00 00 00 - empty
:0x'''04''' 00 00 00 - never used; appears to mean "big-endian" data
:0x'''04''' 00 00 00 - never used; intended to mark instance as pointing to duplicate data rather than its own data
:0x'''08''' 00 00 00 - shared }}
:0x'''08''' 00 00 00 - instance's data is being used by duplicate instances as a source
|}


This descriptor tells us that a resource of type SUBT (a subtitle file for Oni; there are only two in the game) has data that can be found 0x2230C8 bytes into the data table, which we learned from the file header starts at 0x03BCA0. Its name can be found 0xCB01 bytes into the name table that starts, according to the file header, at 0x28F240. The data is 0x09C0, or 2,496 bytes.
;Flags - Tool mode<!-- (retail)-->
The first two of the following bits occur throughout the original .dat files. However all of these bits are ignored by the engine when loading data because they only have relevance at runtime when Oni is in Tool mode:
:0x00 00 '''10''' 00 - touched (unsaved data)
:0x00 00 '''20''' 00 - "in batch file"
:0x00 00 '''40''' 00 - delete upon next save
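A hedged sketch of decoding the flags field: it assumes that the flag lists above are written in on-disk byte order, so that "0x'''01''' 00 00 00" is the integer 0x00000001 and "0x00 00 '''10''' 00" is 0x00100000. That reading is consistent with the sample bytes 00 00 30 00 (0x300000) in the descriptor table, but it is an interpretation. The names in <code>FLAG_NAMES</code> are shortened labels chosen for this example.

```python
import struct

FLAG_NAMES = {
    0x00000001: "unnamed",
    0x00000002: "empty",
    0x00000004: "never used",
    0x00000008: "data shared with duplicate instances",
    0x00100000: "touched",
    0x00200000: "in batch file",
    0x00400000: "delete upon next save",
}

def decode_flags(raw: bytes) -> list:
    """Interpret four on-disk bytes as a little-endian flags word."""
    (value,) = struct.unpack("<I", raw)
    return [name for bit, name in FLAG_NAMES.items() if value & bit]

print(decode_flags(bytes.fromhex("00003000")))  # ['touched', 'in batch file']
print(decode_flags(bytes.fromhex("03000000")))  # ['unnamed', 'empty']
```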


If you want to see the name of this resource, let's look at address 0xCB01 + 0x28F240 = 0x29BD41. There we find the string "SUBTsubtitles".
<!--;Flags - Tool mode (OniX)
These bits have been moved to the upper half of the first byte (on disk they are cleared altogether in the GDFX data, but this is their location in memory):
:0x'''10''' 00 00 00 - touched (unsaved data)
:0x'''20''' 00 00 00 - "in batch file"
:0x'''40''' 00 00 00 - delete upon next save
This frees up the three higher bytes for the data versioning timestamp which is in YY/MM/DD format, stored thusly:
:0x00 '''00''' 00 00 - versioning timestamp – day
:0x00 00 '''00''' 00 - versioning timestamp – month
:0x00 00 00 '''00''' - versioning timestamp – year-->
The flags "unnamed" and "empty" require special explanation.


===Name descriptors===
===Unnamed and empty resources===
The "name descriptors" array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to faster find instances by name. It is also used when finding instances by type.
You'll notice that the level file header lists fewer names (7,124) than instances (9,347). That's because there are 3 types of instance:
*Unnamed and not empty - they are only referenced by other instances in the same file, generally as child data (e.g., 3D geometry elements like ABNA are "contained" by AKEV, a level's environment).
*:In vanilla Oni .dats there are some rare occurrences of unnamed non-empty ''orphan'' instances (e.g., [[OBD:File types/Naming#TRCM|TRCM]]). These are a form of garbage and are discarded by OniSplit when unpacking a level.
*Named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them.
*Named and empty - "empty" instances are used in level-specific instance files (i.e. not in level0_Final.dat) to associate an instance ID with a name. For every empty resource, there's another one with a matching name in level0_Final.dat that has data in it. The empty resource in the instance file is (usually) looked up by ID, then the engine searches all the loaded files for a non-empty instance with the same name, causing it to find the actual file in the global data in level0_Final.dat.


{{Table}}
===Peeking ahead at instance name===
{{OBD_Table_Header}}
Before we talk about the name block in depth, we can peek ahead at the name of this resource using the offset we've just been given. Let's add the offset 0xCB01 to 0x28F240, the file header's address for the name block. This gives us the address 0x29BD41. There we find the string "SUBTsubtitles".
{{OBDtr| 0x00 | int32  | | 00 00 00 00 | 0    | instance number }}
{{OBDtr| 0x04 | int32  | | 00 00 00 00 | 0    | runtime: pointer to instance name }}
|}


The "template descriptor" array contains information about all templates used in the file. The template checksum is used to prevent loading of instance files that are not compatible with the current engine version.
===Peeking ahead at instance data===
The actual subtitle data should be found by adding the offset 0x2230C8 to 0x03BCA0, the file header's address for the data block, to get 0x25ED68. We're going to leave the full details of the data block for later, but below is the data you should actually see for the English Oni SUBT file at this address. You have to consult the [[SUBT]] page to know how to read this data.


{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBDth}}
{{OBDtr| 0x00 | int64  | | A0 6D 12 00 00 00 00 00 | 0x126DA0    | template checksum }}
{{OBDtr| 0x08 | char[16] | | AD DE      | dead        | unused }}
{{OBDtr| 0x0C | tag    | | 41 4E 42 41            | 'ABNA'      | template tag }}
{{OBDtr| 0x18 | offset  | | 80 44 44 01 | 0x01444480  | raw file data address }}
{{OBDtr| 0x08 | int32   | | 01 00 00 00             | 1          | number of instances that use this template }}
{{OBDtr| 0x1C | int32   | | 61 02 00 00 | 609        | array size }}
|}
|}


After '''padding''' of 16 unused bytes, we find that, instead of data, there's an address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.


The '''array size''' of 609 tells the part of the engine that reads SUBT data to expect a chunk of 609 subtitled lines of dialogue.


The data table stores all the instance data. The instance ID is always stored 32-byte aligned (thus the instance-specific data will always be found at an offset like 0x0008, 0x0028, 0x0148, etc.). The instance ID and file ID are not actually part of the instance data. The engine always has pointers to the "instance-specific data", and the instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance given a pointer to it).
==Name descriptors==
The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In this case, that means 20 * 9347 = 186940, or 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.
 
The name descriptor array stores the index numbers of all named instances, sorted alphabetically by name. The names themselves are found in the name block, and these entries also point to them at runtime. The engine uses this array to look up instances by name; it is also used to find instances by template (scanning just the tag at the start of each name). The array was alphabetized so that the engine could do a binary search to find instances by name more quickly, but the retail engine no longer attempts a binary search and merely iterates over the array from start to end.


{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | int32  | | 01 00 00 00 | 1          | instance id }}
{{OBDtr| 0x00 | int32  | | 15 16 00 00 | 5653      | instance descriptor index }}
{{OBDtr| 0x04 | int32  | | 01 00 00 02 | 0x02000001 | file id }}
{{OBDtr| 0x04 | int32  | | 60 2C 1C 0E | (garbage) | runtime: pointer to instance name }}
{{OBDtr| 0x08 |        | |            |           | [[OBD:File_types|instance specific data]] }}
|}
|}


The index number here is referring to the instance's position in the instance descriptor array. This number is also used by the data block to identify each instance, thus it is found in two places in the data explicitly and one place implicitly.


Since the addresses of the names in memory cannot be known until the file is loaded into RAM, a space of 32 bits is reserved for each pointer at runtime.


The name table stores all the instance names as C style strings (terminated by 0).
==Template descriptors==
Likewise, the template descriptor array starts directly after the name descriptors. Since name descriptors are 8 bytes, 8 * 7124 (taken from the header) = 56992, or 0xDEA0, and adding that to the name descriptor array's start address (0x02DA7C) gives us 0x03B91C as the start of the template descriptors.
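The start of each descriptor array follows from the header's sizes and counts alone. The sketch below uses a made-up helper, <code>descriptor_layout</code>, with the retail descriptor sizes (20, 8 and 16 bytes) as defaults.

```python
HEADER_SIZE = 64

def descriptor_layout(inst_count, name_count, tmpl_count,
                      inst_size=20, name_size=8, tmpl_size=16):
    """Return the start of each descriptor array and the end of the last one."""
    inst_start = HEADER_SIZE
    name_start = inst_start + inst_size * inst_count
    tmpl_start = name_start + name_size * name_count
    tmpl_end = tmpl_start + tmpl_size * tmpl_count
    return inst_start, name_start, tmpl_start, tmpl_end

inst_start, name_start, tmpl_start, tmpl_end = descriptor_layout(9347, 7124, 56)
print(hex(inst_start))  # 0x40
print(hex(name_start))  # 0x2da7c
print(hex(tmpl_start))  # 0x3b91c
print(hex(tmpl_end))    # 0x3bc9c
```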
 
The template descriptor array contains information about all templates (that is, resource types, aka tags), used in the file (56 in this case, as we learned from the file header). Any resource occurring in this instance file has to have its type listed here. Here is the template descriptor at 0x3B9FC:


{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | string  | | 41 49 ... 00 | "AISAlevel1_scripts" | name string (0 terminated) }}
{{OBDtr| 0x00 | int64  | | 3C B9 A6 71 08 00 00 00 | 0x871A6B93C | template checksum }}
{{OBDtr| 0x08 | tag    | | 45 47 52 54            | 'EGRT'      | template tag }}
{{OBDtr| 0x0C | int32  | | 01 00 00 00            | 1          | unused: number of resources in file that use this template }}
|}
|}


The '''template checksum''' is used to prevent loading of instance files that are not compatible with the current engine version. The '''tag''' is the same kind of number-written-as-backwards-ASCII that we discussed in the "Backwards and garbage data" section; in this case, 'EGRT' means [[TRGE]]. The field for the '''number of resources''' using this template is unused. The number should be correct for each template, but Oni never uses it for anything.
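For illustration, the 16-byte template descriptor shown in the table can be unpacked like this. The <code>raw</code> bytes are copied from the table's "Raw Hex" column, and the variable names are only labels for this sketch.

```python
import struct

# int64 checksum, 4-byte tag, int32 resource count: 16 bytes, little-endian.
TEMPLATE_DESC = struct.Struct("<Q4sI")

raw = bytes.fromhex("3CB9A67108000000" "45475254" "01000000")
checksum, tag_bytes, count = TEMPLATE_DESC.unpack(raw)
tag = tag_bytes[::-1].decode("ascii")  # tags are stored backwards on disk

print(hex(checksum))  # 0x871a6b93c
print(tag)            # TRGE
print(count)          # 1
```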


;Instance (.dat) file
You might wonder how Oni knows how to read each type of data, such as a SUBT or an ABNA. The simple answer is that this information is hard-coded into Oni. In fact, the information on each instance type, as stored in Oni's code, is actually the real "template". The file data merely gives the tag and checksum that identify the template in use so that Oni knows how to read the following data fields. These hardcoded templates also tell Oni which parts of the file data are reserved for pointers.
An instance file is a dump of the engine's in-memory data structures. It is accompanied by a .raw file and a .sep file (the .sep file is only present and used in the PC demo and Mac versions of the game) which store additional data (usually large and unstructured, like texture or sound data) needed by some instance types. [[OniSplit]]-generated .oni files are PC .dat files with all the data contained by the .raw/.sep files appended at the end.


That's because an instance may have pointers to other related instances, but pointers are only valid in memory; they cannot be stored meaningfully on disk. They must be set at runtime when the level data is loaded into memory and an address in RAM has been assigned. Thus one type of data field in Oni's templates is a "raw data" pointer; on Macs and the Windows demo, there is an additional "separate data" pointer. These pointers are 32 bits in length, as one must expect since Oni was compiled for 32-bit PCs.


;Binary (.raw, .sep) file
Incidentally, the templates in Oni's code have not just the familiar four-character tags attached to them, but also a descriptive string, e.g. "BSP Tree Node Array". These strings were typed into the source code where each template structure was defined, and eventually extracted from the binary by modders. This is the source of the names on [[OBD:File types]].
Binary files do not have a file header. The only rules about binary files are that all data parts are stored 32-byte aligned and that the first 32 bytes of the file are always 0 (reserved to represent NULL pointers). Instances store file offsets into binary files, and at load time the offsets are converted to pointers.


==Data block==
The data block occupies the majority of the file and stores all the instance data (though this data sometimes points to the location of more data in a raw/separate file). We peeked at this table before when we looked at the instance descriptor for SUBTsubtitles. The table's starting point is found at the offset given in the header, in this case 0x03BCA0, saving us the trouble of adding up the size of the four preceding segments of the file and then aligning to the next 32-byte boundary.


The reason we'd need to align to 32 bytes is that the start of each instance's record (the instance ID) is always 32 byte-aligned. Thus, even though the template descriptors ended at 0x03BC9C, there are four empty bytes here so that the data block can begin at 0x03BCA0, which divides evenly by 32. This alignment rule also means that the instance-specific data will always start at an offset like 0x0008, 0x0028, 0x0148, etc.
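The rounding step can be written as a small helper. <code>align_up</code> is a hypothetical name used here for illustration, not something taken from the engine.

```python
def align_up(offset: int, boundary: int = 32) -> int:
    """Round offset up to the next multiple of boundary (no change if already aligned)."""
    return (offset + boundary - 1) // boundary * boundary

print(hex(align_up(0x03BC9C)))  # 0x3bca0
print(hex(align_up(0x03BCA0)))  # 0x3bca0
print(0x25ED68 % 32)            # 8
```

The last line shows that the SUBT data address seen earlier sits 8 bytes past a 32-byte boundary, as the alignment rule predicts.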


;Instance file name
An instance file name has the following structure:
levelN_T.dat
where N is the level number (from 0 to 127) and T is the type of file. Known types are "Final" and "Tools". The original exe only loads "Final" files.


The instance ID and file ID are not actually part of the instance data but are considered to be the resource header. The engine always keeps pointers to the start of the type-specific data itself; we saw this before when we jumped to 0x25ED68 and saw the data for the SUBT rather than the header for this data. The instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).


{{Table}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | res_id  | | 01 0B 04 00 | 1035 | instance descriptor index }}
{{OBDtr| 0x04 | lev_id  | | 01 00 00 06 |    3 | level number }}
{{OBDtr| 0x08 | ...    | | ...        | ...  | [[OBD:File types|type-specific data]]... }}
|}


This example is taken from level 3 so that the file ID is more instructive. In the OBD documentation, these fields are called res_id and lev_id as seen above.


;Level 0 file
The level 0 file does not actually contain a level but rather instances shared across all levels. It is loaded first when the game starts and is never unloaded. All other level files are only loaded when the corresponding level starts and are unloaded when it ends.


The '''instance's ID''' is stored as "(instance descriptor index << 8) | 1". Thus the instance at index 1,035 in the instance descriptor array will have its ID encoded as 0x40B01 (stored on disk as 01 0B 04 00). The '1' allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte aligned, so it will never have bit 0 set). These pointer flags were retained when the file was written to disk but are meaningless now. At level-load time the flags are cleared and then set again when Oni allocates memory for each instance. The purpose of left-shifting the index number is simply to leave the lowest byte open for the pointer flag.


The '''file ID''' is computed from the number found in the name of the instance file: "(level number << 25) | 1". Thus instances found in level3_Final.dat will have the file ID encoded as 0x6000001. Again, the '1' is used by the engine to know which file IDs have been converted to pointers at runtime, but on disk this is a relic which has no meaning to us. The reason for left-shifting the level number might have originally been to store it alongside the instance ID and the pointer flag in a single int32, but they are separate numbers now, perhaps so that both IDs can have their own pointer flag.


;Instance descriptors
There are 3 types of instance descriptors:
*unnamed - they are referenced by other instances in the same file and the engine never reaches them directly
*named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them
*named and empty - the instance data is stored in a different file and they exist only to associate an instance ID with a name; when an instance references such an instance ID the engine searches all the loaded files for a non-empty instance with the same name


After the header, the size of each instance's data is of a somewhat arbitrary length depending on the template this instance falls under. As mentioned under "Instance descriptors", the data size given by the descriptor includes the 8-byte resource header and the padding at the end of the data to align the next instance on 32 bytes.


;Instance ID
The ID of an instance is computed as:
(instance_descriptor_index << 8) <nowiki>|</nowiki> 1.  
The 1 allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte aligned, so it can never have bit 0 set).


===Looking backward from data to instance===
By the way, if you pick a random place in the data block to look at with a hex editor, how do you know which resource you're looking at? You would look for the highest data offset in the instance descriptor array that is less than your position within the data block. Let's say that the string at 0x3BD40 caught our eye: "powerup_ammo". Subtracting the start of the data block, 0x3BCA0, gives us 0xA0 as the position of this string. Now looking back at the instance descriptor array, the instances' data offsets occur every 20 bytes and come directly after the tags. We can see that the first data offset is 0x8 and the next one is 0xF68, thus our offset into the data block of 0xA0 means we are looking at the instance which starts at 0x8. It's the very first instance listed at the start of the instance descriptor array:


{{Table}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | tag    | | 53 47 4E 4F | 'ONGS'  | template tag }}
{{OBDtr| 0x04 | int32  | | 08 00 00 00 | 0x08    | data offset (relative to data block) }}
{{OBDtr| 0x08 | int32  | | 00 00 00 00 | 0x00    | name offset (relative to name block) }}
{{OBDtr| 0x0C | int32  | | 60 0F 00 00 | 3936    | data size }}
{{OBDtr| 0x10 | int32  | | 00 00 00 00 | 0      | flags }}
|}


So this tells us that the first data in the data block belongs to the solitary [[ONGS]] resource, and that it extends for 3,936 bytes. Since its name offset is 0x0, its name is the first string in the name block; like every instance name, it begins with its template tag, in this case ONGS.


;File ID
The file ID is computed from the name of the instance file. For "_Final" files the file ID is computed as:
(level_number << 25) <nowiki>|</nowiki> 1
Again, the 1 allows the engine to know which file IDs have already been converted to pointers.


==Name block==
This final segment of the file stores all the instance names as C-style ASCII strings (terminated by a zero byte). We peeked at this before when we looked at the instance descriptor for SUBTsubtitles. The start of this block is 32-byte aligned but after that the strings are simply packed end to end, separated only by their null terminator. As with the data block, the name block's starting point is given in the header, in this case 0x28F240.


{{Table}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | string  | | 53 55 ... 00 | "SUBTsubtitles" | name string (zero-terminated) }}
|}


;Templates
An instance can have pointers to other instances but since pointers are only valid in memory they are converted to instance identifiers when the file is saved and converted back to pointers when the file is loaded into memory. To be able to do this the engine must know where pointers are and this is done using "templates". A template contains:
*a checksum of the data contained by the template (the checksum algorithm is unknown)
*a 4-letter tag used to identify the template (ABNA, ONCC, WMDD etc.)
*a short description of the data structure like "BSP Tree Node Array"
*a list of all data structure's fields and their types
*other data that appears to be unused like size of the fixed part and size of an array element for data structures that contain variable length arrays
 
 
;Absolute limits
*Max level number: 127
*Max number of instance files in GameDataFolder: 512 (PC), 16 (PC Demo, Mac)
*Max number of simultaneously loaded instance files: 64
*Max number of instances in a file: 131071
*Max length of an instance file name: 31
*Max length of an instance name: 63 (including the 4 character template tag)


These names can be up to 63 characters long, counting the tag. The instance file concludes with the end of the name block.


{{OBD}}

Latest revision as of 04:57, 1 December 2025

".dat" redirects here; for other files ending in ".dat", see Oni (folder).
You should read the Game data terminology page before this one.
The Raw and separate file formats page should be read after this one.

Files in GameDataFolder/ named "level[0-19]_Final.dat", together with ".raw" and sometimes ".sep" counterparts, contain the game data for Oni. These are called "instance files" internally, but a more common-sense name for them is level data files. The format described below was also used for the tool files which supplied the GUI for the in-game editor, however the retail Oni game application refuses to load tool files; for the story behind the tool files, see level0_Tools.

The level 0 files do not contain resources for a specific level but rather resources (instances) shared across all levels. Level 0 is loaded when the game starts and is never unloaded. All other level files, 1-19, are only loaded when their corresponding level starts and then unloaded when it ends. Oni can only hold two level files in memory concurrently. Thus, resources have to be duplicated on disk whenever a character class, sound effect, etc. occurs in more than one level. For instance, although there are only 2,380 unique sounds in the game, there are 7,386 sounds stored across all level data files.

Backwards and garbage data

As mentioned, the game's developers used the in-game editor to create AIs, particles, etc. in a level. When one of these developers saved his work, the contents of the level, stored in his PC's RAM, were flushed directly to disk. Thus the structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory. So when we read the data in the files with a hex editor, we can see eccentricities such as blank space (coming from unused fields and byte-alignment padding) and garbage data (such as now-meaningless pointer values). Further gaps, mostly representing orphaned obsolete resources, add up to about 25 MB for the whole game.

Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte. FourCCs in the data are stored "backwards", such as "13RV" which is meant to be read "VR31", because Bungie defined those four bytes as a 32-bit integer, not a string, causing them to be written to disk in little-endian order.
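The "backwards" tags can be untangled mechanically. The following is only a minimal sketch in Python; the helper name `read_tag` is made up for illustration and does not come from Oni or OniSplit. It treats the four bytes as a little-endian 32-bit integer, as described above, and then writes that integer out most-significant byte first.

```python
import struct

def read_tag(raw: bytes) -> str:
    """Turn four on-disk bytes into the tag as it is meant to be read."""
    # The engine defined the tag as a 32-bit integer, so on disk it is little-endian.
    (value,) = struct.unpack("<I", raw)
    # Writing the integer back out big-endian restores the readable order.
    return value.to_bytes(4, "big").decode("ascii")

version_tag = read_tag(b"13RV")                      # "VR31"
subt_tag = read_tag(bytes.fromhex("54 42 55 53"))    # "SUBT"
```

Reversing the byte string directly would give the same result; going through the integer simply mirrors the explanation of why the bytes are reversed in the first place.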

File limits

  • Max level number: 127
  • Max number of instance files in GameDataFolder: 512 (Windows), 16 (Windows demo, Mac)
  • Max number of simultaneously loaded instance files: 64
  • Max number of instances in a file: 131071
  • Max length of an instance file name: 31
  • Max length of an instance name: 63 (including the 4 character template tag)

Header

Here is a walkthrough of an instance file using the level0_Final.dat in English Windows Oni. Follow along in a hex editor for maximum educational value. Each term will be explained in-depth when we fully consider the related data. First, here is how the file begins:

Offset Type Raw Hex Value Description
0x00 uint64 1F 27 DC 33 DF BC 03 00 0x0003BCDF33DC271F Total template checksum (main indicator of engine compatibility):
  • 0x0003BCDF33DC271F (PC v1.0) - templates compatible with Windows retail engine
  • 0x0003BCDF23C13061 (PC v1.1) - templates compatible with Windows demo and Mac engines
  • 0x0003BA70A8DBAE11 (PS2) - templates compatible with PlayStation 2 engine

OniSplit's .oni files use the PC 1.0 checksum by default and the 1.1 checksum when holding data that is stored differently in the 1.1 format (SNDD, TXMP, AGQG, M3GM, IGSt, TSFT/TSGA, TRAM/TREX).

0x08 uint32 31 33 52 56 '13RV' .dat version (meant to be read as "VR31")
OniSplit's .oni files use '23RV' ("VR32") instead
0x0C uint16 40 00 64 size of this header
0x0E uint16 14 00 20 size of instance descriptor (32 in Windows alpha 6)
0x10 uint16 10 00 16 size of template descriptor
0x12 uint16 08 00 8 size of name descriptor
0x14 uint32 83 24 00 00 9347 instance descriptor count
0x18 uint32 D4 1B 00 00 7124 name descriptor count
0x1C uint32 38 00 00 00 56 template descriptor count
0x20 uint32 A0 BC 03 00 0x03BCA0 data block offset
0x24 uint32 A0 35 25 00 2438560 data block size
0x28 uint32 40 F2 28 00 0x28F240 name block offset
0x2C uint32 04 4F 02 00 151300 name block size
0x30 uint32 99 CF 40 00 (garbage) used by OniSplit for raw table offset
0x34 uint32 90 4F 63 00 (garbage) used by OniSplit for raw table size
0x38 uint32 F4 55 5F 00 (garbage) unused
0x3C uint32 90 4F 63 00 (garbage) unused

The file's total template checksum is the sum of all the template checksums (see "Template descriptors" below). Oni looks at this number in order to validate that it can read this version of the game data format. In practical terms, the total checksum value given for Windows above tells us that this level data is in the .dat/.raw file scheme, and the value given for Mac Oni and the Windows demo tells us that the level data uses the .dat/.raw/.sep file scheme.

The version of the instance file is the format version. Reading it backwards, as discussed under the "Backwards and garbage data" section, we get "VR31" (which probably means "version 3.1" because the engine subsystem that reads template data was in its third iteration when the game shipped). This is the format version of all instance files in all releases of Oni.

The descriptor sizes are the sizes of the instance, template, and name descriptors which are coming up in this file (see breakdowns in later sections). For instance, each instance descriptor will be 0x14, or 20 bytes, in length.

The descriptor counts are the lengths of the arrays which are coming up in this file: the instance, name and template descriptor arrays. For instance, the instance descriptor array will contain 0x2483, or 9,347, items.

Next we are told the offsets and sizes of the data and name blocks in the instance file. The name block simply follows the data block, as you'll see if you add the data block offset plus the data block size, so the name block offset is technically redundant. The name block offset plus the name block size equals the total size of the file since it's the last segment of the file.

After the name block's size come four 32-bit "int"s of garbage; this is padding which brings the header to 64 bytes so that the next segment of the file starts on a 32-byte boundary. The first two 32-bit fields in this space are, however, used in .oni files generated by OniSplit.

That concludes the header of the instance file. Immediately after this header we find the instance descriptors array.
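For readers who would rather script this than read a hex editor, here is a rough sketch of unpacking the 64-byte header with Python's `struct` module. The field names and the `Header`/`parse_header` helpers are invented for this example; only the layout comes from the table above. The sample bytes are the ones shown for level0_Final.dat.

```python
import struct
from typing import NamedTuple

# Little-endian: uint64, 4-byte tag, four uint16, eleven uint32 = 64 bytes.
HEADER_FORMAT = "<Q4s4H11I"

class Header(NamedTuple):
    template_checksum: int
    version: bytes
    header_size: int
    instance_descriptor_size: int
    template_descriptor_size: int
    name_descriptor_size: int
    instance_count: int
    name_count: int
    template_count: int
    data_block_offset: int
    data_block_size: int
    name_block_offset: int
    name_block_size: int

def parse_header(raw: bytes) -> Header:
    fields = struct.unpack_from(HEADER_FORMAT, raw)
    # The last four uint32 values are garbage/padding, so they are dropped here.
    return Header(*fields[:13])

SAMPLE_HEADER = bytes.fromhex(
    "1F 27 DC 33 DF BC 03 00 "
    "31 33 52 56 "
    "40 00 14 00 10 00 08 00 "
    "83 24 00 00 D4 1B 00 00 38 00 00 00 "
    "A0 BC 03 00 A0 35 25 00 40 F2 28 00 04 4F 02 00 "
    "99 CF 40 00 90 4F 63 00 F4 55 5F 00 90 4F 63 00"
)

header = parse_header(SAMPLE_HEADER)
# The name block follows the data block directly.
name_block_follows_data = (
    header.data_block_offset + header.data_block_size == header.name_block_offset
)
```

With the sample bytes, `name_block_follows_data` comes out true, which is the redundancy in the name block offset noted above.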

Instance descriptors

The instance descriptor array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. The descriptors start at 0x40 in the .dat file, but below is a descriptor found at 0x017B50 in the file which makes a better example. In the table below, we use offsets relative to the start of this descriptor. We also show the alternate structure in the Windows alpha 6, the oldest known version of Oni and the only one with an observed difference in the instance descriptor format.

Offset Type Raw Hex Value Description (retail)
0x00 tag 54 42 55 53 'SUBT' template tag
0x04 int32 C8 30 22 00 0x2230C8 data offset (relative to data block)
0x08 int32 01 CB 00 00 0xCB01 name offset (relative to name block)
0x0C int32 C0 09 00 00 2496 data size
0x10 int32 00 00 30 00 0x300000 flags
Windows alpha 6
Offset Type Raw Hex Value Description (alpha 6)
0x00 int64 68 6C 04 00 0x46C68 template checksum
0x08 tag 54 42 55 53 'SUBT' template tag
0x0C int32 E8 37 18 00 0x1837E8 data offset (relative to data block)
0x10 int32 4E C5 00 00 0xC54E name offset (relative to name block)
0x14 int32 20 08 00 00 2080 data size
0x18 int32 00 00 30 00 0x300000 flags
0x1C int32 EA 5F A6 39 08/25/2000 08:00:42 AM creation date (seconds since 1/1/1900)

The retail version of this instance descriptor tells us that a resource of type SUBT (a subtitle file for Oni; there are only two of these, one containing all speech subtitles, and one for help messages) has data that can be found 0x2230C8 bytes into the data block, which we learned from the file header starts at 0x03BCA0. Its name can be found 0xCB01 bytes into the name block that starts, according to the file header, at 0x28F240.

The data's size is given as 0x09C0, or 2,496 bytes, but it's important to clarify that this is the total size of the data counting from the resource header to the next 32-byte boundary after the end of this instance's actual data; in other words it is the true total of the space occupied on disk by this instance. This is interesting because the data offset leads you to the start of the instance-specific data which begins 8 bytes after the resource header, so if you erroneously add the data size to the data offset to find the end of the instance data then you will find yourself 8 bytes into the next instance.
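The off-by-8 trap described above is easy to check with a few lines of Python. This is only a sketch: `parse_instance_descriptor` and `instance_span` are hypothetical helper names, and the 8-byte resource header size is taken from the text rather than from any engine source.

```python
import struct

RESOURCE_HEADER_SIZE = 8  # instance ID + file ID precede the type-specific data

def parse_instance_descriptor(raw: bytes) -> dict:
    """Unpack one 20-byte retail instance descriptor."""
    tag, data_offset, name_offset, data_size, flags = struct.unpack("<4s4i", raw)
    return {
        "tag": tag[::-1].decode("ascii"),  # tags are stored "backwards"
        "data_offset": data_offset,
        "name_offset": name_offset,
        "data_size": data_size,
        "flags": flags,
    }

def instance_span(data_block_offset: int, data_offset: int, data_size: int):
    """Return (start, end) file addresses of the whole record, header included."""
    start = data_block_offset + data_offset - RESOURCE_HEADER_SIZE
    return start, start + data_size

subt = parse_instance_descriptor(
    bytes.fromhex("54 42 55 53 C8 30 22 00 01 CB 00 00 C0 09 00 00 00 00 30 00")
)
start, end = instance_span(0x03BCA0, subt["data_offset"], subt["data_size"])
```

For the SUBT example this gives a record running from 0x25ED60 to 0x25F720, both of which fall on 32-byte boundaries, whereas naively adding the size to 0x25ED68 would overshoot by 8 bytes.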

Before we proceed, let's expand upon the flags field.

Flags - data usage
0x01 00 00 00 - unnamed
0x02 00 00 00 - empty
0x04 00 00 00 - never used; intended to mark instance as pointing to duplicate data rather than its own data
0x08 00 00 00 - instance's data is being used by duplicate instances as a source
Flags - Tool mode

The first two of the following bits occur throughout the original .dat files. However all of these bits are ignored by the engine when loading data because they only have relevance at runtime when Oni is in Tool mode:

0x00 00 10 00 - touched (unsaved data)
0x00 00 20 00 - "in batch file"
0x00 00 40 00 - delete upon next save
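A small Python sketch of decoding the flags field follows. It assumes that the flag values above are written in on-disk byte order (so "0x01 00 00 00" is the integer 1 and "0x00 00 10 00" is 0x100000), which is consistent with the example descriptor's raw bytes 00 00 30 00 being given as 0x300000. The member names are invented for this example.

```python
from enum import IntFlag

class InstanceFlags(IntFlag):
    # Data usage flags (lowest byte on disk)
    UNNAMED = 0x00000001
    EMPTY = 0x00000002
    DUPLICATE = 0x00000004          # never used
    DUPLICATE_SOURCE = 0x00000008
    # Tool mode flags (third byte on disk), ignored when loading
    TOUCHED = 0x00100000
    IN_BATCH_FILE = 0x00200000
    DELETE_ON_SAVE = 0x00400000

def decode_flags(raw: bytes) -> InstanceFlags:
    """Read the 4 flag bytes as a little-endian integer and wrap them."""
    return InstanceFlags(int.from_bytes(raw, "little"))

subt_flags = decode_flags(bytes.fromhex("00 00 30 00"))
is_named = InstanceFlags.UNNAMED not in subt_flags
```

Under that assumption, the SUBT descriptor shown earlier carries only the "touched" and "in batch file" bits, so it is a named, non-empty instance.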

The flags "unnamed" and "empty" require special explanation.

Unnamed and empty resources

You'll notice that the level file header lists fewer names (7,124) than instances (9,347). That's because there are 3 types of instance:

  • Unnamed and not empty - they are only referenced by other instances in the same file, generally as child data (e.g., 3D geometry elements like ABNA are "contained" by AKEV, a level's environment).
    In vanilla Oni .dats there are some rare occurrences of unnamed non-empty orphan instances (e.g., TRCM). These are a form of garbage and are discarded by OniSplit when unpacking a level.
  • Named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them.
  • Named and empty - "empty" instances are used in level-specific instance files (i.e. not in level0_Final.dat) to associate an instance ID with a name. For every empty resource, there's another one with a matching name in level0_Final.dat that has data in it. The empty resource in the instance file is (usually) looked up by ID, then the engine searches all the loaded files for a non-empty instance with the same name, causing it to find the actual file in the global data in level0_Final.dat.

Peeking ahead at instance name

Before we talk about the name block in depth, we can peek ahead at the name of this resource using the offset we've just been given. Let's add the offset 0xCB01 to 0x28F240, the file header's address for the name block. This gives us the address 0x29BD41. There we find the string "SUBTsubtitles".
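The same lookup can be sketched in Python. The helpers `name_address` and `read_name` are illustrative names only, and the tiny name block used to exercise them is fabricated for the example (the first entry in particular is a placeholder, not a real instance name).

```python
def name_address(name_block_offset: int, name_offset: int) -> int:
    """File address of a name, given the header's name block offset."""
    return name_block_offset + name_offset

def read_name(name_block: bytes, name_offset: int) -> str:
    """Read one zero-terminated ASCII string out of the name block."""
    end = name_block.index(b"\x00", name_offset)
    return name_block[name_offset:end].decode("ascii")

# Address arithmetic from the walkthrough.
subt_name_address = name_address(0x28F240, 0xCB01)

# A made-up two-entry name block, just to show the string handling.
toy_name_block = b"ONGSplaceholder\x00SUBTsubtitles\x00"
second_name = read_name(toy_name_block, 16)
```

In a real file you would slice the name block out using the offset and size from the header and pass the descriptor's name offset straight to `read_name`.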

Peeking ahead at instance data

The actual subtitle data should be found by adding the offset 0x2230C8 to 0x03BCA0, the file header's address for the data block, to get 0x25ED68. We're going to leave the full details of the data block for later, but below is the data you should actually see for the English Oni SUBT file at this address. You have to consult the SUBT page to know how to read this data.

Offset Type Raw Hex Value Description
0x08 char[16] AD DE dead unused
0x18 offset 80 44 44 01 0x01444480 raw file data address
0x1C int32 61 02 00 00 609 array size

After padding of 16 unused bytes, we find that, instead of data, there's an address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.

The array size of 609 tells the part of the engine that reads SUBT data to expect a chunk of 609 subtitled lines of dialogue.

Name descriptors

The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In this case, that means 20 * 9347 = 186940, or 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.

The name descriptor array lists the index of every named instance, sorted alphabetically by instance name; the names themselves are stored in the name block, and at runtime each entry also holds a pointer to its name. This array is used by the engine to look up instances by name; it's also used to find instances by template (scanning just the tag at the start of each name). The purpose of this array being alphabetized was to allow the engine to do a binary search to find instances by name more quickly, but the retail engine no longer attempts a binary search and merely iterates over the array from start to end.

Offset Type Raw Hex Value Description
0x00 int32 15 16 00 00 5653 instance descriptor index
0x04 int32 60 2C 1C 0E (garbage) runtime: pointer to instance name

The index number here is referring to the instance's position in the instance descriptor array. This number is also used by the data block to identify each instance, thus it is found in two places in the data explicitly and one place implicitly.

Since the addresses of the names in memory cannot be known until the file is loaded into RAM, a space of 32 bits is reserved for each pointer at runtime.

Template descriptors

Likewise, the template descriptor array starts directly after the name descriptors. Since name descriptors are 8 bytes, 8 * 7124 (taken from the header) = 56992, or 0xDEA0, and adding that to the name descriptor array's start address (0x02DA7C) gives us 0x03B91C as the start of the template descriptors.
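The chain of additions above can be written down as a short Python sketch. The function name `segment_offsets` and its parameter names are made up for this example; the numbers fed into it are the ones from the level0_Final.dat header.

```python
def segment_offsets(header_size, instance_size, instance_count,
                    name_size, name_count, template_size, template_count):
    """Compute where each descriptor array starts, plus where the last one ends."""
    instance_descriptors = header_size
    name_descriptors = instance_descriptors + instance_size * instance_count
    template_descriptors = name_descriptors + name_size * name_count
    end_of_descriptors = template_descriptors + template_size * template_count
    return {
        "instance_descriptors": instance_descriptors,
        "name_descriptors": name_descriptors,
        "template_descriptors": template_descriptors,
        "end_of_descriptors": end_of_descriptors,
    }

layout = segment_offsets(
    header_size=0x40,
    instance_size=20, instance_count=9347,
    name_size=8, name_count=7124,
    template_size=16, template_count=56,
)
```

For this file the name descriptors land at 0x02DA7C, the template descriptors at 0x03B91C, and the descriptors as a whole end at 0x03BC9C.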

The template descriptor array contains information about all templates (that is, resource types, aka tags), used in the file (56 in this case, as we learned from the file header). Any resource occurring in this instance file has to have its type listed here. Here is the template descriptor at 0x3B9FC:

Offset Type Raw Hex Value Description
0x00 int64 3C B9 A6 71 08 00 00 00 0x871A6B93C template checksum
0x08 tag 45 47 52 54 'EGRT' template tag
0x0C int32 01 00 00 00 1 unused: number of resources in file that use this template

The template checksum is used to prevent loading of instance files that are not compatible with the current engine version. The tag is the same kind of number-written-as-backwards-ASCII that we discussed in the "Backwards and garbage data" section; in this case, 'EGRT' means TRGE. The field for the number of resources using this template is unused. The number should be correct for each template, but Oni never uses it for anything.
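As a hedged illustration, a template descriptor can be unpacked in Python like this. `parse_template_descriptor` is a hypothetical helper; the 16-byte layout and the sample bytes are taken from the table above.

```python
import struct

def parse_template_descriptor(raw: bytes) -> dict:
    """Unpack one 16-byte template descriptor (checksum, tag, unused count)."""
    checksum, tag, instance_count = struct.unpack("<q4si", raw)
    return {
        "checksum": checksum,
        "tag": tag[::-1].decode("ascii"),  # stored backwards on disk
        "instance_count": instance_count,  # present in the file, ignored by Oni
    }

trge = parse_template_descriptor(
    bytes.fromhex("3C B9 A6 71 08 00 00 00 45 47 52 54 01 00 00 00")
)
```

A tool that wants to check engine compatibility the way the text describes would compare each `checksum` against the value it expects for that tag; the expected values themselves are not reproduced here.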

You might wonder how Oni knows how to read each type of data, such as a SUBT or an ABNA. The simple answer is that this information is hard-coded into Oni. In fact, the information on each instance type, as stored in Oni's code, is actually the real "template". The file data merely gives the tag and checksum that identify the template in use so that Oni knows how to read the following data fields. These hardcoded templates also tell Oni which parts of the file data are reserved for pointers.

That's because an instance may have pointers to other related instances, but pointers are only valid in memory; they cannot be stored meaningfully on disk. They must be set at runtime when the level data is loaded into memory and an address in RAM has been assigned. Thus one type of data field in Oni's templates is a "raw data" pointer; on Macs and the Windows demo, there is an additional "separate data" pointer. These pointers are 32 bits in length, as one must expect since Oni was compiled for 32-bit PCs.

Incidentally, the templates in Oni's code have not just the familiar four-character tags attached to them, but also a descriptive string, e.g. "BSP Tree Node Array". These strings were typed into the source code where each template structure was defined, and eventually extracted from the binary by modders. This is the source of the names on OBD:File types.

Data block

The data block occupies the majority of the file and stores all the instance data (though this data sometimes points to the location of more data in a raw/separate file). We peeked at this block before when we looked at the instance descriptor for SUBTsubtitles. The block's starting point is found at the offset given in the header, in this case 0x03BCA0, saving us the trouble of adding up the size of the four preceding segments of the file and then aligning to the next 32-byte boundary.

The reason we'd need to align to 32 bytes is that the start of each instance's record (the instance ID) is always 32-byte aligned. Thus, even though the template descriptors ended at 0x03BC9C, there are four empty bytes here so that the data block can begin at 0x03BCA0, which divides evenly by 32. This alignment rule also means that the instance-specific data will always start at an offset like 0x0008, 0x0028, 0x0148, etc.
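The rounding involved is the usual "align up" calculation. A minimal Python sketch, with `align_up` being a name chosen for this example:

```python
def align_up(offset: int, alignment: int = 32) -> int:
    """Round offset up to the next multiple of alignment (no change if already aligned)."""
    return (offset + alignment - 1) // alignment * alignment

data_block_start = align_up(0x03BC9C)            # end of the template descriptors
first_instance_data = data_block_start + 0x8     # skip the 8-byte resource header
```

Applied to 0x03BC9C this yields 0x03BCA0, the data block offset given in the header.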

The instance ID and file ID are not actually part of the instance data but are considered to be the resource header. The engine always keeps pointers to the start of the type-specific data itself; we saw this before when we jumped to 0x25ED68 and saw the data for the SUBT rather than the header for this data. The instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).

Offset Type Raw Hex Value Description
0x00 res_id 01 0B 04 00 1035 instance descriptor index
0x04 lev_id 01 00 00 06 3 level number
0x08 ... ... ... type-specific data...

This example is taken from level 3 so that the file ID is more instructive. In the OBD documentation, these fields are called res_id and lev_id as seen above.

The instance's ID is stored as "(instance descriptor index << 8) | 1". Thus the instance at index 1,035 in the instance descriptor array will have its ID encoded as 0x40B01 (stored on disk as 01 0B 04 00). The '1' allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte aligned, so it will never have bit 0 set). These pointer flags were retained when the file was written to disk but are meaningless now. At level-load time the flags are cleared and then set again when Oni allocates memory for each instance. The purpose of left-shifting the index number is simply to leave the lowest byte open for the pointer flag.

The file ID is computed from the number found in the name of the instance file: "(level number << 25) | 1". Thus instances found in level3_Final.dat will have the file ID encoded as 0x6000001. Again, the '1' is used by the engine to know which file IDs have been converted to pointers at runtime, but on disk this is a relic which has no meaning to us. The reason for left-shifting the level number might have originally been to store it alongside the instance ID and the pointer flag in a single int32, but they are separate numbers now, perhaps so that both IDs can have their own pointer flag.
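Both encodings are simple enough to express as a Python sketch. The function names below are invented for illustration; the formulas are the two quoted above.

```python
def encode_instance_id(descriptor_index: int) -> int:
    return (descriptor_index << 8) | 1

def decode_instance_id(instance_id: int) -> int:
    return instance_id >> 8          # discards the low byte holding the flag

def encode_file_id(level_number: int) -> int:
    return (level_number << 25) | 1

def decode_file_id(file_id: int) -> int:
    return file_id >> 25

res_id = encode_instance_id(1035)
lev_id = encode_file_id(3)
res_id_on_disk = res_id.to_bytes(4, "little")
lev_id_on_disk = lev_id.to_bytes(4, "little")
```

Serialized as little-endian 32-bit integers, these two values reproduce the raw bytes 01 0B 04 00 and 01 00 00 06 from the resource header table.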

After the header, the size of each instance's data is of a somewhat arbitrary length depending on the template this instance falls under. As mentioned under "Instance descriptors", the data size given by the descriptor includes the 8-byte resource header and the padding at the end of the data to align the next instance on 32 bytes.

Looking backward from data to instance

By the way, if you pick a random place in the data block to look at with a hex editor, how do you know which resource you're looking at? You would look for the highest data offset in the instance descriptor array that is less than your position within the data block. Let's say that the string at 0x3BD40 caught our eye: "powerup_ammo". Subtracting the start of the data block, 0x3BCA0, gives us 0xA0 as the position of this string. Now looking back at the instance descriptor array, the instances' data offsets occur every 20 bytes and come directly after the tags. We can see that the first data offset is 0x8 and the next one is 0xF68, thus our offset into the data block of 0xA0 means we are looking at the instance which starts at 0x8. It's the very first instance listed at the start of the instance descriptor array:

Offset Type Raw Hex Value Description
0x00 tag 53 47 4E 4F 'ONGS' template tag
0x04 int32 08 00 00 00 0x08 data offset (relative to data block)
0x08 int32 00 00 00 00 0x00 name offset (relative to name block)
0x0C int32 60 0F 00 00 3936 data size
0x10 int32 00 00 00 00 0 flags

So this tells us that the first data in the data block belongs to the solitary ONGS resource, and that it extends for 3,936 bytes. Since its name offset is 0x0, its name is the first string in the name block; like every instance name, it begins with its template tag, in this case ONGS.
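If you want to automate this backward lookup, one possible approach is sketched below in Python using the standard `bisect` module. The `find_instance` helper is hypothetical, and it assumes that the caller passes in only the data offsets of instances that actually have data in this file (how empty instances record their offset is not covered here).

```python
from bisect import bisect_right

def find_instance(data_offsets, position):
    """Return the descriptor index whose data contains the given data block position.

    data_offsets: data offsets in instance descriptor order.
    position: offset relative to the start of the data block.
    """
    # Sort by offset but remember each descriptor's original index.
    ordered = sorted((offset, index) for index, offset in enumerate(data_offsets))
    starts = [offset for offset, _ in ordered]
    slot = bisect_right(starts, position) - 1
    if slot < 0:
        raise ValueError("position lies before the first instance's data")
    return ordered[slot][1]

# The two offsets mentioned above, and the "powerup_ammo" string's position.
offsets = [0x8, 0xF68]
owner = find_instance(offsets, 0x3BD40 - 0x3BCA0)
```

Here `owner` is 0, i.e. the first descriptor in the array, matching the manual lookup.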

Name block

This final segment of the file stores all the instance names as C-style ASCII strings (terminated by a zero byte). We peeked at this before when we looked at the instance descriptor for SUBTsubtitles. The start of this block is 32-byte aligned but after that the strings are simply packed end to end, separated only by their null terminator. As with the data block, the name block's starting point is given in the header, in this case 0x28F240.

Offset Type Raw Hex Value Description
0x00 string 53 55 ... 00 "SUBTsubtitles" name string (zero-terminated)

These names can be up to 63 characters long, counting the tag. The instance file concludes with the end of the name block.