OBD:Instance file format: Difference between revisions

wording
(more explaining; if anyone can explain why the data table address is off by 8 bytes, I'd appreciate it)
(wording)
Line 1: Line 1:
==Terminology==
==Terminology==
Oni's level data is broken into two kinds of files in Windows retail Oni. One type ends in ".dat" and is called an instance file. An "instance" is essentially a resource, in plain English, such as a texture. Before raw and separate files existed, all resources would have been stored in the levelX_Final.dat file, so it was rightfully called an "instance file". The second type of file ends in ".raw" and is simply called a raw file. Windows demo Oni and Mac retail/demo Oni use a third type which ends in ".sep", short for "separate". You can read about raw and separate files [[Raw|HERE]].
Oni's level data is broken into two kinds of files in Windows retail Oni. One type ends in ".dat" and is called an instance file. An "instance" is essentially a resource, such as a texture. Initially, all resources would have been stored in the levelX_Final.dat file, so it was rightfully called an "instance file". Eventually, however, much of the resource data was moved to a new type of file ending in ".raw", simply called a raw file. By the time that Oni for the Mac was finalized for release, some of the raw data had been moved to a third file type which ends in ".sep", short for "separate". You can read about raw and separate files [[Raw|HERE]].


Note that ".dat" is a generic suffix originally used by Oni for all kinds of data, including [[persist.dat]]. The only reason that any other suffixes exist at all is that raw and separate files were created later in development and given unique suffixes to distinguish them from the .dat files in the same folder. Therefore, the proper, specific name for the <u>level data format</u>, as opposed to the save-game format, film format, etc. is not ".dat file" or "DAT file", but "instance file". That being said, ".dat" has only been used by the community historically to refer to instance files, so you can reasonably assume that's what is meant when you see the suffix.
Note that ".dat" is a generic suffix originally used by Oni for all kinds of data, including [[persist.dat]]. The only reason that any other suffixes exist at all is that when raw and separate files were created, they needed unique suffixes to distinguish them from the .dat files in the same folder. Therefore, the proper, specific name for the .dat files containing <u>level data</u>, as opposed to the .dat files containing the save-game data, films, etc. is "instance file". That being said, ".dat" has only been used by the community historically to refer to instance files, so you can reasonably assume that's what is meant when you see the suffix.


==Introduction==
==Introduction==
Line 11: Line 11:
Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which looks "backwards" from the standpoint of a culture that reads left-to-right. When Macs, which were big-endian at the time due to their PowerPC architecture, read these files, they then had to flip each sequence of bytes in memory before they could be understood.
Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which looks "backwards" from the standpoint of a culture that reads left-to-right. When Macs, which were big-endian at the time due to their PowerPC architecture, read these files, they then had to flip each sequence of bytes in memory before they could be understood.


An exception to this backwards-writing rule is when strings of ASCII characters were written to disk. These are not numbers and thus are not subject to endianness, thus they are written left-to-right. Now, this may not seem to be the case as you continue reading below. The first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". The reason these four-character strings are backwards is that Oni stored them as a number. For instance, writing the number 1,448,227,633 to disk in little-endian order results in the bytes 0x31, 0x33, 0x52, and 0x56, which happens to produce the ASCII codes for '1', '3', 'R' and 'V'. This provided a combination of more convenient storage in memory as a number, and human-readability when on disk.  
An exception to this backwards-writing rule is when strings of ASCII characters were written to disk. These are not numbers and thus are not subject to endianness, so they retain their left-to-right order. Now, this may not seem to be the case as you continue reading below. The first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". The reason these four-character strings are backwards is that Oni stored them as a number. You see, writing the number 1,448,227,633 to disk in little-endian order results in the bytes 0x31, 0x33, 0x52, and 0x56, which happens to produce the ASCII codes for '1', '3', 'R' and 'V'. This practice of Bungie's provided a combination of more convenient storage in memory as a number, and human-readability when on disk.  
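The round trip described above can be checked with a short Python sketch. The function names here are made up for illustration and are not taken from Oni's code; the only facts used are the number and the byte values quoted in the paragraph.

```python
import struct

def tag_to_disk_bytes(value: int) -> bytes:
    """Pack a 32-bit unsigned number in little-endian order, as an Intel machine writes it."""
    return struct.pack("<I", value)

def disk_bytes_to_tag(raw: bytes) -> str:
    """Read four on-disk bytes as a little-endian number, then spell that
    number out most-significant byte first to recover the intended tag."""
    (value,) = struct.unpack("<I", raw)
    return value.to_bytes(4, "big").decode("ascii")

on_disk = tag_to_disk_bytes(1448227633)
print(on_disk)                     # b'13RV'
print(disk_bytes_to_tag(on_disk))  # VR31
print(disk_bytes_to_tag(b"TBUS"))  # SUBT
```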


==Walkthrough==
==Walkthrough==
Line 34: Line 34:
|}
|}


The '''template checksum''' tells us that this level data is in the .dat/.raw file scheme, as opposed to the .dat/.raw/.sep file scheme.
The '''template checksum''' tells us that this level data is in the .dat/.raw file scheme, as opposed to the .dat/.raw/.sep file scheme used by Mac Oni and the Windows demo of Oni.


The '''version''' of the instance file is the format version. Reading it backwards, as discussed under "Introduction", we get "VR31", which is probably "version 31". This is the format version of all instance files in all releases of Oni.
The '''version''' of the instance file is the format version. Reading it backwards, as discussed under "Introduction", we get "VR31", which is probably "version 31". This is the format version of all instance files in all releases of Oni.
Line 40: Line 40:
The '''signature''' is identical in all instance files.
The '''signature''' is identical in all instance files.


The '''descriptor counts''' are the sizes of some arrays which are coming up soon: the instance, name and template descriptors. For instance, the size of the instance descriptor array will be 0x2483, or 9,347 items, in length.
The '''descriptor counts''' are the sizes of some arrays which are coming up soon in this file: the instance, name and template descriptors. For instance, the instance descriptor array will be 0x2483, or 9,347, items long.


Next we are told the addresses and sizes of the '''data and name tables''' in this file. The name table simply follows the data table, as you'll see if you add the data table offset plus the data table size, but that doesn't mean the name table offset is redundant; if its start was not 32-bit-aligned, it probably would be moved down to start at the next 32-bit word, but this is unnecessary because it just happens to fall on such an even number already.
Next we are told the addresses and sizes of the '''data and name tables''' in this file. The name table simply follows the data table, as you'll see if you add the data table size to the data table offset, but that doesn't mean the name table offset is redundant; if its start were not 32-bit-aligned, it would probably be moved down to start at the next 32-bit word. This is unnecessary here because it just happens to be aligned already.
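A minimal Python sketch of the alignment check follows. It assumes "32-bit-aligned" means the address is a multiple of 4 bytes, and the rounding helper <code>align_up</code> is hypothetical; this page does not confirm what the engine actually does with an unaligned table. The name table address used is the one quoted further down in this walkthrough.

```python
def align_up(address: int, alignment: int = 4) -> int:
    """Round address up to the next multiple of alignment (4 bytes = one 32-bit word)."""
    return (address + alignment - 1) // alignment * alignment

NAME_TABLE_OFFSET = 0x28F240  # name table address from the file header

print(NAME_TABLE_OFFSET % 4 == 0)        # True: already on a 32-bit boundary
print(hex(align_up(NAME_TABLE_OFFSET)))  # 0x28f240 (unchanged)
print(hex(align_up(0x28F241)))           # 0x28f244 (an unaligned address would be pushed down)
```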


After this comes four "int"s of '''zeroes'''. Empty space like this is common in the data files, and indicates that something stored in memory at this relative position was not written to disk (probably pointers, sometimes a buffer reserved for possible future use).
After this come four "int"s of '''zeroes'''. Empty space like this is common in the data files, and indicates that something stored in memory at this relative position was not written to disk (probably pointers, or sometimes space reserved for possible future use in a resource type).


That concludes the header of the instance file. Immediately after this header, we find the instance descriptors, starting with....
That concludes the header of the instance file. Immediately after this header, we find the instance descriptors array.


===Instance descriptors===
===Instance descriptors===
The "instance descriptors" array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. It starts at 0x40 in the .dat file, but below is a descriptor found at 0x17B50 in the file which makes a good example. In the table below, we use offsets relative to the start of this descriptor.
The instance descriptors array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. The descriptors start at 0x40 in the .dat file, but below is a descriptor found at 0x017B50 in the file which makes a better example. In the table below, we use offsets relative to the start of this descriptor.


{{Table}}
{{Table}}
Line 64: Line 64:
|}
|}


This descriptor tells us that a resource of type SUBT (a subtitle file for Oni; there are only two in the game) has data that can be found 0x2230C8 bytes into the data table, which we learned from the file header starts at 0x03BCA0. Its name can be found 0xCB01 bytes into the name table that starts, according to the file header, at 0x28F240. The data is 0x09C0, or 2,496 bytes.
This descriptor tells us that a resource of type SUBT (a subtitle file for Oni; there are only two of these, one for speech subtitles and one for help messages) has data that can be found 0x2230C8 bytes into the data table, which we learned from the file header starts at 0x03BCA0. Its name can be found 0xCB01 bytes into the name table that starts, according to the file header, at 0x28F240. The data is 0x09C0, or 2,496, bytes long.


To see the name of this resource, look at address 0xCB01 + 0x28F240 = 0x29BD41. There we find the string "SUBTsubtitles". The actual subtitle data should be found at the address 0x2230C8 + 0x03BCA0 = 0x25ED68. Let's go there now....
To see the name of this resource, look at address 0xCB01 + 0x28F240 = 0x29BD41. There we find the string "SUBTsubtitles". The actual subtitle data should be found at the address 0x2230C8 + 0x03BCA0 = 0x25ED68. Let's go there now....
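The two additions above are plain offset arithmetic, which can be sketched in Python as follows. The constant and function names are invented for this example; the values are the ones quoted from the file header and the descriptor.

```python
DATA_TABLE_OFFSET = 0x03BCA0  # from the file header
NAME_TABLE_OFFSET = 0x28F240  # from the file header

def absolute_address(table_offset: int, relative_offset: int) -> int:
    """Turn a descriptor's table-relative offset into an address within the .dat file."""
    return table_offset + relative_offset

name_address = absolute_address(NAME_TABLE_OFFSET, 0xCB01)
data_address = absolute_address(DATA_TABLE_OFFSET, 0x2230C8)

print(hex(name_address))  # 0x29bd41
print(hex(data_address))  # 0x25ed68
```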


====Instance data====
====Instance data====
For some reason, the addresses we calculate from the descriptor data offsets are all off by eight bytes, so we need to subtract 8 from 0x25ED68 and go to 0x25ED60. Compare what you see here to the documentation for the [[SUBT]] type. Below is the data you should actually see for the English Oni SUBT file at this address:
For some reason, the addresses we calculate from the descriptor data offsets are all off by eight bytes, so we need to subtract 8 from 0x25ED68 and go to 0x25ED60. Compare what you see here to the documentation for the [[SUBT]] type. Below is the data you should actually see for the English Oni SUBT file at this address. Note that we still haven't found the actual subtitle data. The princess is in another castle:


{{Table}}
{{Table}}
Line 76: Line 76:
{{OBDtr| 0x04 | lev_id  |FFFF00| 01 00 00 00 | 0          | level 0 }}
{{OBDtr| 0x04 | lev_id  |FFFF00| 01 00 00 00 | 0          | level 0 }}
{{OBDtr| 0x08 | char[16] |00FF00| AD DE      | dead        | unused }}
{{OBDtr| 0x08 | char[16] |00FF00| AD DE      | dead        | unused }}
{{OBDtr| 0x18 | offset  |00FFFF| 80 44 44 01 | 0x01444480  | at this position starts the part in the raw file }}
{{OBDtr| 0x18 | offset  |00FFFF| 80 44 44 01 | 0x01444480  | raw file data address }}
{{OBDtr| 0x1C | int32    |FF00FF| 61 02 00 00 | 609        | array size }}
{{OBDtr| 0x1C | int32    |FF00FF| 61 02 00 00 | 609        | array size }}
|}
|}
Line 84: Line 84:
The second and third bytes of the second word are the '''level number''' where this resource is found.
The second and third bytes of the second word are the '''level number''' where this resource is found.


After a '''buffer''' of 16 unused bytes, we find the address of the actual data, but it's the position in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.
After a '''buffer''' of 16 unused bytes, we find the address of the actual data: it's in the level's raw file. Open level0_Final.raw and jump to address 0x01444480, and you should see "01_01_01 Griffin: Give me another reading.", and the rest of some very familiar dialogue continuing from there.


The '''array size''' of 609 tells us that there are 609 subtitled lines of dialogue to be found in the raw file.
The '''array size''' of 609 tells us that there are 609 subtitled lines of dialogue to be found in the raw file as part of this resource.
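As a rough sketch, the 32 bytes described in the table above could be unpacked in Python like this. The layout string follows the offsets in the table (0x04, 0x08, 0x18, 0x1C). The function name and dictionary keys are invented for this example, the first four bytes are treated as an opaque number, and the sample buffer is synthetic: it assumes the unused area is filled with repeated AD DE pairs and uses zeroes for the first field.

```python
import struct

# Little-endian: 4-byte value, 4-byte lev_id, 16 unused bytes, 4-byte offset, 4-byte count.
SUBT_HEADER = struct.Struct("<II16sII")

def parse_subt_header(raw: bytes) -> dict:
    """Unpack the first 32 bytes of a SUBT instance into named fields."""
    first, lev_id, unused, raw_offset, count = SUBT_HEADER.unpack(raw[:SUBT_HEADER.size])
    return {
        "first_field": first,      # opaque here; see the table for its meaning
        "lev_id": lev_id,          # stored as 01 00 00 00 in the example
        "unused": unused,
        "raw_offset": raw_offset,  # address inside the level's .raw file
        "count": count,            # number of subtitle entries
    }

# Synthetic sample built from the byte values shown in the table.
sample = (
    bytes(4)                        # placeholder for the first field (assumption)
    + bytes.fromhex("01000000")     # lev_id
    + bytes.fromhex("ADDE") * 8     # 16 unused bytes (assumed fill pattern)
    + bytes.fromhex("80444401")     # raw file data address
    + bytes.fromhex("61020000")     # array size
)

header = parse_subt_header(sample)
print(hex(header["raw_offset"]))  # 0x1444480
print(header["count"])            # 609
```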


===Name descriptors===
===Name descriptors===
The "name descriptors" array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to faster find instances by name. It is also used when finding instances by type.
The "name descriptors" array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to quickly find instances by name. It is also used when finding instances by type.
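To illustrate why a sorted list of instance numbers helps, here is a minimal Python sketch of a binary search by name. It is not Oni's actual lookup code: the function name, the data layout, and the two names ending in "example" are hypothetical, and the comparison uses Python's default string ordering, which may differ from the ordering the engine uses.

```python
from typing import List, Optional

def find_instance(names: List[str], sorted_ids: List[int], wanted: str) -> Optional[int]:
    """Binary-search sorted_ids (instance numbers ordered by name) for wanted.

    names[i] is the name of instance i. Returns the instance number, or None.
    """
    lo, hi = 0, len(sorted_ids)
    while lo < hi:
        mid = (lo + hi) // 2
        if names[sorted_ids[mid]] < wanted:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_ids) and names[sorted_ids[lo]] == wanted:
        return sorted_ids[lo]
    return None

# Instance names in file order (only "SUBTsubtitles" comes from this page).
names = ["TXMPexample", "SUBTsubtitles", "ONCCexample"]
# The stand-in for the name descriptors array: instance numbers sorted by name.
sorted_ids = sorted(range(len(names)), key=lambda i: names[i])

print(sorted_ids)                                         # [2, 1, 0]
print(find_instance(names, sorted_ids, "SUBTsubtitles"))  # 1
print(find_instance(names, sorted_ids, "SUBTmissing"))    # None
```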


{{Table}}
{{Table}}