OBD:Instance file format

From OniGalore

Revision as of 21:10, 16 July 2014

Terminology

Oni's level data is broken into two kinds of files in Windows retail Oni. One type ends in ".dat" and is called an instance file. An "instance" is essentially a resource, in plain English, such as a texture. Before raw and separate files existed, all resources would have been stored in the levelX_Final.dat file, so it was rightfully called an "instance file". The second type of file ends in ".raw" and is simply called a raw file. Windows demo Oni and Mac retail/demo Oni use a third type which ends in ".sep", short for "separate". You can read about raw and separate files on the Raw page.

Note that ".dat" is a generic suffix originally used by Oni for all kinds of data, including persist.dat. The only reason that any other suffixes exist at all is that raw and separate files were created later in development and given unique suffixes to distinguish them from the .dat files in the same folder. Therefore, the proper, specific name for the level data format, as opposed to the save-game format, film format, etc. is not ".dat file" or "DAT file", but "instance file". That being said, ".dat" has only been used by the community historically to refer to instance files, so you can reasonably assume that's what is meant when you see the suffix.

Introduction

Instance files are the "main" type of data file in the sense that, when loading a level, Oni reads the instance file first, and this file serves as an index that allows it to find resources which are packed back-to-back into the raw and separate files. All instance files begin with a 64 byte header followed by 3 "descriptor" arrays, a data table and a name table. Among other things, the header contains the number of descriptors in each of the 3 arrays and the offset of the data and name tables (relative to the start of the file).

During development, Oni had in-game editing tools. These tools presented a GUI for things like placing AIs and setting their attributes, editing particles, etc. When a developer saved their work, the contents of the level, stored in RAM, were written directly to disk. The structure of the .dat/.raw/.sep files reflects the way in which Bungie West chose to store levels in memory, and thus when we read the data in the files with a hex editor, we can see various eccentricities such as blank space and garbage data that represented unused memory on the development machine.

Additionally, because the levels were built on Intel-based machines, which use a little-endian architecture, sequences of bytes which represent numbers were written from least-significant to most-significant byte, which looks "backwards" from the standpoint of a culture that reads left-to-right. When Macs, which were big-endian at the time due to their PowerPC architecture, read these files, they then had to flip each sequence of bytes in memory before they could be understood.

An exception to this backwards-writing rule is strings of ASCII characters. These are not numbers and so are not subject to endianness; they are written left-to-right. This may not seem to be the case as you continue reading below. The first two strings of characters which you'll see are "13RV" and "TBUS", which are meant to be read "VR31" and "SUBT". The reason these four-character strings are backwards is that Oni stored each of them as a 32-bit number. For instance, writing the number 1,448,227,633 (0x56523331) to disk in little-endian order results in the bytes 0x31, 0x33, 0x52, and 0x56, which are the ASCII codes for '1', '3', 'R' and 'V'. This provided a combination of more convenient storage in memory as a number, and human-readability when on disk.
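The round trip between a four-character tag and its on-disk bytes can be sketched in a few lines of Python with the standard struct module. The helper names below are made up for illustration; they are not functions from Oni or OniSplit.

```python
import struct

def tag_to_disk_bytes(tag: str) -> bytes:
    """Treat a 4-character tag as a 32-bit number, then write it little-endian."""
    number = int.from_bytes(tag.encode("ascii"), "big")  # 'VR31' -> 0x56523331
    return struct.pack("<I", number)

def disk_bytes_to_tag(raw: bytes) -> str:
    """Read 4 bytes as a little-endian number and recover the readable tag."""
    (number,) = struct.unpack("<I", raw)
    return number.to_bytes(4, "big").decode("ascii")

print(tag_to_disk_bytes("VR31"))   # b'13RV'
print(disk_bytes_to_tag(b"TBUS"))  # SUBT
```

In practice, reversing the four bytes gives the same result; the detour through a number is only there to show why the reversal happens.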

Walkthrough

Header

Here is a walkthrough of an instance file using the level0_Final.dat in English Windows Oni. Follow along in a hex editor for maximum learnage. Each term will be explained in-depth when we fully consider the related data. First, here is how the file begins:

Offset | Type | Raw Hex | Value | Description
0x00 | int64 | 1F 27 DC 33 DF BC 03 00 | 0x0003BCDF33DC271F | Windows template checksum; Windows demo and Mac retail/demo use 0x0003BCDF23C13061 instead
0x08 | int32 | 31 33 52 56 | 'VR31' | .dat version; .oni files use 'VR32' instead
0x0C | int64 | 40 00 14 00 10 00 08 00 | 0x0008001000140040 | signature
0x14 | int32 | 83 24 00 00 | 9347 | instance descriptor count
0x18 | int32 | D4 1B 00 00 | 7124 | name descriptor count
0x1C | int32 | 38 00 00 00 | 56 | template descriptor count
0x20 | int32 | A0 BC 03 00 | 0x03BCA0 | data table offset
0x24 | int32 | A0 35 25 00 | 0x2535A0 | data table size
0x28 | int32 | 40 F2 28 00 | 0x28F240 | name table offset
0x2C | int32 | 04 4F 02 00 | 0x024F04 | name table size
0x30 | int32 | 00 00 00 00 | | used by OniSplit only: raw table offset
0x34 | int32 | 00 00 00 00 | | used by OniSplit only: raw table size
0x38 | int32 | 00 00 00 00 | | unused
0x3C | int32 | 00 00 00 00 | | unused

The template checksum tells us that this level data is in the .dat/.raw file scheme, as opposed to the .dat/.raw/.sep file scheme.

The version of the instance file is the format version. Reading it backwards, as discussed under "Introduction", we get "VR31", which is probably "version 31". This is the format version of all instance files in all releases of Oni.

The signature is identical in all instance files.

The descriptor counts are the lengths of three arrays which are coming up soon: the instance, name and template descriptors. For instance, the instance descriptor array contains 0x2483, or 9,347, items.

Next we are told the offsets and sizes of the data and name tables in this file. The name table simply follows the data table, as you'll see if you add the data table size to the data table offset (0x03BCA0 + 0x2535A0 = 0x28F240), but that doesn't mean the name table offset is redundant; if the end of the data table were not 32-bit-aligned, the name table would probably be moved down to start at the next 32-bit word. That is unnecessary here because the data table already ends on such a boundary.

After this come four "int"s of zeroes. Empty space like this is common in the data files, and indicates that something stored in memory at this relative position was not written to disk (probably pointers, sometimes a buffer reserved for possible future use).
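As a rough sketch, the 64-byte header described above can be unpacked in Python with the struct module. The field order and the sample bytes come from the header table; the dictionary keys and the function name parse_header are invented here for illustration.

```python
import struct

# "<" = little-endian, no padding: 8 + 4 + 8 + 3*4 + 4*4 + 4*4 = 64 bytes.
HEADER = struct.Struct("<Q4sQ3I4I4I")

def parse_header(data: bytes) -> dict:
    """Unpack the first 64 bytes of an instance file into named fields."""
    (checksum, version, signature,
     instance_count, name_count, template_count,
     data_offset, data_size, name_offset, name_size,
     raw_offset, raw_size, _unused1, _unused2) = HEADER.unpack_from(data, 0)
    return {
        "template_checksum": checksum,
        "version": version[::-1].decode("ascii"),  # b'13RV' -> 'VR31'
        "signature": signature,
        "instance_count": instance_count,
        "name_count": name_count,
        "template_count": template_count,
        "data_offset": data_offset,
        "data_size": data_size,
        "name_offset": name_offset,
        "name_size": name_size,
    }

# The header bytes of level0_Final.dat as listed in the table.
SAMPLE = bytes.fromhex(
    "1F27DC33DFBC0300" "31335256" "4000140010000800"
    "83240000" "D41B0000" "38000000"
    "A0BC0300" "A0352500" "40F22800" "044F0200"
) + bytes(16)

header = parse_header(SAMPLE)
print(header["version"], header["instance_count"])  # VR31 9347
# The name table starts right where the data table ends.
print(header["data_offset"] + header["data_size"] == header["name_offset"])  # True
```

With a real file you would pass the first 64 bytes read from disk instead of SAMPLE.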

That concludes the header of the instance file. Immediately after this header, we find the instance descriptors, starting with....

Instance descriptors

The "instance descriptors" array tells Oni where to find the data and the name of every instance (resource) indexed by the .dat file. It starts at 0x40 in the .dat file, but below is a descriptor found at 0x17B50 in the file which makes a good example. In the table below, we use offsets relative to the start of this descriptor.

Offset | Type | Raw Hex | Value | Description
0x00 | tag | 54 42 55 53 | 'SUBT' | template tag
0x04 | int32 | C8 30 22 00 | 0x2230C8 | data offset (relative to data table)
0x08 | int32 | 01 CB 00 00 | 0xCB01 | name offset (relative to name table)
0x0C | int32 | C0 09 00 00 | 0x09C0 | data size
0x10 | int32 | 00 00 00 00 | 0 | flags; possible values:
  0x01 00 00 00 - unnamed
  0x02 00 00 00 - empty
  0x04 00 00 00 - never used; appears to mean "big-endian" data
  0x08 00 00 00 - shared

This descriptor tells us that a resource of type SUBT (a subtitle file for Oni; there are only two in the game) has data that can be found 0x2230C8 bytes into the data table, which we learned from the file header starts at 0x03BCA0. Its name can be found 0xCB01 bytes into the name table that starts, according to the file header, at 0x28F240. The data is 0x09C0, or 2,496 bytes.

To see the name of this resource, look at address 0x28F240 + 0xCB01 = 0x29BD41. There we find the string "SUBTsubtitles".
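The same lookup can be written out in Python. This sketch assumes that descriptors are packed back-to-back as 20-byte records starting at 0x40, which is what the table layout suggests; the constant and variable names are illustrative only.

```python
import struct

DESCRIPTOR = struct.Struct("<4sIIII")  # tag, data offset, name offset, data size, flags
FIRST_DESCRIPTOR = 0x40   # the descriptor array starts right after the header
DATA_TABLE = 0x03BCA0     # from the file header
NAME_TABLE = 0x28F240     # from the file header

# The 20 bytes found at 0x17B50 in level0_Final.dat, per the table above.
raw = bytes.fromhex("54425553" "C8302200" "01CB0000" "C0090000" "00000000")
tag, data_off, name_off, data_size, flags = DESCRIPTOR.unpack(raw)

# Position of this descriptor within the array (assumes 20-byte records).
index = (0x17B50 - FIRST_DESCRIPTOR) // DESCRIPTOR.size

print(tag[::-1].decode("ascii"))   # SUBT
print(index)                       # 4852
print(hex(DATA_TABLE + data_off))  # 0x25ed68, absolute address of the data
print(hex(NAME_TABLE + name_off))  # 0x29bd41, absolute address of the name
print(data_size)                   # 2496
```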

Name descriptors

The "name descriptors" array stores the numbers of all named instances in alphabetical order. This allows the engine to use a binary search to find instances by name more quickly. It is also used when finding instances by type.

Offset | Type | Raw Hex | Value | Description
0x00 | int32 | 00 00 00 00 | 0 | instance number
0x04 | int32 | 00 00 00 00 | 0 | runtime: pointer to instance name
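To illustrate why the alphabetical ordering matters, here is a minimal binary search over a list of (name, instance number) pairs. It is only a sketch: the exact string comparison the engine uses is not documented here, so plain Python string ordering is assumed, and the two-entry list is a toy example rather than real file contents.

```python
def find_instance(sorted_entries, wanted_name):
    """Binary search; sorted_entries is a list of (name, instance_number) sorted by name."""
    low, high = 0, len(sorted_entries) - 1
    while low <= high:
        mid = (low + high) // 2
        name, number = sorted_entries[mid]
        if name == wanted_name:
            return number
        if name < wanted_name:
            low = mid + 1
        else:
            high = mid - 1
    return None  # no instance with that name in this file

# Toy data: names include the 4-character template tag as a prefix.
entries = [("AISAlevel1_scripts", 0), ("SUBTsubtitles", 4852)]
print(find_instance(entries, "SUBTsubtitles"))  # 4852
print(find_instance(entries, "SUBTmissing"))    # None
```

Each lookup takes a number of steps proportional to the logarithm of the array length, so even a few thousand named instances need only about a dozen comparisons.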

The "template descriptor" array contains information about all templates used in the file. The template checksum is used to prevent loading of instance files that are not compatible with the current engine version.

Offset | Type | Raw Hex | Value | Description
0x00 | int64 | A0 6D 12 00 00 00 00 00 | 0x126DA0 | template checksum
0x08 | tag | 41 4E 42 41 | 'ABNA' | template tag
0x0C | int32 | 01 00 00 00 | 1 | number of instances that use this template


The data table stores all the instance data. The instance ID is always stored 32-byte-aligned (thus the instance-specific data will always be found at an offset like 0x0008, 0x0028, 0x0148, etc.). The instance ID and file ID are not actually part of the instance data. The engine always holds pointers to the instance-specific data, and the instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance given a pointer to it).

Offset | Type | Raw Hex | Value | Description
0x00 | int32 | 01 00 00 00 | 1 | instance id
0x04 | int32 | 01 00 00 02 | 0x02000001 | file id
0x08 | | | | instance specific data
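The "negative offset" access can be mimicked in Python by reading the eight bytes that sit just before a data offset. This is a sketch of the idea only; read_instance_ids is a hypothetical helper, and the 24 filler bytes stand in for real instance data.

```python
import struct

def read_instance_ids(data_table: bytes, data_offset: int):
    """Return (instance_id, file_id) stored in the 8 bytes before the instance data."""
    return struct.unpack_from("<II", data_table, data_offset - 8)

# One 32-byte slot: instance id, file id, then 24 bytes of placeholder instance data.
table = bytes.fromhex("01000000" "01000002") + b"\xAA" * 24

instance_id, file_id = read_instance_ids(table, 0x08)
print(instance_id, hex(file_id))  # 1 0x2000001
```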


The name table stores all the instance names as C-style strings (terminated by a 0 byte).

Offset | Type | Raw Hex | Value | Description
0x00 | string | 41 49 ... 00 | "AISAlevel1_scripts" | name string (0 terminated)
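Reading a name out of the name table amounts to scanning for the terminating 0 byte. The sketch below also splits off the 4-character template tag that prefixes each name; read_name is a hypothetical helper, and the two-entry table is a toy layout (in the real level0_Final.dat, "SUBTsubtitles" sits at offset 0xCB01).

```python
def read_name(name_table: bytes, name_offset: int):
    """Return (template_tag, instance_name) for the 0-terminated string at name_offset."""
    end = name_table.index(b"\x00", name_offset)
    full_name = name_table[name_offset:end].decode("ascii")
    return full_name[:4], full_name[4:]

# Toy name table holding two names back-to-back.
table = b"AISAlevel1_scripts\x00SUBTsubtitles\x00"

print(read_name(table, 0))   # ('AISA', 'level1_scripts')
print(read_name(table, 19))  # ('SUBT', 'subtitles')
```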


Instance (.dat) file

An instance file is a dump of the engine's in-memory data structures. It is accompanied by a .raw file and a .sep file (the .sep file is only used and present in the PC Demo and Mac versions of the game), which store additional data (usually large and unstructured, like texture or sound data) needed by some instance types. OniSplit-generated .oni files are PC .dat files with all the data contained in the .raw/.sep files appended at the end.


Binary (.raw, .sep) file

Binary files do not have a file header. The only rules about binary files are that all data parts are stored 32-byte-aligned and that the first 32 bytes of the file are always 0 (reserved to represent NULL pointers). Instances store file offsets into binary files, and at load time the offsets are converted to pointers.
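The alignment rule can be illustrated with a small Python sketch that builds a raw-style blob in memory. This is not how Bungie's tools or OniSplit are implemented; align_up and append_part are hypothetical helpers that only demonstrate the two rules stated above.

```python
def align_up(offset: int, alignment: int = 32) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)

def append_part(blob: bytearray, part: bytes) -> int:
    """Append part at the next 32-byte boundary and return the offset it was stored at."""
    start = align_up(len(blob))
    blob.extend(bytes(start - len(blob)))  # zero padding up to the boundary
    blob.extend(part)
    return start

raw_file = bytearray(32)  # first 32 bytes are 0, so offset 0 can mean NULL
first = append_part(raw_file, b"x" * 50)
second = append_part(raw_file, b"y" * 10)
print(first, second)  # 32 96
```

Because no real data part can start at offset 0, an instance can store 0 to mean "no data" without ambiguity.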


Instance file name

An instance file name has the following structure:

levelN_T.dat

where N is the level number (from 0 to 127) and T is the type of file. Known types are "Final" and "Tools". The original exe only loads "Final" files.
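A file name of this form can be taken apart with a regular expression. The sketch below assumes the type is purely alphabetic and that matching is case-sensitive; neither detail is confirmed here, and parse_level_file_name is a made-up helper name.

```python
import re

LEVEL_FILE = re.compile(r"^level(\d+)_([A-Za-z]+)\.dat$")

def parse_level_file_name(file_name: str):
    """Return (level_number, file_type), or None if the name does not fit levelN_T.dat."""
    match = LEVEL_FILE.match(file_name)
    if match is None:
        return None
    level = int(match.group(1))
    if not 0 <= level <= 127:  # level numbers run from 0 to 127
        return None
    return level, match.group(2)

print(parse_level_file_name("level0_Final.dat"))  # (0, 'Final')
print(parse_level_file_name("persist.dat"))       # None
```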


Level 0 file

The level 0 file does not actually contain a level, but rather instances shared across all levels. It is loaded first when the game starts and is never unloaded. All other level files are loaded only when the corresponding level starts and are unloaded when it ends.


Instance descriptors

There are 3 types of instance descriptors:

  • unnamed - they are referenced by other instances in the same file and the engine never reaches them directly
  • named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them
  • named and empty - the instance data is stored in a different file and they exist only to associate an instance id with a name; when an instance references such an instance id the engine searches all the loaded files for a non-empty instance with the same name
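Using the flag values listed in the instance descriptor table, the three kinds of descriptor can be told apart as sketched below. The enum and function names are invented for illustration, and checking "unnamed" before "empty" is an assumption about how a descriptor with both bits set should be treated.

```python
from enum import IntFlag

class DescriptorFlags(IntFlag):
    UNNAMED = 0x01
    EMPTY = 0x02
    BIG_ENDIAN = 0x04  # never used, per the flags table
    SHARED = 0x08

def classify(flags: int) -> str:
    """Map a descriptor's flags field to one of the three descriptor kinds."""
    if flags & DescriptorFlags.UNNAMED:
        return "unnamed"
    if flags & DescriptorFlags.EMPTY:
        return "named and empty"
    return "named and not empty"

print(classify(0x00))  # named and not empty
print(classify(0x02))  # named and empty
print(classify(0x01))  # unnamed
```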


Instance ID

The ID of an instance is computed as:

(instance_descriptor_index << 8) | 1. 

The 1 allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte-aligned, so it can never have bit 0 set).


File ID

The file ID is computed from the name of the instance file. For "_Final" files the file ID is computed as:

(level_number << 25) | 1

Again the 1 allows the engine to know which file IDs have already been converted to pointers.
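Both formulas, together with their inverses, fit in a few lines of Python. The function names are illustrative; only the shift amounts and the low marker bit come from the text above.

```python
def make_instance_id(descriptor_index: int) -> int:
    return (descriptor_index << 8) | 1

def make_file_id(level_number: int) -> int:
    return (level_number << 25) | 1

def is_unconverted_id(value: int) -> bool:
    """Bit 0 set means the value is still an ID, not yet a pointer."""
    return value & 1 == 1

def descriptor_index_of(instance_id: int) -> int:
    return instance_id >> 8

def level_number_of(file_id: int) -> int:
    return file_id >> 25

print(make_instance_id(0))    # 1, the instance id in the data table example
print(hex(make_file_id(1)))   # 0x2000001, the file id in the data table example
print(hex(make_file_id(127))) # 0xfe000001, the highest level number still fits in 32 bits
```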


Templates

An instance can have pointers to other instances, but since pointers are only valid in memory, they are converted to instance identifiers when the file is saved and converted back to pointers when the file is loaded into memory. To be able to do this, the engine must know where the pointers are, and this is done using "templates". A template contains:

  • a checksum of the data contained by the template (the checksum algorithm is unknown)
  • a 4-letter tag used to identify the template (ABNA, ONCC, WMDD etc.)
  • a short description of the data structure like "BSP Tree Node Array"
  • a list of all data structure's fields and their types
  • other data that appears to be unused like size of the fixed part and size of an array element for data structures that contain variable length arrays


Absolute limits
  • Max level number: 127
  • Max number of instance files in GameDataFolder: 512 (PC), 16 (PC Demo, Mac)
  • Max number of simultaneously loaded instance files: 64
  • Max number of instances in a file: 131071
  • Max length of an instance file name: 31
  • Max length of an instance name: 63 (including the 4 character template tag)