OBD:Instance file format
Oni's level data is broken into two kinds of files in Windows retail Oni. One type ends in ".dat" and is called an instance file. The other type ends in ".raw" and is simply called a raw file. Windows demo Oni and Mac retail/demo Oni use a third type which ends in ".sep", short for "separate". Raw and separate files are documented on their own page.
Note that ".dat" is a generic suffix originally used by Oni for all kinds of data, including persist.dat. The only reason that any other suffixes exist at all is that raw and separate files were created later in development and given unique suffixes to distinguish them from the .dat files in the same folder. Therefore, the proper, specific name for the level data format, as opposed to the save-game format, film format, etc. is not ".dat file" or "DAT file", but "instance file". That being said, ".dat" has only been used by the community historically to refer to instance files, so you can reasonably assume that's what is meant when you see the suffix.
Instance files are the "main" type of data file in the sense that, when loading a level, Oni reads the instance file first, and this file serves as an index that allows it to find resources which are packed back-to-back into the raw and separate files. All instance files begin with a 64 byte header followed by 3 "descriptor" arrays, a data table and a name table. Among other things, the header contains the number of descriptors in each of the 3 arrays and the offset of the data and name tables (relative to the start of the file).
Here is a walkthrough of an instance file using the level0_Final.dat in English Windows Oni. Follow along in a hex editor for maximum learnage. Remember that this is little-endian data. First, here is how the file begins:
| Offset | Type | Raw Hex | Value | Description | 
|---|---|---|---|---|
| 0x00 | int64 | 1F 27 DC 33 DF BC 03 00 | 0x0003BCDF33DC271F | Windows template checksum; Windows demo and Mac retail/demo use 0x0003BCDF23C13061 instead | 
| 0x08 | int32 | 31 33 52 56 | 'VR31' | .dat version; .oni files use 'VR32' instead | 
| 0x0C | int64 | 40 00 14 00 10 00 08 00 | 0x0008001000140040 | signature | 
| 0x14 | int32 | 83 24 00 00 | 9347 | instance descriptor count | 
| 0x18 | int32 | D4 1B 00 00 | 7124 | name descriptor count | 
| 0x1C | int32 | 38 00 00 00 | 56 | template descriptor count | 
| 0x20 | int32 | A0 BC 03 00 | 0x03BCA0 | data table offset | 
| 0x24 | int32 | A0 35 25 00 | 0x2535A0 | data table size | 
| 0x28 | int32 | 40 F2 28 00 | 0x28F240 | name table offset | 
| 0x2C | int32 | 04 4F 02 00 | 0x024F04 | name table size | 
| 0x30 | int32 | 00 00 00 00 | 0 | OniSplit only: raw table offset | 
| 0x34 | int32 | 00 00 00 00 | 0 | OniSplit only: raw table size | 
| 0x38 | int32 | 00 00 00 00 | 0 | unused | 
| 0x3C | int32 | 00 00 00 00 | 0 | unused | 
The template checksum tells us that this level data is in the .dat/.raw file scheme, as opposed to the .dat/.raw/.sep file scheme.
The version of the instance file is the format version. This is written in ASCII characters, but you have to read it backwards because it has been written to disk in little-endian byte order. Get used to this, because you'll be doing a lot of backwards-ASCII-reading. Thus we get "VR31", which is probably "version 31". This is the format version of all instance files in all releases of Oni.
The signature is identical in all instance files.
Next we are told the size of some arrays which are coming up soon: the instance, name and template descriptors. For example, the instance descriptor array will contain 0x2483, or 9,347, items.
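The header table above can be decoded in a few lines. The following is a minimal sketch in Python, assuming the field order and sizes are exactly as listed in the table; the function name `parse_header` and the dictionary keys are made up for illustration, and the sample bytes are simply the "Raw Hex" column typed back in.

```python
import struct

# Field order and sizes follow the header table. "<" selects little-endian
# with no padding, so the struct is exactly 64 bytes long.
HEADER = struct.Struct("<Q4sQ11i")

def parse_header(buf):
    """Decode the 64-byte header at the start of an instance file."""
    (checksum, version, signature,
     instance_count, name_count, template_count,
     data_offset, data_size, name_offset, name_size,
     raw_offset, raw_size, _unused1, _unused2) = HEADER.unpack_from(buf, 0)
    return {
        "template_checksum": checksum,
        # The four ASCII bytes are stored reversed on disk ("13RV").
        "version": version[::-1].decode("ascii"),
        "signature": signature,
        "instance_count": instance_count,
        "name_count": name_count,
        "template_count": template_count,
        "data_offset": data_offset,
        "data_size": data_size,
        "name_offset": name_offset,
        "name_size": name_size,
        "raw_offset": raw_offset,  # OniSplit only
        "raw_size": raw_size,      # OniSplit only
    }

# The bytes from the "Raw Hex" column of the header table.
sample = bytes.fromhex(
    "1F 27 DC 33 DF BC 03 00 31 33 52 56 40 00 14 00 10 00 08 00 "
    "83 24 00 00 D4 1B 00 00 38 00 00 00 A0 BC 03 00 A0 35 25 00 "
    "40 F2 28 00 04 4F 02 00 " + "00 " * 16
)
header = parse_header(sample)
print(header["version"], header["instance_count"], hex(header["data_offset"]))
```

For a real file, pass the first 64 bytes read from the .dat file instead of `sample`.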
The "instance descriptors" array stores information about every instance contained in the file.
| Offset | Type | Raw Hex | Value | Description | 
|---|---|---|---|---|
| 0x00 | tag | 41 53 49 41 | 'AISA' | template tag | 
| 0x04 | int32 | 08 00 00 00 | 0x0008 | data offset (relative to data table) | 
| 0x08 | int32 | 00 00 00 00 | 0x0000 | name offset (relative to name table) | 
| 0x0C | int32 | 80 01 00 00 | 0x0180 | data size | 
| 0x10 | int32 | 00 00 00 00 | 0 | flags | 
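As a rough illustration, one 20-byte instance descriptor can be unpacked as follows. This sketch assumes the layout in the table above; `parse_instance_descriptor` is a hypothetical helper name, and the meaning of individual flag bits is not modelled.

```python
import struct

# tag, data offset, name offset, data size, flags: 4 + 4 * 4 = 20 bytes.
INSTANCE_DESCRIPTOR = struct.Struct("<4s4i")

def parse_instance_descriptor(buf, offset=0):
    """Decode one entry of the instance descriptor array."""
    tag, data_offset, name_offset, data_size, flags = \
        INSTANCE_DESCRIPTOR.unpack_from(buf, offset)
    return {
        # Tags are stored reversed on disk, like the version field.
        "tag": tag[::-1].decode("ascii"),
        "data_offset": data_offset,  # relative to the data table
        "name_offset": name_offset,  # relative to the name table
        "data_size": data_size,
        "flags": flags,
    }

# The bytes from the "Raw Hex" column of the table.
row = bytes.fromhex(
    "41 53 49 41 08 00 00 00 00 00 00 00 80 01 00 00 00 00 00 00"
)
desc = parse_instance_descriptor(row)
print(desc)
```

To locate the instance data in the file, add `desc["data_offset"]` to the data table offset from the header.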
The "name descriptors" array stores the numbers of all named instances in alphabetical order. This allows the engine to do a binary search to find instances by name more quickly. It is also used when finding instances by type.
| Offset | Type | Raw Hex | Value | Description | 
|---|---|---|---|---|
| 0x00 | int32 | 00 00 00 00 | 0 | instance number | 
| 0x04 | int32 | 00 00 00 00 | 0 | runtime: pointer to instance name | 
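The lookup that this sorted array makes possible can be sketched as an ordinary binary search. This is only an illustration, not the engine's code: it assumes names compare as plain strings, and the instance numbers and the two "example" names below are hypothetical.

```python
def find_instance(name_descriptors, instance_names, wanted):
    """Return the instance number whose name equals `wanted`, or None.

    name_descriptors: instance numbers, sorted by their instance names.
    instance_names:   mapping from instance number to its full name.
    """
    lo, hi = 0, len(name_descriptors)
    while lo < hi:
        mid = (lo + hi) // 2
        if instance_names[name_descriptors[mid]] < wanted:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(name_descriptors) and \
            instance_names[name_descriptors[lo]] == wanted:
        return name_descriptors[lo]
    return None

# Hypothetical data: three named instances out of a larger file.
instance_names = {
    0: "AISAlevel1_scripts",
    5: "ONCCexample",
    9: "WMDDexample",
}
name_descriptors = [0, 5, 9]  # already in name order

print(find_instance(name_descriptors, instance_names, "ONCCexample"))
```

Because the full name begins with the template tag, all instances of one template sit next to each other in this ordering, which fits the remark that the array is also used for finding instances by type.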
The "template descriptor" array contains information about all templates used in the file. The template checksum is used to prevent loading of instance files that are not compatible with the current engine version.
| Offset | Type | Raw Hex | Value | Description | 
|---|---|---|---|---|
| 0x00 | int64 | A0 6D 12 00 00 00 00 00 | 0x126DA0 | template checksum | 
| 0x08 | tag | 41 4E 42 41 | 'ABNA' | template tag | 
| 0x0C | int32 | 01 00 00 00 | 1 | number of instances that use this template | 
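With all three descriptor layouts known (20, 8 and 16 bytes per entry), the header's table offsets can be cross-checked by simple arithmetic. The sketch below assumes the arrays are packed back to back right after the 64-byte header, in the order instance, name, template, and that the data table starts at the next 32-byte boundary; the alignment step is an assumption that happens to match the header values of this walkthrough, not something the format description states.

```python
HEADER_SIZE = 64
INSTANCE_DESC_SIZE = 20
NAME_DESC_SIZE = 8
TEMPLATE_DESC_SIZE = 16

def align32(n):
    """Round n up to the next multiple of 32."""
    return (n + 31) & ~31

def expected_data_table_offset(instance_count, name_count, template_count):
    end_of_descriptors = (HEADER_SIZE
                          + instance_count * INSTANCE_DESC_SIZE
                          + name_count * NAME_DESC_SIZE
                          + template_count * TEMPLATE_DESC_SIZE)
    return align32(end_of_descriptors)

# Counts and data table size taken from the header walkthrough.
data_offset = expected_data_table_offset(9347, 7124, 56)
name_offset = data_offset + 0x2535A0  # the name table follows the data table

print(hex(data_offset))  # 0x3bca0, the data table offset in the header
print(hex(name_offset))  # 0x28f240, the name table offset in the header
```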
The data table stores all the instance data. The instance ID is always stored 32-byte aligned (thus the instance-specific data will always be found at an offset like 0x0008, 0x0028, 0x0148 etc.). The instance ID and file ID are not actually part of the instance data. The engine always has pointers to the "instance-specific data", and the instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance given a pointer to it).
| Offset | Type | Raw Hex | Value | Description | 
|---|---|---|---|---|
| 0x00 | int32 | 01 00 00 00 | 1 | instance id | 
| 0x04 | int32 | 01 00 00 02 | 0x02000001 | file id | 
| 0x08 | | | | instance specific data | 
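The "negative offset" access can be imitated on a byte buffer. In this sketch, which is an illustration rather than the engine's code, `read_ids` is a hypothetical helper that takes the offset of the instance-specific data (as stored in an instance descriptor) and reads the two IDs from the 8 bytes just before it.

```python
import struct

def read_ids(data_table, data_offset):
    """Return (instance_id, file_id) for the instance whose
    instance-specific data starts at data_offset in the data table."""
    # The two IDs sit at offsets -8 and -4 relative to the instance data.
    return struct.unpack_from("<2I", data_table, data_offset - 8)

# The 8 bytes from the table, followed by 24 bytes of placeholder data.
data_table = bytes.fromhex("01 00 00 00 01 00 00 02") + b"\x00" * 24

instance_id, file_id = read_ids(data_table, 8)
print(hex(instance_id), hex(file_id))
```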
The name table stores all the instance names as C style strings (terminated by 0).
| Offset | Type | Raw Hex | Value | Description | 
|---|---|---|---|---|
| 0x00 | string | 41 49 ... 00 | "AISAlevel1_scripts" | name string (0 terminated) | 
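Reading a name back out of the name table only requires scanning for the terminating zero byte. The sketch below assumes names are plain ASCII and that the first four characters are the template tag; `read_name` and the second sample name are hypothetical.

```python
def read_name(name_table, name_offset):
    """Return (template_tag, name) for the string at name_offset."""
    end = name_table.index(b"\x00", name_offset)
    full = name_table[name_offset:end].decode("ascii")
    # The full instance name starts with the 4-character template tag.
    return full[:4], full[4:]

# A tiny name table: the string from the table plus one made-up entry.
name_table = b"AISAlevel1_scripts\x00ONCCexample\x00"

print(read_name(name_table, 0))
print(read_name(name_table, 19))
```

The `name_offset` argument is the value stored in the instance descriptor, which is relative to the start of the name table.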
- Instance (.dat) file
An instance file is a dump of the engine's in-memory data structures. It is accompanied by a .raw file and a .sep file (the .sep file is only used and present in the PC Demo and Mac versions of the game), which store additional data (usually large and unstructured, like texture or sound data) needed by some instance types. OniSplit-generated .oni files are PC .dat files with all the data from the .raw/.sep files appended at the end.
- Binary (.raw, .sep) file
Binary files do not have any file header. The only rule about binary files is that all data parts are stored 32-byte aligned and the first 32 bytes of the file are always 0 (reserved to represent NULL pointers). Instances store file offsets into binary files, and at load time the offsets are converted to pointers.
- Instance file name
An instance file name has the following structure:
levelN_T.dat
where N is the level number (from 0 to 127) and T is the type of file. Known types are "Final" and "Tools". The original exe only loads "Final" files.
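A tool that scans a game data folder could pick the level number and type out of a file name along these lines. This is a sketch under assumptions: the pattern is case-sensitive and accepts only letters in the type part, neither of which is confirmed by the engine, and `parse_instance_file_name` is a hypothetical helper.

```python
import re

_INSTANCE_FILE_NAME = re.compile(r"level(\d+)_([A-Za-z]+)\.dat")

def parse_instance_file_name(filename):
    """Return (level_number, file_type), or None if the name does not fit."""
    match = _INSTANCE_FILE_NAME.fullmatch(filename)
    if match is None:
        return None
    level = int(match.group(1))
    if not 0 <= level <= 127:  # level numbers run from 0 to 127
        return None
    return level, match.group(2)

print(parse_instance_file_name("level0_Final.dat"))
print(parse_instance_file_name("persist.dat"))
```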
- Level 0 file
The level 0 file does not actually contain a level, but rather instances shared across all levels. It is loaded first when the game starts and is never unloaded. All other level files are only loaded when the corresponding level starts and unloaded when it ends.
- Instance descriptors
There are 3 types of instance descriptors:
- unnamed - they are referenced by other instances in the same file and the engine never reaches them directly
- named and not empty - they can be referenced by other instances in any file and the engine can use their name or template tag to find them
- named and empty - the instance data is stored in a different file and they exist only to associate an instance id with a name; when an instance references such an instance id the engine searches all the loaded files for a non empty instance with the same name
- Instance ID
The ID of an instance is computed as:
(instance_descriptor_index << 8) | 1.
The 1 allows the engine to know which IDs have already been converted to pointers (an instance pointer will always be 8-byte aligned, so it can never have bit 0 set).
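The formula and the bit-0 test can be written out as below. The helper names are hypothetical; only the arithmetic comes from the description above.

```python
def make_instance_id(descriptor_index):
    """Build the on-disk ID for the instance at the given descriptor index."""
    return (descriptor_index << 8) | 1

def is_unresolved_id(value):
    """True if the value is still an ID, i.e. bit 0 is set.
    A real pointer is 8-byte aligned, so its bit 0 is always clear."""
    return (value & 1) == 1

def descriptor_index_of(instance_id):
    """Recover the descriptor index from an instance ID."""
    return instance_id >> 8

print(hex(make_instance_id(0)))       # 0x1
print(hex(make_instance_id(3)))       # 0x301
print(descriptor_index_of(0x301))     # 3
```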
- File ID
The file ID is computed from the name of the instance file. For "_Final" files the file ID is computed as:
(level_number << 25) | 1
Again the 1 allows the engine to know which file IDs have already been converted to pointers.
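The same kind of sketch works for file IDs of "_Final" files. Shifting by 25 leaves 7 bits for the level number in a 32-bit value, which fits the 0 to 127 range. The helper names are hypothetical.

```python
def make_file_id(level_number):
    """Build the file ID of a "_Final" instance file."""
    if not 0 <= level_number <= 127:
        raise ValueError("level number must be between 0 and 127")
    return (level_number << 25) | 1

def level_number_of(file_id):
    """Recover the level number from a "_Final" file ID."""
    return (file_id >> 25) & 0x7F

print(hex(make_file_id(1)))     # 0x2000001
print(hex(make_file_id(127)))   # 0xfe000001
```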
- Templates
An instance can have pointers to other instances but since pointers are only valid in memory they are converted to instance identifiers when the file is saved and converted back to pointers when the file is loaded into memory. To be able to do this the engine must know where pointers are and this is done using "templates". A template contains:
- a checksum of the data contained by the template (the checksum algorithm is unknown)
- a 4-letter tag used to identify the template (ABNA, ONCC, WMDD etc.)
- a short description of the data structure like "BSP Tree Node Array"
- a list of all the data structure's fields and their types
- other data that appears to be unused like size of the fixed part and size of an array element for data structures that contain variable length arrays
- Absolute limits
- Max level number: 127
- Max number of instance files in GameDataFolder: 512 (PC), 16 (PC Demo, Mac)
- Max number of simultaneously loaded instance files: 64
- Max number of instances in a file: 131071
- Max length of an instance file name: 31
- Max length of an instance name: 63 (including the 4 character template tag)