OBD:Instance file format: Difference between revisions

(more clarity on the use of name descriptors, including that binary search is not actually taking place)
(the engine refers to these sections as blocks, not tables, and I think that term is more accurate)
 
Line 39: Line 39:
{{OBDtr| 0x18 | uint32  | | D4 1B 00 00 | 7124      | name descriptor count }}
{{OBDtr| 0x18 | uint32  | | D4 1B 00 00 | 7124      | name descriptor count }}
{{OBDtr| 0x1C | uint32  | | 38 00 00 00 | 56        | template descriptor count }}
{{OBDtr| 0x1C | uint32  | | 38 00 00 00 | 56        | template descriptor count }}
{{OBDtr| 0x20 | uint32  | | A0 BC 03 00 | 0x03BCA0  | data table offset }}
{{OBDtr| 0x20 | uint32  | | A0 BC 03 00 | 0x03BCA0  | data block offset }}
{{OBDtr| 0x24 | uint32  | | A0 35 25 00 | 2438560  | data table size }}
{{OBDtr| 0x24 | uint32  | | A0 35 25 00 | 2438560  | data block size }}
{{OBDtr| 0x28 | uint32  | | 40 F2 28 00 | 0x28F240  | name table offset }}
{{OBDtr| 0x28 | uint32  | | 40 F2 28 00 | 0x28F240  | name block offset }}
{{OBDtr| 0x2C | uint32  | | 04 4F 02 00 | 151300    | name table size }}
{{OBDtr| 0x2C | uint32  | | 04 4F 02 00 | 151300    | name block size }}
{{OBDtr| 0x30 | uint32  | | 99 CF 40 00 | (garbage) | used by OniSplit for raw table offset }}
{{OBDtr| 0x30 | uint32  | | 99 CF 40 00 | (garbage) | used by OniSplit for raw table offset }}
{{OBDtr| 0x34 | uint32  | | 90 4F 63 00 | (garbage) | used by OniSplit for raw table size }}
{{OBDtr| 0x34 | uint32  | | 90 4F 63 00 | (garbage) | used by OniSplit for raw table size }}
Line 57: Line 57:
The '''descriptor counts''' are the sizes of the arrays that follow in this file: the instance, name and template descriptors. For example, the instance descriptor array will be 0x2483, or 9,347, items long.
The '''descriptor counts''' are the sizes of the arrays that follow in this file: the instance, name and template descriptors. For example, the instance descriptor array will be 0x2483, or 9,347, items long.


Next we are told the addresses and sizes of the '''data and name tables''' in the instance file. The name table simply follows the data table, as you'll see if you add the data table offset plus the data table size, so the name table offset is technically redundant. The name table offset plus the name table size equals the total size of the file since it's the last segment of the file.
Next we are told the addresses and sizes of the '''data and name blocks''' in the instance file. The name block simply follows the data block, as you'll see if you add the data block size to the data block offset, so the name block offset is technically redundant. Since the name block is the last segment of the file, the name block offset plus the name block size equals the total size of the file.
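
The arithmetic above can be checked with a short Python sketch. It decodes the six header fields at 0x18 through 0x2F from the example bytes shown in the table, assuming little-endian uint32 values as the table indicates. The variable names are illustrative only and are not taken from the engine.

```python
import struct

# Header bytes 0x18..0x2F copied from the example table (little-endian uint32 fields).
header_tail = bytes.fromhex(
    "D41B0000" "38000000" "A0BC0300" "A0352500" "40F22800" "044F0200"
)
(name_desc_count, template_desc_count,
 data_offset, data_size, name_offset, name_size) = struct.unpack("<6I", header_tail)

# The name block should begin exactly where the data block ends.
assert data_offset + data_size == name_offset

# The name block is the last segment, so this sum is the expected file size.
expected_file_size = name_offset + name_size
print(hex(data_offset), hex(name_offset), expected_file_size)
```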


After the name table's size come four "int"s of '''garbage'''; this is padding that aligns the start of the next segment of the file on a 32-byte boundary. The first two 32-bit fields in this space are, however, used in .oni files generated by OniSplit<!--, and the last 32-bit field is partly used by OniX for a new form of template versioning. Future usage of these fields by OniSplit and/or OniX may change (hopefully not too much)-->.
After the name block's size come four "int"s of '''garbage'''; this is padding that aligns the start of the next segment of the file on a 32-byte boundary. The first two 32-bit fields in this space are, however, used in .oni files generated by OniSplit<!--, and the last 32-bit field is partly used by OniX for a new form of template versioning. Future usage of these fields by OniSplit and/or OniX may change (hopefully not too much)-->.


That concludes the header of the instance file. Immediately after this header we find the instance descriptors array.
That concludes the header of the instance file. Immediately after this header we find the instance descriptors array.
Line 84: Line 84:
| C8 30 22 00
| C8 30 22 00
| 0x2230C8
| 0x2230C8
|align=left | data offset (relative to data table)
|align=left | data offset (relative to data block)
|- align=center
|- align=center
| 0x08
| 0x08
Line 90: Line 90:
| 01 CB 00 00
| 01 CB 00 00
| 0xCB01
| 0xCB01
|align=left | name offset (relative to name table)
|align=left | name offset (relative to name block)
|- align=center
|- align=center
| 0x0C
| 0x0C
Line 130: Line 130:
| E8 37 18 00
| E8 37 18 00
| 0x1837E8
| 0x1837E8
|align=left | data offset (relative to data table)
|align=left | data offset (relative to data block)
|- align=center
|- align=center
| 0x10
| 0x10
Line 136: Line 136:
| 4E C5 00 00
| 4E C5 00 00
| 0xC54E
| 0xC54E
|align=left | name offset (relative to name table)
|align=left | name offset (relative to name block)
|- align=center
|- align=center
| 0x14
| 0x14
Line 157: Line 157:
|}
|}
{{Divhide|end}}
{{Divhide|end}}
The retail version of this instance descriptor tells us that a resource of '''type''' SUBT (a subtitle file for Oni; there are only two of these, one containing all speech subtitles, and one for help messages) has '''data''' that can be found 0x2230C8 bytes into the data table, which we learned from the file header starts at 0x03BCA0. Its '''name''' can be found 0xCB01 bytes into the name table that starts, according to the file header, at 0x28F240.
The retail version of this instance descriptor tells us that a resource of '''type''' SUBT (a subtitle file for Oni; there are only two of these, one containing all speech subtitles, and one for help messages) has '''data''' that can be found 0x2230C8 bytes into the data block, which we learned from the file header starts at 0x03BCA0. Its '''name''' can be found 0xCB01 bytes into the name block that starts, according to the file header, at 0x28F240.


The data's '''size''' is given as 0x09C0, or 2,496 bytes, but it's important to clarify that this is the total size of the data counting from the resource header to the next 32-byte boundary after the end of this instance's actual data; in other words it is the true total of the space occupied on disk by this instance. This is interesting because the data offset leads you to the start of the instance-specific data which begins 8 bytes after the resource header, so if you erroneously add the data size to the data offset to find the end of the instance data then you will find yourself 8 bytes into the next instance.
The data's '''size''' is given as 0x09C0, or 2,496 bytes, but it's important to clarify that this is the total size of the data counting from the resource header to the next 32-byte boundary after the end of this instance's actual data; in other words it is the true total of the space occupied on disk by this instance. This is interesting because the data offset leads you to the start of the instance-specific data which begins 8 bytes after the resource header, so if you erroneously add the data size to the data offset to find the end of the instance data then you will find yourself 8 bytes into the next instance.
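
As a rough illustration of the pitfall described above, the following Python sketch computes where an instance's on-disk record starts and ends, relative to the data block. It assumes the resource header is exactly the 8 bytes that precede the instance data; the function and constant names are made up for this example.

```python
RESOURCE_HEADER_SIZE = 8  # assumed: the bytes preceding the instance-specific data

def record_bounds(data_offset, data_size):
    """Return (start, end) of an instance's record, relative to the data block."""
    start = data_offset - RESOURCE_HEADER_SIZE  # step back to the resource header
    end = start + data_size                     # size is counted from the header
    return start, end

start, end = record_bounds(0x2230C8, 0x09C0)
naive_end = 0x2230C8 + 0x09C0  # the erroneous calculation

print(hex(start), hex(end), naive_end - end)
```

Both bounds land on 32-byte boundaries, while the naive sum overshoots by the size of the header.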
Line 194: Line 194:


===Peeking ahead at instance name===
===Peeking ahead at instance name===
Before we talk about the name table in depth, we can peek ahead at the name of this resource using the offset we've just been given. Let's add the offset 0xCB01 to 0x28F240, the file header's address for the name table. This gives us the address 0x29BD41. There we find the string "SUBTsubtitles".
Before we talk about the name block in depth, we can peek ahead at the name of this resource using the offset we've just been given. Let's add the offset 0xCB01 to 0x28F240, the file header's address for the name block. This gives us the address 0x29BD41. There we find the string "SUBTsubtitles".
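
The same lookup can be sketched in Python. The helper below, whose name is hypothetical, reads a zero-terminated ASCII string at the name block offset plus the name offset. Because the real file is not available here, the demonstration runs on a tiny made-up buffer rather than on actual game data.

```python
def read_name(file_bytes, name_block_offset, name_offset):
    """Read the zero-terminated ASCII name at the given offset into the name block."""
    start = name_block_offset + name_offset
    end = file_bytes.index(b"\x00", start)  # position of the null terminator
    return file_bytes[start:end].decode("ascii")

# Address arithmetic from the text above.
print(hex(0x28F240 + 0xCB01))

# Made-up stand-in for a file: 4 filler bytes, then two packed names.
demo_file = b"\xff" * 4 + b"AAAAfirst\x00SUBTsubtitles\x00"
print(read_name(demo_file, 4, 10))
```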


===Peeking ahead at instance data===
===Peeking ahead at instance data===
The actual subtitle data should be found by adding the offset 0x2230C8 to 0x03BCA0, the file header's address for the data table, to get 0x25ED68. We're going to leave the full details of the data table for later, but below is the data you should actually see for the English Oni SUBT file at this address. You have to consult the [[SUBT]] page to know how to read this data.
The actual subtitle data should be found by adding the offset 0x2230C8 to 0x03BCA0, the file header's address for the data block, to get 0x25ED68. We're going to leave the full details of the data block for later, but below is the data you should actually see for the English Oni SUBT file at this address. You have to consult the [[SUBT]] page to know how to read this data.


{{Table}}
{{Table}}
Line 213: Line 213:
The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In this case, that means 20 * 9347 = 186940, or 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.
The name descriptor array starts immediately after the instance descriptors array. To find the end of the instance descriptors, we can simply take the size of an instance descriptor, 20 bytes, and multiply it by the number of instance descriptors in the file header. In this case, that means 20 * 9347 = 186940, or 0x02DA3C. Adding that to 0x40 (the start of the instance descriptors) takes us to address 0x02DA7C. Voila, the start of the name descriptors.
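
The calculation above, written out as a small Python sketch with illustrative constant names:

```python
INSTANCE_DESCRIPTORS_START = 0x40  # the header occupies the first 0x40 bytes
INSTANCE_DESCRIPTOR_SIZE = 20      # bytes per instance descriptor

instance_count = 0x2483            # 9,347, from the file header
instance_array_size = INSTANCE_DESCRIPTOR_SIZE * instance_count
name_descriptors_start = INSTANCE_DESCRIPTORS_START + instance_array_size

print(instance_array_size, hex(name_descriptors_start))
```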


The name descriptor array stores the numbers of all named instances in the alphabetical order by said names, which are found in the name table but also pointed to by these entries. This array is used by the engine to look up instances by name; it's also used to find instances by template (scanning just the tag at the start of each name). The purpose of this array being alphabetized was to allow the engine to do a binary search to find instances by name more quickly, but the retail engine no longer attempts a binary search and merely iterates over the array from start to end.
The name descriptor array stores the index numbers of all named instances in alphabetical order of their names, which are found in the name block and are also pointed to by these entries. This array is used by the engine to look up instances by name; it's also used to find instances by template (scanning just the tag at the start of each name). The array was alphabetized so that the engine could do a binary search to find instances by name more quickly, but the retail engine no longer attempts a binary search and merely iterates over the array from start to end.


{{Table}}
{{Table}}
Line 221: Line 221:
|}
|}


The index number here is referring to the instance's position in the instance descriptor array. This number is also used by the data table to identify each instance, thus it is found in two places in the data explicitly and one place implicitly.
The index number here refers to the instance's position in the instance descriptor array. This number is also used by the data block to identify each instance, so it appears in two places in the file explicitly and in one place implicitly.


Since the addresses of the names in memory cannot be known until the file is loaded into RAM, a space of 32 bits is reserved for each pointer at runtime.
Since the addresses of the names in memory cannot be known until the file is loaded into RAM, a space of 32 bits is reserved for each pointer at runtime.
Line 245: Line 245:
Incidentally, the templates in Oni's code have not just the familiar four-character tags attached to them, but also a descriptive string, e.g. "BSP Tree Node Array". These strings were typed into the source code where each template structure was defined, and eventually extracted from the binary by modders. This is the source of the names on [[OBD:File types]].
Incidentally, the templates in Oni's code have not just the familiar four-character tags attached to them, but also a descriptive string, e.g. "BSP Tree Node Array". These strings were typed into the source code where each template structure was defined, and eventually extracted from the binary by modders. This is the source of the names on [[OBD:File types]].


==Data table==
==Data block==
The data table occupies the majority of the file and stores all the instance data (though this data sometimes points to the location of more data in a raw/separate file). We peeked at this table before when we looked at the instance descriptor for SUBTsubtitles. The table's starting point is found at the offset given in the header, in this case 0x03BCA0, saving us the trouble of adding up the size of the four preceding segments of the file and then aligning to the next 32-byte boundary.
The data block occupies the majority of the file and stores all the instance data (though this data sometimes points to the location of more data in a raw/separate file). We peeked at this block before when we looked at the instance descriptor for SUBTsubtitles. The block's starting point is found at the offset given in the header, in this case 0x03BCA0, saving us the trouble of adding up the size of the four preceding segments of the file and then aligning to the next 32-byte boundary.


The reason we'd need to align to 32 bytes is that the start of each instance's record (the instance ID) is always 32 byte-aligned. Thus, even though the template descriptors ended at 0x03BC9C, there are four empty bytes here so that the data table can begin at 0x03BCA0, which divides evenly by 32. This alignment rule also means that the instance-specific data will always start at an offset like 0x0008, 0x0028, 0x0148, etc.  
The reason we'd need to align to 32 bytes is that the start of each instance's record (the instance ID) is always 32-byte aligned. Thus, even though the template descriptors ended at 0x03BC9C, there are four empty bytes here so that the data block can begin at 0x03BCA0, which divides evenly by 32. This alignment rule also means that the instance-specific data will always start at an offset like 0x0008, 0x0028, 0x0148, etc.
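
A minimal Python sketch of the alignment rule, using a hypothetical helper name. It rounds an address up to the next multiple of 32 and then shows where the instance-specific data of a record would begin, assuming the 8-byte resource header described on this page.

```python
def align_up(address, alignment=32):
    """Round address up to the next multiple of alignment (unchanged if already aligned)."""
    return (address + alignment - 1) // alignment * alignment

template_descriptors_end = 0x03BC9C
data_block_start = align_up(template_descriptors_end)
padding = data_block_start - template_descriptors_end

# Records start on 32-byte boundaries; their data starts 8 bytes later.
example_data_offsets = [record_start + 8 for record_start in (0x0000, 0x0020, 0x0140)]

print(hex(data_block_start), padding, [hex(o) for o in example_data_offsets])
```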


The instance ID and file ID are not actually part of the instance data but are considered to be the resource header. The engine always keeps pointers to the start of the type-specific data itself; we saw this before when we jumped to 0x25ED68 and saw the data for the SUBT rather than the header for this data. The instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).
The instance ID and file ID are not actually part of the instance data but are considered to be the resource header. The engine always keeps pointers to the start of the type-specific data itself; we saw this before when we jumped to 0x25ED68 and saw the data for the SUBT rather than the header for this data. The instance ID and file ID are accessed using negative offsets when needed (usually to find the name or template tag of an instance, given a pointer to it).
Line 268: Line 268:


===Looking backward from data to instance===
===Looking backward from data to instance===
By the way, if you pick a random place in the data table to look at with a hex editor, how do you know which resource you're looking at? You would look for the highest data offset in the instance descriptor array that is less than your position in the file. Let's say that the string at 0x3BD40 caught our eye: "powerup_ammo". Subtracting the start of the data table, 0x3BCA0, gives us 0xA0 as the position of this string. Now looking back at the instance descriptor array, the instances' data offsets occur every 20 bytes and come directly after the tags. We can see that the first data offset is 0x8 and the next one is 0xF68, thus our offset into the data table of 0xA0 means we are looking at the instance which starts at 0x8. It's the very first instance listed at the start of the instance descriptor array:
By the way, if you pick a random place in the data block to look at with a hex editor, how do you know which resource you're looking at? You would look for the highest data offset in the instance descriptor array that is not greater than your offset into the data block. Let's say that the string at 0x3BD40 caught our eye: "powerup_ammo". Subtracting the start of the data block, 0x3BCA0, gives us 0xA0 as the position of this string. Now looking back at the instance descriptor array, the instances' data offsets occur every 20 bytes and come directly after the tags. We can see that the first data offset is 0x8 and the next one is 0xF68, thus our offset into the data block of 0xA0 means we are looking at the instance which starts at 0x8. It's the very first instance listed at the start of the instance descriptor array:


{{Table}}
{{Table}}
{{OBD_Table_Header}}
{{OBD_Table_Header}}
{{OBDtr| 0x00 | tag    | | 53 47 4E 4F | 'ONGS'  | template tag }}
{{OBDtr| 0x00 | tag    | | 53 47 4E 4F | 'ONGS'  | template tag }}
{{OBDtr| 0x04 | int32  | | 08 00 00 00 | 0x08    | data offset (relative to data table) }}
{{OBDtr| 0x04 | int32  | | 08 00 00 00 | 0x08    | data offset (relative to data block) }}
{{OBDtr| 0x08 | int32  | | 00 00 00 00 | 0x00    | name offset (relative to name table) }}
{{OBDtr| 0x08 | int32  | | 00 00 00 00 | 0x00    | name offset (relative to name block) }}
{{OBDtr| 0x0C | int32  | | 60 0F 00 00 | 3936    | data size }}
{{OBDtr| 0x0C | int32  | | 60 0F 00 00 | 3936    | data size }}
{{OBDtr| 0x10 | int32  | | 00 00 00 00 | 0      | flags }}
{{OBDtr| 0x10 | int32  | | 00 00 00 00 | 0      | flags }}
|}
|}


So this tells us that the first data in the data table belongs to the solitary [[ONGS]] resource, and that it extends for 3,936 bytes. Since its name offset is 0x0, it's the first string in the name table, which we can see below is SUBTsubtitles.
So this tells us that the first data in the data block belongs to the solitary [[ONGS]] resource, and that it extends for 3,936 bytes. Since its name offset is 0x0, it's the first string in the name block, which we can see below is SUBTsubtitles.
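
The backward lookup can be sketched in Python as follows. The descriptor layout (a 4-byte tag stored in reversed byte order, followed by four little-endian int32 fields) is taken from the table above; the function names are hypothetical, and the search assumes the list of data offsets has been sorted in ascending order first, which the file itself may not guarantee.

```python
import bisect
import struct

def parse_instance_descriptor(raw):
    """Decode one 20-byte instance descriptor into a tuple of fields."""
    tag, data_offset, name_offset, data_size, flags = struct.unpack("<4s4i", raw)
    # The tag bytes appear reversed in the file ('SGNO' on disk reads as 'ONGS').
    return tag[::-1].decode("ascii"), data_offset, name_offset, data_size, flags

def find_instance(sorted_data_offsets, position_in_block):
    """Index of the highest data offset not greater than the position, or -1."""
    return bisect.bisect_right(sorted_data_offsets, position_in_block) - 1

first = parse_instance_descriptor(
    bytes.fromhex("53474E4F" "08000000" "00000000" "600F0000" "00000000")
)
position = 0x3BD40 - 0x3BCA0  # offset of the string into the data block
index = find_instance([0x8, 0xF68], position)

print(first, hex(position), index)
```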


==Name table==
==Name block==
This final segment of the file stores all the instance names as C-style ASCII strings (terminated by a zero byte). We peeked at this before when we looked at the instance descriptor for SUBTsubtitles. The start of this table is 32-byte aligned but after that the strings are simply packed end to end, separated only by their null terminator. As with the data table, the name table's starting point is given in the header, in this case 0x28F240.
This final segment of the file stores all the instance names as C-style ASCII strings (terminated by a zero byte). We peeked at this before when we looked at the instance descriptor for SUBTsubtitles. The start of this block is 32-byte aligned but after that the strings are simply packed end to end, separated only by their null terminators. As with the data block, the name block's starting point is given in the header, in this case 0x28F240.


{{Table}}
{{Table}}
Line 289: Line 289:
|}
|}


These names can be up to 63 characters long, counting the tag. The instance file concludes with the end of the name table.
These names can be up to 63 characters long, counting the tag. The instance file concludes with the end of the name block.
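
For illustration, a Python sketch that walks a name block and yields each name together with its offset, which is the value an instance descriptor's name offset would hold. The generator name is hypothetical and the input below is a made-up buffer, not data from the game.

```python
def iter_names(name_block):
    """Yield (offset, name) for each zero-terminated ASCII string in the block."""
    offset = 0
    while offset < len(name_block):
        end = name_block.find(b"\x00", offset)
        if end == -1:        # no terminator left: stop rather than guess
            break
        if end > offset:     # skip empty strings, such as trailing zero bytes
            yield offset, name_block[offset:end].decode("ascii")
        offset = end + 1

# Made-up block with two packed names.
demo_block = b"AAAAfirst\x00BBBBsecond\x00"
names = list(iter_names(demo_block))
print(names)

# Per the text above, names are at most 63 characters long, counting the tag.
assert all(len(name) <= 63 for _, name in names)
```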


{{OBD}}
{{OBD}}