OBD:SNDD/wav

From OniGalore

WAVE is a set of formats for storing waveform data in a RIFF container. Oni adopted a small subset of WAVE for its PC versions.

  • The simplest and most widespread format is linear PCM (format ID 1), where waveform samples are simply stored one after the other, in chronological order, as integer values proportional to the amplitude of the waveform. The most common bit depth is 16 bits per sample, with sample values in the range [-32768:32767] (signed), stored in Little Endian order. In the case of stereo, Left and Right samples are interleaved.
  • Another widespread format is MS ADPCM (format ID 2), where samples are stored as 4-bit nibbles packed into large blocks. Within each block, each sample of the waveform is inferred from the previous samples through a prediction-correction algorithm, initialized with two samples explicitly stored as 16-bit values at the start of the block (more details below). The compressed data is almost 4 times more compact than PCM.

There are hundreds of other formats (including MP3, Vorbis, Dolby, FLAC and lots of legacy codecs that no one cares about), but (L)PCM and (MS)ADPCM are the most widespread and useful.

Vanilla Oni data exclusively uses MS ADPCM, but some mods have successfully used (L)PCM, which is why we are documenting it too. Also, since PCM is significantly simpler, we describe it first.

PCM

Below is the beginning of a standard-compliant WAVE file with stereo PCM data (decoded from SNDDmus_ot6.aif).

0x0000:  52 49 46 46 20 50 09 00 57 41 56 45 66 6D 74 20  RIFF P°°WAVEfmt
0x0010:  10 00 00 00 01 00 02 00 22 56 00 00 88 58 01 00  °°°°°°°°"V°°ˆX°°
0x0020:  04 00 10 00 64 61 74 61 00 50 09 00 F5 FF F5 FF  °°°°data°P°°õÿõÿ
(The contents of the "fmt " section, relevant to SNDD storage, are the 16 bytes from 0x14 to 0x23.)
Offset Type Raw Hex Value Description
0x00 char[4] 52 49 46 46 RIFF identifier for the "IBM/Microsoft RIFF" standard
0x04 uint32 20 50 09 00 610336 size of the RIFF container from 0x08 to end of file
0x08 char[4] 57 41 56 45 WAVE identifier for the "WAVE" format
0x0C char[4] 66 6D 74 20 "fmt " identifier announcing the following "fmt " (format) section
0x10 uint32 10 00 00 00 16 content size for the "fmt " section, in bytes (always 16 for PCM)
0x14 uint16 01 00 1 format ID (1 = linear PCM format)
0x16 uint16 02 00 2 number of channels (2 = stereo)
0x18 uint32 22 56 00 00 22050 sample rate in Hz (samples per second), a.k.a. "sampling frequency"
0x1C uint32 88 58 01 00 88200 data rate (= "sample rate" * "block alignment"), in bytes per second
N.B. For PCM, there is one block per sample, hence the simple formula.
0x20 uint16 04 00 4 block alignment a.k.a "block size", in bytes
N.B. The block size is trivially 2 bytes for PCM mono (one 16-bit sample) and 4 bytes for PCM stereo (Left and Right 16-bit samples).
0x22 uint16 10 00 16 bits per sample (per channel); typically 16 bits for PCM, although other bit depths are possible
0x24 char[4] 64 61 74 61 data identifier announcing the following "data" section
0x28 uint32 00 50 09 00 610304 content size for the "data" section, in bytes (implies 152576 stereo sample blocks, 4 bytes each)
0x2C block[4] F5 FF F5 FF (-11,-11) first stereo sample; the left and the right sample values are both -11=0xFFF5
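The table above can be turned into a small parser. The following is only a sketch in Python: it assumes the simple chunk order shown here ("fmt " immediately followed by "data", no extra chunks), and the function name and dictionary keys are made up for illustration.

```python
import struct

def parse_pcm_header(header: bytes) -> dict:
    """Decode the first 44 bytes of a PCM WAVE file laid out as in the table."""
    (riff, riff_size, wave, fmt_id, fmt_size,
     format_id, channels, sample_rate, data_rate,
     block_align, bits_per_sample, data_id, data_size) = struct.unpack(
        "<4sI4s4sIHHIIHH4sI", header[:44])
    if (riff, wave, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a WAVE file in the simple layout assumed here")
    return {
        "riff_size": riff_size, "fmt_size": fmt_size, "format_id": format_id,
        "channels": channels, "sample_rate": sample_rate,
        "data_rate": data_rate, "block_align": block_align,
        "bits_per_sample": bits_per_sample, "data_size": data_size,
    }

# The 48 bytes of the hex dump above (header plus the first stereo sample).
SAMPLE = bytes.fromhex(
    "52494646 20500900 57415645 666D7420 10000000 0100 0200"
    "22560000 88580100 0400 1000 64617461 00500900 F5FFF5FF")

info = parse_pcm_header(SAMPLE)
# Samples are signed 16-bit Little Endian, Left then Right.
first_left, first_right = struct.unpack_from("<hh", SAMPLE, 0x2C)
```

With the dump above, `info` reports 2 channels at 22050 Hz and the first stereo sample decodes to (-11, -11).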
Mono vs stereo
For a mono sound, the layout would be the same, except for the channel count (1 instead of 2), the block alignment (2 instead of 4), the data rate (2 times the sample rate instead of 4 times) and the data sizes.
Sample rate
Standard WAVE PCM supports completely arbitrary sample rates. Besides 22050 and 44100, common ones are 8000, 11025, 48000 and 88200. See HERE for more.
Bit depth
16-bit sample depth provides satisfactory signal-to-noise ratio in most situations. Low-resolution 8-bit samples are sometimes used, and higher resolution waveforms can have 24-bit or 32-bit samples. See HERE for more.
Data size
The size at 0x28 is the size of the raw data that starts at 0x2C and exactly corresponds to the .raw data of an SNDD instance.

MS ADPCM

Below is the beginning of a standard-compliant WAVE file with stereo MS ADPCM data (adapted from SNDDalarm_loop.aif).

0x0000:  52 49 46 46 4E 0C 01 00 57 41 56 45 66 6D 74 20  RIFFN°°°WAVEfmt
0x0010:  32 00 00 00 02 00 02 00 22 56 00 00 27 57 00 00  2°°°°°°°"V°°'W°°
0x0020:  00 04 04 00 20 00 F4 03 07 00 00 01 00 00 00 02  °°°° °ô°°°°°°°°
0x0030:  00 FF 00 00 00 00 C0 00 40 00 F0 00 00 00 CC 01  °ÿ°°°°À°@°ð°°°Ì°
0x0040:  30 FF 88 01 18 FF 66 61 63 74 04 00 00 00 8A 05  0ÿˆ°°ÿfact°°°°Š°
0x0050:  01 00 64 61 74 61 00 0C 01 00 05 05 10 00 10 00  °°data°°°°°°°°°°
0x0060:  0C 00 AF FF 2E 00 B4 FF F0 F1 00 00 0F 30 20 10  °°¯ÿ.°´ÿðñ°°°0°°
(The contents of the "fmt " section, relevant to SNDD storage, are the 50 bytes from 0x14 to 0x45.)
Offset Type Raw Hex Value Description
0x00 char[4] 52 49 46 46 RIFF identifier for the "IBM/Microsoft RIFF" standard
0x04 uint32 4E 0C 01 00 68686 size of the RIFF container from 0x08 to end of file
0x08 char[4] 57 41 56 45 WAVE identifier for the "WAVE" format
0x0C char[4] 66 6D 74 20 "fmt " identifier announcing the following "fmt " (format) section
0x10 uint32 32 00 00 00 50 content size for the "fmt " section, in bytes (typically 50 for MS ADPCM)
0x14 uint16 02 00 2 format ID (2 = MS ADPCM format)
0x16 uint16 02 00 2 number of channels (2 = stereo)
0x18 uint32 22 56 00 00 22050 sample rate in Hz (samples per second), a.k.a. "sampling frequency"
0x1C uint32 27 57 00 00 22311 average data rate (= "sample rate" * "block alignment" / "samples per block"), in bytes per second
N.B. The data rate is rounded down to an integer value.
0x20 uint16 00 04 1024 block alignment a.k.a "block size", in bytes
N.B. The block size for MS ADPCM is typically a power of two.
0x22 uint16 04 00 4 bits per sample (per channel); typically 4 bits for MS ADPCM
0x24 uint16 20 00 32 size of the following extended format specification, in bytes
0x26 uint16 F4 03 1012 samples per block (per channel); follows from the channel count and block size (see below)
0x28 uint16 07 00 7 number of the following coefficient pairs; always 7 in practice
0x2A int16-16 00 01 00 00 256, 0 The coefficient pairs themselves (always the same in practice).
0x2E int16-16 00 02 00 FF 512, -256
0x32 int16-16 00 00 00 00 0, 0
0x36 int16-16 C0 00 40 00 192, 64
0x3A int16-16 F0 00 00 00 240, 0
0x3E int16-16 CC 01 30 FF 460, -208
0x42 int16-16 88 01 18 FF 392, -232
0x46 char[4] 66 61 63 74 fact identifier announcing the following "fact" section
0x4A int32 04 00 00 00 4 size of the following "fact" section in bytes
0x4E int32 8A 05 01 00 66954 actual number of samples (see below for calculation)
0x52 char[4] 64 61 74 61 data identifier announcing the following "data" section
0x56 int32 00 0C 01 00 68608 size of the following "data" section in bytes (67 blocks of 1024 bytes)
0x5A block[14] 05 05 10 00 10 00 0C 00 AF FF 2E 00 B4 FF (5,5) (16,16) (12,-81) (46,-76) the header of the first 1024-byte block (14 bytes for a stereo block)
0x68 byte[8]... F0 F1 00 00 00 30 20 10 (-1,0) (-1,1) (0,0) (0,0) (0,0) (3,0) (2,0) (1,0) the first 8 pairs of nibbles (stereo samples); 1002 more bytes follow
ADPCM coefficient table
The 7 pairs of coefficients are a standard set, hardcoded in practically every implementation of MS ADPCM. In theory their number and values are allowed to vary, and therefore any ADPCM-compressed waveform still provides the coefficients that were used for encoding, even though they are always the same in practice. Think of it as a "key" that needs to be common to the compression and decompression phases.
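To show where the coefficient pairs enter the picture, here is a minimal sketch of a decoder for one mono block, written in Python. It follows the commonly documented MS ADPCM algorithm rather than anything stated on this page: the adaptation table, the 16 floor on the delta and the use of a right shift for the division by 256 are assumptions (implementations differ on how that division rounds), and the function names are invented.

```python
import struct

# Standard coefficient pairs, as listed in the "fmt " section above.
COEFFS = [(256, 0), (512, -256), (0, 0), (192, 64),
          (240, 0), (460, -208), (392, -232)]
# Delta adaptation table from common MS ADPCM implementations (assumption).
ADAPTATION = [230, 230, 230, 230, 307, 409, 512, 614,
              768, 614, 512, 409, 307, 230, 230, 230]

def clamp16(value: int) -> int:
    return max(-32768, min(32767, value))

def decode_mono_block(block: bytes) -> list:
    """Decode one mono MS ADPCM block into 16-bit PCM sample values."""
    # 7-byte header: predictor index, delta, then two PCM samples.
    predictor, delta, samp1, samp2 = struct.unpack_from("<Bhhh", block, 0)
    coef1, coef2 = COEFFS[predictor]
    out = [samp2, samp1]  # the sample stored second is played first
    for byte in block[7:]:
        for nibble in (byte >> 4, byte & 0x0F):  # high nibble first
            signed = nibble - 16 if nibble & 8 else nibble
            predicted = (samp1 * coef1 + samp2 * coef2) >> 8
            new = clamp16(predicted + signed * delta)
            out.append(new)
            samp2, samp1 = samp1, new
            delta = max(16, (ADAPTATION[nibble] * delta) >> 8)
    return out
```

A 512-byte mono block decoded this way yields 1012 samples: 2 from the header and 2 for each of the 505 remaining bytes. Stereo decoding works the same way with two independent states, the high nibble of each byte going to Left and the low nibble to Right.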
Sample rate
As with PCM, standard WAVE supports completely arbitrary sample rates for MS ADPCM. (Besides 22050 and 44100, common ones are 8000, 11025, 48000 and 88200. See HERE for more.) Oni, however, ignores the sample rate and plays back all waveforms as 22.05 kHz.
Bit depth
The bit depth of the compressed nibbles is typically 4 bits. The decompressed data is ordinary 16-bit PCM (two 16-bit samples for each channel are explicitly provided at the start of each block).
Variable sizes, mono vs stereo
For a mono sound, the layout would be the same, except for the channel count (1 instead of 2) as well as a possibly different block alignment. From the channel count and block size follows the number of samples per block (at 0x26), the rules being as follows (details HERE):
  • For a mono signal (not shown above), an ADPCM block starts with seven bytes: an 8-bit predictor index, a 16-bit delta, and two 16-bit PCM samples (starting with the second sample to play).
  • For a stereo signal (as shown above), an ADPCM block starts with 14 bytes: two 8-bit predictor indices, two 16-bit deltas, and two pairs of 16-bit PCM samples (Left then Right in each case).
  • The rest of the block is filled, in the case of stereo, with pairs of Left and Right nibbles (grouped into bytes). In the case of mono, the bytes consist of consecutive same-channel nibbles.
From the "samples per block" follows the average data rate (at 0x1C). From the actual number of samples (at 0x4E) follows the number of ADPCM blocks required to store the waveform, and therefore the data size (at 0x56), a multiple of "block size", which in turn affects the total RIFF size at 0x04.
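The chain of dependencies described above can be sketched in a few lines of Python. This assumes the usual 4-bit nibbles and the 7-byte-per-channel block header; the function names are hypothetical.

```python
def samples_per_block(block_align: int, channels: int) -> int:
    """Samples per channel in one block: 2 header samples plus the nibbles."""
    header = 7 * channels
    return (block_align - header) * 2 // channels + 2

def adpcm_layout(sample_count: int, sample_rate: int,
                 block_align: int, channels: int) -> tuple:
    """Return (samples per block, data rate, block count, padded data size)."""
    spb = samples_per_block(block_align, channels)
    data_rate = sample_rate * block_align // spb   # rounded down
    blocks = -(-sample_count // spb)               # ceiling division
    return spb, data_rate, blocks, blocks * block_align
```

For the stereo example above, `adpcm_layout(66954, 22050, 1024, 2)` gives 1012 samples per block, a data rate of 22311, 67 blocks and 68608 bytes of data, matching the values at 0x26, 0x1C and 0x56.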
"fact" section VS truncated data
It is not standard-compliant to truncate the final ADPCM block after the last actual sample like Oni does. The WAVE file is expected to contain whole ADPCM blocks (usually padded with zeroes), and the exact number of actual samples is specified in the "fact" section. This is not very space-efficient, but it allows a waveform to have a completely arbitrary number of samples. With Oni's truncated data and lack of an explicit sample count, it is impossible to specify an odd-numbered count of mono samples (because the bytes of mono ADPCM blocks consist of consecutive same-channel nibbles, and there is no way to tell whether the last nibble counts or not). Also, Oni can only truncate after the header of an ADPCM block, therefore the two explicitly stored samples in the header are always played, whereas "fact" can be used to indicate that only the first sample is actual waveform data, and the other is a dummy.
Data size
The size at 0x56 is the size of the raw data that starts at 0x5A, which consists of whole ADPCM blocks and will need to be truncated after the last actual sample for use in an Oni SNDD instance. If the data is not truncated, Oni will determine the sample count from the padded data size, and there will be a noticeable silence at the end (up to 46 milliseconds, or almost 3 game ticks, at 22.05 kHz and for 1012 samples per block). This is not a problem for impulse sounds, but definitely not recommended for ambients or music.


Importing from WAVE into Oni

Filling in PC demo SNDD

Block alignment
The PC demo engine can only play back MS ADPCM sounds, and only with a block size of 512 bytes per channel. This means that a WAVE file cannot be adapted for Oni PC demo at all (not without reencoding, anyway), unless it conforms to the above "MS ADPCM" template with the following value at 0x20: 512 bytes per block for mono sounds; 1024 bytes per block for stereo sounds. The other non-trivial values (samples per block and data rate) follow unambiguously from the channel count, block alignment and sample rate, so you only need to check the block size against the channel count.
Sample rate
The PC demo will play waveforms at 22.05 kHz, so if your WAVE file has a different sample rate, you will need to adjust the pitch/speed rate in OSGr.
Sample count and raw data size
Oni uses the raw data size to determine the sample count of ADPCM data, and a WAVE file stores whole ADPCM blocks (zero-padded), so you will need to look up the sample count in the "fact" section and use it to determine the size of the trimmed data that should be copied to Oni's SNDD (.raw part). The formula is not entirely trivial, so it is recommended that you derive it on your own:
  • Divide the actual number of samples (from "fact") by the number of samples per block, and look at the remainder (R).
  • If R is 0, then you are lucky and the waveform exactly fits in a whole number of blocks; just copy the data as-is.
  • If R is 1, then the waveform only uses the first sample from the last block's header, so you trim the final ADPCM block to the first 7 or 14 bytes (for mono or stereo respectively).
  • If R is 2 or larger, then you need to keep the 7 or 14 first bytes of the last block, plus R-2 more bytes for stereo and ceil[(R-2)*0.5] for mono.
Copy the right amount of bytes as the SNDD's .raw part, store its address at 0x14 in the SNDD's .dat part, and put the size of the data at 0x10.
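The trimming rules above can be written down as a short Python function. This is only a sketch of the steps as listed; the function and parameter names are made up.

```python
def trimmed_data_size(sample_count: int, samples_per_block: int,
                      block_align: int, channels: int) -> int:
    """Size in bytes of the ADPCM data once the final block is trimmed."""
    header = 7 * channels                      # 7 bytes mono, 14 bytes stereo
    whole_blocks, r = divmod(sample_count, samples_per_block)
    size = whole_blocks * block_align
    if r == 0:
        return size                            # exact fit, nothing to trim
    if r == 1:
        return size + header                   # only the block header is kept
    extra = r - 2                              # samples stored as nibbles
    nibble_bytes = extra if channels == 2 else (extra + 1) // 2
    return size + header + nibble_bytes
```

For the stereo example (66954 samples, 1012 samples per block, 1024-byte blocks), the remainder is 162 and the trimmed size is 66 * 1024 + 14 + 160 = 67758 bytes.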
Duration in game ticks
The duration (value at 0x0C in the PC demo SNDD) is computed from the number of samples per channel: the implicit playback rate is 22.05 kHz and there are 60 game ticks per second, so divide the sample count by 367.5 and round down to an integer.
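As a sketch in Python, the division by 367.5 (22050 samples per second over 60 ticks per second) can be done in integer arithmetic to avoid floating-point rounding; the function name is hypothetical.

```python
def duration_in_ticks(sample_count: int) -> int:
    """Duration in game ticks, rounded down, at the implicit 22.05 kHz rate."""
    return sample_count * 60 // 22050
```

For the 66954-sample example this gives 182 ticks.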
Flags (at 0x08 in SNDD)
Use 0x00000001 (Little Endian 01 00 00 00) for mono, and 0x00000003 (Little Endian 03 00 00 00) for stereo.

Filling in PC retail SNDD

The PC retail engine is more flexible since it has a 50-byte format section specifically intended to receive the "fmt " chunk of an MS ADPCM .wav file along with all the non-trivial ADPCM settings. This format section is activated with the 0x00000008 flag, provided that the 0x00000004 flag is OFF (this "4" flag overrides the "8" flag and forces the interpretation of the data as IMA4 ADPCM).

The 50-byte format section in the SNDD can also be used to store the shorter "fmt " chunk of PCM .wav files, and this is the recommended procedure, as opposed to using the "raw PCM" functionality of PC retail SNDD (if you remove the 0x00000008 flag, then the format header is completely disabled and the .raw SNDD data is interpreted as 22.05 kHz PCM, using the channel count specified at OSGr level; don't do that, and just use the "8" flag).

Copying the .raw data
For PCM, just refer to the "PCM" template above: copy the data that starts at 0x2C in the WAVE file (its size can be found at 0x28, but it reaches to the end of the file anyway since there is no padding). Copy the data to the SNDD's .raw part, store the address at 0x44 in the SNDD's .dat part and the size at 0x40.
For MS ADPCM, you need to trim the final ADPCM block, with the same calculation as laid out for MS ADPCM above. Copy the truncated data from WAVE's "data" to the SNDD's .raw, and fill in its address and size in the SNDD's .dat, at 0x44 and 0x40 respectively.
Copying the "fmt " header
For MS ADPCM, overwrite the whole format section in the SNDD (at 0x0C) with the whole contents of the WAVE file's "fmt " section (at 0x14).
For PCM, use the "fmt " section's content to fill the first 16 bytes of the SNDD's format section (the other 34 bytes will be ignored).
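A possible Python sketch of this step is shown below. It assumes a WAVE file laid out like the templates above ("fmt " chunk at 0x0C); filling the unused bytes with zeroes for PCM is an arbitrary choice, since those bytes are ignored, and the function name is invented.

```python
import struct

def sndd_format_section(wav: bytes) -> bytes:
    """Build the 50-byte SNDD format section from a WAVE file's "fmt " chunk."""
    if wav[0x0C:0x10] != b"fmt ":
        raise ValueError('"fmt " chunk not found at 0x0C')
    (fmt_size,) = struct.unpack_from("<I", wav, 0x10)
    if fmt_size > 50:
        raise ValueError('"fmt " content does not fit in 50 bytes')
    content = wav[0x14:0x14 + fmt_size]
    # 50 bytes for MS ADPCM; 16 bytes for PCM, padded here with zeroes.
    return content.ljust(50, b"\x00")
```

The result is what goes at 0x0C in the SNDD's .dat part.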
Block alignment
Unlike for PC demo, any block alignment is supported, so as long as you are working with a self-consistent WAVE file, you have nothing to worry about (apart from the sample rate).
Sample rate
Like PC demo, PC retail will play all waveforms at 22.05 kHz (yes, even the 44.1 kHz waveforms from Vanilla game data are actually interpreted as 22.05 kHz waveforms).
Therefore, if your WAVE file has a sample rate other than 22.05 kHz, you will need to compensate for the difference by adjusting the pitch/speed rate in OSGr.
Flags (at 0x08 in PC retail SNDD)
The 0x00000008 flag (Little Endian 08 00 00 00) must be ON, and the 0x00000004 flag (Little Endian 04 00 00 00) must be OFF.
The effect of the "1" and "2" flags is unknown, so it is recommended to just use the value 0x00000008 (Little Endian 08 00 00 00).
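The precedence of the "4" and "8" flags, as described in this section, can be summarized with a short Python sketch. The function name and the returned labels are purely illustrative.

```python
def retail_data_interpretation(flags: int) -> str:
    """How PC retail interprets the .raw data, based on the flags at 0x08."""
    if flags & 0x00000004:
        return "IMA4 ADPCM"          # the "4" flag overrides the "8" flag
    if flags & 0x00000008:
        return "fmt section"         # PCM or MS ADPCM, per the format section
    return "raw PCM"                 # format section disabled, 22.05 kHz PCM
```

Only the value 0x00000008 selects the recommended behaviour of reading the format from the 50-byte section.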

Exporting from Oni to WAVE

From PC demo SNDD

PC demo SNDDs contain standard MS ADPCM data, but the parameters are not stored in the SNDD file. Instead they reside in the engine (hard-coded) and must be generated from scratch when creating the WAVE file. The determining parameters are the sample rate (22.05 kHz), the channel count (either 1 or 2) and the block alignment (512 bytes per channel).

Generating the WAVE header (stereo)
For stereo, use the template provided above for the MS ADPCM variant of the WAVE file, seeing as it already has all the right parameters: channel count 2 (at 0x16), block alignment 1024 (at 0x20), samples per block 1012 (at 0x26), sample rate 22050 (at 0x18), data rate 22311 (at 0x1C).
For mono, use the same template, but with the following substitutions: channel count 1, block alignment 512, data rate 11155. The samples per block remain unchanged since the block size is half as large but each block also holds a single channel instead of two.
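Instead of editing the template by hand, the header can be generated. The Python sketch below rebuilds the 90-byte MS ADPCM header from the PC demo's hard-coded parameters; the function and parameter names are hypothetical, and the RIFF size is computed as everything that follows the first 8 bytes of the file.

```python
import struct

# Standard MS ADPCM coefficient pairs (same as in the template).
COEFFS = [(256, 0), (512, -256), (0, 0), (192, 64),
          (240, 0), (460, -208), (392, -232)]

def ms_adpcm_wave_header(channels: int, sample_count: int, data_size: int,
                         sample_rate: int = 22050,
                         block_per_channel: int = 512) -> bytes:
    """Build the WAVE header that precedes the padded ADPCM data."""
    block_align = block_per_channel * channels
    spb = (block_align - 7 * channels) * 2 // channels + 2
    data_rate = sample_rate * block_align // spb
    fmt = struct.pack("<HHIIHHHHH", 2, channels, sample_rate, data_rate,
                      block_align, 4, 32, spb, len(COEFFS))
    fmt += b"".join(struct.pack("<hh", c1, c2) for c1, c2 in COEFFS)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"fact" + struct.pack("<II", 4, sample_count)
            + b"data" + struct.pack("<I", data_size))
    return b"RIFF" + struct.pack("<I", len(body) + data_size) + body
```

Calling `ms_adpcm_wave_header(2, 66954, 68608)` reproduces bytes 0x08 to 0x59 of the stereo template, and `ms_adpcm_wave_header(1, ...)` yields the mono substitutions (block alignment 512, data rate 11155).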
Calculating the sample count
The calculation here is the reverse of the WAVE-to-SNDD conversion. We know the size of the truncated data and we must construct the actual sample count that will be stored in the WAVE's "fact" section.
Divide the .raw data size (at 0x10 in the SNDD's .dat) by the block size (512 per channel) and consider the remainder (R):
  • If R is 0, then the sample count is simply the number of whole blocks times 1012.
  • If R is 14 or larger (for stereo), the sample count must be increased by 2 (the two samples stored in the block header, which Oni always plays) plus R-14 (one sample per byte).
  • If R is 7 or larger (for mono), the sample count must be increased by 2 (the two header samples) plus 2*(R-7) (two samples per byte).
  • If R is in the [1:6] range for mono, or in the [1:13] range for stereo, then something is wrong with your SNDD file (incomplete ADPCM header for the final block).
Once the sample count is calculated, store it in the "fact" section of the WAVE file, at 0x4E.
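The reverse calculation can be sketched in Python as follows. It assumes that both header samples of a partial final block count as played (as noted in the discussion of truncated data) and that, for mono, both nibbles of the last byte count; the function and parameter names are made up.

```python
def sample_count_from_raw(raw_size: int, block_align: int,
                          samples_per_block: int, channels: int) -> int:
    """Sample count (per channel) implied by the size of truncated ADPCM data."""
    header = 7 * channels                      # 7 bytes mono, 14 bytes stereo
    whole_blocks, r = divmod(raw_size, block_align)
    count = whole_blocks * samples_per_block
    if r == 0:
        return count
    if r < header:
        raise ValueError("incomplete ADPCM header in the final block")
    nibble_bytes = r - header
    # Stereo bytes hold one Left and one Right nibble; mono bytes hold two samples.
    per_byte = 1 if channels == 2 else 2
    return count + 2 + nibble_bytes * per_byte
```

For 67758 bytes of stereo data in 1024-byte blocks, the remainder is 174 and the count is 66 * 1012 + 2 + 160 = 66954, which is the value stored at 0x4E in the template.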
Raw data
For a standard-compliant WAVE file, the .raw data of the SNDD must be padded to an integer multiple of "block size" (512 bytes for mono and 1024 for stereo), so that the data consists of whole ADPCM blocks. Once you have done that, put the padded .raw data at 0x5A in the WAVE file, and update the "data" section's size (at 0x56) as well as the RIFF container's size (at 0x04).
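The padding itself is a one-liner in Python. The sketch below pads with zeroes, which is the usual choice; the function name is hypothetical.

```python
def pad_to_whole_blocks(raw: bytes, block_align: int) -> bytes:
    """Append zero bytes so the data is a whole number of ADPCM blocks."""
    return raw + bytes(-len(raw) % block_align)
```

For example, 67758 bytes of stereo data are padded to 68608 bytes (67 blocks of 1024 bytes), while data that is already a multiple of the block size is returned unchanged.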

From PC retail SNDD

Since "raw PCM" has remained undocumented for a long time, it is assumed that you will never need to convert to WAVE from an SNDD that uses the raw PCM functionality (i.e., with both the 0x00000004 and the 0x00000008 flags disabled). Most probably you will need to convert from an SNDD that contains a standard-compliant "fmt " section. (The raw PCM case will still be covered below, very briefly.)

At the time of writing, it has not been confirmed whether PC retail Oni supports any WAVE format IDs other than 1 (Linear PCM) and 2 (MS ADPCM), so we are only going to assume those two (or raw PCM).

MS ADPCM

If the format ID (at 0x0C in the SNDD) is 2, then you need to work from the "MS ADPCM" template for the WAVE file.

Copying the "fmt " header
Take the whole 50-byte block that starts at 0x0C in the SNDD, and use it to replace the contents of the "fmt " section in the WAVE file (starting at 0x14).
Calculating the sample count
Same calculation as above for the PC-demo-SNDD-to-WAVE conversion, except the "block alignment" and "samples per block" parameters can be custom now.
Divide the .raw data size (at 0x40 in the SNDD's .dat) by the block size (custom value at 0x18 in the SNDD or at 0x20 in the new WAVE file) and consider the remainder (R):
  • If R is 0, then the sample count is simply the number of whole blocks times the "samples per block" value (at 0x1E in the SNDD or at 0x26 in the new WAVE file).
  • If R is 14 or larger (for stereo), the sample count must be increased by 2 (the two samples stored in the block header, which Oni always plays) plus R-14 (one sample per byte).
  • If R is 7 or larger (for mono), the sample count must be increased by 2 (the two header samples) plus 2*(R-7) (two samples per byte).
  • If R is in the [1:6] range for mono, or in the [1:13] range for stereo, then something is wrong with your SNDD file (incomplete ADPCM header for the final block).
Once the sample count is calculated, store it in the "fact" section of the WAVE file, at 0x4E.
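The custom parameters needed for this calculation can be read straight from the SNDD's .dat part. The Python sketch below uses the offsets given above; the field widths (16-bit format fields as in the "fmt " section, 32-bit data size at 0x40) are assumptions, and the function name and dictionary keys are invented.

```python
import struct

def retail_adpcm_params(dat: bytes) -> dict:
    """Read the MS ADPCM parameters from a PC retail SNDD (.dat part)."""
    format_id, channels = struct.unpack_from("<HH", dat, 0x0C)
    (block_align,) = struct.unpack_from("<H", dat, 0x18)
    (samples_per_block,) = struct.unpack_from("<H", dat, 0x1E)
    (raw_size,) = struct.unpack_from("<I", dat, 0x40)
    if format_id != 2:
        raise ValueError("format section does not describe MS ADPCM")
    return {"channels": channels, "block_align": block_align,
            "samples_per_block": samples_per_block, "raw_size": raw_size}
```

The remainder R is then `raw_size % block_align`, and the header size to compare it against is 7 bytes per channel.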
Raw data
Same as above for exporting from PC demo SNDDs, but with a possibly custom block size.
For a standard-compliant WAVE file, the .raw data of the SNDD must be padded to an integer multiple of "block size" (custom value at 0x18 in the SNDD or at 0x20 in the new WAVE file), so that the data consists of whole ADPCM blocks. Once you have done that, put the padded .raw data at 0x5A in the WAVE file, and update the "data" section's size (at 0x56) as well as the RIFF container's size (at 0x04).

PCM (with "fmt ")

If the format ID (at 0x0C in the SNDD) is 1, then you need to work from the "PCM" template for the WAVE file.

Copying the "fmt " header
Grab a 16-byte block from the SNDD's .dat part, starting at 0x0C, and use it to replace the contents of the "fmt " section in the WAVE file (starting at 0x14). Be careful not to copy more than 16 bytes, so as not to overwrite the "data" section that follows.
Raw data
For PCM there is no padding, so copy the .raw data as-is into the "data" section of the WAVE file, at 0x2C, and store its size at 0x28. Update the size of the RIFF container (at 0x04) accordingly.
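Putting the two steps together, a PCM WAVE file can be assembled as in the Python sketch below. The function name is hypothetical, and the RIFF size is computed as everything that follows the first 8 bytes of the file.

```python
import struct

def pcm_wave_from_sndd(fmt16: bytes, raw: bytes) -> bytes:
    """Assemble a PCM WAVE file from the SNDD's 16-byte format data and .raw data."""
    if len(fmt16) != 16:
        raise ValueError("expected exactly 16 bytes of PCM format data")
    body = (b"WAVE" + b"fmt " + struct.pack("<I", 16) + fmt16
            + b"data" + struct.pack("<I", len(raw)) + raw)
    return b"RIFF" + struct.pack("<I", len(body)) + body
```

Here `fmt16` would be the bytes from 0x0C to 0x1B in the SNDD's .dat part; the raw data ends up at 0x2C and its size at 0x28, as in the "PCM" template.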

raw PCM

For raw PCM, everything is as with formatted PCM, except the format is a default one (Linear PCM, 22.05 kHz, 16 bits per sample). The channel count is inferred from OSGr.

Filling in the "fmt " header
Start with the "PCM" template. Leave the sample rate (at 0x18) as 22050 and the bit depth (at 0x22) as 16. Look up the channel count in the OSGr referencing the SNDD.
  • If the sound is referenced as mono (from OSGr), then back in the WAVE file set the channel count (at 0x16) to 1, the block size (at 0x20) to 2 and the data rate (at 0x1C) to 44100.
  • If the sound is referenced as stereo (from OSGr), then back in the WAVE file set the channel count (at 0x16) to 2, the block size (at 0x20) to 4 and the data rate (at 0x1C) to 88200.
Raw data
For PCM there is no padding, so copy the .raw data as-is into the "data" section of the WAVE file, at 0x2C, and store its size at 0x28. Update the size of the RIFF container (at 0x04) accordingly.
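For raw PCM the whole header can be generated from the channel count alone. The Python sketch below applies the default format described above (Linear PCM, 22.05 kHz, 16 bits per sample); the function name is hypothetical, and the RIFF size is computed as everything that follows the first 8 bytes of the file.

```python
import struct

def raw_pcm_wave_header(channels: int, data_size: int) -> bytes:
    """Build the 44-byte PCM header for Oni's default raw PCM format."""
    sample_rate, bits = 22050, 16
    block_align = channels * bits // 8          # 2 for mono, 4 for stereo
    data_rate = sample_rate * block_align       # 44100 or 88200
    fmt = struct.pack("<HHIIHH", 1, channels, sample_rate,
                      data_rate, block_align, bits)
    body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
            + b"data" + struct.pack("<I", data_size))
    return b"RIFF" + struct.pack("<I", len(body) + data_size) + body
```

The .raw data is then appended as-is after the returned header. With `channels=2` and a data size of 610304, bytes 0x08 to 0x2B match the "PCM" template.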

