Wave Format Extensible and the .amb suffix
or
WAVEX and Ambisonics


This is one of a series of pages detailing attempts to play Ambisonic sound on cheap everyday equipment. This is not the ideal way to do it, but it does allow for experimentation. The faltering steps reported here are offered as one person's experience, not as suggestions of how anything should be done!!

1. Other sources

There are far better materials than this on the Web; see the listing on the Ambisonia site for details.

The reason this page exists is that it took me the best part of a day to extract what I wanted from that material. So in case it is useful to others, here goes.

2. WAVE (.wav files)

Wave files are sound files. They are a Microsoft standard (but don't hold that against them …). They have been around a long time.

  1. Like many file formats, they have a 'header' with metadata, that is, with information about the file contents ('information about the information').
  2. Some of that information is always the same: bytes 1 to 4 read "RIFF" (or, rarely, "RIFX" if the file is big-endian), and bytes 9 to 12 read "WAVE".
  3. The format of the file contents is (or was) given in bytes 21 and 22.
    For example a code of 0x01 0x00 states the data format is PCM.
    (Interestingly for ambisonic users, the next two bytes, 23 and 24, give the number of channels (audio tracks) in the file; see below.)
  4. Presumably because there are many more possibilities out there than when this was all invented, a simple step has been taken: if bytes 21 and 22 are respectively 0xfe and 0xff, then the file is in what is called Wave Format Extensible ('WAVEX') and the format data occurs in a range of subsequent bytes.
  5. P. Kabal (see the material cited above) states that the extensible format "should be used whenever": PCM data has more than 16 bits/sample; there are more than 2 channels; etc.
  6. The 'extra' data in the extensible format starts at byte 37 and is detailed below.
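As an illustration of the byte positions above, here is a minimal Python sketch (the function name `read_basic_header` is my own, not part of any tool mentioned on this page). It assumes the common 'canonical' layout, in which the 'fmt ' chunk immediately follows the 12-byte RIFF header; the WAVE format does not guarantee this, so a robust reader would walk the chunks instead. Note that the byte numbers in the text count from 1, whereas Python offsets count from 0.

```python
import struct

def read_basic_header(header: bytes) -> dict:
    """Read the fixed fields of a canonical WAVE header.

    Assumes the 'fmt ' chunk starts at byte 13 (offset 12), directly after
    the RIFF header. Text byte N corresponds to Python offset N - 1.
    """
    if len(header) < 24:
        raise ValueError("need at least 24 bytes of header")
    riff_id = header[0:4]    # bytes 1-4: b"RIFF" (b"RIFX" would mean big-endian)
    wave_id = header[8:12]   # bytes 9-12: b"WAVE"
    if riff_id != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a little-endian RIFF/WAVE file")
    # bytes 21-22: format code; bytes 23-24: number of channels (little-endian)
    format_tag, channels = struct.unpack_from("<HH", header, 20)
    return {
        "format_tag": format_tag,
        "channels": channels,
        "is_pcm": format_tag == 0x0001,
        "is_extensible": format_tag == 0xFFFE,
    }
```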

3. A digression on channels

Bytes 23 and 24, as stated above, give the number of channels.

The number is given (obviously) in binary. It is also in little-endian format, that is to say the least significant byte comes first.

channels   byte 23          byte 24          value of the two bytes
1          00000001 (01)    00000000 (00)    0x0001
2          00000010 (02)    00000000 (00)    0x0002
4          00000100 (04)    00000000 (00)    0x0004
9          00001001 (09)    00000000 (00)    0x0009

The same number format applies to the other fields, thus (as stated above) byte 21 is 0xfe and byte 22 is 0xff for extensible format, and the represented value of the two bytes is 0xfffe.
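The little-endian rule can be checked with Python's standard struct module, in which the format '<H' means an unsigned 16-bit integer stored least significant byte first. This sketch (the helper `le16` is a made-up name) prints the rows of the channel table and confirms the 0xfffe flag.

```python
import struct

def le16(value: int) -> bytes:
    """Encode a 16-bit header field the way a WAVE file stores it (little-endian)."""
    return struct.pack("<H", value)

# Rows of the channel table: the low byte is stored first (byte 23),
# the high byte second (byte 24).
for channels in (1, 2, 4, 9):
    low, high = le16(channels)
    print(f"{channels}: byte 23 = {low:08b}, byte 24 = {high:08b}, value = 0x{channels:04x}")

# The extensible flag: stored as 0xfe then 0xff, read back as 0xfffe.
assert le16(0xFFFE) == b"\xfe\xff"
assert struct.unpack("<H", b"\xfe\xff")[0] == 0xFFFE
```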

4. Bytes 37 to 60

Bytes 37 and 38

always record the size of the extension. There are two possible values: 0 (0x0000) and 22 (0x0016). If the value is 0 there is no extension(!); if it is 22, the next 22 bytes have the following content:

Bytes 39 and 40

record the number of valid bits per sample, which is normally the same as the bits per sample given in bytes 35 and 36. Common values are 16, 24, 32, etc. (as in 44.1 kHz/16-bit CD audio, 48 kHz/24-bit DVD audio, etc.).

Bytes 41 to 44

give the speaker positions (the channel mask), that is, the positions of the loudspeakers. This information makes no sense for an ambisonic file, and the value 0 (0x00000000) is used.
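Under the same assumption as before (a canonical header with 'fmt ' at byte 13), bytes 37 to 44 could be read as in the Python sketch below. The function name and dictionary keys are my own labels; Microsoft's WAVEFORMATEXTENSIBLE structure calls the corresponding fields cbSize, wValidBitsPerSample and dwChannelMask.

```python
import struct

def read_extension(header: bytes) -> dict:
    """Read bytes 37-44 of a canonical WAVE_FORMAT_EXTENSIBLE header.

    Text byte N is Python offset N - 1, so byte 37 is offset 36.
    """
    (format_tag,) = struct.unpack_from("<H", header, 20)  # bytes 21-22
    if format_tag != 0xFFFE:
        raise ValueError("not Wave Format Extensible")
    (cb_size,) = struct.unpack_from("<H", header, 36)     # bytes 37-38
    if cb_size < 22:
        raise ValueError(f"extension too short: {cb_size} bytes")
    # bytes 39-40 (valid bits per sample) and bytes 41-44 (speaker mask);
    # '<' means no padding, so the 4-byte mask starts straight after the 2-byte field
    valid_bits, channel_mask = struct.unpack_from("<HI", header, 38)
    return {
        "cb_size": cb_size,
        "valid_bits": valid_bits,
        "channel_mask": channel_mask,   # expected to be 0 for an ambisonic file
    }
```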

Bytes 45 to 60

give the GUID, that is, they say what format the file is in.

The first two bytes give the information that would have occurred in bytes 21 and 22 if those had not been used for a 'see later' flag. In other words, 0x0001 indicates a PCM file (a non-float file, as in, for example, classic 44.1 kHz/16-bit CD audio), whilst 0x0003 indicates 'IEEE float', that is, a floating-point file (as in, again as an example, the "32 bit float" that is often offered as a save option in editors).

The two values that can go here for ambisonic files are:

byte:  45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
value: 03 00 00 00 21 07 d3 11 86 44 c8 c1 ca 00 00 00
value: 01 00 00 00 21 07 d3 11 86 44 c8 c1 ca 00 00 00

The upper of the two is SUBTYPE_AMBISONIC_B_FORMAT_IEEE_FLOAT (note it begins 0x0003, stored backwards, so 0x03 then 0x00); the lower is SUBTYPE_AMBISONIC_B_FORMAT_PCM (note it begins 0x0001). Apart from the two bytes specifying whether the data is PCM or float, the two are the same.
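The 'backwards' ordering follows the usual Microsoft way of storing a GUID: the first three fields (4, 2 and 2 bytes) are little-endian and the last eight bytes are stored as written. Python's standard uuid module exposes this order as the bytes_le attribute, so the two rows of the table can be regenerated from the GUIDs written in their usual text form. The text forms below are transcribed from the byte table, and the constant names are illustrative.

```python
import uuid

# Text forms of the two ambisonic sub-format GUIDs (transcribed from the table).
AMBISONIC_FLOAT = uuid.UUID("00000003-0721-11d3-8644-c8c1ca000000")
AMBISONIC_PCM = uuid.UUID("00000001-0721-11d3-8644-c8c1ca000000")

for name, guid in (("IEEE float", AMBISONIC_FLOAT), ("PCM", AMBISONIC_PCM)):
    # bytes_le gives the on-disk order used in bytes 45-60 of the header
    print(f"{name:10} {guid.bytes_le.hex(' ')}")
```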

5. Uses

As currently (late 2007) much software will not write ambisonic headers, the above information can be used to write one's own.

I have written wav2amb, which works as long as the file is already in Wave Format Extensible. All it does is set bytes 41 to 44 to zero (that is, we refuse to specify speaker positions) and then set bytes 45 to 60 to one of the above GUIDs. Oh, and of course it changes the file suffix from .wav to .amb.
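The steps just described can be sketched in Python. This is not the actual wav2amb code: `make_ambisonic` and `wav_to_amb` are hypothetical names, and the sketch assumes a canonical header (with 'fmt ' at byte 13) and a file small enough to read into memory. Rather than asking which GUID to use, it picks the one whose first two bytes match the format code already present in bytes 45 and 46.

```python
import struct
from pathlib import Path

# The two GUIDs from the table above, in on-disk byte order (bytes 45-60),
# keyed by the data format code that forms their first two bytes.
AMBISONIC_GUIDS = {
    0x0001: bytes.fromhex("010000002107d3118644c8c1ca000000"),  # PCM
    0x0003: bytes.fromhex("030000002107d3118644c8c1ca000000"),  # IEEE float
}

def make_ambisonic(data: bytearray) -> None:
    """Relabel an in-memory canonical WAVEX file as ambisonic B-format (in place)."""
    if (len(data) < 60 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE"
            or data[12:16] != b"fmt "):
        raise ValueError("unexpected header layout")
    (format_tag,) = struct.unpack_from("<H", data, 20)   # bytes 21-22
    (cb_size,) = struct.unpack_from("<H", data, 36)      # bytes 37-38
    if format_tag != 0xFFFE or cb_size < 22:
        raise ValueError("file is not Wave Format Extensible")
    (sub_format,) = struct.unpack_from("<H", data, 44)   # bytes 45-46
    if sub_format not in AMBISONIC_GUIDS:
        raise ValueError("expected PCM (1) or IEEE float (3) data")
    data[40:44] = bytes(4)                     # bytes 41-44: no speaker positions
    data[44:60] = AMBISONIC_GUIDS[sub_format]  # bytes 45-60: ambisonic GUID

def wav_to_amb(src: Path) -> Path:
    """Write a relabelled copy of src with the .amb suffix and return its path."""
    data = bytearray(src.read_bytes())
    make_ambisonic(data)
    dst = src.with_suffix(".amb")
    dst.write_bytes(bytes(data))
    return dst
```

For example, `wav_to_amb(Path("take1.wav"))` (a made-up file name) would write `take1.amb` alongside the original, leaving the sample data untouched.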

This seems adequate, as the change is made only once in a file's history, and any file can be converted to Wave Format Extensible using Audacity (which seems to be available for virtually all platforms, and is free).

One could extend the rewriting to all WAVE files. This would necessitate adding bytes 39 to 60 (presumably inserting 22 new bytes between the existing bytes 38 and 39) and rewriting bytes 21/22 and bytes 37/38 to say the extra bytes were coming … but it would also mean redoing various calculations (bytes 17 to 20, which give the format 'chunk' size, would change, as would bytes 5 to 8, which give the RIFF chunk size, i.e. file size - 8) … It could be done, but the simpler approach is the more robust. (And hopefully in a few months common software will be writing ambisonic files (?).)
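For what it is worth, the bookkeeping of that more general rewrite can be sketched in Python. `promote_to_extensible` is a hypothetical helper, not part of wav2amb. It assumes a canonical file whose 'fmt ' chunk starts at byte 13 and is either 18 bytes long (bytes 37/38 present and zero) or 16 bytes long, a common case for plain PCM in which bytes 37/38 do not exist at all and 24 rather than 22 bytes must be inserted. It also assumes the valid bits per sample equal the container size, and it writes the ambisonic GUID directly, keeping the original format code in bytes 45 and 46.

```python
import struct

# Bytes 47-60 shared by both ambisonic GUIDs (from the byte table further up).
AMBISONIC_SUFFIX = bytes.fromhex("00002107d3118644c8c1ca000000")

def promote_to_extensible(data: bytes) -> bytes:
    """Rewrite a canonical PCM/float WAVE file as an ambisonic WAVEX file."""
    if (len(data) < 36 or data[0:4] != b"RIFF" or data[8:12] != b"WAVE"
            or data[12:16] != b"fmt "):
        raise ValueError("unexpected header layout")
    (fmt_size,) = struct.unpack_from("<I", data, 16)       # bytes 17-20
    if fmt_size not in (16, 18) or len(data) < 20 + fmt_size:
        raise ValueError(f"unsupported fmt chunk size {fmt_size}")
    fields = struct.unpack_from("<HHIIHH", data, 20)       # bytes 21-36
    format_tag, channels, rate, byte_rate, block_align, bits = fields
    if format_tag not in (0x0001, 0x0003):
        raise ValueError("expected PCM (1) or IEEE float (3) data")
    new_fmt = struct.pack(
        "<HHIIHHHHI",
        0xFFFE,                    # bytes 21-22: 'see later' flag
        channels, rate, byte_rate, block_align, bits,  # bytes 23-36 unchanged
        22,                        # bytes 37-38: size of the extension
        bits,                      # bytes 39-40: valid bits (assumed = container)
        0,                         # bytes 41-44: no speaker positions
    ) + struct.pack("<H", format_tag) + AMBISONIC_SUFFIX  # bytes 45-60
    rest = data[20 + fmt_size:]    # everything after the old fmt chunk
    # bytes 17-20 of the new file become 40, the new fmt chunk size
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(new_fmt)) + new_fmt + rest
    # bytes 5-8: RIFF chunk size, i.e. total file size minus 8
    return b"RIFF" + struct.pack("<I", len(body)) + body
```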

20071011: My thanks to Henry and Etienne on the ambisonia.com discussion Forum for forcing me to think about this more deeply.

There is actually a major logical flaw in the WAVEX specification (IMHO). It is totally unfatal and has little practical effect, but it shows the continuing confusion between 'form' and 'function'. All the data in the header is about what is in the file (form), basically about how it was recorded, except for eighteen bytes which say what it is suggested you do with the file (function). Perhaps it would only occur to an 'ambisonics person' to say that recording/creation are very different from playback —but they are!

The eighteen bytes of 'what you may care to do with this file' are bytes 41-44 and 47-60. Stuck in the middle are bytes 45 and 46, which have nothing to do with the intended purpose of the file. They say whether the amplitude is PCM encoded or floating-point encoded (or, even possibly, A-law or μ-law); they have as much right to be there as whether the file is 48 kHz or 96 kHz, whether it is 16 bit or 24 bit, or …

The ambisonic GUIDs should not be plural; they should not (IMHO) be sixteen bytes long. It (singular) should be fourteen bytes long, and the two bytes about encoding format should be (mentally (and ideally physically)) elsewhere. If we abandon PCM and float for some new better system, say clairvoyance, then we will have a new data format code, and it will apply to all files, not just ambisonic ones; the 14 bytes at the end of the ambisonic GUIDs will stay the same (and the two bytes at the front of ALL GUIDs will change).

So in changing a WAVEX file (file.wav) to an ambisonic file (file.amb), bytes 45 and 46 stay the same, as the encoded data stays the same! Only the intention changes (and thus bytes 41-44 (where to put your speakers (in the politest possible sense)) and the function of the file, bytes 47-60 (i.e. ambisonic B-format)). No one is perfect, and this only occurred to me the day after I spent hours trying to decide whether to interrogate the user, whether to check configuration files, or what, to see whether the user wanted ambisonic-pcm or ambisonic-float recorded in the header. The user 'wants' neither: the file is in one or the other, and bytes 45 and 46 are already correct. They just bizarrely occur in the middle of the intention/function data, and have caused two ambisonic GUIDs to be published (whereas, following the encoding to its illogical extreme, there are potentially 256**2 = 65,536 GUIDs for ambisonic files … ).
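The argument can be put in a few lines of Python: treat 'ambisonic B-format' as the fourteen bytes 47 to 60, and let bytes 45 and 46 carry whatever format code the data already has. The names here are hypothetical; the sketch only shows that the two published GUIDs are this construction applied to format codes 1 and 3.

```python
import struct

# Bytes 47-60: the part of both published GUIDs that says "ambisonic B-format".
AMBISONIC_SUFFIX = bytes.fromhex("00002107d3118644c8c1ca000000")

def ambisonic_guid_bytes(format_code: int) -> bytes:
    """Bytes 45-60 built from the data's existing format code (bytes 45-46)."""
    return struct.pack("<H", format_code) + AMBISONIC_SUFFIX

# The two published GUIDs are just format codes 1 and 3 on the same suffix.
assert ambisonic_guid_bytes(0x0001) == bytes.fromhex("010000002107d3118644c8c1ca000000")
assert ambisonic_guid_bytes(0x0003) == bytes.fromhex("030000002107d3118644c8c1ca000000")

# Taken to its illogical extreme: one GUID per possible 16-bit format code.
print(256 ** 2)  # prints 65536
```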

Link to WAV2AMB. (There is also a utility AMBINFO that displays the information (or some of it) that is in the header.)

Link to discussion on Audacity and Wave Format Extensible.


October 2007.

Copyright © 2007–2008 Michael Chapman.