Other File Formats
From Ambisonia
Main article: Ambisonic File Formats
At the moment, the official file formats for Ambisonics use the WAVE format, with additional metadata stored in WAVE chunks. What follows is a discussion of other file formats that could be used for Ambisonics. Near-term need for each format will determine whether, and in what order, they are rolled out.
Metadata is needed in each file format to indicate that it contains an Ambisonic soundfield. This information is needed so that a knowledgeable player can handle the content appropriately. In the 21st century it is unreasonable to expect users to press buttons. Instead, the player is expected to press its own buttons based on metadata contained within the audio data being played.
Fourth-order and higher
The ".amb" format only handles B-Format up to third-order full-sphere (16 channels). To handle B-Format of any order, additional metadata needs to be added, specifically two integers giving the horizontal order and the height order. If this is done soon, rather than waiting for things to break, then the transition to fourth-order can be made invisible.
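As a rough illustration of why two integers are enough, the sketch below derives the channel count from the horizontal order and the height order. It assumes the usual mixed-order convention, in which every order up to the height order is stored full-sphere and each further order adds only its two horizontal components. The function name is hypothetical and not part of any existing format.

```python
def bformat_channel_count(horizontal_order: int, height_order: int) -> int:
    """Number of B-Format channels for a given horizontal and height order.

    Assumption: orders up to height_order are full-sphere, and each order
    above that (up to horizontal_order) adds two horizontal-only channels.
    """
    if not 0 <= height_order <= horizontal_order:
        raise ValueError("expected 0 <= height_order <= horizontal_order")
    full_sphere = (height_order + 1) ** 2
    horizontal_only = 2 * (horizontal_order - height_order)
    return full_sphere + horizontal_only


# Third-order full-sphere, the current ".amb" ceiling
print(bformat_channel_count(3, 3))  # 16
# Fourth-order full-sphere, which ".amb" cannot describe
print(bformat_channel_count(4, 4))  # 25
```

Under this convention a player reading the two integers can work out both the channel count and which components are present, without a lookup table tied to a maximum order.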
Lossless compression using WavPack
WavPack compresses WAVE files and preserves their WAVE chunks. The Ambisonic metadata stored in those chunks therefore survives compression, so WavPack can already be used for Ambisonics without any extra effort.
WavPack is not as well supported as FLAC, but has about the same compression efficiency. In addition to a lossless mode, it also has a hybrid mode producing two files. One file is lossy compressed. It can optionally be combined with a second corrections file, the two together being lossless.
WavPack files can be produced either by compressing existing WAVE files with a command-line utility or by writing them directly through an API. The WavPack format can handle any number of channels. The current implementation (as of March 2007) is limited to 16 channels, but the developer, David Bryant, plans to remove this limitation before anybody needs more.
Files larger than 4 GBytes
WAVE is limited to 32-bit addressing, so cannot handle files larger than 4 GBytes. Other formats, such as FLAC and Ogg Vorbis, have never been limited to 4 GBytes. There are two competing formats for a 64-bit extension to WAVE, Sony Pictures Digital Wave 64 and RF64. Both use chunks. In Sony Wave64, the four-character chunk identifier is replaced with a 16-byte GUID. In RF64, the chunks are identical to 32-bit WAVE chunks. Designing Ambisonic metadata chunks for either format is easy.
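To give a feel for how soon the 32-bit limit is reached with multichannel material, here is a small back-of-the-envelope sketch. It ignores header and chunk overhead and assumes uncompressed PCM; the function and constant names are illustrative only.

```python
WAVE_MAX_BYTES = 2**32 - 1  # largest value a 32-bit size field can hold


def max_wave_seconds(channels: int, sample_rate_hz: int, bytes_per_sample: int) -> float:
    """Approximate longest recording that fits in a 32-bit WAVE file."""
    bytes_per_second = channels * sample_rate_hz * bytes_per_sample
    return WAVE_MAX_BYTES / bytes_per_second


# Third-order full-sphere B-Format (16 channels), 48 kHz, 24-bit samples
minutes = max_wave_seconds(16, 48000, 3) / 60
print(f"{minutes:.1f} minutes")  # roughly 31 minutes
```

At first order (4 channels) the same settings allow about two hours, so the limit mainly bites at higher orders, higher sample rates, or with floating-point samples.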
It is suggested we wait to see which 64-bit format gathers more support. Hopefully the other will fade away, at which point the decision of which one to support will be easy.
The RF64 standard states that it is limited to 18 channels, but nothing in the format itself appears to enforce this limit, so it can probably be ignored.
Lossy compression using Ogg Vorbis
The Vorbis I specification has always mentioned B-Format Ambisonics, but representing it requires a channel mapping type other than zero, and the resulting stream is undecodable. The Xiph community has started work on supporting B-Format Ambisonics. For formats other than B-Format, there is a need to store additional Ambisonic metadata. The Xiph community has suggested that this can be placed in the Ogg container (specifically, as Key-Value pairs in the message header fields of an Ogg Skeleton stream).
A single Vorbis stream can handle up to 255 channels, because the channel count is stored as an 8-bit value that must be non-zero.
Lossless compression using FLAC
"FLAC" is the name of the codec. To be useful, it must be put into a container. When put in its own container, the result is called "Native FLAC". When put in an Ogg container, the result is called "Ogg FLAC".
Native FLAC can handle only a single FLAC stream, so is limited to eight channels; not even enough for second-order, full-sphere. An Ogg container can contain multiple streams, including multiple FLAC streams. Ogg FLAC can therefore handle any number of channels. Unfortunately, it is Native FLAC which is widely supported; Ogg FLAC is not well supported.
For Native FLAC, Ambisonic metadata can easily be placed into the FLAC codec. Third-party applications can register IDs for use in the METADATA_BLOCK_APPLICATION. These can be the same as the WAVE chunks, with the omission of the size (which, instead, goes into the METADATA_BLOCK_HEADER).
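A minimal sketch of what such a block might look like on disk is given below. The layout follows the FLAC format description: a one-byte field holding the last-block flag and the block type (2 for APPLICATION), a 24-bit big-endian length, then the 4-byte application ID and its data. The ID "AMBI" and the two-byte payload are purely hypothetical; no such ID has been registered and no payload layout has been agreed.

```python
FLAC_BLOCK_TYPE_APPLICATION = 2


def build_application_block(app_id: bytes, payload: bytes, is_last: bool = False) -> bytes:
    """Serialise a METADATA_BLOCK_HEADER followed by a METADATA_BLOCK_APPLICATION."""
    if len(app_id) != 4:
        raise ValueError("application ID must be exactly 4 bytes")
    body_length = len(app_id) + len(payload)  # the length field excludes the header itself
    if body_length >= 1 << 24:
        raise ValueError("block too large for a 24-bit length field")
    first_byte = (0x80 if is_last else 0x00) | FLAC_BLOCK_TYPE_APPLICATION
    header = bytes([first_byte]) + body_length.to_bytes(3, "big")
    return header + app_id + payload


# Hypothetical payload: horizontal order 2, height order 1
block = build_application_block(b"AMBI", bytes([2, 1]))
print(block.hex())
```

Note how the chunk size that a WAVE chunk would carry itself is represented here by the 24-bit length in the block header.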
For Ogg FLAC, which will be needed for higher-order Ambisonics, the metadata will need to be placed into the Ogg container. This can be done the same way as for Ogg Vorbis. The only difference is that the metadata will need to include which channels are in what FLAC streams. There are many possible ways to do this using Key-Value pairs, none of them particularly elegant.
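One of the many possible Key-Value layouts is sketched below, simply to make the problem concrete. It fills each FLAC stream with up to eight consecutive channels and records the channel range per stream. The key names are invented for this example and are not taken from the Ogg Skeleton specification or from any proposal.

```python
FLAC_MAX_CHANNELS = 8  # per-stream channel limit of the FLAC codec


def assign_channels_to_streams(total_channels: int) -> dict[str, str]:
    """Map consecutive channel ranges onto as few FLAC streams as possible."""
    if total_channels < 1:
        raise ValueError("need at least one channel")
    stream_count = -(-total_channels // FLAC_MAX_CHANNELS)  # ceiling division
    pairs = {}
    for stream in range(stream_count):
        first = stream * FLAC_MAX_CHANNELS
        last = min(first + FLAC_MAX_CHANNELS, total_channels) - 1
        # Hypothetical key name; the value is an inclusive channel range
        pairs[f"Ambisonic-Stream-{stream}"] = f"{first}-{last}"
    return pairs


# Fourth-order full-sphere B-Format needs 25 channels, hence four streams
for key, value in assign_channels_to_streams(25).items():
    print(f"{key}: {value}")
```

Even this simple scheme leaves one stream carrying a single channel, which hints at why none of the options is particularly elegant.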
Note that the FLAC codec can only contain integer audio data, not floating-point.
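In practice this means floating-point material has to be quantised before it can be stored. The sketch below shows one straightforward way to do that, assuming 24-bit output and samples nominally in the range -1.0 to 1.0; values outside that range are clipped. Dithering is deliberately left out, and the function name is illustrative.

```python
def float_to_pcm(sample: float, bits: int = 24) -> int:
    """Quantise one floating-point sample to a signed integer of the given width."""
    full_scale = 1 << (bits - 1)          # 8388608 for 24-bit
    scaled = round(sample * full_scale)
    # Clip to the representable range [-full_scale, full_scale - 1]
    return max(-full_scale, min(full_scale - 1, scaled))


print(float_to_pcm(0.5))    # 4194304
print(float_to_pcm(1.0))    # 8388607 (clipped)
print(float_to_pcm(-1.0))   # -8388608
```

Because floating-point B-Format can legitimately exceed full scale, an encoder would probably also want to record any gain applied before quantisation, which is one more item for the metadata.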
OggPCM
OggPCM is a non-proprietary codec, being developed by the Xiph community, for uncompressed audio. It can accommodate up to third-order full-sphere B-Format (16 channels), 2-, 3- and 4-channel UHJ, and much else. OggPCM also supports preferential unmatrixing that can be used to recover B-Format from G-Format.
Apple Core Audio Format (CAF)
CAF already supports first-order B-Format, and can handle 64-bit addressing. (It even has its own in-built PEAK chunk.) At the moment CAF is not well supported, and it is suggested we wait to see if it gathers more support outside of Apple.
Other proprietary formats
Other possible formats for Ambisonics are Dolby Digital, DTS, and AAS. If you know of a way to place user-defined, machine-parseable metadata into any of these, please speak up.

