Motivation of Decisions Behind the SDIF Specification

Overview

This document explains some of the motivation and background behind various
design decisions that went into defining SDIF.

Why Isn't SDIF IFF Compatible?

In early versions of SDIF, we claimed compatibility with the IFF standard,
the parent of several important standards, including AIFF and RIFF.
However, we have decided to drop strict IFF compatibility because of problems
that arose in trying to ensure 64-bit alignment of all data types.

Here's the issue: All IFF files consist of one large chunk, which must begin
with some header fields. An AIFF file is a "FORM" chunk, whose header
contains these three fields:

"FORM" (4 bytes),
a size count for the entire file or stream (4 bytes), and
the type of the file, e.g., "AIFF" or "SDIF" (4 bytes)

This adds up to 12 bytes, which means that subsequent chunks are not 8-byte
(64-bit) aligned. We could have solved this problem with a mandatory chunk whose
length is 4 more than a multiple of 8 bytes, but that seemed too much like
a kludge.

Other IFF chunks besides FORM have similar problems; all have mandatory "headers"
whose sizes are not multiples of 8 bytes.
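
The arithmetic behind this can be checked with a short Python sketch. The
names FORM_HEADER and is_aligned are illustrative only and are not part of IFF
or SDIF; the layout assumes big-endian, unpadded packing with a signed 32-bit
size count.

```python
import struct

# FORM header: 4-byte chunk ID, 4-byte signed size, 4-byte form type.
# ">" selects big-endian byte order with no implicit padding.
FORM_HEADER = struct.Struct(">4si4s")


def is_aligned(offset, boundary=8):
    """Return True if a byte offset falls on the given boundary."""
    return offset % boundary == 0


# The first chunk after the FORM header starts at offset 12,
# which is a multiple of 4 but not of 8 (64 bits).
first_chunk_offset = FORM_HEADER.size
print(first_chunk_offset, is_aligned(first_chunk_offset))
```

A filler of 4 bytes (or 4 more than any multiple of 8) would bring the next
offset back onto a 64-bit boundary, which is the workaround rejected above.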

Another problem with IFF is that a 4 byte size count is not big enough for
certain extremely large sets of SDIF data.

The SDIF standard still follows the "spirit" of IFF, with a series
of frames each with an identifying 4-byte frame type and size count; the only
practical difference is the lack of an opening "FORM" chunk.

We don't know of any commonly available utilities or libraries for manipulating
IFF files in general, so it seems like we haven't given anything up with this
decision.

How To Embed SDIF in IFF Files

Here's a proposal for embedding SDIF data in IFF files such as AIFF files.
Embedded SDIF data can be no longer than about 2 gigabytes, because IFF chunks
contain a signed 32-bit count of the data size.

Embedded SDIF data would require a special IFF form chunk to be a "wrapper"
around the entire SDIF block:

ChunkID
char[4]
'FORM', as required by IFF

ChunkSize
int32
The size, in bytes, of the entire embedded SDIF data, plus this chunk,
not including the "FORM" ChunkID or this ChunkSize field.

FormType
char[4]
'SDIF' (We need to register this form type!)

PaddingChunkID
char[4]
'SPAD'

PaddingChunkSize
int32
Either 0, 2, 4, or 6

PaddingChunkData
char[n]
0 to 6 bytes of null characters, depending on PaddingChunkSize

SDIFChunkID
char[4]
'SDIF'

SDIFChunkSize
int32
The size, in bytes, of the embedded SDIF data

SDIFData
data
The SDIF data

This structure is a legal IFF chunk and can therefore be embedded inside other
IFF chunks. It consists of an enclosing FORM chunk that includes two subchunks.
The first subchunk is a padding chunk to ensure that the SDIF data will be aligned
on an 8-byte boundary. The second subchunk contains the SDIF data to be embedded,
wrapped in a legal IFF chunk.

The number of padding bytes depends on the size of the data that precedes this
chunk in the IFF file (which must always be a multiple of 2 bytes, per the IFF
standard). For example, suppose the preceding portion of the file is 200 bytes.
Those 200 bytes, plus the 28 non-padding bytes in the above wrapper, is 228,
which is 4 more than a multiple of 8, so there would need to be 4 padding bytes
to make the SDIF data begin on an 8-byte boundary.

Note that changing the contents of the earlier portion of an IFF file may require
changing the number of padding bytes.
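
As a rough illustration of the proposal, the padding calculation and the
wrapper layout could be sketched in Python as follows. The function names
padding_bytes and wrap_sdif are hypothetical, and the sketch assumes big-endian
fields, as in IFF, and SDIF data whose length is already even.

```python
import struct

# Seven 4-byte fields surround the padding and the SDIF data:
# FORM, ChunkSize, FormType, SPAD, PaddingChunkSize, SDIF, SDIFChunkSize.
WRAPPER_FIXED_BYTES = 28


def padding_bytes(preceding_size):
    """Padding needed so the SDIF data starts on an 8-byte boundary."""
    if preceding_size % 2 != 0:
        raise ValueError("IFF requires the preceding data to be an even size")
    return -(preceding_size + WRAPPER_FIXED_BYTES) % 8


def wrap_sdif(sdif_data, preceding_size):
    """Build the FORM wrapper described above around raw SDIF bytes."""
    pad = padding_bytes(preceding_size)
    # ChunkSize covers everything after the ChunkSize field itself:
    # FormType (4) + padding chunk (8 + pad) + SDIF chunk (8 + data).
    form_size = 4 + 8 + pad + 8 + len(sdif_data)
    return (
        struct.pack(">4si4s", b"FORM", form_size, b"SDIF")
        + struct.pack(">4si", b"SPAD", pad)
        + b"\0" * pad
        + struct.pack(">4si", b"SDIF", len(sdif_data))
        + sdif_data
    )


# The worked example from the text: 200 preceding bytes need 4 padding bytes.
print(padding_bytes(200))
```

Because the padding depends on preceding_size, a program that edits the earlier
part of the IFF file would have to rebuild the wrapper rather than copy it.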

We welcome suggestions about better ways to do this.

Why Is the Opening Frame's Format Different From Other Frames?

The opening frame is the only frame that does not have a time tag, a stream
ID, and matrices. Why this nonuniformity? After all, the SDIFSpecVersion and
SDIFStandardTypesVersion could be data elements of a matrix in a frame with a
time tag of minus infinity.

Two reasons:

Forward compatibility: if the version numbers were stored in a matrix and a
future version of SDIF had a new format for frame headers or matrix headers,
it would be impossible to find the SDIFSpecVersion that indicates that a given
file uses the new format.

Space efficiency: the opening frame would be 48 bytes long instead of only
16.

64-bit Alignment

We want to have the option of working with 64-bit data. To do that
efficiently, we require that all 64-bit data items be aligned on 64-bit
boundaries relative to the beginning of the file. This is especially important
when memory-mapping the complete file, for example with the 'mmap' system call
in UNIX. Our tests on SGI O2s and DEC Alphas showed that 64-bit floats have to
be 64-bit aligned in memory in order to work with them (for the test program,
mail Rolf Woehrmann).

We want the time information in data frames always to be stored as 64-bit
floats, so the beginning of every data frame must be aligned to 64 bits. To
achieve that, we require byte padding, where needed, at the end of each
matrix's data. This also makes it possible for only some of the matrices in a
single data frame to contain 64-bit data.
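
A minimal sketch of the padding rule, in Python. The function names are
hypothetical, and the sketch assumes that the matrix data itself begins on an
8-byte boundary and that padding bytes are zero; only the rule of rounding up
to a multiple of 8 bytes comes from the text above.

```python
ALIGNMENT = 8  # 64 bits


def padding_for(data_size):
    """Number of bytes needed to round data_size up to a multiple of 8."""
    return -data_size % ALIGNMENT


def pad_matrix_data(data):
    """Append zero bytes so that whatever follows stays 64-bit aligned."""
    return data + b"\0" * padding_for(len(data))


# Example: a 3-row, 1-column matrix of 32-bit floats occupies 12 bytes,
# so 4 padding bytes are needed before the next matrix or frame.
raw = bytes(3 * 1 * 4)
print(len(raw), len(pad_matrix_data(raw)))
```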

Why Allow Optional Columns In Matrices?

There are many cases where it would make sense to add extra columns to a matrix.
For example, the Lemur project performs analysis and synthesis of "bandwidth
enhanced" sinusoids with the usual amplitude, frequency, and phase fields
plus a "bandwidth" field indicating the spectral width or "noisiness"
of the partial. Another example is a measure of the "importance" or
perceptual salience of each of a set of partials.

SDIF supports this in a way that allows programs that don't understand these
extra columns to read and process the information they do understand without
disturbing the extra information: extra columns must always appear after the
required columns.
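
The following Python sketch shows how a program might take advantage of this
rule. The column order (amplitude, frequency, phase) and the function name
scale_amplitudes are assumptions made for illustration, not the layout of any
particular SDIF matrix type; the point is only that the required columns come
first and any trailing columns are passed through unchanged.

```python
REQUIRED_COLUMNS = 3  # hypothetical order: amplitude, frequency, phase


def scale_amplitudes(matrix, gain):
    """Scale the amplitude column, leaving any extra columns untouched."""
    result = []
    for row in matrix:
        if len(row) < REQUIRED_COLUMNS:
            raise ValueError("row is missing required columns")
        amplitude, frequency, phase = row[:REQUIRED_COLUMNS]
        extras = row[REQUIRED_COLUMNS:]  # e.g. a "bandwidth" column
        result.append([amplitude * gain, frequency, phase, *extras])
    return result


# Two partials with a fourth "bandwidth" column this program knows nothing about.
partials = [
    [0.5, 440.0, 0.0, 0.25],
    [0.25, 880.0, 1.5, 0.75],
]
scaled = scale_amplitudes(partials, 2.0)
print(scaled)
```

The same function works on a matrix that has only the three required columns,
since the slice of extra columns is then simply empty.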