Clear, Efficient Audio Signal Processing in ANSI C
Adrian Freed
Introduction
The popularity of the C programming language and its
derivatives in signal processing applications is increasing. Two
books specifically about signal processing in C have recently
become available [Reid, David]. A broad collection of numerical
algorithms with implementations in C, Fortran and Pascal has been
available for some years [Press]. Efforts are underway to develop
an ANSI standard for Digital Signal Processing extensions to C
[Leary]. A discussion of the advantages and disadvantages of C
for numerical computing may be found in [MacDonald].
When a new reduced instruction set computer (RISC) or digital
signal processor (DSP) is introduced a C compiler is one of the
first high level development tools available. The wide
availability of compilers and the ease with which portable
applications may be written has prompted the use of C as a target
language for other programming languages, such as C++, Pascal and
Fortran and for higher level tools, such as the Ptolemy
system[Buck].
C is expressive enough to allow the careful programmer to
obtain the numerical performance potential of a processor so
important for many signal processing applications. This paper
presents recommendations for efficiently implementing signal
processing algorithms in C without sacrificing clarity or
portability. These recommendations are derived from the author's
experience implementing signal processing functions for musical
applications [Freed 1992] with the MIPS R4000 [Kane] C compiler
on a Silicon Graphics Indigo workstation. There is therefor a
bias in the example code towards audio signal processing, but the
recommendations apply to numerical computing in general.
Most of the recommendations can be interpreted for traditional
C [Kernighan], ANSI C [ANSI] and C++ [Ellis]. Although the ANSI
standardization of C did not greatly change the language, some of
these changes directly concern implementation of numerical
algorithms. These and issues relating to C++ will be discussed
throughout.
Recommendations will look like this
If you are in a hurry just read recommendations in
italics
Data Types for
Signals
Introduction
The choice of data types for variables in signal processing
applications relates directly to algorithm performance. The
following sections discuss the basic data types available in
C.
Integers
Integers in C come in four potentially different sizes:
char, short, int and long,
and two flavors: signed and unsigned. Long
integers are at least 32-bit wide in most modern machines and are
a good choice for most integer calculations in signal processing.
Short integers are often 16-bit wide and are commonly used to
represent sound samples for storage on disks and for D/A and A/D
conversions. Although int is usually 32 bits, it has
been customary for many compilers to provide 16-bit wide
int's for the Motorola 68000 Family of processors. Many
low-cost digital signal processors provide 16 or 24-bit
operations and addressing.
Floating point
Floating point numbers in C can be single (float),
double (double ) or extended precision
(long double). Most processors that provide
floating point operations support a 32-bit single precision
format with a 24-bit signed magnitude mantissa. Most use the IEEE
754 standard for such numbers, although popular low-cost DSP
chips, the TMS320c3x series for example, do not. Most RISC
machines and floating point coprocessors provide direct support
for double precision floating point numbers with a 53-bit signed
magnitude mantissa. Long double's are the same as
double's in the current generation of processors.
The following table is for the MIPS C compiler:
Name Type Size Range
Character char 8 0 to 255
Integer int (long) 32 -27147483648 to 27147483647
short 16 -32768 to 32767
unsigned 32 0 to 4294967295
Floating Point float 32 +/-10+/-38
double (long double) 64 +/-10+/-380
Example: Musical Signals
Audio signals for musical applications will serve as an
example of how to choose appropriate data types.
Samples: Integers or Floating
Point?
Most RISC processors have faster floating point multiply
operations than integer ones. Integer additions on the other hand
are usually as fast as or faster than floating point additions,
because of the normalization step required in floating point
arithmetic. Though the choice of data type for signals appears at
first glance to depend on the instruction timing of a particular
processor and the particular mix of multiplications and additions
of a given algorithm, many other factors support the choice of
floating point:
Floating point operations allow much greater dynamic
range*
Applications driving future processor designs require
good floating point performance.*
Use float for calculations on sound
samples
*Many processors implement integer arithmetic using the
floating point ALU, e.g. PA-RISC [Hansen]
typedef can be used to parameterize types to allow
for future changes, e.g.
typedef long Stype; /* sample */
typedef float Ftype; /* frequency */
typedef double Ttype; /* time */
In practice this parameterization is not as useful as might be
hoped. Consider the filter:
#ifdef FLOAT
typedef float Stype;
#else
typedef long Stype; /* sample */
#endif
Stype simple_filter(Stype input, float k, float j)
{
static Stype output= 0;
return output = output*k +input*j;
}
This code will be slower when Stype is long than when
it is float , because variables of integer type in the
same expression as variables of floating point types are promoted
to floating point. Not only will these conversions slow the
filter down, but the critical arithmetic operations will be done
in floating point anyway.
Here is the filter using integers:
#define SCALE 65536
long integer_filter(long input, long k, long j)
{
static long output = 0;
return output = (output*k+input*j)/SCALE;
}
Allowing for 16-bits of dynamic range in the input and output
and to avoid integer overflow, the above function only allows for
15-bits of precision in the specification of the filter
coefficients.
The above example illustrates a major difficulty for signal
processing with C integers : there is no fixed point arithmetic
interpretation of those integers such as that of fixed point
signal processor chips like the DSP56001 [Lee]. Although
multiplication of two 32-bit integers on RISC processors such as
the R3000 yields the expected 64 bit resolution result, the
multiplication operator in C yields an integer the same size as
its operands: 32 bits. The top 32 bits are unavailable to the C
programmer and it is therefor impossible to maintain higher
resolution for subsequent additions within an expression.
Although there are important applications where 16 bit signals
and filter coefficients suffice, most audio applications demand
at least 24-bit signal resolution and better than 48-bit
resolution for additions. This difficulty is addressed in DSP/C
by adding a fract data type to C [Leary].
Time
How many bits are needed to represent time to within a sample
at 48kHz sample rate over the closed interval of one day? All the
integers between 0 and 24*60*60*48000 = 4147200000 have to be
represented. 32-bit floating point numbers and 24-bit or smaller
integers can be eliminated as candidates since 2^23 = 8388352.
These types cannot represent time to the required precision over
the interval of one hour. This leaves the 32-bit integer or
double precision floating point types. The floating point
representation has the advantage that time could be easily
represented in seconds and be independent of sample rate. The
addition operation on 32-bit integers is faster than double
precision addition on many processors and may be implemented
concurrently with other instructions.
Use long for counting samples
Use double or long for measuring elapsed
time
Delay
Delay line lengths for waveguide filters, for example, [Laakso
et al] need to be specified to a better precision than the sample
interval. A 100th of a sample interval precision over the range 0
to a maximum delay of a few minutes covers the broad range of
needs from, for example, accurately tuning high frequencies for a
simulated flute to the long delays typical of large reverberant
spaces.
Use float or long integers for
fractional sample delays
Frequency
For most applications the range 0 to 48Khz suffices. However
it can be useful to represent negative frequencies and it is now
commonplace to sample at a rate higher than required by the
Nyquist criterion and the range of human hearing. Let's allow for
-1Mhz to 1Mhz. Although human pitch perception has limited
precision, it is a common mistake to use this to decide on the
required frequency precision. In additive synthesis, for example,
sounds are synthesized for more than their pitch percept. Their
timbre and loudness is also important and is a function of the
phase and amplitude of a set of sinusoidal components. Two
sinusoidal components close in frequency are perceived as one
component with an amplitude modulation whose rate is a function
of the difference in frequency. Differences between these beating
sounds are perceptible when the frequency difference is less than
1 Hz.
A usable range of values is easily covered by a single
precision floating point number or 32-bit integer. The 24-bit
fractional format of the DSP56000 family suffices, but 16-bits is
not enough.
Use float or an integer type with more than
16-bits for frequency parameters
Phase Increments
An important use of the frequency parameter is in the
representation of a phase increment for synthesis of periodic
signals using linear oscillators. If the signal to be synthesized
is read from a table that has a length which is a power of 2, a
very efficient oscillator implementation is possible by
accumulating integer phase increments [Hartmann].
Use an unsigned integer type with more than 16-bits for
phase increments and accumulators.
Amplitude
16-bit PCM (the standard used in Compact Discs) provides
enough dynamic range for most listening contexts. However, when
recording it is common to encounter situations where more dynamic
range is needed. It is also useful to have more resolution to
avoid the effects of round off errors during operations on
signals. Although most high-quality audio is stored on tape and
disk as 16-bit integers, many vendors now offer the option of
24-bits. Most already do at least 24-bit processing and many
offer 20-bit D/A conversion [Sony Hi-bit reference].
Use signed and unsigned char for low-quality sound
samples
Use short (16-bit) integers for sound samples on tape and
disk
Use float or better than 16-bit integers for
calculations on sounds
Machine and Compiler
Independent Optimizations
Consider the following vector function from UDI [Depalle]:
sun_vaddosb(vect_in1,incr_in1,vect_in2,incr_in2,
vect_out,incr_out,vect_length)
float *vect_in1, *vect_in2, *vect_out;
unsigned long vect_length, incr_in1, incr_in2, incr_out;
{
register float tmp,
tmp_1,
tmp_2;
if((incr_in1 == 1) && (incr_in2 == 1) && (incr_out == 1))
{
while(vect_length--)
{
tmp_1 = *vect_in1++;
tmp_2 = *vect_in2++;
tmp = tmp_1 - tmp_2;
*vect_out++ = (tmp_1 + tmp_2) / tmp;
}
}
else
{
while(vect_length--)
{
tmp_1 = *vect_in1;
tmp_2 = *vect_in2;
tmp = tmp_1 - tmp_2;
*vect_out = (tmp_1 + tmp_2) / tmp;
vect_in1 += incr_in1;
vect_in2 += incr_in2;
vect_out += incr_out;
}
}
}
This is typical of coding styles which have evolved since the
earliest applications of C. Programmers have taken advantage of
the ease with which C can be used to express low-level operations
to optimize the performance of the code using the compiler for
the machine at hand.
The following version might have been the starting point for
these optimizations:
void vaddosb(vect_in1,incr_in1,vect_in2,incr_in2,
vect_out,incr_out,vect_length)
float *vect_in1, *vect_in2, *vect_out;
unsigned long vect_length,incr_in1,incr_in2,incr_out;
{
unsigned int i;
for(i=0;i<vect_length;++i)
vect_out[i*incr_out] =
(vect_in1[i*incr_in1] + vect_in2[i*incr_in2])
/ (vect_in1[i*incr_in1] - vect_in2[i*incr_in2]);
}
No register variables, pointers or temporary variables are
used and no special case is made for vectors of contiguous
elements. Will this smaller, clearer function be much slower
because of the 5 multiplies introduced per iteration? Will it be
slower without the special case which allows compilers to
optimize the post increment? With the MIPS R3000 processor on the
SGI Indigo, the above code is faster than the original hand
optimized code! The following measurements are the sum of the
user time of ten runs of UNIX time(1) command executing a program
which called the aforementioned functions one million times with
32 element vectors:
vector stride 1
sun_vaddosb 225.9
vaddosb 225
vector stride 2
sun_vaddosb 225.4
vaddosb 225
The MIPS C compiler did all the optimizations the original
programmer did by hand and was able to generate more efficient
loop testing code, which accounts for the small advantage over
the original function. A conclusion from these measurements is
that it is better to write simple, clear programs and examine
their performance in a particular application than to write in a
machine specific optimized style. This section is devoted to
recommendations which are likely to yield faster code with most
machines and compilers without leading to hard to understand
code. The last section will discuss machine and compiler specific
optimizations. But first here are some general recommendations on
changing code to make it more efficient.
It can be difficult to predict performance by inspecting C
code. Now that many processors have optimizing assemblers it is
not easy to predict performance by looking at the output of the C
compiler either. The only way to tell that changes to code are
improving performance is to measure performance. Most modern
compilers come with tools for execution time profiling. Even when
these are not available, the UNIX time(1) command can be used to
good effect.
Measure real performance to changes in code
It is very easy to break working code while changing it to
make it more efficient. Most programmers have experienced the
punishment for not following these recommendations from
[Kernighan] :
Make it right before you make it faster
Keep it right when you make it faster
Make it clear before you make it faster
Function Argument
Promotion
Arguments to functions defined with "old style" declarations
are promoted to larger types. Char's and
short's are promoted to int's. Float's
are promoted to double's. So consider this version of
the simple filter:
typedef float Stype
Stype simple_filter(input, k,j)
Stype input; /* will be promoted to double */
float k; /* will be promoted to double */
float j; /* will be promoted to double */
{
static Stype output= 0;
return output = output*k +input*j;
}
On machines where the float and double types
are different sizes this version will be considerably slower than
one with the new declaration style introduced with ANSI C and
C++. Not only will all the arithmetic be performed in double
precision, but two conversions between float and
double will be required to use and assign to the output
variable. Most RISC processors have efficient double precision
ALU operations. However the additional costs of conversion
instructions cannot be avoided. Bus bandwidth and storage
requirements of double's is usually twice that of
float's. The size penalty may have an unpredictable and
negative impact on performance by wasting precious cache memory
or floating point registers.
Use new style declarations
Use conversion utilities to change old-style declarations
in traditional C code
Floating Point Constants and
Expressions
Note that the constant 3.141 is of type double and
its use in an expression will cause other types to be promoted to
double.
Use the ANSI C f or F suffix for single
precision floating point constants,
e.g. 3.141f
or when the suffix is unavailable use a caste:
e.g. (float) 3.141
In this example, most compilers will generate code to multiply
k by t, PI, and 2.
#define PI 3.14159265358979
a = r*sin(2*PI*k*t);
So this constant folding will be more efficient:
#define TWOPI 6.2831853072 /* 2[[pi]] */
a = r*sin(TWOPI*k*t);
Fold constant expressions into single
constants
Division is usually considerably more expensive than
multiplication and compilers often do not do strength reduction,
i.e. x/2.0 can be written as x*0.5. Strength
reduction of 2.0*x to x+x is often not done by
RISC compilers because in some instruction schedules the addition
may be a slower choice.
Consider strength reduction when it is measurably
faster
C Standard Library
Functions
Standard C library functions such as sin() and
cos()return double precision results to a double
precision argument. ANSI C standardized single precision floating
point versions of these routines.
When single precision suffices use faster library
routines, e.g. fsin() and fcos() instead of sin() and
cos()
Mathematical library routines are designed to achieve good
performance without compromising accuracy for a wide range of
input values. This means that they work over a much larger domain
than an application may require. They are also required to
maintain the errno variable for exception handling.
Consider restrictive and faster implementations of C
library functions
Remember thought that considerable effort may have been
expended on efficient implementations of the C library functions.
The hard work to do better ones for specific applications may be
difficult to justify.
Exceptions
Certain arithmetic operations such as division by zero may
require special additional exception processing. The C standard
does not define this additional work. It is usually processor
specific but may be well defined in the case of IEEE 754 floating
point exceptions such as overflow and underflow. In any event
many of these exceptions result in a potentially expensive
processor context switch to handle the exception.
Avoid computations which may lead to expensive exception
processing
Expensive
Conversions
Conversions from one data type to another may be expensive.
Conversions from floating point numbers to signed and unsigned
integers are commonly slow operations. On the R3000 it is cheaper
to convert a floating point number to a signed integer and then
that integer to an unsigned integer than directly from floating
point to an unsigned integer. The results are not necessarily the
same.
Avoid expensive conversions from one type to
another
Parentheses
In traditional C, compilers are completely free to a change
the order the evaluation of commutative and associative
operators. ANSI C requires that compilers honor the order of
expression evaluation specified by parentheses. Unless numerical
precision requirements dictate otherwise (since floating point
addition is not associative) it therefor is best to minimize use
of parentheses and give compilers the most options for
optimization. This unfortunately may compromize the readability
of programs.
With ANSI C consider parentheses to trade numerical
precision for optimization opportunity
Increase Locality of
References
Modern computer systems are designed to take advantage of
locality of memory references, whether these systems be based on
RISC, CISC, or DSP architecture. Register variables and data
cache memory are used to take advantage of temporal locality -
the tendency for most variable references to be to variables
which were most recently referenced. Instruction caches and
queues are designed to take advantage of temporal and spatial
locality of instruction streams - the tendency for instructions
streams to contain contiguous instructions or repeats of short
sequences from loops. Cache lines and page mode memory chips are
used to take advantage of spatial locality of data and
instructions.
It is important therefor to consider algorithms and
implementations which enhance the temporal and spatial locality
of instruction and data references. This will become even more
significant as systems evolve to include more than one processor,
because of the high cost of references to variables shared
between processors. Although the recommendations to enhance
locality are machine independent, the payoff in their application
will vary considerably among processors and memory subsystem
designs.
The cost of access to a variable may vary considerably
according to its address in memory.
Use a memory allocator which aligns data to memory page,
and cache line boundaries - the vendor supplied library function
for memory allocation may do this automatically
Register files and cache memory are much smaller in size than
main memory, so it is important to be economical in the use of
space. For numerical applications this means not unnecessarily
using large data types such as double when
float will do.
double variables usually use twice the register and
cache space of float variables, use them only when the precision
they offer is required
In C it is usually easy to control spatial locality of
variables, since arrays are implemented in contiguous memory
locations and successive elements in structures are required to
be implemented contiguously. For example on the SGI indigo, a
reference to the real element of a complex number consisting of
two float structure elements results in the complex
component being brought into cache memory concurrently, speeding
up a subsequent access to the complex element of the
structure.
Place variables that are commonly used together into
contiguous memory locations,
e.g. typedef struct { float r, c; } complex ;
Variables declared with local scope, are not necessarily
implemented on the stack contiguously, so it may pay to create a
structure to group together such variables, unless of course the
compiler is expected to place the variables in registers
anyway.
Machine and Compiler
Dependent Optimizations
Compiler Options
Modern C compilers offer a range of optimization and other
controls over the code generation process. They are usually
described concisely, perhaps in the UNIX manual format.
Consult compiler manual for control over code
generation
The MIPS C compiler for SGI workstations, like many ANSI C
compliant compilers, has an option to revert to traditional C
definition (-cckr). This option is useful for compiling old code,
but it should be used with the -float option, which overrides the
old type promotion rules and allows the compiler to generate
faster single precision floating point operations for expression
with no types larger than float.
use cc -cckr -float for old code
use cc -O2 -c or cc -O3 for ANSI code
The -g[0-3] argument which is used to specify generation
of information for symbolic debugging suppresses
optimization.
Use compiler optimization after code has been
debugged
Profiling code is another potential source of
inefficiency.
Turn off profiling options if it effects
performance
Register Variables
In addition to documentation on the compiler itself, a
reference manual for the language it implements is usually
provided. It is now standard practice to include in such a manual
a description of implementation-defined behavior. One of these
implementation choices is the interpretation of the
register storage-class specifier. The MIPS Compiler uses
up to 6 register storage-class specifiers when not optimizing and
ignores them when optimizing. It always ignores register
storage-class specifiers in new style function definitions and
declarations.
The MIPS compiler like most modern C compilers make very good
choices automatically of which variables to put into registers.
On the rare occasions when it can be shown, by examining the
output of the compiler and measuring performance, that its use of
registers can be improved on, a better way is needed to guide the
compiler. Unfortunately it is better to compile most modules with
optimization so register storage-class specifications
will be ignored. We can however use the volatile type
qualifier, introduced with ANSI C. Variables qualified this way
will not usually be placed in registers.
Use volatile to defeat a compilers unfruitful register
assignments
Exceptions
The MIPS processors have an overflow exception on signed
integer additions and subtractions but not on unsigned integer
operations. When using integers for phase accumulators in digital
oscillators it is common practice to allow the accumulator to
wrap by simply overflowing.
Use unsigned integers for phase accumulators to avoid
exceptions
Recursive digital filters (IIR) with impulsive inputs may
cause floating point underflows. MIPS floating point has an
unmaskable and expensive exception on underflow. This data
dependent instruction execution time is disastrous for hard
real-time applications such as sound synthesis. One solution to
this is to clamp the state variables of the recursive filter to 0
as they approach 0. Recent designs such as the Motorola 88110
allow for exceptions to be completely disabled.
Loops
The following functions all perform the addition of two non
zero length vectors:
void add(int count, float *a, float *b, float *result) /* fortran */
{
int i;
for(i=0;i<count;++i)
result[i] = a[i] +b[i];
}
void add(int count, float *a, float *b, float *result) /* UDI */
{
while(count--)
*result++ = *a++ + *b++;
}
void add(int count, float *a, float *b, float *result) /* Csound */
{
do
{
*result++ = *a++ + *b++;
}while(--count);
}
void add(int count, float *a, float *b, float *result) /* Resound */
{
float *end = result+count;
while(result<end)
{
*result++ = *a++ + *b++;
}
}
void add(int count, float *a, float *b, float *result) /* Ptolemy */
{
int i;
for(i=count-1;i >=0;--i)
result[i] = a[i] +b[i];
}
For the interesting values of count greater than 0,
these functions all perform the same operation. For values less
than or equal to 0 they do not all perform the same way. It is
unreasonable to expect a compiler to generate the same code for
each function. Which function will be the fastest? For many
compilers for RISC machines the first example will be fastest.
Compiler developers cannot optimize every way to express loops.
Perhaps because of a tradition of optimizing Fortran do
loops, the MIPS compiler generates good code for loops with array
indices which are functions of loop iteration count variables. It
unrolls loops thereby increasing the chance for scheduling
optimizations, and minimizing loop control between blocks. It can
also generate fast test instructions for iteration variables with
an upper and lower bound.
Use for loops and an iteration variable instead
of whileReference arrays with iteration
variables instead of pointers
Compilers do not generate code to place the variable
count in the following function into a register:
int count; loopindex() { int i; for(i=0; i < count; ++i)
x(i*i); }
Since the function x(int i) may modify count
the compiler cannot generate code to put a copy of count
in a register at the start of the loop. If it is known that there
are no such side effects it is possible to to help the compiler
to come to the same conclusion by assigning all the variables in
the loop to registers:
int count; loopindex() { int i; int n = count; for(i=0; i
< n; ++i) x(i*i); }
Increase the probability of fruitful register
assignments by avoiding side effects and narrowing scope of
variables so side effects may be checked for by the
compiler
Conclusion
The recommendations in this paper have been used by the author
to create an efficient vector and function library for real-time
audio signal processing on MIPS processors. Preliminary
experiences porting this work to other processors is very
encouraging.
Acknowledgments
Silicon Graphics
PacBell
References
ANSI C language specification, American National
Standard X3.159-1989
Buck, J., Ha, S., Lee, E. A., Messserschmitt, D. G.,
Ptolemy: A Platform for Heterogeneous Simulation and
Prototyping, 1991, Proc. of the 1991 European Simulation
Conference, Copenhagen, Denmark, June 17-19
David, Ruth A., Stearns, Samuel D., Signal Processing
Algorithms Using Fortran and C, 1992, Prentice-Hall
Depalle, Philippe Rodet, Xavier, UDI: A Unified DSP
Interface for Sound Signal Analysis and Synthesis,
Proceedings of ICMC, 1990, Computer Music Association
Ellis, Margaret A., Stroustrup, Bjarne, The Annotated C++
Reference Mnaual , 1990, Addison-Wesley
Freed, A., MacMix: Mixing Music with a Mouse,
Proceedings of the 12th International Computer Music Conference,
The Hague, 1986, Computer Music Association
Freed A., New Tools for Rapid Prototyping of Music Sound
Synthesis Algorithms and Control Strategies, Proc.
International Computer Music Conference, San Jose, Oct. 1992.
Freed, A., Recording, Mixing, and Signal Processing on a
Personal Computer , Proceedings of the AES 5th International
Conference on Music and Digital Technology, 1987
Hansen, Robert C., 1992, New Optimizations for PA-RISC
Compilers, Hewlett-Packard Journal, June 1992.
Hartmann, W. M., Digital Waveform Generation by Fractional
Addressing, J. Acoust. Soc. Amer. 82, 1987, p. 1183-1891
Kane, Gerry, Heinrich, Joe, MIPS RISC Architecture,
1992, Prentice Hall
Kastens, U., Compilation for Instruction Parallel
Processors, 1990, Proceedings 3rd Compiler Compilers
Conference 1990, Springer-Verlag
Kernighan, B. W., Ritchie, D. M., The C Programming
Language, 1978, 1988, Prentice-Hall
Kernighan, B. W., Plauger, P. J., The Elements of
Programming Style, 1978, McGraw-Hill
Laakso, T. I., Välimäki, V., Karjalainen, M., Laine,
U. K., Real-Time Implementation Techniques for a Continuously
Variable Digital Delay in Modeling Musical Instruments,
Proceedings of ICMC 1992, CMA, San Francisco.
Leary, Kevin W., Waddington, William, DSP/C: A Standard
High Level Language for DSP and Numeric Processing,
Proceedings of the ICASSP, 1990, p. 1065-1068
Lee, Edward A., Programmable DSP Architectures, October
1988, IEEE ASSP Magazine
Moore, F. Richard, Elements of Computer Music, 1990,
Prentice-Hall
Press, William H. et al, Numerical Recipes in C - 2nd
Edition, 1992, Cambridge University Press
Reid, Chistopher E., Passin, Thomas B., Signal Processing
in C, 1992, John Wiley & Sons
Macdonald, T., 1991, C for Numeric Computing, The
journal of Supercomputing, 5, 31-48, Kluwer Academic Pulishers,
Boston.