## Section: Scientific Foundations

### Algebraic and Elementary Functions

**Elementary Functions and Correct Rounding.**
Many libraries for elementary functions are currently available. We
refer to [55] for a general overview of the domain.
The functions in question are typically those defined by the C99 and
LIA-2 standards, and are offered by vendors of processors, compilers,
or operating systems.

Although the 1985 version of the IEEE-754 standard does not deal with these functions,
implementations attempt to reproduce some of their mathematical properties, in
particular symmetries. For instance, monotonicity can be obtained for
some functions on some intervals as a direct consequence of accurate
internal computations or of numerical properties of the algorithm chosen
to evaluate the function; otherwise it may be *very* difficult to
guarantee, and the general solution is to provide it through correct
rounding. Preserving the range (e.g., atan(x) ∈ (−π/2, π/2))
may also be a goal, though it may conflict with correct rounding (when
supported).

Concerning the correct rounding of the result, it was not required by
the IEEE-754-1985 standard: during the drafting of this standard, it
was considered that correctly rounded elementary functions were
impossible to obtain at a reasonable cost, because of the so-called
*Table Maker's Dilemma*: an elementary function is evaluated to
some internal accuracy (usually higher than the target precision), and
then rounded to the target precision. What is the minimum accuracy necessary
to ensure that rounding this evaluation is equivalent to rounding the
exact result, for all possible inputs? Since this question could not be
answered in a simple manner, correctly rounding elementary functions
might require arbitrary precision, which is very slow and resource-consuming.
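The classical way around the dilemma is Ziv's strategy: evaluate the function with some working precision, check whether the rounding to the target format is already determined, and retry with more precision only when it is not. A minimal sketch in Python, using the `decimal` module as the arbitrary-precision evaluator (the 2-ulp safety margin relies on `Decimal.exp` being documented as correctly rounded; a real library would use a carefully bounded evaluation error instead):

```python
from decimal import Decimal, getcontext

def exp_correctly_rounded(x):
    """Ziv-style 'onion peeling': evaluate exp(x) with increasing
    working precision until rounding to binary64 is unambiguous."""
    prec = 30
    while True:
        getcontext().prec = prec          # note: mutates the global context
        v = Decimal(x).exp()              # correctly rounded to prec digits
        # ulp of the working precision; perturb by 2 ulps each way,
        # which safely covers the (<= 0.5 ulp) evaluation error
        ulp = Decimal(1).scaleb(v.adjusted() - prec + 1)
        lo, hi = v - 2 * ulp, v + 2 * ulp
        if float(lo) == float(hi):        # both endpoints round to the
            return float(lo)              # same double: rounding is decided
        prec *= 2                         # hard case: retry with more digits
```

For almost all inputs the first pass suffices; only the rare hard-to-round cases trigger the expensive retries, which is why the *worst-case* precision bound matters so much.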

Correctly rounded libraries do already exist, such as GNU MPFR
(http://www.mpfr.org/),
the Accurate Portable Library released by
IBM in 2002, or the `libmcr` library released by Sun Microsystems
in late 2004. However, their worst-case execution time and memory
consumption can be up to 10,000 times worse than those of usual libraries,
which is the main obstacle to their generalized use.

In previous years we have focused on computing bounds on the intermediate precision required for correctly rounding some elementary functions in IEEE-754 double precision. This allows us to design algorithms using a tight precision, which in turn makes it possible to offer correct rounding with an acceptable overhead: we have experimental code where the cost of correct rounding is negligible on average, and less than a factor of 10 in the worst case. This performance led the IEEE-754 revision committee to recommend (though not require) correct rounding for some mathematical functions.

Knowing the worst-case precision also makes it possible to prove the correct-rounding property and to bound the worst-case performance of our functions. Such worst-case bounds, as well as a strict proof of the correct-rounding property, may be needed in safety-critical applications. The competing libraries by IBM and Sun can neither offer a complete proof of correct rounding nor bound their timing, because worst-case accuracy information is lacking. Our work actually shows a posteriori that their overestimates of the accuracy needed before rounding are sufficient; IBM and Sun themselves could not provide this information. See also § 3.4 concerning the proofs for our library.
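Reaching a tight intermediate precision somewhat beyond binary64, without paying for arbitrary-precision arithmetic, typically relies on error-free transformations: the rounding error of a floating-point operation is itself a floating-point number that can be computed exactly. As an illustration (the actual library code is in C), here is Knuth's TwoSum:

```python
def two_sum(a, b):
    """Error-free transformation (Knuth): return (s, e) such that
    s = fl(a + b) and a + b = s + e exactly, for finite a and b."""
    s = a + b
    bb = s - a                       # the part of b that actually got added
    e = (a - (s - bb)) + (b - bb)    # exact rounding error of the addition
    return s, e
```

Chaining such transformations yields "double-double" arithmetic, whose roughly 106 significant bits cover the intermediate accuracy needed by the quick phase of a correctly rounded function.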

**Approximation and Evaluation.**
The design of a library with correct rounding also requires the study
of algorithms in large (but not arbitrary) precision, as well as the
study of more general methods for the three stages of the evaluation
of elementary functions: argument reduction, approximation, and
reconstruction of the result.

When evaluating an elementary function, the first step consists in reducing this evaluation to that of a possibly different function on a small real interval. This last function is then replaced by an approximant, which can be a polynomial or a rational fraction. Being able to perform these steps cheaply while keeping the best possible accuracy is a key issue [2]. The kind of approximants we can work with is very specific: the coefficients must fulfill constraints imposed by the targeted application, such as limits on their size in bits. The usual methods (such as the Remez algorithm) do not apply in that situation, and we have to design new processes to obtain good approximants of the required form.

Regarding the approximation step, there are currently two main challenges for us. The first is the computation of excellent approximations that will be stored in hardware or in software and called thousands or millions of times. The second is the automation of the computation of good approximants when the function is only known at compile time.

A third question concerns the evaluation of such approximants. To find the best compromise between speed and accuracy, we combine various approaches ranging from numerical analysis (tools like backward and forward error analysis, conditioning, stabilization of algorithms) to computer arithmetic (properties like error-free subtraction, exactly computable error bounds, etc.). The structure of the approximants must also be taken into account, as well as the degree of parallelism offered by the processor targeted for the implementation.
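To make the three stages concrete, here is a toy version of exp: reduction writes x = k·ln 2 + r with |r| ≤ (ln 2)/2, a low-degree polynomial evaluated by Horner's rule approximates exp on the reduced interval, and reconstruction multiplies by 2^k. The Taylor coefficients below merely stand in for a properly constrained minimax approximant, so the accuracy (around 10⁻⁷ relative) is far from what a real library achieves:

```python
import math

# Degree-6 Taylor coefficients of exp (highest degree first),
# standing in for a minimax approximant with constrained coefficients.
C = [1/720, 1/120, 1/24, 1/6, 1/2, 1.0, 1.0]

def exp_approx(x):
    # 1) argument reduction: x = k*ln(2) + r, with |r| <= ln(2)/2
    k = round(x / math.log(2))
    r = x - k * math.log(2)
    # 2) approximation: Horner evaluation of p(r) ~ exp(r) on [-ln2/2, ln2/2]
    p = C[0]
    for c in C[1:]:
        p = p * r + c
    # 3) reconstruction: exp(x) = 2^k * exp(r)
    return math.ldexp(p, k)
```

Note that in a real implementation each stage contributes rounding errors that must be analyzed: the reduction uses ln 2 stored to extra precision, and the Horner scheme may be reorganized to expose instruction-level parallelism.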

**Algorithm/Architecture Adequation.**
Some special-purpose processors, like DSP cores, may not
have floating-point units, mainly for cost reasons. For such integer or
fixed-point processors, it is thus desirable to have software support
for floating-point functions, starting with the basic operations.
To facilitate the development or porting of numerical applications on
such processors, the emulation in software of floating-point arithmetic
should be compliant with the IEEE-754 standard; it
should also be very fast. To achieve this twofold goal, a solution is
to exploit as much as possible the characteristics of the target
processor (instruction set, parallelism, etc.) when designing
algorithms for floating-point operations.
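To give an idea of what such integer-only emulation involves, here is a sketch of binary32 multiplication using only integer operations (written in Python for readability; it handles normal numbers with round-to-nearest-even, and deliberately ignores zeros, subnormals, infinities, and NaNs, all of which a compliant emulation must also treat):

```python
import struct

def f32_bits(x):
    """Host-side helper: binary32 encoding of a Python float."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def bits_f32(b):
    """Host-side helper: Python float from a binary32 encoding."""
    return struct.unpack('<f', struct.pack('<I', b))[0]

def mul_f32(a, b):
    """Binary32 multiply on bit patterns, integer operations only
    (normal inputs/outputs, round to nearest even)."""
    sa, ea, ma = a >> 31, (a >> 23) & 0xFF, a & 0x7FFFFF
    sb, eb, mb = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    s = sa ^ sb
    ma |= 1 << 23                      # make the implicit leading 1
    mb |= 1 << 23                      # explicit: 24-bit significands
    prod = ma * mb                     # exact 47- or 48-bit product
    e = ea + eb - 127                  # unbias, rebias the exponent
    shift = 24 if prod & (1 << 47) else 23   # product in [2,4) needs
    if shift == 24:                          # one normalization step
        e += 1
    round_bit = (prod >> (shift - 1)) & 1
    sticky = (prod & ((1 << (shift - 1)) - 1)) != 0
    m = prod >> shift
    if round_bit and (sticky or (m & 1)):    # round to nearest even
        m += 1
        if m == 1 << 24:               # rounding carried out of the
            m >>= 1                    # significand: renormalize
            e += 1
    return (s << 31) | (e << 23) | (m & 0x7FFFFF)
```

On a VLIW core such as the ST231, the point is then to schedule these integer operations so that the independent ones (unpacking of both operands, sign and exponent handling) fill the available issue slots.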

So far, we have successfully applied this “algorithm/architecture adequation” approach to some VLIW processor cores from STMicroelectronics, in particular the ST231. The ST231 cores have integer units only, but for their target applications (namely, multimedia applications), being able to perform basic floating-point arithmetic very efficiently was necessary. When various architectures are targeted, this approach should further be (at least partly) automated: the problem is then not only to write fast and accurate code for one given architecture, but to have this optimized code generated automatically according to various constraints (hardware resources, speed, and accuracy requirements).