G.2 Numeric Performance Requirements

danger

This Reference Manual output has not been verified, and may contain omissions or errors. Report any problems on the tracking issue.

Implementation Requirements

1

Implementations shall provide a user-selectable mode in which the accuracy and other numeric performance requirements detailed in the following subclauses are observed. This mode, referred to as the strict mode, may or may not be the default mode; it directly affects the results of the predefined arithmetic operations of real types and the results of the subprograms in children of the Numerics package, and indirectly affects the operations in other language-defined packages. Implementations shall also provide the opposing mode, which is known as the relaxed mode.

1.a
reason

On the assumption that the users of an implementation that does not support the Numerics Annex have no particular need for numerical performance, such an implementation has no obligation to meet any particular requirements in this area. On the other hand, users of an implementation that does support the Numerics Annex are provided with a way of ensuring that their programs achieve a known level of numerical performance and that the performance is portable to other such implementations. The relaxed mode is provided to allow implementers to offer an efficient but not fully accurate alternative in the case that the strict mode entails a time overhead that some users may find excessive. In some of its areas of impact, the relaxed mode may be fully equivalent to the strict mode.

1.b
implementation note

The relaxed mode may, for example, be used to exploit the implementation of (some of) the elementary functions in hardware, when available. Such implementations often do not meet the accuracy requirements of the strict mode, or do not meet them over the specified range of parameter values, but compensate in other ways that may be important to the user, such as their extreme speed.

1.c
ramification

For implementations supporting the Numerics Annex, the choice of mode has no effect on the selection of a representation for a real type or on the values of attributes of a real type.

Implementation Permissions

2

Either mode may be the default mode.

2.a
implementation defined

Whether the strict mode or the relaxed mode is the default.

3/5

The two modes can be one and the same.

Extensions to Ada 83

3.a

The choice between strict and relaxed numeric performance was not available in Ada 83.

G.2.1 Model of Floating Point Arithmetic

1

In the strict mode, the predefined operations of a floating point type shall satisfy the accuracy requirements specified here and shall avoid or signal overflow in the situations described. This behavior is presented in terms of a model of floating point arithmetic that builds on the concept of the canonical form (see A.5.3).

Static Semantics

2

Associated with each floating point type is an infinite set of model numbers. The model numbers of a type are used to define the accuracy requirements that have to be satisfied by certain predefined operations of the type; through certain attributes of the model numbers, they are also used to explain the meaning of a user-declared floating point type declaration. The model numbers of a derived type are those of the parent type; the model numbers of a subtype are those of its type.

3

The model numbers of a floating point type T are zero and all the values expressible in the canonical form (for the type T), in which mantissa has T'Model_Mantissa digits and exponent has a value greater than or equal to T'Model_Emin. (These attributes are defined in G.2.2.)

3.a
discussion

The model is capable of describing the behavior of most existing hardware that has a mantissa-exponent representation. As applied to a type T, it is parameterized by the values of T'Machine_Radix, T'Model_Mantissa, T'Model_Emin, T'Safe_First, and T'Safe_Last. The values of these attributes are determined by how, and how well, the hardware behaves. They in turn determine the set of model numbers and the safe range of the type, which figure in the accuracy and range (overflow avoidance) requirements.

3.b/5

In hardware that is free of arithmetic anomalies, T'Model_Mantissa, T'Model_Emin, T'Safe_First, and T'Safe_Last will yield the same values as T'Machine_Mantissa, T'Machine_Emin, T'Base'First, and T'Base'Last, respectively, and the model numbers in the safe range of the type T will coincide with the machine numbers of the type T. In less perfect hardware, it is not possible for the model-oriented attributes to have these optimal values, since the hardware, by definition, and therefore the implementation, cannot conform to the stringencies of the resulting model; in this case, the values yielded by the model-oriented parameters have to be made more conservative (that is, have to be penalized), with the result that the model numbers are more widely separated than the machine numbers, and the safe range is a subrange of the base range. The implementation will then be able to conform to the requirements of the weaker model defined by the sparser set of model numbers and the smaller safe range.

4

A model interval of a floating point type is any interval whose bounds are model numbers of the type. The model interval of a type T associated with a value v is the smallest model interval of T that includes v. (The model interval associated with a model number of a type consists of that number only.)

Implementation Requirements

5

The accuracy requirements for the evaluation of certain predefined operations of floating point types are as follows.

5.a
discussion

This subclause does not cover the accuracy of an operation of a static expression; such operations have to be evaluated exactly (see 4.9). It also does not cover the accuracy of the predefined attributes of a floating point subtype that yield a value of the type; such operations also yield exact results (see 3.5.8 and A.5.3).

6

An operand interval is the model interval, of the type specified for the operand of an operation, associated with the value of the operand.

7

For any predefined arithmetic operation that yields a result of a floating point type T, the required bounds on the result are given by a model interval of T (called the result interval) defined in terms of the operand values as follows:

8
  • The result interval is the smallest model interval of T that includes the minimum and the maximum of all the values obtained by applying the (exact) mathematical operation to values arbitrarily selected from the respective operand intervals.
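The definitions of model numbers, model intervals, operand intervals, and the result interval can be illustrated with exact rational arithmetic. The following is a minimal Python sketch over a toy model, not an Ada implementation. The parameters radix, mantissa, and emin stand in for T'Machine_Radix, T'Model_Mantissa, and T'Model_Emin, the safe range is ignored, and the function names are hypothetical. It assumes the operation is +, –, or * (or / with a divisor interval that excludes zero), so that the minimum and maximum of the exact results are attained at the endpoints of the operand intervals.

```python
from fractions import Fraction
import math
import operator


def model_interval(v, radix, mantissa, emin):
    """Smallest interval (lo, hi) with model-number bounds that contains v.

    Toy model: nonzero model numbers are sign * m * radix**e, where m has
    `mantissa` radix digits with a nonzero leading digit and e >= emin.
    The safe range is ignored, so the set of model numbers is unbounded.
    """
    v = Fraction(v)
    if v == 0:
        return Fraction(0), Fraction(0)
    if v < 0:
        lo, hi = model_interval(-v, radix, mantissa, emin)
        return -hi, -lo
    small = Fraction(radix) ** (emin - 1)  # smallest positive model number
    if v < small:
        return Fraction(0), small
    e = 0  # find e with radix**(e - 1) <= v < radix**e
    while Fraction(radix) ** e <= v:
        e += 1
    while Fraction(radix) ** (e - 1) > v:
        e -= 1
    spacing = Fraction(radix) ** (e - mantissa)  # gap between neighbors
    q = v / spacing
    return math.floor(q) * spacing, math.ceil(q) * spacing


def result_interval(op, left, right, radix, mantissa, emin):
    """Result interval of op applied to the operand intervals of left, right."""
    a = model_interval(left, radix, mantissa, emin)
    b = model_interval(right, radix, mantissa, emin)
    corners = [op(x, y) for x in a for y in b]  # extremes lie at the corners
    lo, _ = model_interval(min(corners), radix, mantissa, emin)
    _, hi = model_interval(max(corners), radix, mantissa, emin)
    return lo, hi


print(result_interval(operator.mul, Fraction(1, 3), 3, 2, 3, -1))
```

With radix 2, a 3-digit mantissa, and emin = –1, the operand value 1/3 (not a model number) has the operand interval [5/16, 3/8], and multiplying it by the model number 3 gives the result interval [7/8, 5/4]; a conforming implementation may deliver any value in that interval.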
9

The result interval of an exponentiation is obtained by applying the above rule to the sequence of multiplications defined by the exponent, assuming arbitrary association of the factors, and to the final division in the case of a negative exponent.

10

The result interval of a conversion of a numeric value to a floating point type T is the model interval of T associated with the operand value, except when the source expression is of a fixed point type with a small that is not a power of T'Machine_Radix or is a fixed point multiplication or division either of whose operands has a small that is not a power of T'Machine_Radix; in these cases, the result interval is implementation defined.

10.a
implementation defined

The result interval in certain cases of fixed-to-float conversion.

11

For any of the foregoing operations, the implementation shall deliver a value that belongs to the result interval when both bounds of the result interval are in the safe range of the result type T, as determined by the values of T'Safe_First and T'Safe_Last; otherwise,

12
  • if T'Machine_Overflows is True, the implementation shall either deliver a value that belongs to the result interval or raise Constraint_Error;
13
  • if T'Machine_Overflows is False, the result is implementation defined.
13.a
implementation defined

The result of a floating point arithmetic operation in overflow situations, when the Machine_Overflows attribute of the result type is False.

14/5

For any predefined relation on operands of a floating point type T, the implementation may deliver any value (that is, either True or False) obtained by applying the (exact) mathematical comparison to values arbitrarily chosen from the respective operand intervals.
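When each operand interval is a single model number, this rule makes the comparison exact; wider operand intervals can allow either answer. A small Python sketch, with a hypothetical helper name, that lists the results an implementation may deliver for X < Y given the two operand intervals:

```python
from fractions import Fraction


def possible_less_than(x_interval, y_interval):
    """Set of results an implementation may deliver for X < Y.

    x_interval and y_interval are the (lo, hi) operand intervals; the exact
    comparison of any point from each interval is an allowed result.
    """
    (xlo, xhi), (ylo, yhi) = x_interval, y_interval
    results = set()
    if xlo < yhi:   # some x in [xlo, xhi] lies below some y in [ylo, yhi]
        results.add(True)
    if xhi >= ylo:  # some x lies at or above some y
        results.add(False)
    return results


# Overlapping operand intervals: both True and False are permitted.
both = possible_less_than((Fraction(1, 4), Fraction(1, 2)),
                          (Fraction(3, 8), Fraction(3, 4)))
```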

15

The result of a membership test is defined in terms of comparisons of the operand value with the lower and upper bounds of the given range or type mark (the usual rules apply to these comparisons).

Implementation Permissions

16

If the underlying floating point hardware implements division as multiplication by a reciprocal, the result interval for division (and exponentiation by a negative exponent) is implementation defined.
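The permission exists because computing x / y as x · (1 / y) rounds twice. A short Python illustration, assuming Python floats are IEEE binary64 with round-to-nearest (as on common platforms):

```python
x, y = 49.0, 49.0
quotient = x / y                 # one correctly rounded division: exactly 1.0
via_reciprocal = x * (1.0 / y)   # 1/49 is rounded first, then the product
print(quotient, via_reciprocal)  # prints: 1.0 0.9999999999999999
```

The second result is the machine number just below 1.0. Since 49.0 and 1.0 are model numbers of such a type, the result interval that G.2.1 would otherwise require for x / y here is the single value 1.0.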

16.a
implementation defined

The result interval for division (or exponentiation by a negative exponent), when the floating point hardware implements division as multiplication by a reciprocal.

Wording Changes from Ada 83

16.b

The Ada 95 model numbers of a floating point type that are in the safe range of the type are comparable to the Ada 83 safe numbers of the type. There is no analog of the Ada 83 model numbers. The Ada 95 model numbers, when not restricted to the safe range, are an infinite set.

Inconsistencies With Ada 83

16.c

Giving the model numbers the hardware radix, instead of always a radix of two, allows (in conjunction with other changes) some borderline declared types to be represented with less precision than in Ada 83 (that is, with single precision, whereas Ada 83 would have used double precision). Because the lower precision satisfies the requirements of the model (and did so in Ada 83 as well), this change is viewed as a desirable correction of an anomaly, rather than a worrisome inconsistency. (Of course, the wider representation chosen in Ada 83 also remains eligible for selection in Ada 95.)

16.d/5

As an example of this phenomenon, assume that Float is represented in single precision and that a double precision type is also available. Also assume hexadecimal hardware with clean properties, for example certain IBM hardware. Then,

16.e

type T is digits Float'Digits range -Float'Last .. Float'Last;

16.f

results in T being represented in double precision in Ada 83 and in single precision in Ada 95. The latter is intuitively correct; the former is counterintuitive. The reason why the double precision type is used in Ada 83 is that Float has model and safe numbers (in Ada 83) with 21 binary digits in their mantissas, as is required to model the hypothesized hexadecimal hardware using a binary radix; thus Float'Last, which is not a model number, is slightly outside the range of safe numbers of the single precision type, making that type ineligible for selection as the representation of T even though it provides adequate precision. In Ada 95, Float'Last (the same value as before) is a model number and is in the safe range of Float on the hypothesized hardware, making Float eligible for the representation of T.

Extensions to Ada 83

16.g

Giving the model numbers the hardware radix allows for practical implementations on decimal hardware.

Wording Changes from Ada 83

16.h

The wording of the model of floating point arithmetic has been simplified to a large extent.

G.2.2 Model-Oriented Attributes of Floating Point Types

1

In implementations that support the Numerics Annex, the model-oriented attributes of floating point types shall yield the values defined here, in both the strict and the relaxed modes. These definitions add conditions to those in A.5.3.

Static Semantics

2

For every subtype S of a floating point type T:

3/2

S'Model_Mantissa
Yields the number of digits in the mantissa of the canonical form of the model numbers of T (see A.5.3). The value of this attribute shall be greater than or equal to
3.1/2

⌈d · log(10) / log(T'Machine_Radix)⌉ + g

3.2/2
where d is the requested decimal precision of T, and g is 0 if T'Machine_Radix is a positive power of 10 and 1 otherwise. In addition, T'Model_Mantissa shall be less than or equal to the value of T'Machine_Mantissa. This attribute yields a value of the type universal_integer.
3.a
ramification

S'Model_Epsilon, which is defined in terms of S'Model_Mantissa (see A.5.3), yields the absolute value of the difference between one and the next model number of the type T above one. It is equal to or larger than the absolute value of the difference between one and the next machine number of the type T above one.

4

S'Model_Emin
Yields the minimum exponent of the canonical form of the model numbers of T (see A.5.3). The value of this attribute shall be greater than or equal to the value of T'Machine_Emin. This attribute yields a value of the type universal_integer.
4.a
ramification

S'Model_Small, which is defined in terms of S'Model_Emin (see A.5.3), yields the smallest positive (nonzero) model number of the type T.

5

S'Safe_First
Yields the lower bound of the safe range of T. The value of this attribute shall be a model number of T and greater than or equal to the lower bound of the base range of T. In addition, if T is declared by a floating_point_definition or is derived from such a type, and the floating_point_definition includes a real_range_specification specifying a lower bound of lb, then the value of this attribute shall be less than or equal to lb; otherwise, it shall be less than or equal to –10.0**(4 · d), where d is the requested decimal precision of T. This attribute yields a value of the type universal_real.
6

S'Safe_Last
Yields the upper bound of the safe range of T. The value of this attribute shall be a model number of T and less than or equal to the upper bound of the base range of T. In addition, if T is declared by a floating_point_definition or is derived from such a type, and the floating_point_definition includes a real_range_specification specifying an upper bound of ub, then the value of this attribute shall be greater than or equal to ub; otherwise, it shall be greater than or equal to 10.0**(4 · d), where d is the requested decimal precision of T. This attribute yields a value of the type universal_real.
7

S'Model
Denotes a function (of a parameter X) whose specification is given in A.5.3. If X is a model number of T, the function yields X; otherwise, it yields the value obtained by rounding or truncating X to either one of the adjacent model numbers of T. Constraint_Error is raised if the resulting model number is outside the safe range of S. A zero result has the sign of X when S'Signed_Zeros is True.
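The lower bounds on T'Model_Mantissa and, in the absence of a real_range_specification, on the safe range can be evaluated exactly. The Python sketch below (helper names hypothetical) computes ⌈d · log(10) / log(T'Machine_Radix)⌉ as the smallest m with radix**m ≥ 10**d, which avoids floating point logarithms. For the safe range it assumes hardware free of arithmetic anomalies, so that T'Safe_Last is the largest machine number, radix**emax · (1 – radix**(–mantissa)).

```python
from fractions import Fraction


def min_model_mantissa(d, radix):
    """Lower bound on T'Model_Mantissa for requested decimal precision d."""
    m = 0
    while radix ** m < 10 ** d:   # smallest m with radix**m >= 10**d
        m += 1
    r = radix                     # g is 0 if radix is a positive power of 10
    while r % 10 == 0:
        r //= 10
    g = 0 if (r == 1 and radix > 1) else 1
    return m + g


def max_digits(model_mantissa, radix):
    """Largest d whose mantissa bound fits in model_mantissa radix digits."""
    d = 0
    while min_model_mantissa(d + 1, radix) <= model_mantissa:
        d += 1
    return d


def safe_last(radix, mantissa, emax):
    """Largest model number on clean hardware: radix**emax * (1 - radix**-mantissa)."""
    r = Fraction(radix)
    return r ** emax * (1 - r ** -mantissa)


def meets_default_range(d, radix, mantissa, emax):
    """True if Safe_Last >= 10.0**(4*d), the bound used without a range spec."""
    return safe_last(radix, mantissa, emax) >= Fraction(10) ** (4 * d)
```

For a binary radix, min_model_mantissa(6, 2) is 21 and min_model_mantissa(15, 2) is 51; max_digits gives 6 for a 24-bit mantissa and 15 for a 53-bit one, agreeing with the 'Digits values in the IEEE table of the ramification in this subclause.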
8

Subject to the constraints given above, the values of S'Model_Mantissa and S'Safe_Last are to be maximized, and the values of S'Model_Emin and S'Safe_First minimized, by the implementation as follows:

9
  • First, S'Model_Mantissa is set to the largest value for which values of S'Model_Emin, S'Safe_First, and S'Safe_Last can be chosen so that the implementation satisfies the strict-mode requirements of G.2.1 in terms of the model numbers and safe range induced by these attributes.
10
  • Next, S'Model_Emin is set to the smallest value for which values of S'Safe_First and S'Safe_Last can be chosen so that the implementation satisfies the strict-mode requirements of G.2.1 in terms of the model numbers and safe range induced by these attributes and the previously determined value of S'Model_Mantissa.
11/3
  • Finally, S'Safe_First and S'Safe_Last are set (in either order) to the smallest and largest values, respectively, for which the implementation satisfies the strict-mode requirements of G.2.1 in terms of the model numbers and safe range induced by these attributes and the previously determined values of S'Model_Mantissa and S'Model_Emin.
11.a/5
ramification

The following table shows appropriate attribute values for IEEE basic single and double precision types (ANSI/IEEE Std 754-1985, ISO/IEC 60559:2020). Here, we use the names IEEE_Float_32 and IEEE_Float_64, the names that would typically be declared in package Interfaces, in an implementation that supports IEEE arithmetic. In such an implementation, the attributes would typically be the same for Standard.Float and Long_Float, respectively.

11.b

Attribute            IEEE_Float_32                 IEEE_Float_64
11.c
'Machine_Radix       2                             2
'Machine_Mantissa    24                            53
'Machine_Emin        -125                          -1021
'Machine_Emax        128                           1024
'Denorm              True                          True
'Machine_Rounds      True                          True
'Machine_Overflows   True/False                    True/False
'Signed_Zeros        should be True                should be True
11.d
'Model_Mantissa      (same as 'Machine_Mantissa)   (same as 'Machine_Mantissa)
'Model_Emin          (same as 'Machine_Emin)       (same as 'Machine_Emin)
'Model_Epsilon       2.0**(-23)                    2.0**(-52)
'Model_Small         2.0**(-126)                   2.0**(-1022)
'Safe_First          -2.0**128*(1.0-2.0**(-24))    -2.0**1024*(1.0-2.0**(-53))
'Safe_Last           2.0**128*(1.0-2.0**(-24))     2.0**1024*(1.0-2.0**(-53))
11.e
'Digits              6                             15
'Base'Digits         (same as 'Digits)             (same as 'Digits)
11.f
'First               (same as 'Safe_First)         (same as 'Safe_First)
'Last                (same as 'Safe_Last)          (same as 'Safe_Last)
'Size                32                            64

11.g

Note: 'Machine_Overflows can be True or False, depending on whether the Ada implementation raises Constraint_Error or delivers a signed infinity in overflow and zerodivide situations (and at poles of the elementary functions).
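The IEEE_Float_64 column can be cross-checked against Python's sys.float_info, assuming (as on common platforms) that Python floats are IEEE binary64. Its fields mirror C's float.h, which, like the canonical form of A.5.3, normalizes the mantissa to [1/radix, 1), so the exponents can be compared directly. The pairing of each field with an Ada attribute in the comments is an interpretation, not something the Python documentation states.

```python
import sys
from fractions import Fraction

info = sys.float_info                # parameters of the C double behind float

assert info.radix == 2               # 'Machine_Radix
assert info.mant_dig == 53           # 'Machine_Mantissa
assert info.min_exp == -1021         # 'Machine_Emin
assert info.max_exp == 1024          # 'Machine_Emax
assert info.dig == 15                # 'Digits
assert info.epsilon == 2.0 ** -52    # 'Model_Epsilon
assert info.min == 2.0 ** -1022      # 'Model_Small (smallest normalized value)
# 'Safe_Last = 2.0**1024*(1.0-2.0**(-53)); computed exactly with Fraction,
# because 2.0**1024 by itself overflows a double.
assert Fraction(info.max) == Fraction(2) ** 1024 * (1 - Fraction(1, 2 ** 53))
```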

Wording Changes from Ada 95

11.h/2

Corrected the definition of Model_Mantissa to match that given in 3.5.8.

G.2.3 Model of Fixed Point Arithmetic

1

In the strict mode, the predefined arithmetic operations of a fixed point type shall satisfy the accuracy requirements specified here and shall avoid or signal overflow in the situations described.

Implementation Requirements

2

The accuracy requirements for the predefined fixed point arithmetic operations and conversions, and the results of relations on fixed point operands, are given below.

2.a
discussion

This subclause does not cover the accuracy of an operation of a static expression; such operations have to be evaluated exactly (see 4.9).

3

The operands of the fixed point adding operators, absolute value, and comparisons have the same type. These operations are required to yield exact results, unless they overflow.

4

Multiplications and divisions are allowed between operands of any two fixed point types; the result has to be (implicitly or explicitly) converted to some other numeric type. For purposes of defining the accuracy rules, the multiplication or division and the conversion are treated as a single operation whose accuracy depends on three types (those of the operands and the result). For decimal fixed point types, the attribute T'Round may be used to imply explicit conversion with rounding (see 3.5.10).

5

When the result type is a floating point type, the accuracy is as given in G.2.1. For some combinations of the operand and result types in the remaining cases, the result is required to belong to a small set of values called the perfect result set; for other combinations, it is required merely to belong to a generally larger and implementation-defined set of values called the close result set. When the result type is a decimal fixed point type, the perfect result set contains a single value; thus, operations on decimal types are always fully specified.

5.a
implementation defined

The definition of close result set, which determines the accuracy of certain fixed point multiplications and divisions.

6/5

When one operand of a fixed-fixed multiplication or division is of type universal_real, that operand is not implicitly converted in the usual sense, since the context does not determine a unique target type, but the accuracy of the result of the multiplication or division (that is, whether the result has to belong to the perfect result set or merely the close result set) depends on the value of the operand of type universal_real and on the types of the other operand and of the result.

6.a/5
discussion

We need not consider here the multiplication or division of two such operands, since in that case either the operation is evaluated exactly (that is, it is an operation of a static expression all of whose operators are of a root numeric type) or it is considered to be an operation of a floating point type.

7

For a fixed point multiplication or division whose (exact) mathematical result is v, and for the conversion of a value v to a fixed point type, the perfect result set and close result set are defined as follows:

8
  • If the result type is an ordinary fixed point type with a small of s,
9
      • if v is an integer multiple of s, then the perfect result set contains only the value v;
10
      • otherwise, it contains the integer multiple of s just below v and the integer multiple of s just above v.
11
      • The close result set is an implementation-defined set of consecutive integer multiples of s containing the perfect result set as a subset.
12
  • If the result type is a decimal type with a small of s,
13
      • if v is an integer multiple of s, then the perfect result set contains only the value v;
14/3
      • otherwise, if truncation applies, then it contains only the integer multiple of s in the direction toward zero, whereas if rounding applies, then it contains only the nearest integer multiple of s (with ties broken by rounding away from zero).
15
      • The close result set is an implementation-defined set of consecutive integer multiples of s containing the perfect result set as a subset.
15.a
ramification

As a consequence of subsequent rules, this case does not arise when the operand types are also decimal types.

16
  • If the result type is an integer type,
17
      • if v is an integer, then the perfect result set contains only the value v;
18
      • otherwise, it contains the integer nearest to the value v (if v lies equally distant from two consecutive integers, the perfect result set contains the one that is further from zero).
19
      • The close result set is an implementation-defined set of consecutive integers containing the perfect result set as a subset.
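The perfect result sets defined above can be computed exactly from v and the small. A Python sketch with hypothetical helper names; because the close result set is implementation defined, it is not modeled.

```python
from fractions import Fraction
import math


def round_half_away(q):
    """Nearest integer to q, with ties broken away from zero."""
    n = math.floor(abs(q) + Fraction(1, 2))
    return n if q >= 0 else -n


def perfect_result_set(v, kind, small=1, rounding=False):
    """Perfect result set for the exact mathematical result v.

    kind is "ordinary" or "decimal" (fixed point type with the given small)
    or "integer" (the small is then 1); rounding selects rounding instead of
    truncation for a decimal result type.
    """
    v, s = Fraction(v), Fraction(small)
    if kind == "integer":
        s = Fraction(1)
    q = v / s
    if q.denominator == 1:               # v is an integer multiple of s
        return {v}
    if kind == "ordinary":               # both neighboring multiples of s
        return {math.floor(q) * s, math.ceil(q) * s}
    if kind == "decimal" and not rounding:
        return {math.trunc(q) * s}       # truncation: toward zero
    return {round_half_away(q) * s}      # decimal rounding, or integer result
```

For example, with a decimal small of 0.01 the exact value –0.005 yields {0.0} under truncation and {–0.01} under rounding.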
20/5

The result of a fixed point multiplication or division shall belong either to the perfect result set or to the close result set, as described below, if overflow does not occur. In the following cases, if the result type is a fixed point type, let s be its small; otherwise, that is when the result type is an integer type, let s be 1.0.

21/5
  • For a multiplication or division neither of whose operands is of type universal_real, let l and r be the smalls of the left and right operands. For a multiplication, if (l · r) / s is an integer or the reciprocal of an integer (the smalls are said to be “compatible” in this case), the result shall belong to the perfect result set; otherwise, it belongs to the close result set. For a division, if l / (r · s) is an integer or the reciprocal of an integer (that is, the smalls are compatible), the result shall belong to the perfect result set; otherwise, it belongs to the close result set.
21.a
ramification

When the operand and result types are all decimal types, their smalls are necessarily compatible; the same is true when they are all ordinary fixed point types with binary smalls.

22
  • For a multiplication or division having one universal_real operand with a value of v, note that it is always possible to factor v as an integer multiple of a “compatible” small, but the integer multiple may be “too big”. If there exists a factorization in which that multiple is less than some implementation-defined limit, the result shall belong to the perfect result set; otherwise, it belongs to the close result set.
22.a
implementation defined

Conditions on a universal_real operand of a fixed point multiplication or division for which the result shall be in the perfect result set.
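The compatibility test in the first of these cases reduces to checking whether a positive rational number is an integer or the reciprocal of one. A Python sketch with hypothetical names, where l, r, and s are the smalls as defined above (s is 1 when the result type is an integer type):

```python
from fractions import Fraction


def is_integer_or_reciprocal(x):
    """True if the positive rational x is an integer or 1/n for an integer n."""
    return x.denominator == 1 or x.numerator == 1


def smalls_compatible(op, l, r, s):
    """Whether a fixed point * or / must deliver the perfect result set."""
    l, r, s = Fraction(l), Fraction(r), Fraction(s)
    if op == "*":
        return is_integer_or_reciprocal(l * r / s)
    if op == "/":
        return is_integer_or_reciprocal(l / (r * s))
    raise ValueError("op must be '*' or '/'")
```

Decimal smalls such as 0.01 and binary smalls such as 2.0**(–4) always pass, as 21.a notes, while for example smalls of 0.1, 0.1, and 0.003 for a multiplication give (l · r) / s = 10/3 and so only require the close result set.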

23/5

A multiplication P * Q of an operand of a fixed point type F by an operand of type Integer, or vice versa, and a division P / Q of an operand of a fixed point type F by an operand of type Integer, are also allowed. In these cases, the result has the type of F; explicit conversion of the result is never required. The accuracy required in these cases is the same as that required for a multiplication F(P * Q) or a division F(P / Q) obtained by interpreting the operand of the integer type to have a fixed point type with a small of 1.0.

24

The accuracy of the result of a conversion from an integer or fixed point type to a fixed point type, or from a fixed point type to an integer type, is the same as that of a fixed point multiplication of the source value by a fixed point operand having a small of 1.0 and a value of 1.0, as given by the foregoing rules. The result of a conversion from a floating point type to a fixed point type shall belong to the close result set. The result of a conversion of a universal_real operand to a fixed point type shall belong to the perfect result set.

25

The possibility of overflow in the result of a predefined arithmetic operation or conversion yielding a result of a fixed point type T is analogous to that for floating point types, except for being related to the base range instead of the safe range. If all of the permitted results belong to the base range of T, then the implementation shall deliver one of the permitted results; otherwise,

26
  • if T'Machine_Overflows is True, the implementation shall either deliver one of the permitted results or raise Constraint_Error;
27
  • if T'Machine_Overflows is False, the result is implementation defined.
27.a
implementation defined

The result of a fixed point arithmetic operation in overflow situations, when the Machine_Overflows attribute of the result type is False.

Inconsistencies With Ada 83

27.b

Since the values of a fixed point type are now just the integer multiples of its small, the possibility of using extra bits available in the chosen representation for extra accuracy rather than for increasing the base range would appear to be removed, raising the possibility that some fixed point expressions will yield less accurate results than in Ada 83. However, this is partially offset by the ability of an implementation to choose a smaller default small than before. Of course, if it does so for a type T then T'Small will have a different value than it previously had.

27.c

The accuracy requirements in the case of incompatible smalls are relaxed to foster wider support for nonbinary smalls. If this relaxation is exploited for a type that was previously supported, lower accuracy could result; however, there is no particular incentive to exploit the relaxation in such a case.

Wording Changes from Ada 83

27.d

The fixed point accuracy requirements are now expressed without reference to model or safe numbers, largely because the full generality of the former model was never exploited in the case of fixed point types (particularly in regard to operand perturbation). Although the new formulation in terms of perfect result sets and close result sets is still verbose, it can be seen to distill down to two cases:

27.e
  • a case where the result must be the exact result, if the exact result is representable, or, if not, then either one of the adjacent values of the type (in some subcases only one of those adjacent values is allowed);
27.f
  • a case where the accuracy is not specified by the language.

Wording Changes from Ada 2012

27.g/5
correction

Reworded the fixed*integer accuracy requirements to clarify that the only allowed integer type in such operations is Standard.Integer. We make this correction as readers of the Reference Manual have been confused on this point.

G.2.4 Accuracy Requirements for the Elementary Functions

1

In the strict mode, the performance of Numerics.Generic_Elementary_Functions shall be as specified here.

Implementation Requirements

2

When an exception is not raised, the result of evaluating a function in an instance EF of Numerics.Generic_Elementary_Functions belongs to a result interval, defined as the smallest model interval of EF.Float_Type that contains all the values of the form f · (1.0 + d), where f is the exact value of the corresponding mathematical function at the given parameter values, d is a real number, and |d| is less than or equal to the function's maximum relative error. The function delivers a value that belongs to the result interval when both of its bounds belong to the safe range of EF.Float_Type; otherwise,

3
  • if EF.Float_Type'Machine_Overflows is True, the function either delivers a value that belongs to the result interval or raises Constraint_Error, signaling overflow;
4
  • if EF.Float_Type'Machine_Overflows is False, the result is implementation defined.
4.a
implementation defined

The result of an elementary function reference in overflow situations, when the Machine_Overflows attribute of the result type is False.

5

The maximum relative error exhibited by each function is as follows:

6
  • 2.0 · EF.Float_Type'Model_Epsilon, in the case of the Sqrt, Sin, and Cos functions;
7
  • 4.0 · EF.Float_Type'Model_Epsilon, in the case of the Log, Exp, Tan, Cot, and inverse trigonometric functions; and
8
  • 8.0 · EF.Float_Type'Model_Epsilon, in the case of the forward and inverse hyperbolic functions.
9

The maximum relative error exhibited by the exponentiation operator, which depends on the values of the operands, is (4.0 + |Right · log(Left)| / 32.0) · EF.Float_Type'Model_Epsilon.
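For a concrete type these bounds are easy to tabulate. A Python sketch assuming an IEEE binary64 Float_Type, whose Model_Epsilon is 2.0**(–52) per the table in G.2.2, and taking log in the exponentiation bound to be the natural logarithm; Python's math.sqrt stands in for an Ada Sqrt, and the helper names are hypothetical. The check squares both sides so that it can be done exactly, and it ignores the widening to a model interval, so it is slightly stricter than the requirement.

```python
import math
from fractions import Fraction

EPS = 2.0 ** -52   # Model_Epsilon of an IEEE binary64 type (assumption)

# Maximum relative error as a multiple of Model_Epsilon (paragraphs 6 to 8).
FACTOR = {"Sqrt": 2.0, "Sin": 2.0, "Cos": 2.0,
          "Log": 4.0, "Exp": 4.0, "Tan": 4.0, "Cot": 4.0,
          "Arcsin": 4.0, "Arccos": 4.0, "Arctan": 4.0, "Arccot": 4.0,
          "Sinh": 8.0, "Cosh": 8.0, "Tanh": 8.0, "Coth": 8.0,
          "Arcsinh": 8.0, "Arccosh": 8.0, "Arctanh": 8.0, "Arccoth": 8.0}


def max_rel_error(name):
    return FACTOR[name] * EPS


def pow_max_rel_error(left, right):
    """Bound for Left ** Right: (4.0 + |Right * log(Left)| / 32.0) * eps."""
    return (4.0 + abs(right * math.log(left)) / 32.0) * EPS


def sqrt_within_bound(x, result):
    """True if result lies in [f*(1-d), f*(1+d)] with f = sqrt(x) exactly.

    Valid for x > 0 and result > 0: squaring keeps the test in exact
    rational arithmetic even though sqrt(x) itself may be irrational.
    """
    d = Fraction(max_rel_error("Sqrt"))
    r, fx = Fraction(result), Fraction(x)
    return fx * (1 - d) ** 2 <= r * r <= fx * (1 + d) ** 2
```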

10

The maximum relative error given above applies throughout the domain of the forward trigonometric functions when the Cycle parameter is specified. When the Cycle parameter is omitted, the maximum relative error given above applies only when the absolute value of the angle parameter X is less than or equal to some implementation-defined angle threshold, which shall be at least EF.Float_Type'Machine_Radix**⌊EF.Float_Type'Machine_Mantissa/2⌋. Beyond the angle threshold, the accuracy of the forward trigonometric functions is implementation defined.
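As a worked example, the minimum threshold is 2**12 = 4096 for a binary type with a 24-bit mantissa and 2**26 = 67108864 for a 53-bit one. The Python sketch below (hypothetical names; it assumes IEEE binary64 floats and uses a 50-digit decimal value of π as a stand-in for the exact value) also shows why the threshold bears on argument reduction: reducing x modulo the rounded constant 2 * math.pi instead of 2π accumulates an error proportional to the number of whole cycles removed.

```python
from fractions import Fraction
import math

# pi to 50 decimal places; close enough to act as the exact value here.
PI_REF = Fraction("3.14159265358979323846264338327950288419716939937510")


def min_angle_threshold(machine_radix, machine_mantissa):
    """Smallest threshold paragraph 10 allows: radix ** floor(mantissa / 2)."""
    return machine_radix ** (machine_mantissa // 2)


def naive_reduction_error(x):
    """Absolute error of reducing x >= 0 modulo the double 2 * math.pi.

    Both reductions are carried out exactly in rational arithmetic, so the
    result isolates the error caused by the inexact constant alone.
    """
    two_pi = 2 * PI_REF
    c = Fraction(2 * math.pi)                 # the rounded constant, exactly
    xf = Fraction(x)
    exact = xf - math.floor(xf / two_pi) * two_pi
    naive = xf - math.floor(xf / c) * c
    return float(abs(naive - exact))
```

At x = 2**26 the error in the reduced angle is about 2.6e-9, whereas a relative error of at most 2.0 · Model_Epsilon (about 4.4e-16) is required, so an implementation has to carry 2π to considerably more precision than a single machine number in order to meet the bound up to the threshold.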

10.a
implementation defined

The value of the angle threshold, within which certain elementary functions, complex arithmetic operations, and complex elementary functions yield results conforming to a maximum relative error bound.

10.b
implementation defined

The accuracy of certain elementary functions for parameters beyond the angle threshold.

10.c
implementation note

The angle threshold indirectly determines the amount of precision that the implementation has to maintain during argument reduction.

11/5

The prescribed results specified in A.5.1 for certain functions at particular parameter values take precedence over the maximum relative error bounds; effectively, they narrow to a single value the result interval allowed by the maximum relative error bounds. Additional rules with a similar effect are given by Table G.1 for the inverse trigonometric functions, at particular parameter values for which the mathematical result is possibly not a model number of EF.Float_Type (or is, indeed, even transcendental). In each table entry, the values of the parameters are such that the result lies on the axis between two quadrants; the corresponding accuracy rule, which takes precedence over the maximum relative error bounds, is that the result interval is the model interval of EF.Float_Type associated with the exact mathematical result given in the table.

12/1

This paragraph was deleted.

13

The last line of the table is meant to apply when EF.Float_Type'Signed_Zeros is False; the two lines just above it, when EF.Float_Type'Signed_Zeros is True and the parameter Y has a zero value with the indicated sign.

Table G.1: Tightly Approximated Elementary Function Results

Function            Value of X   Value of Y   Exact Result when   Exact Result when
                                              Cycle Specified     Cycle Omitted
Arcsin              1.0          n.a.         Cycle/4.0           π/2.0
Arcsin              –1.0         n.a.         –Cycle/4.0          –π/2.0
Arccos              0.0          n.a.         Cycle/4.0           π/2.0
Arccos              –1.0         n.a.         Cycle/2.0           π
Arctan and Arccot   0.0          positive     Cycle/4.0           π/2.0
Arctan and Arccot   0.0          negative     –Cycle/4.0          –π/2.0
Arctan and Arccot   negative     +0.0         Cycle/2.0           π
Arctan and Arccot   negative     –0.0         –Cycle/2.0          –π
Arctan and Arccot   negative     0.0          Cycle/2.0           π

14

The amount by which the result of an inverse trigonometric function is allowed to spill over into a quadrant adjacent to the one corresponding to the principal branch, as given in A.5.1, is limited. The rule is that the result belongs to the smallest model interval of EF.Float_Type that contains both boundaries of the quadrant corresponding to the principal branch. This rule also takes precedence over the maximum relative error bounds, effectively narrowing the result interval allowed by them.
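Several entries of Table G.1 (the Cycle-omitted column) can be compared with Python's math functions. These are thin wrappers over the platform C library rather than an instance of Generic_Elementary_Functions, so this is only an illustration. Because π/2 and π are irrational, each required model interval in the binary64 case consists of the two adjacent machine numbers; the sketch relies on math.pi lying just below π and needs Python 3.9 or later for math.nextafter. Python's atan2(y, x) takes its arguments in the same order as Arctan (Y, X).

```python
import math


def interval_around(k):
    """Binary64 model interval (lo, hi) around the irrational value k*pi.

    k is +-0.5 or +-1.  math.pi is the double just below pi, so k*math.pi is
    the neighbor of k*pi on the side of zero; the other neighbor is one step
    farther from zero.
    """
    c = k * math.pi
    other = math.nextafter(c, math.copysign(math.inf, k))
    return min(c, other), max(c, other)


# (label, computed value, exact result as a multiple of pi), from Table G.1.
CASES = [
    ("Arcsin (1.0)", math.asin(1.0), 0.5),
    ("Arcsin (-1.0)", math.asin(-1.0), -0.5),
    ("Arccos (0.0)", math.acos(0.0), 0.5),
    ("Arccos (-1.0)", math.acos(-1.0), 1.0),
    ("Arctan (Y => 1.0, X => 0.0)", math.atan2(1.0, 0.0), 0.5),
    ("Arctan (Y => -1.0, X => 0.0)", math.atan2(-1.0, 0.0), -0.5),
    ("Arctan (Y => +0.0, X => -1.0)", math.atan2(0.0, -1.0), 1.0),
    ("Arctan (Y => -0.0, X => -1.0)", math.atan2(-0.0, -1.0), -1.0),
]


def table_violations():
    """Labels of cases whose value lies outside the required model interval."""
    bad = []
    for label, value, k in CASES:
        lo, hi = interval_around(k)
        if not lo <= value <= hi:
            bad.append(label)
    return bad
```

On a typical IEEE platform table_violations() returns an empty list; Python itself does not guarantee this accuracy.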

15

Finally, the following specifications also take precedence over the maximum relative error bounds:

16
  • The absolute value of the result of the Sin, Cos, and Tanh functions never exceeds one.
17
  • The absolute value of the result of the Coth function is never less than one.
18
  • The result of the Cosh function is never less than one.
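A deliberately crude approximation shows how these rules narrow a result. In this hypothetical Python sketch, `taylor5_sin` is a degree-5 Taylor polynomial, chosen only because it overshoots: near π/2 it returns about 1.0045. The helper `strict_sin` then clamps the value back into the permitted range. Corresponding lower bounds would be applied to Coth and Cosh in the same way. None of these helpers describes how any particular implementation computes Sin.

```python
import math


def taylor5_sin(x: float) -> float:
    """Degree-5 Taylor polynomial for sin; accurate only near zero."""
    return x - x**3 / 6.0 + x**5 / 120.0


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed range low .. high."""
    return max(low, min(high, value))


def strict_sin(x: float) -> float:
    # The absolute value of Sin(X) must never exceed one, whatever the
    # underlying approximation returns.
    return clamp(taylor5_sin(x), -1.0, 1.0)
```

Clamping only narrows the result interval. It does not make the polynomial meet the relative error bounds away from zero.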

Implementation Advice

19

The versions of the forward trigonometric functions without a Cycle parameter should not be implemented by calling the corresponding version with a Cycle parameter of 2.0*Numerics.Pi, since this will not provide the required accuracy in some portions of the domain. For the same reason, the version of Log without a Base parameter should not be implemented by calling the corresponding version with a Base parameter of Numerics.e.

19.a.1/2
implementation advice

For elementary functions, the forward trigonometric functions without a Cycle parameter should not be implemented by calling the corresponding version with a Cycle parameter. Log without a Base parameter should not be implemented by calling Log with a Base parameter.
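To see why this advice matters, consider a hypothetical Sin with a Cycle parameter. Suppose it first reduces X exactly modulo Cycle (as `math.fmod` does for IEEE doubles) and then scales the remainder to radians. Passing the floating point value 2.0*Pi as Cycle reduces X modulo a period that falls short of the true 2π by about 2.4e-16, and this discrepancy is multiplied by the number of whole periods removed. In the Python sketch below, `sin_with_cycle` and `worst_discrepancy` are illustrative assumptions. `math.sin` serves as the reference, on the assumption that the platform's libm performs accurate argument reduction.

```python
import math

# The floating point value of 2.0*Pi; it is slightly smaller than the true 2*Pi.
TWO_PI = 2.0 * math.pi


def sin_with_cycle(x: float, cycle: float) -> float:
    """Hypothetical Sin (X, Cycle): exact reduction modulo Cycle, then scaling."""
    remainder = math.fmod(x, cycle)  # exact for IEEE doubles
    return math.sin(remainder * (TWO_PI / cycle))


def worst_discrepancy(xs) -> float:
    """Largest difference between Sin (X, 2.0*Pi) and the reference math.sin."""
    return max(abs(sin_with_cycle(x, TWO_PI) - math.sin(x)) for x in xs)
```

For arguments below 2.0*Pi the two results agree exactly, because the reduction step changes nothing. Near 1.0e15, however, about 1.6e14 periods are removed, so the reduced argument is off by roughly 0.04 radian.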

Wording Changes from Ada 83

19.a

The semantics of Numerics.Generic_Elementary_Functions differs from Generic_Elementary_Functions as defined in ISO/IEC DIS 11430 (for Ada 83) in the following ways related to the accuracy specified for strict mode:

19.b
  • The maximum relative error bounds use the Model_Epsilon attribute instead of the Base'Epsilon attribute.
19.c
  • The accuracy requirements are expressed in terms of result intervals that are model intervals. On the one hand, this facilitates the description of the required results in the presence of underflow; on the other hand, it slightly relaxes the requirements expressed in ISO/IEC DIS 11430.

G.2.5 Performance Requirements for Random Number Generation

1

In the strict mode, the performance of Numerics.Float_Random and Numerics.Discrete_Random shall be as specified here.

Implementation Requirements

2

Two different calls to the time-dependent Reset procedure shall reset the generator to different states, provided that the calls are separated in time by at least one second and not more than fifty years.

3

The implementation's representations of generator states and its algorithms for generating random numbers shall yield a period of at least $2^{31} - 2$; much longer periods are desirable but not required.
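One classical generator that just meets the period requirement is the Lehmer "minimal standard" generator of Park and Miller, $x_{n+1} = 16807\,x_n \bmod (2^{31} - 1)$. Because $2^{31} - 1$ is prime and 16807 is a primitive root modulo it, every nonzero seed lies on a single cycle of length exactly $2^{31} - 2$. Rather than iterating two billion steps, the Python sketch below checks the primitive-root property from the factorization $2^{31} - 2 = 2 \cdot 3^2 \cdot 7 \cdot 11 \cdot 31 \cdot 151 \cdot 331$.

The sketch also shows one hypothetical way to derive a time-dependent state for Reset. Fifty years contain fewer than $2^{31} - 2$ seconds, so mapping whole seconds onto the cycle gives distinct states for any two calls at least one second and at most fifty years apart. The names used here are illustrative. Nothing in the sketch describes how any particular Ada implementation represents its generator state.

```python
M = 2**31 - 1   # Mersenne prime modulus
A = 16807       # Park-Miller multiplier (7**5)
PRIME_FACTORS_OF_M_MINUS_1 = (2, 3, 7, 11, 31, 151, 331)


def next_state(state: int) -> int:
    """Advance the Lehmer generator; valid states are 1 .. M-1."""
    return (A * state) % M


def to_unit_float(state: int) -> float:
    """Map a state to a float strictly between 0.0 and 1.0."""
    return state / M


def is_primitive_root(a: int, p: int, prime_factors) -> bool:
    """True if a has multiplicative order p-1 modulo the prime p."""
    return all(pow(a, (p - 1) // q, p) != 1 for q in prime_factors)


def brute_force_period(a: int, p: int, seed: int = 1) -> int:
    """Count steps until the seed recurs (practical only for small p)."""
    state, steps = (a * seed) % p, 1
    while state != seed:
        state, steps = (a * state) % p, steps + 1
    return steps


def state_from_time(seconds: int) -> int:
    """Hypothetical time-dependent Reset: whole seconds -> state in 1 .. M-1."""
    return seconds % (M - 1) + 1


FIFTY_YEARS = 50 * 366 * 24 * 60 * 60  # generous upper bound, in seconds
```

On a small prime the brute-force count agrees with the primitive-root test: 2 generates all 10 nonzero residues modulo 11, while 3 cycles after only 5 steps.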

4/5

The implementations of Numerics.Float_Random.Random and Numerics.Discrete_Random.Random shall pass at least 85% of the individual trials in a suite of statistical tests. For Numerics.Float_Random, the tests are applied directly to the floating point values generated (that is, they are not converted to integers first), while for Numerics.Discrete_Random they are applied to the generated values of various discrete types. Each test suite performs 6 different tests, with each test repeated 10 times, yielding a total of 60 individual trials. An individual trial is deemed to pass if the chi-square value (or other statistic) calculated for the observed counts or distribution falls within the range of values corresponding to the 2.5 and 97.5 percentage points for the relevant degrees of freedom (that is, it shall be neither too high nor too low). For the purpose of determining the degrees of freedom, measurement categories are combined whenever the expected counts are fewer than 5.
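The pass criterion can be sketched in Python as follows. Two parts are assumptions rather than requirements of this clause. First, categories are merged left to right, with any small remainder folded into the last merged category. Second, the 2.5 and 97.5 percentage points are approximated with the Wilson-Hilferty transformation (using `statistics.NormalDist` from the standard library) instead of exact chi-square quantiles. All function names are hypothetical.

```python
from statistics import NormalDist


def merge_small_categories(observed, expected, minimum=5.0):
    """Combine adjacent categories until each expected count is >= minimum."""
    merged_obs, merged_exp = [], []
    obs_acc = exp_acc = 0.0
    for obs, exp in zip(observed, expected):
        obs_acc += obs
        exp_acc += exp
        if exp_acc >= minimum:
            merged_obs.append(obs_acc)
            merged_exp.append(exp_acc)
            obs_acc = exp_acc = 0.0
    if obs_acc or exp_acc:
        if merged_exp:  # fold a small tail into the last category
            merged_obs[-1] += obs_acc
            merged_exp[-1] += exp_acc
        else:
            merged_obs.append(obs_acc)
            merged_exp.append(exp_acc)
    return merged_obs, merged_exp


def chi_square(observed, expected) -> float:
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_point(p: float, dof: int) -> float:
    """Wilson-Hilferty approximation to the p-quantile of chi-square(dof)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def trial_passes(observed, expected) -> bool:
    """A trial passes if the statistic is neither too low nor too high."""
    obs, exp = merge_small_categories(observed, expected)
    dof = len(exp) - 1
    if dof < 1:
        raise ValueError("too few categories left after merging")
    statistic = chi_square(obs, exp)
    return chi_square_point(0.025, dof) <= statistic <= chi_square_point(0.975, dof)


def suite_passes(trial_results) -> bool:
    """At least 85% of the individual trials (51 of 60) must pass."""
    return 100 * sum(trial_results) >= 85 * len(trial_results)
```

Observed counts that match the expected counts exactly give a statistic of zero, so that trial fails: the output looks too regular to be random.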

4.a
implementation note

In the floating point random number test suite, the generator is reset to a time-dependent state at the beginning of the run. The test suite incorporates the following tests, adapted from D. E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical Algorithms. In the descriptions below, the given number of degrees of freedom is the number before reduction due to any necessary combination of measurement categories with small expected counts; it is one less than the number of measurement categories.

4.b
  • Proportional Distribution Test (a variant of the Equidistribution Test). The interval 0.0 .. 1.0 is partitioned into K subintervals. K is chosen randomly between 4 and 25 for each repetition of the test, along with the boundaries of the subintervals (subject to the constraint that at least 2 of the subintervals have a width of 0.001 or more). 5000 random floating point numbers are generated. The counts of random numbers falling into each subinterval are tallied and compared with the expected counts, which are proportional to the widths of the subintervals. The number of degrees of freedom for the chi-square test is K–1.
4.c
  • Gap Test. The bounds of a range A .. B, with 0.0 ≤ A < B ≤ 1.0, are chosen randomly for each repetition of the test, subject to the constraint that $0.2 \le B - A \le 0.6$. Random floating point numbers are generated until 5000 falling into the range A .. B have been encountered. Each of these 5000 is preceded by a “gap” (of length greater than or equal to 0) of consecutive random numbers not falling into the range A .. B. The counts of gaps of each length from 0 to 15, and of all lengths greater than 15 lumped together, are tallied and compared with the expected counts. Let $P = B - A$. The probability that a gap has a length of L is $(1-P)^L \cdot P$ for L ≤ 15, while the probability that a gap has a length of 16 or more is $(1-P)^{16}$. The number of degrees of freedom for the chi-square test is 16.
4.d
  • Permutation Test. 5000 tuples of 4 different random floating point numbers are generated. (An entire 4-tuple is discarded in the unlikely event that it contains any two exactly equal components.) The counts of each of the 4! = 24 possible relative orderings of the components of the 4-tuples are tallied and compared with the expected counts. Each of the possible relative orderings has an equal probability. The number of degrees of freedom for the chi-square test is 23.
4.e
  • Increasing-Runs Test. Random floating point numbers are generated until 5000 increasing runs have been observed. An “increasing run” is a sequence of random numbers in strictly increasing order; it is followed by a random number that is strictly smaller than the preceding random number. (A run under construction is entirely discarded in the unlikely event that one random number is followed immediately by an exactly equal random number.) The decreasing random number that follows an increasing run is discarded and not included with the next increasing run. The counts of increasing runs of each length from 1 to 4, and of all lengths greater than 4 lumped together, are tallied and compared with the expected counts. The probability that an increasing run has a length of L is 1/L! – 1/(L+1)! for L ≤ 4, while the probability that an increasing run has a length of 5 or more is 1/5!. The number of degrees of freedom for the chi-square test is 4.
4.f
  • Decreasing-Runs Test. The test is similar to the Increasing-Runs Test, but with decreasing runs.
4.g
  • Maximum-of-t Test (with t = 5). 5000 tuples of 5 random floating point numbers are generated. The maximum of the components of each 5-tuple is determined and raised to the 5th power. The uniformity of the resulting values over the range 0.0 .. 1.0 is tested as in the Proportional Distribution Test.
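The expected counts for the Gap Test and the Increasing-Runs Test follow directly from the stated probabilities. The Python sketch below tallies gaps as the Gap Test describes and computes the category probabilities for both tests. Multiplying each probability by 5000 gives the expected counts fed to the chi-square comparison. The helper names are hypothetical.

```python
from math import factorial


def gap_probabilities(p: float, longest: int = 15):
    """P(gap = L) for L = 0 .. longest, then P(gap > longest)."""
    probs = [(1.0 - p) ** length * p for length in range(longest + 1)]
    probs.append((1.0 - p) ** (longest + 1))
    return probs


def count_gaps(stream, a: float, b: float, hits: int, longest: int = 15):
    """Tally gap lengths before each of `hits` values falling in A .. B."""
    counts = [0] * (longest + 2)  # last slot lumps lengths > longest
    gap = seen = 0
    for u in stream:
        if a <= u <= b:
            counts[min(gap, longest + 1)] += 1
            gap, seen = 0, seen + 1
            if seen == hits:
                break
        else:
            gap += 1
    return counts


def increasing_run_probabilities(longest: int = 4):
    """P(run = L) = 1/L! - 1/(L+1)! for L = 1 .. longest, then 1/(longest+1)!."""
    probs = [1.0 / factorial(n) - 1.0 / factorial(n + 1)
             for n in range(1, longest + 1)]
    probs.append(1.0 / factorial(longest + 1))
    return probs
```

In both tests the probabilities telescope, so they sum to one. This is a quick sanity check on any hand-coded table of expected counts.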
4.h
implementation note

In the discrete random number test suite, Numerics.Discrete_Random is instantiated as described below. The generator is reset to a time-dependent state after each instantiation. The test suite incorporates the following tests, adapted from D. E. Knuth (op. cit.) and other sources. The given number of degrees of freedom for the chi-square test is reduced by any necessary combination of measurement categories with small expected counts, as described above.

4.i
  • Equidistribution Test. In each repetition of the test, a number R between 2 and 30 is chosen randomly, and Numerics.Discrete_Random is instantiated with an integer subtype whose range is 1 .. R. 5000 integers are generated randomly from this range. The counts of occurrences of each integer in the range are tallied and compared with the expected counts, which have equal probabilities. The number of degrees of freedom for the chi-square test is R–1.
4.j
  • Simplified Poker Test. Numerics.Discrete_Random is instantiated once with an enumeration subtype representing the 13 denominations (Two through Ten, Jack, Queen, King, and Ace) of an infinite deck of playing cards. 2000 “poker” hands (5-tuples of values of this subtype) are generated randomly. The counts of hands containing exactly K different denominations (1 ≤ K ≤ 5) are tallied and compared with the expected counts. The probability that a hand contains exactly K different denominations is given by a formula in Knuth. The number of degrees of freedom for the chi-square test is 4.
4.k
  • Coupon Collector's Test. Numerics.Discrete_Random is instantiated in each repetition of the test with an integer subtype whose range is 1 .. R, where R varies systematically from 2 to 11. Integers are generated randomly from this range until each value in the range has occurred, and the number K of integers generated is recorded. This constitutes a “coupon collector's segment” of length K. 2000 such segments are generated. The counts of segments of each length from R to R+29, and of all lengths greater than R+29 lumped together, are tallied and compared with the expected counts. The probability that a segment has any given length is given by formulas in Knuth. The number of degrees of freedom for the chi-square test is 30.
4.l
  • Craps Test (Lengths of Games). Numerics.Discrete_Random is instantiated once with an integer subtype whose range is 1 .. 6 (representing the six numbers on a die). 5000 craps games are played, and their lengths are recorded. (The length of a craps game is the number of rolls of the pair of dice required to produce a win or a loss. A game is won on the first roll if the dice show 7 or 11; it is lost if they show 2, 3, or 12. If the dice show some other sum on the first roll, it is called the point, and the game is won if and only if the point is rolled again before a 7 is rolled.) The counts of games of each length from 1 to 18, and of all lengths greater than 18 lumped together, are tallied and compared with the expected counts. For 2 ≤ S ≤ 12, let $D_S$ be the probability that a roll of a pair of dice shows the sum S, and let $Q_S(L) = D_S \cdot (1 - (D_S + D_7))^{L-2} \cdot (D_S + D_7)$. Then, the probability that a game has a length of 1 is $D_7 + D_{11} + D_2 + D_3 + D_{12}$ and, for L > 1, the probability that a game has a length of L is $Q_4(L) + Q_5(L) + Q_6(L) + Q_8(L) + Q_9(L) + Q_{10}(L)$. The number of degrees of freedom for the chi-square test is 18.
4.m
  • Craps Test (Lengths of Passes). This test is similar to the last, but enough craps games are played for 3000 losses to occur. A string of wins followed by a loss is called a pass, and its length is the number of wins preceding the loss. The counts of passes of each length from 0 to 7, and of all lengths greater than 7 lumped together, are tallied and compared with the expected counts. For L ≥ 0, the probability that a pass has a length of L is $W^L \cdot (1-W)$, where W, the probability that a game ends in a win, is 244.0/495.0. The number of degrees of freedom for the chi-square test is 8.
4.n
  • Collision Test. Numerics.Discrete_Random is instantiated once with an integer or enumeration type representing binary bits. 15 successive calls on the Random function are used to obtain the bits of a 15-bit binary integer between 0 and 32767. 3000 such integers are generated, and the number of collisions (integers previously generated) is counted and compared with the expected count. A chi-square test is not used to assess the number of collisions; rather, the limits on the number of collisions, corresponding to the 2.5 and 97.5 percentage points, are (from formulas in Knuth) 112 and 154. The test passes if and only if the number of collisions is in this range.
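The probabilities quoted for the craps tests can be checked with exact rational arithmetic. The Python sketch below builds $D_S$ from the 36 equally likely outcomes of two dice and evaluates $Q_S(L)$ as defined above. It then confirms that the win probability W is 244/495. It also evaluates the standard expected number of collisions, $n - m + m(1 - 1/m)^n$, for n = 3000 integers drawn from m = 32768 values; the result is about 133, between the Collision Test limits of 112 and 154. The function names are illustrative.

```python
from fractions import Fraction

# D[s]: probability that a roll of two fair dice shows the sum s (2 .. 12).
D = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}
POINTS = (4, 5, 6, 8, 9, 10)


def q(s: int, length: int) -> Fraction:
    """Q_S(L): point s on the first roll, game decided on roll L (L >= 2)."""
    ends = D[s] + D[7]  # probability that a later roll decides the game
    return D[s] * (1 - ends) ** (length - 2) * ends


def game_length_probability(length: int) -> Fraction:
    if length == 1:
        return D[7] + D[11] + D[2] + D[3] + D[12]
    return sum(q(s, length) for s in POINTS)


def win_probability() -> Fraction:
    """Win on 7 or 11, or roll the point again before a 7."""
    return D[7] + D[11] + sum(D[s] * D[s] / (D[s] + D[7]) for s in POINTS)


def pass_length_probability(length: int) -> Fraction:
    """A pass of `length` wins followed by a loss: W**L * (1 - W)."""
    w = win_probability()
    return w ** length * (1 - w)


def expected_collisions(n: int, m: int) -> float:
    """Expected collisions when n values are drawn uniformly from m."""
    return n - m + m * (1.0 - 1.0 / m) ** n
```

A game ends on the first roll with probability 1/3, and the game-length and pass-length probabilities each sum to one over all lengths.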

G.2.6 Accuracy Requirements for Complex Arithmetic

1

In the strict mode, the performance of Numerics.Generic_Complex_Types and Numerics.Generic_Complex_Elementary_Functions shall be as specified here.

Implementation Requirements

2/5

When an exception is not raised, the result of evaluating a real function of an instance CT of Numerics.Generic_Complex_Types (that is, a function that yields a value of subtype CT.Real'Base or CT.Imaginary) belongs to a result interval defined as for a real elementary function (see G.2.4).

3/5

When an exception is not raised, each component of the result of evaluating a complex function of such an instance, or of an instance of Numerics.Generic_Complex_Elementary_Functions obtained by instantiating the latter with CT (that is, a function that yields a value of subtype CT.Complex), also belongs to a result interval. The result intervals for the components of the result are either defined by a maximum relative error bound or by a maximum box error bound. When the result interval for the real (resp., imaginary) component is defined by maximum relative error, it is defined as for that of a real function, relative to the exact value of the real (resp., imaginary) part of the result of the corresponding mathematical function. When defined by maximum box error, the result interval for a component of the result is the smallest model interval of CT.Real that contains all the values of the corresponding part of f · (1.0 + d), where f is the exact complex value of the corresponding mathematical function at the given parameter values, d is complex, and |d| is less than or equal to the given maximum box error. The function delivers a value that belongs to the result interval (or a value both of whose components belong to their respective result intervals) when both bounds of the result interval(s) belong to the safe range of CT.Real; otherwise,

3.a
discussion

The maximum relative error could be specified separately for each component, but we do not take advantage of that freedom here.

3.b
discussion

Note that f · (1.0 + d) defines a small circular region of the complex plane centered at f, and the result intervals for the real and imaginary components of the result define a small rectangular box containing that circle.

3.c
reason

Box error is used when the computation of the result risks loss of significance in a component due to cancellation.

3.d
ramification

The components of a complex function that exhibits bounded relative error in each component have to have the correct sign. In contrast, one of the components of a complex function that exhibits bounded box error may have the wrong sign, since the dimensions of the box containing the result are proportional to the modulus of the mathematical result and not to either component of the mathematical result individually. Thus, for example, the box containing the computed result of a complex function whose mathematical result has a large modulus but lies very close to the imaginary axis might well straddle that axis, allowing the real component of the computed result to have the wrong sign. In this case, the distance between the computed result and the mathematical result is, nevertheless, a small fraction of the modulus of the mathematical result.

4
  • if CT.Real'Machine_Overflows is True, the function either delivers a value that belongs to the result interval (or a value both of whose components belong to their respective result intervals) or raises Constraint_Error, signaling overflow;
5
  • if CT.Real'Machine_Overflows is False, the result is implementation defined.
5.a
implementation defined

The result of a complex arithmetic operation or complex elementary function reference in overflow situations, when the Machine_Overflows attribute of the corresponding real type is False.
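The two kinds of bound can be compared directly. Because $|f \cdot d| \le |f| \cdot E$, a maximum box error E lets each component deviate from its exact value by up to $|f| \cdot E$, whereas a maximum relative error bounds each component relative to itself. In the illustrative Python sketch below, `eps` stands for CT.Real'Model_Epsilon (2.0**-52 is an assumed stand-in) and `coeff` for a Table G.2 coefficient, and rounding outward to model intervals is ignored. The first check reproduces the situation described in 3.d: a result near the imaginary axis whose real component has the wrong sign. The second check shows the cancellation mentioned in 3.c. It uses a hand-written complex product, so that each floating point operation is rounded separately, and exact rational arithmetic as the reference.

```python
import math
from fractions import Fraction

EPS = 2.0 ** -52  # assumed stand-in for CT.Real'Model_Epsilon


def component_errors(computed: complex, exact_re: Fraction, exact_im: Fraction):
    """Exact absolute error of each component of a computed result."""
    return (abs(Fraction(computed.real) - exact_re),
            abs(Fraction(computed.imag) - exact_im))


def within_relative_error(computed, exact_re, exact_im, coeff, eps=EPS) -> bool:
    """Each component is within coeff*eps of its own exact value, relatively."""
    err_re, err_im = component_errors(computed, exact_re, exact_im)
    tol = Fraction(coeff) * Fraction(eps)
    return err_re <= tol * abs(exact_re) and err_im <= tol * abs(exact_im)


def within_box_error(computed, exact_re, exact_im, coeff, eps=EPS) -> bool:
    """Each component is within coeff*eps*|f| of the exact component."""
    err_re, err_im = component_errors(computed, exact_re, exact_im)
    modulus = Fraction(math.hypot(float(exact_re), float(exact_im)))  # |f|, rounded
    bound = Fraction(coeff) * Fraction(eps) * modulus
    return err_re <= bound and err_im <= bound


def naive_product(x: complex, y: complex) -> complex:
    """Textbook (a+bi)*(c+di); Python rounds each float operation separately."""
    re = x.real * y.real - x.imag * y.imag
    im = x.real * y.imag + x.imag * y.real
    return complex(re, im)


def exact_product(x: complex, y: complex):
    """Exact real and imaginary parts of x*y as Fractions."""
    a, b, c, d = (Fraction(v) for v in (x.real, x.imag, y.real, y.imag))
    return a * c - b * d, a * d + b * c
```

Take x = 1 + 2**-30 + 1.0i and square it with `naive_product`. The real part comes out as 2**-29 instead of 2**-29 + 2**-60, a relative error of about 4.7e-10 in that component. It still satisfies the box bound of 5.0 that Table G.2 gives for "*", because the error is tiny compared with the modulus of the product, which is about 2.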

6/2

The error bounds for particular complex functions are tabulated in Table G.2. In the table, the error bound is given as the coefficient of CT.Real'Model_Epsilon.

7/1

This paragraph was deleted.

Table G.2: Error Bounds for Particular Complex Functions

| Function or Operator        | Nature of Result | Nature of Bound | Error Bound |
|-----------------------------|------------------|-----------------|-------------|
| Modulus                     | real             | max. rel. error | 3.0         |
| Argument                    | real             | max. rel. error | 4.0         |
| Compose_From_Polar          | complex          | max. rel. error | 3.0         |
| "*" (both operands complex) | complex          | max. box error  | 5.0         |
| "/" (right operand complex) | complex          | max. box error  | 13.0        |
| Sqrt                        | complex          | max. rel. error | 6.0         |
| Log                         | complex          | max. box error  | 13.0        |
| Exp (complex parameter)     | complex          | max. rel. error | 7.0         |
| Exp (imaginary parameter)   | complex          | max. rel. error | 2.0         |
| Sin, Cos, Sinh, and Cosh    | complex          | max. rel. error | 11.0        |
| Tan, Cot, Tanh, and Coth    | complex          | max. rel. error | 35.0        |
| inverse trigonometric       | complex          | max. rel. error | 14.0        |
| inverse hyperbolic          | complex          | max. rel. error | 14.0        |

8

The maximum relative error given above applies throughout the domain of the Compose_From_Polar function when the Cycle parameter is specified. When the Cycle parameter is omitted, the maximum relative error applies only when the absolute value of the parameter Argument is less than or equal to the angle threshold (see G.2.4). For the Exp function, and for the forward hyperbolic (resp., trigonometric) functions, the maximum relative error given above likewise applies only when the absolute value of the imaginary (resp., real) component of the parameter X (or the absolute value of the parameter itself, in the case of the Exp function with a parameter of pure-imaginary type) is less than or equal to the angle threshold. For larger angles, the accuracy is implementation defined.

8.a
implementation defined

The accuracy of certain complex arithmetic operations and certain complex elementary functions for parameters (or components thereof) beyond the angle threshold.

9

The prescribed results specified in G.1.2 for certain functions at particular parameter values take precedence over the error bounds; effectively, they narrow to a single value the result interval allowed by the error bounds for a component of the result. Additional rules with a similar effect are given below for certain inverse trigonometric and inverse hyperbolic functions, at particular parameter values for which a component of the mathematical result is transcendental. In each case, the accuracy rule, which takes precedence over the error bounds, is that the result interval for the stated result component is the model interval of CT.Real associated with the component's exact mathematical value. The cases in question are as follows:

10
  • When the parameter X has the value zero, the real (resp., imaginary) component of the result of the Arccot (resp., Arccoth) function is in the model interval of CT.Real associated with the value π/2.0.
11
  • When the parameter X has the value one, the real component of the result of the Arcsin function is in the model interval of CT.Real associated with the value π/2.0.
12
  • When the parameter X has the value –1.0, the real component of the result of the Arcsin (resp., Arccos) function is in the model interval of CT.Real associated with the value –π/2.0 (resp., π).
12.a
discussion

It is possible to give many other prescribed results in which a component of the result is restricted to a similar model interval when the parameter X is appropriately restricted to an easily testable portion of the domain. We follow the proposed ISO/IEC standard for Generic_Complex_Elementary_Functions (for Ada 83) in not doing so, however.

13/2

The amount by which a component of the result of an inverse trigonometric or inverse hyperbolic function is allowed to spill over into a quadrant adjacent to the one corresponding to the principal branch, as given in G.1.2, is limited. The rule is that the result belongs to the smallest model interval of CT.Real that contains both boundaries of the quadrant corresponding to the principal branch. This rule also takes precedence over the maximum error bounds, effectively narrowing the result interval allowed by them.

14

Finally, the results allowed by the error bounds are narrowed by one further rule: The absolute value of each component of the result of the Exp function, for a pure-imaginary parameter, never exceeds one.

Implementation Advice

15

The version of the Compose_From_Polar function without a Cycle parameter should not be implemented by calling the corresponding version with a Cycle parameter of 2.0*Numerics.Pi, since this will not provide the required accuracy in some portions of the domain.

15.a.1/2
implementation advice

For complex arithmetic, the Compose_From_Polar function without a Cycle parameter should not be implemented by calling Compose_From_Polar with a Cycle parameter.

Wording Changes from Ada 83

15.a

The semantics of Numerics.Generic_Complex_Types and Numerics.Generic_Complex_Elementary_Functions differs from Generic_Complex_Types and Generic_Complex_Elementary_Functions as defined in ISO/IEC CDs 13813 and 13814 (for Ada 83) in ways analogous to those identified for the elementary functions in G.2.4. In addition, we do not generally specify the signs of zero results (or result components), although those proposed standards do.