I chatted with Andrew at POPL and he asked me to file this issue, about a floating-point arithmetic trick that isn't verified/available in VCFloat today.
It is sometimes useful to know that the product of two floating-point numbers, or of a floating-point number and an integer, is exact if the numbers have many zeros at the start/end. For example, if you have a multiplication 3 * x, you know that 3 only has the last two bits set, so if x ends in a zero bit, the multiplication is exact. This trick is prominently used in math libraries, in a few ways.
One common use case is in Cody-Waite range reduction. This algorithm computes x % PI (or another constant) in several steps:
- First, divide
x by PI and cast to integer k
- Subtract
PI * k
- To handle rounding error in
PI, have another constant PI_2 (so that PI_2 is real PI minus float PI), and subtract PI_2 * k
This algorithms works only if the product PI * k is exact; otherwise, that step introduces a larger error than the PI_2 * k correction. To guarantee this, the PI constant doesn't use all 53 bits; instead, it uses, say, only 33 bits, and then k can be as large as 20 bits meaning that it works for |x| < 2^20 * PI. There's a tradeoff between the number of bits used for the PI constants and the number of bits used for k, and the implementor can choose values based on their desired performance tradeoffs for different inputs.
That case uses exact integer-float multiplication, but another use case of this exact multiplication is float-float. Consider acos in fdlibm, where (in the else branch) we compute:
s = sqrt(z);
df = s;
__LO(df) = 0;
c = (z-df*df)/(s+df);
Here the s and c variables are a double-double representation of the square root of z. To compute this, we first compute s directly and then correct for the error by squaring it, subtracting from z, and rescaling. But first, we define df which drops the lower half of s, so that the squaring step is done exactly. Then z - df*df is also exact through Sterbenz subtraction, which is important to preserving the error. If the multiply isn't exact, the error in df * df is as large as z - df*df, meaning that c is basically totally useless.
I'm not sure what it would take to add exact multiplication to VCFloat, but a POPL'18 paper on verifying math libraries tracks bit widths of various values and uses that as a verification condition for exact multiplies, perhaps something similar is possible.
I chatted with Andrew at POPL and he asked me to file this issue, about a floating-point arithmetic trick that isn't verified/available in VCFloat today.
It is sometimes useful to know that the product of two floating-point numbers, or of a floating-point number and an integer, is exact if the numbers have many zeros at the start/end. For example, if you have a multiplication
3 * x, you know that3only has the last two bits set, so ifxends in a zero bit, the multiplication is exact. This trick is prominently used in math libraries, in a few ways.One common use case is in Cody-Waite range reduction. This algorithm computes
x % PI(or another constant) in several steps:xbyPIand cast to integerkPI * kPI, have another constantPI_2(so thatPI_2is real PI minus float PI), and subtractPI_2 * kThis algorithms works only if the product
PI * kis exact; otherwise, that step introduces a larger error than thePI_2 * kcorrection. To guarantee this, thePIconstant doesn't use all 53 bits; instead, it uses, say, only 33 bits, and thenkcan be as large as 20 bits meaning that it works for|x| < 2^20 * PI. There's a tradeoff between the number of bits used for thePIconstants and the number of bits used fork, and the implementor can choose values based on their desired performance tradeoffs for different inputs.That case uses exact integer-float multiplication, but another use case of this exact multiplication is float-float. Consider
acosinfdlibm, where (in theelsebranch) we compute:Here the
sandcvariables are a double-double representation of the square root ofz. To compute this, we first computesdirectly and then correct for the error by squaring it, subtracting fromz, and rescaling. But first, we definedfwhich drops the lower half ofs, so that the squaring step is done exactly. Thenz - df*dfis also exact through Sterbenz subtraction, which is important to preserving the error. If the multiply isn't exact, the error indf * dfis as large asz - df*df, meaning thatcis basically totally useless.I'm not sure what it would take to add exact multiplication to VCFloat, but a POPL'18 paper on verifying math libraries tracks bit widths of various values and uses that as a verification condition for exact multiplies, perhaps something similar is possible.