RP2 Floating point accuracy #13075
Replies: 21 comments 49 replies
-
when I was testing esp32 vs RP2040 for Floating point calculation for Kalman filter algorithm i found out on Rasppbery pi forum some info about speed overall it is poor. I found that RP2040 ( with cortex M0 ) do not have FPU and from SDK docs it should meet IEEE 754 standard., But from My tests precision are not as good as in FPU and errors expand fast (due to type of math operation) - in RP2040 floating-point arithmetic routines is based on interpolators. |
Beta Was this translation helpful? Give feedback.
-
Two related discussions are here and here.
So the problem should be in 1, the internal FP routines.
Good question. I did your python version of precision test @GitHubsSilverBullet here on a Blackpill (stm32f411) board and it shows exactly the same results. I would interpret that in agreement with the above, that the basic FP calculations involved (addition, division by powers of 2) are the same as on the Pico and the FP representation is the same too. So, looking for the cause of the problem: Could the trigonometric computations differ? I seem to remember that for the FP math of the Pico a special library was acquired, that is extraordinarily fast. It is not the usual math lib of the GNU C compiler. I did not find this reference now, though. |
Beta Was this translation helpful? Give feedback.
-
I ran the Python test code of @GitHubsSilverBullet on various MicroPython boards. At boards with 32 bit floats, whether hardware or software including the RP2, the results were those of the second result list, at boards with 64 bit floats (double), it returned the first result list. |
Beta Was this translation helpful? Give feedback.
-
So here is the accuracy test I suggested, to make sure basic addition was accurate: import array
import uctypes
from machine import mem32
a = array.array('f', 1 for _ in range(32))
abase = uctypes.addressof(a)
for i in range(1,26):
a[i] = a[i-1] + 1 / (1 << i)
ai = mem32[abase + 4*i]
print(i, f"{a[i]:20.15f} {ai-0x3f800000:08x}") with result
which demonstrates that the internal math is working fine for this test case, but the output formatter is, indeed , limited by its own 32 bit precision |
Beta Was this translation helpful? Give feedback.
-
The problem is not with the output formatter: my application produces as output integers being datetimes expressed in seconds. It can be reproduced by importing this module. You only need to look at the sun rise time of day 0. This is consistent within ~20s on all platforms except RP2, where the error is 5 minutes. The code has one function @jimmo I'm pretty confident that the trig functions are called with reasonable argument values (smaller than +-4π radians). |
Beta Was this translation helpful? Give feedback.
-
@peterhinch I went and looked at the module you posted. The 'quad' function is strongly subject to roundoff error. You should use the method in 'Numerical Recipes', in which you always compute the larger root in absolute magnitude (if b is negative, use -b + sqrt(disc), and if b is positive, use -b - sqrt(disc)), and then compute the other root as c/first root (since the a*c is the product of the roots). This is not susceptible to roundoff error. Here is a modified version of 'quad()' which is formally stable: def quad(ym, yz, yp):
# See Astronomy on the PC P48-49
# finds the parabola throuh the three points (-1,ym), (0,yz), (1, yp)
# and returns the values of x where the parabola crosses zero
# (roots of the quadratic)
# and the number of roots (0, 1 or 2) within the interval [-1, 1]
nz = 0
a = 0.5 * (ym + yp) - yz
b = 0.5 * (yp - ym)
c = yz
xe = -b / (2 * a)
ye = (a * xe + b) * xe + c
dis = b * b - 4.0 * a * c # discriminant of y=a*x^2 +bx +c
if dis > 0: # parabola has roots
if b < 0:
z2 = (-b + sqrt(dis)) / (2*a) # z2 is larger root in magnitude
else:
z2 = (-b - sqrt(dis)) / (2*a)
z1 = (c/a) / z2 # z1 is always closer to zero
if fabs(z1) <= 1.0:
nz += 1
if fabs(z2) <= 1.0:
nz += 1
if z1 < -1.0:
z1 = z2
return nz, z1, z2, ye I checked, though, that this isn't actually your problem. This gives exactly the same answers as your quad, so apparently it is never dealing with roots that result from subtracting nearly-equal numbers. |
Beta Was this translation helpful? Give feedback.
-
Data from this paper showing the errors in IEEE 754 single-precision format (binary32) computations relative to the least bit of resolution reveal that the precision quite often is not optimal. (It is much worse for more special functions btw.)
I think that a combination of mpmath on the PC (for computing exact values of sufficiently high resolution) together with belay (for transferring the results of the same computations on the RPi Pico board transparently to the PC) might allow for a reproduction of such data with relatively low effort. |
Beta Was this translation helpful? Give feedback.
-
@peterhinch I will possibly upload a version of this in the afternoon that solves the roundoff problem. I think it is in the sin_alt routine, where a large number (51544.5) is subtracted from the complete time, including the hour. If that subtraction is done from the (integer, exact) mjd, and since 51544.5 also has an exact representation, one has a much smaller remaining number that one is adding the fractional hour to. This should gain at least a few bits of precision in the calculation of 't'.
I am not next to my pico right now to see if it helps.
Here is the relevant snippet (Not sure why this refuses to format... it is fine in preview):
```python
def lmstt(self, t):
# Takes the mjd and the longitude (west negative) and then returns
# the local sidereal time in hours. Im using Meeus formula 11.4
# instead of messing about with UTo and so on
# modified to use the pre-computed 't' value from sin_alt
d = t * 36525
lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000
return (lst % 360) / 15.0 + self.long / 15
def sin_alt(self, hour, func):
# Returns the sine of the altitude of the object (moon or sun)
# at an hour relative to the current date (mjd)
mjd0 = self.mjd + hour / 24.0
t0 = (mjd0 - 51544.5) / 36525.0
mjd1 = (self.mjd - 51544.5) + hour / 24.0
t1 = mjd1 / 36525.0
print(f"sin_alt mjd0={mjd0:.7f} t0={t0:.7f} mjd1={mjd1:.7f} t1={t1:.7f}")
dec, ra = func(t1)
# hour angle of object: one hour = 15 degrees. Note lmst() uses longitude
tau = 15.0 * (self.lmstt(t1) - ra)
# sin(alt) of object using the conversion formulas
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
return salt
```
|
Beta Was this translation helpful? Give feedback.
-
I have reduced the code to a much simpler script which produces a single floating point value. It eliminates arguably contentious issues like the interpolator, epoch and time dependence. Feel free to cut and paste this into other platforms: from math import sin, cos, sqrt, atan, radians
LAT = 53.29756504536339 # Local defaults
LONG = -2.102811634540558
def frac(x):
return x % 1
def minisun(t):
p2 = 6.283185307
coseps = 0.91748
sineps = 0.39778
M = p2 * frac(0.993133 + 99.997361 * t)
DL = 6893.0 * sin(M) + 72.0 * sin(2 * M)
L = p2 * frac(0.7859453 + M / p2 + (6191.2 * t + DL) / 1296000)
SL = sin(L)
X = cos(L)
Y = coseps * SL
Z = sineps * SL
RHO = sqrt(1 - Z * Z)
dec = (360.0 / p2) * atan(Z / RHO)
ra = (48.0 / p2) * atan(Y / (X + RHO))
if ra < 0:
ra += 24
return dec, ra
class RiSet:
def __init__(self, lat=LAT, long=LONG): # Local defaults
self.sglat = sin(radians(lat))
self.cglat = cos(radians(lat))
self.long = long
self.mjd = 60276 # MJD 28 Nov 2023
print("Value", self.rise())
def lmst(self, mjd):
d = mjd - 51544.5
t = d / 36525.0
lst = 280.46061837 + 360.98564736629 * d + 0.000387933 * t * t - t * t * t / 38710000
return (lst % 360) / 15.0 + self.long / 15
def sin_alt(self, hour, func):
mjd = self.mjd + hour / 24.0
t = (mjd - 51544.5) / 36525.0
dec, ra = func(t)
tau = 15.0 * (self.lmst(mjd) - ra)
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
return salt
def rise(self):
sinho = sin(radians(-0.833))
yp = self.sin_alt(9, minisun) - sinho
return yp
r = RiSet() Results on everything I could lay my hands on: 64-bit FPUnix 0.1236665384827108 (presumed accurate) 32-bit FPPyboard 1.1 0.1252849 (+1.31%) OutliersESP8266 0.116815 (-5.5%) Interesting that ESP8266 is similarly challenged. |
Beta Was this translation helpful? Give feedback.
-
I could add: |
Beta Was this translation helpful? Give feedback.
-
I guessed, that the Changing this function for a printout of deviations: def sin_alt(self, hour, func):
mjd = self.mjd + hour / 24.0
t = (mjd - 51544.5) / 36525.0
dec, ra = func(t)
dec_prec = -21.28489097872491
ra_prec = 16.264695874889682
print('dec, ra:', dec, ra, 'deviations (%):', 100*(dec-dec_prec)/dec_prec, 100*(ra-ra_prec)/ra_prec)
tau = 15.0 * (self.lmst(mjd) - ra)
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
tau_prec = -44.06301269320757
salt_prec = 0.10912845798021381
print('tau, salt:', tau, salt, 'deviations (%):', 100*(tau-tau_prec)/tau_prec, 100*(salt-salt_prec)/salt_prec)
return salt where the
on the Pico |
Beta Was this translation helpful? Give feedback.
-
So, I think I have at least cornered the numerical problem. In the lmst function,
is very subject to roundoff. For very tiny differences in the input, it gives wildly different ( a few degrees) results, which of course add up to minutes of time error. I will see if I can figure out a way to make it numerically much more stable in single precision. |
Beta Was this translation helpful? Give feedback.
-
@peterhinch @rkompass I rewrote two routines to eliminate roundoff error and remove a large modular reduction. Try pasting these into the module (I've slightly cleaned it up since my first post): def lmstt(self, t):
# Takes the mjd and the longitude (west negative) and then returns
# the local sidereal time in hours. Im using Meeus formula 11.4
# instead of messing about with UTo and so on
# modified to use the pre-computed 't' value from sin_alt
d = t * 36525
df = frac(d)
c1 = 360
c2 = 0.98564736629
dsum = c1 * df + c2 * d #no large integer * 360 here
lst = 280.46061837 + dsum + 0.000387933 * t * t - t * t * t / 38710000
return (lst % 360) / 15.0 + self.long / 15
def sin_alt(self, hour, func):
# Returns the sine of the altitude of the object (moon or sun)
# at an hour relative to the current date (mjd)
mjd1 = (self.mjd - 51544.5) + hour / 24.0
t1 = mjd1 / 36525.0
#print(f"sin_alt mjd0={mjd0:.7f} t0={t0:.9f} mjd1={mjd1:.7f} t1={t1:.9f}")
dec, ra = func(t1)
# hour angle of object: one hour = 15 degrees. Note lmst() uses longitude
tau = 15.0 * (self.lmstt(t1) - ra)
# sin(alt) of object using the conversion formulas
salt = self.sglat * sin(radians(dec)) + self.cglat * cos(radians(dec)) * cos(radians(tau))
return salt |
Beta Was this translation helpful? Give feedback.
-
The RP2 Pico transfers floats wrongly into the internal representation:
gives this is a deviation of 4 in least significant bit units. Given the many constants in the code such errors may sum up. Edit:
|
Beta Was this translation helpful? Give feedback.
-
@mendenm @rkompass I'm deeply impressed :) RP2 now produces 0.1236907, within 0.02% of the 64 bit result. PYBD is now within 0.003%. I had been led to believe that the source of the algorithms, "Astronomy on the Personal Computer" by Montenbruck and Pfleger, was authoritative. It's certainly full of maths, sunrise and set being the nursery slopes. Clearly there is room for improvement in the algorithms (including |
Beta Was this translation helpful? Give feedback.
-
If it turns out to be an RP2040 FP library issue (which it's increasingly looking like it may not be), the library used is Mark Owen's Qfplib-M0. Mark is extremely approachable. |
Beta Was this translation helpful? Give feedback.
-
Realistically, you can't do much better than that under any conditions, since local topography can make a much bigger change than that! We live in a bowl near a lake, with hills all around us. Normal calculations are way off.
Incidentally, for a different project, which needed sunrise and sunset times, I used the 'astral' package:
https://astral.readthedocs.io/en/latest/
but that was on a regular RPi4 with double precision math. I have not evaluated it at reduced precision ever.
Edit: I just looked at astral. It has the same problem with not pre-reducing the integer part of the day count. It will give bad results on a single-precision machine, too. Of course, it was never written for that. However, I wonder if I should suggest the change, since it is harmless on normal machines, and an improvement on microcontrollers.
From astral.julian
```python
def geom_mean_long_sun(juliancentury: float) -> float:
"""Calculate the geometric mean longitude of the sun"""
l0 = 280.46646 + juliancentury * (36000.76983 + 0.0003032 * juliancentury)
return l0 % 360.0
```
Marcus
|
Beta Was this translation helpful? Give feedback.
-
I had the feeling that the computations were still unnecessarily involved: from math import pi, sin, cos, sqrt, atan, radians
for i in range(20):
z = i/20
p2 = 2* pi # 6.283185307
rho = sqrt(1 - z*z)
dec = (360.0 / p2) * atan(z / rho)
sr = sin(radians(dec))
print('z, sin(radians(dec))', z, sr)
for i in range(20):
z = i/20
p2 = 2* pi # 6.283185307
rho = sqrt(1 - z*z)
dec = (360.0 / p2) * atan(z / rho)
sr = cos(radians(dec))
print('rho, cos(radians(dec))', rho, sr) demonstrate that the atan(z / rho) is not necessary at all, because it is fed into sin() and cos() (after a conversion from radians to degrees and back again) So dec (declination) is interesting for itself, but not needed for the computations? |
Beta Was this translation helpful? Give feedback.
-
The esp8266 is off because it uses truncated 30-bit floating point numbers, via For rp2... I'm not sure, will need investigation. |
Beta Was this translation helpful? Give feedback.
-
What a famous discussion! Thank you all guy! ☺ |
Beta Was this translation helpful? Give feedback.
-
I did a string to float function in python. This is to exclude errors from wrong constants on the RP2 Pico.
If you confirm that it works well: Should I post that to the issue too, as workaround? |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
I am in the late stages of developing some code for calculating sun and moon rise and set times and moon phases. There is a much floating point maths going on so I set about benchmarking it on various platforms. To my surprise the computed results varied. There were expected small variations between 64-bit and 32-bit platforms: around 20s on today's sunrise time. But the RP2 had a more significant error of 5 minutes on all the calculated values.
The 32-bit platforms were in extremely close mutual agreement, whether FP is implemented in hardware or software. RP2 was the only outlier.
Is this a known issue?
Beta Was this translation helpful? Give feedback.
All reactions