Skip to content

samd21: time.monotonic() slows down under pressure #876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
notro opened this issue May 27, 2018 · 12 comments
Closed

samd21: time.monotonic() slows down under pressure #876

notro opened this issue May 27, 2018 · 12 comments

Comments

@notro
Copy link

notro commented May 27, 2018

I don't know if this is by (unfortunate) design or not, but time.monotonic() slows down under pressure.

I run a script to measure RTC accuracy and it slows down the monotonic clock.

This snippet is run on the board through the REPL to give me the moment that the second ticks over:

import time

lt1 = time.localtime()
lt2 = lt1
i = 0
while lt1.tm_sec == lt2.tm_sec:
    lt2 = time.localtime()
    i += 1
m = time.monotonic()
print("%%s:%%.6f:%%d" %% ( str(tuple(lt2)), m, i ))

This is the resulting test script output:

-0.005966 secs / 0.000025 days = -238.187512 secs/day (-2756.80 ppm) : 27 May 2018 22:38:30 (2286.009 - 2285.562 = 0.447) : 127
-0.001031 secs / 0.000037 days = -28.206318 secs/day (-326.46 ppm) : 27 May 2018 22:38:31 (2286.456 - 2286.009 = 0.447) : 124

The important part is the last two numbers: monotonic has elapsed ~0.5 seconds and the loop has run 127/124 times.

If I add a 100ms delay in the loop:

import time

lt1 = time.localtime()
lt2 = lt1
i = 0
while lt1.tm_sec == lt2.tm_sec:
    lt2 = time.localtime()
    i += 1
    time.sleep(0.1)
m = time.monotonic()
print("%%s:%%.6f:%%d" %% ( str(tuple(lt2)), m, i ))

Then I get that monotonic has elapsed ~1 second and the loop has run 7/8 times:

+0.055680 secs / 0.000026 days = +2114.643333 secs/day (+24475.04 ppm) : 27 May 2018 22:40:08 (2366.929 - 2366.016 = 0.913) : 7
+0.010000 secs / 0.000038 days = +260.184927 secs/day (+3011.40 ppm) : 27 May 2018 22:40:09 (2367.941 - 2366.929 = 1.013) : 8
-0.025755 secs / 0.000050 days = -510.800835 secs/day (-5912.05 ppm) : 27 May 2018 22:40:10 (2368.938 - 2367.941 = 0.997) : 8
+0.029896 secs / 0.000061 days = +487.286023 secs/day (+5639.88 ppm) : 27 May 2018 22:40:11 (2369.850 - 2368.938 = 0.911) : 7
-0.026185 secs / 0.000074 days = -355.902557 secs/day (-4119.24 ppm) : 27 May 2018 22:40:12 (2370.868 - 2369.850 = 1.019) : 8

This is with a 10ms sleep:

+0.006152 secs / 0.000025 days = +243.827366 secs/day (+2822.08 ppm) : 27 May 2018 22:56:36 (3344.759 - 3343.982 = 0.777) : 54
+0.001943 secs / 0.000037 days = +52.725427 secs/day (+610.25 ppm) : 27 May 2018 22:56:37 (3345.534 - 3344.759 = 0.775) : 55
+0.007818 secs / 0.000048 days = +161.654050 secs/day (+1871.00 ppm) : 27 May 2018 22:56:38 (3346.313 - 3345.534 = 0.779) : 53

Does anyone have an explanation for this?

@tannewt
Copy link
Member

tannewt commented May 29, 2018

What do you think the problem is? I don't think its a matter of monotonic being incorrect but rather how quickly the loop runs. The slower each iteration runs, the coarser the time it takes to find the seconds boundary.

Monotonic is driven by an internal interrupt based on the SysTick timer. The only way I'd expect it to lose time is if interrupts are disabled for more than a millisecond.

@tannewt tannewt added this to the Long term milestone May 29, 2018
@notro
Copy link
Author

notro commented May 29, 2018

The thing is, monotonic should always have elapsed one second or more, never (much) less. I'm using monotonic to measure the time it takes for one second to elapse on the RTC.

I have now run the same test on a Metro M4 Express, and it doesn't show the same problem.
This is with no sleeping in the loop: +700 loop iterations and ~1.0 second elapsed on monotonic:

+0.005894 secs / 0.000025 days = +234.605899 secs/day (+2715.35 ppm) : 29 May 2018 00:00:02 (56083.937 - 56082.935 = 1.001) : 767
+0.001815 secs / 0.000037 days = +49.392845 secs/day (+571.68 ppm) : 29 May 2018 00:00:03 (56084.952 - 56083.937 = 1.016) : 761
-0.002347 secs / 0.000048 days = -48.521483 secs/day (-561.59 ppm) : 29 May 2018 00:00:04 (56085.954 - 56084.952 = 1.001) : 770
-0.007010 secs / 0.000060 days = -116.838150 secs/day (-1352.29 ppm) : 29 May 2018 00:00:05 (56086.950 - 56085.954 = 0.997) : 759
-0.011479 secs / 0.000072 days = -160.283728 secs/day (-1855.14 ppm) : 29 May 2018 00:00:06 (56087.952 - 56086.950 = 1.001) : 768
-0.005442 secs / 0.000083 days = -65.464984 secs/day (-757.70 ppm) : 29 May 2018 00:00:07 (56088.953 - 56087.952 = 1.001) : 757
-0.009664 secs / 0.000095 days = -101.999383 secs/day (-1180.55 ppm) : 29 May 2018 00:00:08 (56089.969 - 56088.953 = 1.016) : 770
-0.014307 secs / 0.000106 days = -134.497285 secs/day (-1556.68 ppm) : 29 May 2018 00:00:09 (56090.965 - 56089.969 = 0.997) : 759
-0.018378 secs / 0.000118 days = -155.750179 secs/day (-1802.66 ppm) : 29 May 2018 00:00:10 (56091.967 - 56090.965 = 1.001) : 768
-0.012985 secs / 0.000130 days = -100.261177 secs/day (-1160.43 ppm) : 29 May 2018 00:00:11 (56092.968 - 56091.967 = 1.001) : 757
-0.017704 secs / 0.000141 days = -125.438034 secs/day (-1451.83 ppm) : 29 May 2018 00:00:12 (56093.969 - 56092.968 = 1.001) : 765
-0.022001 secs / 0.000153 days = -144.020932 secs/day (-1666.91 ppm) : 29 May 2018 00:00:13 (56094.985 - 56093.969 = 1.016) : 759

I can upload a test so people can try and reproduce when I've done with tools/cpboard.py, which the test depends upon.

@tannewt
Copy link
Member

tannewt commented May 29, 2018

I don't think that's what your code above is doing. I think its measuring the time until the first seconds change from script start up. I don't see where it waits to ensure the first monotonic is captured on a seconds change. Do you have an updated version of the code?

@notro
Copy link
Author

notro commented May 29, 2018

The script might be a bit convoluted since it has evolved over time without getting a cleanup.
Basically what it does is:

  • Set the RTC to the same time as the Raspberry Pi
  • Wait for the RTC to tick a new second, report/print the struct_time tuple, the monotic value and the number of iterations
  • Print the diff between the clock on the Pi and the board, print the diff between the current monotonic and the previous one

There is some dump code in here that I've used when only checking the diff once a minute to see if the amount of printing over USB could have anything to do with my discrepancies.
There is also some code to remove the inital clock diff that I needed before I were able to increase RTC precision when setting the clock.

import sys
sys.path.append('.')
import time
import cpboard


def wait_for_tick_tm(board, field):
    c = """
import time

lt1 = time.localtime()
lt2 = lt1
i = 0
while lt1.%s == lt2.%s:
    lt2 = time.localtime()
    i += 1
    #time.sleep(0.01)
m = time.monotonic()
print("%%s:%%.6f:%%d" %% ( str(tuple(lt2)), m, i ))

""" % (field, field)

    retry = 10
    while True:
        s = None
        try:
            s = board.exec(c.strip(), timeout=70)
            if s:
                break
        except CPboardError as e:
            if retry == 0:
                raise e
        retry -= 1
        if retry < 8:
            print('retry(%d): %s' % (retry, s))

    try:
        vals = s.split(b':')
        t = eval(vals[0])
        lt2 = time.struct_time(t)
        m = eval(vals[1])
        i = eval(vals[2])
    except Exception as e:
        print("s=%s" % s)
        raise e

    return lt2, m, i

def wait_for_tick(board):
    return wait_for_tick_tm(board, 'tm_sec')

def wait_for_tick_minute(board):
    return wait_for_tick_tm(board, 'tm_min')

def sync_time(board, verbose=False):
    # Start in 2 seconds
    t = time.localtime(time.time() + 2)

    w = time.mktime(t) - time.time()
    if verbose:
        print("wait: %f secs" % w)
    time.sleep(w)

    cmd = 'import rtc; rtc.RTC().datetime = %s' % str(tuple(t))
    if verbose:
        print(cmd)
    board.exec(cmd)

    return t


def measure_ppm(board, localtime_base):
        dbg = True
        #dbg = False

        time_base = time.mktime(localtime_base)

        tstr = time.strftime("%d %b %Y %H:%M:%S", localtime_base)
        print("Base time: %s" % tstr)

        first_time_diff = None
        monotonic_prev = 0

        time_diff_prev = None
        dump_lines = [''] * 70
        dump = None

        while True:
            localtime_remote, monotonic, i = wait_for_tick(board)
            time_local = time.time()

            time_remote = time.mktime(localtime_remote)
            time_diff = time_remote - time_local

            # Strip off the fact that the clocks don't run in sync
            if first_time_diff is None:
                first_time_diff = time_diff
                print("\n\nFirst diff: % .6f secs\n\n" % first_time_diff)

            time_diff -= first_time_diff

            if time_diff_prev is None:
                time_diff_prev = time_diff

            if dump is None and abs(time_diff) > abs(time_diff_prev) + 2:
                dump = True
            time_diff_prev = time_diff

            db = time_local - time_base
            days = db / (24 * 60 * 60)
            if days:
                dv = time_diff / days
            else:
                dv = 0

            s = "%+.6f secs / %.6f days = %+.6f secs/day (%+.2f ppm) : %s (%.3f - %.3f = %.3f) : %d" % \
                      (time_diff, days, dv, (dv / (24 * 60 * 60) ) * 10**6,
                       time.strftime("%d %b %Y %H:%M:%S", localtime_remote),
                       monotonic, monotonic_prev, monotonic - monotonic_prev,
                       i)

            if dbg or not (int(time_local) % 60):

                if dump:
                    dump = False
                    print("\ndump 70--->")
                    for line in dump_lines:
                        print(line)
                    print("<---dump\n")

                    #
                    print(s)
                    break

                print(s)
                monotonic_prev = monotonic

            dump_lines.pop(0)
            dump_lines.append(s)


print('Board:', sys.argv[1])

board = cpboard.CPboard(sys.argv[1])
board.open()

localtime_base = sync_time(board, True)

measure_ppm(board, localtime_base)

@tannewt
Copy link
Member

tannewt commented May 29, 2018

Ah, its possible that the USB code is starving the SysTick interrupt. Try a similar test that doesn't use USB actively while measuring monotonic vs the RTC.

dhalbert pushed a commit that referenced this issue Jun 8, 2018
We now track the last time the background task ran and bail on the
PulseIn if it starves the background work. In practice, this
happens after the numbers from pulsein are no longer accurate.

This also adjusts interrupt priorities so most are the lowest level
except for the tick and USB interrupts.

Fixes #516 and #876
@dhalbert
Copy link
Collaborator

@notro could you retest? Thanks. @tannewt redid the interrupt priorities so that Systick is now at the highest prio instead of the lowest (!) as it was before.

@notro
Copy link
Author

notro commented Jun 12, 2018

I'm still getting the same result.

Here's a stripped down version of my example:

code.py:

import time

lt1 = time.localtime()
lt2 = lt1
m1 = time.monotonic()
while True:
    i = 0
    while lt1.tm_sec == lt2.tm_sec:
        lt2 = time.localtime()
        i += 1
        #time.sleep(0.1)
    m2 = time.monotonic()
    print("time: %s monotonic:%.6f iterations:%d" % ( str(tuple(lt2)[:6]), m2 - m1, i ))
    lt1 = lt2
    m1 = m2

Result with no sleep:

time: (2000, 1, 1, 0, 8, 48) monotonic:0.017944 iterations:13
time: (2000, 1, 1, 0, 8, 49) monotonic:0.343994 iterations:153
time: (2000, 1, 1, 0, 8, 50) monotonic:0.310059 iterations:158
time: (2000, 1, 1, 0, 8, 51) monotonic:0.353027 iterations:150
time: (2000, 1, 1, 0, 8, 52) monotonic:0.354980 iterations:146
time: (2000, 1, 1, 0, 8, 53) monotonic:0.355957 iterations:149
time: (2000, 1, 1, 0, 8, 54) monotonic:0.322021 iterations:155
time: (2000, 1, 1, 0, 8, 55) monotonic:0.344971 iterations:150

Result with 100ms sleep:

time: (2000, 1, 1, 0, 9, 1) monotonic:0.902954 iterations:9
time: (2000, 1, 1, 0, 9, 2) monotonic:1.011963 iterations:10
time: (2000, 1, 1, 0, 9, 3) monotonic:0.911011 iterations:9
time: (2000, 1, 1, 0, 9, 4) monotonic:1.010010 iterations:10
time: (2000, 1, 1, 0, 9, 5) monotonic:0.910034 iterations:9
time: (2000, 1, 1, 0, 9, 6) monotonic:1.010010 iterations:10

@notro
Copy link
Author

notro commented Jun 26, 2018

I just tried the test case in the previous post on an M4 (without sleep) and the number of iteration varies quite a lot:

Adafruit CircuitPython 3.0.0-rc.0-25-gd7d132d5a-dirty on 2018-06-26; Adafruit Metro M4 Express with samd51j19
>>>
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
time: (2000, 1, 1, 0, 0, 39) monotonic:0.244003 iterations:403
time: (2000, 1, 1, 0, 0, 40) monotonic:1.003006 iterations:548
time: (2000, 1, 1, 0, 0, 41) monotonic:1.000000 iterations:1120
time: (2000, 1, 1, 0, 0, 42) monotonic:1.000992 iterations:396
time: (2000, 1, 1, 0, 0, 43) monotonic:0.999008 iterations:953
time: (2000, 1, 1, 0, 0, 44) monotonic:1.002991 iterations:387
time: (2000, 1, 1, 0, 0, 45) monotonic:1.000000 iterations:977
time: (2000, 1, 1, 0, 0, 46) monotonic:1.000000 iterations:372
time: (2000, 1, 1, 0, 0, 47) monotonic:1.001007 iterations:949
time: (2000, 1, 1, 0, 0, 48) monotonic:1.000992 iterations:371
time: (2000, 1, 1, 0, 0, 49) monotonic:1.000000 iterations:952
time: (2000, 1, 1, 0, 0, 50) monotonic:1.001999 iterations:370
time: (2000, 1, 1, 0, 0, 51) monotonic:0.998001 iterations:949
Traceback (most recent call last):
  File "code.py", line 9, in <module>
KeyboardInterrupt:



Press any key to enter the REPL. Use CTRL-D to reload.

@Dar-Scott
Copy link

Dar-Scott commented Jan 15, 2020

Perhaps stating the obvious... I am new to ARM, CP implementation, and even github... I am today seeing this same behavior.
I looked at the systick handler. (I think it does some unneeded things; see below.) I don't see a problem. I didn't dive down into some functions such as one related to the file system. Perhaps those can be moved to software triggered exceptions (maybe using the STIR, if available, for an atomic trigger). Those can then run at a lower priority. From my limited glance, I don't think the handler run time is too long.
Since the systick handler now runs at highest priority (and presumably nothing else does; this can be checked, I guess), the problem is not because of higher priority interrupts or bumping priority. Given those assumptions, I say the loss of time is caused by the disabling of interrupts for long periods. Perhaps a style guide is needed here, or, if there is one, attention to it is needed. Interrupts should be disabled for tight short times as needed, not for large operations. Priority shifting can be considered as an alternative.
(As far as those unneeded things in the systick handler... I don't think the control register needs to be read. And even if it does, the read is atomic, so it doesn't need to be wrapped with interrupt disable/enable. And since the systick runs at highest priority, the wrapping will, I think, have no behavior effect.)

@Dar-Scott
Copy link

I tinkered with the test script and it seems that time.localtime() or time.time() takes about 4.46 ms and during its execution 3 tick interrupts seem to be lost.

@Dar-Scott
Copy link

Oh, the 4.5 ms for time.time() and the 3 lost SYSTICKs are from testing on Feather M0 Express.

@tannewt
Copy link
Member

tannewt commented Jul 14, 2020

I'm going to assume this is fixed by the switch to using the RTC for timing. Please comment if it's not fixed.

@tannewt tannewt closed this as completed Jul 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants