Skip to content

I2C bus error after while - OSError: [Errno 5] Input/output error #3376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
electriczity opened this issue Sep 6, 2020 · 17 comments · Fixed by #3545
Closed

I2C bus error after while - OSError: [Errno 5] Input/output error #3376

electriczity opened this issue Sep 6, 2020 · 17 comments · Fixed by #3545
Assignees
Milestone

Comments

@electriczity
Copy link

electriczity commented Sep 6, 2020

Hi, I used Adafruit Feather STM32F405 Express board with CircuitPython 6.0.0-alpha.3.

When I read continually range from the VL53L0X sensor, then after while this error occurs:

Traceback (most recent call last):
File "code.py", line 54, in
File "/lib/adafruit_vl53l0x.py", line 600, in getRange
File "/lib/adafruit_vl53l0x.py", line 324, in _read_u8
File "/lib/adafruit_vl53l0x.py", line 323, in _read_u8
File "adafruit_bus_device/i2c_device.py", line 102, in write
OSError: [Errno 5] Input/output error

Continually means the loop contains the starting of range-finding for 6 sensors without delay, then I read the range from them, some delay ~100ms, and the loop starts again. Each sensor have a different address, and they are on the same I2C bus.

Note: I use .py version of the sensor library because of splitting "range" function into "start range-finding" and "readout the range". The same error appears when using a stable version of the VL53L0X library compiled or not.

I think this will happen with any other I2C device when someone tries to read out data quickly.

Simmilar problem: issues 2635 edit: maybe only visually

@tannewt tannewt added this to the Long term milestone Sep 8, 2020
@tannewt
Copy link
Member

tannewt commented Sep 8, 2020

Could you please post your code.py so we might be able to reproduce?

I suspect the STM HAL is returning a more useful error code to us. Are you interested in helping debug this? Thanks!

@electriczity
Copy link
Author

electriczity commented Sep 9, 2020

Hi, use the example code:

https://github.com/adafruit/Adafruit_CircuitPython_VL53L0X/blob/master/examples/vl53l0x_simpletest.py

set the delay at the end of loop to 0.1 sec, or lower. This should be enough, and the error should manifest itself in a reasonable time. Try code for a few minutes.

Because of this I2C bug I can not use Circuitpython. I use different development environment now, so I can not test this issue again to provide a small example. If issue will not be reproducible with "simpletest" example with small delay, then I will try to provide the example of this issue.

@dhalbert
Copy link
Collaborator

dhalbert commented Sep 9, 2020

Just to know: have you tried this on any other CircuitPython boards? The #2635 bug is peculiar to the I2C device, as you saw.

@electriczity
Copy link
Author

electriczity commented Sep 9, 2020

@dhalbert No, I do not have any other Circuitpython board.

edit: At first glance, the #2635 problem looked similar, but it probably not.

@kevinjwalters
Copy link

I'm not sure if it's the same thing, but my first venture into i2c also involved this error message and sensor but on a different board. In my case I had removed a 100ms pause when translating the code and that seemed to trigger it after a while. Gemma M0 gives OSError: 5 on i2c.writeto to tinyLiDAR. Perhaps it's worth verifying the VL53L0X isn't in one of the slower ranging modes?

@electriczity
Copy link
Author

electriczity commented Sep 10, 2020

@kevinjwalters I used default settings. The distance measurement took approx. 35ms (I can't remember exactly). So, the 100ms delay should be perfectly fine. Wrong assumption. Reading of property sensor.range has all necessary delays inside. So, the user can use any delay, and the program must work.

@electriczity
Copy link
Author

electriczity commented Sep 10, 2020

I found some time. The simplest code is:

import time
import board
import busio
import adafruit_vl53l0x


i2c = busio.I2C(board.SCL, board.SDA)
vl53 = adafruit_vl53l0x.VL53L0X(i2c)

while True:
    print("Range: {0} mm".format(vl53.range))
    time.sleep(0.1)

serial output:
log.txt

Sometimes it takes longer. With shorter time.sleep(...), the error pops up in a shorter time.
For example, with time.sleep(0.05) it take about 10 seconds for an error to pop up.

@mdroberts1243
Copy link

mdroberts1243 commented Sep 27, 2020

I can confirm this problem is with the STM32F405 Feather and 6.0 alpha (latest). I can reproduce the problem quite easily with the AS7341 sensor and its simpletest. Reduce the sleep time to have the problem manifest quicker. It is even easier to reproduce the problem using the SH1107 quirk and the SH1107 128x64 OLED FeatherWing. When using the OLED I just move a lable around on the screen pseudo randomly to quicken the

I've seen I/O errors on write and write_then_readinto.

I have not been able to reproduce the problem with the Grand Central M4 Express and CP 6.0 alpha, successfully running the SH1107 display all night long (Playing the game-of-life).

I could not reproduce the problem with "Adafruit CircuitPython 5.3.1 on 2020-07-13; Feather STM32F405 Express with STM32F405RG". Using the AS7341 simpletest running all night long.

So it seems to be specific to the STM Feather and the latest CP 6.0 alpha.

@DavePutz
Copy link
Collaborator

DavePutz commented Oct 4, 2020

I did some testing, and it looks like this error is happening due to some real inconsistencies in the return values from HAL_GetTick(). The errors I am seeing are caused by timeouts in I2C_WaitOnBTFFlagUntilTimeout(). Instrumenting that function in st_driver/STM32F4xx_HAL_Driver/Src/stm32f4xx_hal_i2c.c I see times when the the value from HAL_GetTick() is less than the starting ticks (and it's not due to overflow, as it is a 32-bit int and the values were far below the max). Other times I got values roughly 10,000 ticks larger than the start; but calling HAL_GetTick()
again when we returned in common_hal_busio_i2c_write() showed smaller (and valid) values. Right now I do not have a good explanation for this, unless we are affecting the clock somewhere else in the code.

@hierophect
Copy link
Collaborator

hierophect commented Oct 5, 2020

I don't quite know how I missed this issue, but I have a possible idea of where it could be. We need to override the HAL_GetTick() system functions in order to return time properly with low power code, and this replacement function actually performs slightly differently directly after startup vs after the clocks have been initialized. Immediately after startup, HAL_GetTick must use the SystemTick, or it is impossible to correctly initialize the Low Speed External oscillator (LSE), which is required to actually use the RTC to acquire non-systick system time. After the LSE is initialized, we switch to RTC timekeeping and disable the Systick so that the low power code can take effect. However, at this tradeoff, the numbers returned by these two functions will not be consistent.

It's possible something else in the core is messing with the Systick or some other element of this system. I can hop on a debugger this week and see if I can replicate this issue.

@hierophect hierophect self-assigned this Oct 5, 2020
@DavePutz
Copy link
Collaborator

DavePutz commented Oct 5, 2020 via email

@DavePutz
Copy link
Collaborator

DavePutz commented Oct 7, 2020

This is not only an I2C problem. The issue is in port_get_raw_ticks(), which can return a value less than the previous value.
This simple Python script can demonstrate using monotonic_ns, which directly calls port_get_raw_ticks() :

import time

a=time.monotonic_ns()
while True:
    b=time.monotonic_ns()
    if b<a:
        print(f"Got {b} was {a}")
    a=b
 

This does not reproduce the issue as quickly as the I2C bus does, but I get output like
Got 11268000061036 was 11268976562500
The difference of 976,501,464, when divided by the 976562 factor in common_hal_time_monotonic_ns(), comes out
to right about 1 second. This matches the instrumentation I put in HAL_GetTick() that showed values being returned
that were 1024 millseconds less than previous values.
Reading the ST documentation at https://www.st.com/resource/en/application_note/dm00226326-using-the-hardware-realtime-clock-rtc-and-the-tamper-management-unit-tamp-with-stm32-microcontrollers-stmicroelectronics.pdf
I see that it talks on page 55 about needing to clear the RSF bit, and then wait until it is
set again before reading the calendar registers. @tannewt - is this perhaps what we are seeing?

@jepler
Copy link

jepler commented Oct 8, 2020

The reference manual 26.3.6 has a number of caveats about how the RTC must be read for the content of the 3 registers to be synchronized. Maybe we're "[making] read accesses to the calendar in a time interval smaller than 2 RTCCLK periods"? It does seem that "BYPSHAD control bit is cleared" (it's the power on value and I don't see us changing it)

@tannewt
Copy link
Member

tannewt commented Oct 8, 2020

@DavePutz @jepler Yup, that seems like it could be the issue.

@mdroberts1243
Copy link

mdroberts1243 commented Oct 13, 2020

The game of life has gone through several generations on SH1107 OLED with the STM RTC fix. Much much further than I've ever seen before. This fix seems to fix the I2C issues I've been seeing.

@veng1
Copy link

veng1 commented Oct 13, 2020

I'm seeing a similar error on a PyPortal where upon start up, an I2C UVC sensor may or may not work. And it appear random. It has worked for hours at a time and then just refuse to start with a "Value Error: No I2C device at address:39"

Which brings up a question: How does one decide if everything is up to date? It seems like a daily reload of CP and all libraries shouldn't be required every time there might be a suspicion that the libraries are out of date. Is there tool that can look at a CP disk and advise if anything needs an update?

@tannewt
Copy link
Member

tannewt commented Oct 13, 2020

@veng1 I suspect the PyPortal is unrelated to this because it is a SAMD51. Check out circup for updating: https://github.com/adafruit/circup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants