
I2C Bus error leaves board unrecoverable without power down #2635


Open
Panometric opened this issue Jan 14, 2020 · 23 comments

@Panometric

In my example, a loop reads the device as fast as possible while doing other unrelated things:

while True:
    acceleration = cpx.acceleration
    ...

You can get an error:


Traceback (most recent call last):
  File "code.py", line 63, in <module>
  File "adafruit_circuitplayground/circuit_playground_base.py", line 261, in acceleration
  File "adafruit_lis3dh.py", line 159, in acceleration
  File "adafruit_lis3dh.py", line 328, in _read_register
  File "adafruit_lis3dh.py", line 327, in _read_register
  File "adafruit_bus_device/i2c_device.py", line 82, in readinto
OSError: [Errno 5] Input/output error

Press any key to enter the REPL. Use CTRL-D to reload.soft reboot

Once this occurs, the I2C device bus is stuck because the device is holding the bus. The board never recovers unless you power it down. Every restart just issues a RuntimeError. This could happen on any I2C device.


Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 3, in <module>
  File "adafruit_circuitplayground/__init__.py", line 29, in <module>
  File "adafruit_circuitplayground/express.py", line 75, in <module>
  File "adafruit_circuitplayground/express.py", line 72, in __init__
  File "adafruit_circuitplayground/circuit_playground_base.py", line 110, in __init__
RuntimeError: SDA or SCL needs a pull up

The industry-standard response to this is to bit-bang single clocks onto SCL until the slave releases the bus. This should be done at startup, or even on error, to recover the bus silently. It usually requires temporarily detaching the pins from the I2C peripheral, driving SCL as a GPIO, and then reconnecting.

@ladyada
Member

ladyada commented Jan 14, 2020

yeah this is a thing that happened with CODAL & the LIS3DH. very odd that this chip is particularly weirded out sometimes!
@dhalbert see email thread "I2C lockup" with peli and folks

@dhalbert dhalbert transferred this issue from adafruit/Adafruit_CircuitPython_BusDevice Feb 19, 2020
@dhalbert dhalbert added this to the 5.x.0 - Features milestone Feb 19, 2020
@dhalbert dhalbert modified the milestones: 5.x.0 - Features, 5.0.0 Feb 19, 2020
@dhalbert
Collaborator

Deferring this to after 5.0.0. It's a very good idea, but we want to vet it against a bunch of I2C devices first, since there may be a few that are pathological.

CODAL fix:
https://github.com/lancaster-university/codal-samd/blob/cplay_master_i2c_hack/src/ZI2C.cpp#L10

Another discussion:
https://www.raspberrypi.org/forums/viewtopic.php?t=241491

@dhalbert dhalbert mentioned this issue Feb 20, 2020
@creston-bob

In a project using the Microchip PIC32MZ microcontroller, I discovered that the first two silicon revisions (v1 and v2) have a hardware bug that causes the I2C to hang occasionally. The errata sheets indicated that v1 and v2 had this flaw, but it was apparently fixed in v3. After trying all of Microchip's suggestions for fixing the hung I2C from within my program code (and failing), I eventually concluded that the ONLY possible fix was a power reset, which works reliably. I have also found that, left alone, the I2C will eventually recover by itself, but that can take many hours.

As a solution, I have an Arduino board listening to the data stream coming from the PIC to detect the problem and rectify it. So how do I detect the I2C malfunction? Simple: I've programmed a PIC32 onboard timer interrupt that checks the SCL and SDA pins after every transaction. If both pins are high (idle), the I2C transaction was successful; if either pin is low, we have a malfunction, and the integer 1 is appended to the data being sent to the server. The integer is sliced off the data after being checked to see if it's a 1 or 0.

This scheme works extremely well but has made me wary of I2C malfunctions from any silicon. I suspect that's what is happening in this particular circumstance, and this is how you fix it.
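The pin check described above could be sketched in Python roughly as follows. This is an illustration under assumptions, not the PIC32 code from the comment: `bus_is_idle`, `tag_sample`, `read_scl` and `read_sda` are hypothetical names, and the two readers are caller-supplied callables that return the current pin level.

```python
def bus_is_idle(read_scl, read_sda):
    """Return True when both I2C lines read high, which is the idle state."""
    return bool(read_scl()) and bool(read_sda())


def tag_sample(sample, read_scl, read_sda):
    """Pair a sample with a fault flag: 0 if the bus is idle, 1 if a line is held low.

    The (sample, flag) tuple stands in for the integer that the comment
    describes appending to the data stream; the real framing is not shown there.
    """
    fault = 0 if bus_is_idle(read_scl, read_sda) else 1
    return (sample, fault)
```

On CircuitPython the pins are normally claimed by the I2C object, so reading them as plain GPIO would likely mean releasing the bus first; that step is left out of this sketch.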

@creston-bob

I forgot to mention that the Arduino board I mentioned in my previous comment energizes a relay to power reset the PIC32MZ board. The power is connected through the relay's NC contact which is opened for a few seconds to cause the power reset.

@kevinjwalters

I just got this by pressing reset button on a CLUE (alpha) then trying to load clue object:

Adafruit CircuitPython 5.0.0-rc.0 on 2020-02-26; Adafruit CLUE nRF52840 Express with nRF52840
>>>
>>>
>>>
>>> import board
>>> from adafruit_clue import clue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "adafruit_clue.py", line 886, in <module>
  File "adafruit_clue.py", line 172, in __init__
RuntimeError: SDA or SCL needs a pull up
>>>
>>>
>>>
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 46, in <module>
  File "adafruit_clue.py", line 886, in <module>
  File "adafruit_clue.py", line 172, in __init__
RuntimeError: SDA or SCL needs a pull up



Press any key to enter the REPL. Use CTRL-D to reload.
Adafruit CircuitPython 5.0.0-rc.0 on 2020-02-26; Adafruit CLUE nRF52840 Express with nRF52840
>>>

@kevinjwalters

kevinjwalters commented Mar 9, 2020

Another example of a CLUE in forums being discussed with @caternuson: Adafruit Forums: clue_display_sensor_data.py not working. Reading from the accelerometer (LSM6DS33) could be implicated. The same RuntimeError appears after the problem:

    Adafruit CircuitPython 5.0.0 on 2020-03-02; Adafruit CLUE nRF52840 Express with nRF52840
    >>> import board
    >>> i2c = board.I2C()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RuntimeError: SDA or SCL needs a pull up

@kevinjwalters

kevinjwalters commented Mar 9, 2020

Why does the stack trace show _read_register twice??

File "adafruit_lis3dh.py", line 328, in _read_register
File "adafruit_lis3dh.py", line 327, in _read_register

If that's from this code,

323    def _read_register(self, register, length):
324        self._buffer[0] = register & 0xFF
325        with self._i2c as i2c:
326            i2c.write(self._buffer, start=0, end=1)
327            i2c.readinto(self._buffer, start=0, end=length)
328            return self._buffer

the return statement is on line 328. Does this result from some sort of optimisation around return statements?

Presumably CP handles return statements within a with block OK?

This does not look like micropython#2056 btw.

@jepler

jepler commented Mar 17, 2020

@kevinjwalters yes, it seems to be an implementation detail of micropython/circuitpython that a 'with' statement creates an extra line in a traceback. I don't think anything about that repeated line in the traceback is important to the problem at hand.

@kevinjwalters

kevinjwalters commented Mar 22, 2020

Got another case of something similar here: I copied some updated .py files onto a CLUE and it seemed to get stuck running the code. Control-C shows this stack trace (tried it three times):

Adafruit CircuitPython 5.0.0 on 2020-03-02; Adafruit CLUE nRF52840 Express with nRF52840
>>>
>>>
>>>
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 47, in <module>
  File "adafruit_clue.py", line 886, in <module>
  File "adafruit_clue.py", line 207, in __init__
  File "adafruit_lsm6ds.py", line 220, in __init__
  File "adafruit_bus_device/i2c_device.py", line 68, in __init__
  File "adafruit_bus_device/i2c_device.py", line 166, in __probe_for_device
KeyboardInterrupt:



Press any key to enter the REPL. Use CTRL-D to reload.

It recovered after a power-cycle (unplugging USB).

@kevinjwalters

kevinjwalters commented Oct 22, 2020

I've got some code that reads the LIS3MDL magnetometer (only) frequently (around 1000 samples a second), and it has been behaving a bit strangely at times, with the value freezing, as noted in adafruit/Adafruit_CircuitPython_LIS3MDL#4

It's now got worse: the code hasn't run despite a few Control-C presses and reloads, and it is stuck here each time:

Adafruit CircuitPython 5.3.1 on 2020-07-13; Adafruit CLUE nRF52840 Express with nRF52840
>>>
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
  File "code.py", line 157, in <module>
  File "adafruit_lis3mdl.py", line 229, in __init__
  File "adafruit_bus_device/i2c_device.py", line 68, in __init__
  File "adafruit_bus_device/i2c_device.py", line 170, in __probe_for_device
KeyboardInterrupt:

This has restored it to a working state:

Adafruit CircuitPython 5.3.1 on 2020-07-13; Adafruit CLUE nRF52840 Express with nRF52840
>>> from microcontroller import reset
>>> reset()
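
A hedged sketch of how the reset workaround above might be automated: try to create the bus, and if that raises the `RuntimeError` seen in this thread, fall back to a reset. `init_bus_or_reset`, `make_bus` and `reset_board` are hypothetical names; on CircuitPython they might be bound to `board.I2C` and `microcontroller.reset`. Whether a reset actually clears the fault depends on the device, as other comments in this thread show.

```python
def init_bus_or_reset(make_bus, reset_board):
    """Create the I2C bus; if that fails with RuntimeError, request a reset.

    make_bus:    callable that returns a bus object (assumed: board.I2C)
    reset_board: callable that resets the MCU (assumed: microcontroller.reset)
    """
    try:
        return make_bus()
    except RuntimeError:
        # For example "SDA or SCL needs a pull up" when a device holds a line low.
        reset_board()
        # A real reset does not return; this only matters for simulated resets.
        return None
```

If the pull-ups really are missing, this would reset in a loop, so a real program would probably want a retry limit kept somewhere that survives the reset.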

@creston-bob

creston-bob commented Oct 22, 2020 via email

@Panometric
Author

@creston-bob There is a soft solution. An I2C slave is simply a set of shift registers with a state machine. By adding clocks until SDA goes high, the device can be recovered, and typically no data is lost.

With the pseudocode below I can typically recover the bus in < 1 ms without a reset. It sometimes takes longer, depending on the device and the size of the transactions it allows. This code is being used for an LSM6DSO accelerometer/gyro.

  // Set SCL to GPIO out, open collector
  // Set SDA to GPIO in
  solved = (SDA is high)
  for (tries = 0; tries <= 11 && !solved; tries++)
  {
      for (clocks = 1; clocks < 28; clocks++)
      {
          // Pulse SCL low, then high
          write SCL low, hold for 25-100 µs
          write SCL high, hold for 25-100 µs

          solved = read SDA pin
          if (solved) break;
      }
      if (solved) break;
  }
  // Set SCL/SDA back to the I2C peripheral
  // Reset the I2C peripheral
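
The pseudocode above could be sketched in Python roughly as follows. This is an illustration under assumptions rather than the code used with the LSM6DSO: `recover_i2c_bus`, `set_scl` and `read_sda` are hypothetical names, the pin access is passed in as callables so that the logic is independent of any particular port, and `StuckDevice` is a toy model used only to exercise the loop.

```python
import time


def recover_i2c_bus(set_scl, read_sda, max_tries=12, clocks_per_try=27,
                    half_period_s=50e-6, sleep=time.sleep):
    """Pulse SCL until SDA is released.

    set_scl(level) drives SCL as an open-drain GPIO and read_sda() returns the
    SDA level; both are supplied by the caller. Returns the number of clock
    pulses sent (0 if the bus was not stuck), or -1 if SDA never went high.
    """
    if read_sda():
        return 0
    pulses = 0
    for _ in range(max_tries):
        for _ in range(clocks_per_try):
            set_scl(False)
            sleep(half_period_s)
            set_scl(True)
            sleep(half_period_s)
            pulses += 1
            if read_sda():
                return pulses
    return -1


class StuckDevice:
    """Toy model: holds SDA low until it has seen a number of rising SCL edges."""

    def __init__(self, edges_needed):
        self.edges_needed = edges_needed
        self.scl = True

    def set_scl(self, level):
        # Count a rising edge only on a low-to-high transition.
        if level and not self.scl and self.edges_needed > 0:
            self.edges_needed -= 1
        self.scl = level

    def read_sda(self):
        return self.edges_needed == 0


device = StuckDevice(edges_needed=9)
pulses_sent = recover_i2c_bus(device.set_scl, device.read_sda)
```

On real hardware the caller would still need to hand the pins back to the I2C peripheral and reset it afterwards, as the last two pseudocode comments say.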


@creston-bob

creston-bob commented Oct 23, 2020 via email

@dhalbert
Collaborator

We are planning to try to detect I2C bus hangups at a low level and do a toggling forced reset as necessary. Some of this could be done in a port-independent way, but some of the timeouts need to be done in the low-level drivers for each port. In some cases we have to modify the manufacturer-supplied libraries. See #2635 (comment).

@creston-bob

creston-bob commented Oct 23, 2020 via email

@kyrreaa

kyrreaa commented Mar 6, 2023

Just thought I'd let you know that the LSM6DSO has both I2C and SPI support and I am using it with SPI. I still get the hang. I suspect it happens if my code is at some stage of communicating with the chip while being OTA upgraded causing a reset of the MCU or being reprogrammed with a debugger at the "wrong time". I have not been successful in recovering without interrupting the LSM6DSO power as it does not have a reset pin.

@Panometric
Author

@kyrreaa This should not be the same with SPI. An SPI device's bus state is reset every time CS transitions, so as long as you are toggling CS, it should be OK. I2C devices get stuck in a state because they only have the two wires. The only way to fix it is to clock them back into idle, with an unknown number of clock cycles.

@dhalbert
Collaborator

dhalbert commented Mar 6, 2023

For problematic devices that can hang, it's good to be able to power-cycle them. We have controllable I2C power on a number of boards, for power-saving reasons. Or, if the device has fairly low power consumption, you could power it from a GPIO pin.
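
A minimal sketch of the power-cycling idea above, assuming the device's supply is switched by something the program controls. `power_cycle` and `set_power` are hypothetical names, and the off and settle times are guesses that would need tuning per device.

```python
import time


def power_cycle(set_power, off_s=1.0, settle_s=0.1, sleep=time.sleep):
    """Switch a device's supply off, wait, switch it back on, and let it start up.

    set_power(on) is a caller-supplied callable that switches the supply.
    """
    set_power(False)
    sleep(off_s)
    set_power(True)
    sleep(settle_s)


# On a CircuitPython board, set_power might be bound to a pin roughly like this
# (board.D5 is a placeholder pin name, not taken from this thread):
#   import board, digitalio
#   pwr = digitalio.DigitalInOut(board.D5)
#   pwr.direction = digitalio.Direction.OUTPUT
#   def set_power(on):
#       pwr.value = on
```

Any driver object for the device would presumably have to be re-created after the cycle, since the device comes back in its power-on state.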

@kyrreaa

kyrreaa commented Mar 6, 2023

Normally I'd agree with you @Panometric, but real-world experience has taught me differently. This is also why I have a transistor-controlled supply to some SPI or I2C devices that lack a reset pin (@dhalbert). On I2C it is even harder, as they can be back-powered through the I2C pull-ups, which makes it very annoying.
Feeding extra clock cycles seems to work for some devices on I2C, but not all, in my experience.
In my case the CS is indeed being controlled, and I have verified this with an oscilloscope. Yet once the device stops responding, it is done.

It would be interesting to narrow down exactly when some devices hang like this, but that requires a lot of time and usually that is not an item available in abundance.

@ilikecake

Was this ever implemented in CircuitPython? I am seeing a stuck bus issue with I2C using:
Adafruit CircuitPython 8.2.8 on 2023-11-16; Adafruit Feather ESP32S3 4MB Flash 2MB PSRAM with ESP32S3

In a very specific situation, the clock line gets stuck low. In my case, I am trying to initialize an SHT40 sensor, but no sensor is present on the bus.

try:
    sht = adafruit_sht4x.SHT4x(i2c)
except:
    print("No SHT40 device detected")

This causes the code to crash the next time I attempt to use the bus:

Traceback (most recent call last):
  File "code.py", line 209, in <module>
  File "i2c_expanders/digital_inout.py", line 55, in switch_to_output
  File "i2c_expanders/digital_inout.py", line 99, in direction
  File "i2c_expanders/PCA9555.py", line 129, in iodir
  File "i2c_expanders/i2c_expander.py", line 80, in _read_u16le
OSError: [Errno 116] ETIMEDOUT

I can fix this specific error by trying to write again to the bus:

while not i2c.try_lock():
    pass
try:
    i2c.writeto(0x00, b"") 
except:
    pass
i2c.unlock()

This code does not actually seem to send a byte of data on the bus, as I expected it would, but it does fix the problem in this case. This fix seems very specific, though, and a more general fix that detects a stuck bus and corrects it automatically would be great. I am pretty sure that a general fix would have to happen at a level below the Python code.

Good I2C transaction: [image]

Bad I2C transaction: [image]

@gedeondt

gedeondt commented Apr 24, 2024

I think I have a similar issue. I'm trying to use the focaltouch library with my M5Stack CoreS3 device. It works for a few seconds and then:

OSError: [Errno 116] ETIMEDOUT

I have tried modifying the library's original code, and I can improve things a bit by repeating reads and the like, but I can't get it to work properly.

It fails mainly when you try to read more than 6 bytes in a row.

After some consecutive failures the error changes and says that there is a problem with the pull-up resistors.
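
The "repeat the read" workaround mentioned above might look roughly like this. It is a generic sketch, not the modified focaltouch code: `read_with_retries` and `read` are hypothetical names, and the attempt count and delay are arbitrary. As the comment notes, retrying only papers over the timeout and does not fix the underlying bus problem.

```python
import time


def read_with_retries(read, attempts=3, delay_s=0.01, sleep=time.sleep):
    """Call read() up to `attempts` times, retrying when it raises OSError.

    Returns the first successful result, or re-raises the last OSError.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return read()
        except OSError as err:  # e.g. [Errno 116] ETIMEDOUT
            last_error = err
            sleep(delay_s)
    raise last_error
```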

@dhalbert
Collaborator

dhalbert commented Apr 24, 2024

@gedeondt Please try CircuitPython 9.1.0-beta.1 if you have not already. Espressif has fixed some ESP32-S3 I2C bugs. There is another bug fix in the works, but it is not yet backported to any ESP-IDF releases.

@gedeondt

Thanks @dhalbert. I am using 9.1.0-beta. OK, so I will wait for the fix to be backported. I was wondering if it could be a hardware malfunction, but the demo app that came installed worked perfectly with the touchscreen, so I guess it is not.
