I2C Bus error leaves board unrecoverable without power down #2635
Yeah, this is a thing that happened with CODAL and the LIS3DH. Very odd that this chip in particular gets weirded out sometimes! |
Deferring this to after 5.0.0. It's a very good idea, but we want to vet it against a bunch of I2C devices first, since there may be a few that are pathological. Another discussion: |
In a project using the Microchip PIC32MZ microcontroller, I discovered that the first two silicon versions (v1 and v2) have a hardware bug that causes the I2C to hang occasionally. I dug into the matter and found that the errata sheets list this flaw for v1 and v2, and that it was apparently fixed in v3. After trying all of Microchip's suggestions for clearing the hung I2C from within my program code (and failing), I eventually concluded that the ONLY possible fix was a power reset, which works reliably. I have also discovered that, left alone, the I2C will eventually recover by itself, but that can take many hours. As a solution, I have an Arduino board listening to the data stream coming from the PIC to detect the problem and rectify it. So how do I detect the I2C malfunction? Simple: I've programmed a PIC32 onboard timer interrupt that checks the SCL and SDA pins after every transaction. If both pins are high (idle), the I2C transaction was successful; if either pin is low, we have a malfunction and the integer 1 is appended to the data being sent to the server. The integer is sliced off the data after being checked to see if it's a 1 or 0. This scheme works extremely well but has made me wary of I2C malfunctions from any silicon. I suspect that's what is happening in this particular circumstance, and this is how you fix it. |
I forgot to mention that the Arduino board I mentioned in my previous comment energizes a relay to power reset the PIC32MZ board. The power is connected through the relay's NC contact which is opened for a few seconds to cause the power reset. |
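The flag scheme described above (append a 1 or 0 to the outgoing data, then slice it off and check it on the receiving side) could be sketched roughly like this. The function names and the list-based frame are assumptions made for illustration; the commenter's actual PIC32 and Arduino code is not shown in the thread.

```python
def append_status(readings, bus_ok):
    """Return the readings with a trailing flag: 0 = bus idle, 1 = bus stuck.

    Hypothetical sender-side helper; the frame is modelled as a plain list.
    """
    return list(readings) + [0 if bus_ok else 1]


def split_status(frame):
    """Slice the trailing flag off a frame.

    Returns (readings, needs_power_cycle). Hypothetical receiver-side helper.
    """
    if not frame:
        raise ValueError("empty frame")
    *readings, flag = frame
    if flag not in (0, 1):
        raise ValueError("unexpected status flag: %r" % (flag,))
    return readings, flag == 1
```

The receiver would act on `needs_power_cycle` by opening the relay for a few seconds, as described in the comment above.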
I just got this by pressing reset button on a CLUE (alpha) then trying to load
|
Another example of a CLUE in forums being discussed with @caternuson: Adafruit Forums: clue_display_sensor_data.py not working. Reading from the accelerometer (LSM6DS33) could be implicated. Same
|
Why does the stack trace show
If that's from this code,
the Presumably CP handles This does not look like micropython#2056 btw. |
@kevinjwalters yes, it seems to be an implementation detail of micropython/circuitpython that a 'with' statement creates an extra line in a traceback. I don't think anything about that repeated line in the traceback is important to the problem at hand. |
Got another case of something similar here, copied some update
It recovered after a power-cycle (unplugging USB). |
I've got some code reading the LIS3MDL magnetometer (only) frequently (around 1000 samples a second), and it has been behaving a bit strangely at times with the value freezing, as noted in adafruit/Adafruit_CircuitPython_LIS3MDL#4. It's now got worse: the code hasn't run despite a few control-c presses and reloads, and it is stuck here each time:
This has restored it to a working state:
|
As I commented a while back, an I2C error is usually caused by a transaction not returning to the IDLE state with BOTH Clock and Data in a high state. If your I2C circuit gets hung or “freezes”, get out your multimeter and see if the two lines are BOTH in a high state (IDLE) or if one is high and other is low (not in IDLE). This is often an error at the silicon level of the microcontroller or sensor component and CANNOT be fixed with any type of soft reset — no Control-C or software twiddling will return the I2C circuit to an IDLE state — ONLY A HARD POWER REBOOT of the microcontroller and/or the sensor will fix this problem because it reinitializes the component at the silicon level.
In a weather station I built a few years ago, a bug in the I2C circuit of a Microchip PIC32 microcontroller randomly caused a couple of sensors not to be read because the PIC I2C circuit would fail to return to IDLE following a transaction. I experimented relentlessly with this until I discovered the above reality which turned out to be a documented silicon bug for that generation of microcontroller which, I believe, is now fixed.
It’s easy to have the microcontroller read the state of the I2C circuit after each transaction to check if the I2C circuit is in an IDLE state and to signal to the upstream destination that the PIC needs a reboot. The upstream device (a single board computer) would then open a relay contact to break the power to the PIC for a couple of seconds — voila! — I2C freeze gone. Still works reliably to this day, a few years later.
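A minimal sketch of that check-then-power-cycle logic follows. It assumes the two lines can be sampled through caller-supplied functions and that the relay is driven by two callbacks; all names and the 2-second default are illustrative. In the setup described above, the check runs on the PIC and the relay is driven by a separate upstream device, which this sketch collapses into one function for brevity.

```python
import time


def bus_is_idle(read_scl, read_sda):
    """True only when both lines read high, which is the I2C idle state."""
    return bool(read_scl()) and bool(read_sda())


def check_after_transaction(read_scl, read_sda, cut_power, restore_power,
                            off_seconds=2.0, sleep=time.sleep):
    """Power-cycle the device if the bus did not return to idle.

    read_scl/read_sda: hypothetical callables returning the pin levels.
    cut_power/restore_power: hypothetical callables that open and close
    the relay contact. Returns True if a power cycle was performed.
    """
    if bus_is_idle(read_scl, read_sda):
        return False
    cut_power()
    sleep(off_seconds)
    restore_power()
    return True
```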
… On Oct 22, 2020, at 11:15 AM, kevinjwalters ***@***.***> wrote:
I've got some code reading the LIS3MDL magnetometer (only) frequently (around 1000 samples a second), and it has been behaving a bit strangely at times with the value freezing, as noted in adafruit/Adafruit_CircuitPython_LIS3MDL#4.
It's now got worse and code hasn't run despite a few control-c and reloads and is stuck here each time:
Adafruit CircuitPython 5.3.1 on 2020-07-13; Adafruit CLUE nRF52840 Express with nRF52840
>>>
soft reboot
Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
Traceback (most recent call last):
File "code.py", line 157, in <module>
File "adafruit_lis3mdl.py", line 229, in __init__
File "adafruit_bus_device/i2c_device.py", line 68, in __init__
File "adafruit_bus_device/i2c_device.py", line 170, in __probe_for_device
KeyboardInterrupt:
|
@creston-bob There is a soft solution. An I2C slave is simply a set of shift registers with a state machine. By adding clocks until SDA goes high, the device can be recovered, and typically no data is lost. With the pseudocode below I can typically recover the bus in under 1 ms without a reset; it sometimes takes longer and depends somewhat on the device and the size of the transactions it allows. This code is being used for an LSM6DSO XL/Gyro
|
That’s nice if you have I2C circuitry that is accessible from machine code or higher … no doubt this approach you have will work just fine for some microcontrollers but it won’t work universally with all. In the Microchip case I encountered they had posted several software solutions that might work to overcome the problem but the PIC32 chip I was using did NOT respond to ANY of the proposed solutions. That’s why they eventually posted a silicon error for that chip version that could only be fixed via a power reset. The silicon WAS fixed in the next version of the chip. Microcode within a chip can rarely overcome fundamental malfunctions in the underlying silicon. If the silicon is locked up in an error state it will most likely have to be rebooted (hard reset, not soft reset) to get back to normal functionality.
So the answer is to try everything possible in your program, including the sample code in this thread, to recover but a hard reset might be the only solution. Or changing to a different microcontroller part number that doesn’t have the problem. ;-)
… On Oct 23, 2020, at 10:59 AM, Mike Mitchell ***@***.***> wrote:
@creston-bob <https://github.com/creston-bob> There is a soft solution. An I2C slave is simply a set of shift registers with a state machine. By adding clocks until the SDA goes high, the device can be recovered, and typically no data is lost.
With the pseudocode below I can typically recover the bus in under 1 ms without a reset; it sometimes takes longer and depends somewhat on the device and the size of the transactions it allows. This code is being used for an LSM6DSO XL/Gyro
// Set SCL to GPIO out, open collector
// Set SDA to GPIO in
solved = ReadPin(SDA);   // SDA already high means the bus is free
for (tries = 0; tries <= 11 && !solved; tries++)
{
    for (clocks = 1; clocks < 28 && !solved; clocks++)
    {
        // One clock pulse on SCL: low, then high
        WritePin(SCL, LOW);  Delay(25-100 us);
        WritePin(SCL, HIGH); Delay(25-100 us);
        solved = ReadPin(SDA);
    }
}
// Set SCL/SDA back to the I2C peripheral
// Reset the I2C peripheral
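For readers who want to try the idea without hardware, here is a rough Python rendering of the same loop, run against a toy model of a stuck slave. `StuckSlave`, `recover_bus`, and the callable-based pin access are assumptions made for this sketch; on a real board `pulse_scl` would drive SCL low and then high as a GPIO with the 25-100 us delays from the pseudocode, and the I2C peripheral would be reattached and reset afterwards.

```python
class StuckSlave:
    """Toy model: a slave stuck mid-transfer holds SDA low until it has
    been clocked through its remaining bits."""

    def __init__(self, bits_remaining):
        self.bits_remaining = bits_remaining

    def clock_pulse(self):
        # Each SCL pulse shifts the slave one bit closer to releasing SDA.
        if self.bits_remaining > 0:
            self.bits_remaining -= 1

    def sda_high(self):
        return self.bits_remaining == 0


def recover_bus(pulse_scl, read_sda, max_tries=12, clocks_per_try=27):
    """Pulse SCL until SDA reads high.

    Returns the number of pulses sent, or None if SDA never released.
    The default limits mirror the loop bounds in the pseudocode above.
    """
    pulses = 0
    if read_sda():
        return pulses
    for _ in range(max_tries):
        for _ in range(clocks_per_try):
            pulse_scl()
            pulses += 1
            if read_sda():
                return pulses
    return None
```

A `None` result would correspond to the case discussed elsewhere in this thread, where only a power cycle helps.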
|
We are planning to try to detect I2C bus hangups at a low level and do a toggling forced reset as necessary. Some of this could be done in a port-independent way, but some of the timeouts need to be done in the low-level drivers for each port. In some cases we have to modify the manufacturer-supplied libraries. See #2635 (comment). |
You shouldn’t have any problem detecting a bus hangup … as I mentioned, you can do that with a multimeter … but the trick is how to get the I2C port reset back to an IDLE state. Some microcontrollers will likely be easy with a reset signal (or your code-level example), others not so easy. In the mentioned PIC32 case, there simply wasn’t any form of reset signal that overcame the hangup of the I2C silicon circuitry. Obviously, that’s a rare case but it was interesting and instructive. ;-)
Since I2C takes two to tango, the slave device can also cause a hangup in some cases. Recovering an IDLE condition at the microcontroller may, or may not, recover the sensor’s functionality, it depends on its internal design.
Just some thoughts, hope they’ll be helpful. ;-)
|
Just thought I'd let you know that the LSM6DSO has both I2C and SPI support and I am using it with SPI. I still get the hang. I suspect it happens if my code is at some stage of communicating with the chip while being OTA upgraded causing a reset of the MCU or being reprogrammed with a debugger at the "wrong time". I have not been successful in recovering without interrupting the LSM6DSO power as it does not have a reset pin. |
@kyrreaa This should not be the same with SPI. An SPI device's bus state is reset every time CS transitions, so as long as you are toggling CS, it should be OK. I2C devices get stuck in a state because they only have the two wires. The only way to fix it is to clock them back into idle, with an unknown number of clock cycles. |
For problematic devices that can hang, it's good to be able to power-cycle them. We have controllable I2C power on a number of boards, for power-saving reasons. Or, if the device has fairly low power consumption, you could power it from a GPIO pin. |
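A minimal sketch of such a power cycle, assuming the device's supply is switched by an object with a boolean `value` attribute (for example a CircuitPython `digitalio.DigitalInOut` configured as an output). The function name and the timings are placeholders, not values taken from any datasheet.

```python
import time


def power_cycle(power_pin, off_seconds=0.5, settle_seconds=0.1, sleep=time.sleep):
    """Switch the device's supply off, wait, switch it back on, and let it settle.

    power_pin: any object with a writable boolean `value` attribute
    (assumption: True means powered).
    """
    power_pin.value = False
    sleep(off_seconds)
    power_pin.value = True
    sleep(settle_seconds)
```

The bus object would typically need to be re-initialized after the device comes back up.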
Normally I'd agree with you @Panometric, but real-world experience has taught me differently. This is also why I have a transistor-controlled supply to some SPI or I2C devices that lack a reset pin (@dhalbert). On I2C it is even harder, as the devices can be back-powered through the I2C pull-ups, which makes it very annoying. It would be interesting to narrow down exactly when some devices hang like this, but that requires a lot of time, and time is usually not available in abundance. |
Was this ever implemented in CircuitPython? I am seeing a stuck bus issue with I2C using: In a very specific situation, the clock line gets stuck low. In my case, I am trying to initialize an SHT40 sensor, but no sensor is present on the bus.
This causes the code to crash next time I attempt to use the bus
I can fix this specific error by trying to write again to the bus:
This code does not actually seem to send a byte of data on the bus as I expected it would, but it does fix the problem in this case. The fix seems very specific, though, and a more general one that detects a stuck bus and automatically corrects it would be great. I am pretty sure that a general fix would have to happen at a level below the Python code. |
I think I have a similar issue. I am trying to use the focaltouch library with my M5 CoreS3 device. It works for a few seconds and then: OSError: [Errno 116] ETIMEDOUT. I have tried modifying the library's original code and can improve things a bit by repeating reads and the like, but I can't get it to work properly. It fails mainly when you try to read more than 6 bytes in a row. After some consecutive failures the feedback changes and says that there is a problem with the pull-up resistors. |
@gedeondt Please try CircuitPython 9.1.0-beta.1 if you have not already. Espressif has fixed some ESP32-S3 I2C bugs. There is another fixed bug in the works but it is not yet backported to any ESP-IDF releases. |
Thanks @dhalbert. I am using the 9.1.0-beta. OK, so I will wait for the fix to be backported. I was wondering if it could be a hardware malfunction, but the demo app that came installed worked perfectly with the touchscreen, so I guess it is not. |
In my example, in a loop reading the device as fast as possible, and doing other unrelated things..
You can get an error:
Once this occurs, the I2C device bus is stuck because the device is holding the bus. The board never recovers unless you power it down. Every restart just issues a RuntimeError. This could happen on any I2C device.
The industry-standard response to this is to bit-bang single clocks onto SCLK until the slave releases the bus. This should be done at start, or even on error, to recover the bus silently. It usually requires disconnecting the peripheral temporarily, using SCLK as GPIO, and then reconnecting.
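As a rough illustration of "at start, or even on error", a wrapper along these lines could call a recovery routine before the first attempt and again after a failed one. `run_with_recovery`, its parameters, and the choice of exception types are assumptions for this sketch, not an existing CircuitPython API; `recover_bus` stands for whatever routine detaches the peripheral, clocks SCLK as GPIO, and reattaches it.

```python
def run_with_recovery(operation, recover_bus, retries=1,
                      errors=(OSError, RuntimeError)):
    """Run an I2C operation, recovering the bus at start and after errors.

    operation: hypothetical callable performing the I2C transaction.
    recover_bus: hypothetical callable that clocks the bus back to idle.
    Re-raises the last error once the retries are used up.
    """
    recover_bus()  # recover once at start, before touching the bus
    for attempt in range(retries + 1):
        try:
            return operation()
        except errors:
            if attempt == retries:
                raise
            recover_bus()  # recover silently, then try again
```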