Description
The problem
I poll a few registers of a Modbus TCP device. This works quite well until, once every few hours, the modbus interface fails to respond to a request in time (for whatever reason). The timed-out request itself is not the problem. The problem is that the modbus stack then seems to enter a state from which it does not recover: every subsequent request also fails, even though the modbus interface itself continues to operate as it should. In fact, it appears that no more requests are actually sent to the modbus interface.
When this bug strikes, the modbus entities no longer update, and one debug log message is emitted on every update. There is no proper error signalled, it seems. Debug message:
2023-08-29 08:55:58.170 DEBUG (SyncWorker_1) [homeassistant.components.modbus.modbus] Pymodbus: hub1: Modbus Error: [Connection] Failed to connect[ModbusTcpClient(192.168.1.186:8899)]
If I restart the HA process, or use other software to interact with the modbus interface, it works perfectly. Also, tcpdump confirms that once the bug strikes, no more requests are sent over the wire to the modbus interface.
I have tried to debug the problem, which was a bit difficult because it can take many hours for the error condition to occur and trigger the bug. However, I can artificially inject an error that I believe triggers the bug by modifying the recv method in pymodbus/client/tcp.py, around line 209, as follows:
    self.foo -= 1
    while recv_size > 0:
        try:
            wait = [self.socket]
            if self.foo <= 0:
                wait = []
                self.foo = 1000
            ready = select.select(wait, [], [], end - time_)
Before this, e.g. in the constructor, self.foo is initialized to 10 or so. The idea is to have HA settle, then force a single "no response" timeout from select. Obviously the modbus request where this fault injection occurs will fail. The bug however is that all subsequent requests also fail.
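For readers who want to see the injection idea in isolation, here is a minimal standalone sketch. It does not touch pymodbus: `FaultInjector` and `wait_readable` are hypothetical names made up for this illustration, and the countdown only loosely mirrors the `self.foo` counter above. It assumes a POSIX system, where `select.select` with three empty lists simply sleeps until the timeout (on Windows that call raises an error instead).

```python
import select
import socket


class FaultInjector:
    """Hypothetical countdown, loosely mirroring the self.foo counter."""

    def __init__(self, first_after=10, then_every=1000):
        self.remaining = first_after
        self.then_every = then_every

    def should_fail(self):
        # Count down on every call; fire once when the counter hits zero,
        # then re-arm with a large value so the fault stays a one-off.
        self.remaining -= 1
        if self.remaining <= 0:
            self.remaining = self.then_every
            return True
        return False


def wait_readable(sock, timeout, injector):
    """Return True if sock became readable within timeout seconds."""
    # With the fault active, select() watches no sockets at all, so it can
    # only time out, even though data is already pending on the socket.
    wait = [] if injector.should_fail() else [sock]
    ready = select.select(wait, [], [], timeout)
    return bool(ready[0])


receiver, sender = socket.socketpair()
sender.sendall(b"\x01")  # one byte stays pending on receiver throughout
injector = FaultInjector(first_after=3)
results = [wait_readable(receiver, 0.1, injector) for _ in range(5)]
receiver.close()
sender.close()
print(results)  # [True, True, False, True, True]
```

Only the third poll reports "no response", and later polls succeed again. The behaviour reported in this issue is the opposite: after the single injected timeout, every later request fails.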
As far as I can tell, the TCP socket is closed in pymodbus/transaction.py:251 and no attempt is ever made to open it again. I don't know whether re-opening the socket (or triggering it) should happen inside pymodbus, or whether the HA modbus component should act on this.
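To make the expected recovery concrete, here is a rough sketch of the behaviour I would expect from whichever layer owns the connection: a socket that was closed after a failure gets re-opened on the next request. It uses only the standard library. `ReconnectingClient` and `EchoHandler` are hypothetical stand-ins, not the pymodbus or Home Assistant API, and the echo server merely plays the role of the modbus interface.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Stand-in for the modbus interface: echoes every request back."""

    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data:
                return
            self.request.sendall(data)


class ReconnectingClient:
    """Hypothetical client that re-opens its socket after a failure."""

    def __init__(self, host, port, timeout=0.5):
        self.host, self.port, self.timeout = host, port, timeout
        self.sock = None

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None

    def request(self, payload):
        # Re-open lazily: a socket closed by an earlier failure is
        # replaced here instead of being left closed forever.
        if self.sock is None:
            self.sock = socket.create_connection(
                (self.host, self.port), self.timeout
            )
        try:
            self.sock.sendall(payload)
            reply = self.sock.recv(1024)
            if not reply:
                raise ConnectionError("peer closed the connection")
            return reply
        except OSError:
            # Timeouts and connection errors are OSError subclasses. Drop
            # the socket so the next request starts from a clean state.
            self.close()
            raise


server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
server.daemon_threads = True
threading.Thread(target=server.serve_forever, daemon=True).start()

client = ReconnectingClient(*server.server_address)
first = client.request(b"ping")
client.close()  # simulate the close that follows a timed-out request
second = client.request(b"ping")  # works because request() reconnects
client.close()
server.shutdown()
server.server_close()
```

Whether this kind of lazy reconnect belongs in pymodbus or in the HA component is exactly the open question. The sketch only shows that a single closed socket does not have to be fatal.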
What version of Home Assistant Core has the issue?
core-2023.9.0.dev0
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Core
Integration causing the issue
modbus
Link to integration documentation on our website
https://www.home-assistant.io/integrations/modbus/
Diagnostics information
No response
Example YAML snippet
No response
Anything in the logs that might be useful for us?
2023-08-29 08:55:58.170 DEBUG (SyncWorker_1) [homeassistant.components.modbus.modbus] Pymodbus: hub1: Modbus Error: [Connection] Failed to connect[ModbusTcpClient(192.168.1.186:8899)]
Additional information
No response