
modbus component fails to deal with TCP timeout #99257

@frodef

Description

The problem

I have a Modbus TCP device, and HA does simple polling of some of its registers. This works well until, once every few hours, the device fails to answer a request in time (for whatever reason). The timed-out request itself is not the problem. The problem is that the modbus stack then enters a state from which it never recovers: every subsequent request also fails, even though the Modbus interface itself keeps operating as it should. In fact, no further requests appear to be sent to the Modbus interface at all.

When this bug strikes, the modbus entities stop updating, and a single debug message is logged on every update. No proper error seems to be signalled. The debug message is:

    2023-08-29 08:55:58.170 DEBUG (SyncWorker_1) [homeassistant.components.modbus.modbus] Pymodbus: hub1: Modbus Error: [Connection] Failed to connect[ModbusTcpClient(192.168.1.186:8899)]

If I restart the HA process, or use other software to talk to the Modbus interface, everything works perfectly. tcpdump also confirms that, once the bug strikes, no more requests are sent over the wire to the Modbus interface.

Debugging this has been difficult, because it can take many hours for the error condition to occur and trigger the bug. However, I can artificially inject an error that I believe triggers the bug by modifying the recv method in pymodbus/client/tcp.py, around line 209, as follows:

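    self.foo -= 1  # fault-injection countdown, see initialization below
    while recv_size > 0:
        try:
            wait = [self.socket]
            if self.foo <= 0:
                # Watch no sockets, so select() just waits out the
                # timeout, as if the device never answered.
                wait = []
                self.foo = 1000  # push the next injected fault far away
            ready = select.select(wait, [], [], end - time_)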

Beforehand (e.g. in the constructor), self.foo is initialized to 10 or so. The idea is to let HA settle and then force a single "no response" timeout from select. The request during which the fault is injected obviously fails; the bug is that all subsequent requests fail as well.
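For reproducing the same kind of fault without patching pymodbus or having a device at hand, here is a minimal standalone sketch of the injection trick. It is only an approximation: FaultyReceiver and countdown are hypothetical names, not part of pymodbus, a socketpair stands in for the TCP connection, and it assumes a POSIX system, where select() with empty lists simply waits out the timeout (on Windows that call raises an error).

```python
import select
import socket


class FaultyReceiver:
    """Hypothetical helper: receives from a socket with a select() timeout
    and deliberately forces one timeout once `countdown` reaches zero."""

    def __init__(self, sock: socket.socket, countdown: int = 10, timeout: float = 0.2):
        self.sock = sock
        self.countdown = countdown
        self.timeout = timeout

    def recv(self, size: int) -> bytes:
        self.countdown -= 1
        wait = [self.sock]
        if self.countdown <= 0:
            # Watch no sockets: select() just sleeps until the timeout,
            # as if the device had not answered (POSIX behaviour).
            wait = []
            self.countdown = 1000  # push the next injected fault far away
        ready, _, _ = select.select(wait, [], [], self.timeout)
        if not ready:
            return b""  # timeout: treated as "no response"
        return self.sock.recv(size)


if __name__ == "__main__":
    client_side, device_side = socket.socketpair()
    receiver = FaultyReceiver(client_side, countdown=3, timeout=0.1)
    for _ in range(4):
        device_side.sendall(b"ok")  # the "device" always answers
        print(receiver.recv(2))
    client_side.close()
    device_side.close()
```

With countdown=3, the first two calls return data, the third returns b"" (the injected timeout) and the fourth returns data again, so a well-behaved client should recover on the very next request.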

As far as I can tell, the TCP socket is closed in pymodbus/transaction.py:251, and no attempt is ever made to reopen it. I don't know whether reopening the socket should be triggered from inside pymodbus or whether the HA modbus component is supposed to act on this.
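To make the suspected failure mode concrete, the toy model below contrasts a poll loop that never reopens a closed connection with one that reconnects before each request. Everything here is hypothetical (FakeDeviceLink, poll and the reconnect flag are not pymodbus or HA APIs), and it simply assumes the behaviour described above, namely that the socket is closed after a timeout. It does not answer where such a reconnect belongs, which is the open question.

```python
class FakeDeviceLink:
    """Hypothetical stand-in for a TCP client: tracks whether the socket is
    open and how many requests actually reached the (fake) wire."""

    def __init__(self) -> None:
        self.open = False
        self.sent = 0
        self.fail_next = False  # set to True to simulate one timeout

    def connect(self) -> bool:
        self.open = True
        return True

    def close(self) -> None:
        self.open = False

    def request(self) -> str:
        if not self.open:
            # Nothing is sent when the socket is closed.
            raise ConnectionError("Failed to connect")
        self.sent += 1
        if self.fail_next:
            self.fail_next = False
            self.close()  # models the socket being closed after a timeout
            raise TimeoutError("no response")
        return "registers"


def poll(link: FakeDeviceLink, reconnect: bool) -> str:
    """One polling cycle; optionally reopen the link if it was closed."""
    if reconnect and not link.open:
        link.connect()
    try:
        return link.request()
    except (ConnectionError, TimeoutError) as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    for reconnect in (False, True):
        link = FakeDeviceLink()
        link.connect()
        link.fail_next = True
        print(reconnect, [poll(link, reconnect) for _ in range(3)], link.sent)
```

Without the reconnect, only one request ever reaches the fake wire and every later poll fails with "Failed to connect", which mirrors the tcpdump observation; with it, polling resumes on the next cycle.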

What version of Home Assistant Core has the issue?

core-2023.9.0.dev0

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Core

Integration causing the issue

modbus

Link to integration documentation on our website

https://www.home-assistant.io/integrations/modbus/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

    2023-08-29 08:55:58.170 DEBUG (SyncWorker_1) [homeassistant.components.modbus.modbus] Pymodbus: hub1: Modbus Error: [Connection] Failed to connect[ModbusTcpClient(192.168.1.186:8899)]

Additional information

No response
