This repository was archived by the owner on Oct 23, 2022. It is now read-only.

Flaky multi-node tests #134

@koivunej


Running the pubsub tests repeatedly like:

while target/debug/deps/pubsub-c48a8707038c37a4 publish_between_two_nodes; do echo; echo '...'; echo; done

(Your executable name will vary; run cargo test -- publish_between_two_nodes to find out.)
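The loop above stops at the first failure but leaves the output of every run on the terminal. A minimal sketch of a variant that keeps only the failing run's output in a file could look like this. The function name `run_until_failure`, its arguments and the log file name are all hypothetical, not something from this repository.

```shell
# Hypothetical helper: rerun a command until it fails, keeping only the
# output of the failing run.
# Usage: run_until_failure LOGFILE MAX_RUNS CMD [ARGS...]
run_until_failure() {
    local log="$1"
    local max_runs="$2"
    local i=1
    shift 2
    while [ "$i" -le "$max_runs" ]; do
        # Overwrite the log on every run so only the latest output survives.
        if ! "$@" >"$log" 2>&1; then
            echo "run $i failed, output kept in $log"
            return 1
        fi
        i=$((i + 1))
    done
    # Every run passed: nothing worth keeping.
    rm -f "$log"
    echo "no failure in $max_runs runs"
    return 0
}
```

It would be invoked along the lines of `run_until_failure failure.log 1000 target/debug/deps/pubsub-c48a8707038c37a4 publish_between_two_nodes`, with trace logging enabled however the test binary expects it. Running one copy per shell with a different log file name keeps the failures from separate shells apart.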

With all logging at trace level, this produces at least two kinds of failures. Errors seen:

  • [2020-04-02T09:41:35Z DEBUG libp2p_tcp] Dropped TCP connection to undeterminate peer (seen when the 10s timeout started)
  • [2020-04-02T09:41:08Z DEBUG libp2p_secio] error during secio handshake IoError(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" }) (seen even after sending pubsub RPCs back and forth, but before the 10s timeout started)

In a highly unscientific "let's run these in multiple shells until they fail, then inspect the reason" experiment, both seem to happen about equally often. I think I'll just reference this comment in a tracking issue; this might already be handled better by the just-released libp2p 0.17. Making the test resilient to disconnects is one possibility, but I am used to running tests on Linux and Windows over loopback without much of an issue. Then again, my systems have probably never been as loaded as a public CI server.
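To make the "which failure happens how often" comparison slightly less unscientific, the failing logs could be tallied by the two messages quoted above. This is only a sketch: it assumes each failed run's output was saved to its own file, and the function name `tally_failures` is hypothetical.

```shell
# Hypothetical helper: classify saved failure logs by which of the two
# messages quoted in this issue they contain.
# Usage: tally_failures LOGFILE...
tally_failures() {
    local dropped=0 reset=0 other=0 f
    for f in "$@"; do
        # First match wins: a log containing both messages is counted
        # under the TCP drop only.
        if grep -q 'Dropped TCP connection to undeterminate peer' "$f"; then
            dropped=$((dropped + 1))
        elif grep -q 'error during secio handshake' "$f"; then
            reset=$((reset + 1))
        else
            other=$((other + 1))
        fi
    done
    echo "tcp_dropped=$dropped secio_handshake=$reset other=$other"
}
```

Called as `tally_failures failure-*.log`, it prints one summary line. The "other" bucket catches failures that match neither message and still need inspecting by hand.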

Tests other than publish_between_two_nodes can also fail, but I didn't recover any information on those: 5 tests were executed in parallel and only one of them turned logging on, so treat this as a small data point.

Originally posted by @koivunej in #133 (comment)


Labels: CI, bug
