Skip to content

Failed to write data to connection IPv4Address ... #706

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bastienhubert opened this issue Apr 6, 2022 · 5 comments
Closed

Failed to write data to connection IPv4Address ... #706

bastienhubert opened this issue Apr 6, 2022 · 5 comments

Comments

@bastienhubert
Copy link

bastienhubert commented Apr 6, 2022

Hello guys,

I encounter the following problem since a few weeks, each morning this error occurs:

Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X)))

  • Neo4j version: Community 4.3.9

  • Neo4j Mode: Single instance

  • Driver version: python 3.8 - neo4j-driver 4.4.1 - flask 1.1.2 - gunicorn 20.0.4

  • Operating system: Server are hosted on Scalingo with Ubuntu 20.04

  • Steps to reproduce: Do a call to our Flask API to write in the DB.

  • Expected behavior: Call succeeds without any error

  • Actual behavior: Call succeeds but with a previous "Failed to write data" error

Additional info:

  • This error appeared after updating our Database from Neo4j 3.5.2 to Neo4j 4.3.9, and our neo4j-driver from 1.7.2 to 4.4.1

  • This errors occurs each day for the very first writing call in the database.

  • This error is currently non blocking, the connection seems simply lost and the Gunicorn app executes the code correctly after raising this error, here is an example of the logs:

`2022-04-06 10:20:36.186129103 +0200 CEST[web-1] Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X)))

2022-04-06 10:20:36.183971778 +0200 CEST[web-1] 10.0.0.48 - - [06/Apr/2022:08:20:36 +0000] "POST /similarity/profile-async HTTP/1.1" 200 89 "-" "-"

2022-04-06 10:20:36.649252548 +0200 CEST[web-1] 2022-04-06 08:20:36.648783 (+ 1225.547708 sec) refresh_nb_matching_for_all_missions started!

2022-04-06 10:20:41.710179194 +0200 CEST[web-1] 2022-04-06 08:20:41.709843 (+ 5.06106 sec) refresh_nb_matching_for_all_missions done!

2022-04-06 10:20:41.711310984 +0200 CEST[web-1] Matching Profile Done`

  • I already checked a few posts and contacted GrapheneDB (which hosts our Neo4j databases) and tried a few solutions:
  1. Checking if there is no trouble on the GrapheneDB side

  2. Closing each session at the end of each transaction in our API

  3. Checking if there is no multi processing on our APP (seems to not be the case)

  4. Changing the default max_connection_lifetime to 10 minutes

  5. Changing the connection protocol, previous was "bolt", we currently are using "bolt+s"

I tried to be as clear as possible but don't hesitate if you need any additional info !

Thanks in advance for your help, have a nice day :)

@robsdedude
Copy link
Member

Hello and thanks for reaching out.

I'll have a closer look at this tomorrow, but until then, could you please enable debug logging and post the logs up until the failure here? When you redact IP addresses, could you please replace them with a unique identifier (e.g., only remove the first two bytes or something)?

@bastienhubert
Copy link
Author

Hello, thanks for your fast answer.

I just enabled the logging debug on our test server, I'm currently not able to reproduce the error, it already happens once today before enabling the debug logging.

I will let you know and will transmit the logs ASAP when it will occur again, probably tomorrow morning.

@bastienhubert
Copy link
Author

bastienhubert commented Apr 8, 2022

Hello, the error occured this night at 04:21:44.737134761, here are the logs:

2022-04-08 04:20:33.616489640 +0200 CEST [web-1] 2022-04-08 02:20:33.616174 (+ 290.128911 sec) refresh_nb_matching_for_all_missions started!
2022-04-08 04:20:33.616327957 +0200 CEST [web-1] [#9FFC] State: TX_READY_OR_TX_STREAMING > READY
2022-04-08 04:20:33.616314785 +0200 CEST [web-1] [#9FFC] S: SUCCESS {'bookmark': 'FB:kcwQQedIFYgOTkSXKSGp/km2csoAHJV0kA=='}
2022-04-08 04:20:33.611848585 +0200 CEST [web-1] [#9FFC] C: COMMIT
2022-04-08 04:20:33.611815802 +0200 CEST [web-1] [#9FFC] S: SUCCESS {'stats': {'properties-set': 1}, 'type': 'rw', 't_last': 1, 'db': 'graph.db'}
2022-04-08 04:20:33.645418435 +0200 CEST [web-1] [#9FFC] C: PULL {'n': 1000}
2022-04-08 04:20:33.645238456 +0200 CEST [web-1] [#9FFC] C: RUN '\n OPTIONAL MATCH (m:Mission {id:$mission_id})<-[r_sr:SIMILARITY_RATIO]-(:Profile)<-[:HAS_PROFILE]-(:User)\n WHERE r_sr.matching >= $min_matching\n WITH m, count(r_sr) AS nbMatchingFreelancers\n SET m.nbMatchingFreelancers = nbMatchingFreelancers\n ' {'min_matching': 60, 'mission_id': '843488d8-41a0-4286-ac51-ec630abfbf2d'} {}
2022-04-08 04:20:33.644869270 +0200 CEST [web-1] [#9FFC] State: READY > TX_READY_OR_TX_STREAMING
2022-04-08 04:20:33.644854257 +0200 CEST [web-1] [#9FFC] S: SUCCESS {}
2022-04-08 04:20:33.621375576 +0200 CEST [web-1] [#9FFC] C: BEGIN {}
2022-04-08 04:20:33.621064798 +0200 CEST [web-1] [#9FFC] State: STREAMING > READY
2022-04-08 04:20:33.621062737 +0200 CEST [web-1] [#9FFC] S: SUCCESS {'bookmark': 'FB:kcwQQedIFYgOTkSXKSGp/km2csoAHJV0kA==', 'type': 'r', 't_last': 1, 'db': 'graph.db'}
2022-04-08 04:20:33.620771311 +0200 CEST [web-1] [#9FFC] S: RECORD * 1
2022-04-08 04:20:33.620570120 +0200 CEST [web-1] [#9FFC] State: READY > STREAMING
2022-04-08 04:20:33.670058935 +0200 CEST [web-1] Matching Profile Done
2022-04-08 04:20:33.670017851 +0200 CEST [web-1] 2022-04-08 02:20:33.669693 (+ 0.053519 sec) refresh_nb_matching_for_all_missions done!
2022-04-08 04:20:33.669708604 +0200 CEST [web-1] [#9FFC] State: TX_READY_OR_TX_STREAMING > READY
2022-04-08 04:20:33.669705210 +0200 CEST [web-1] [#9FFC] S: SUCCESS {'bookmark': 'FB:kcwQQedIFYgOTkSXKSGp/km2csoAHJV4kA=='}
2022-04-08 04:20:33.666080132 +0200 CEST [web-1] [#9FFC] C: COMMIT
2022-04-08 04:20:33.666022437 +0200 CEST [web-1] [#9FFC] S: SUCCESS {'stats': {'properties-set': 1}, 'type': 'w', 't_last': 0, 'db': 'graph.db'}
2022-04-08 04:20:33.665803569 +0200 CEST [web-1] [#9FFC] S: SUCCESS {'t_first': 18, 'fields': [], 'qid': 0}
2022-04-08 04:21:44.736829340 +0200 CEST [web-1] [#C660] C: GOODBYE
2022-04-08 04:21:44.736820647 +0200 CEST [web-1] [#C660] C: removing old connection (closed=False, defunct=False, stale=True, in_use=False)
2022-04-08 04:21:44.737134761 +0200 CEST [web-1] Failed to write data to connection IPv4Address(('my-test-db.graphenedb.com', 24786)) (IPv4Address(('X.X.189.114', 24786)))
2022-04-08 04:21:44.737132569 +0200 CEST [web-1] [#C660] [Errno 32] Broken pipe
2022-04-08 04:21:44.748938137 +0200 CEST [web-1] [#C660] C:
2022-04-08 04:21:44.765720037 +0200 CEST [web-1] [#AB80] C: RUN '\n MATCH (mr:MatchingRole)-[:HAS_MATCHING_ROLE_CLUSTER]->(mrc:MatchingRoleCluster)\n MATCH (mrc)-[r_hpf:HAS_PENALIZATION_FACTOR]->(pf:PenalizationFactor)\n MATCH (mrc)-[r_hvf:HAS_VALORIZATION_FACTOR]->(vf:ValorizationFactor)\n RETURN mr.id as id, r_hpf.lang[0] as penalizationLang, r_hvf.soft[0] as valorizationSoft\n ' {} {}
2022-04-08 04:21:44.765512696 +0200 CEST [web-1] [#AB80] State: CONNECTED > READY
2022-04-08 04:21:44.765451644 +0200 CEST [web-1] [#AB80] S: SUCCESS {'server': 'Neo4j/4.3.9', 'connection_id': 'bolt-37444', 'hints': {}}
2022-04-08 04:21:44.762857300 +0200 CEST [web-1] [#AB80] C: HELLO {'user_agent': 'neo4j-python/4.4.1 Python/3.8.5-final-0 (linux)', 'scheme': 'basic', 'principal': 'neo4j', 'credentials': '*******'}
2022-04-08 04:21:44.762662423 +0200 CEST [web-1] [#AB80] S: 0x00000304
2022-04-08 04:21:44.760483639 +0200 CEST [web-1] [#AB80] C: 0x00020404 0x00000104 0x00000004 0x00000003
2022-04-08 04:21:44.760482257 +0200 CEST [web-1] [#AB80] C: 0x6060B017
2022-04-08 04:21:44.754492305 +0200 CEST [web-1] [#AB80] C: my-test-db.graphenedb.com
2022-04-08 04:21:44.752790308 +0200 CEST [web-1] [#0000] C: 35.180.189.114:24786
2022-04-08 04:21:44.750233021 +0200 CEST [web-1] [#0000] C: my-test-db.graphenedb.com:24786
2022-04-08 04:21:44.749530680 +0200 CEST [web-1] [#0000] C:
2022-04-08 04:21:44.765872737 +0200 CEST [web-1] [#AB80] C: PULL {'n': 1000}
2022-04-08 04:21:44.768700561 +0200 CEST [web-1] [#AB80] S: SUCCESS {'bookmark': 'FB:kcwQQedIFYgOTkSXKSGp/km2csoAHJWckA==', 'type': 'r', 't_last': 0, 'db': 'graph.db'}
2022-04-08 04:21:44.768467419 +0200 CEST [web-1] [#AB80] State: READY > STREAMING
2022-04-08 04:21:44.768416836 +0200 CEST [web-1] [#AB80] S: SUCCESS {'t_first': 1, 'fields': ['id', 'penalizationLang', 'valorizationSoft']}

Thanks in advance

@robsdedude
Copy link
Member

Thanks for the logs and sorry for the late reply, I was at a conference last week.

After reading through the logs here is what happens:

  • The driver tries to pick up a connection from the pool.
  • While doing so it encounters a stale connection (there are several events that can mark a connection as stale, but if you are not using LDAP or SSO, it's most likely that the connection exceeded its max_connection_lifetime)
  • It closes the connection, but before doing so, it's polite and sends a GOODBYE message to the server.
  • This message fails with [Errno 32] Broken pipe. Probably because the server or some proxy or any network device already killed the connection for being idle for too long.
  • The driver ignored this, because it was about to ditch that connection anyway.
  • It continues to go through the connection pool, finds a non-stale connection, uses it for the query, and all lived happily ever after.

So what I recommend doing really, is not looking all too much into the driver logs. If there is a condition the driver can't handle, its not the logs that will notify you about it, but an exception that is being raised.

Since this is not a bug, I will close this issue. But please feel free to keep commenting if you have further questions.

@bastienhubert
Copy link
Author

Hello, thanks for your help and your answer ! It's now really clear, have a nice day.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants