-
Notifications
You must be signed in to change notification settings - Fork 197
Differentiate fatal errors from non-fatal errors in logs #892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Thanks for reaching out. You've linked issue 706 and I'd like to repeat what I've written there
The driver's logs are not where you should look for fatal exceptions. The driver will raise an exception if it can't handle something. So the logging you should probably filter for is log messages from the client application using the driver and not the driver it self. The driver's logs might then be used to drill further into the issue. |
Thanks for writing this out, this is very helpful to know. To offer a bit more context, our setup pages the on-call if the number of error logs exceeds a given (usually low) threshold, because error logs usually indicate something urgent that should be looked into immediately. Since these neo4j driver error log messages recover and don't require an urgent response (since there really isn't anything that the driver user can do to respond to it), I feel like it would make sense for these non-fatals to be at "warn" level instead. I totally understand if that's not the opinion of this project though -- if this is the case, would it make sense to document this behavior explicitly somewhere? I recall that earlier versions of the driver (e.g. 1.7.6) emitted warnings and not errors so this appears to be a change in behavior. |
I'd like to echo @achantavy issue, for us, non-fatal or no, all log messages of error or above are being sent to our reporting tool and we are getting alot of messages from the neo4j sync driver around failed write connections. We are using neo4j ~5.14 This has taken time and effort to discover that they're non-fatal and unless we manage these somehow they continue to confuse other developers and contribute towards our usage for our monitoring service. If these are indeed non-fatal is there a reason to not lower them to warnings? alternatively could there be more granularity over the loggers you use? for example 'neo4j.sync.io', that way I can specifically set the verbosity for the driver to critical without risking missing a significant error from another part of the system. |
Python allows you to easily configure logging per logger. The driver uses the logger name "neo4j". So you can simply do something along the lines of import logging
neo4j_logger = logging.getLogger("neo4j")
neo4j_logger.setLevel(logging.CRITICAL) |
Hi @robsdedude, my concern with this approach is inadvertantly supressing fatal-errors which would be important to have visibility of. In genreral I would also expect to be of the error verbosity level. Can I confirm the neo4j-driver only uses CRITICAL for such errors? |
@robsdedude I've submitted a PR that implements hierarchical logging to tackle our issue: #997, this would ensure we are not supressing fatal-errors whilst giving us the control to reduce the verbosity of the driver logs. |
The driver doesn't log fatal errors for you. If the driver encounters some error state which you, as the user, need to be aware of, it'll raise an exception. That's imo the right mechanism for such things, not logging. You may then choose to log those in your application as you see fit. If you think of it further, requiring libraries to only log errors if they are fatal is futile. There might be a library A wrapping another library B and implementing some retry logic around it. How could B then possibly know if an error is fatal or not. Side note: the driver in itself has multiple layers doing such abstractions. Retry logic sits in a level around raw connections. The connections have absolutely no idea whether their failure will be retried or not. So in essence the request boils down to: the error log level (and potentially warning as well) is only usable at the very top application level where you can be sure your code isn't consumed as a library. |
Hey @robsdedude , let me chime in on that one as I had the same issue. I understand your perspective although I'm not sure if I agree with it for a couple of reasons, let me list them here:
That's my two cents, as I've mentioned above, I get your idea here and it's nifty. However, I'm afraid that it might be dangerous if you basically suggest to users to disable the library's logging or keep it at the critical level. That way you also get rid of the way to communicate with a developer using warning/error/critical messages |
Thanks for your views @rafaljanicki
I still think all the logs generated by the driver are meant to be consumed with caution. They'll tell you what's going on inside the driver. And an exception raised from the OS to the driver is at least a warning1, depending on if that portion of the driver knows how to handle it or not. Again, for anything external to the driver, we'll use return values, raise exceptions, emit warnings (not logging), etc. I cannot (as explained above) log from the perspective of the driver user as I don't know their code, but only from the driver's perspective. I'm considering to change the default level of the Footnotes
|
Hi @robsdedude thankyou for engaging on this issue. would there be any issue in reviewing my PR which implements hierarchical loggers? In anycase, If your proposed solution is for us to not worry about errors of error level and below I can set the neo4j root logger to CRITICAL. But i'd like to emphasise the broader implication this has for our organisation. Setting the neo4j logger to critical is not an insignificant change in so far as the default assumption we have is that any error, regardless of library or application, indicates a need for action or change in configuration. Misinterpretations like these lead to internal discussions and confusion and time spent debugging the code and searching for github issues like this one. For instance, I've spent a number of hours now combing through the python neo4j driver documentation to understand whether I should be doing something differently in order to manage the driver lifecycle because of this error. This would be typical for our developers and we either have to educate internally or add this to our standard template of logging overrides with a note as to why neo4j specifically is being set to critical. I appreciate that all might be out of the scope of the neo4j library but hope it factors into your considerations. For me personally, similar to botocore, i'd quite like to be able to specifically set the logging of the bolt module specifically through something like Best regards, Oliver |
Sorry I see you have replied to my PR, thankyou |
It would at least reflect how the library uses logging: don't look at it unless you want to drill into the driver. Would it satisfy your needs @OliverFarren @rafaljanicki @achantavy ? Do you see a problem with it? |
That'd definitely satisfy my case, thanks @robsdedude ! |
@robsdedude i'd prefer the default wasn't set to critical. We frequently lower the verbosity to DEBUG and find the transaction logs useful. If the verbosity defaults to critical we would have to explicitly set the logger to debug which runs in opposition to how we manage other library loggers (I can't think of another library we use that doesn't by default have the logger level set to {DEBUG, INFO}). This would also have impact for when we upgrade the neo4j package in services since we'd need to make application changes. Personally i'd prefer the hierarchical loggers that way I could set the bolt logger to CRITICAL explicitly whilst being able to keep view the transaction logs and it doesn't alter the behaviour when upgrading the library |
With the merging of #997 i'm happy to set the child logger to critical in my applications. Thanks! |
I'll close the issue now as I think we've discussed the topic fairly well, added hierarchical logging to the driver to aid more selective silencing. If anyone has arguments (pro or con) that have not yet been mentioned, please feel free to keep commenting. |
Feature Request
As seen in #706, it is possible for error messages like
Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X)))
to not actually be associated with a transaction failure. For example, it's possible for the application to have a connection fail, have the driver emit that message, pick a new connection from the pool, and continue happily completing the transaction. For searchability, it'd be very useful to know if a given log message resulted in a transaction failing.Pitch
As a dev building an app that uses the neo4j python driver, I would like to know if log messages like "Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X)))" actually caused a transaction to fail. Typically when I get paged I will filter for logs at ERROR level and if there are multiple entries of "Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X)))" that don't actually cause transaction failures then it's confusing and adds debugging time.
Oh for additional information, I am currently using Python driver 4.4.5.
The text was updated successfully, but these errors were encountered: