Skip to content

Differentiate fatal errors from non-fatal errors in logs #892

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
achantavy opened this issue Jan 20, 2023 · 16 comments
Closed

Differentiate fatal errors from non-fatal errors in logs #892

achantavy opened this issue Jan 20, 2023 · 16 comments

Comments

@achantavy
Copy link

achantavy commented Jan 20, 2023

Feature Request

As seen in #706, it is possible for error messages like Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X))) to not actually be associated with a transaction failure. For example, it's possible for the application to have a connection fail, have the driver emit that message, pick a new connection from the pool, and continue happily completing the transaction. For searchability, it'd be very useful to know if a given log message resulted in a transaction failing.

Pitch

As a dev building an app that uses the neo4j python driver, I would like to know if log messages like "Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X)))" actually caused a transaction to fail. Typically when I get paged I will filter for logs at ERROR level and if there are multiple entries of "Failed to write data to connection IPv4Address(('X', X)) (IPv4Address(('X', X)))" that don't actually cause transaction failures then it's confusing and adds debugging time.

Oh for additional information, I am currently using Python driver 4.4.5.

@robsdedude
Copy link
Member

robsdedude commented Jan 30, 2023

Thanks for reaching out. You've linked issue 706 and I'd like to repeat what I've written there

So what I recommend doing really, is not looking all too much into the driver logs. If there is a condition the driver can't handle, its not the logs that will notify you about it, but an exception that is being raised.

The driver's logs are not where you should look for fatal exceptions. The driver will raise an exception if it can't handle something. So the logging you should probably filter for is log messages from the client application using the driver and not the driver it self. The driver's logs might then be used to drill further into the issue.

@achantavy
Copy link
Author

achantavy commented Jan 30, 2023

Thanks for writing this out, this is very helpful to know.

To offer a bit more context, our setup pages the on-call if the number of error logs exceeds a given (usually low) threshold, because error logs usually indicate something urgent that should be looked into immediately.

Since these neo4j driver error log messages recover and don't require an urgent response (since there really isn't anything that the driver user can do to respond to it), I feel like it would make sense for these non-fatals to be at "warn" level instead.

I totally understand if that's not the opinion of this project though -- if this is the case, would it make sense to document this behavior explicitly somewhere? I recall that earlier versions of the driver (e.g. 1.7.6) emitted warnings and not errors so this appears to be a change in behavior.

@OliverFarren
Copy link
Contributor

I'd like to echo @achantavy issue, for us, non-fatal or no, all log messages of error or above are being sent to our reporting tool and we are getting alot of messages from the neo4j sync driver around failed write connections.

We are using neo4j ~5.14

This has taken time and effort to discover that they're non-fatal and unless we manage these somehow they continue to confuse other developers and contribute towards our usage for our monitoring service.

If these are indeed non-fatal is there a reason to not lower them to warnings? alternatively could there be more granularity over the loggers you use? for example 'neo4j.sync.io', that way I can specifically set the verbosity for the driver to critical without risking missing a significant error from another part of the system.

@robsdedude
Copy link
Member

Hi @OliverFarren

Python allows you to easily configure logging per logger. The driver uses the logger name "neo4j". So you can simply do something along the lines of

import logging

neo4j_logger = logging.getLogger("neo4j")
neo4j_logger.setLevel(logging.CRITICAL)

@OliverFarren
Copy link
Contributor

Hi @robsdedude, my concern with this approach is inadvertantly supressing fatal-errors which would be important to have visibility of. In genreral I would also expect to be of the error verbosity level. Can I confirm the neo4j-driver only uses CRITICAL for such errors?

@OliverFarren
Copy link
Contributor

@robsdedude I've submitted a PR that implements hierarchical logging to tackle our issue: #997, this would ensure we are not supressing fatal-errors whilst giving us the control to reduce the verbosity of the driver logs.

@robsdedude
Copy link
Member

The driver doesn't log fatal errors for you. If the driver encounters some error state which you, as the user, need to be aware of, it'll raise an exception. That's imo the right mechanism for such things, not logging. You may then choose to log those in your application as you see fit.

If you think of it further, requiring libraries to only log errors if they are fatal is futile. There might be a library A wrapping another library B and implementing some retry logic around it. How could B then possibly know if an error is fatal or not. Side note: the driver in itself has multiple layers doing such abstractions. Retry logic sits in a level around raw connections. The connections have absolutely no idea whether their failure will be retried or not.

So in essence the request boils down to: the error log level (and potentially warning as well) is only usable at the very top application level where you can be sure your code isn't consumed as a library.

@rafaljanicki
Copy link

Hey @robsdedude , let me chime in on that one as I had the same issue. I understand your perspective although I'm not sure if I agree with it for a couple of reasons, let me list them here:

  1. Libraries can and should log and these logs shouldn't be hidden. The best example I can think of right now is a deprecation warning - let's say you deprecate a function (which you just did in the driver), I'd like to be notified of that not only via changelogs (which I may not always catch due to e.g. ~=) but also via (usually) warning messages. That's what is an industry standard I've noticed and I believe that's a good approach as well. If I were to disable logging for the library, I wouldn't have any means to find out about any warnings
    1a. BTW, you do have deprecation warnings inside the library although they have a DEBUG level. That's weird as it's for functions internally used

  2. If error logging on the driver is not relevant to the end user, why is it enabled by default? What was your's team reasoning for keeping it if, by your rationale, the only way to communicate with the end user is via exceptions?

That's my two cents, as I've mentioned above, I get your idea here and it's nifty. However, I'm afraid that it might be dangerous if you basically suggest to users to disable the library's logging or keep it at the critical level. That way you also get rid of the way to communicate with a developer using warning/error/critical messages

@robsdedude
Copy link
Member

robsdedude commented Nov 29, 2023

Thanks for your views @rafaljanicki

  1. Deprecations are not going thought the logging system though. Instead they are emitted through the warnings system . No actionable item is only going though the logging system.
  2. The logging system does provide value to trace back unexpected conditions. The fact that all error and warning logs show up by default the moment you attach a handler to the root logger is a design decision made by Python

    When a logger is created, the level is set to NOTSET (which causes all messages to be processed when the logger is the root logger, or delegation to the parent when the logger is a non-root logger). Note that the root logger is created with level WARNING.
    -- https://docs.python.org/3/library/logging.html?highlight=logging#logging.Logger.setLevel

I still think all the logs generated by the driver are meant to be consumed with caution. They'll tell you what's going on inside the driver. And an exception raised from the OS to the driver is at least a warning1, depending on if that portion of the driver knows how to handle it or not. Again, for anything external to the driver, we'll use return values, raise exceptions, emit warnings (not logging), etc. I cannot (as explained above) log from the perspective of the driver user as I don't know their code, but only from the driver's perspective.

I'm considering to change the default level of the neo4j logger to critical so that by default, when enabling logging in your application you won't see any driver logs. It seems a bit strange though to work against Python's logging system's defaults like this.

Footnotes

  1. An indication that something unexpected happened, or that a problem might occur in the near future (e.g. ‘disk space low’). The software is still working as expected.
    -- https://docs.python.org/3/library/logging.html?highlight=logging#logging.WARNING

@OliverFarren
Copy link
Contributor

OliverFarren commented Nov 29, 2023

Hi @robsdedude thankyou for engaging on this issue. would there be any issue in reviewing my PR which implements hierarchical loggers?

In anycase, If your proposed solution is for us to not worry about errors of error level and below I can set the neo4j root logger to CRITICAL.

But i'd like to emphasise the broader implication this has for our organisation. Setting the neo4j logger to critical is not an insignificant change in so far as the default assumption we have is that any error, regardless of library or application, indicates a need for action or change in configuration. Misinterpretations like these lead to internal discussions and confusion and time spent debugging the code and searching for github issues like this one.

For instance, I've spent a number of hours now combing through the python neo4j driver documentation to understand whether I should be doing something differently in order to manage the driver lifecycle because of this error. This would be typical for our developers and we either have to educate internally or add this to our standard template of logging overrides with a note as to why neo4j specifically is being set to critical.

I appreciate that all might be out of the scope of the neo4j library but hope it factors into your considerations. For me personally, similar to botocore, i'd quite like to be able to specifically set the logging of the bolt module specifically through something like neo4j.sync.io to CRITICAL as this would get less pushback and require less discussions with our security team.

Best regards, Oliver

@OliverFarren
Copy link
Contributor

Sorry I see you have replied to my PR, thankyou

@robsdedude
Copy link
Member

I'm considering to change the default level of the neo4j logger to critical so that by default, when enabling logging in your application you won't see any driver logs.

It would at least reflect how the library uses logging: don't look at it unless you want to drill into the driver. Would it satisfy your needs @OliverFarren @rafaljanicki @achantavy ? Do you see a problem with it?

@rafaljanicki
Copy link

That'd definitely satisfy my case, thanks @robsdedude !

@OliverFarren
Copy link
Contributor

OliverFarren commented Nov 29, 2023

@robsdedude i'd prefer the default wasn't set to critical.

We frequently lower the verbosity to DEBUG and find the transaction logs useful. If the verbosity defaults to critical we would have to explicitly set the logger to debug which runs in opposition to how we manage other library loggers (I can't think of another library we use that doesn't by default have the logger level set to {DEBUG, INFO}). This would also have impact for when we upgrade the neo4j package in services since we'd need to make application changes.

Personally i'd prefer the hierarchical loggers that way I could set the bolt logger to CRITICAL explicitly whilst being able to keep view the transaction logs and it doesn't alter the behaviour when upgrading the library

@OliverFarren
Copy link
Contributor

OliverFarren commented Dec 4, 2023

With the merging of #997 i'm happy to set the child logger to critical in my applications. Thanks!

@robsdedude
Copy link
Member

I'll close the issue now as I think we've discussed the topic fairly well, added hierarchical logging to the driver to aid more selective silencing.

If anyone has arguments (pro or con) that have not yet been mentioned, please feel free to keep commenting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants