Hey Simon, I've been working on a related PR for GlueJobOperator deferrable mode missing verbose logs (#63086). I think I know what's causing the issue you mentioned: the trigger raises AirflowException directly when the job fails instead of yielding a TriggerEvent, so execute_complete never runs and you lose the detailed error. Could you create an issue for this (that the Glue job does not return the error message when it fails in deferred state)? I'll put together a fix PR in the next couple of days.
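The difference described above can be sketched in plain Python. The TriggerEvent class below is a simplified stand-in for Airflow's airflow.triggers.base.TriggerEvent, and the generator is an illustrative reduction of a trigger's run() method, not the actual GlueJobTrigger code:

```python
import asyncio


class TriggerEvent:
    """Simplified stand-in for Airflow's TriggerEvent (illustrative only)."""
    def __init__(self, payload):
        self.payload = payload


async def glue_trigger_run(job_failed, error_message):
    # Anti-pattern: raising here kills the trigger, so the operator's
    # execute_complete method never receives the failure details.
    #   if job_failed:
    #       raise RuntimeError("Glue job failed")
    # Preferred pattern: yield an event describing the failure so
    # execute_complete can surface the detailed error message.
    if job_failed:
        yield TriggerEvent({"status": "failure", "message": error_message})
    else:
        yield TriggerEvent({"status": "success", "message": "Job succeeded"})


async def main():
    async for event in glue_trigger_run(True, "OutOfMemoryError in stage 3"):
        return event.payload


payload = asyncio.run(main())
print(payload["status"], "-", payload["message"])
```

With the yielded event, execute_complete runs and can log or re-raise with the real error text instead of a generic trigger failure.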
Hello,
Starting this discussion to share our use case and current issues, and to get ideas or inspiration on how best to proceed.
Background
We are currently using MWAA 2.10.3 to orchestrate, among other things, Glue jobs. For this we use the GlueJobOperator to trigger runs of already defined jobs, providing only minimal arguments.
A key detail is that we are using deferrable=True; the main reason is that we have long-running jobs and sensors and do not want to reserve workers for them over longer periods.
Issue
We are using on_failure_callback with a custom function that extracts the error message from the context of a failed task and posts it as a card to our Teams channel:
exception = context.get('exception')
When a Glue job fails while the task is in deferred state, the callback only sees that the trigger has failed, and the extracted exception simply reads "Trigger failure".
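As a runnable sketch, the callback looks roughly like this. The post_to_teams helper is a hypothetical placeholder for our webhook call, and the sample context dict only mimics the relevant keys of Airflow's real task context:

```python
def build_failure_card(context):
    """Build the text for a Teams failure card from the Airflow context.

    When a deferrable task fails inside the trigger, context['exception']
    carries only the generic "Trigger failure" text instead of the real
    Glue error message.
    """
    task_id = context.get("task_instance_key_str", "unknown task")
    exception = context.get("exception")
    return f"Task {task_id} failed: {exception}"


def on_failure_callback(context):
    card = build_failure_card(context)
    # post_to_teams(card)  # hypothetical helper posting to our Teams webhook
    return card


# Sample of what the context looks like after a deferred Glue failure:
context = {
    "task_instance_key_str": "my_dag__run_glue_job__20240101",
    "exception": "Trigger failure",
}
print(on_failure_callback(context))
```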
This is an issue because we want our Teams error notifications to immediately show the high-level cause of failure. Currently we would need to go to the Glue logs, either directly or via the Airflow task logs.
Possible solutions
We have considered the following solutions and workarounds.
verbose=True
While this should include all detailed logs in our Airflow task logs, we are not confident it will actually solve our issue, as the status check on the final attempt will still fail. We are also hesitant to enable this because it would duplicate our existing logs 1:1.
Wrap GlueJobOperator and execute_complete function
This could possibly be a good solution to modify the behaviour of that final status check. However, we are hesitant to wrap the original operator, as that would complicate future MWAA version upgrades for us.
Enhance our custom callback with an additional get_job_run call based on the job_run_id from the context
This is currently our preferred approach, with the caveat that the final error message of the Glue job will not be included in the task logs; it will, however, be included in our error notification in Teams.
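A sketch of that enrichment, with the Glue client injected so it can be exercised without AWS access. The get_job_run call and its JobRun/ErrorMessage response shape match the real boto3 Glue API; the FakeGlueClient and how job_name/job_run_id are pulled from the context are assumptions for illustration:

```python
def fetch_glue_error(glue_client, job_name, job_run_id):
    """Fetch the detailed error message for a failed Glue job run.

    glue_client is expected to behave like boto3.client("glue"):
    get_job_run returns {"JobRun": {"JobRunState": ..., "ErrorMessage": ...}}.
    """
    response = glue_client.get_job_run(JobName=job_name, RunId=job_run_id)
    job_run = response["JobRun"]
    return job_run.get("ErrorMessage", f"State: {job_run['JobRunState']}")


# Stand-in for boto3.client("glue") so this sketch is runnable offline.
class FakeGlueClient:
    def get_job_run(self, JobName, RunId):
        return {"JobRun": {"JobRunState": "FAILED",
                           "ErrorMessage": "SystemExit: OOM in executor"}}


message = fetch_glue_error(FakeGlueClient(), "my_glue_job", "jr_abc123")
print(message)
```

In the real callback, job_name and job_run_id would come from the task context (e.g. via XCom), and the returned message would be appended to the Teams card.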
Summary
Happy to receive any thoughts or input on the described issue. Let me know if I have missed describing any essential part.
I am also interested to know whether this type of behaviour would be welcomed as an addition to GlueJobOperator, or whether it was a conscious decision not to include it.