Conversation

@Avihais12344

What changes were proposed in this pull request?

In this PR I want to increase the default report interval for both K8s and Apache Hadoop YARN
from 1s to 10s.

Why are the changes needed?

Log lines of the form:

Application status for (phase: )

are printed every second. The volume of logs made our Airflow UI slow,
and it also takes too much space, since we keep the Spark logs for future use.
I think the report interval should be increased to strike a better balance between notifying the user
that the application is still running and not spamming the logs.
I think it should be done globally, to spare other users from going through what we did:

  1. Hitting this issue.
  2. Searching for a way to reduce the logs.
  3. Finding this config (I went straight to the source code, only to discover it is documented somewhere in the docs).
  4. Updating it in our applications.

Updating the default would, in my opinion, make life easier for many users.

Does this PR introduce any user-facing change?

Yes, this PR increases the default report interval from 1 second to 10 seconds.
Log lines of the form:

Application status for (phase: )

would be printed every 10 seconds instead of every second by default.
Users can still change it if they want to, and they shouldn't otherwise be affected, as it only concerns logging.
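
For anyone who wants to keep the current default but quiet this log in their own jobs, here is a minimal sketch of the existing workaround (not part of this PR): overriding the report interval at submit time. spark.kubernetes.report.interval and spark.yarn.report.interval are the existing Spark configs; the master URL and application path below are placeholders.

import subprocess

cmd = [
    "spark-submit",
    "--master", "k8s://https://example.invalid:6443",   # placeholder K8s API server
    "--deploy-mode", "cluster",
    "--conf", "spark.kubernetes.report.interval=10s",   # read by the K8s submission client
    "--conf", "spark.yarn.report.interval=10s",         # the equivalent when submitting to YARN (ignored here)
    "local:///opt/spark/examples/app.py",                # placeholder application
]
print(" ".join(cmd))                  # inspect the command
# subprocess.run(cmd, check=True)     # uncomment to actually submit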

How was this patch tested?

I have tested the patch manually.
I have an Airflow cluster running on Docker and a K8s cluster. With that, I created a Spark submit connection
and a DAG that uses the SparkSubmitOperator
to run Spark on my K8s cluster.
The important thing I did is add the config:

{
"spark.kubernetes.report.interval": "10s"
}

to my Spark application via the SparkSubmitOperator conf,
which increased the report interval from 1 second to 10 seconds.
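
For reference, here is a minimal sketch of that DAG, assuming the apache-airflow-providers-apache-spark package; the connection id, DAG id, and application path are placeholders, and the master/deploy mode come from the Airflow Spark connection.

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="spark_report_interval_test",       # placeholder DAG id
    start_date=datetime(2025, 5, 1),
    schedule=None,                              # run on manual trigger only (Airflow 2.4+ argument name)
    catchup=False,
) as dag:
    SparkSubmitOperator(
        task_id="submit_spark_app",
        conn_id="spark_k8s",                        # assumed Spark connection id
        application="local:///opt/spark/app.py",    # placeholder application path
        conf={
            # Raise the "Application status for ... (phase: ...)" log interval
            # from the 1s default to 10s.
            "spark.kubernetes.report.interval": "10s",
        },
    )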

Was this patch authored or co-authored using generative AI tooling?

No.

This is my first PR; if there's a problem, please let me know.

@Avihais12344
Author

Avihais12344 commented May 25, 2025

I have enabled GitHub Actions in my fork, but I can't rerun the failed job. Please help.

@Avihais12344
Author

Could someone please respond to this?

@Avihais12344
Author

Could someone please help me?

Member

@dongjoon-hyun left a comment


Given that there is no silver bullet for all heterogeneous environments, I don't think changing the default values can give us a solution for everyone.

I'd recommend that you apply the proper configuration in your own environments (via Airflow) instead of affecting the whole worldwide community, @Avihais12344.

@Avihais12344
Author

> Given that there is no silver bullet for all heterogeneous environments, I don't think changing the default values can give us a solution for everyone.
>
> I'd recommend that you apply the proper configuration in your own environments (via Airflow) instead of affecting the whole worldwide community, @Avihais12344.

Yes, we have done that. But I still think that printing that log message every 1s is too much. There may not be a silver bullet, but maybe we can arrive at a better default by weighing the responsiveness we want against the average runtime of a Spark application? @dongjoon-hyun

@dongjoon-hyun
Member

That's your opinion, which (at least) I disagree with. In general, there is no way to build a consensus on this kind of issue.

> I still think that printing that log message every 1s is too much.

@Avihais12344
Author

> That's your opinion, which (at least) I disagree with. In general, there is no way to build a consensus on this kind of issue.
>
> I still think that printing that log message every 1s is too much.

If this comes down to opinions, there is not much I can do, as I still disagree with you.
What do we do now?
@dongjoon-hyun

Member

@dongjoon-hyun left a comment


Based on the discussion above, I cast a veto on this PR in order to prevent accidental merging.

@Avihais12344
Author

Avihais12344 commented Aug 30, 2025

> Based on the discussion above, I cast a veto on this PR in order to prevent accidental merging.

What does that mean exactly (I am pretty new to open source)?

@mridulm
Contributor

mridulm commented Sep 1, 2025

You can read more here, @Avihais12344.
Since @dongjoon-hyun is a committer and PMC member, he explicitly cast a veto to prevent another committer from accidentally merging the change, because he disagrees that this change would be useful for the community.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label on Dec 11, 2025
github-actions bot closed this on Dec 12, 2025