-
Notifications
You must be signed in to change notification settings - Fork 1.2k
add default conditions to PA to avoid potential race conditions (2nd attempt) #16078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16078 +/- ##
==========================================
- Coverage 80.13% 80.06% -0.07%
==========================================
Files 214 214
Lines 16887 16907 +20
==========================================
+ Hits 13532 13537 +5
- Misses 2996 3011 +15
Partials 359 359 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
/retest |
|
/test all |
1 similar comment
|
/test all |
|
/retest |
f9d8138 to
cc8c4c2
Compare
|
/retest |
|
/test all |
cc8c4c2 to
9351ff9
Compare
|
@dprotaso can you take a look, I think I fixed the issue. ran the tests multiple times to check if its flaky |
|
@nader-ziada what was the issue? |
|
the code was not expecting a condition, so assumed it failed even though it was the pending one, check the 2nd commit |
| scaleTargetInitializedCondReason := ps.GetCondition(autoscalingv1alpha1.PodAutoscalerConditionScaleTargetInitialized).GetReason() | ||
| if !ps.IsScaleTargetInitialized() && !resUnavailable && ps.ServiceName != "" && scaleTargetInitializedCondReason != autoscalingv1alpha1.PendingReason { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what is happening here? Generally we're not looking at the Reason string but Status.
To get here PA::Ready=False are we trying to set the right error message?
Another way to think about this can we just look at ScaleTargetInitialized=False to simplify this? Technically not IsScaleTargetInitialized is true when it is set to Unknown and False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new situation here is that the condition is now there on the creation of the resource with the generated defaulting, so I wasn't sure how else to differentiate between the default Unknown (with reason Pending) vs Unknown because something happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default Unknown (with reason Pending) vs Unknown because something happened.
I would view these two scenarios as being the same
| cond := pa.Status.GetCondition(autoscalingv1alpha1.PodAutoscalerConditionReady) | ||
| if cond.IsUnknown() && cond.GetReason() == autoscalingv1alpha1.PendingReason { | ||
| // still at default PA condition, no need to mark anything | ||
| return | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment why we added this? It's not clear why we want to gate setting the Active condition on Ready all of a sudden
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm basically trying to skip the check if its the default condition with Reason = Pending that is now set by default when the resource is created, which is different that just Unknown that would cause the PA to be set to Inactive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is different that just Unknown that would cause the PA to be set to Inactive
Unknown means it's deploying or doing whatever. The reason doesn't really matter. If we were setting the status to Inactive while it was unknown then I'd probably continue doing that.
Was this triggering some failures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it was setting the PA ready to false which made the revision get into the "Initial scale was never achieved" error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c09b0d0 to
d390044
Compare
|
/retest |
d390044 to
88de82b
Compare
|
/test all |
|
/retest |
|
/test all |
6b815f7 to
2474400
Compare
|
/retest |
|
/test all |
2474400 to
674c400
Compare
|
/retest |
|
/test istio-latest-no-mesh |
674c400 to
f1341d2
Compare
| func withRevisionConditionsGivenPADefault(r *v1.Revision) { | ||
| WithInitRevConditions(r) | ||
| r.Status.MarkActiveUnknown("Deploying", "") | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks identical to allUnknownConditions can we clean up the extra function and simplify the diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, will remove that
|
|
||
| if cond == nil { | ||
| sksCondition := ps.GetCondition(autoscalingv1alpha1.PodAutoscalerConditionSKSReady) | ||
| if cond == nil || (cond.IsUnknown() && sksCondition != nil && sksCondition.IsUnknown()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why we need the extra clauses?
I would think we could simply have cond == nil || cond.IsUnknown()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that there were cases now that the pa has conditions at the start, it was not waiting for the service to get ready, so I was trying to make sure to simulate the previous cases before defaulting pa conditions
| // unavailable here, and have no way of recovering later. | ||
| // If the ResourcesAvailable is already false, don't override the message. | ||
| if !ps.IsScaleTargetInitialized() && !resUnavailable && ps.ServiceName != "" { | ||
| if !ps.IsScaleTargetInitialized() && !resUnavailable && ps.ServiceName != "" && !sksCondition.IsUnknown() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment on why we want to add an additional check here?
Probably worth updating the comment as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as other comment, I believe that there were cases now that the pa has conditions at the start, it was not waiting for the service to get ready, so I was trying to make sure to simulate the previous cases before defaulting pa conditions
between revision and autoscaler fix revision lifecyle check for conditions
f1341d2 to
1bf2b31
Compare
|
/retest |
|
@dprotaso added comments |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dprotaso, nader-ziada The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
knative#16078) between revision and autoscaler fix revision lifecyle check for conditions
#1598) * Add default conditions to create PA to avoid potential race conditions (knative#16078) between revision and autoscaler fix revision lifecyle check for conditions * Regen release files --------- Co-authored-by: Nader Ziada <[email protected]>
…onditions (knative#16078)" This reverts commit 561f348.
add default conditions to PA to avoid potential race conditions
between revision and autoscaler
Fixes #16036
Proposed Changes