Chef: Gracefully Handle RFC062 Exit Codes#19155
Conversation
f4b0b8e to
b732ee3
Compare
svanharmelen
left a comment
There was a problem hiding this comment.
Looks good to me, but I do have one suggestion you might have a look at...
|
I noticed this is an old PR, are you still willing to work with me to get the PR ready for merging? In that case please have a look at my suggestion and also please rebase the PR so that the test can (hopefully) run successfully. Lastly it would be great if you can sign the contributor license agreement. Thanks! |
|
@svanharmelen sure, I've also gotten quite a bit better with Go since. I'll give this another whirl. |
b732ee3 to
32495c3
Compare
|
@stevenoneill-jmfamily and others, would we want to attempt a retry here or is the pending reboot notification good enough? Putting this in a retry loop if we get one of those exit codes should be doable, maybe with a configurable sleep. Thoughts for configurables:
@svanharmelen If this is good to go, I'd say merge as MVC support for this. Don't have a timeframe for being able to support those configurables, write the tests, etc. |
|
@bdwyertech thanks for the heads up on this, appreciate it! That sounds reasonable to me, I like that the defaults result in nothing (keep existing behavior). My two bits....from a Chef user pov, I feel like I'd be expecting behavior and names like the current test-kitchen experience. eg, utilizing these variable names:
It'd be cool to have the consistency across products, if possible. If not, i'm still good with it because this feature would be really frickin useful :-) |
svanharmelen
left a comment
There was a problem hiding this comment.
That is indeed a much better solution, but it will needs some tweaks to make it work as you intended 😉
shoekstra
left a comment
There was a problem hiding this comment.
Hey @bdwyertech,
Thanks for this, I just ran into the same issue.. Needed a small fix but works great:
...
null_resource.default (chef): Recipe: test::windows
null_resource.default (chef): * reboot[now] action reboot_now[2020-04-29T23:30:36+02:00] WARN: Rebooting system immediately, requested by 'now'
...
null_resource.default (chef): Chef Infra Client finished, 1/6 resources updated in 19 seconds
null_resource.default (chef): [2020-04-29T23:30:39+02:00] WARN: Rebooting server at a recipe's request. Details: {:delay_mins=>0, :reason=>"Reboot by Chef Infra Client", :timestamp=>2020-04-29 23:30:36 +0200, :requested_by=>"now"}
...
null_resource.default (chef): Chef Infra Client failed. 1 resources updated in 20 seconds
null_resource.default (chef): [2020-04-29T23:30:39+02:00] FATAL: Stacktrace dumped to C:/chef/cache/chef-stacktrace.out
null_resource.default (chef): [2020-04-29T23:30:39+02:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
null_resource.default (chef): [2020-04-29T23:30:39+02:00] FATAL: Chef::Exceptions::Reboot: Rebooting server at a recipe's request. Details: {:delay_mins=>0, :reason=>"Reboot by Chef Infra Client", :timestamp=>2020-04-29 23:30:36 +0200, :requested_by=>"now"}
null_resource.default (chef): Reboot has been scheduled in the run state
null_resource.default: Creation complete after 54s [id=6131578846470282695]
Apply complete! Resources: 1 added, 0 changed, 1 destroyed.
32495c3 to
5a9ebb8
Compare
Codecov Report
|
5a9ebb8 to
6dc94e5
Compare
|
Alright, so I have nothing better to do then clean up the laundry list... @shoekstra @svanharmelen @stevenoneill-jmfamily what do you think of this? I haven't the slightest idea how you would stub something like this in a test... Would love to see how you'd do it. If you want to give this a spin, use a cookbook that triggers a reboot, and set the codes you want in your provisioner block. retry_on_exit_code = [35, 213]
wait_for_retry = 30 // Default (seconds) : time to wait after receiving exit code before retrying connection & provisioning process
max_retries = 1 // Default : number of times to retry |
Signed-off-by: Brian Dwyer <Brian.Dwyer@broadridge.com>
6dc94e5 to
4701ed2
Compare
|
This looks great, but it is also a considerably bigger change than your initial PR and my concern is it'll be harder to get merged in without adding tests. It would be a shame for this PR to stagnate again as a lot of Chef users would benefit from your fix. May I suggest creating a new PR that adds the retry functionality so it doesn't hold up merging your initial change? I think would have a better chance of being merged as is because the proposed change is much smaller. |
Signed-off-by: Brian Dwyer <Brian.Dwyer@broadridge.com>
c812261 to
2f4067c
Compare
Even if MaxRetries is 0, we should still execute the loop one time in order to run the Chef-Client at least once. Also waiting only makes sense when we have `attempts` left. And last but not least we want to exit immediately when the exit code is not in the retry list. So this PR fixes three small issues to make everything work as expected.
This example doesn't really show how these values should be used. The default of retry_on_exit_code is now already when most people want, so this line is not needed in most cases. I think the docs describe the new options just fine, so lets leave this out...
|
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Allows exit code 35 and 37 which do not imply Chef failure.
https://github.com/chef/chef-rfc/blob/master/rfc062-exit-status.md
Should help with #15134 and #16551