[release-4-18] OCPBUGS-59883: When Locking PTP Source to One NIC With Dual NIC PTP Synchronization Configured, Incorrect Clock Class Reported via REST API #481
base: release-4.18
116ef40
to
c5380f8
Compare
@aneeshkp: This pull request references Jira Issue OCPBUGS-59883, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Looks good for 4.18.
pkg/daemon/daemon.go
Outdated
} else if clockState == HOLDOVER || clockState == LOCKED {
	// in case of holdover without iface, still need to update clock class for T_G
	if p.name != ts2phcProcessName && p.name != syncEProcessName { // TGM announce clock class via events
		p.SetPmcCheck(false) // reset pmc check since we are updating clock class here
It might be best to use consume every time the flag is set to false, so that it is easier to search for where that happens.
Made the changes as suggested.
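A minimal sketch of the "consume" idea from this thread: one helper that clears the flag and reports whether it was set, so every reset goes through a single searchable call. The `pmcFlag` type and its `Set`/`Consume` methods are hypothetical names for illustration, and backing the flag with `atomic.Bool` is an assumption, not necessarily how the PR implements it.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pmcFlag is a hypothetical stand-in for the pmcCheck flag on ptpProcess.
type pmcFlag struct {
	v atomic.Bool
}

// Set marks that a PMC check has been requested.
func (f *pmcFlag) Set() { f.v.Store(true) }

// Consume atomically clears the flag and reports whether it was set.
// Routing every reset through Consume gives one name to grep for.
func (f *pmcFlag) Consume() bool { return f.v.Swap(false) }

func main() {
	var f pmcFlag
	f.Set()
	fmt.Println(f.Consume()) // true: the request was pending and is now cleared
	fmt.Println(f.Consume()) // false: nothing left to consume
}
```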
pkg/daemon/daemon.go
Outdated
@@ -961,6 +1011,9 @@ func (p *ptpProcess) cmdStop() {
		return
	}
	p.setStopped(true)
	// reset runtime flags
	p.SetPmcCheck(false)
	atomic.StoreInt32(&p.clockClassRunning, 0)
Would it be better to use atomic.Bool to properly signal what this is used for?
OK, let me try replacing it with a bool for better readability.
done
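For readers following this thread, here is a hedged side-by-side of the two styles being discussed: an `int32` used as a 0/1 flag versus `atomic.Bool` (available since Go 1.19). The `worker` type and its method names are hypothetical; only `clockClassRunning` in the diff above comes from the PR.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// worker is a hypothetical stand-in for ptpProcess.
type worker struct {
	runningInt  int32       // old style: 0 = idle, 1 = running
	runningBool atomic.Bool // same meaning, but the type states the intent
}

// int32 style: the reader has to know that 0 and 1 encode false and true.
func (w *worker) tryStartInt() bool { return atomic.CompareAndSwapInt32(&w.runningInt, 0, 1) }
func (w *worker) stopInt()          { atomic.StoreInt32(&w.runningInt, 0) }

// atomic.Bool style: the same operations with self-describing values.
func (w *worker) tryStartBool() bool { return w.runningBool.CompareAndSwap(false, true) }
func (w *worker) stopBool()          { w.runningBool.Store(false) }

func main() {
	w := &worker{} // use a pointer: atomic values must not be copied
	fmt.Println(w.tryStartBool()) // first caller wins
	fmt.Println(w.tryStartBool()) // second caller is rejected while running
	w.stopBool()
	fmt.Println(w.tryStartBool()) // allowed again after the reset
}
```

Both variants behave the same; the difference is only in how clearly the field's purpose reads at the call site.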
c5380f8
to
610328c
Compare
Signed-off-by: Aneesh Puttur <[email protected]>
610328c
to
efe6e28
Compare
@aneeshkp: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
/label backport-risk-assessed
/ltgm
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aneeshkp, nocturnalastro The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/verified later @Bonnie-Block
@aneeshkp: This PR has been marked to be verified later by In response to this:
This commit implements logic to correctly update the clock class
when a FAULTY/RECOVERY condition of the port is detected.
Summary:
Ensure that at most one PMC clock-class query runs at a time per configName.
Trigger PMC on port Faulty/Holdover and recovery transitions to refresh GM class.
Run PMC polling outside the log scanner so we still update when ptp4l is silent.
What’s changed
Per-config single-flight guard:
Added a shared sync.Map keyed by configName to serialize updateClockClass across processes referencing the same config.
updateClockClass now does a compare-and-swap (CAS) on the shared guard and logs when a run is skipped.
Cleanup: the guard entry is removed on cmdStop() of the corresponding process.
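The per-config guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `guards`, `tryAcquire`, `release`, `forget`, and `updateClockClassOnce` are hypothetical names, and storing a `*atomic.Bool` per key is one possible choice of guard value.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// guards maps configName -> *atomic.Bool meaning "an update is in flight".
// Hypothetical name; the PR only says a shared sync.Map keyed by configName.
var guards sync.Map

// tryAcquire reports whether the caller won the right to run for configName.
func tryAcquire(configName string) bool {
	v, _ := guards.LoadOrStore(configName, new(atomic.Bool))
	return v.(*atomic.Bool).CompareAndSwap(false, true)
}

// release marks the update for configName as finished.
func release(configName string) {
	if v, ok := guards.Load(configName); ok {
		v.(*atomic.Bool).Store(false)
	}
}

// forget drops the guard entry, mirroring the cleanup done on cmdStop().
func forget(configName string) { guards.Delete(configName) }

// updateClockClassOnce runs query unless another run for the same
// configName is already in flight, in which case it logs and skips.
func updateClockClassOnce(configName string, query func()) bool {
	if !tryAcquire(configName) {
		fmt.Printf("clock class update already running for %s, skipping\n", configName)
		return false
	}
	defer release(configName)
	query()
	return true
}

func main() {
	updateClockClassOnce("ptp4l.0.config", func() {
		// A second attempt for the same config while this one runs is skipped.
		updateClockClassOnce("ptp4l.0.config", func() {})
	})
	forget("ptp4l.0.config")
}
```

The CAS makes "check and claim" one atomic step, so two processes that reference the same config cannot both pass the guard.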
Faulty/Holdover and recovery handling:
When the clock enters HOLDOVER or LOCKED without a specific iface, we now trigger updateClockClass (after a short delay) so GM class is refreshed on port fault/recovery events, not just on normal offset updates.
PMC polling outside the scanner:
Introduced (or relied on) a background loop that consumes the pmcCheck flag and calls updateClockClass independently of log scanner activity.
A periodic ticker sets pmcCheck for ptp4l processes; the background loop handles the PMC call even if ptp4l is not emitting logs.
Release note
Improve clock class update robustness: deduplicate PMC queries per config, update on Faulty/Holdover and recovery, and poll PMC independently of log output.