-
Notifications
You must be signed in to change notification settings - Fork 216
USHIFT-2443: Introduce router IP address listening configuration #3211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@pacevedom: This pull request references USHIFT-2443 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/hold |
"routeAdmissionPolicy" | ||
], | ||
"properties": { | ||
"expose": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the other fields like this are nouns. Maybe we can find a noun similar to advertiseAddress for this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about listenAddress? It also allows NIC names, but always translates to IP addresses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the field called in the kube API where these values end up being copied?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be this one: https://github.com/kubernetes/api/blob/master/core/v1/types.go#L5058
Which is under service.status.loadBalancer.ingress.ip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we call it listenAddresses
how confusing will it be that it takes an interface, too? Does listenInterfaces
have the connotation that it could be a NIC name or IP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested listenaddress
because after all everything gets translated into IP addresses, and users can see it in the service itself. It is true that it does not accommodate NIC names very good though. I see listenInterfaces
with a similar issue but on the opposite direction. What do you think of endpoints
? That would serve for anything that accepts network connections.
pkg/config/config.go
Outdated
return addresses, nil | ||
} | ||
|
||
func GetConfiguredAddresses() ([]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a public function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the function. This is used from both validations and the load balancer service controller to obtain all the allowed IP addresses in the host. They may change dynamically without restarting MicroShift.
pkg/config/config.go
Outdated
return addressList, nil | ||
} | ||
|
||
func GetHostNICNames() ([]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a public function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed the function. This is used as the function getting all the allowed ip addresses, interfaces may also change dynamically.
IP: c.NodeIP, | ||
}) | ||
//TODO use annotations instead. | ||
if svc.Name == "router-default" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This conditional statement needs a comment explaining why we treat this service in a special way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
@@ -209,3 +238,59 @@ func (c *LoadbalancerServiceController) patchStatus(svc *corev1.Service, newStat | |||
|
|||
return err | |||
} | |||
|
|||
func (c *LoadbalancerServiceController) getRouterIPAddressList() ([]string, error) { | |||
configuredAddresses, err := config.GetConfiguredAddresses() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what sense are these addresses configured? Are they part of the config data the user gave us, or are they present on the host?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the already present IPs in the host. Renamed both the function and the variable. I was thinking about a context where configuredXYZ means its in the host and anything coming from the user is in cfg. This can be misleading though.
|
||
for _, ip := range c.IPAddresses { | ||
if !slices.Contains(configuredAddresses, ip) { | ||
klog.Infof("IP address %v not found in the host. Removing it", ip) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The address is just being ignored, right, not removed? Will a reader of the log understand why we're reporting about that address at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on the context. These are the addresses that a user configured, and they were in the service status field for some time.
If somehow the IP disappears from the host then it is effectively removed from the service status. This may qualify better as a warning instead, as it points to a misconfiguration of MicroShift/the host.
It is true though that the removal only happens once. The next execution loop will not remove it because it was not there (provided that the situation persists). Will reword to skip and raise the level to warning.
/test ? |
@pacevedom: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/hold cancel |
d5a8c59
to
e8e2216
Compare
/test microshift-metal-tests-arm |
2 similar comments
|
/test microshift-metal-tests-arm |
/hold |
the ingress port including listenAddress and port were added to the config.yaml, then restarted the microshift service, but sometimes, couldn't connect to the cluster anymore(could curl the http route with the specified LB ip and http port 10080, failed to curl https route on the specified LB ip and https port 10443) One testing result was below:
|
Tested it with modifying the config.yaml file by adding the ingress part(specifying the listening address 10.44.0.0), after restarted the microshift service: % oc get route % oc rsh centos-pod2 |
/label qe-approved |
@pacevedom: This pull request references USHIFT-2443 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
tests for the ports: % oc get route sh-4.4# curl https://edge1-default.apps.example.com:10443 -k --resolve edge1-default.apps.example.com:10443:10.44.0.0 |
/hold cancel |
klog.Infof("P333. This is something I am waiting for: %v", err) | ||
if err != nil { | ||
klog.Errorf("unable to update default router service status: %v", err) | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not retry in some time? I guess this also leaves MicroShift without this "controller"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will include this as part of the initial loop. However, falling into this error would mean no more dynamically changing the load balancer IP addresses. The last good known state would remain until an admin took action by restarting MicroShift. This may be considered as a degraded state, but not failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, but are we communicating this in any other way than logging it? How does the admin should recognize microshift needs restarting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no other way right now but logs and docs. This situation shall be explicitly stated there as a degraded condition alongside a remediation procedure to solve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
} | ||
nicAddresses, err := ipAddressesFromNIC(nicName) | ||
if err != nil { | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above and below we're continuing on "errors" but here we're quiting. What do you think if we'd continue on this error as well (or ignore it in the defaultRouterWatch
) so this "controller" keeps running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say continuing on errors you mean skipping IPs which are not present anymore? This is not an abnormal situation in MicroShift, but in the configuration. If the values you configured are gone then it is the admin who should take action to fix it.
If, however, MicroShift is unable to retrieve host addresses or NICs then this is a different issue. Since its difficult to determine the cause from here, an error log will be written in the top handler and it will not change the last good known state. The next IP change (or an admin restarting something) will trigger this function again.
The key in this controller is avoiding changing the last good known state unless the new one is accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but it seems we're not communicating degraded state in any way (beside logs, but come on)? And the remediation is basically "restart microshift" - either manually or by reconfiguring some IPs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shall be part of the docs. We do not have a way of signaling this kind of condition in MicroShift. Killing the process could have other implications for an app if its unexpected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@pacevedom: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pacevedom, pmtk The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Which issue(s) this PR addresses:
Closes #