Additional information about scaling a service #178
Conversation
[scale](../reference/commandline/service_scale/) the service.

### Re-balancing a service after joining a new or previously failed node

When you add a new node to a swarm, or a node re-joins after it has been
Would "reconnect" instead of "re-join" be an option?
I'm not in love with that, because the verb in the CLI is 'join' rather than 'connect'. In fact, I should probably change the 'add' to 'join' for consistency. I could probably just change 're-join' to 'join'. WDYT?
Hm, actually, that is the issue, because the node never "left" the swarm, it was only unreachable. When it becomes reachable again, it doesn't "join", it just erm, "dunnowhattocallit".
Joining a swarm generates a new cryptographic identity, which isn't the case here.
@aaronlehmann @stevvooe thoughts?
@thaJeztah is correct. `reconnect` sounds right to me. Otherwise, "...or a node returns..."?
`reconnect` or `re-register`, but I am not sure if we use that terminology consistently.
unavailable, the new node does not automatically get a workload if the service
is already running at the desired scale. Notably, when a failed node recovers
and re-joins a swarm, the workloads it was previously running have been
reassigned to other nodes, and it does not automatically take them back.
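A quick way to check where a service's tasks actually ended up after a node recovers is with the CLI. This is a minimal sketch; the service name `web` is hypothetical:

```bash
# List the nodes in the swarm and confirm the recovered node is Ready/Active.
docker node ls

# Show which node each task of the service runs on. Tasks that were
# rescheduled away from the failed node stay where they are.
docker service ps web
```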
Might be good to include reasoning behind why this is the case, as I've found this behavior confusterates people.
@aluzzardi can give you more, but we don't place workload on the new server to prevent healthy services from being interrupted and to avoid dog-piling newly joined nodes.
SwarmKit follows a pretty simple rule: no healthy container is ever disrupted unless it absolutely must be.

- A machine goes down? We move its containers to other machines; they were down anyway.
- A container crashes? We move it to another machine.

However, a new machine comes up? There's no reason for SwarmKit to kill a perfectly fine production mysql container and move it to this new machine just for the sake of rebalancing. If that container crashes on its own, then SwarmKit will consider redeploying it to the brand-new machine in order to rebalance the cluster.

In all of the examples above, SwarmKit has never caused disruption to healthy containers.

In the future, we are planning to provide a flag so that users can signal that a service can be preempted, that is, killed even if perfectly healthy in order to make room for another service or to rebalance the cluster. We'll never do this without user permission, though.
/cc @aaronlehmann
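To observe this rule in practice, here is a minimal sketch. The service name `web` and node name `node-2` are hypothetical, and draining is used here only to simulate a node becoming unavailable:

```bash
# Simulate a node going down: its tasks are rescheduled onto other nodes.
docker node update --availability drain node-2
docker service ps web    # tasks formerly on node-2 now run elsewhere

# Bring the node back: healthy tasks are not moved onto it automatically.
docker node update --availability active node-2
docker service ps web    # placement is unchanged until a task fails or is redeployed
```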
Thanks all, this is fantastic info and I will add it to the doc here.
OK, I tried to capture the feedback given by @aaronlehmann @stevvooe @aluzzardi. PTAL, thanks!
LGTM
If you are concerned about an even balance of load and don't mind disrupting
running tasks, you can force your swarm to re-balance by temporarily scaling
the service upward.
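As a rough illustration of the approach described above (assuming a hypothetical service `web` that normally runs 3 replicas):

```bash
# Temporarily scale above the desired replica count; the extra tasks are
# generally scheduled onto the less-loaded (for example, newly joined) nodes.
docker service scale web=6

# Once the extra tasks are running, scale back down to the original count.
docker service scale web=3

# Verify the resulting task placement.
docker service ps web
```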
Note that this PR, if accepted, would probably allow rebalancing without having to change the scale; moby/swarmkit#1664
But I can't talk about it until/unless it is. :)
I know; it was just a heads-up 😄
See also
[`docker service scale`](../reference/commandline/service_scale/) and
[`docker service ps`](../reference/commandline/service_ps/).
Oh! Not introduced in this change, but should these links point to the `.md` file, so that they will work both on GitHub and on docs.docker.com?
Done. I also took the file portion out of the in-file links earlier in the file.
Signed-off-by: Misty Stanley-Jones <[email protected]>
With the +1 from @stevvooe I'm going to merge.
* Update workflow and add screenshots
* Add screenshots
Adding additional information as a follow-on to #148. Related to #105.