Support [rolling] upgrade of HDFS #362
So, looking at https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html and testing locally, it looks like the sequence is roughly:
We need to detect whether to enter "upgrade mode"; we could do that by storing a […]. Steps 3-5 could be done by adding a check to the end of the STS apply. Step 7 would be simple enough; it happens by leaving "upgrade mode". Steps 1/2/6 are the big question marks: we could run them from the operator container, exec into an existing namenode Pod, or spawn a dedicated Job. Running them from the operator generally seems like a poor idea, both because we would need to bundle an HDFS client and JVM and because the operators don't have Kerberos identities (we still need to look into how JMX is affected by this too). Running them as a Job means that we don't rely on picking a single "admin namenode", but it creates another asynchronous lifecycle for us to manage.
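For reference, the dfsadmin side of this (steps 1, 2, and 6, assuming the step numbering of the linked HdfsRollingUpgrade document) would look roughly like the following; these commands need to run as an HDFS superuser, wherever we end up executing them:

```shell
# Step 1: prepare the rolling upgrade (creates the rollback fsimage)
hdfs dfsadmin -rollingUpgrade prepare

# Step 2: poll until the rollback image is ready; the output switches
# to "Proceed with rolling upgrade" once it is
hdfs dfsadmin -rollingUpgrade query

# Step 6: finalize once every component is running the new version
hdfs dfsadmin -rollingUpgrade finalize
```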
A more MVP-ish option would be to only add an override that performs steps 3-5, leaving the dfsadmin steps (1/2/6) to be run manually.
I was hoping it was as simple as an init container, but it looks like there is some choreography involved (with the "wait for" steps). I think there should be a "do it for me automatically" option, but if there is some clear risk to that, then it should be opt-in (e.g. demos can opt in, while customers might be more cautious). Could something in stackablectl help with this choreography, to make the manual steps less of a burden?
Ultimately, all database upgrades (which is what this is) are risky. I agree that it might make sense to have a safeguard, but we should probably think about that as a platform-wide decision then.
I don't think that would make much sense. Steps 3-5 come down to updating the StatefulSets in order, which is managed entirely by the operator, and steps 1/2/6 wouldn't be any easier for stackablectl to do than for the operator. Stackablectl also generally isn't responsible for modifying stacklets at the moment, and I'd be sad to see that change.
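Just to make that mapping concrete: steps 3-5 amount to something like the following ordered rolling restarts. The StatefulSet names and the role order here are illustrative assumptions, not the operator's actual implementation:

```shell
# Hypothetical sketch: roll each role's StatefulSet in turn, waiting
# for one to become ready before touching the next (names assumed).
for sts in simple-hdfs-journalnode-default \
           simple-hdfs-namenode-default \
           simple-hdfs-datanode-default; do
  kubectl rollout restart statefulset "$sts"
  kubectl rollout status statefulset "$sts" --timeout=10m
done
```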
Ah yeah, that makes it more clear.
Sure, but operational tasks can be codified (assuming there are checks at each step proving that it is safe to proceed with the next), and IMO this is what operators are for. The problem could probably be modeled sufficiently as a finite state machine. Maybe it is a tall order to codify operations like this, but it should be the ultimate platform-wide goal.
I mean, yeah. I agree that I'd like to have as much as possible managed by the operator. I'm just not sure HDFS is special enough to warrant its own rules for when upgrades should be allowed.
Do we have documentation for this? If so, please link it here; if not, why not? And can you please include a snippet that we can use for the release notes?
The docs are at https://docs.stackable.tech/home/nightly/hdfs/usage-guide/upgrading
I suppose, "The Stackable Operator for HDFS now supports upgrading existing HDFS installations", or something like that.
Is the functionality specific to 3.3 -> 3.4? |
No, the mechanism is generic. One caveat is that it currently takes the pessimistic approach of applying it to any upgrade, so 3.3.4 -> 3.3.6 would also trigger it.
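The pessimistic trigger described above can be sketched like this (a hypothetical illustration, with an assumed function name, not the operator's actual code): any difference at all between the deployed and requested versions enters "upgrade mode", so even a patch bump like 3.3.4 -> 3.3.6 qualifies.

```shell
# Hypothetical sketch of the pessimistic trigger: no semver parsing,
# any version change at all counts as an upgrade.
needs_upgrade_mode() {
  deployed="$1"
  requested="$2"
  [ "$deployed" != "$requested" ]
}

needs_upgrade_mode "3.3.4" "3.3.6" && echo "3.3.4 -> 3.3.6: entering upgrade mode"
needs_upgrade_mode "3.3.4" "3.3.4" || echo "3.3.4 -> 3.3.4: no-op"
```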
Yeah, that's a good point. Hm.
As of 23.4, when you upgrade your HDFS, e.g. 3.2.2 -> 3.3.4, you run into an error. Ideally we should start a rolling upgrade of all components. Currently you simply cannot upgrade your HDFS without hacking things (e.g. there are no cliOverrides to add `-upgrade` or similar).

Edit from the past: at least the upgrade 3.3.4 -> 3.3.6 worked.