
Renaming the node role search to warm #17573


Merged
6 commits merged into opensearch-project:main on Mar 18, 2025

Conversation

vinaykpud
Contributor

@vinaykpud vinaykpud commented Mar 11, 2025

In this PR we rename the existing node "search" role to "warm".

Description

This is done based on the decision taken as part of the discussion in this thread: #17422 (comment)
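
Concretely, the rename shows up in a node's `opensearch.yml` role configuration. An illustrative before/after (the `node.roles` setting is the standard way roles are assigned):

```yaml
# Before this change (2.x): the role is configured as "search"
node.roles: [ search ]

# After this change (3.0): the same role is configured as "warm"
node.roles: [ warm ]
```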

Related Issues

Related to #15306
Related to #17422

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
@github-actions bot added the "bug" and "Search:Performance" labels on Mar 11, 2025
@vinaykpud changed the title from "Renaming search node role to warm" to "Renaming the node role search to warm" on Mar 11, 2025
@mch2
Member

mch2 commented Mar 11, 2025

This is a straight rename, correct? It does not resolve #17422?


❌ Gradle check result for 835fd60: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@vinaykpud
Contributor Author

This is a straight rename correct? It does not resolve #17422 ?

Yes. Do you want me to open a new issue for this?


❌ Gradle check result for b0d026a: FAILURE


@vinaykpud vinaykpud closed this Mar 11, 2025
@vinaykpud vinaykpud reopened this Mar 11, 2025

✅ Gradle check result for b0d026a: SUCCESS


codecov bot commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 92.85714% with 2 lines in your changes missing coverage. Please review.

Project coverage is 72.49%. Comparing base (1166998) to head (d05b81a).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
.../java/org/opensearch/env/NodeRepurposeCommand.java 81.81% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17573      +/-   ##
============================================
+ Coverage     72.48%   72.49%   +0.01%     
+ Complexity    65771    65752      -19     
============================================
  Files          5311     5311              
  Lines        304973   304973              
  Branches      44229    44229              
============================================
+ Hits         221045   221085      +40     
+ Misses        65830    65760      -70     
- Partials      18098    18128      +30     


@andrross
Member

Let's make sure this aligns with what @gbbafna mentioned here before merging.

We'd generally want to do a deprecation cycle before just replacing a term like this. However, it does appear that "search" was a bit of a misnomer and is not in wide use beyond searchable snapshots, so I'm not totally opposed to doing this to unblock the use of the "search" role where it might actually make more sense.
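
For context, a deprecation cycle for a role name typically keeps the old name working while emitting a warning, so operators can migrate before the name is removed. A minimal, hypothetical sketch of that pattern — the class and method names here are illustrative stand-ins, not OpenSearch's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the deprecation-cycle alternative discussed above:
// keep resolving the legacy "search" role name, but record a deprecation
// warning so operators can migrate configs before the old name is removed.
public final class DeprecatedRoleName {
    static final List<String> warnings = new ArrayList<>();

    static String resolveRoleName(String configured) {
        if ("search".equals(configured)) {
            warnings.add("node role [search] is deprecated; use [warm] instead");
            return "warm"; // the old name still resolves to the renamed role
        }
        return configured;
    }

    public static void main(String[] args) {
        System.out.println(resolveRoleName("search")); // -> warm (plus a warning)
        System.out.println(resolveRoleName("warm"));   // -> warm (no warning)
        System.out.println(warnings.size());           // -> 1
    }
}
```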

@gbbafna
Contributor

gbbafna commented Mar 12, 2025

@vinaykpud : Can we verify that if we upgrade a cluster from 2.x to 3.0 with this change, searchable snapshots continue to work? There might be a few nuances here, like upgrading the cluster managers first. I want to make sure that rolling restarts work with this change.

@vinaykpud
Contributor Author

vinaykpud commented Mar 14, 2025

@vinaykpud : Can we verify if we upgrade the cluster from 2.x to 3.0 with this change, searchable snapshots continue to work ? There might be few nuances here like changing the cluster-manager first . Want to make sure that rolling restarts works with this change

Sure @gbbafna

Performed a test to check the cluster upgrade scenario. Using opensearch-cluster-cdk I set up a cluster with 3 cluster managers and 2 data nodes.
Here is a summary of the steps performed:

  1. Created an index with 1 primary and 1 replica and indexed a few documents
  2. Registered an S3 repository for snapshots
  3. Took a snapshot
  4. Deleted the index
  5. Updated the node role from data to search by changing the YAML file, then restarted the process on both data nodes to make them snapshot search nodes
  6. Restored the searchable snapshot and made sure shards were assigned
  7. Created a tarball of OpenSearch with the changes in this PR and copied it to all the nodes
  8. Restarted the OpenSearch process with the latest binaries on each cluster manager, one by one; the cluster stayed healthy
  9. Then restarted the OpenSearch process with the latest binaries on each search node, one after another
  10. During this process I had the _cat/nodes, _cat/shards, and health APIs running in a loop to monitor the cluster
  11. The cluster didn't go red and was upgraded to OpenSearch 3.0.0
  12. Also verified by querying _search on the searchable snapshot index; it works

Let me know if we missed anything here.

cc @mch2 @andrross

@mch2
Member

mch2 commented Mar 17, 2025

Performed a test to check the cluster upgrade scenario. [...]
Let me know if we missed anything here.

So to summarize, you did a successful rolling upgrade from 2.19 to 3.0, but it required the CMs to migrate first. Did you not have to update the routing pool logic for BWC? I don't see this in the PR.

    public static RoutingPool getNodePool(DiscoveryNode node) {
        if (node.isWarmNode() || (node.isSearchNode() && (node.getVersion().before(Version.V_3_0_0)))) {
            return REMOTE_CAPABLE;
        }
        return LOCAL_ONLY;
    }

@vinaykpud
Contributor Author

vinaykpud commented Mar 17, 2025

So to summarize, you did a successful rolling upgrade from 2.19 to 3.0, but it required the CMs to migrate first. Did you not have to update the routing pool logic for BWC? I don't see this in the PR.

@mch2, no, I haven't updated the routing pool logic; it worked without adding any new logic.
But I haven't tried the other direction, i.e. upgrading the search/warm nodes first and the cluster managers later. We might need to add this logic for that case.

@mch2
Member

mch2 commented Mar 17, 2025

First Upgrading Search/Warm Nodes and Upgrading CM's later

@vinaykpud No, this logic would be on 3.0 cluster managers, not 2.x. I was thinking it would provide added safety for mixed-cluster cases where we need to make allocation decisions and assign searchable-snapshot shards in that mixed state. In the normal upgrade case it isn't required, though, so I think we can go ahead without it.
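
The BWC condition quoted earlier in the thread can be sketched in isolation. Below is a self-contained toy version in which `Node`, the version check, and `RoutingPool` are simplified stand-ins for the real OpenSearch types, just to illustrate the mixed-cluster idea:

```java
// Sketch of the mixed-cluster routing-pool check discussed above.
// A 3.0 cluster manager may still see 2.x nodes that advertise the old
// "search" role; those should be treated as remote-capable until upgraded.
public final class RoutingPoolSketch {
    enum RoutingPool { REMOTE_CAPABLE, LOCAL_ONLY }

    // Simplified stand-in for DiscoveryNode: role flags plus major version.
    record Node(boolean warm, boolean search, int majorVersion) {}

    static RoutingPool getNodePool(Node node) {
        if (node.warm() || (node.search() && node.majorVersion() < 3)) {
            return RoutingPool.REMOTE_CAPABLE;
        }
        return RoutingPool.LOCAL_ONLY;
    }

    public static void main(String[] args) {
        System.out.println(getNodePool(new Node(true, false, 3)));  // warm 3.0 node: REMOTE_CAPABLE
        System.out.println(getNodePool(new Node(false, true, 2)));  // legacy 2.x search node: REMOTE_CAPABLE
        System.out.println(getNodePool(new Node(false, false, 3))); // regular 3.0 data node: LOCAL_ONLY
    }
}
```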


❕ Gradle check result for c6d46d2: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT.testSnapshotWithStuckNode
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


✅ Gradle check result for d05b81a: SUCCESS

@mch2 mch2 merged commit 1c86dd1 into opensearch-project:main Mar 18, 2025
31 checks passed
vinaykpud added a commit to vinaykpud/OpenSearch that referenced this pull request Mar 18, 2025
* Renaming search node role to warm

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added Changelog

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* fixed failing tests

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* fixed PR comments

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

---------

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Jun 26, 2025
6 participants