[Flaky Tests] Fixing flaky tests related to derived source tests #18493

Merged

@tanik98 tanik98 commented Jun 11, 2025

Description

Related Issues

Resolves #18485 #18490

  1. org.opensearch.common.lucene.index.DerivedSourceLeafReaderTests:

Issue:

java.lang.IndexOutOfBoundsException: Index 8 out of bounds for length 8
        at __randomizedtesting.SeedInfo.seed([7EFEA17C239307B5:480F885B0D6542F4]:0)
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
        at java.base/java.util.Objects.checkIndex(Objects.java:385)
        at org.apache.lucene.index.CodecReader$1.document(CodecReader.java:100)
        at org.opensearch.common.lucene.index.DerivedSourceStoredFieldsReader$DerivedSourceStoredFields.document(DerivedSourceStoredFieldsReader.java:108)
        at org.opensearch.common.lucene.index.DerivedSourceLeafReaderTests.testWithRandomDocuments(DerivedSourceLeafReaderTests.java:184)

Fix:

  • Force-merge the index into a single segment.
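
The trace above shows a lookup of index 8 against a reader of length 8. One plausible reading, which this PR does not confirm, is that a document number valid for the whole index was used against a single leaf after the randomized writer produced more than one segment. The toy model below is a sketch under that assumption: segments are plain lists, and `SegmentDocIdSketch`, `document`, and `forceMergeToOne` are hypothetical names, not Lucene or OpenSearch APIs. In Lucene itself, merging down to one segment is done with `IndexWriter.forceMerge(1)`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of segment-local document numbering; not Lucene code. */
public class SegmentDocIdSketch {

    /** Reads a document from one segment using a segment-local doc ID. */
    static String document(List<String> segment, int localDocId) {
        // Throws IndexOutOfBoundsException when localDocId >= segment.size().
        return segment.get(localDocId);
    }

    /** Stand-in for a force merge to one segment: concatenates all segments in order. */
    static List<String> forceMergeToOne(List<List<String>> segments) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            merged.addAll(segment);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed layout: ten documents split into segments of 8 and 2.
        List<List<String>> segments = List.of(
            List.of("d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7"),
            List.of("d8", "d9")
        );

        try {
            // Doc 8 exists in the index, but not in the first segment.
            document(segments.get(0), 8);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("lookup in first segment failed: " + e.getMessage());
        }

        // After merging to one segment, index-wide and segment-local IDs coincide.
        List<String> single = forceMergeToOne(segments);
        System.out.println(document(single, 8)); // prints "d8"
    }
}
```

With a single segment, a test can iterate from 0 up to the total document count against one leaf without translating document numbers.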
  2. org.opensearch.index.engine.InternalEngineTests.testNewChangesSnapshotWithDerivedSource
     org.opensearch.index.engine.InternalEngineTests.testNewChangesSnapshotWithDeleteAndUpdate

Issue:

java.io.IOException: could not remove the following files (in the order of attempts):
   /var/jenkins/workspace/gradle-check/search/server/build/testrun/test/temp/org.opensearch.index.engine.InternalEngineTests_E3B2B6D726C65840-001/translog-primary-019/translog-2.tlog: java.io.IOException: access denied: /var/jenkins/workspace/gradle-check/search/server/build/testrun/test/temp/org.opensearch.index.engine.InternalEngineTests_E3B2B6D726C65840-001/translog-primary-019/translog-2.tlog
   /var/jenkins/workspace/gradle-check/search/server/build/testrun/test/temp/org.opensearch.index.engine.InternalEngineTests_E3B2B6D726C65840-001/translog-primary-019/translog.ckp: java.io.IOException: access denied: /var/jenkins/workspace/gradle-check/search/server/build/testrun/test/temp/org.opensearch.index.engine.InternalEngineTests_E3B2B6D726C65840-001/translog-primary-019/translog.ckp
   /var/jenkins/workspace/gradle-check/search/server/build/testrun/test/temp/org.opensearch.index.engine.InternalEngineTests_E3B2B6D726C65840-001/translog-primary-019: java.nio.file.DirectoryNotEmptyException: /var/jenkins/workspace/gradle-check/search/server/build/testrun/test/temp/org.opensearch.index.engine.InternalEngineTests_E3B2B6D726C65840-001/translog-primary-019

	at __randomizedtesting.SeedInfo.seed([E3B2B6D726C65840:F3AC8BE61A359754]:0)
	at org.opensearch.common.util.io.IOUtils.rm(IOUtils.java:222)
	at org.opensearch.index.translog.Translog.createEmptyTranslog(Translog.java:2070)
	at org.opensearch.index.translog.Translog.createEmptyTranslog(Translog.java:2038)
	at org.opensearch.index.translog.Translog.createEmptyTranslog(Translog.java:2010)
	at org.opensearch.index.translog.Translog.createEmptyTranslog(Translog.java:2000)
	at org.opensearch.index.engine.EngineTestCase.createEngine(EngineTestCase.java:686)
	at org.opensearch.index.engine.EngineTestCase.createEngine(EngineTestCase.java:673)
	at org.opensearch.index.engine.InternalEngineTests.testNewChangesSnapshotWithDerivedSource(InternalEngineTests.java:8243)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)

Fix:

  • Close the engine, so that the translog files it holds open can be removed.
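
As a rough illustration of why closing matters here, the sketch below models a resource that keeps its translog path "open" until it is closed. Everything in it is hypothetical (`CloseEngineSketch`, `FakeEngine`, and the in-memory `OPEN_PATHS` set are stand-ins, not OpenSearch classes), and it assumes a file system that refuses to touch files still held open, which is what the "access denied" messages above suggest.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of a resource that must be closed before its files can be reused. */
public class CloseEngineSketch {

    /** Paths currently held open; stands in for a file system that rejects access to open files. */
    static final Set<String> OPEN_PATHS = new HashSet<>();

    /** Hypothetical stand-in for an engine that holds its translog directory open. */
    static final class FakeEngine implements AutoCloseable {
        private final String translogPath;

        FakeEngine(String translogPath) {
            // A second engine on the same path fails while the first is still open.
            if (!OPEN_PATHS.add(translogPath)) {
                throw new IllegalStateException("access denied: " + translogPath);
            }
            this.translogPath = translogPath;
        }

        @Override
        public void close() {
            OPEN_PATHS.remove(translogPath);
        }
    }

    public static void main(String[] args) {
        // try-with-resources guarantees close() runs even if the body throws.
        try (FakeEngine first = new FakeEngine("translog-primary")) {
            System.out.println("first engine in use");
        }
        // The path is free again, so a fresh engine can be created over it.
        try (FakeEngine second = new FakeEngine("translog-primary")) {
            System.out.println("second engine created");
        }
    }
}
```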
  3. org.opensearch.recovery.FullRollingRestartIT.testDerivedSourceWithConcurrentUpdatesRollingRestart

Issue:

Бэс 10, 2025 9:11:07 ЭК com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[#4762,opensearch[node_t0][generic][T#1],5,TGRP-FullRollingRestartIT]
java.lang.AssertionError: seqNo [100] was processed twice in generation [5], with different data. prvOp [Index{id='76', seqNo=100, primaryTerm=2, version=3, autoGeneratedIdTimestamp=-1}], newOp [Index{id='76', seqNo=100, primaryTerm=2, version=3, autoGeneratedIdTimestamp=-1}]
	at __randomizedtesting.SeedInfo.seed([33B7FED8D60FC88D]:0)
	at org.opensearch.index.translog.TranslogWriter.assertNoSeqNumberConflict(TranslogWriter.java:341)
	at org.opensearch.index.translog.TranslogWriter.add(TranslogWriter.java:289)
	at org.opensearch.index.translog.Translog.add(Translog.java:579)

Fix:

  • The same document was being updated concurrently, which led to a sequence number conflict and eventually to a timeout.
  • Update documents sequentially instead of picking random documents.
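
A minimal sketch of the idea, not the code from this PR: worker threads claim document IDs from a shared counter, so no two threads ever work on the same document. The class and method names (`SequentialUpdateSketch`, `runUpdates`) are hypothetical, and the "update" is reduced to recording the claimed ID.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model: threads take doc IDs in sequence instead of picking them at random. */
public class SequentialUpdateSketch {

    /**
     * Runs the given number of worker threads over doc IDs 0..docCount-1 and
     * returns how many IDs were claimed by more than one worker.
     */
    static int runUpdates(int docCount, int threads) {
        AtomicInteger nextDoc = new AtomicInteger();
        AtomicInteger duplicates = new AtomicInteger();
        Set<Integer> claimed = ConcurrentHashMap.newKeySet();

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                int docId;
                // getAndIncrement hands each ID to exactly one thread.
                while ((docId = nextDoc.getAndIncrement()) < docCount) {
                    if (!claimed.add(docId)) {
                        duplicates.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return duplicates.get();
    }

    public static void main(String[] args) {
        System.out.println("duplicate claims: " + runUpdates(1000, 8)); // prints "duplicate claims: 0"
    }
}
```

With random selection, two threads can pick the same ID at the same moment; with a shared counter that overlap cannot happen.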
  4. RecoveryWhileUnderLoadIT

Issue:

  • These tests were also failing because of concurrent updates.

Fix:

  • Use auto-generated document IDs instead of fixed IDs.
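
To show the difference in miniature: with a fixed ID every write after the first is an update of the same document, while a fresh ID per write makes every write a create. The sketch below is illustrative only. `AutoIdSketch` and `index` are hypothetical names, the map stands in for an index, and `UUID.randomUUID()` stands in for whatever ID generation the cluster performs when a request carries no ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model: fixed doc IDs turn writes into updates, generated IDs keep them as creates. */
public class AutoIdSketch {

    /** Performs the given number of writes and returns the resulting version per doc ID. */
    static Map<String, Integer> index(int writes, boolean autoId) {
        Map<String, Integer> versions = new HashMap<>();
        for (int i = 0; i < writes; i++) {
            String id = autoId ? UUID.randomUUID().toString() : "fixed-id";
            // First write creates version 1; every later write to the same ID bumps the version.
            versions.merge(id, 1, Integer::sum);
        }
        return versions;
    }

    public static void main(String[] args) {
        Map<String, Integer> fixed = index(10, false);
        System.out.println(fixed.size() + " doc, version " + fixed.get("fixed-id")); // prints "1 doc, version 10"

        Map<String, Integer> generated = index(10, true);
        System.out.println(generated.size() + " docs"); // prints "10 docs"
    }
}
```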

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

❕ Gradle check result for 7cffee1: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Jun 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.71%. Comparing base (7ab5c59) to head (a46c4f1).
Report is 8 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18493      +/-   ##
============================================
+ Coverage     72.70%   72.71%   +0.01%     
- Complexity    68069    68092      +23     
============================================
  Files          5531     5536       +5     
  Lines        313167   313302     +135     
  Branches      45451    45460       +9     
============================================
+ Hits         227682   227832     +150     
+ Misses        66994    66915      -79     
- Partials      18491    18555      +64     

@github-actions github-actions bot added the bug and Build labels Jun 11, 2025
@tanik98 tanik98 marked this pull request as ready for review June 11, 2025 14:56
@tanik98 tanik98 changed the title [Flaky Tests] Fixing few of the Flaky tests related to derived source tests [Flaky Tests] Fixing flaky tests related to derived source tests Jun 11, 2025
@tanik98 tanik98 force-pushed the tanik-derived-source-tests-fix branch from 7cffee1 to a46c4f1 Compare June 11, 2025 17:24

✅ Gradle check result for a46c4f1: SUCCESS

msfroh commented Jun 11, 2025

Given that this PR has run a couple of Gradle checks that actually passed, it definitely feels like an improvement.

@msfroh msfroh left a comment

I only have a mild concern over the duration passed to Thread#join.

As I mentioned in my other comment, it looks like this PR improves the current flaky test situation. I think we should increase the timeout on Thread#join, but I'd be happy to let that wait for another PR.
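
For context on the `Thread#join` concern: `Thread.join(long)` returns quietly when the timeout elapses, so a test that joins with a short timeout can continue while the worker is still running. A hedged sketch of one way to make that visible follows; the helper name `joinWithin` and the durations are assumptions for illustration, not values from this PR.

```java
/** Sketch: distinguish "thread finished" from "join timed out". */
public class JoinTimeoutSketch {

    /** Waits up to timeoutMillis for the thread and returns true only if it has terminated. */
    static boolean joinWithin(Thread thread, long timeoutMillis) {
        try {
            // join(long) does not throw on timeout; it simply returns.
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        Thread slow = new Thread(() -> {
            try {
                Thread.sleep(2_000);
            } catch (InterruptedException ignored) {
                // Interrupted early; just exit.
            }
        });
        slow.setDaemon(true);
        slow.start();

        // The timeout is far shorter than the work, so the thread is still alive.
        System.out.println("finished in time: " + joinWithin(slow, 50));
        slow.interrupt();
    }
}
```

A test can assert on the returned flag, which turns a too-short timeout into a clear failure rather than a later, unrelated one.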

@andrross andrross merged commit 674de10 into opensearch-project:main Jun 11, 2025
33 of 34 checks passed
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
tanik98 added a commit to tanik98/OpenSearch that referenced this pull request Jun 20, 2025
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
Labels
bug, Build, skip-changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Flaky testNewChangesSnapshotWithDerivedSource
4 participants