-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Use ReadAdvice.RANDOM by default. #13244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This effectively forces index inputs to be either open with a SEQUENTIAL or RANDOM advice, with nothing in between.
I think this idea is not too bad, because as Robert said, unless we merge or flush, access is always random, so readahead is bad. We should still compare results under memory pressure and also without pressure to compare how it behaves. For consistency we should still keep the enum constant, so one can have the option to use another one. Maybe make the default configurable via sysprop? |
Unfortunately, benchmarking the cold index case correctly is not so easy ... I would not trust luceneutil to give accurate results (its queries are synthetically generated). We would rather need a real-world large index (or use Not only realistic queries, but they should be delivered to Lucene accurately by time (i.e. at the actual arrival times that the queries came to the search engine), asynchronously ("open loop") to avoid the coordinated omission bug. |
@mikemccand I wonder if we need to create such a sophisticated benchmark. If we could confirm that performance is not affected when the cache is hot, and better when the cache is cold, maybe that would be good enough? |
Yeah +1 I don't think we should block this change on sophisticated benchmarking! If we "first do no harm" (hot case not affected), and If we can show some improvement (or even no degradation?) in a simple cold benchmark then we should make this simplification! |
Could we still keep the NORMAL ReadAdvice constant and its mappings? We should just change the default! I can add a system property to make it configurable like the other MMapDir options. |
That works for me @uschindler. I updated the code and the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a system property can be added in a separate PR.
Here's a luceneutil run on wikibigall and data hot in the page cache. No difference, which is what was expected I guess:
|
I hacked luceneutil to clear my page cache with diff --git a/src/main/perf/TaskThreads.java b/src/main/perf/TaskThreads.java
index 313664f..b42a1bd 100644
--- a/src/main/perf/TaskThreads.java
+++ b/src/main/perf/TaskThreads.java
@@ -89,6 +89,12 @@ public class TaskThreads {
// Done
break;
}
+ ProcessBuilder pb = new ProcessBuilder("/bin/bash", "/home/jpountz/drop_caches.sh");
+ Process p = pb.start();
+ int code = p.waitFor();
+ if (code != 0) {
+ throw new Error();
+ }
final long t0 = System.nanoTime();
try {
task.go(indexState, taskParser); This gives the following output on wikibigall. Differences are somewhat bigger and some p values are low-ish, e.g.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the benchmarking @jpountz!
Thanks! I will provide a PR to make it configurable. |
See #13264 |
* Lucene 10 changed the IOContext.DEFAULT from sequential to random, which makes sense for search use case: apache/lucene#13244 * place we read a file only once, its better to switch to READONLY(sequential) * this should only be in cases the file is read by the same thread that opened it, e.g. it won't work for RemoteStore that does async upload Signed-off-by: Asim Mahmood <[email protected]>
#17670) * Lucene 10 changed the IOContext.DEFAULT from sequential to random, which makes sense for search use case: apache/lucene#13244 * place we read a file only once, its better to switch to READONLY(sequential) * this should only be in cases the file is read by the same thread that opened it, e.g. it won't work for RemoteStore that does async upload --------- Signed-off-by: Asim Mahmood <[email protected]>
opensearch-project#17670) * Lucene 10 changed the IOContext.DEFAULT from sequential to random, which makes sense for search use case: apache/lucene#13244 * place we read a file only once, its better to switch to READONLY(sequential) * this should only be in cases the file is read by the same thread that opened it, e.g. it won't work for RemoteStore that does async upload --------- Signed-off-by: Asim Mahmood <[email protected]> Signed-off-by: Harsh Kothari <[email protected]>
opensearch-project#17670) * Lucene 10 changed the IOContext.DEFAULT from sequential to random, which makes sense for search use case: apache/lucene#13244 * place we read a file only once, its better to switch to READONLY(sequential) * this should only be in cases the file is read by the same thread that opened it, e.g. it won't work for RemoteStore that does async upload --------- Signed-off-by: Asim Mahmood <[email protected]> Signed-off-by: Harsh Kothari <[email protected]>
This switches the default
ReadAdvice
fromNORMAL
toRANDOM
, which is a better fit for the kind of access pattern that Lucene has. This is expected to reduce page cache trashing and contention on the page table.NORMAL
is still available, but never used by any of the file formats.