[data] Test core performance metrics #40757
stephanie-wang merged 26 commits into ray-project:master from …
Conversation
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
…into data-metrics-testing
raulchen left a comment:
Nice improvement. I left a few small comments.
python/ray/data/tests/conftest.py (Outdated)

```python
# Wait for a task to finish to prevent a race condition where not all of
# the task metrics have been collected yet.
if expected_metrics.get_task_count() is not None:
    ref = barrier.remote()
```
I guess this doesn't strictly guarantee that all previous tasks' metrics are collected, since task metrics are reported from multiple nodes. Is this an issue?
Instead of inserting the barrier, maybe it's more robust to use wait_for_condition to wait for the assert conditions to become true.
Sometimes wait_for_condition is not enough because we also need to check negative conditions (like no tasks executed).
But yes right now there is a possible race condition here. Need to check if this will be an issue in CI or not.
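For reference, the polling approach suggested above can be sketched as a generic helper. Ray ships a similar `wait_for_condition` in `ray._private.test_utils`; the helper below is an illustrative standalone version, not Ray's exact API:

```python
import time


def wait_for_condition(condition, timeout=10.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    AssertionError from the condition is treated as "not yet true", so the
    condition body can use plain asserts on the metrics being checked.
    """
    deadline = time.monotonic() + timeout
    last_err = None
    while time.monotonic() < deadline:
        try:
            if condition():
                return
        except AssertionError as e:
            last_err = e
        time.sleep(interval)
    raise TimeoutError(f"Condition not met within {timeout}s: {last_err}")


# Usage: wait until a (hypothetical) metrics store reports the expected count.
metrics = {"task_count": 0}
metrics["task_count"] = 26
wait_for_condition(lambda: metrics["task_count"] == 26, timeout=1.0)
```

As the follow-up comment notes, polling only helps for positive conditions; a negative condition such as "no tasks were executed" cannot be established by waiting for it to become true.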
```python
class CoreExecutionMetrics:
    def __init__(self, task_count=None, object_store_stats=None, actor_count=None):
        self.task_count = task_count
```
Nit: can we make these variables default to empty dicts, so we don't need those None checks?
This is a bad idea in Python: https://stackoverflow.com/questions/26320899/why-is-the-empty-dictionary-a-dangerous-default-value-in-python
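A minimal illustration of why the linked answer advises against mutable defaults, using hypothetical helper names rather than code from this PR: default values are evaluated once at function definition time, so a `{}` default is shared across every call.

```python
def make_metrics(task_count={}):  # BUG: one dict shared by all calls
    task_count["calls"] = task_count.get("calls", 0) + 1
    return task_count


a = make_metrics()
b = make_metrics()
# a and b are the SAME dict object, now {"calls": 2}


def make_metrics_safe(task_count=None):
    # The None-default idiom used in the PR: allocate a fresh dict per call.
    if task_count is None:
        task_count = {}
    task_count["calls"] = task_count.get("calls", 0) + 1
    return task_count
```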
```python
    )
    total_bytes_expected = num_blocks_expected * block_size_expected

    print(f"Expecting {total_bytes_expected} bytes, {num_blocks_expected} blocks")
```
logger.debug? This seems like it could be verbose.
I think it's fine since this is only used in testing and it's useful to see the output immediately if the test fails.
Adds performance testing utilities to assert on Ray core metrics for tasks submitted and objects created. Also adds tests for these metrics on some key operations:
- map with dynamic block splitting
- .limit/.take
- .schema

This acts as a regression test for (at least) the following issues:
- [data] Dataset.schema() may get recomputed each time ray-project#37077
- [data] .limit() does not truncate execution as expected ray-project#37858
- [data] read_images().take(1) is very slow on S3 / pushdown limit() into individual tasks ray-project#38023
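The assertion utility described above lives in python/ray/data/tests/conftest.py; this simplified sketch conveys the idea, with illustrative function and task names rather than the PR's actual API. Expected counts of None mean "any count is acceptable", and tasks absent from the expected set act as a negative check (they must not have run at all):

```python
def assert_core_metrics(observed, expected):
    """Compare observed per-task-name counts against expected ones.

    `observed` and `expected` map task name -> count; an expected count of
    None means "any". Task names missing from `expected` must not appear in
    `observed` (e.g. .schema() must not trigger a full recomputation).
    """
    for name, count in expected.items():
        if count is not None:
            got = observed.get(name, 0)
            assert got == count, f"{name}: got {got}, want {count}"
    # Negative check: no tasks outside the expected set should have executed.
    for name in observed:
        assert name in expected, f"unexpected task executed: {name}"


# Usage with hypothetical task names:
observed = {"ReadRange": 4, "Map(fn)": 4}
assert_core_metrics(observed, {"ReadRange": 4, "Map(fn)": None})
```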
---------
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Stacked on #40757. Compute the block size for each operation before applying other optimizer rules that depend on it (SplitReadOutputBlocksRule). This also simplifies the block sizing, so we always propagate an op's target block size to all upstream ops, until we find an op that has a different block size set.

Related issue number: Closes #41018.

---------
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Why are these changes needed?
TODO: Also test #38400. I had started this but ran into #41018.
Checks
- I've signed off every commit (git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- If I've added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.