-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
[ArrowStringArray] TST: parametrize str.split tests #41392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrowStringArray] TST: parametrize str.split tests #41392
Conversation
asv_bench/benchmarks/strings.py
Outdated
@@ -230,17 +230,24 @@ def time_contains(self, dtype, regex): | |||
|
|||
class Split: | |||
|
|||
params = [True, False] | |||
param_names = ["expand"] | |||
params = (["str", "string", "arrow_string"], [None, "-", "--"], [True, False]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave out the pat
from this benchmark, to reduce the combinatory explosion of cases. Which pattern is being used shouldn't influence the performance of expanding or not. So I would benchmark them separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. This is from broken off #41085 where sep determines the path taken, either pyarrow.compute.utf8_split_whitespace, pyarrow.compute.split_pattern or object fallback.
so can remove from this PR for now.
these are the current timings for master (will hopefully get improvements for object fallback by breaking off changes in #41372 and also by using pyarrow kernels #41085)
|
Thanks Simon! |
the test (and benchmark) changes broken off from #41085 as a precursor to #41085 and #41372 (which currently makes changes to the str.split path, although I may break that PR up also)