-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Fix: Pandas rolling removes imaginary part of complex #47028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tests/window/test_rolling.py
Outdated
func_name = arithmetic_win_operators | ||
df = DataFrame([1j, 1 + 2j]) | ||
result = getattr( | ||
df.rolling(2).apply(lambda x: print(x) is None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So func_name
should now be tested instead of apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mroeschke, I not able to understand fully. If I not use apply.
func_name = arithmetic_win_operators
df = DataFrame([1j, 1 + 2j])
result = getattr(
df.rolling(2),
func_name,
)()
Will get AssertionError depend on function from arithmetic_win_operators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, the cython code may need modifying to support complex dtypes in this case if an AssertionError is raised (guessing since no traceback was provided)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code I run
func_name = arithmetic_win_operators
df = DataFrame([1j, 1 + 2j])
result = getattr(
df.rolling(2),
func_name,
)()
The 'arithmetic_win_operators' can be sum, mean, median, max, min, var, std, kurt, skew, count, sem (11 possible arithmetic function.
For example, if arithmetic_win_operators='sum', the error is raised that function sum does not support complex dtypes from cython function inside 'pandas/_libs/window/aggregations.pyx'
Traceback (most recent call last):
File "/home/open_source/pandas_2/development/development_5.py", line 26, in <module>
result = getattr(
File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 1974, in sum
return super().sum(*args, engine=engine, engine_kwargs=engine_kwargs, **kwargs)
File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 1407, in sum
return self._apply(window_func, name="sum", **kwargs)
File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 619, in _apply
return self._apply_blockwise(homogeneous_func, name)
File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 487, in _apply_blockwise
res = hfunc(arr)
File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 477, in hfunc
return homogeneous_func(values)
File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 614, in homogeneous_func
result = calc(values)
File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 611, in calc
return func(x, start, end, min_periods, *numba_args)
File "pandas/_libs/window/aggregations.pyx", line 132, in pandas._libs.window.aggregations.roll_sum
ValueError: Buffer dtype mismatch, expected 'const float64_t' but got 'complex float'
Some question. Thus, to make this work. I should write new roll sum function that support complex dtypes in cython. Is this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create fused types (e.g. float + complex) that the current cython functions accept instead which should get this working for the existing functions. You can see the groupby cython code for an example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mroeschke Hi
I not sure whether I do it correctly. I try to update roll_mean
to support complex type in this file pandas/_libs/window/aggregations.pyx
ctypedef fused float_complex_t:
float64_t
complex64_t
cdef inline void add_mean(float_complex_t val, Py_ssize_t *nobs, float_complex_t *sum_x,
Py_ssize_t *neg_ct, float_complex_t *compensation,
int64_t *num_consecutive_same_value, float_complex_t *prev_value) nogil:
""" add a value from the mean calc using Kahan summation """
cdef:
float_complex_t y, t
# Not NaN
if val == val:
nobs[0] = nobs[0] + 1
y = val - compensation[0]
t = sum_x[0] + y
compensation[0] = t - sum_x[0] - y
sum_x[0] = t
if signbit(val):
neg_ct[0] = neg_ct[0] + 1
# GH#42064, record num of same values to remove floating point artifacts
if val == prev_value[0]:
num_consecutive_same_value[0] += 1
else:
# reset to 1 (include current value itself)
num_consecutive_same_value[0] = 1
prev_value[0] = val
The error is raised
Error compiling Cython file:
------------------------------------------------------------
...
nobs[0] = nobs[0] + 1
y = val - compensation[0]
t = sum_x[0] + y
compensation[0] = t - sum_x[0] - y
sum_x[0] = t
if signbit(val):
^
------------------------------------------------------------
pandas/_libs/window/aggregations.pyx:231:19: Cannot assign type 'float complex' to 'long double'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will probably need cast this to float before passing in this C func, or just use a comparison operator to replace sigbit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modify code in cython pandas/_libs/window/aggregations.pyx
file to support complex type. Do casting complex values to float in arithmetic operation
pandas/tests/window/test_rolling.py
Outdated
@@ -572,7 +572,6 @@ def test_rolling_axis_count(axis_frame): | |||
def test_readonly_array(): | |||
# GH-27766 | |||
arr = np.array([1, 3, np.nan, 3, 5]) | |||
arr.setflags(write=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can't be removed since it's the readonly
part of the test
pandas/_libs/window/aggregations.pyx
Outdated
ndarray[int64_t] end, int64_t minp, int ddof=1) -> np.ndarray: | ||
""" | ||
Numerically stable implementation using Welford's method. | ||
""" | ||
cdef: | ||
float64_t[:] values = ensure_float64(float_complex_values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a suitable approach since it defeats the purpose of using a memory view in the input.
pandas/core/window/rolling.py
Outdated
@@ -602,6 +606,25 @@ def calc(x): | |||
) | |||
self._check_window_bounds(start, end, len(x)) | |||
|
|||
arithmetic_operators = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this may still be addressable in cython without hardcoding this here which is also not idea.
Going back to a couple of commits earlier where signbit
was the issue, all it does is check if the value is negative: https://www.cplusplus.com/reference/cmath/signbit/
So maybe test if you can do a value comparison to make it complex compatible
I close this since I stuck this for long time. I not familiar much with cython part. For now, I not able to find examples to handle fused type (complex + float) I get other task which I think can complete. Thank for review. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.