Skip to content

Fix: Pandas rolling removes imaginary part of complex #47028

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

weikhor
Copy link
Contributor

@weikhor weikhor commented May 15, 2022

func_name = arithmetic_win_operators
df = DataFrame([1j, 1 + 2j])
result = getattr(
df.rolling(2).apply(lambda x: print(x) is None),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So func_name should now be tested instead of apply

Copy link
Contributor Author

@weikhor weikhor May 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mroeschke, I not able to understand fully. If I not use apply.

func_name = arithmetic_win_operators
df = DataFrame([1j, 1 + 2j])
result = getattr(
    df.rolling(2),
    func_name,
)()

Will get AssertionError depend on function from arithmetic_win_operators

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, the cython code may need modifying to support complex dtypes in this case if an AssertionError is raised (guessing since no traceback was provided)

Copy link
Contributor Author

@weikhor weikhor May 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke

The code I run

func_name = arithmetic_win_operators
df = DataFrame([1j, 1 + 2j])
result = getattr(
    df.rolling(2),
    func_name,
)()

The 'arithmetic_win_operators' can be sum, mean, median, max, min, var, std, kurt, skew, count, sem (11 possible arithmetic function.

For example, if arithmetic_win_operators='sum', the error is raised that function sum does not support complex dtypes from cython function inside 'pandas/_libs/window/aggregations.pyx'

Traceback (most recent call last):
  File "/home/open_source/pandas_2/development/development_5.py", line 26, in <module>
    result = getattr(
  File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 1974, in sum
    return super().sum(*args, engine=engine, engine_kwargs=engine_kwargs, **kwargs)
  File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 1407, in sum
    return self._apply(window_func, name="sum", **kwargs)
  File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 619, in _apply
    return self._apply_blockwise(homogeneous_func, name)
  File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 487, in _apply_blockwise
    res = hfunc(arr)
  File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 477, in hfunc
    return homogeneous_func(values)
  File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 614, in homogeneous_func
    result = calc(values)
  File "/home/open_source/pandas_2/pandas/pandas/core/window/rolling.py", line 611, in calc
    return func(x, start, end, min_periods, *numba_args)
  File "pandas/_libs/window/aggregations.pyx", line 132, in pandas._libs.window.aggregations.roll_sum
ValueError: Buffer dtype mismatch, expected 'const float64_t' but got 'complex float'

Some question. Thus, to make this work. I should write new roll sum function that support complex dtypes in cython. Is this right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create fused types (e.g. float + complex) that the current cython functions accept instead which should get this working for the existing functions. You can see the groupby cython code for an example.

Copy link
Contributor Author

@weikhor weikhor May 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke Hi
I not sure whether I do it correctly. I try to update roll_mean to support complex type in this file pandas/_libs/window/aggregations.pyx

ctypedef fused float_complex_t:
    float64_t
    complex64_t

cdef inline void add_mean(float_complex_t val, Py_ssize_t *nobs, float_complex_t *sum_x,
                          Py_ssize_t *neg_ct, float_complex_t *compensation,
                          int64_t *num_consecutive_same_value, float_complex_t *prev_value) nogil:
    """ add a value from the mean calc using Kahan summation """
    cdef:
        float_complex_t y, t

    # Not NaN
    if val == val:
        nobs[0] = nobs[0] + 1
        y = val - compensation[0]
        t = sum_x[0] + y
        compensation[0] = t - sum_x[0] - y
        sum_x[0] = t
        if signbit(val):
            neg_ct[0] = neg_ct[0] + 1

        # GH#42064, record num of same values to remove floating point artifacts
        if val == prev_value[0]:
            num_consecutive_same_value[0] += 1
        else:
            # reset to 1 (include current value itself)
            num_consecutive_same_value[0] = 1
        prev_value[0] = val

The error is raised

Error compiling Cython file:
------------------------------------------------------------
...
        nobs[0] = nobs[0] + 1
        y = val - compensation[0]
        t = sum_x[0] + y
        compensation[0] = t - sum_x[0] - y
        sum_x[0] = t
        if signbit(val):
                  ^
------------------------------------------------------------

pandas/_libs/window/aggregations.pyx:231:19: Cannot assign type 'float complex' to 'long double'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will probably need cast this to float before passing in this C func, or just use a comparison operator to replace sigbit.

Copy link
Contributor Author

@weikhor weikhor May 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modify code in cython pandas/_libs/window/aggregations.pyx file to support complex type. Do casting complex values to float in arithmetic operation

@weikhor weikhor closed this May 21, 2022
@weikhor weikhor reopened this May 22, 2022
@weikhor weikhor requested a review from mroeschke May 22, 2022 13:15
@@ -572,7 +572,6 @@ def test_rolling_axis_count(axis_frame):
def test_readonly_array():
# GH-27766
arr = np.array([1, 3, np.nan, 3, 5])
arr.setflags(write=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't be removed since it's the readonly part of the test

ndarray[int64_t] end, int64_t minp, int ddof=1) -> np.ndarray:
"""
Numerically stable implementation using Welford's method.
"""
cdef:
float64_t[:] values = ensure_float64(float_complex_values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a suitable approach since it defeats the purpose of using a memory view in the input.

@@ -602,6 +606,25 @@ def calc(x):
)
self._check_window_bounds(start, end, len(x))

arithmetic_operators = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this may still be addressable in cython without hardcoding this here which is also not idea.

Going back to a couple of commits earlier where signbit was the issue, all it does is check if the value is negative: https://www.cplusplus.com/reference/cmath/signbit/

So maybe test if you can do a value comparison to make it complex compatible

@weikhor
Copy link
Contributor Author

weikhor commented May 26, 2022

I close this since I stuck this for long time. I not familiar much with cython part. For now, I not able to find examples to handle fused type (complex + float) I get other task which I think can complete. Thank for review.

@weikhor weikhor closed this May 26, 2022
@weikhor weikhor reopened this Jun 6, 2022
@weikhor weikhor closed this Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Pandas rolling removes imaginary part of complex
2 participants