Panic in datafusion_expr::window_state::WindowAggState::update #16308

Closed
@andygrove

Description

Describe the bug

Upgrading Comet to use 48.0.0-rc2 causes tests to fail with an "attempt to subtract with overflow" panic. This did not happen with rc1. I have not yet debugged this to find the root cause.
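For context on the message itself: Rust raises this panic when an integer subtraction underflows in a build with overflow checks enabled (the default for debug builds). The following is only a minimal sketch of that failure class and of the usual guards against it. It is not the code at `window_state.rs:95`, and `rows_past_offset` and its arguments are hypothetical names chosen for illustration.

```rust
use std::ops::Range;

/// Hypothetical helper, NOT DataFusion code: how far `range.end` lies past
/// `offset`. A plain `range.end - offset` on `usize` panics with
/// "attempt to subtract with overflow" when `offset > range.end` and
/// overflow checks are enabled; `checked_sub` reports that case as `None`.
fn rows_past_offset(range: &Range<usize>, offset: usize) -> Option<usize> {
    range.end.checked_sub(offset)
}

fn main() {
    let range: Range<usize> = 0..3;

    // offset <= end: ordinary subtraction would also succeed here.
    assert_eq!(rows_past_offset(&range, 2), Some(1));

    // offset > end: the unchecked form would panic; the checked form does not.
    assert_eq!(rows_past_offset(&range, 5), None);

    // `saturating_sub` is an alternative when clamping to zero is acceptable.
    assert_eq!(range.end.saturating_sub(5), 0);
}
```

Whether the right guard for the actual bug is a checked subtraction, a saturating one, or a fix to the values feeding the subtraction depends on the root cause, which is not yet known.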

PR: apache/datafusion-comet#1853

failing build: https://github.com/apache/datafusion-comet/actions/runs/15491877086/job/43619110943?pr=1853

The relevant part of the stack trace is:

2025-06-06T13:57:54.1903145Z         at datafusion_expr::window_state::WindowAggState::update(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/expr/src/window_state.rs:95)
2025-06-06T13:57:54.1905310Z         at datafusion_physical_expr::window::window_expr::AggregateWindowExpr::aggregate_evaluate_stateful(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-expr/src/window/window_expr.rs:260)
2025-06-06T13:57:54.1920612Z         at <datafusion_physical_expr::window::aggregate::PlainAggregateWindowExpr as datafusion_physical_expr::window::window_expr::WindowExpr>::evaluate_stateful(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-expr/src/window/aggregate.rs:148)
2025-06-06T13:57:54.1924024Z         at datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::compute_aggregates(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:983)
2025-06-06T13:57:54.1927398Z         at datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::poll_next_inner(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:1033)
2025-06-06T13:57:54.1930653Z         at <datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream as futures_core::stream::Stream>::poll_next(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:949)

There was one PR between rc1 and rc2 specifically related to evaluating window expressions, so that may be the cause. I will try to confirm.

#16234

Full stack trace:

2025-06-06T13:57:54.1864287Z - aggregate window function for all types *** FAILED *** (406 milliseconds)
2025-06-06T13:57:54.1871363Z   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2045.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2045.0 (TID 5401) (62bae2d9d85a executor driver): org.apache.comet.CometNativeException: attempt to subtract with overflow
2025-06-06T13:57:54.1873529Z         at comet::errors::init::{{closure}}(/__w/datafusion-comet/datafusion-comet/native/core/src/errors.rs:151)
2025-06-06T13:57:54.1883399Z         at <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/alloc/src/boxed.rs:1980)
2025-06-06T13:57:54.1894489Z         at std::panicking::rust_panic_with_hook(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/panicking.rs:841)
2025-06-06T13:57:54.1895884Z         at std::panicking::begin_panic_handler::{{closure}}(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/panicking.rs:699)
2025-06-06T13:57:54.1897662Z         at std::sys::backtrace::__rust_end_short_backtrace(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/sys/backtrace.rs:168)
2025-06-06T13:57:54.1899012Z         at __rustc::rust_begin_unwind(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/panicking.rs:697)
2025-06-06T13:57:54.1900180Z         at core::panicking::panic_fmt(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/core/src/panicking.rs:75)
2025-06-06T13:57:54.1901495Z         at core::panicking::panic_const::panic_const_sub_overflow(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/core/src/panicking.rs:178)
2025-06-06T13:57:54.1903145Z         at datafusion_expr::window_state::WindowAggState::update(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/expr/src/window_state.rs:95)
2025-06-06T13:57:54.1905310Z         at datafusion_physical_expr::window::window_expr::AggregateWindowExpr::aggregate_evaluate_stateful(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-expr/src/window/window_expr.rs:260)
2025-06-06T13:57:54.1920612Z         at <datafusion_physical_expr::window::aggregate::PlainAggregateWindowExpr as datafusion_physical_expr::window::window_expr::WindowExpr>::evaluate_stateful(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-expr/src/window/aggregate.rs:148)
2025-06-06T13:57:54.1924024Z         at datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::compute_aggregates(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:983)
2025-06-06T13:57:54.1927398Z         at datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::poll_next_inner(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:1033)
2025-06-06T13:57:54.1930653Z         at <datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream as futures_core::stream::Stream>::poll_next(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:949)
2025-06-06T13:57:54.1933599Z         at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
2025-06-06T13:57:54.1935713Z         at futures_util::stream::stream::StreamExt::poll_next_unpin(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
2025-06-06T13:57:54.1938604Z         at <datafusion_physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next(/usr/local/cargo/git/checkouts/datafusion-11a8b534adb6bd68/85f6621/datafusion/physical-plan/src/projection.rs:354)
2025-06-06T13:57:54.1940894Z         at <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-core-0.3.31/src/stream.rs:130)
2025-06-06T13:57:54.1942871Z         at futures_util::stream::stream::StreamExt::poll_next_unpin(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/mod.rs:1638)
2025-06-06T13:57:54.1945055Z         at <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/stream/stream/next.rs:32)
2025-06-06T13:57:54.1947663Z         at futures_util::future::future::FutureExt::poll_unpin(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/future/future/mod.rs:558)
2025-06-06T13:57:54.1949835Z         at <futures_util::async_await::poll::PollOnce<F> as core::future::future::Future>::poll(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/futures-util-0.3.31/src/async_await/poll.rs:37)
2025-06-06T13:57:54.1952041Z         at comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}::{{closure}}::{{closure}}(/__w/datafusion-comet/datafusion-comet/native/core/src/execution/jni_api.rs:438)
2025-06-06T13:57:54.1954070Z         at tokio::runtime::park::CachedParkThread::block_on::{{closure}}(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/park.rs:284)
2025-06-06T13:57:54.1955846Z         at tokio::task::coop::with_budget(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/task/coop/mod.rs:167)
2025-06-06T13:57:54.1957636Z         at tokio::task::coop::budget(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/task/coop/mod.rs:133)
2025-06-06T13:57:54.1959325Z         at tokio::runtime::park::CachedParkThread::block_on(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/park.rs:284)
2025-06-06T13:57:54.1961375Z         at tokio::runtime::context::blocking::BlockingRegionGuard::block_on(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/context/blocking.rs:66)
2025-06-06T13:57:54.1963697Z         at tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/scheduler/multi_thread/mod.rs:87)
2025-06-06T13:57:54.1965887Z         at tokio::runtime::context::runtime::enter_runtime(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/context/runtime.rs:65)
2025-06-06T13:57:54.1968188Z         at tokio::runtime::scheduler::multi_thread::MultiThread::block_on(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/scheduler/multi_thread/mod.rs:86)
2025-06-06T13:57:54.1970246Z         at tokio::runtime::runtime::Runtime::block_on_inner(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/runtime.rs:358)
2025-06-06T13:57:54.1972087Z         at tokio::runtime::runtime::Runtime::block_on(/usr/local/cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/runtime.rs:330)
2025-06-06T13:57:54.1974189Z         at comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}::{{closure}}(/__w/datafusion-comet/datafusion-comet/native/core/src/execution/jni_api.rs:438)
2025-06-06T13:57:54.1975895Z         at comet::execution::tracing::with_trace(/__w/datafusion-comet/datafusion-comet/native/core/src/execution/tracing.rs:117)
2025-06-06T13:57:54.1977694Z         at comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::{{closure}}(/__w/datafusion-comet/datafusion-comet/native/core/src/execution/jni_api.rs:395)
2025-06-06T13:57:54.1979212Z         at comet::errors::curry::{{closure}}(/__w/datafusion-comet/datafusion-comet/native/core/src/errors.rs:485)
2025-06-06T13:57:54.1980462Z         at std::panicking::try::do_call(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/panicking.rs:589)
2025-06-06T13:57:54.1981370Z         at __rust_try(__internal__:0)
2025-06-06T13:57:54.1982193Z         at std::panicking::try(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/panicking.rs:552)
2025-06-06T13:57:54.1983410Z         at std::panic::catch_unwind(/rustc/17067e9ac6d7ecb70e50f92c1944e545188d2359/library/std/src/panic.rs:359)
2025-06-06T13:57:54.1984614Z         at comet::errors::try_unwrap_or_throw(/__w/datafusion-comet/datafusion-comet/native/core/src/errors.rs:499)
2025-06-06T13:57:54.1985938Z         at Java_org_apache_comet_Native_executePlan(/__w/datafusion-comet/datafusion-comet/native/core/src/execution/jni_api.rs:375)
2025-06-06T13:57:54.1987315Z         at <unknown>(__internal__:0)

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Metadata


    Labels

    bug (Something isn't working), regression (Something that used to work no longer does), spark
