
Conversation

@jaychia
Contributor

@jaychia jaychia commented Aug 26, 2022

No description provided.

@jaychia jaychia merged commit 2daf382 into main Aug 26, 2022
@jaychia jaychia deleted the jay/compute-limit-bug-repro branch August 26, 2022 08:14
colin-ho added a commit that referenced this pull request Mar 20, 2025
## Summary

<!-- Provide a short summary of the changes in this PR. -->

## Related Issues

<!-- Link to related GitHub issues or JIRA tickets, e.g., "Closes #123"
-->

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Checklist

- [ ] All tests have passed
- [ ] Documented in API Docs
- [ ] Documented in User Guide
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag
[@ccmao1130](https://github.com/ccmao1130) for docs review)
colin-ho added a commit that referenced this pull request Mar 20, 2025
## Summary

Fix some typos and missing info in the S3 Tables docs.

## Related Issues

<!-- Link to related GitHub issues or JIRA tickets, e.g., "Closes #123"
-->

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Checklist

- [ ] All tests have passed
- [ ] Documented in API Docs
- [ ] Documented in User Guide
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag
[@ccmao1130](https://github.com/ccmao1130) for docs review)
colin-ho added a commit that referenced this pull request Mar 21, 2025
## Summary

Track imports on scarf

## Related Issues

<!-- Link to related GitHub issues or JIRA tickets, e.g., "Closes #123"
-->

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Checklist

- [ ] All tests have passed
- [ ] Documented in API Docs
- [ ] Documented in User Guide
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag
[@ccmao1130](https://github.com/ccmao1130) for docs review)
colin-ho added a commit that referenced this pull request Mar 21, 2025
## Summary

<!-- Provide a short summary of the changes in this PR. -->

## Related Issues

<!-- Link to related GitHub issues or JIRA tickets, e.g., "Closes #123"
-->

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Checklist

- [ ] All tests have passed
- [ ] Documented in API Docs
- [ ] Documented in User Guide
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag
[@ccmao1130](https://github.com/ccmao1130) for docs review)
colin-ho added a commit that referenced this pull request Mar 24, 2025
…ad of endswith (#4046)

## Summary

Fix runner switching test.

## Related Issues

<!-- Link to related GitHub issues or JIRA tickets, e.g., "Closes #123"
-->

## Changes Made

Pass the assertion as long as `cannot set runner more than once` appears anywhere in the error message, instead of requiring the message to end with it, since the message may contain other text.
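A minimal sketch of what such a test could look like, assuming the `daft.context` runner-setting APIs and using a plain substring check (this is an illustration, not the actual test from this PR):

```python
# Hedged sketch: switching runners twice should raise, and we only check for
# the key substring rather than the full message.
import pytest
import daft

def test_cannot_switch_runner():
    daft.context.set_runner_native()
    with pytest.raises(Exception) as exc_info:
        daft.context.set_runner_ray()
    assert "cannot set runner more than once" in str(exc_info.value)
```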

## Checklist

- [ ] All tests have passed
- [ ] Documented in API Docs
- [ ] Documented in User Guide
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag
[@ccmao1130](https://github.com/ccmao1130) for docs review)
kevinzwang added a commit that referenced this pull request Mar 25, 2025
## Summary

Clean up the PR template in various ways

## Related Issues

<!-- Link to related GitHub issues or JIRA tickets, e.g., "Closes #123"
-->

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->
- remove summary section (I felt that there was a large amount of
redundancy between the PR title, summary, and changes made sections)
- remove JIRA mention in changes made since we do not use that
- remove "All tests have passed" from checklist since that is already a
strict requirement for merging
- add "(if applicable)" to some checklist items

## Checklist

- [x] All tests have passed
- [x] Documented in API Docs
- [x] Documented in User Guide
- [x] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [x] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho added a commit that referenced this pull request Mar 25, 2025
## Summary

Remove ray compatibility workflow as it has been failing for a while.

## Related Issues

<!-- Link to related GitHub issues or JIRA tickets, e.g., "Closes #123"
-->

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Checklist

- [ ] All tests have passed
- [ ] Documented in API Docs
- [ ] Documented in User Guide
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho added a commit that referenced this pull request Mar 28, 2025
## Changes Made

Adds overwrite mode to Spark Connect for writing CSV and Parquet.
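For context, a hedged sketch of the client-side usage this enables, assuming a Daft Spark Connect server is already running at the (made-up) address below:

```python
# Standard Spark Connect client calls; only the overwrite mode is new on the Daft side.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.write.mode("overwrite").parquet("/tmp/out_parquet")
df.write.mode("overwrite").csv("/tmp/out_csv")
```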

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)

---------

Co-authored-by: universalmind303 <[email protected]>
colin-ho added a commit that referenced this pull request Mar 31, 2025
## Changes Made

Adds docs in Core Concepts for cross-column expressions.
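A quick illustration of the kind of cross-column expression the new docs section covers (example data made up):

```python
# Combine two columns in a single expression.
import daft
from daft import col

df = daft.from_pydict({"a": [1, 2, 3], "b": [10, 20, 30]})
df = df.with_column("a_plus_b", col("a") + col("b"))
df.show()
```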

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho added a commit that referenced this pull request Mar 31, 2025
## Changes Made

Add a section called "Numbering Rows" that explains how to add IDs to rows with a monotonically increasing ID.
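A hedged sketch of the pattern the new section documents, assuming the `monotonically_increasing_id` function lives in `daft.functions`:

```python
# Assign a monotonically increasing ID to each row.
import daft
from daft.functions import monotonically_increasing_id  # assumed import path

df = daft.from_pydict({"fruit": ["apple", "banana", "cherry"]})
df = df.with_column("id", monotonically_increasing_id())
df.show()
```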

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho added a commit that referenced this pull request Mar 31, 2025
## Changes Made

Use the IO runtime for the task in the remote Parquet reader. Fixes a deadlock.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho added a commit that referenced this pull request Mar 31, 2025
## Changes Made

Disable scan task merging for WARC files.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)

---------

Co-authored-by: Desmond Cheong <[email protected]>
desmondcheongzx pushed a commit that referenced this pull request Apr 1, 2025
## Changes Made

Quick refactor of some of the logic that previously lived in
PushDownFilter into its own rule.

I previously implemented it in PushDownFilter because it could have
interactions with the other filter pushdown logic. However, we should
try to keep PushDownFilter as simple as we can, just for pushing filter
nodes through a plan. Thus, I am moving this logic into a new rule and
using the repeated execution of the rule batch to deal with interactions
between the two rules.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

Related to why we closed #4118

## Checklist

- [x] Documented in API Docs (if applicable)
- [x] Documented in User Guide (if applicable)
- [x] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [x] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
desmondcheongzx added a commit that referenced this pull request Apr 1, 2025
## Changes Made

This PR adds an optimizer rule that pushes down join predicates. This
will fix our TPC-H benchmarks, since Q13, which is technically a
non-equi join, can be optimized into an equi-join.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [x] Documented in API Docs (if applicable)
- [x] Documented in User Guide (if applicable)
- [x] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [x] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)

Co-authored-by: desmondcheongzx <[email protected]>
kevinzwang added a commit that referenced this pull request Apr 2, 2025
## Changes Made

Added an optimizer rule `PushDownAntiSemiJoin`, which pushes anti and semi joins through inner joins, projects, filters, sorts, and distincts.

I decided to do this instead of adding anti and semi joins to our brute force join reorderer because it is simpler and will largely produce the same results, especially with TPC-H. 

More specifically, join graph creation makes some assumptions about the uniqueness of input column names that I'm not sure will hold for anti and semi joins, since their columns are not preserved in the output. I was not confident in my ability to modify the join reordering algorithm to properly fix this within a reasonable time. We will want to implement a more efficient reordering algorithm in the future anyway, so I do not want to spend a significant effort on patching the current one up.
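As a rough illustration (not taken from the PR itself), this is the query shape the rule targets, written with Daft's DataFrame API; the tables and columns are made up, and the idea is that the optimizer can push the semi join below the inner join so the filtering happens earlier:

```python
import daft

orders = daft.from_pydict({"o_orderkey": [1, 2, 3], "o_custkey": [10, 20, 30]})
customers = daft.from_pydict({"c_custkey": [10, 20], "c_name": ["alice", "bob"]})
qualifying = daft.from_pydict({"l_orderkey": [1, 3]})

# Inner join followed by a semi join; with the new rule the semi join can be
# evaluated against `orders` before the inner join, shrinking the join input.
result = (
    orders.join(customers, left_on="o_custkey", right_on="c_custkey")
    .join(qualifying, left_on="o_orderkey", right_on="l_orderkey", how="semi")
)
result.show()
```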

### TPC-H Q18 
#### Results
| Benchmark | Before | After | Improvement |
|--------|--------|--------|--------|
| Local (SF 100) | 56.79s | 10.38s | 5.47x |
| Distributed (SF 1000) | 285.4s | 66.12s | 4.32x |

#### Optimized Plan (Before)
```mermaid
flowchart TD
subgraph optimized["Optimized LogicalPlan"]
direction TB
optimizedLimit0["Limit: 100
Stats = { Approx num rows = 100, Approx size bytes = 1.56 KiB, Accumulated
selectivity = 0.00 }"]
optimizedSort1["Sort: Sort by = (col(o_totalprice), descending, nulls first), (col(o_orderdate),
ascending, nulls last)
Stats = { Approx num rows = 5,665,935, Approx size bytes = 86.46 MiB,
Accumulated selectivity = 0.12 }"]
optimizedAggregate2["Aggregation: sum(col(l_quantity))
Group by = col(c_name), col(c_custkey), col(o_orderkey), col(o_orderdate),
col(o_totalprice)
Output schema = c_name#Utf8, c_custkey#Int64, o_orderkey#Int64,
o_orderdate#Date, o_totalprice#Float64, l_quantity#Int64
Stats = { Approx num rows = 5,665,935, Approx size bytes = 86.46 MiB,
Accumulated selectivity = 0.12 }"]
optimizedProject3["Project: col(l_quantity), col(c_name), col(c_custkey), col(o_orderkey),
col(o_orderdate), col(o_totalprice)
Stats = { Approx num rows = 7,082,419, Approx size bytes = 109.76 MiB,
Accumulated selectivity = 0.14 }"]
optimizedJoin4["Join: Type = Semi
Strategy = Auto
On = col(left.o_orderkey#Int64) == col(right.l_orderkey#Int64)
Output schema = o_orderkey#Int64, c_custkey#Int64, c_name#Utf8,
o_totalprice#Float64, o_orderdate#Date, l_quantity#Int64
Stats = { Approx num rows = 7,082,419, Approx size bytes = 109.76 MiB,
Accumulated selectivity = 0.14 }"]
optimizedProject5["Project: col(o_orderkey), col(c_custkey), col(c_name), col(o_totalprice),
col(o_orderdate), col(l_quantity)
Stats = { Approx num rows = 46,594,852, Approx size bytes = 722.09 MiB,
Accumulated selectivity = 0.95 }"]
optimizedJoin6["Join: Type = Inner
Strategy = Auto
On = col(left.o_orderkey#Int64) == col(right.l_orderkey#Int64)
Output schema = c_custkey#Int64, c_name#Utf8, o_orderkey#Int64,
o_totalprice#Float64, o_orderdate#Date, l_orderkey#Int64, l_quantity#Int64
Stats = { Approx num rows = 46,594,852, Approx size bytes = 722.09 MiB,
Accumulated selectivity = 0.95 }"]
optimizedProject7["Project: col(c_custkey), col(c_name), col(o_orderkey), col(o_totalprice),
col(o_orderdate)
Stats = { Approx num rows = 18,273,257, Approx size bytes = 496.66 MiB,
Accumulated selectivity = 0.95 }"]
optimizedJoin8["Join: Type = Inner
Strategy = Auto
On = col(left.c_custkey#Int64) == col(right.o_custkey#Int64)
Output schema = c_custkey#Int64, c_name#Utf8, o_orderkey#Int64, o_custkey#Int64,
o_totalprice#Float64, o_orderdate#Date
Stats = { Approx num rows = 18,273,257, Approx size bytes = 496.66 MiB,
Accumulated selectivity = 0.95 }"]
optimizedProject9["Project: col(C_CUSTKEY) as c_custkey, col(C_NAME) as c_name
Stats = { Approx num rows = 3,308,528, Approx size bytes = 89.14 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource10["Num Scan Tasks = 32
File schema = C_CUSTKEY#Int64, C_NAME#Utf8, C_ADDRESS#Utf8, C_NATIONKEY#Int64,
C_PHONE#Utf8, C_ACCTBAL#Float64, C_MKTSEGMENT#Utf8, C_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [C_CUSTKEY, C_NAME]
Output schema = C_CUSTKEY#Int64, C_NAME#Utf8
Stats = { Approx num rows = 3,308,528, Approx size bytes = 89.14 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource10 --> optimizedProject9
optimizedProject9 --> optimizedJoin8
optimizedProject11["Project: col(O_ORDERKEY) as o_orderkey, col(O_CUSTKEY) as o_custkey,
col(O_TOTALPRICE) as o_totalprice, col(O_ORDERDATE) as o_orderdate
Stats = { Approx num rows = 18,273,257, Approx size bytes = 496.66 MiB,
Accumulated selectivity = 0.95 }"]
optimizedSource12["Num Scan Tasks = 32
File schema = O_ORDERKEY#Int64, O_CUSTKEY#Int64, O_ORDERSTATUS#Utf8,
O_TOTALPRICE#Float64, O_ORDERDATE#Date, O_ORDERPRIORITY#Utf8, O_CLERK#Utf8,
O_SHIPPRIORITY#Int64, O_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [O_ORDERKEY, O_CUSTKEY, O_TOTALPRICE, O_ORDERDATE]
Filter pushdown = not(is_null(col(O_ORDERKEY)))
Output schema = O_ORDERKEY#Int64, O_CUSTKEY#Int64, O_TOTALPRICE#Float64,
O_ORDERDATE#Date
Stats = { Approx num rows = 18,273,257, Approx size bytes = 496.66 MiB,
Accumulated selectivity = 0.95 }"]
optimizedSource12 --> optimizedProject11
optimizedProject11 --> optimizedJoin8
optimizedJoin8 --> optimizedProject7
optimizedProject7 --> optimizedJoin6
optimizedProject13["Project: col(L_ORDERKEY) as l_orderkey, col(L_QUANTITY) as l_quantity
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource14["Num Scan Tasks = 32
File schema = L_ORDERKEY#Int64, L_PARTKEY#Int64, L_SUPPKEY#Int64,
L_LINENUMBER#Int64, L_QUANTITY#Int64, L_EXTENDEDPRICE#Float64,
L_DISCOUNT#Float64, L_TAX#Float64, L_RETURNFLAG#Utf8, L_LINESTATUS#Utf8,
L_SHIPDATE#Date, L_COMMITDATE#Date, L_RECEIPTDATE#Date, L_SHIPINSTRUCT#Utf8,
L_SHIPMODE#Utf8, L_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [L_ORDERKEY, L_QUANTITY]
Output schema = L_ORDERKEY#Int64, L_QUANTITY#Int64
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource14 --> optimizedProject13
optimizedProject13 --> optimizedJoin6
optimizedJoin6 --> optimizedProject5
optimizedProject5 --> optimizedJoin4
optimizedProject15["Project: col(l_orderkey)
Stats = { Approx num rows = 7,455,177, Approx size bytes = 113.76 MiB,
Accumulated selectivity = 0.15 }"]
optimizedFilter16["Filter: col((l_quantity.local_sum() > Literal(Int64(300))))
Stats = { Approx num rows = 7,455,177, Approx size bytes = 113.76 MiB,
Accumulated selectivity = 0.15 }"]
optimizedProject17["Project: col(l_orderkey), col(l_quantity.local_sum()) > lit(300) as
(l_quantity.local_sum() > Literal(Int64(300)))
Stats = { Approx num rows = 37,275,881, Approx size bytes = 568.78 MiB,
Accumulated selectivity = 0.76 }"]
optimizedFilter18["Filter: not(is_null(col(l_orderkey)))
Stats = { Approx num rows = 37,275,881, Approx size bytes = 568.78 MiB,
Accumulated selectivity = 0.76 }"]
optimizedAggregate19["Aggregation: sum(col(l_quantity)) as l_quantity.local_sum()
Group by = col(l_orderkey)
Output schema = l_orderkey#Int64, l_quantity.local_sum()#Int64
Stats = { Approx num rows = 39,237,769, Approx size bytes = 598.72 MiB,
Accumulated selectivity = 0.80 }"]
optimizedProject20["Project: col(L_QUANTITY) as l_quantity, col(L_ORDERKEY) as l_orderkey
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource21["Num Scan Tasks = 32
File schema = L_ORDERKEY#Int64, L_PARTKEY#Int64, L_SUPPKEY#Int64,
L_LINENUMBER#Int64, L_QUANTITY#Int64, L_EXTENDEDPRICE#Float64,
L_DISCOUNT#Float64, L_TAX#Float64, L_RETURNFLAG#Utf8, L_LINESTATUS#Utf8,
L_SHIPDATE#Date, L_COMMITDATE#Date, L_RECEIPTDATE#Date, L_SHIPINSTRUCT#Utf8,
L_SHIPMODE#Utf8, L_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [L_QUANTITY, L_ORDERKEY]
Output schema = L_ORDERKEY#Int64, L_QUANTITY#Int64
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource21 --> optimizedProject20
optimizedProject20 --> optimizedAggregate19
optimizedAggregate19 --> optimizedFilter18
optimizedFilter18 --> optimizedProject17
optimizedProject17 --> optimizedFilter16
optimizedFilter16 --> optimizedProject15
optimizedProject15 --> optimizedJoin4
optimizedJoin4 --> optimizedProject3
optimizedProject3 --> optimizedAggregate2
optimizedAggregate2 --> optimizedSort1
optimizedSort1 --> optimizedLimit0
end
```

#### Optimized Plan (After)

```mermaid
flowchart TD
subgraph optimized["Optimized LogicalPlan"]
direction TB
optimizedLimit0["Limit: 100
Stats = { Approx num rows = 100, Approx size bytes = 1.56 KiB, Accumulated
selectivity = 0.00 }"]
optimizedSort1["Sort: Sort by = (col(o_totalprice), descending, nulls first), (col(o_orderdate),
ascending, nulls last)
Stats = { Approx num rows = 5,665,935, Approx size bytes = 86.46 MiB,
Accumulated selectivity = 0.12 }"]
optimizedAggregate2["Aggregation: sum(col(l_quantity))
Group by = col(c_name), col(c_custkey), col(o_orderkey), col(o_orderdate),
col(o_totalprice)
Output schema = c_name#Utf8, c_custkey#Int64, o_orderkey#Int64,
o_orderdate#Date, o_totalprice#Float64, l_quantity#Int64
Stats = { Approx num rows = 5,665,935, Approx size bytes = 86.46 MiB,
Accumulated selectivity = 0.12 }"]
optimizedProject3["Project: col(l_quantity), col(c_name), col(c_custkey), col(o_orderkey),
col(o_orderdate), col(o_totalprice)
Stats = { Approx num rows = 7,082,419, Approx size bytes = 109.76 MiB,
Accumulated selectivity = 0.14 }"]
optimizedProject4["Project: col(c_custkey), col(c_name), col(o_orderkey), col(o_totalprice),
col(o_orderdate), col(l_orderkey), col(l_quantity)
Stats = { Approx num rows = 7,082,419, Approx size bytes = 109.76 MiB,
Accumulated selectivity = 0.14 }"]
optimizedJoin5["Join: Type = Inner
Strategy = Auto
On = col(left.c_custkey#Int64) == col(right.o_custkey#Int64)
Output schema = c_custkey#Int64, c_name#Utf8, o_orderkey#Int64, o_custkey#Int64,
o_totalprice#Float64, o_orderdate#Date, l_orderkey#Int64, l_quantity#Int64
Stats = { Approx num rows = 7,082,419, Approx size bytes = 109.76 MiB,
Accumulated selectivity = 0.14 }"]
optimizedProject6["Project: col(C_CUSTKEY) as c_custkey, col(C_NAME) as c_name
Stats = { Approx num rows = 3,308,528, Approx size bytes = 89.14 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource7["Num Scan Tasks = 32
File schema = C_CUSTKEY#Int64, C_NAME#Utf8, C_ADDRESS#Utf8, C_NATIONKEY#Int64,
C_PHONE#Utf8, C_ACCTBAL#Float64, C_MKTSEGMENT#Utf8, C_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [C_CUSTKEY, C_NAME]
Output schema = C_CUSTKEY#Int64, C_NAME#Utf8
Stats = { Approx num rows = 3,308,528, Approx size bytes = 89.14 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource7 --> optimizedProject6
optimizedProject6 --> optimizedJoin5
optimizedJoin8["Join: Type = Inner
Strategy = Auto
On = col(left.o_orderkey#Int64) == col(right.l_orderkey#Int64)
Output schema = o_orderkey#Int64, o_custkey#Int64, o_totalprice#Float64,
o_orderdate#Date, l_orderkey#Int64, l_quantity#Int64
Stats = { Approx num rows = 7,082,419, Approx size bytes = 109.76 MiB,
Accumulated selectivity = 0.14 }"]
optimizedProject9["Project: col(O_ORDERKEY) as o_orderkey, col(O_CUSTKEY) as o_custkey,
col(O_TOTALPRICE) as o_totalprice, col(O_ORDERDATE) as o_orderdate
Stats = { Approx num rows = 7,082,419, Approx size bytes = 108.07 MiB,
Accumulated selectivity = 0.14 }"]
optimizedJoin10["Join: Type = Semi
Strategy = Auto
On = col(left.O_ORDERKEY#Int64) == col(right.l_orderkey#Int64)
Output schema = O_ORDERKEY#Int64, O_CUSTKEY#Int64, O_TOTALPRICE#Float64,
O_ORDERDATE#Date
Stats = { Approx num rows = 7,082,419, Approx size bytes = 108.07 MiB,
Accumulated selectivity = 0.14 }"]
optimizedSource11["Num Scan Tasks = 32
File schema = O_ORDERKEY#Int64, O_CUSTKEY#Int64, O_ORDERSTATUS#Utf8,
O_TOTALPRICE#Float64, O_ORDERDATE#Date, O_ORDERPRIORITY#Utf8, O_CLERK#Utf8,
O_SHIPPRIORITY#Int64, O_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [O_ORDERKEY, O_CUSTKEY, O_TOTALPRICE, O_ORDERDATE]
Filter pushdown = not(is_null(col(O_ORDERKEY)))
Output schema = O_ORDERKEY#Int64, O_CUSTKEY#Int64, O_TOTALPRICE#Float64,
O_ORDERDATE#Date
Stats = { Approx num rows = 18,273,257, Approx size bytes = 496.66 MiB,
Accumulated selectivity = 0.95 }"]
optimizedSource11 --> optimizedJoin10
optimizedProject12["Project: col(l_orderkey)
Stats = { Approx num rows = 7,455,177, Approx size bytes = 113.76 MiB,
Accumulated selectivity = 0.15 }"]
optimizedFilter13["Filter: col((l_quantity.local_sum() > Literal(Int64(300))))
Stats = { Approx num rows = 7,455,177, Approx size bytes = 113.76 MiB,
Accumulated selectivity = 0.15 }"]
optimizedProject14["Project: col(l_orderkey), col(l_quantity.local_sum()) > lit(300) as
(l_quantity.local_sum() > Literal(Int64(300)))
Stats = { Approx num rows = 37,275,881, Approx size bytes = 568.78 MiB,
Accumulated selectivity = 0.76 }"]
optimizedFilter15["Filter: not(is_null(col(l_orderkey)))
Stats = { Approx num rows = 37,275,881, Approx size bytes = 568.78 MiB,
Accumulated selectivity = 0.76 }"]
optimizedAggregate16["Aggregation: sum(col(l_quantity)) as l_quantity.local_sum()
Group by = col(l_orderkey)
Output schema = l_orderkey#Int64, l_quantity.local_sum()#Int64
Stats = { Approx num rows = 39,237,769, Approx size bytes = 598.72 MiB,
Accumulated selectivity = 0.80 }"]
optimizedProject17["Project: col(L_QUANTITY) as l_quantity, col(L_ORDERKEY) as l_orderkey
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource18["Num Scan Tasks = 32
File schema = L_ORDERKEY#Int64, L_PARTKEY#Int64, L_SUPPKEY#Int64,
L_LINENUMBER#Int64, L_QUANTITY#Int64, L_EXTENDEDPRICE#Float64,
L_DISCOUNT#Float64, L_TAX#Float64, L_RETURNFLAG#Utf8, L_LINESTATUS#Utf8,
L_SHIPDATE#Date, L_COMMITDATE#Date, L_RECEIPTDATE#Date, L_SHIPINSTRUCT#Utf8,
L_SHIPMODE#Utf8, L_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [L_QUANTITY, L_ORDERKEY]
Output schema = L_ORDERKEY#Int64, L_QUANTITY#Int64
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource18 --> optimizedProject17
optimizedProject17 --> optimizedAggregate16
optimizedAggregate16 --> optimizedFilter15
optimizedFilter15 --> optimizedProject14
optimizedProject14 --> optimizedFilter13
optimizedFilter13 --> optimizedProject12
optimizedProject12 --> optimizedJoin10
optimizedJoin10 --> optimizedProject9
optimizedProject9 --> optimizedJoin8
optimizedProject19["Project: col(L_ORDERKEY) as l_orderkey, col(L_QUANTITY) as l_quantity
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource20["Num Scan Tasks = 32
File schema = L_ORDERKEY#Int64, L_PARTKEY#Int64, L_SUPPKEY#Int64,
L_LINENUMBER#Int64, L_QUANTITY#Int64, L_EXTENDEDPRICE#Float64,
L_DISCOUNT#Float64, L_TAX#Float64, L_RETURNFLAG#Utf8, L_LINESTATUS#Utf8,
L_SHIPDATE#Date, L_COMMITDATE#Date, L_RECEIPTDATE#Date, L_SHIPINSTRUCT#Utf8,
L_SHIPMODE#Utf8, L_COMMENT#Utf8
Partitioning keys = []
Projection pushdown = [L_ORDERKEY, L_QUANTITY]
Output schema = L_ORDERKEY#Int64, L_QUANTITY#Int64
Stats = { Approx num rows = 49,047,212, Approx size bytes = 760.10 MiB,
Accumulated selectivity = 1.00 }"]
optimizedSource20 --> optimizedProject19
optimizedProject19 --> optimizedJoin8
optimizedJoin8 --> optimizedJoin5
optimizedJoin5 --> optimizedProject4
optimizedProject4 --> optimizedProject3
optimizedProject3 --> optimizedAggregate2
optimizedAggregate2 --> optimizedSort1
optimizedSort1 --> optimizedLimit0
end
```
## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [x] Documented in API Docs (if applicable)
- [x] Documented in User Guide (if applicable)
- [x] If adding a new documentation page, doc is added to `docs/mkdocs.yml` navigation
- [x] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
colin-ho added a commit that referenced this pull request Apr 3, 2025
## Changes Made

Use the native runner for doctests. Modify code examples to add a sort where needed (e.g. groupbys), because Swordfish does not provide ordering guarantees without an order-by.

Also, don't run the doctests for monotonically increasing ID, because it requires partitioning.
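A small sketch of the kind of doctest adjustment described above (example data made up): without the trailing sort, the groupby output order could change from run to run on the native runner.

```python
import daft

df = daft.from_pydict({"key": ["a", "b", "a"], "val": [1, 2, 3]})
# Sorting by the group key makes the doctest output deterministic.
df.groupby("key").agg(daft.col("val").sum()).sort("key").show()
```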

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
f4t4nt pushed a commit that referenced this pull request Apr 3, 2025
## Changes Made

Use the native runner for doctests. Modify code examples to add a sort where needed (e.g. groupbys), because Swordfish does not provide ordering guarantees without an order-by.

Also, don't run the doctests for monotonically increasing ID, because it requires partitioning.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho added a commit that referenced this pull request Apr 4, 2025
## Changes Made

Expose an async shuffle cache to python, remove use of python threadpool
in the shuffle actor, spawn partitioner tasks that connect directly to
writer tasks.

### How it works:

Flight shuffle works by first partitioning and saving all outputs of the map pipeline to disk in Arrow IPC stream format, then reading the files back and sending them to the reducer via Arrow Flight RPC.

We use an actor per node to do the partitioning and saving. Each actor has a `ShuffleCache` in Rust which is responsible for partitioning and saving. Previously, we used Python threads via Python's concurrent threadpool executor to push partitions into the cache concurrently, which let us partition in parallel. But we shouldn't need a Python threadpool; we can use the Tokio runtimes in Rust to do this instead. This reduces reliance on Python, reduces the number of threads, and will provide a smoother transition to using Swordfish with the shuffle cache.

To do this, we spawn `num_cpu` partitioner tasks and `num_partition` writer tasks on a single-threaded runtime; the tasks then spawn compute on the compute runtime when needed. This is also how Swordfish works. We then use https://docs.rs/pyo3-async-runtimes/latest/pyo3_async_runtimes/ to expose an async bridge to Python. Since Ray has async actors, this makes it easy to use async in Python as well.
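A hedged sketch of the Ray async-actor shape this describes; the real `ShuffleCache` is a Rust object exposed through pyo3-async-runtimes, so it is stubbed here with a plain asyncio coroutine and all names are illustrative:

```python
import asyncio
import ray

class FakeShuffleCache:
    async def push_partitions(self, partition):
        # Stand-in for the async Rust partitioning + Arrow IPC file writes.
        await asyncio.sleep(0)

@ray.remote
class ShuffleActor:
    def __init__(self):
        self.cache = FakeShuffleCache()

    async def push_partitions(self, partition):
        # Ray async actors can await the bridge directly, so no Python
        # threadpool executor is needed to get concurrency.
        await self.cache.push_partitions(partition)
```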

### Results:

Question | Before (s) | After (s) | Difference (s) | % Change
-- | -- | -- | -- | --
Q1 | 31.73 | 26.63 | -5.10 | -16.07%
Q2 | 36.40 | 31.26 | -5.14 | -14.12%
Q3 | 40.50 | 39.58 | -0.92 | -2.27%
Q4 | 27.58 | 22.90 | -4.68 | -16.97%
Q5 | 64.22 | 63.06 | -1.16 | -1.81%
Q6 | 9.77 | 9.55 | -0.22 | -2.25%
Q7 | 44.56 | 45.43 | 0.87 | 1.95%
Q8 | 96.06 | 96.61 | 0.55 | 0.57%
Q9 | 120.09 | 117.40 | -2.69 | -2.24%
Q10 | 36.68 | 37.49 | 0.81 | 2.21%
Q11 | 29.91 | 33.55 | 3.64 | 12.17%
Q12 | 25.21 | 25.51 | 0.30 | 1.19%
Q13 | 36.05 | 33.92 | -2.13 | -5.91%
Q14 | 17.68 | 17.12 | -0.56 | -3.17%
Q15 | 33.26 | 33.27 | 0.01 | 0.03%
Q16 | 19.11 | 20.84 | 1.73 | 9.05%
Q17 | 120.98 | 118.21 | -2.77 | -2.29%
Q18 | 64.95 | 62.89 | -2.06 | -3.17%
Q19 | 24.38 | 24.01 | -0.37 | -1.52%
Q20 | 51.75 | 53.28 | 1.53 | 2.96%
Q21 | 142.00 | 140.75 | -1.25 | -0.88%
Q22 | 13.32 | 13.63 | 0.31 | 2.32%
Total | 1086.19 | 1066.38 | -19.81 | -1.85%

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
universalmind303 added a commit that referenced this pull request Apr 7, 2025
## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
universalmind303 added a commit that referenced this pull request Apr 8, 2025
Fix the TimeUnit repr format. The doctest was failing because it didn't match the repr.


## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho added a commit that referenced this pull request Apr 9, 2025
## Changes Made

Mypy wasn't running. This removes the `daft_dashboard` arg (which is fine because it doesn't exist), and mypy runs now.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)

---------

Co-authored-by: R. Conner Howell <[email protected]>
rchowell pushed a commit that referenced this pull request Apr 9, 2025
## Changes Made

Expose an async shuffle cache to python, remove use of python threadpool
in the shuffle actor, spawn partitioner tasks that connect directly to
writer tasks.

### How it works:

Flight shuffle works by first partitioning and saving all outputs of the map pipeline to disk in Arrow IPC stream format, then reading the files back and sending them to the reducer via Arrow Flight RPC.

We use an actor per node to do the partitioning and saving. Each actor has a `ShuffleCache` in Rust which is responsible for partitioning and saving. Previously, we used Python threads via Python's concurrent threadpool executor to push partitions into the cache concurrently, which let us partition in parallel. But we shouldn't need a Python threadpool; we can use the Tokio runtimes in Rust to do this instead. This reduces reliance on Python, reduces the number of threads, and will provide a smoother transition to using Swordfish with the shuffle cache.

To do this, we spawn `num_cpu` partitioner tasks and `num_partition` writer tasks on a single-threaded runtime; the tasks then spawn compute on the compute runtime when needed. This is also how Swordfish works. We then use https://docs.rs/pyo3-async-runtimes/latest/pyo3_async_runtimes/ to expose an async bridge to Python. Since Ray has async actors, this makes it easy to use async in Python as well.

### Results:

Question | Before (s) | After (s) | Difference (s) | % Change
-- | -- | -- | -- | --
Q1 | 31.73 | 26.63 | -5.10 | -16.07%
Q2 | 36.40 | 31.26 | -5.14 | -14.12%
Q3 | 40.50 | 39.58 | -0.92 | -2.27%
Q4 | 27.58 | 22.90 | -4.68 | -16.97%
Q5 | 64.22 | 63.06 | -1.16 | -1.81%
Q6 | 9.77 | 9.55 | -0.22 | -2.25%
Q7 | 44.56 | 45.43 | 0.87 | 1.95%
Q8 | 96.06 | 96.61 | 0.55 | 0.57%
Q9 | 120.09 | 117.40 | -2.69 | -2.24%
Q10 | 36.68 | 37.49 | 0.81 | 2.21%
Q11 | 29.91 | 33.55 | 3.64 | 12.17%
Q12 | 25.21 | 25.51 | 0.30 | 1.19%
Q13 | 36.05 | 33.92 | -2.13 | -5.91%
Q14 | 17.68 | 17.12 | -0.56 | -3.17%
Q15 | 33.26 | 33.27 | 0.01 | 0.03%
Q16 | 19.11 | 20.84 | 1.73 | 9.05%
Q17 | 120.98 | 118.21 | -2.77 | -2.29%
Q18 | 64.95 | 62.89 | -2.06 | -3.17%
Q19 | 24.38 | 24.01 | -0.37 | -1.52%
Q20 | 51.75 | 53.28 | 1.53 | 2.96%
Q21 | 142.00 | 140.75 | -1.25 | -0.88%
Q22 | 13.32 | 13.63 | 0.31 | 2.32%
Total | 1086.19 | 1066.38 | -19.81 | -1.85%

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
rchowell pushed a commit that referenced this pull request Apr 9, 2025
## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
rchowell pushed a commit that referenced this pull request Apr 9, 2025
Fix the TimeUnit repr format. The doctest was failing because it didn't match the repr.


## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
rchowell added a commit that referenced this pull request Apr 9, 2025
## Changes Made

Mypy wasn't running. This removes the `daft_dashboard` arg (which is fine because it doesn't exist), and mypy runs now.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)

---------

Co-authored-by: R. Conner Howell <[email protected]>
colin-ho added a commit that referenced this pull request Apr 10, 2025
## Changes Made

Replace the Python runner with the native runner in the bug report.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

## Checklist

- [ ] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly (tag @/ccmao1130
for docs review)
colin-ho pushed a commit that referenced this pull request Dec 15, 2025
## Changes Made

Support splitting and merging JSONL/NDJSON files. Currently only JSONL and NDJSON files are supported.

Unsupported:
1. JSON files.
2. Compressed files are not supported yet.

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues
#5615
<!-- Link to related GitHub issues, e.g., "Closes #123" -->

Co-authored-by: cancai <[email protected]>
kevinzwang added a commit that referenced this pull request Dec 15, 2025
## Changes Made

Add an option to ignore deletion vectors on a Delta Lake table read instead of failing.
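A hedged usage sketch; the actual parameter name is not given in this description, so `ignore_deletion_vectors` below is hypothetical:

```python
import daft

# Read a Delta Lake table, skipping deletion vectors rather than erroring.
# NOTE: the keyword argument name is assumed for illustration only.
df = daft.read_deltalake("s3://bucket/path/to/table", ignore_deletion_vectors=True)
df.show()
```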

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho added a commit that referenced this pull request Dec 15, 2025
## Changes Made
Before this, if there were empty JSON/JSONL files in a path, `read_json` would fail.

Now, a new parameter named `skip_empty_files` lets users decide whether to skip the empty files and continue execution.
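A hedged usage sketch of the new parameter (exact signature details assumed):

```python
import daft

# Skip any empty JSON/JSONL files under the glob instead of erroring out.
df = daft.read_json("data/*.jsonl", skip_empty_files=True)
df.show()
```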

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues
#5646
<!-- Link to related GitHub issues, e.g., "Closes #123" -->

---------

Co-authored-by: cancai <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
universalmind303 added a commit that referenced this pull request Dec 16, 2025
## Changes Made

adds a `values` function on primitive arrays to get a `ScalarBuffer`.
This allows us to iterate over only the values.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho added a commit that referenced this pull request Dec 16, 2025
## Changes Made

Types the `**options` param in the model APIs using TypedDicts. Each model API now has its own specific TypedDict that indicates a subset of the possible options that can be passed, providing better type hints for users and their IDEs.

For most of the model APIs, the options include `max_retries`, `on_error`, and `batch_size`, which are subsequently passed into the UDF.

Provider-specific options can be subclassed; for example, `OpenAIPromptOptions` is a subclass of `PromptOptions` and has an additional field, `use_chat_completions`, which is specific to OpenAI prompting.
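A minimal sketch of the TypedDict pattern described above; the field types and allowed values are assumed from the option names:

```python
from typing import Literal, TypedDict

class PromptOptions(TypedDict, total=False):
    max_retries: int
    on_error: Literal["raise", "null"]  # assumed values, for illustration
    batch_size: int

class OpenAIPromptOptions(PromptOptions, total=False):
    use_chat_completions: bool
```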

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
universalmind303 pushed a commit that referenced this pull request Dec 16, 2025
## Changes Made
## Overview
The following diagram describes the core workflow of the new add_columns
feature. When a user calls write_lance(mode='merge'), the system checks
for schema changes:
- No new columns: Falls back to the existing append mode.
- New columns present: Triggers the new row-level merge workflow, which
groups data by fragment_id and efficiently merges new columns within
each fragment using a join key (e.g., _rowaddr), ultimately committing
only metadata changes.

<img width="1016" height="2786"
alt="whiteboard_exported_image-已去除(lightpdf cn)"
src="https://github.com/user-attachments/assets/c7f2a3c6-ac3b-419e-bb61-7d5e540d302e"
/>

# Core Interface Changes
This optimization introduces new APIs and extends existing interfaces.
The core changes are as follows:
## 1. DataFrame.write_lance()
The write_lance method adds a merge mode to implement efficient schema
evolution.
- Mode: mode: Literal["create", "append", "overwrite", "merge"] =
"create"
- New merge mode behavior:
  1. Dataset does not exist: Behaves like create mode.
  2. Dataset exists but no new columns: Behaves like append mode.
3. Dataset exists with new columns: Triggers the row-level merge
workflow. In this mode, the DataFrame must contain fragment_id and a
join key (defaults to _rowaddr or as specified by left_on/right_on).
- UT Example (tests/io/lancedb/test_lance_merge_evolution.py):
```
# Read existing data, including fragment_id and _rowaddr
df_loaded = daft.read_lance(
    lance_dataset_path, 
    default_scan_options={"with_row_address": True}, 
    include_fragment_id=True
)
# Derive a new column
df_with_new_col = df_loaded.with_column("double_lat", daft.col("lat") * 2)
# Write with merge mode; Daft handles column merging automatically
df_with_new_col.write_lance(lance_dataset_path, mode="merge")

```

## 2. daft.io.lance.read_lance()
read_lance adds the include_fragment_id parameter to expose each
fragment's ID as a new column (fragment_id) to the user upon reading.
This is a key prerequisite for row-level merging.
- New Parameter: include_fragment_id: Optional[bool] = None
- UT Example (tests/io/lancedb/test_lance_merge_evolution.py):
```
df = daft.read_lance(
    lance_dataset_path, 
    default_scan_options={"with_row_address": True}, 
    include_fragment_id=True
)
assert "fragment_id" in df.column_names
```

## 3. daft.io.lance.merge_columns_df()
This is a new low-level API that provides a more flexible,
DataFrame-based row-level column merge capability.
write_lance(mode='merge') internally wraps this function.
- Core Functionality: Receives a DataFrame containing fragment_id, a
join key, and new columns, and merges it efficiently into an existing
Lance dataset.
- UT Example (tests/io/lancedb/test_lance_merge_evolution.py):
```
# Prepare a DataFrame with only fragment_id, join key, and new columns
df_subset = df_loaded.with_column("double_lat", daft.col("lat") * 2)\
                     .select("_rowaddr", "fragment_id", "double_lat")

# Call the low-level API to perform the merge
daft.io.lance.merge_columns_df(
    df_subset,
    lance_dataset_path,
    read_columns=["_rowaddr", "double_lat"],
)

```
## 4. Interface Comparison
Both interfaces support column merging, but their purpose and usage
differ:
- Interface 1: DataFrame.write_lance(mode="merge") — A writer-side
merge; automatically detects new columns; requires fragment_id and a
join key; returns write statistics metadata.
- Interface 2: daft.io.lance.merge_columns_df() — A DataFrame-based
row-level merge; explicitly takes a DataFrame with only fragment_id,
join key, and new columns; returns None; allows for finer control via
read_columns and batch_size.
Usage Recommendation:
- For end-to-end write pipelines with more intuitive semantics, prefer
write_lance(mode="merge").
- For complex DataFrame transformations or custom column reading before
writing, use merge_columns_df().

# Deprecation Plan
With the introduction of the new `write_lance(mode='merge')` and
`daft.io.lance.merge_columns_df`, the old `daft.io.lance.merge_columns`
interface has been marked as deprecated.
- Reason for Deprecation: The old interface was based on a
fragment-level transformation function, which did not support DataFrames
that had undergone complex operations (like joins, groupbys, etc.),
limiting its flexibility and use cases.
- Future Plan: `daft.io.lance.merge_columns` will be removed in a future
release. All users are advised to migrate to the new
`write_lance(mode='merge')` interface, which provides a more powerful and
user-friendly column extension capability.

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho added a commit that referenced this pull request Dec 17, 2025
## Changes Made

Adds a `RetryAfterException` that can be raised from a UDF, which will
supply the UDF retry mechanism with a delay to sleep.

This allows us to respect the retry-afters provided in headers from 429
and 503 errors from clients, such as our openai and google clients in
the model apis.

By default, all of our model apis should already be configured with
max_retries = 3, and our exponential backoff has an initial delay of
100ms. It might be worth exploring in the future how we may want to more
easily expose these configurations to users.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho added a commit that referenced this pull request Dec 17, 2025
## Changes Made
Add support for `Series[start:end]` slicing.
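A hedged sketch of the new slicing behavior, using the `Series.from_pylist` constructor:

```python
from daft.series import Series

s = Series.from_pylist([1, 2, 3, 4, 5])
# Slicing with start:end returns the selected range of the Series.
print(s[1:4].to_pylist())  # [2, 3, 4]
```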

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues
#4771 

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

Co-authored-by: wangzheyan <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
universalmind303 pushed a commit that referenced this pull request Dec 17, 2025
## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

Change "We <3 developers!" to "We ❤️ developers!". I know this is a
rather interesting expression, but some customers have interpreted this
sentence as "Daft has no more than 3 full-time developers", mistakenly
thinking that Daft is a personal project and thus being hesitant to
adopt it in production easily.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

Signed-off-by: plotor <[email protected]>
universalmind303 pushed a commit that referenced this pull request Dec 17, 2025
## Changes Made

Try to overwrite files via native IO instead of PyArrow FS; this makes it
convenient to extend to new sources.

This keeps the same logic as the Python `overwrite_files` code, even though
there might be an overwrite issue, as mentioned in
#5700; a separate PR should handle it if it proves worthwhile.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
kevinzwang pushed a commit that referenced this pull request Dec 17, 2025
## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

### Extended SQL LIKE Pattern Matching in MemoryCatalog
-  Upgrade `sqlparser` to extract namespace from query
- Translate SQL LIKE patterns to regex
(`src/daft-catalog/src/pattern.rs`)
- Supports standard SQL LIKE wildcards: `%` (zero or more), `_` (exactly
one), `\` (escape)
- Updated documentation on `SHOW TABLES` syntax and pattern behavior

### Testing
- Added pattern matching tests to `test_sql_show_tables.py` and in
`src/daft-catalog/pattern.rs`
- `cargo test -p daft-catalog pattern::`
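For reference, an illustrative Python equivalent of the LIKE-to-regex translation implemented in `src/daft-catalog/src/pattern.rs` (the real code is Rust; this only mirrors the wildcard rules listed above):

```python
import re

def like_to_regex(pattern: str) -> str:
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is matched literally
            i += 2
            continue
        if ch == "%":
            out.append(".*")   # zero or more characters
        elif ch == "_":
            out.append(".")    # exactly one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "^" + "".join(out) + "$"

assert re.match(like_to_regex(r"events\_202%"), "events_2024")
assert not re.match(like_to_regex(r"events\_202%"), "eventsX2024")
```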

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
Closes #4461
Closes #4007 

## Checklist

- [x] Documented in API Docs (if applicable)
- [ ] Documented in User Guide (if applicable)
- [ ] If adding a new documentation page, doc is added to
`docs/mkdocs.yml` navigation
- [ ] Documentation builds and is formatted properly
colin-ho added a commit that referenced this pull request Dec 18, 2025
## Changes Made

Set the default ImageMode for decode_image to RGB. Currently, the
behavior is to infer the image mode at a per-image level, which means we
could have 8-bit and 16-bit images in the same column, which will error
because the underlying datatypes (uint8 vs uint16) are different.

The solution to this is to force decode into a specific image mode. This
PR simply makes that the default.
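A hedged sketch of forcing a uniform decode mode with the expression API; the exact parameter spelling (`mode=`) and enum location are assumed, and the paths are made up:

```python
import daft
from daft import col

df = daft.from_pydict({"path": ["images/a.png", "images/b.png"]})  # made-up paths
df = df.with_column("bytes", col("path").url.download())
# Decoding everything to RGB avoids mixing uint8 and uint16 images in one column.
df = df.with_column("image", col("bytes").image.decode(mode=daft.ImageMode.RGB))
```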

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho pushed a commit that referenced this pull request Dec 18, 2025
## Changes Made

Since we now support overwriting files and writing empty files via native IO
(from #5728 and #5682), these two methods are no longer needed.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
universalmind303 added a commit that referenced this pull request Dec 18, 2025
## Changes Made

gets field and dtype roundtrips working.

Note, since arrow-rs uses fields to store extension data, converting a
datatype with extensions to arrow will fail by design. Instead you must
operate on the field to preserve extension metadata.

the following conversions are possible.
- DaftField -> ArrowField
- ArrowField -> DaftField 
- DaftDataType -> ArrowDataType where DaftDataType != Extension
- ArrowDataType -> DaftDataType
- ArrowField -> DaftDataType

All of these are fallible operations. 


## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
colin-ho added a commit that referenced this pull request Dec 18, 2025
## Changes Made

This was failing in
https://github.com/Eventual-Inc/Daft/actions/runs/20347885747/job/58464760614
because of the failed assertion.

I believe this is due to
https://github.com/Eventual-Inc/Daft/pull/5676/changes#diff-a77514cfe53156c701c4b3b7426dc9d2c0d257deb52dae53fba976f51fcf8daf,
which changed the buffer to be popped one item at a time instead of as many as
possible, so the assertion is no longer needed.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho pushed a commit that referenced this pull request Dec 18, 2025
#5845)

## Changes Made

The value of `Estimated Scan Bytes` comes from the on-disk file size. If there
are pushdowns on the source, this is confusing, especially when analyzing the
plan and trying to optimize a Daft job.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho pushed a commit that referenced this pull request Dec 18, 2025
## Changes Made

1. Support configuring the log format and date format.
2. Enable all loggers by default.
3. Modify the documentation for easier setup.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
kevinzwang pushed a commit that referenced this pull request Dec 19, 2025
## Changes Made

1. Implement `TosMultipartWriter`.
2. Support the native writer via TOS.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->
colin-ho pushed a commit that referenced this pull request Dec 19, 2025
#5855)

Namespace the `RemoteFlotillaRunner` Ray actor name per Ray job /
process to avoid cross-client plan ID collisions when multiple Daft
clients connect to the same Ray cluster.

- Introduce a cached suffix used to namespace the flotilla plan runner
actor name per Ray job / process.
- Derive the suffix from the Ray runtime context `job_id` when
available.
- If no job identifier can be determined, fall back to a per-process
UUID so the actor name remains unique across different Python processes.
- Update `FlotillaRunner` to construct the Ray actor with the new
per-job actor name while keeping the `namespace="daft"` and
`get_if_exists=True` semantics unchanged.
- Add comments explaining the root cause: `DistributedPhysicalPlan` /
`QueryIdx` uses a per-process counter, so plan IDs like `"0"` can
collide across different clients if they share the same named actor
instance.
- Add a lightweight unit test that verifies the flotilla actor name is
derived from the base name plus a job-specific suffix and remains stable
within a process.
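A hedged sketch of the naming scheme the bullets above describe; the helper name and exact fallback are illustrative, not the actual implementation:

```python
import uuid
import ray

def flotilla_actor_name(base: str = "RemoteFlotillaRunner") -> str:
    try:
        # Prefer the Ray job ID so every client/job gets its own actor name.
        job_id = ray.get_runtime_context().get_job_id()
    except Exception:
        job_id = None
    # Fall back to a per-process UUID when no job ID is available.
    suffix = job_id or uuid.uuid4().hex
    return f"{base}-{suffix}"
```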

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

## Related Issues
#5856
<!-- Link to related GitHub issues, e.g., "Closes #123" -->
universalmind303 added a commit that referenced this pull request Dec 19, 2025
## Changes Made

adds the following methods to most array implementations
```rs
// convert to type erased ArrayRef
// all arrays implement this
fn to_arrow(&self) -> DaftResult<ArrayRef>;
// convert to a concrete type which varies for each array type
// such as DataArray<Int8Type> -> arrow::array::Int8Array
// most arrays implement this, but not all of them. 
// this is also true for the original `as_arrow` (now `as_arrow2`) that this is intended to replace.
fn as_arrow(&self) -> DaftResult<Self::ArrowOutput>
```

and `from` methods.

```rs
// this is also implemented for all arrays. 
fn from_arrow<F: Into<Arc<Field>>>(field: F, array: ArrayRef) -> DaftResult<Self>;
```

## Note for reviewers
the bulk of the changes is in the following
- `src/daft-core/src/array/ops/from_arrow.rs`
- `src/daft-core/src/datatypes/logical.rs`
- `src/daft-core/src/array/*`






## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
universalmind303 pushed a commit that referenced this pull request Dec 22, 2025
… guide (#5865)

## Changes Made

<!-- Describe what changes were made and why. Include implementation
details if necessary. -->

Add instructions on the basic usage of Daft Dashboard in the development
guide to reduce the learning cost for developers.

Daft Dashboard is currently in the rapid development stage. There will
be a dedicated article or page to introduce its functions and usage in
detail later, but it is more appropriate to include it in the
development guide at this stage.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->

Signed-off-by: plotor <[email protected]>
colin-ho pushed a commit that referenced this pull request Dec 22, 2025
…5864)

## Changes Made
The integration test
`TestIcebergCountPushdown.test_count_pushdown_with_delete_files` was failing
for the `test_overlapping_deletes` table because it incorrectly enabled count
pushdown.

The root cause was that the initial Spark write created multiple small data
files. Subsequent DELETE operations were optimized by Iceberg to mark entire
data files as removed instead of generating position/equality delete files. As
a result, Daft's `_has_delete_files()` check did not find any delete files and
incorrectly allowed the count pushdown optimization.

This PR fixes the test by adding `coalesce(1)` to the Spark DataFrame before
writing the initial data for the `test_overlapping_deletes` table. This ensures
the data is written to a single Parquet file, forcing subsequent DELETE
operations to generate actual delete files. This aligns the test's behavior
with its intent, correctly disabling count pushdown when delete files are
present.

## Related Issues
#5863
<!-- Link to related GitHub issues, e.g., "Closes #123" -->

---------

Co-authored-by: root <root@bytedance>
Co-authored-by: huleilei <huleilei@bytedance>
srilman pushed a commit that referenced this pull request Dec 28, 2025
## Changes Made

Bump `mypy` to 1.19.1 and `ruff` to 0.14.10 and resolve pre-commit
errors.

Rationale: my local `ruff` version is much higher than that in the
project `pre-commit`, so I am getting a lot of linter warnings in my
IDE. It is also about time to upgrade lints to utilize newer Python
language features.

## Related Issues

<!-- Link to related GitHub issues, e.g., "Closes #123" -->