Skip to content

Commit d97cb28

Browse files
committed
fix: collect_columns quadratic complexity
Fix accidental $O(n^2)$ in `collect_columns`. There are the following ways to insert a clone into a hash set: - **clone before check:** Basically `set.insert(x.clone())`. That's rather expensive if you have a lot of duplicates. - **iter & clone:** That's what we do right now, but that's in $O(n^2)$. - **check & insert:** Basically `if !set.contains(x) {set.insert(x.clone())}` That requires two hash probes though. - **entry-based API:** Sadly the stdlib doesn't really offer any handle/entry-based APIs yet (see rust-lang/rust#60896 ), but `hashbrown` does, so we can use the nice `set.get_or_insert_owned(x)` which will only clone the reference if it doesn't exists yet and only hashes once. We now use the last approach.
1 parent 1c98e6e commit d97cb28

File tree

1 file changed

+3
-4
lines changed
  • datafusion/physical-expr/src/utils

1 file changed

+3
-4
lines changed

datafusion/physical-expr/src/utils/mod.rs

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -17,9 +17,10 @@
1717

1818
mod guarantee;
1919
pub use guarantee::{Guarantee, LiteralGuarantee};
20+
use hashbrown::HashSet;
2021

2122
use std::borrow::Borrow;
22-
use std::collections::{HashMap, HashSet};
23+
use std::collections::HashMap;
2324
use std::sync::Arc;
2425

2526
use crate::expressions::{BinaryExpr, Column};
@@ -204,9 +205,7 @@ pub fn collect_columns(expr: &Arc<dyn PhysicalExpr>) -> HashSet<Column> {
204205
let mut columns = HashSet::<Column>::new();
205206
expr.apply(|expr| {
206207
if let Some(column) = expr.as_any().downcast_ref::<Column>() {
207-
if !columns.iter().any(|c| c.eq(column)) {
208-
columns.insert(column.clone());
209-
}
208+
columns.get_or_insert_owned(column);
210209
}
211210
Ok(TreeNodeRecursion::Continue)
212211
})

0 commit comments

Comments
 (0)