Filter function simplification when there are multiple Iceberg equality delete files

### Description

Velox can flatten logical expressions, but it cannot yet simplify them automatically. For example, the expression a AND (b AND (c AND d)) is flattened to AND(a,b,c,d), but a AND (a OR b) is not simplified to a. To evaluate a AND (a OR b), both a and b are therefore evaluated, and one AND and one OR operation are performed. While we hope to improve logical expression simplification in the future, we can already make some simple improvements for Iceberg.
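To make the difference between flattening and simplification concrete, here is a minimal Python sketch. It is not Velox code: the `Var` and `Op` classes and the `flatten` and `absorb` helpers are hypothetical names introduced only for illustration, and `absorb` handles just the absorption law a AND (a OR b) = a.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    kind: str                  # "AND" or "OR"
    args: Tuple["Expr", ...]


Expr = Union[Var, Op]


def flatten(e):
    """Merge nested nodes of the same kind: AND(a, AND(b, c)) -> AND(a, b, c)."""
    if isinstance(e, Var):
        return e
    args = []
    for child in map(flatten, e.args):
        if isinstance(child, Op) and child.kind == e.kind:
            args.extend(child.args)
        else:
            args.append(child)
    return Op(e.kind, tuple(args))


def absorb(e):
    """Absorption law: in an AND, drop any OR operand that contains a sibling operand."""
    if isinstance(e, Var):
        return e
    args = tuple(absorb(child) for child in e.args)
    if e.kind == "AND":
        args = tuple(
            child for child in args
            if not (
                isinstance(child, Op)
                and child.kind == "OR"
                and any(sib in child.args for sib in args if sib is not child)
            )
        )
    # An AND with a single remaining operand is just that operand.
    return args[0] if len(args) == 1 else Op(e.kind, args)


a, b, c, d = Var("a"), Var("b"), Var("c"), Var("d")

# Flattening: a AND (b AND (c AND d)) becomes AND(a, b, c, d).
nested = Op("AND", (a, Op("AND", (b, Op("AND", (c, d))))))
flat = flatten(nested)

# Simplification: a AND (a OR b) becomes a.
simplified = absorb(flatten(Op("AND", (a, Op("OR", (a, b))))))
```

Flattening alone leaves `AND(a, OR(a, b))` with two operands to evaluate, while the absorption step reduces it to the single operand `a`.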

An Iceberg split can come with multiple equality delete files, and their schemas can overlap. For example:

Equality delete file 1:
```
equality_ids=[1, 2, 3]
1: id | 2: category | 3: name
------|-------------|--------
1     | mouse       | Micky
2     | mouse       | Minnie
3     | bear        | Winnie
4     | bear        | Betty
```
Equality delete file 2:
```
equality_ids=[2]
2: category
-----------
mouse
```
Equality delete file 3:
```
equality_ids=[2, 3]
2: category | 3: name
------------|--------
bear        | Winnie
```
Equality delete file 2 is defined on the category column alone and removes every tuple whose category is mouse. The first two rows of equality delete file 1 are therefore already covered by it and don't need to be read or compiled. Similarly, the single row in file 3 covers row 3 of file 1, so that row doesn't need to be read or compiled either. The simplified delete files are as follows:
```
equality_ids=[1, 2, 3]
1: id | 2: category | 3: name
------|-------------|--------
4     | bear        | Betty
```
and
```
equality_ids=[2]
2: category
-----------
mouse
```
and
```
equality_ids=[2, 3]
2: category | 3: name
------------|--------
bear        | Winnie
```
With this simplification, the resulting filter expression is simpler and its evaluation cost is reduced.
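The pruning described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the proposed Velox implementation: each delete file is modeled as a hypothetical `(equality_ids, rows)` pair with rows as dicts keyed by field id, the `is_subsumed` and `simplify` names are made up for this example, and NULL values and duplicate rows across files with identical `equality_ids` are not handled.

```python
def is_subsumed(row, ids, other_ids, other_rows):
    """True if some row of the other file deletes everything that `row` deletes.

    That requires the other file's equality ids to be a proper subset of
    this file's ids, and one of its rows to match `row` on all of those ids.
    """
    if not set(other_ids) < set(ids):
        return False
    return any(all(row[i] == other[i] for i in other_ids) for other in other_rows)


def simplify(delete_files):
    """Drop every delete row that is covered by a row in another delete file."""
    result = []
    for idx, (ids, rows) in enumerate(delete_files):
        kept = [
            row for row in rows
            if not any(
                is_subsumed(row, ids, other_ids, other_rows)
                for j, (other_ids, other_rows) in enumerate(delete_files)
                if j != idx
            )
        ]
        result.append((ids, kept))
    return result


# The three equality delete files from the example, keyed by field id.
file1 = ([1, 2, 3], [
    {1: 1, 2: "mouse", 3: "Micky"},
    {1: 2, 2: "mouse", 3: "Minnie"},
    {1: 3, 2: "bear", 3: "Winnie"},
    {1: 4, 2: "bear", 3: "Betty"},
])
file2 = ([2], [{2: "mouse"}])
file3 = ([2, 3], [{2: "bear", 3: "Winnie"}])

simplified_files = simplify([file1, file2, file3])
for ids, rows in simplified_files:
    print(ids, rows)
```

Every row is checked against the original, unpruned files. This is safe because coverage is transitive: if a row is covered by a second row that is itself covered by a third, the third row also covers the first.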
