fix: resolve parquet duplication of rows #1424
Codecov Report
❌ Patch coverage is …

@@            Coverage Diff             @@
##             main    #1424      +/-   ##
==========================================
- Coverage   89.17%   88.94%   -0.23%
==========================================
  Files          23       23
  Lines        2587     2561      -26
==========================================
- Hits         2307     2278      -29
- Misses        280      283       +3
84fe41d to 2517079
ivirshup left a comment
Do we have any reproducible cases of the bug that we can add as a test case?
And do we know that there are no duplicate rows in the converted parquet file?
I wasn't able to reproduce it, but I narrowed down the area of code that could be causing it.
We know there are no duplicates in the parquet file because we have an explicit check for that as part of our validation. It's only after we write it back out to bgzip that we end up with duplicates.
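For reference, here is a minimal sketch of how the written-back bgzip file could be checked for duplicate rows independently of the parquet validation. This is not the project's validation code: `find_duplicate_rows` is a hypothetical helper, and it assumes the output is a tab-separated fragment file with one fragment per line and optional `#` comment lines.

```python
import gzip
from collections import Counter


def find_duplicate_rows(path):
    """Return {row: count} for every row that occurs more than once.

    Hypothetical helper, not part of the project. Assumes a tab-separated
    fragment file compressed with gzip or bgzip; bgzip output is a series
    of gzip members, which gzip.open reads as one continuous stream.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip comment/header lines
                continue
            counts[line.rstrip("\n")] += 1
    # Keep only rows seen more than once.
    return {row: n for row, n in counts.items() if n > 1}
```

An empty result means no exact duplicate rows were found. The sketch holds every distinct row in memory, so it suits a small test fixture better than a full-size fragment file.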
(cherry picked from commit 0130ace)
Reason for Change
Changes
Testing
Copied an ATAC fragment with known duplicates and then ran these tests on it.
Notes for Reviewer