-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fix issue with reading footer of orc file with large stripe #25634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue with reading footer of orc file with large stripe #25634
Conversation
|
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Anton Kutuzov.
|
0385952 to
6adcd9b
Compare
6adcd9b to
8e9477e
Compare
raunaqmorarka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you show the full error stacktrace from the failed query ?
I don't think Trino would write orc files with such large stripe row count, so I'd expect the code changes to be mostly in the reader classes.
Yes, of course. We write this files using other tools. The code changes in writer classes only in places where the |
8e9477e to
2f4a113
Compare
|
@raunaqmorarka could you please merge PR? Or do you wait something from me? |
lib/trino-orc/src/main/java/io/trino/orc/OrcWriteValidation.java
Outdated
Show resolved
Hide resolved
82dae36 to
9c49dbc
Compare
9c49dbc to
0a076bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue with reading ORC files that have a very large stripe by updating the data types and method signatures from int to long where necessary to prevent integer overflow.
- Update test coverage to verify large stripe row counts
- Change field types and method parameters from int to long
- Adjust calculations (e.g. ceil function) accordingly in multiple modules
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| lib/trino-orc/src/test/java/io/trino/orc/metadata/TestOrcMetadataReader.java | Adds a test verifying correct handling of large row counts |
| lib/trino-orc/src/main/java/io/trino/orc/metadata/StripeInformation.java | Updates the numberOfRows field and related accessors from int to long |
| lib/trino-orc/src/main/java/io/trino/orc/metadata/OrcMetadataReader.java | Adapts conversion to use long values for stripe row counts |
| lib/trino-orc/src/main/java/io/trino/orc/StripeReader.java | Changes parameters and local variable types from int to long and revises the ceil function |
| lib/trino-orc/src/main/java/io/trino/orc/OrcWriterStats.java | Updates the signature for stripeRows to long |
| lib/trino-orc/src/main/java/io/trino/orc/OrcWriterFlushStats.java | Updates the signature for stripeRows to long |
| lib/trino-orc/src/main/java/io/trino/orc/OrcWriteValidation.java | Revises addStripe method to accept long values |
| lib/trino-orc/src/main/java/io/trino/orc/OrcRecordReader.java | Updates helper method validateWriteStripe to use long |
Description
We have tables in dds layer that look like
where
id1is unique andid2has only two values. The data file has 70 MB size and more than 7 000 000 000 rows.When we try to read data the error occurs
It happens because OrcProto support
longtype, but in the trino code we haveinttype.Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: