feat(cudf): Update hybrid scan usage and cuDF dependency tree#16226
mhaseeb123 wants to merge 12 commits into facebookincubator:main
✅ Deploy Preview for meta-velox canceled.
bool CudfHiveConfig::useExperimentalCudfReader() const {
  return config_->get<bool>(kUseExperimentalCudfReader, false);
bool CudfHiveConfig::useOldCudfReader() const {
Hybrid scan will now be the default Parquet reader in CudfHiveDataSource. The old reader is kept as a fallback (enabled via config) until the new reader has been thoroughly tested; after that, it will be removed altogether.
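A minimal sketch of a config-gated fallback like the one described above. It assumes the new useOldCudfReader() keeps a false default, so hybrid scan is used unless the fallback is explicitly switched on. A plain string map stands in for Velox's config object, and the key string, ReaderKind, and selectReader are hypothetical names, not the ones in this PR.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a config object: raw string values keyed by name.
using ConfigMap = std::unordered_map<std::string, std::string>;

// Hypothetical key name; the real constant in CudfHiveConfig may differ.
const std::string kUseOldCudfReaderKey = "cudf.use-old-reader";

enum class ReaderKind { kHybridScan, kOldReader };

// Mirrors the shape of a bool getter with a false default: only an explicit
// "true" enables the old reader; a missing key or any other value does not.
bool useOldCudfReader(const ConfigMap& config) {
  auto it = config.find(kUseOldCudfReaderKey);
  return it != config.end() && it->second == "true";
}

// Hybrid scan is the default; the old reader is an opt-in fallback.
ReaderKind selectReader(const ConfigMap& config) {
  return useOldCudfReader(config) ? ReaderKind::kOldReader
                                  : ReaderKind::kHybridScan;
}
```

Defaulting the flag to false means existing deployments pick up the new reader without any config change, while the fallback remains one setting away during testing.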
    outputName);

auto* handle = static_cast<const hive::HiveColumnHandle*>(it->second.get());
readColumnSet_.emplace(handle->name());
Minor improvement: use a companion unordered_set to check whether a column name has already been inserted, instead of a linear std::find over the list.
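A minimal sketch of the companion-set pattern, assuming the projection is an ordered vector of names (as readColumnNames_ appears to be) paired with a hash set (as readColumnSet_ appears to be). ColumnProjection is a hypothetical class name used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: keeps column names in first-seen order (the projection
// list handed to the reader) while a companion hash set answers "already
// added?" in average O(1), instead of an O(n) std::find on every insert.
class ColumnProjection {
 public:
  // Returns true if the name was newly added, false if it was a duplicate.
  bool add(const std::string& name) {
    if (!seen_.insert(name).second) {
      return false;  // already present; keep the vector free of duplicates
    }
    names_.push_back(name);
    return true;
  }

  const std::vector<std::string>& names() const { return names_; }

 private:
  std::vector<std::string> names_;        // ordered projection list
  std::unordered_set<std::string> seen_;  // companion set for membership checks
};
```

The set costs a second copy of each name, which is negligible for typical column counts, and turns deduplication of n columns from O(n²) comparisons into roughly O(n) hash lookups.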
  stream->readFully(reinterpret_cast<char*>(dst), size);
}

referenceToNameConverter::referenceToNameConverter(
This is now handled internally by libcudf, so it is no longer needed here.
  const size_t fileSize_;
};

// ---------------- Internal helper ----------------
No longer needed; libcudf handles this now.
      len - ender->footer_len - ender_len, ender->footer_len);
}

std::vector<std::unique_ptr<cudf::io::datasource>>
No longer needed. As you may have guessed, libcudf now provides this utility.
// Set column projection if needed
if (readColumnNames_.size()) {
  readerOptions_.set_columns(readColumnNames_);
  readerOptions_.set_column_names(readColumnNames_);
Explicitly use set_column_names here, or set_column_indices if we want to select columns by their indices in the Parquet file.
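Selecting by index requires translating projected names into file positions first. The sketch below shows only that name-to-index mapping, assuming a flat (non-nested) top-level schema. resolveColumnIndices is a hypothetical helper, and how libcudf numbers columns for set_column_indices (especially nested ones) is not assumed here; check the libcudf documentation before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: maps requested column names to their positions in a
// flat file schema, preserving the requested order. Returns std::nullopt if
// any requested name is missing from the schema.
std::optional<std::vector<std::size_t>> resolveColumnIndices(
    const std::vector<std::string>& schemaNames,
    const std::vector<std::string>& requested) {
  std::unordered_map<std::string, std::size_t> position;
  for (std::size_t i = 0; i < schemaNames.size(); ++i) {
    position.emplace(schemaNames[i], i);  // first occurrence wins
  }
  std::vector<std::size_t> indices;
  indices.reserve(requested.size());
  for (const auto& name : requested) {
    auto it = position.find(name);
    if (it == position.end()) {
      return std::nullopt;  // unknown column: let the caller decide how to fail
    }
    indices.push_back(it->second);
  }
  return indices;
}
```

When the names already match the file schema, set_column_names avoids this step entirely, which is why name-based selection is the simpler default.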
bdice left a comment
Nice work. It will be great to have the new reader.
devavret left a comment
Approving because we've tested this.
Description
Supersedes #1622, #16036, and #16160
This PR enables Velox to use the updated hybrid scan APIs and to perform smarter I/O where possible. It also updates cuDF and its dependencies, both to pull in these hybrid scan APIs from cuDF and to fix an issue that @simoneves observed with full debug builds of Velox with cuDF. That issue is fixed by updating cuDF's dependencies (RMM, rapids-cmake, and CCCL); for details, see the PRs linked in rapidsai/rapids-cmake#979.