
🚧 Implement an experimental Parquet reader optimized for highly-selective hybrid scan reads #18011


Closed

Changes from all commits — 109 commits
7791b35
Add first five APIs of hybrid scan
mhaseeb123 Feb 14, 2025
f4efce6
Dictionary page pruning basics and page pruning
mhaseeb123 Feb 19, 2025
e6aaa31
Style fix and merge stats filter changes
mhaseeb123 Feb 19, 2025
c1f6b09
Remove redefinition of host_column
mhaseeb123 Feb 19, 2025
2fafc14
Merge branch 'branch-25.04' into fea/hybrid-scan-footer
mhaseeb123 Feb 19, 2025
42738d2
Minor bug fix
mhaseeb123 Feb 19, 2025
242df4f
move defs to stats_filter_helpers.hpp
mhaseeb123 Feb 19, 2025
3451cb8
Add dictionary filtering basics
mhaseeb123 Feb 20, 2025
3a05451
GPU algorithm to convert per-page col to per-row col
mhaseeb123 Feb 20, 2025
5031ed0
Add gtest for row group and page pruning. Fix page pruning code
mhaseeb123 Mar 4, 2025
f2729bc
Code cleanup and style fix
mhaseeb123 Mar 4, 2025
aa7d551
Revert erroneous change
mhaseeb123 Mar 4, 2025
7042670
Merge branch 'branch-25.04' into fea/hybrid-scan-footer
mhaseeb123 Mar 4, 2025
40bb46a
cmake changes for the new gtest
mhaseeb123 Mar 4, 2025
f52843d
Merge branch 'branch-25.04' into fea/hybrid-scan-footer
mhaseeb123 Mar 4, 2025
d7769d2
Fix more bugs in page filtering with stats
mhaseeb123 Mar 5, 2025
648a095
Accelerate page pruning for strings cols
mhaseeb123 Mar 8, 2025
e6bfd81
Comments and minor improvements
mhaseeb123 Mar 8, 2025
dec6ace
Small perf improvement
mhaseeb123 Mar 8, 2025
e291ac1
Merge branch 'branch-25.04' into fea/hybrid-scan-footer
mhaseeb123 Mar 8, 2025
5eaea66
Bare bones materialize filter cols
mhaseeb123 Mar 8, 2025
a854dc2
Accelerate page span selection algorithm
mhaseeb123 Mar 10, 2025
a2f109d
Deduplicate `make_page_indices` function
mhaseeb123 Mar 10, 2025
67be821
Minor comments and headers update
mhaseeb123 Mar 11, 2025
56d3bb4
Copy over reader chunking and preprocess code
mhaseeb123 Mar 13, 2025
d452a64
Push all copied changes from normal reader
mhaseeb123 Mar 13, 2025
4e79466
Remaining reader infrastructure
mhaseeb123 Mar 13, 2025
266a618
Minor clean up
mhaseeb123 Mar 13, 2025
cc87f66
Minor renaming and remove chunking
mhaseeb123 Mar 14, 2025
fcec55e
Code cleanup
mhaseeb123 Mar 26, 2025
6fb04ad
Changes from skip decode
mhaseeb123 Mar 26, 2025
34bfe57
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Mar 26, 2025
6c115fb
Revert erroneous changes
mhaseeb123 Mar 26, 2025
02a45e2
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Mar 26, 2025
a5766aa
Add hybrid_scan_page_pruning.cu to cmake
mhaseeb123 Mar 26, 2025
8e3d38d
Merge uncomp changes from branch-25.06 in hybrid scan
mhaseeb123 Mar 26, 2025
cf56eee
Filter columns materialized
mhaseeb123 Mar 26, 2025
2a65ae1
Separate page and row group pruning in the gtests
mhaseeb123 Mar 26, 2025
96d11ee
Style fix and minor improvements
mhaseeb123 Mar 26, 2025
072a259
Code clean up. Update gtest to only check filter column
mhaseeb123 Mar 26, 2025
fd9a757
Add comments to hybrid scan api use
mhaseeb123 Mar 27, 2025
b760890
Add API 10-11 signatures
mhaseeb123 Mar 27, 2025
d0edff2
Revert unnecessary changes + check tables equivalent instead of equal
mhaseeb123 Mar 27, 2025
cd98468
Revert eols in comments
mhaseeb123 Mar 27, 2025
c9c371c
Materialize payload columns
mhaseeb123 Apr 2, 2025
7340117
Improvements
mhaseeb123 Apr 2, 2025
a403935
Merge from #18347
mhaseeb123 Apr 2, 2025
d50edaa
Remove hybrid_scan_page_pruning.cu from cmake
mhaseeb123 Apr 2, 2025
ad30216
Reorder args to decoders
mhaseeb123 Apr 2, 2025
a012cad
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Apr 2, 2025
98511c2
Minor
mhaseeb123 Apr 2, 2025
c41c60b
bug fixing for failing gtest
mhaseeb123 Apr 3, 2025
c9cfbbb
Cleanup
mhaseeb123 Apr 3, 2025
33042ed
Cleanup and docstrings
mhaseeb123 Apr 4, 2025
b3a506f
More API cleanup, add docstrings
mhaseeb123 Apr 4, 2025
7494dcf
Minor improvement
mhaseeb123 Apr 4, 2025
98cda58
Allow selecting specified payload columns as well
mhaseeb123 Apr 4, 2025
3d16da1
Improve variable names and docstrings
mhaseeb123 Apr 5, 2025
af56e8a
Minor improvements
mhaseeb123 Apr 7, 2025
db32df9
Carry over improvements from #18347
mhaseeb123 Apr 8, 2025
4754c7e
Minor code cleanup
mhaseeb123 Apr 8, 2025
25de212
Update the hybrid scan test
mhaseeb123 Apr 9, 2025
516d579
Minor comments update
mhaseeb123 Apr 9, 2025
cbfbaad
Minor improvements
mhaseeb123 Apr 9, 2025
5d25bf7
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Apr 9, 2025
6ed12a2
Merge latest parquet changes
mhaseeb123 Apr 9, 2025
0e69535
Get parquet file footer API
mhaseeb123 Apr 9, 2025
40fc7eb
Minor improvements
mhaseeb123 Apr 9, 2025
7a47f43
Directly use the hybrid_scan_reader instead of via duplicated APIs
mhaseeb123 Apr 9, 2025
f7f277d
Minor improvements
mhaseeb123 Apr 10, 2025
4a70586
Improve gtest
mhaseeb123 Apr 10, 2025
cb2b3fd
Merge changes from 18480
mhaseeb123 Apr 11, 2025
f9be51a
Merge decode skipping PR
mhaseeb123 Apr 11, 2025
12ff2eb
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Apr 11, 2025
5314a1e
Merge changes
mhaseeb123 Apr 11, 2025
22a62a5
Fix `std::accumulate`s across the experimental reader
mhaseeb123 Apr 11, 2025
4a41640
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Apr 18, 2025
48da795
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Apr 21, 2025
52c2069
Minor improvements
mhaseeb123 Apr 21, 2025
705520b
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Apr 22, 2025
666a6d1
Simplify `select_payload_columns`
mhaseeb123 Apr 22, 2025
6e58f33
skip decompressing of pruned data pages
mhaseeb123 Apr 22, 2025
bfa06da
Minor improvement
mhaseeb123 Apr 22, 2025
a9a11dc
Improvements for tests
mhaseeb123 Apr 23, 2025
cbe0f50
Merge branch 'branch-25.06' into fea/hybrid-scan-footer
mhaseeb123 Apr 23, 2025
6c8f7b8
Merge commit
mhaseeb123 Apr 23, 2025
3bc48be
Use host_span for bloom filter and dictionary device buffers
mhaseeb123 Apr 23, 2025
ada9b8a
Use host_vector instead of std::vector for bool page mask
mhaseeb123 Apr 24, 2025
2c83fce
Style consistency
mhaseeb123 Apr 24, 2025
05c52b3
Merge updates from #18480
mhaseeb123 Apr 24, 2025
aaae90b
Merge from PR 18480
mhaseeb123 Apr 24, 2025
8e99f82
Handle empty page mask in materialization
mhaseeb123 Apr 28, 2025
3f34dab
Handle empty row mask in materialize payload column
mhaseeb123 Apr 28, 2025
45cff8e
Add `num_rows_in_row_groups` API
mhaseeb123 Apr 29, 2025
19560c5
Minor improvements
mhaseeb123 Apr 29, 2025
c12e6d8
revert minor changes
mhaseeb123 Apr 29, 2025
a51dda0
Add docs from #18480
mhaseeb123 Apr 30, 2025
c6c2795
Sync with child PRs
mhaseeb123 May 1, 2025
32a9ec6
Minor improvements
mhaseeb123 May 1, 2025
084f958
Minor improvements
mhaseeb123 May 1, 2025
5b33d40
Merge from reviews
mhaseeb123 May 1, 2025
402dd8f
remove duplicate `build_string_dict_indices`
mhaseeb123 May 2, 2025
d0b2e03
Minor improvement
mhaseeb123 May 3, 2025
1804f44
Sync with child PR
mhaseeb123 May 6, 2025
8188117
Minor improvement
mhaseeb123 May 6, 2025
6bf70b1
Sync
mhaseeb123 May 8, 2025
5d4426d
Sync with child PRs
mhaseeb123 May 8, 2025
50166fa
Use aligned_mr for bloom filter allocations
mhaseeb123 May 8, 2025
fc29653
Minor improvement
mhaseeb123 May 9, 2025
8 changes: 8 additions & 0 deletions cpp/CMakeLists.txt
@@ -521,6 +521,13 @@ add_library(
src/io/parquet/compact_protocol_reader.cpp
src/io/parquet/compact_protocol_writer.cpp
src/io/parquet/decode_preprocess.cu
src/io/parquet/experimental/dictionary_page_filter.cu
src/io/parquet/experimental/hybrid_scan.cpp
src/io/parquet/experimental/hybrid_scan_impl.cpp
src/io/parquet/experimental/hybrid_scan_helpers.cpp
src/io/parquet/experimental/hybrid_scan_preprocess.cu
src/io/parquet/experimental/hybrid_scan_chunking.cu
src/io/parquet/experimental/page_index_filter.cu
src/io/parquet/page_data.cu
src/io/parquet/chunk_dict.cu
src/io/parquet/page_enc.cu
@@ -806,6 +813,7 @@ set_source_files_properties(
PROPERTIES COMPILE_DEFINITIONS "_FILE_OFFSET_BITS=64"
)


set_property(
SOURCE src/io/parquet/writer_impl.cu
APPEND
479 changes: 479 additions & 0 deletions cpp/include/cudf/io/experimental/hybrid_scan.hpp

Large diffs are not rendered by default.

5 changes: 3 additions & 2 deletions cpp/include/cudf/io/parquet.hpp
@@ -100,12 +100,13 @@ class parquet_reader_options {
explicit parquet_reader_options() = default;

/**
* @brief Creates a parquet_reader_options_builder which will build parquet_reader_options.
* @brief Creates a `parquet_reader_options_builder` to build `parquet_reader_options`.
* By default, build with empty data source info.
*
* @param src Source information to read parquet file
* @return Builder to build reader options
*/
static parquet_reader_options_builder builder(source_info src);
static parquet_reader_options_builder builder(source_info src = source_info{});

/**
* @brief Returns source info.
2 changes: 1 addition & 1 deletion cpp/src/io/parquet/bloom_filter_reader.cu
@@ -508,7 +508,7 @@ std::vector<Type> aggregate_reader_metadata::get_parquet_types(
}

std::optional<std::vector<std::vector<size_type>>> aggregate_reader_metadata::apply_bloom_filters(
std::vector<rmm::device_buffer>& bloom_filter_data,
cudf::host_span<rmm::device_buffer> bloom_filter_data,
host_span<std::vector<size_type> const> input_row_group_indices,
host_span<std::vector<ast::literal*> const> literals,
size_type total_row_groups,
134 changes: 134 additions & 0 deletions cpp/src/io/parquet/experimental/dictionary_page_filter.cu
@@ -0,0 +1,134 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "hybrid_scan_helpers.hpp"

#include <cudf/ast/detail/expression_transformer.hpp>
#include <cudf/ast/detail/operators.hpp>
#include <cudf/ast/expressions.hpp>
#include <cudf/detail/cuco_helpers.hpp>
#include <cudf/detail/transform.hpp>
#include <cudf/hashing/detail/xxhash_64.cuh>
#include <cudf/io/parquet_schema.hpp>
#include <cudf/logger.hpp>
#include <cudf/utilities/span.hpp>
#include <cudf/utilities/traits.hpp>
#include <cudf/utilities/type_checks.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/device_buffer.hpp>
#include <rmm/exec_policy.hpp>

#include <thrust/iterator/counting_iterator.h>
#include <thrust/tabulate.h>

#include <future>
#include <numeric>
#include <optional>

namespace cudf::io::parquet::experimental::detail {

dictionary_literals_and_operators_collector::dictionary_literals_and_operators_collector() =
default;

dictionary_literals_and_operators_collector::dictionary_literals_and_operators_collector(
ast::expression const& expr, cudf::size_type num_input_columns)
{
_num_input_columns = num_input_columns;
_literals.resize(num_input_columns);
_operators.resize(num_input_columns);
expr.accept(*this);
}

std::reference_wrapper<ast::expression const> dictionary_literals_and_operators_collector::visit(
ast::column_reference const& expr)
{
CUDF_EXPECTS(expr.get_table_source() == ast::table_reference::LEFT,
"DictionaryAST supports only left table");
CUDF_EXPECTS(expr.get_column_index() < _num_input_columns,
"Column index cannot be more than number of columns in the table");
return expr;
}

std::reference_wrapper<ast::expression const> dictionary_literals_and_operators_collector::visit(
ast::column_name_reference const& expr)
{
CUDF_FAIL("Column name reference is not supported in DictionaryAST");
}

std::reference_wrapper<ast::expression const> dictionary_literals_and_operators_collector::visit(
ast::operation const& expr)
{
using cudf::ast::ast_operator;
auto const operands = expr.get_operands();
auto const op = expr.get_operator();

if (auto* v = dynamic_cast<ast::column_reference const*>(&operands[0].get())) {
// First operand should be column reference, second should be literal.
CUDF_EXPECTS(cudf::ast::detail::ast_operator_arity(op) == 2,
"Only binary operations are supported on column reference");
auto const literal_ptr = dynamic_cast<ast::literal const*>(&operands[1].get());
CUDF_EXPECTS(literal_ptr != nullptr,
"Second operand of binary operation with column reference must be a literal");
v->accept(*this);

// Push to the corresponding column's literals and operators list iff EQUAL or NOT_EQUAL
// operator is seen
if (op == ast_operator::EQUAL or op == ast::ast_operator::NOT_EQUAL) {
auto const col_idx = v->get_column_index();
_literals[col_idx].emplace_back(const_cast<ast::literal*>(literal_ptr));
_operators[col_idx].emplace_back(op);
}
} else {
// Just visit the operands and ignore any output
std::ignore = visit_operands(operands);
}

return expr;
}

std::pair<std::vector<std::vector<ast::literal*>>, std::vector<std::vector<ast::ast_operator>>>
dictionary_literals_and_operators_collector::get_literals_and_operators() &&
{
return {std::move(_literals), std::move(_operators)};
}

std::optional<std::vector<std::vector<size_type>>>
aggregate_reader_metadata::apply_dictionary_filter(
cudf::host_span<rmm::device_buffer> dictionaries,
host_span<std::vector<size_type> const> input_row_group_indices,
host_span<std::vector<ast::literal*> const> literals,
host_span<std::vector<ast::ast_operator> const> operators,
size_type total_row_groups,
host_span<data_type const> output_dtypes,
host_span<int const> dictionary_col_schemas,
std::reference_wrapper<ast::expression const> filter,
rmm::cuda_stream_view stream) const
{
return {};
}

std::vector<rmm::device_buffer> aggregate_reader_metadata::materialize_dictionaries(
cudf::host_span<rmm::device_buffer> dictionary_page_data,
host_span<std::vector<size_type> const> input_row_group_indices,
host_span<data_type const> output_dtypes,
host_span<int const> dictionary_col_schemas,
rmm::cuda_stream_view stream) const
{
return {};
}

} // namespace cudf::io::parquet::experimental::detail
181 changes: 181 additions & 0 deletions cpp/src/io/parquet/experimental/hybrid_scan.cpp
@@ -0,0 +1,181 @@
/*
* Copyright (c) 2025, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "cudf/utilities/error.hpp"
#include "hybrid_scan_impl.hpp"

#include <cudf/io/experimental/hybrid_scan.hpp>

#include <thrust/host_vector.h>

namespace cudf::io::parquet::experimental {

using text::byte_range_info;

hybrid_scan_reader::hybrid_scan_reader(cudf::host_span<uint8_t const> footer_bytes,
parquet_reader_options const& options)
: _impl{std::make_unique<detail::hybrid_scan_reader_impl>(footer_bytes, options)}
{
}

hybrid_scan_reader::~hybrid_scan_reader() = default;

[[nodiscard]] byte_range_info hybrid_scan_reader::page_index_byte_range() const
{
return _impl->page_index_byte_range();
}

[[nodiscard]] FileMetaData hybrid_scan_reader::parquet_metadata() const
{
return _impl->parquet_metadata();
}

void hybrid_scan_reader::setup_page_index(cudf::host_span<uint8_t const> page_index_bytes) const
{
return _impl->setup_page_index(page_index_bytes);
}

std::vector<cudf::size_type> hybrid_scan_reader::all_row_groups(
parquet_reader_options const& options) const
{
CUDF_EXPECTS(options.get_row_groups().size() <= 1,
"Encountered invalid size of row group indices in parquet reader options");

// If row groups are specified in parquet reader options, return them as is
if (options.get_row_groups().size() == 1) { return options.get_row_groups().front(); }

return _impl->all_row_groups(options);
}

size_type hybrid_scan_reader::num_rows_in_row_groups(
cudf::host_span<size_type const> row_group_indices) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl->num_rows_in_row_groups(input_row_group_indices);
}

std::vector<cudf::size_type> hybrid_scan_reader::filter_row_groups_with_stats(
cudf::host_span<size_type const> row_group_indices,
parquet_reader_options const& options,
rmm::cuda_stream_view stream) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl->filter_row_groups_with_stats(input_row_group_indices, options, stream).front();
}

std::pair<std::vector<byte_range_info>, std::vector<byte_range_info>>
hybrid_scan_reader::secondary_filters_byte_ranges(
cudf::host_span<size_type const> row_group_indices, parquet_reader_options const& options) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl->secondary_filters_byte_ranges(input_row_group_indices, options);
}

std::vector<cudf::size_type> hybrid_scan_reader::filter_row_groups_with_dictionary_pages(
cudf::host_span<rmm::device_buffer> dictionary_page_data,
cudf::host_span<size_type const> row_group_indices,
parquet_reader_options const& options,
rmm::cuda_stream_view stream) const
{
CUDF_EXPECTS(row_group_indices.size() == dictionary_page_data.size(),
"Mismatch in size of input row group indices and dictionary page device buffers");
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl
->filter_row_groups_with_dictionary_pages(
dictionary_page_data, input_row_group_indices, options, stream)
.front();
}

std::vector<cudf::size_type> hybrid_scan_reader::filter_row_groups_with_bloom_filters(
cudf::host_span<rmm::device_buffer> bloom_filter_data,
cudf::host_span<size_type const> row_group_indices,
parquet_reader_options const& options,
rmm::cuda_stream_view stream) const
{
CUDF_EXPECTS(row_group_indices.size() == bloom_filter_data.size(),
"Mismatch in size of input row group indices and bloom filter device buffers");
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl
->filter_row_groups_with_bloom_filters(
bloom_filter_data, input_row_group_indices, options, stream)
.front();
}

std::pair<std::unique_ptr<cudf::column>, std::vector<thrust::host_vector<bool>>>
hybrid_scan_reader::filter_data_pages_with_stats(cudf::host_span<size_type const> row_group_indices,
parquet_reader_options const& options,
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl->filter_data_pages_with_stats(input_row_group_indices, options, stream, mr);
}

[[nodiscard]] std::vector<byte_range_info> hybrid_scan_reader::filter_column_chunks_byte_ranges(
cudf::host_span<size_type const> row_group_indices, parquet_reader_options const& options) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl->filter_column_chunks_byte_ranges(input_row_group_indices, options).first;
}

table_with_metadata hybrid_scan_reader::materialize_filter_columns(
cudf::host_span<thrust::host_vector<bool> const> data_page_mask,
cudf::host_span<size_type const> row_group_indices,
std::vector<rmm::device_buffer> column_chunk_buffers,
cudf::mutable_column_view row_mask,
parquet_reader_options const& options,
rmm::cuda_stream_view stream) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl->materialize_filter_columns(data_page_mask,
input_row_group_indices,
std::move(column_chunk_buffers),
row_mask,
options,
stream);
}

[[nodiscard]] std::vector<byte_range_info> hybrid_scan_reader::payload_column_chunks_byte_ranges(
cudf::host_span<size_type const> row_group_indices, parquet_reader_options const& options) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};
return _impl->payload_column_chunks_byte_ranges(input_row_group_indices, options).first;
}

table_with_metadata hybrid_scan_reader::materialize_payload_columns(
cudf::host_span<size_type const> row_group_indices,
std::vector<rmm::device_buffer> column_chunk_buffers,
cudf::column_view row_mask,
parquet_reader_options const& options,
rmm::cuda_stream_view stream) const
{
auto const input_row_group_indices =
std::vector<std::vector<size_type>>{{row_group_indices.begin(), row_group_indices.end()}};

return _impl->materialize_payload_columns(
input_row_group_indices, std::move(column_chunk_buffers), row_mask, options, stream);
}

} // namespace cudf::io::parquet::experimental