Skip to content

Update String Transform Examples #19407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: branch-25.08
Choose a base branch
from

Conversation

lamarrr
Copy link
Contributor

@lamarrr lamarrr commented Jul 17, 2025

Description

This pull request makes changes needed for the string transforms examples. It:

  • Fixes an overflow in calculating the scratch space
  • Updates the CSV row per chunk to speed up csv write
  • Adds missing NVTX range to cudf::io::write_csv
  • Renames the source files for the examples

Closes #19041

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@lamarrr lamarrr requested a review from a team as a code owner July 17, 2025 13:36
@lamarrr lamarrr requested review from devavret and mhaseeb123 July 17, 2025 13:36
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jul 17, 2025
@lamarrr lamarrr added the bug Something isn't working label Jul 17, 2025
@lamarrr lamarrr changed the title [FEA] Update String Transform Examples [BUG+FEA] Update String Transform Examples Jul 17, 2025
@lamarrr lamarrr changed the title [BUG+FEA] Update String Transform Examples Update String Transform Examples Jul 17, 2025
@lamarrr lamarrr added the non-breaking Non-breaking change label Jul 17, 2025
@lamarrr lamarrr requested review from a team as code owners July 17, 2025 14:31
@lamarrr lamarrr requested a review from AyodeAwe July 17, 2025 14:31
@github-actions github-actions bot added the CMake CMake build issue label Jul 17, 2025
@davidwendt davidwendt self-requested a review July 17, 2025 14:33
@@ -411,6 +412,8 @@ void write_csv(data_sink* out_sink,
// write header: column names separated by delimiter:
// (even for tables with no rows)
//
cudf::scoped_range range("io::csv::write_csv");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mhaseeb123 Is this where you would put the scoped range?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok to me as this is the main CSV write function that gets called from cudf::io::write_csv and I don't see any CUDF_FUNC_RANGE() in any of the functions called from in here so this one should capture the complete write_csv time. In future, we might want to put scoped ranges in maybe (write_chunked_begin and write_chunked for better granularity). Wdyt @vuule ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically, scoped range is used to make a range within a code block. That means, we can have multiple scoped ranges in a single function. On the other hand, CUDF_FUNC_RANGE is used to generate a (scoped)range for the entire function, at the top function level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this scoped range is at the top function level so it should be equivalent to CUDF_FUNC_RANGE.

@lamarrr lamarrr requested a review from ttnghia July 19, 2025 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CMake CMake build issue libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add nvtx region for write_csv
6 participants