Skip to content

Conversation

@7amza79
Copy link
Contributor

@7amza79 7amza79 commented Jun 18, 2025

Add CRLF Support for CSV Exports

Description

This PR adds support for Windows-style line endings (CRLF) in CSV exports, while maintaining the default UNIX-style line endings (LF). This enhancement improves compatibility with Windows-based systems and tools that expect CRLF line endings.

Changes

  • Added line_ending configuration option to CSV writers with two possible values:
    • lf (default): Uses UNIX-style line endings
    • crlf: Uses Windows-style line endings
  • Updated both CsvWriter and ArrowToCsvWriter to support line ending configuration
  • Added comprehensive test coverage for both line ending types
  • Updated documentation to reflect the new configuration option

Technical Details

  • Added CsvLineEnding type to configuration specs
  • Implemented custom CSV dialect for Python's csv writer to handle CRLF
  • Added version-aware implementation for PyArrow CSV writer (supports line endings from version 14.0)
  • Added configuration options in both TOML and environment variables

Testing

  • Added unit tests for both CsvWriter and ArrowToCsvWriter
  • Verified correct line ending generation for both LF and CRLF
  • Ensured no mixed line endings in output files
  • Added tests for both direct data writing and Arrow table conversion

Documentation

  • Updated CSV format documentation to include the new line_ending option
  • Added examples for both TOML and environment variable configurations

Configuration Example

[NORMALIZE]
data_writer__line_ending="crlf"

Or via environment variable:

NORMALIZE__DATA_WRITER__LINE_ENDING=crlf

Checklist

  • Added new configuration option
  • Implemented support in both CSV writers
  • Added comprehensive tests
  • Updated documentation
  • Verified backward compatibility
  • Added examples for configuration

@netlify
Copy link

netlify bot commented Jun 18, 2025

Deploy Preview for dlt-hub-docs ready!

Name Link
🔨 Latest commit 3409386
🔍 Latest deploy log https://app.netlify.com/projects/dlt-hub-docs/deploys/687a24db5b204b0008ec8dc6
😎 Deploy Preview https://deploy-preview-2783--dlt-hub-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@7amza79 7amza79 changed the title feat() add crlf support for csv exports feat - add crlf support for csv exports Jun 23, 2025
@7amza79 7amza79 force-pushed the feat-add-crlf-support-for-csv-exports branch from 2f817b5 to 462cee7 Compare June 23, 2025 15:47
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks pretty good! pls see my comments. thx for the tests!

file_max_bytes: int = None,
disable_compression: bool = False,
_caps: DestinationCapabilitiesContext = None,
**writer_kwargs,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see this used. do you need it for testing? We pass csv options via config injection (you do it correctly below)

@rudolfix rudolfix self-assigned this Jun 23, 2025
@rudolfix rudolfix added the ci from fork Use to trigger CI on a PR (even from a fork) label Jun 23, 2025
@rudolfix
Copy link
Collaborator

@7amza79 please also fix linter: make format and make lint

@burnash burnash assigned burnash and unassigned rudolfix Jun 25, 2025
@burnash burnash self-requested a review June 25, 2025 12:24
@burnash
Copy link
Collaborator

burnash commented Jul 15, 2025

Hey @7amza79, would you like to continue working on this PR or you'd prefer that we take it from here? Thanks again for your effort!

@7amza79
Copy link
Contributor Author

7amza79 commented Jul 15, 2025

@burnash I 've noticed that pyArrow didn't support CRLF writing neither so I've "abandonned" working on this implementation and used pandas instead. So I think you can take it on from here if this "feature" is interesting for the product (which I think it is :D )

@burnash burnash requested a review from rudolfix July 17, 2025 22:23
@burnash burnash added documentation Improvements or additions to documentation enhancement New feature or request labels Jul 17, 2025
Copy link
Collaborator

@rudolfix rudolfix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but maybe we could make lineterminator more flexible? see my comments. if you think it is not worth the effort you can also merge

@burnash burnash requested a review from rudolfix July 18, 2025 10:44
@burnash burnash merged commit 2d0a817 into dlt-hub:devel Jul 18, 2025
46 checks passed
zilto pushed a commit that referenced this pull request Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci from fork Use to trigger CI on a PR (even from a fork) documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants