This repository was archived by the owner on Sep 23, 2024. It is now read-only.

Fix csv quoting mechanism to preserve line feeds #126

Open
wants to merge 3 commits into
base: master
from

msg555
@msg555 msg555 commented Sep 18, 2023

Problem

Currently we use json.dumps to quote and escape each column of our CSV rows. However, json.dumps escapes more than the double-quote and backslash characters, and postgres does not unescape those additional sequences when it processes the CSV data. In particular, the current method mistakenly causes the backspace (\b), form feed (\f), line feed (\n), carriage return (\r), and tab (\t) characters to remain escaped in the extracted payload.
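A minimal sketch of the behavior described above, using only the standard library. The sample string is made up for illustration; it is not taken from the PR's tests.

```python
import json

# Hypothetical sample value containing a real line feed, a real tab, and quotes.
value = 'line one\nline two\twith "quotes"'

# json.dumps wraps the value in double quotes and escapes the quotes,
# but it also rewrites the line feed and tab as two-character sequences.
quoted = json.dumps(value)

# The real control characters are gone from the quoted form...
has_real_newline = "\n" in quoted
# ...and literal backslash sequences have taken their place.
has_literal_backslash_n = "\\n" in quoted
has_literal_backslash_t = "\\t" in quoted
```

A JSON parser reverses these sequences, but a CSV reader has no reason to, so the literal `\n` and `\t` text ends up stored in the column.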

Proposed changes

Use a simpler escaping mechanism that escapes only the double-quote and backslash characters. This appears to be what postgres expects; see https://www.postgresql.org/docs/8.0/sql-copy.html.
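The escaping rule described here could look roughly like the sketch below. The function name `quote_csv_value` is hypothetical and not taken from the PR's diff, and the sketch assumes the COPY command is configured with backslash as its ESCAPE character (in CSV mode the ESCAPE character defaults to the QUOTE character unless set otherwise).

```python
def quote_csv_value(value: str) -> str:
    """Quote a string for CSV, escaping only backslashes and double quotes.

    Assumption: the consumer treats backslash as the CSV escape character.
    """
    # Escape backslashes first so the backslashes added for quotes
    # in the next step are not doubled again.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Line feeds and tabs pass through unchanged inside the quoted field.
multiline = quote_csv_value("first\nsecond")
with_quotes = quote_csv_value('say "hi"')
with_backslash = quote_csv_value("C:\\temp")
```

Because the field is always quoted, embedded delimiters and line breaks need no further treatment.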

We also take care to encode the empty string as a quoted empty string and null as an unquoted empty string. According to the same sql-copy doc, this appears to be how postgres distinguishes between the two cases by default.
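As an illustration of the null versus empty-string distinction, the sketch below builds one CSV row. The names `csv_field` and `csv_row` are hypothetical, the comma delimiter is an assumption, and the escaping again assumes backslash is the configured ESCAPE character.

```python
from typing import Optional, Sequence


def csv_field(value: Optional[str]) -> str:
    """Render one column: None becomes an unquoted empty field, strings are quoted."""
    if value is None:
        # Unquoted empty field: read as NULL under the default CSV NULL setting.
        return ""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # A quoted field, even when empty, is read as a string value.
    return f'"{escaped}"'


def csv_row(values: Sequence[Optional[str]]) -> str:
    """Join rendered columns with commas (assumed delimiter)."""
    return ",".join(csv_field(v) for v in values)


row = csv_row(["a", "", None])
```

Here `row` holds three fields: a quoted `a`, a quoted empty string, and a bare empty field that stands for null.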

Types of changes

What types of changes does your code introduce to PipelineWise?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

  • Description above provides context of the change
  • I have added tests that prove my fix is effective or that my feature works
  • Unit tests for changes (not needed for documentation changes)
  • CI checks pass with my changes
  • Bumping version in setup.py is an individual PR and not mixed with feature or bugfix PRs
  • Commit message/PR title starts with [AP-NNNN] (if applicable. AP-NNNN = JIRA ID)
  • Branch name starts with AP-NNN (if applicable. AP-NNN = JIRA ID)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions
