New line stripping is broken in binary reading mode #173

Closed
@pmeier

Let's say we have this file:

path = "example.txt"

with open(path, "w", newline="") as file:
    file.write("foo\n")
    file.write("bar\r\n")
    file.write("baz\r")

If we open in text reading mode (default), we get the following output for the lines:

with open(path) as file:
    for line in file:
        print(repr(line))
'foo\n'
'bar\n'
'baz\n'

By default, Python recognizes the different line terminators and maps them to \n. Thus, our current line stripping is sufficient:

if isinstance(line, str):
    yield line.strip("\n")
else:
    yield line.strip(b"\n")

dp = IterableWrapper([path])
dp = FileOpener(dp)
dp = LineReader(dp, decode=True, strip_newline=True, return_path=False)

for item in dp:
    print(repr(item))
'foo'
'bar'
'baz'
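
The mapping described above comes from Python's text layer, not from the file contents. A minimal sketch that makes this visible without touching the disk, assuming the same bytes as example.txt held in an in-memory buffer (the names `raw`, `translated`, and `untranslated` are illustrative only):

```python
import io

# Same bytes that the example file contains.
raw = b"foo\nbar\r\nbaz\r"

# newline=None is the default of open() in text mode:
# "\n", "\r\n" and "\r" all end a line and are mapped to "\n".
translated = list(
    io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None)
)

# newline="" still recognizes all three terminators as line ends,
# but hands them to the caller untranslated.
untranslated = list(
    io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
)

print(translated)
print(untranslated)
```

Only the first variant produces lines for which stripping "\n" alone is enough.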

However, if we open it in binary reading mode, this is a different story:

with open(path, "rb") as file:
    for line in file:
        print(repr(line.decode()))
'foo\n'
'bar\r\n'
'baz\r'

Python does not perform the newline mapping here: binary mode splits lines only on b"\n" and returns the bytes unchanged. Thus, in this mode our stripping is not sufficient:

dp = IterableWrapper([path])
dp = FileOpener(dp, mode="rb")
dp = LineReader(dp, decode=True, strip_newline=True, return_path=False)

for item in dp:
    print(repr(item))
'foo'
'bar\r'
'baz\r'
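
One possible direction, sketched here as an assumption rather than as the fix that was actually adopted, is to strip both "\r" and "\n" from the end of each line. The helper name `strip_newline` is hypothetical and not part of the library:

```python
def strip_newline(line):
    """Hypothetical helper: drop trailing "\\r" and "\\n" from str or bytes."""
    if isinstance(line, str):
        return line.rstrip("\r\n")
    return line.rstrip(b"\r\n")


# The lines as they come out of the file in binary reading mode.
binary_lines = [b"foo\n", b"bar\r\n", b"baz\r"]
stripped = [strip_newline(line).decode() for line in binary_lines]
print(stripped)
```

For the example file this yields 'foo', 'bar', and 'baz' in both modes. It only addresses stripping: since binary mode splits on b"\n" alone, a lone "\r" in the middle of a file would still not start a new line.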
