Closed
Description
Let's say we have this file:
path = "example.txt"
with open(path, "w", newline="") as file:
    file.write("foo\n")
    file.write("bar\r\n")
    file.write("baz\r")
If we open it in text mode (the default), iterating over the lines gives the following output:
with open(path) as file:
    for line in file:
        print(repr(line))
'foo\n'
'bar\n'
'baz\n'
By default, Python recognizes the different line terminators in text mode and maps them to \n. Thus, our current line stripping is sufficient:
data/torchdata/datapipes/iter/util/plain_text_reader.py
Lines 46 to 49 in c06066a
dp = IterableWrapper([path])
dp = FileOpener(dp)
dp = LineReader(dp, decode=True, strip_newline=True, return_path=False)
for item in dp:
    print(repr(item))
'foo'
'bar'
'baz'
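The mapping seen in text mode is controlled by the newline argument of open. As a minimal sketch (the temporary directory and the variable names translated and untranslated are only illustrative), reading the same content with newline="" still splits on all three terminators but leaves them untranslated:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    # newline="" on write keeps the terminators exactly as given.
    with open(path, "w", newline="") as file:
        file.write("foo\nbar\r\nbaz\r")

    # Default (newline=None): every terminator is translated to "\n".
    with open(path) as file:
        translated = list(file)

    # newline="": lines are split on "\n", "\r\n" and "\r", but not translated.
    with open(path, newline="") as file:
        untranslated = list(file)

print(translated)
print(untranslated)
```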
However, if we open it in binary reading mode, this is a different story:
with open(path, "rb") as file:
    for line in file:
        print(repr(line.decode()))
'foo\n'
'bar\r\n'
'baz\r'
Python does not perform the newline mapping in binary mode, and it splits lines only on \n. Thus, in this mode our stripping is not sufficient:
dp = IterableWrapper([path])
dp = FileOpener(dp, mode="rb")
dp = LineReader(dp, decode=True, strip_newline=True, return_path=False)
for item in dp:
    print(repr(item))
'foo'
'bar\r'
'baz\r'
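One possible direction, sketched here under assumptions rather than taken from the torchdata code: strip a single trailing terminator explicitly, checking for \r\n before \n and \r. The helper name strip_line_terminator is hypothetical, and io.BytesIO stands in for a file opened with mode="rb".

```python
import io
from typing import Union


def strip_line_terminator(line: Union[str, bytes]) -> Union[str, bytes]:
    """Remove one trailing line terminator (CRLF, LF, or CR) if present."""
    if isinstance(line, bytes):
        terminators = (b"\r\n", b"\n", b"\r")
    else:
        terminators = ("\r\n", "\n", "\r")
    # CRLF is checked first so that both of its characters are removed together.
    for terminator in terminators:
        if line.endswith(terminator):
            return line[: -len(terminator)]
    return line


# Stand-in for a binary file handle with the same content as example.txt.
stream = io.BytesIO(b"foo\nbar\r\nbaz\r")
stripped = [strip_line_terminator(line).decode() for line in stream]
print(stripped)
```

Removing exactly one terminator, instead of calling rstrip("\r\n"), leaves any other trailing \r or \n characters in the line untouched. Note that a binary stream is still split only on \n, so a file that uses bare \r as its sole terminator would arrive as a single line.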