New line stripping is broken in binary reading mode

Let's say we have this file:

```python
path = "example.txt"

with open(path, "w", newline="") as file:
    file.write("foo\n")
    file.write("bar\r\n")
    file.write("baz\r")
```

If we open it in text reading mode (the default), iterating over the file yields the following lines:

```python
with open(path) as file:
    for line in file:
        print(repr(line))
```

```
'foo\n'
'bar\n'
'baz\n'
```

By default, Python recognizes the different line terminators and maps them to `\n`. Thus, our current line stripping is sufficient:

https://github.com/pytorch/data/blob/c06066ae360fc6054fb826ae041b1cb0c09b2f3b/torchdata/datapipes/iter/util/plain_text_reader.py#L46-L49

```python
dp = IterableWrapper([path])
dp = FileOpener(dp)
dp = LineReader(dp, decode=True, strip_newline=True, return_path=False)

for item in dp:
    print(repr(item))
```

```
'foo'
'bar'
'baz'
```
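For reference, the newline mapping in text mode is controlled by the `newline` argument of the built-in `open`. The sketch below is not part of torchdata; it only illustrates the standard-library behavior. The temporary directory is an assumption made so the snippet is self-contained. It reads the same content once with the default `newline=None` and once with `newline=""`, which still splits on all three terminators but returns them untranslated.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical scratch file, mirroring the example file above.
    path = os.path.join(tmp_dir, "example.txt")
    with open(path, "w", newline="") as file:
        file.write("foo\nbar\r\nbaz\r")

    # Default (newline=None): every terminator is mapped to "\n".
    with open(path) as file:
        translated = list(file)

    # newline="": lines are still split on "\n", "\r\n" and "\r",
    # but the terminators are returned as they appear in the file.
    with open(path, newline="") as file:
        untranslated = list(file)

print(translated)    # ['foo\n', 'bar\n', 'baz\n']
print(untranslated)  # ['foo\n', 'bar\r\n', 'baz\r']
```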

However, if we open it in binary reading mode, this is a different story:

```python
with open(path, "rb") as file:
    for line in file:
        print(repr(line.decode()))
```

```
'foo\n'
'bar\r\n'
'baz\r'
```

Python does not perform the newline mapping in binary mode. Thus, stripping only `\n` is not sufficient here:

```python
dp = IterableWrapper([path])
dp = FileOpener(dp, mode="rb")
dp = LineReader(dp, decode=True, strip_newline=True, return_path=False)

for item in dp:
    print(repr(item))
```

```
'foo'
'bar\r'
'baz\r'
```

For reference, the stripping logic linked above only removes `\n`:

    if isinstance(line, str):
        yield line.strip("\n")
    else:
        yield line.strip(b"\n")
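One possible fix, sketched here under the assumption that only trailing terminators should be removed, is to strip both `\r` and `\n` from the right end of each line. The helper name `strip_newline` is hypothetical and not an existing torchdata function.

```python
from typing import Union


def strip_newline(line: Union[str, bytes]) -> Union[str, bytes]:
    """Remove trailing "\\n", "\\r\\n" or "\\r" from a str or bytes line."""
    # rstrip treats its argument as a set of characters, so any trailing
    # run of "\r" and "\n" is removed, whichever terminator was used.
    if isinstance(line, bytes):
        return line.rstrip(b"\r\n")
    return line.rstrip("\r\n")


# The lines produced by iterating over the example file in binary mode.
lines = [b"foo\n", b"bar\r\n", b"baz\r"]
stripped = [strip_newline(line).decode() for line in lines]
print(stripped)  # ['foo', 'bar', 'baz']
```

Note that this only addresses stripping. Iterating over a file opened in binary mode splits on `\n` alone, so a bare `\r` in the middle of a file would still not start a new line.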
