Encoding error : UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte #976

Closed
@ghsama

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Modin installed from: pip install modin[ray]
  • Modin version: 0.6.3
  • Python version: 3.7.3

Describe the problem

Hello,
I'm trying to use Modin to reduce peak memory usage caused by the volume of the data, so I replaced pandas with modin.pandas. I tried a simple read of a file encoded in 'latin-1' (French). With pandas everything works, but with Modin I get the following encoding error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte
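
For readers unfamiliar with this message, here is a minimal sketch that reproduces the same error text with the standard library alone, without pandas or Modin. The sample word `création` and the helper `try_decode` are illustrative assumptions, not taken from the actual file. Byte 0xe9 is "é" in Latin-1, but in UTF-8 it announces a multi-byte sequence, so decoding fails when the next byte is a plain ASCII letter.

```python
# Bytes as they would appear in a Latin-1 (ISO-8859-1) encoded file.
# "création" is a made-up sample; "é" becomes the single byte 0xe9 at index 2.
raw = "création".encode("latin-1")  # b'cr\xe9ation'


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc}"


# Decoding with the encoding the bytes were written in succeeds.
print(try_decode(raw, "latin-1"))

# Decoding the same bytes as UTF-8 fails at the 0xe9 byte.
print(try_decode(raw, "utf-8"))
```

This suggests that somewhere in the read path the data is being decoded as UTF-8 instead of the requested encoding.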

The script used (which works fine with pandas but not with Modin):
caract = pd.read_csv(path, sep="\t", encoding = "ISO-8859-1")

PS: I tried other encodings with the same result (works with pandas, fails with Modin backed by Ray): ISO-8859-1, ISO-8859-9, latin-1.
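
One possible workaround, sketched under assumptions and not confirmed anywhere in this issue, is to convert the file to UTF-8 before reading it, so that no `encoding` argument is needed at all. The helper name `transcode_to_utf8`, its parameters, and the sample file contents below are hypothetical. The file is streamed in chunks so the conversion itself should not add to the memory peak.

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding="ISO-8859-1",
                      chunk_chars=1 << 20):
    """Copy src_path to dst_path, re-encoding the text as UTF-8.

    Hypothetical helper: reads the source in chunks of `chunk_chars`
    characters so large files are never fully loaded into memory.
    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
    return dst_path


# Small demonstration with a made-up tab-separated file.
with tempfile.TemporaryDirectory() as tmp:
    latin1_file = os.path.join(tmp, "caract_latin1.tsv")
    utf8_file = os.path.join(tmp, "caract_utf8.tsv")

    with open(latin1_file, "wb") as f:
        f.write("id\tnom\n1\tcréation\n".encode("latin-1"))

    transcode_to_utf8(latin1_file, utf8_file)

    # The converted file now decodes cleanly as UTF-8.
    with open(utf8_file, "r", encoding="utf-8", newline="") as f:
        print(f.read())
```

The converted file could then be passed to `pd.read_csv(utf8_file, sep="\t")` without an `encoding` argument. Whether this avoids the error on Modin 0.6.3 has not been verified here.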

Is there any solution?

Thanks

Source code / logs

`RayTaskError: ray_worker (pid=10815, host=ubuntu)
File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1520, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte

During handling of the above exception, another exception occurred:

ray_worker (pid=10815, host=ubuntu)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/modin/engines/ray/task_wrapper.py", line 8, in deploy_ray_func
return func(**args)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/modin/backends/pandas/parsers.py", line 69, in parse
pandas_df = pandas.read_csv(BytesIO(to_read), **kwargs)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 463, in _read
data = parser.read(nrows)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 1154, in read
ret = self._engine.read(nrows)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 2059, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 973, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1105, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1158, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1520, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte`

Labels

bug 🦗Something isn't working
