Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
- Modin installed from: pip install modin[ray]
- Modin version: 0.6.3
- Python version: 3.7.3
Describe the problem
Hello,
I'm trying to use Modin to reduce peak memory usage caused by the volume of my data, so I replaced pandas with modin.pandas. I am doing a simple read of a file that is encoded in 'latin-1' (French text). With pandas everything works, but with Modin I get the following encoding error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte
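To illustrate what this message means, here is a minimal standard-library sketch that does not involve Modin or pandas. The sample word "prénom" is a made-up value, not taken from the actual file; it is chosen only because its "é" sits at byte position 2, where latin-1 stores it as the single byte 0xe9.

```python
# Hypothetical sample text; the real file contents are not known here.
raw = "prénom".encode("latin-1")  # b'pr\xe9nom'

# In latin-1, "é" is the single byte 0xE9.
assert raw[2] == 0xE9

# In UTF-8, 0xE9 announces a 3-byte sequence, so the next byte must be a
# continuation byte (0x80-0xBF). Here the next byte is "n" (0x6E), so
# decoding as UTF-8 fails.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("latin-1")

print(error)
print(decoded)
```

This suggests the bytes are being decoded as UTF-8 at the point of failure, even though a latin-1 family encoding was requested.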
The script used (it works fine with pandas but not with Modin):
caract = pd.read_csv(path, sep="\t", encoding = "ISO-8859-1")
PS: I tried other encodings and got the same result (works with pandas, fails with Modin backed by Ray): ISO-8859-1, ISO-8859-9, latin-1.
Is there any solution?
Thanks
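One possible workaround, sketched here under the unconfirmed assumption that the encoding argument is not being applied when the Ray workers parse their chunks, is to transcode the file to UTF-8 once and then read the UTF-8 copy without an encoding argument. The helper name `transcode_to_utf8`, the chunk size, and the sample file names are hypothetical; only the standard library is used, and the copy is streamed in chunks so the whole file is never held in memory.

```python
import os
import tempfile


def transcode_to_utf8(src_path, dst_path, src_encoding="ISO-8859-1",
                      chunk_chars=1 << 20):
    """Stream-copy a text file, re-encoding it as UTF-8 chunk by chunk."""
    # newline="" keeps the original line endings untouched on both sides.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
    return dst_path


# Small self-contained demonstration with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "caract_latin1.tsv")
    dst = os.path.join(tmp, "caract_utf8.tsv")
    with open(src, "wb") as f:
        f.write("id\tprénom\n1\tRené\n".encode("latin-1"))

    transcode_to_utf8(src, dst)

    # The copy now decodes cleanly as UTF-8.
    with open(dst, "rb") as f:
        utf8_text = f.read().decode("utf-8")
```

After transcoding, the read would become `pd.read_csv(utf8_path, sep="\t")`, where `utf8_path` is the transcoded copy. Whether this avoids the error on Modin 0.6.3 has not been verified here.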
Source code / logs
`RayTaskError: ray_worker (pid=10815, host=ubuntu)
File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1520, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte
During handling of the above exception, another exception occurred:
ray_worker (pid=10815, host=ubuntu)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/modin/engines/ray/task_wrapper.py", line 8, in deploy_ray_func
return func(**args)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/modin/backends/pandas/parsers.py", line 69, in parse
pandas_df = pandas.read_csv(BytesIO(to_read), **kwargs)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 463, in _read
data = parser.read(nrows)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 1154, in read
ret = self._engine.read(nrows)
File "/home/lasngd/.conda/envs/pytorch/lib/python3.7/site-packages/pandas/io/parsers.py", line 2059, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 973, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1105, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1158, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1520, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte`