Switch to pyarrow engine when reading CSV files

As discussed in https://datapythonista.me/blog/pandas-20-and-the-arrow-revolution-part-i, `pyarrow` can also be used to read CSV files and to handle data types inside `pandas`.

I ran a small benchmark on a 304 MB CSV file containing 1,461,090 rows:

* reading with the default `pandas` engine: 3.99 s
* reading with `engine='pyarrow'` and `dtype_backend='pyarrow'`: 0.64 s

That is roughly a 6× speed-up.

The most obvious difference is that the resulting columns get `pyarrow`-backed data types such as `int64[pyarrow]` and `string[pyarrow]` instead of the usual NumPy ones. There might be other differences as well: the article states that not all operations are supported by the `pyarrow` data types yet, but maybe we are lucky and the operations we need in `audformat` are already covered.

It might also align well with https://github.com/audeering/audformat/issues/321 and https://github.com/audeering/audformat/issues/376.

<details><summary>Benchmark code</summary>

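```python
import time

import pandas as pd

path = 'db.csv'

# Default C parser with NumPy-backed data types
start = time.perf_counter()
df = pd.read_csv(path)
end = time.perf_counter()
print(f'pandas: {end - start:.2f} s')

# pyarrow parser with pyarrow-backed data types (requires pandas >= 2.0)
start = time.perf_counter()
df = pd.read_csv(path, engine='pyarrow', dtype_backend='pyarrow')
end = time.perf_counter()
print(f'pyarrow: {end - start:.2f} s')
```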

</details>

Switch to pyarrow engine when reading CSV files #382

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Switch to pyarrow engine when reading CSV files #382

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions