Skip to content

load_dataset for CSV files not working #743

Open
@iliemihai

Description

@iliemihai

Similar to #622, I've noticed there is a problem when trying to load a CSV file with datasets.

from datasets import load_dataset
dataset = load_dataset("csv", data_files=["./sample_data.csv"], delimiter="\t", column_names=["title", "text"], script_version="master")

Displayed error:
... ArrowInvalid: CSV parse error: Expected 2 columns, got 1

I should mention that when I've tried to read data from https://github.com/lhoestq/transformers/tree/custom-dataset-in-rag-retriever/examples/rag/test_data/my_knowledge_dataset.csv it worked without a problem. I've read that there might be some problems with /r character, so I've removed them from the custom dataset, but the problem still remains.

I've added a colab reproducing the bug, but unfortunately I cannot provide the dataset.
https://colab.research.google.com/drive/1Qzu7sC-frZVeniiWOwzoCe_UHZsrlxu8?usp=sharing

Are there any work around for it ?
Thank you

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions