Skip to content

Support CSV as Metrics Data Format #10184

Open
@mnrozhkov

Description

@mnrozhkov

Summary:

We propose adding CSV file support as a metics file format. This feature will allow users to leverage the flexibility and familiarity of CSV, and DataFrame libraries widely used to calculate metrics.

Current Limitations:

Presently, DVC supports the formats JSON, TOML 1.0, or YAML 1.2 files. However, the absence of CSV support restricts its compatibility and integration with common data workflows. It's tedious to convert tabular data into nested JSON format.

Proposed Solution:

  • CSV File Processing: Enable DVC to read and convert CSV data into an internal format.
  • Grouping keys is the correct spelling.: Support nested keys to configure groupping metrics via CLI and dvc.yaml:metrics
  • Compatibility and Variation Handling: Ensure the tool can handle different CSV structures, including varying delimiters, missing values, and missing headers.
  • Error Handling: Provide clear messages for errors related to CSV formatting.

Benefits:

  • User-Friendly: CSV is a familiar format for many, making the tool more accessible.
  • Flexibility: CSV support allows for a broader range of data import and export options, accommodating diverse user workflows.

Use Case Example:

A data scientist needs to log metrics for a CV model (e.g. vehicle inspection) stored in CSV file.

  • CSV Example: Multi-indexed CSV with vehicle types and parts.
    Vehicle, Part, Accuracy, Count
    Car, Wheel, 0.3, 150
    Car, Bumper, 0.5, 200
    Truck, Glass, 0.5, 250
    Truck, Wheel, 0.1, 110
    
  • JSON Structure: The data is structured in JSON to reflect the vehicle-part relationship and metrics.
{
    "Car": {
        "Wheel": {"Accuracy": 0.3, "Count": 150},
        "Bumper": {"Accuracy": 0.5, "Count": 200}
        ...
    },
    "Truck": {
        "Glass": {"Accuracy": 0.5, "Count": 250},
        "Wheel": {"Accuracy": 0.1, "Count": 110}
        ...
    }
}
  • *dvc.yaml: Metrics configuration
metrics:
  - metrics.csv:
      keys: ["Vehicle", "Part]
      metrics: ["Accuracy", "Count"]

Metadata

Metadata

Assignees

No one assigned

    Labels

    A: apiRelated to the dvc.api

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions