Summary:
We propose adding CSV file support as a metrics file format. This would let users rely on the flexibility and familiarity of CSV, and on the DataFrame libraries widely used to calculate metrics.
Current Limitations:
Presently, DVC supports JSON, TOML 1.0, and YAML 1.2 metrics files. The absence of CSV support restricts compatibility and integration with common data workflows, because tabular data must first be converted into nested JSON, which is tedious.
Proposed Solution:
- CSV File Processing: Enable DVC to read and convert CSV data into an internal format.
- Grouping Keys: Support nested keys to configure metric grouping via the CLI and dvc.yaml:metrics
- Compatibility and Variation Handling: Ensure the tool can handle different CSV structures, including varying delimiters, missing values, and missing headers.
- Error Handling: Provide clear messages for errors related to CSV formatting.
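A minimal sketch of how the variation and error handling points above might look, using only Python's standard csv module. The names read_metrics_csv and MetricsCSVError, and the delimiter and fieldnames parameters, are hypothetical and chosen for illustration; none of them are existing DVC APIs.

```python
import csv
import io


class MetricsCSVError(ValueError):
    """Hypothetical error type for malformed metrics CSV input."""


def read_metrics_csv(text, delimiter=None, fieldnames=None):
    """Parse CSV text into a list of row dicts (illustrative sketch only).

    delimiter:  explicit delimiter; guessed from the data when None.
    fieldnames: column names to use when the file has no header row.
    """
    if not text.strip():
        raise MetricsCSVError("CSV input is empty")

    if delimiter is None:
        try:
            # Guess the delimiter from a sample, limited to common candidates.
            sample = text[:1024]
            delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
        except csv.Error:
            delimiter = ","  # fall back to plain comma-separated values

    reader = csv.DictReader(
        io.StringIO(text),
        fieldnames=fieldnames,
        delimiter=delimiter,
        skipinitialspace=True,  # tolerate "a, b, c" style spacing
    )

    rows = []
    for row in reader:
        # DictReader stores surplus values under the key None.
        if None in row:
            raise MetricsCSVError(
                f"line {reader.line_num}: more values than columns"
            )
        # Normalize empty or absent cells to None so they read as missing.
        rows.append(
            {name: ((value or "").strip() or None) for name, value in row.items()}
        )
    return rows


# Semicolon-delimited input with one missing value.
rows = read_metrics_csv("Vehicle;Part;Accuracy\nCar;Wheel;0.3\nTruck;Glass;\n")
```

Delimiter guessing is a heuristic and can fail on small or irregular files, which is why the sketch also accepts an explicit delimiter.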
Benefits:
- User-Friendly: CSV is a familiar format for many, making the tool more accessible.
- Flexibility: CSV support allows for a broader range of data import and export options, accommodating diverse user workflows.
Use Case Example:
A data scientist needs to log metrics for a computer vision (CV) model (e.g. vehicle inspection), and the metrics are stored in a CSV file.
- CSV Example: Multi-indexed CSV with vehicle types and parts.
Vehicle, Part, Accuracy, Count
Car, Wheel, 0.3, 150
Car, Bumper, 0.5, 200
Truck, Glass, 0.5, 250
Truck, Wheel, 0.1, 110
- JSON Structure: The data is structured in JSON to reflect the vehicle-part relationship and metrics.
{
  "Car": {
    "Wheel": {"Accuracy": 0.3, "Count": 150},
    "Bumper": {"Accuracy": 0.5, "Count": 200}
    ...
  },
  "Truck": {
    "Glass": {"Accuracy": 0.5, "Count": 250},
    "Wheel": {"Accuracy": 0.1, "Count": 110}
    ...
  }
}
- dvc.yaml: Metrics configuration
metrics:
  - metrics.csv:
      keys: ["Vehicle", "Part"]
      metrics: ["Accuracy", "Count"]
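To make the intended mapping concrete, here is a rough sketch of how the keys and metrics settings from the use case could turn the example CSV into the nested structure. It uses only Python's standard library. The function names csv_to_nested and parse_number are hypothetical, and this is one possible reading of the proposal rather than DVC's implementation.

```python
import csv
import io

CSV_TEXT = """\
Vehicle, Part, Accuracy, Count
Car, Wheel, 0.3, 150
Car, Bumper, 0.5, 200
Truck, Glass, 0.5, 250
Truck, Wheel, 0.1, 110
"""


def parse_number(value):
    """Convert a CSV cell to int when possible, otherwise to float."""
    try:
        return int(value)
    except ValueError:
        return float(value)


def csv_to_nested(text, keys, metrics):
    """Group CSV rows into nested dicts, one nesting level per key column."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    nested = {}
    for row in reader:
        node = nested
        # Walk (and create) one dict level for each grouping key.
        for key in keys:
            node = node.setdefault(row[key], {})
        # Store the selected metric columns at the innermost level.
        for name in metrics:
            node[name] = parse_number(row[name])
    return nested


nested = csv_to_nested(
    CSV_TEXT, keys=["Vehicle", "Part"], metrics=["Accuracy", "Count"]
)
# nested["Car"]["Wheel"] is {"Accuracy": 0.3, "Count": 150}
```

In this sketch the order of keys determines the nesting order, so swapping "Vehicle" and "Part" would group by part first.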