Open
Labels
QoL (Quality of Life: improve the developer experience), enhancement (New feature or request)
Description
Background
With fsspec working, we can recognize URIs and automatically load data from them. We can combine this with a few additional input types, such as pandas frames.
- accept strings as dlt data if they are URIs pointing to resources
- recognize fsspec URIs
- load the following data formats (detected by file extension) from those URIs: json, jsonl, csv ...
- accept gzipped files
- accept pandas frames
- stream large json files: recognize files containing lists of objects and a few other streamable cases (we have concept code)
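A minimal sketch of how the format and gzip detection above could work. It is not dlt's implementation: `detect_format`, `iter_records` and the `opener` parameter are hypothetical names introduced for illustration. It uses only the standard library, so the default `opener` handles local paths only; the assumption is that a callable with the same calling convention, such as `fsspec.open`, would be passed in to resolve remote URIs.

```python
import csv
import gzip
import io
import json
from pathlib import PurePosixPath
from typing import Any, Callable, Iterator, Tuple
from urllib.parse import urlparse

# Formats named in the list above; the set itself is an assumption of this sketch.
SUPPORTED_FORMATS = {".json", ".jsonl", ".csv"}


def detect_format(uri: str) -> Tuple[str, bool]:
    """Return (format extension, is_gzipped) inferred from the URI path."""
    suffixes = PurePosixPath(urlparse(uri).path).suffixes
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported data format in {uri!r}")
    return suffixes[-1], gzipped


def iter_records(uri: str, opener: Callable[..., Any] = open) -> Iterator[Any]:
    """Yield records from a json, jsonl or csv file, optionally gzipped.

    `opener(uri, "rb")` must return a context manager yielding a binary file.
    """
    fmt, gzipped = detect_format(uri)
    with opener(uri, "rb") as raw:
        stream = gzip.open(raw, "rb") if gzipped else raw
        # newline="" leaves line endings untouched, as the csv module expects
        text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
        if fmt == ".jsonl":
            for line in text:
                if line.strip():  # skip blank lines
                    yield json.loads(line)
        elif fmt == ".csv":
            yield from csv.DictReader(text)
        else:
            # plain json is loaded whole: a list yields its items, anything else is one record
            doc = json.load(text)
            yield from doc if isinstance(doc, list) else [doc]
```

Dispatching on the extension keeps the check cheap and avoids opening the resource just to decide whether a string is data or a URI, at the cost of failing on files without a recognizable suffix.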
Implementation Outline
Extend our internal sources by merging the following into the main library:
- pandas source (enumerate pandas frames)
- json and jsonl sources
- jsonl streaming source with format autodetection
- we can use pandas for csv, xml, xls etc.
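The streaming source with format autodetection might look roughly like the sketch below. It is not the concept code mentioned in this issue: `stream_json_objects` is a hypothetical name, and the autodetection rule (a leading `[` means a JSON array, anything else is treated as JSONL or concatenated JSON values) is an assumption. It relies only on `json.JSONDecoder.raw_decode` from the standard library and reads the input in chunks, so a large list of objects never has to be loaded whole.

```python
import json
from typing import Any, Iterator, Optional, TextIO


def stream_json_objects(text: TextIO, chunk_size: int = 65536) -> Iterator[Any]:
    """Yield top-level values from a JSON array or from JSONL, reading in chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    in_array: Optional[bool] = None  # unknown until the first non-blank character
    eof = False
    while True:
        buf = buf.lstrip()
        if buf and in_array is None:
            # autodetect: "[" opens an array of values, anything else is JSONL
            in_array = buf[0] == "["
            if in_array:
                buf = buf[1:]
                continue
        if buf and in_array and buf[0] in ",]":
            if buf[0] == "]":
                return  # end of the array
            buf = buf[1:]  # skip the separator between array items
            continue
        if buf:
            try:
                value, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                if eof:
                    raise  # incomplete or invalid data at the end of the input
            else:
                # a value touching the end of the buffer may be truncated
                # (e.g. a number), so wait for more data unless we hit EOF
                if end < len(buf) or eof:
                    yield value
                    buf = buf[end:]
                    continue
        if eof:
            return
        chunk = text.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True
```

The decoder is retried from the start of the pending value after every chunk, which is simple but quadratic for a single very large object; an incremental parser would avoid that if it turns out to matter.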
Future Work
At some point we want to change how the normalizer works so that it can handle (serialized) pandas frames (e.g. feather), parquet files etc. directly, so that we are not forced to convert all of them into Python objects and back.
Status
Todo