Use table.iter API with readable stream for loading large data files #50


Closed
RandomFractals opened this issue Dec 29, 2021 · 2 comments
Labels: data (Data task), enhancement (New feature or request)

Comments

@RandomFractals (Owner) commented Dec 29, 2021

See changes in #49 and the Table Schema docs: https://github.com/frictionlessdata/tableschema-js#tableiterkeyed-extended-cast-forcecast-relations-stream--asynciterator--stream
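
For context, a minimal sketch of the streaming read with tableschema-js (the file path, handlers, and batching here are illustrative assumptions, not the extension's actual code):

```typescript
import { Table } from 'tableschema';

// Minimal sketch: read rows as a Node.js readable stream instead of
// materializing the whole file in memory. The file path and row handling
// below are placeholder assumptions.
async function streamDataRows(dataFilePath: string): Promise<void> {
  const table = await Table.load(dataFilePath);
  // stream: true makes table.iter() return a readable stream of rows;
  // keyed: true emits each row as an object keyed by column name
  const rowStream = await table.iter({ keyed: true, stream: true });
  rowStream.on('data', (row: Record<string, unknown>) => {
    // buffer rows here and post them to the table webview in batches
  });
  rowStream.on('end', () => {
    // all rows have been read
  });
  rowStream.on('error', (error: Error) => {
    console.error('row stream error:', error);
  });
}
```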

@RandomFractals added the enhancement (New feature or request) and data (Data task) labels on Dec 29, 2021
RandomFractals added a commit that referenced this issue Dec 29, 2021
Might need to use worker threads for larger data files too ... https://nodejs.org/api/worker_threads.html
@RandomFractals (Owner, Author) commented Dec 30, 2021

Tried it with Vizgen mouse genome data: https://vizgen.com/data-release-program/

One of their data files is ~207 MB, with 650 columns and over 78K rows. That's about 50 million (650 × 78,000) dense, wide-column, mostly numeric data points to parse.

This took a while to load, and the Tabulator table is not very responsive when scrolling ...

(screen capture: tabular-mouse-genome-data)

I might need to move reading large data files to a worker thread because of all the CSV line parsing. Docs on Node.js worker threads: https://nodejs.org/api/worker_threads.html

Even with the added table.iter row stream reader, the extension host main thread is blocked until all data is read and the first 1K rows are sent to the table webview for display.
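
A rough sketch of what the worker thread hand-off could look like (the file path, message shape, and ~1K row batch size are assumptions, not the actual implementation):

```typescript
import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';

if (isMainThread) {
  // extension host side: spawn a worker to read and parse the CSV,
  // so the main thread stays free to respond to webview messages
  const worker = new Worker(__filename, {
    workerData: { dataFilePath: '/path/to/large-data.csv' } // hypothetical path
  });
  worker.on('message', (rowBatch: unknown[]) => {
    // forward each parsed row batch to the table webview for display
  });
  worker.on('error', (error: Error) => console.error(error));
} else {
  // worker side: parse rows (e.g. via the table.iter() stream above)
  // and post them back to the main thread in batches of ~1K rows
  const { dataFilePath } = workerData as { dataFilePath: string };
  const rowBatch: unknown[] = [];
  // ... read and parse dataFilePath here, filling rowBatch ...
  parentPort?.postMessage(rowBatch);
}
```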

RandomFractals added a commit that referenced this issue Dec 30, 2021
RandomFractals added a commit that referenced this issue Dec 30, 2021
…to table view on init and add data requests (#50)
RandomFractals added a commit that referenced this issue Dec 31, 2021
@RandomFractals (Owner, Author) commented:

Closing this. Will handle the rest of the large CSV data parsing and loading optimizations in #58 and #59.
