Use table.iter API with readable stream for loading large data files #50


Closed
RandomFractals opened this issue Dec 29, 2021 · 2 comments
Labels: data (Data task), enhancement (New feature or request)

Comments

@RandomFractals (Owner) commented Dec 29, 2021

See changes in #49 and the Table Schema docs: https://github.com/frictionlessdata/tableschema-js#tableiterkeyed-extended-cast-forcecast-relations-stream--asynciterator--stream
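
For context, a minimal sketch of the streaming read with tableschema-js (the file path, handlers, and batching here are illustrative assumptions, not the extension's actual code):

```typescript
import { Table } from 'tableschema';

// Minimal sketch: read rows as a Node.js readable stream instead of
// materializing the whole file in memory. The file path and row handling
// below are placeholder assumptions.
async function streamDataRows(dataFilePath: string): Promise<void> {
  const table = await Table.load(dataFilePath);
  // stream: true makes table.iter() return a readable stream of rows;
  // keyed: true emits each row as an object keyed by column name
  const rowStream = await table.iter({ keyed: true, stream: true });
  rowStream.on('data', (row: Record<string, unknown>) => {
    // buffer rows here and post them to the table webview in batches
  });
  rowStream.on('end', () => {
    // all rows have been read
  });
  rowStream.on('error', (error: Error) => {
    console.error('row stream error:', error);
  });
}
```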

@RandomFractals added the enhancement (New feature or request) and data (Data task) labels on Dec 29, 2021
RandomFractals added a commit that referenced this issue Dec 29, 2021
Might need to use worker threads for larger data files too ... https://nodejs.org/api/worker_threads.html
@RandomFractals (Owner, Author) commented Dec 30, 2021

Tried it with Vizgen mouse genome data: https://vizgen.com/data-release-program/

One of their data files is ~207 MB, with 650 columns and over 78K rows. That's about 50 million (650 × 78,000) dense, wide-column, mostly numeric data points to parse.

This took a while to load, and the Tabulator table is not very responsive when scrolling ...

(screen capture: tabular-mouse-genome-data)

I might need to move reading large data files to a worker thread because of all the CSV line parsing. Docs on Node.js worker threads: https://nodejs.org/api/worker_threads.html

Even with the added table.iter row stream reader, the extension host main thread is blocked until all data is read and the first 1K rows are sent to the table webview for display.
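
A rough sketch of what the worker thread hand-off could look like (the file path, message shape, and ~1K row batch size are assumptions, not the actual implementation):

```typescript
import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';

if (isMainThread) {
  // extension host side: spawn a worker to read and parse the CSV,
  // so the main thread stays free to respond to webview messages
  const worker = new Worker(__filename, {
    workerData: { dataFilePath: '/path/to/large-data.csv' } // hypothetical path
  });
  worker.on('message', (rowBatch: unknown[]) => {
    // forward each parsed row batch to the table webview for display
  });
  worker.on('error', (error: Error) => console.error(error));
} else {
  // worker side: parse rows (e.g. via the table.iter() stream above)
  // and post them back to the main thread in batches of ~1K rows
  const { dataFilePath } = workerData as { dataFilePath: string };
  const rowBatch: unknown[] = [];
  // ... read and parse dataFilePath here, filling rowBatch ...
  parentPort?.postMessage(rowBatch);
}
```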

RandomFractals added a commit that referenced this issue Dec 30, 2021
RandomFractals added a commit that referenced this issue Dec 30, 2021
…to table view on init and add data requests (#50)
RandomFractals added a commit that referenced this issue Dec 31, 2021
@RandomFractals (Owner, Author) commented:

Closing this. Will handle the rest of the large CSV data parsing and loading optimizations in #58 and #59.
