Since we now have a Rust accelerator and already use it for several dataset operations, the dataset abstractions would likely be easier to maintain, and possibly faster in some cases, if we built a Rust kernel to manage the core of `Dataset` and `DatasetBuilder`. We would then stop bouncing between Rust and Python for core functionality. The Python code would provide a more convenient interface and wrapper functionality on top of that kernel (e.g. conversion to tensor engines).
Some of the data processing would also be easier to read if it were written directly in Rust instead of as chains of (sometimes buggy) PyArrow compute kernels.