Replies: 2 comments
GeoDataFusion came first! SedonaDB focused on the plumbing (e.g., optimizer rules, joins, statistics propagation) more than the components, although at this point we have most of the same functions. We also focused on making the WKB representation work everywhere (whereas GeoDataFusion has support for the native encodings that are potentially faster for some functions).
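For readers unfamiliar with WKB, here is a minimal sketch of how a 2D point is laid out in that representation. It uses only Python's standard library, not code from SedonaDB or GeoDataFusion, and the function names `point_to_wkb` and `wkb_to_point` are hypothetical. Having to parse a byte-order flag and type code for every value is presumably part of why a native encoding can be faster for some functions.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag defined by the WKB standard
WKB_POINT = 1          # geometry type code for a 2D Point


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # "<" = little-endian without padding, B = byte-order flag,
    # I = uint32 geometry type, dd = two float64 coordinates
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(buf: bytes) -> tuple:
    """Decode a 2D WKB point, honouring its byte-order flag (hypothetical helper)."""
    fmt = "<" if buf[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack_from(fmt + "Idd", buf, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a Point, got geometry type {geom_type}")
    return (x, y)
```

A point round-trips through 21 bytes: 1 for the byte order, 4 for the type code, and 16 for the two coordinates.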
I think from DataFusion's perspective it already does this (in the sense that it allows arbitrary field metadata and arbitrary components to be plugged in). I would love true user-defined extension types in DataFusion so that I can upstream some of the pieces of SedonaDB (e.g., our table printer) but it has been difficult to find a committer interested.
This would also be amazing, although I think it's unlikely to happen in the near future given the multitude of other things Comet has to keep up with. Personally I think that a more DataFusion-native distributed framework will give better results than trying to align with Spark (which did not do a very good job with the corner cases of its Geometry type in my opinion).
This would be amazing. It's on our TODO list to help with this but it's a long TODO list 🙂
I think the general gist of this is more like...we're doing a lot of work in SedonaDB to add functions and file format support, so maybe we can find a way to call SedonaDB functions from Spark to avoid duplicating work when implementing new ones.
I don't have too much to add to Dewey's answer. This project is more focused, ideally, on downstream projects embedding That said, I've had less time lately to spend on this project, so it's been progressing more slowly. I could review a PR if you had a specific feature you wanted to add.
How does this crate differ from the https://github.com/apache/sedona-db project, which already implements geospatial functions with DataFusion?
It would be wonderful to streamline efforts at enabling geospatial processing in the ecosystem. As far as I can tell, here are the remaining efforts needed:
I may have missed some other pieces that need to come together for a frictionless geospatial experience in the DataFusion ecosystem, but that is the outstanding work I know about.
There seem to be some efforts in the Sedona project to use DataFusion via SedonaDB to accelerate Spark queries, which seems a little hacky and backdoor-ish given the wider DataFusion ecosystem and the existence of Comet: apache/sedona#2593