jcrist (Member) commented Aug 1, 2023

This improves the efficiency and usability of the __dataframe__ protocol in a few ways:

  • It avoids executing the wrapped query for operations that don't strictly require it. Column names and types can be fetched, and columns subselected, without executing the query first.
  • Given a single __dataframe__() output (an IbisDataFrame instance), the wrapped query executes at most once across all method calls. Once executed, the resulting pyarrow.Table is stored on the IbisDataFrame instance and reused by subsequent accesses.
  • Column subselection on an already executed IbisDataFrame requires no additional computation.

Given these improvements, passing an ibis.Table to altair (dev) now executes the query only once, rather than potentially several times.
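
For readers unfamiliar with the pattern, the three behaviors listed above can be sketched in plain Python. This is a minimal illustration, not the code in this PR: `LazyFrame`, its constructor arguments, and `run_query` are hypothetical names, and plain dicts of lists stand in for the pyarrow.Table. Only the method names `column_names`, `select_columns_by_name`, and `get_column_by_name` are borrowed from the dataframe interchange protocol.

```python
from typing import Callable, Dict, List, Optional

Result = Dict[str, list]


class LazyFrame:
    """Hypothetical stand-in for an interchange object wrapping a deferred query."""

    def __init__(
        self,
        schema: Dict[str, str],
        run_query: Callable[[List[str]], Result],
        cached: Optional[Result] = None,
    ) -> None:
        self._schema = dict(schema)  # column name -> type name, known up front
        self._run_query = run_query  # executes the query for the given columns
        self._cached = cached        # materialized result, if already executed

    def column_names(self) -> List[str]:
        # Metadata only: answered from the schema, never triggers execution.
        return list(self._schema)

    def select_columns_by_name(self, names: List[str]) -> "LazyFrame":
        schema = {name: self._schema[name] for name in names}
        if self._cached is not None:
            # Already executed: slice the stored result, no new computation.
            cached = {name: self._cached[name] for name in names}
            return LazyFrame(schema, self._run_query, cached)
        # Not executed yet: stay lazy, only the narrowed schema is recorded.
        return LazyFrame(schema, self._run_query)

    def _materialize(self) -> Result:
        # Execute at most once per instance, then reuse the stored result.
        if self._cached is None:
            self._cached = self._run_query(self.column_names())
        return self._cached

    def get_column_by_name(self, name: str) -> list:
        return self._materialize()[name]


# Usage: count how often the (fake) query actually runs.
calls: List[List[str]] = []


def run_query(columns: List[str]) -> Result:
    calls.append(list(columns))
    full = {"a": [1, 2, 3], "b": ["x", "y", "z"]}
    return {name: full[name] for name in columns}


frame = LazyFrame({"a": "int64", "b": "string"}, run_query)

names = frame.column_names()                   # no execution
lazy_sub = frame.select_columns_by_name(["a"])  # no execution
calls_before_access = len(calls)

col_a = frame.get_column_by_name("a")          # first data access executes
col_b = frame.get_column_by_name("b")          # reuses the stored result
calls_after_access = len(calls)

executed_sub = frame.select_columns_by_name(["b"])
sub_b = executed_sub.get_column_by_name("b")   # sliced from the stored result
calls_after_subselect = len(calls)
```

In this sketch the query function runs exactly once, no matter how many columns are read or subselected afterwards. A subselection made before execution stays lazy and would run its own narrower query on first access.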

cpcloud added this to the 6.1 milestone Aug 1, 2023
cpcloud added the performance (Issues related to ibis's performance) label Aug 1, 2023
jcrist force-pushed the improve-dataframe-interchange branch from 86c1195 to 8f58270 August 1, 2023 15:41
jcrist (Member, Author) commented Aug 1, 2023

Thanks for the review. I believe this should be good to go now.

cpcloud (Member) left a comment

Excellent work.

Excited to see this unblock some sweet integrations!

Labels

ecosystem (External projects or activities)
performance (Issues related to ibis's performance)