Closed
Description
With #3114 (and some recent work in ibis) we can now pass ibis tables directly to altair.Chart and things just work.
import ibis
import altair as alt
t = ibis.examples.penguins.fetch()
chart = (
    alt.Chart(t, width=600)
    .mark_circle(size=50)
    .encode(
        x=alt.X("bill_length_mm").scale(zero=False),
        y=alt.Y("bill_depth_mm").scale(zero=False),
        color="species"
    )
    .interactive()
)

However, currently it appears the entire table is loaded into memory, even if only a few columns are needed for the plot. For example, the above penguins dataset has 8 columns, but only 3 of them are used to generate the plot. Can altair make use of this information to subselect columns before conversion when using the __dataframe__ protocol?
With some recent work in ibis, the following can all happen without loading data into memory:
import pyarrow as pa
import pyarrow.interchange  # ensure the pa.interchange submodule is loaded

df = t.__dataframe__()
# subselect columns
df = df.select_columns_by_name(["bill_length_mm", "bill_depth_mm", "species"])
# view dtypes, as needed for altair's type inference
df.get_column_by_name("bill_length_mm").dtype
# convert to pyarrow here. Only this step will actually execute the query
t = pa.interchange.from_dataframe(df)

Especially for wide input tables, having altair handle subselecting columns automatically may be useful for improving performance.
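To make the request concrete, here is a rough sketch of what automatic subselection might look like. It is not altair's actual implementation: `fields_used` and `subselect` are hypothetical helpers, the spec is a hand-written Vega-Lite-style dict, and `FakeInterchangeFrame` is only a stand-in so the snippet runs without ibis installed. The only protocol methods assumed are `column_names()` and `select_columns_by_name()` from the dataframe interchange protocol. The sketch only inspects the top-level encoding; a real version would also need to account for transforms, layers, selections, and so on, so it falls back to the full table whenever it is unsure.

```python
from typing import Any


def fields_used(spec: dict) -> list[str]:
    """Collect field names referenced in the top-level "encoding" of a
    Vega-Lite-style spec dict (hypothetical helper, not an altair API)."""
    found: list[str] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "field" and isinstance(value, str):
                    # Keep first-seen order and skip duplicates.
                    if value not in found:
                        found.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(spec.get("encoding", {}))
    return found


def subselect(dfi: Any, spec: dict) -> Any:
    """Narrow an interchange-protocol dataframe to the columns the spec uses.

    Falls back to the unmodified object when no fields are found or when a
    field is not an existing column (e.g. it is produced by a transform).
    """
    wanted = fields_used(spec)
    available = set(dfi.column_names())
    if not wanted or any(name not in available for name in wanted):
        return dfi
    return dfi.select_columns_by_name(wanted)


class FakeInterchangeFrame:
    """Stand-in exposing just the two protocol methods used above."""

    def __init__(self, names):
        self._names = list(names)

    def column_names(self):
        return list(self._names)

    def select_columns_by_name(self, names):
        return FakeInterchangeFrame(names)


# Hand-written spec mirroring the penguins chart above.
spec = {
    "mark": {"type": "circle", "size": 50},
    "encoding": {
        "x": {"field": "bill_length_mm", "type": "quantitative", "scale": {"zero": False}},
        "y": {"field": "bill_depth_mm", "type": "quantitative", "scale": {"zero": False}},
        "color": {"field": "species", "type": "nominal"},
    },
}

penguins = FakeInterchangeFrame(
    ["species", "island", "bill_length_mm", "bill_depth_mm",
     "flipper_length_mm", "body_mass_g", "sex", "year"]
)
narrowed = subselect(penguins, spec)
```

With a real ibis table, the same idea would be `subselect(t.__dataframe__(), spec)` followed by the pyarrow conversion, so only the three needed columns are ever materialized.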