Closed
Labels
bug (Something isn't working)
Description
I am trying to calculate an exponential moving average per customer on a Koalas DataFrame. I am able to achieve this as shown below:
import numpy as np  # needed for the np.float64 return annotation
import pandas as pd
import databricks.koalas as ks
from databricks.koalas import pandas_wraps

df = ks.DataFrame({'cust_id': ['a', 'a', 'a', 'b', 'b'],
                   'sales': [100, 200, 300, 400, 500]})

def fun(col1) -> ks.Series[np.float64]:
    # Apply the pandas EWM (alpha=0.5, adjust=False) to each group's sales.
    return col1.apply(lambda x: x.ewm(alpha=0.5, adjust=False).mean())  # Arbitrary pandas code.

df['moving_average'] = fun(df.groupby('cust_id').sales)
df.head()
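For reference, here is a minimal pandas-only sketch of what the per-group call is expected to compute on the sample data. It assumes the usual recurrence for `adjust=False`, i.e. y_0 = x_0 and y_t = (1 - alpha) * y_{t-1} + alpha * x_t. The helper `ema` and the names `pdf`, `expected`, and `manual` are hypothetical and only there for illustration; this does not use Koalas and says nothing about performance.

```python
import pandas as pd

# Same sample data as above, but as a plain pandas DataFrame.
pdf = pd.DataFrame({'cust_id': ['a', 'a', 'a', 'b', 'b'],
                    'sales': [100, 200, 300, 400, 500]})

def ema(values, alpha=0.5):
    """Hand-written EMA recurrence (hypothetical helper, adjust=False form)."""
    out = []
    prev = None
    for x in values:
        # First value seeds the average; later values blend with the previous one.
        prev = float(x) if prev is None else (1 - alpha) * prev + alpha * x
        out.append(prev)
    return out

# Per-group EMA using the pandas built-in.
expected = pdf.groupby('cust_id')['sales'].transform(
    lambda s: s.ewm(alpha=0.5, adjust=False).mean())

# Per-group EMA using the hand-written recurrence.
manual = pdf.groupby('cust_id')['sales'].transform(
    lambda s: pd.Series(ema(s), index=s.index))

pdf['moving_average'] = expected
print(pdf)
```

On this sample, both versions should give 100, 150, 225 for customer `a` and 400, 450 for customer `b`, which is a quick way to sanity-check any faster implementation against the original.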
However, when I run the above on a dataset with 30M records, it takes 4 hours to complete. Is there any way to speed this up?