Closed
Description
Hello there, firstly thank you for such an amazing package that bridges the gap between Pandas and PySpark.
I started using Koalas about a week ago, and everything was intuitive until I stumbled upon Koalas' GroupBy.apply.
Code:
import databricks.koalas as ks

def _koalas_train(frame):
    # Sum the 'trans_type_value' column for one (div_nbr, store_nbr) group.
    return frame['trans_type_value'].sum()

if __name__ == '__main__':
    # features_data is an existing pandas DataFrame (not shown here).
    ks_df = ks.DataFrame(features_data)
    ks_df_info_abt_train = ks_df.groupby(['div_nbr', 'store_nbr']).apply(_koalas_train)
Here, features_data is a pandas DataFrame (pd.DataFrame).
Output from Koalas.Groupby.Apply:
Output from Pandas.Groupby.Apply:

As you can see, the output from the pandas GroupBy.apply is as expected, but the output from the Koalas GroupBy.apply is not right. Could you guide me in the right direction by pointing out any logical mistake I might have made, or anything else I am missing?
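For reference, here is a minimal pandas-only sketch of the behaviour I expect. The sample values below are made up for illustration (my real features_data is not reproduced here); only the column names match my code.

```python
import pandas as pd

# Hypothetical sample data; the real features_data is not shown in this issue.
features_data = pd.DataFrame({
    'div_nbr': [1, 1, 1, 2],
    'store_nbr': [10, 10, 20, 30],
    'trans_type_value': [1.0, 2.0, 3.0, 4.0],
})

def _koalas_train(frame):
    # Sum the 'trans_type_value' column within one (div_nbr, store_nbr) group.
    return frame['trans_type_value'].sum()

# pandas: apply with a scalar-returning function yields a Series
# indexed by the group keys (div_nbr, store_nbr).
expected = features_data.groupby(['div_nbr', 'store_nbr']).apply(_koalas_train)

# The same aggregation written without apply, as a direct column sum.
direct = features_data.groupby(['div_nbr', 'store_nbr'])['trans_type_value'].sum()

print(expected)
print(direct)
```

In pandas, both forms give one summed value per (div_nbr, store_nbr) pair. I assume the direct column-sum form could serve as a workaround in Koalas as well, but I have not verified that against 0.18.0.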
Thank you once again.
Koalas version - 0.18.0
Pandas version - 0.23.4
PySpark - 2.4.3
