Release Version 0.33.0 · databricks/koalas

`apply` and `transform` Improvements

We added support for passing positional and keyword arguments to apply, apply_batch, transform, and transform_batch in DataFrame, Series, and GroupBy. (#1484, #1485, #1486)

>>> import databricks.koalas as ks
>>> ks.range(10).apply(lambda a, b, c: a + b + c, args=(1,), c=3)
   id
0   4
1   5
2   6
3   7
4   8
5   9
6  10
7  11
8  12
9  13

>>> ks.range(10).transform_batch(lambda pdf, a, b, c: pdf.id + a + b + c, 1, 2, c=3)
0     6
1     7
2     8
3     9
4    10
5    11
6    12
7    13
8    14
9    15
Name: id, dtype: int64

>>> kdf = ks.DataFrame(
...    {"a": [1, 2, 3, 4, 5, 6], "b": [1, 1, 2, 3, 5, 8], "c": [1, 4, 9, 16, 25, 36]},
...    columns=["a", "b", "c"])
>>> kdf.groupby(["a", "b"]).apply(lambda x, y, z: x + x.min() + y + z, 1, z=2)
    a   b   c
0   5   5   5
1   7   5  11
2   9   7  21
3  11   9  35
4  13  13  53
5  15  19  75
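
The calls above use two slightly different conventions: DataFrame `apply` receives extra positional arguments through `args=(...)`, while `transform_batch` and `GroupBy.apply` take them directly after the function. The pure-Python sketch below only illustrates how such arguments are forwarded to the user function. `apply_like` and `transform_batch_like` are hypothetical helpers that operate on plain lists; they are not Koalas internals.

```python
from typing import Any, Callable, List, Tuple


def apply_like(column: List[int], func: Callable[..., List[Any]],
               args: Tuple[Any, ...] = (), **kwargs: Any) -> List[Any]:
    """Hypothetical stand-in for the `apply(func, args=(...), **kwargs)` style."""
    # The data is passed first, then the tuple given as `args`, then keywords.
    return func(column, *args, **kwargs)


def transform_batch_like(batch: List[int], func: Callable[..., List[Any]],
                         *args: Any, **kwargs: Any) -> List[Any]:
    """Hypothetical stand-in for the `transform_batch(func, *args, **kwargs)` style."""
    # The data is passed first, then the extra positionals, then keywords.
    return func(batch, *args, **kwargs)


ids = list(range(10))

# Mirrors ks.range(10).apply(lambda a, b, c: a + b + c, args=(1,), c=3)
applied = apply_like(
    ids, lambda col, b, c: [x + b + c for x in col], args=(1,), c=3
)

# Mirrors ks.range(10).transform_batch(lambda pdf, a, b, c: ..., 1, 2, c=3)
transformed = transform_batch_like(
    ids, lambda batch, a, b, c: [x + a + b + c for x in batch], 1, 2, c=3
)

print(applied)      # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(transformed)  # [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

In both styles the data argument is always bound first, so the user function's remaining parameters line up with the forwarded values in order.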

Spark Schema

We added spark_schema and print_schema to inspect the underlying Spark schema. (#1446)

>>> import numpy as np
>>> import pandas as pd
>>> kdf = ks.DataFrame({'a': list('abc'),
...                     'b': list(range(1, 4)),
...                     'c': np.arange(3, 6).astype('i1'),
...                     'd': np.arange(4.0, 7.0, dtype='float64'),
...                     'e': [True, False, True],
...                     'f': pd.date_range('20130101', periods=3)},
...                    columns=['a', 'b', 'c', 'd', 'e', 'f'])

>>> # Show the schema as a Spark DDL-formatted string
>>> kdf.spark_schema().simpleString()
'struct<a:string,b:bigint,c:tinyint,d:double,e:boolean,f:timestamp>'
>>> kdf.spark_schema(index_col='index').simpleString()
'struct<index:bigint,a:string,b:bigint,c:tinyint,d:double,e:boolean,f:timestamp>'

>>> # Print the schema in the same format as Spark's DataFrame.printSchema()
>>> kdf.print_schema()
root
 |-- a: string (nullable = false)
 |-- b: long (nullable = false)
 |-- c: byte (nullable = false)
 |-- d: double (nullable = false)
 |-- e: boolean (nullable = false)
 |-- f: timestamp (nullable = false)

>>> kdf.print_schema(index_col='index')
root
 |-- index: long (nullable = false)
 |-- a: string (nullable = false)
 |-- b: long (nullable = false)
 |-- c: byte (nullable = false)
 |-- d: double (nullable = false)
 |-- e: boolean (nullable = false)
 |-- f: timestamp (nullable = false)
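
The two outputs above name the same types differently: simpleString() uses SQL-style names (bigint, tinyint), while print_schema() shows Spark's type names (long, byte). The sketch below reproduces both renderings from a list of fields in plain Python to make that correspondence explicit. The `Field` tuple, `simple_string`, and `tree_string` are hypothetical illustrations and are not part of Koalas or PySpark.

```python
from typing import List, NamedTuple

# SQL-style names (as in simpleString()) mapped to the names shown by
# printSchema(), limited to the types that appear in the example above.
SIMPLE_TO_TREE_NAME = {"bigint": "long", "tinyint": "byte"}


class Field(NamedTuple):
    name: str
    simple_type: str
    nullable: bool = False


def simple_string(fields: List[Field]) -> str:
    """Render fields in the struct<name:type,...> form."""
    body = ",".join(f"{fld.name}:{fld.simple_type}" for fld in fields)
    return f"struct<{body}>"


def tree_string(fields: List[Field]) -> str:
    """Render fields in the tree form, one line per field under 'root'."""
    lines = ["root"]
    for fld in fields:
        type_name = SIMPLE_TO_TREE_NAME.get(fld.simple_type, fld.simple_type)
        nullable = str(fld.nullable).lower()
        lines.append(f" |-- {fld.name}: {type_name} (nullable = {nullable})")
    return "\n".join(lines)


fields = [
    Field("a", "string"),
    Field("b", "bigint"),
    Field("c", "tinyint"),
    Field("d", "double"),
    Field("e", "boolean"),
    Field("f", "timestamp"),
]
# Passing index_col='index' in the example prepends the index as a bigint field.
with_index = [Field("index", "bigint")] + fields

print(simple_string(fields))
print(tree_string(with_index))
```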

GroupBy Improvements

We fixed many GroupBy bugs, as listed below.

Fix groupby when as_index=False. (#1457)
Make groupby.apply in pandas<0.25 run the function only once per group. (#1462)
Fix Series.groupby on the Series from different DataFrames. (#1460)
Fix GroupBy.head to recognize agg_columns. (#1474)
Fix GroupBy.filter to follow complex group keys. (#1471)
Fix GroupBy.transform to follow complex group keys. (#1472)
Fix GroupBy.apply to follow complex group keys. (#1473)
Fix GroupBy.fillna to use GroupBy._apply_series_op. (#1481)
Fix GroupBy.filter and apply to handle agg_columns. (#1480)
Fix GroupBy apply, filter, and head to ignore temp columns when ops are from different DataFrames. (#1488)
Fix GroupBy functions which need natural orderings to follow the order when ops are from different DataFrames. (#1490)

Other new features and improvements

We added the following new feature:

SeriesGroupBy:

filter (#1483)
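
As a rough illustration of what a group-wise filter does, the sketch below keeps the elements of every group whose values satisfy a predicate and drops the rest. It assumes the new method follows the semantics of pandas' SeriesGroupBy.filter; `groupby_filter` is a hypothetical pure-Python helper, not the Koalas implementation.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple


def groupby_filter(values: Sequence[float], keys: Sequence[Hashable],
                   predicate: Callable[[List[float]], bool]
                   ) -> List[Tuple[int, float]]:
    """Return (position, value) pairs belonging to groups that pass `predicate`."""
    # Collect the values of each group.
    groups = defaultdict(list)
    for value, key in zip(values, keys):
        groups[key].append(value)

    # Evaluate the predicate once per group.
    kept_keys = {key for key, group in groups.items() if predicate(group)}

    # Keep original positions so the result preserves the input order.
    return [
        (pos, value)
        for pos, (value, key) in enumerate(zip(values, keys))
        if key in kept_keys
    ]


values = [1, 2, 3, 4, 5, 6]
keys = ["x", "x", "y", "y", "z", "z"]

# Keep only groups whose mean is greater than 3.
result = groupby_filter(values, keys, lambda g: sum(g) / len(g) > 3)
print(result)  # [(2, 3), (3, 4), (4, 5), (5, 6)]
```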

Other improvements

dtype for DateType should be np.dtype("object"). (#1447)
Make reset_index disallow the same name but allow it when drop=True. (#1455)
Fix named aggregation for MultiIndex. (#1435)
Raise ValueError that is not raised now. (#1461)
Fix get_dummies when the prefix parameter is a dict. (#1478)
Simplify DataFrame.columns setter. (#1489)
