Releases · etiennebacher/tidypolars

31 Mar 13:44

etiennebacher

v0.18.0

661a744

tidypolars 0.18.0 Latest

tidypolars requires polars >= 1.10.0.

New features

Added support for the following functions:
- anyNA() (#330)
- cummax() (#323)
- trunc() (#343)
Added support for the decreasing and na.last arguments in sort() (@Yousa-Mirage, #328).
Added support for the na.last and ties.method arguments in rank() (@Yousa-Mirage, #329).
Better error message in filter() when a condition uses = instead of == (#341).
count() and add_count() now work with expressions, e.g. count(mtcars, mpg + 1)
(#346).
as_tibble() on grouped Polars DataFrames or LazyFrames now returns a grouped
tibble (#348).
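
As a rough illustration of the additions above, the sketch below applies some of the newly translated functions to a Polars DataFrame built from mtcars. The column names mpg_trunc and mpg_cummax are made up for the example, and it assumes that polars, tidypolars, and dplyr are installed.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

dat <- as_polars_df(mtcars)

# trunc() and cummax() are translated to Polars expressions inside mutate()
res <- dat |>
  mutate(mpg_trunc = trunc(mpg), mpg_cummax = cummax(mpg))

# count() accepts an expression, not only bare column names
counts <- dat |>
  count(mpg + 1)
```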

Bug fixes

Fix NA handling in cummin(), cumprod(), cumsum() (@Yousa-Mirage, #326).
Fix NA handling in is.finite(), is.infinite(), and is.nan() (#331).
In arrange(), if the data was grouped, the order was never maintained even if
maintain_order = TRUE was passed in group_by(). This is now fixed (#332).
When exporting to CSV, the deprecated argument null_values had no effect when
used alone, and could override an explicitly provided null_value. This is now
fixed (@Yousa-Mirage, #334).
Fix sample() to make it work correctly (@Yousa-Mirage, #338).
Fix unite() behavior when na.rm = TRUE (#344).
Fix a bug in fill() where groups set in .by would be preserved after the
operation (hence returning a grouped output) (#348).

Contributors

Yousa-Mirage

12 Feb 11:03

etiennebacher

v0.17.0

b2f6e1f

tidypolars 0.17.0

tidypolars requires polars >= 1.9.0 and dplyr >= 1.2.0.

Breaking changes and deprecations

The following functions (deprecated since 0.10.0, August 2024) are now removed
(#303):
- describe(), use summary() instead.
- describe_plan() and describe_optimized_plan(), use
  explain(optimized = TRUE/FALSE) instead.
make_unique_id() is deprecated and will be removed in a future version. This
is because the underlying Polars function isn't guaranteed to give the same
results across different versions. This function doesn't have a replacement in
tidypolars (#304).
In partition_by_key() and partition_by_max_size() (both already deprecated
in 0.16.0), the argument per_partition_sort_by has been removed (#322).
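
For code that still calls the removed describe_plan() and describe_optimized_plan(), the migration might look like the sketch below. The query itself is only an illustration, and it assumes a lazy query built from mtcars.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

query <- as_polars_lf(mtcars) |>
  filter(cyl == 4) |>
  select(mpg, cyl)

# Instead of describe_plan(): the query plan as written
explain(query, optimized = FALSE)

# Instead of describe_optimized_plan(): the plan after Polars optimizations
explain(query, optimized = TRUE)
```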

New features

Added support for dplyr::near() (#311).
pivot_wider() now works with Polars LazyFrames (#318).
Added support for several functions implemented in dplyr 1.2.0:
- filter_out() (#280)
- recode_values() (#308)
- replace_values() (#308)
- replace_when() (#307)
- when_any() (#306)
- when_all() (#306)
separate() now supports regex in the sep argument (#320).
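
A minimal sketch of two of these additions, near() and a regex in the sep argument of separate(), could look as follows. The data and the column names (code, left, right) are invented for the example, and separate() is assumed to take the same col, into, and sep arguments as tidyr::separate().

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

dat <- pl$DataFrame(code = c("a1b", "c22d"), x = c(0.1 + 0.2, 0.5))

# sep is treated as a regex: split on one or more digits
parts <- dat |>
  separate(code, into = c("left", "right"), sep = "\\d+")

# near() compares doubles with a tolerance instead of exact equality
close_rows <- dat |>
  filter(near(x, 0.3))
```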

Other changes

Several changes to make tidypolars more aligned with the tidyverse output
in general (#316):
- in count(), if sort = TRUE and there are some ties, then other variables
  are sorted in increasing order.
- coalesce() no longer has a default argument. This was an implementation
  mistake since dplyr::coalesce() never had this argument.
- ungroup() used to remove the group-specific attributes in the original
  grouped data, even if the result of the operation was not assigned. This is
  fixed.
- replace_na() on a Polars DataFrame or LazyFrame now errors if replacement
  is not a list.
- slice_*() functions on grouped data return columns in the same order as in
  the input.
- summarize() with only NULL expressions now returns one row per unique
  group instead of the entire data.
- unite() now returns columns in the correct order, and doesn't duplicate the
  sep in the output if some values are NA.
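
To illustrate the replace_na() change in the list above, here is a small sketch with made-up data. It assumes tidyr is loaded for the replace_na() generic.

```r
library(polars)
library(tidypolars)
library(tidyr)

dat <- pl$DataFrame(x = c(1, NA, 3))

# The replacement must be a named list, as in tidyr
filled <- dat |>
  replace_na(list(x = 0))

# A bare value is expected to be rejected; capture the error instead of stopping
err <- tryCatch(replace_na(dat, 0), error = function(e) e)
```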

Bug fixes

bind_rows_polars() now uses input names in .id if not all inputs are named,
for example bind_rows_polars(x1 = x1, x2, .id = "id") (#317).
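
The call mentioned above can be tried with two small DataFrames, as in this sketch. The inputs x1 and x2 are invented for the example.

```r
library(polars)
library(tidypolars)

x1 <- pl$DataFrame(a = 1:2)
x2 <- pl$DataFrame(a = 3:4)

# Only the first input is named; the "id" column records where each row came from
out <- bind_rows_polars(x1 = x1, x2, .id = "id")
out
```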

21 Jan 21:39

etiennebacher

v0.16.0

d29609f

tidypolars 0.16.0

tidypolars requires polars >= 1.8.0.

New features

New function unnest_longer_polars() to unnest list-columns into rows,
equivalent to tidyr::unnest_longer(). It supports the parameters values_to,
indices_to, keep_empty, as well as the {col} templates for column
naming. (#212, #281, @Yousa-Mirage)
New functions separate_longer_delim_polars() and separate_longer_position_polars()
to split string columns into rows by delimiter or fixed width, equivalent to
tidyr::separate_longer_delim() and tidyr::separate_longer_position().
(#57, #285, @Yousa-Mirage)
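
A possible use of separate_longer_delim_polars() is sketched below. The argument layout (data, columns, delim) is an assumption based on tidyr::separate_longer_delim(), and the data is made up.

```r
library(polars)
library(tidypolars)

dat <- pl$DataFrame(id = 1:2, x = c("a,b", "c"))

# Each delimited piece of x becomes its own row; id is repeated as needed
long <- dat |>
  separate_longer_delim_polars(x, delim = ",")
long
```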
New argument .by in fill() (this was introduced in tidyr 1.3.2). (#283)
wday() now supports arbitrary week_start values (1 to 7), allowing for
custom week start days. (#292, @Yousa-Mirage)
Add support for argument type in nchar() (#288).
It is now possible to use translated functions without loading the package
they come from. For example, the following code can run without loading
stringr in the session:
```
data |>
  mutate(y = .tp$str_extract_stringr(x, "\\d+"))
```
This lets you benefit from polars speed while using the interface of
tidyverse functions, without installing additional tidyverse dependencies.
However, it is not the recommended usage because it makes it harder to convert
tidypolars code to run with other tidyverse-based backends. More information
with ?.tp (#293).
New argument mkdir in write_parquet_polars() (this already existed in
sink_parquet()). (#298)
New (experimental) function partition_by() to write partitioned output in
sink_*() and write_*_polars(). The following functions are deprecated and
will be removed in a future release (#299):
- partition_by_key() can be replaced with partition_by(key =)
- partition_by_max_size() can be replaced with partition_by(max_rows_per_file =)

Changes

collect() now returns a tibble instead of a data.frame, for consistency
with other collect() methods (#273).
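
A quick way to see this change is to inspect the class of the collected result, as in the sketch below. The query is arbitrary.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

out <- as_polars_lf(mtcars) |>
  filter(cyl == 4) |>
  collect()

# The result is an R object (a tibble), no longer a plain data.frame
class(out)
```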

Bug fixes

arrange() now works with literal values, such as arrange(x, 1:2) (#296).

Documentation

Removed the "FAQ" vignette, which was outdated and wasn't particularly helpful.

Contributors

Yousa-Mirage

16 Nov 12:54

etiennebacher

v0.15.1

6f92eff

tidypolars 0.15.1

tidypolars requires polars >= 1.6.0.

03 Nov 14:03

etiennebacher

v0.15.0

1aa2173

tidypolars 0.15.0

Breaking changes

For consistency with dplyr, distinct() now only keeps the selected columns.
To keep all columns, use .keep_all = TRUE (#227, @ppanko).
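
The difference between the two behaviors can be sketched on a small, made-up DataFrame:

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

dat <- pl$DataFrame(g = c("a", "a", "b"), v = c(1, 2, 3))

# New default: only the selected column is returned
only_g <- dat |>
  distinct(g)

# Previous behavior: keep every column of the retained rows
all_cols <- dat |>
  distinct(g, .keep_all = TRUE)
```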

New features

New argument mkdir in all sink_*() functions to recursively create the
folder(s) specified in the path(s) to files (#236).
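
As a sketch of the mkdir argument, the snippet below writes a lazy query to a nested path that does not exist yet. The folder names are hypothetical and live under a temporary directory.

```r
library(polars)
library(tidypolars)

# Hypothetical nested output path; the intermediate folders do not exist yet
out_file <- file.path(tempdir(), "tidypolars_demo", "nested", "mtcars.parquet")

# mkdir = TRUE asks sink_parquet() to create the missing folders first
as_polars_lf(mtcars) |>
  sink_parquet(out_file, mkdir = TRUE)

file.exists(out_file)
```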
New functions partition_by_key() and partition_by_max_size() that can be
used in the path argument of sink_*() functions. Those enable writing a
LazyFrame to several files as partitioned output. See more details in
?sink_parquet() (#237).
bind_cols_polars() now works with more than two LazyFrames (#244).
Add support for gsub() (#250).
Add partial support for stringr::str_equal() (#228).
Add support for lubridate functions rollbackward(), rollback(), and rollforward() (#252).
Support stringr::fixed() in more stringr functions (#250).
Add support for argument ignore.case in grepl() (#251).
Add support for argument .keep_all in distinct() (#227, @ppanko).

Bug fixes

Better error message in group_by() for unsupported argument .drop (#230).
Better error message in group_by() when passing named expressions in the
dots (...). dplyr supports those, but it is increasingly recommended to use
the .by / by argument in individual functions rather than group_by() and
ungroup() (#238).
Better error message in count() when passing named expressions in ... (#239).
Fix bug in join_where() when all common column names between two DataFrames
are used in the join conditions (#254).
Using %in% with NA now retains the NA in the data. Using %in% NA will
error (#256).
Remove occasional deprecation message coming from Polars when using %in%
(#259, @ppanko).
Better handling of functions prefixed with <pkg>:: (#261).
Fix wrong behavior of paste() and paste0() with collapse (#263).

Documentation

New vignette "How to benchmark tidypolars" (#232).
Better documentation for all read_*() and scan_*() functions (#241).

Contributors

ppanko

06 Aug 08:27

etiennebacher

v0.14.1

5acc4a5

tidypolars 0.14.1

tidypolars requires polars >= 1.1.0 (#222).

Bug fixes

Fix a corner case when filter() was used in a custom function with missing
arguments (#220).
In grepl(), the argument fixed is now used correctly (thanks @gernophil
for the report, #223).
if_else() and ifelse() now work when using named arguments (#224).

Contributors

gernophil

22 Jul 15:33

etiennebacher

v0.14.0

754622c

tidypolars 0.14.0

tidypolars requires polars >= 1.0.0. This release of polars contains
many breaking changes. Those should be invisible to tidypolars users, with
the exception of deprecation messages (see below). However, if your code
contains user-defined functions that use polars syntax, you may need to
revise those (#194).

Deprecations and breaking changes

The following arguments are deprecated and will be removed in a future
version. The recommended replacement is indicated on the right of the arrow
(#194):
- in compute() and collect(): streaming -> engine;
- in read_csv_polars() and scan_csv_polars():
  - dtypes -> schema_overrides
  - reuse_downloaded -> no replacement
- in read_ndjson_polars() and scan_ndjson_polars():
  - reuse_downloaded -> no replacement
- in read_ipc_polars() and scan_ipc_polars():
  - memory_map -> no replacement
- in write_csv_polars() and sink_csv():
  - null_values -> null_value
  - quote -> quote_char
- in write_ndjson_polars():
  - pretty -> no replacement
  - row_oriented -> no replacement
- in write_ipc_polars():
  - future -> compat_level
fetch() is deprecated, use head() before collect() instead (#194).
group_keys() now returns a tibble and not a data.frame anymore (#194).
lubridate::make_date(), lubridate::make_datetime(), and ISOdatetime()
now error if some components go over their expected range, e.g. month = 20
or hour = 25. Before, those functions were returning NA in this situation
(#194).
summary() returns an additional row for the 50% percentile (#194).
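
Two of the migrations listed above (fetch() and the streaming argument) might look like the sketch below. The value "streaming" for engine is an assumption based on the Polars engine names and is not spelled out in these notes.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

lf <- as_polars_lf(mtcars)

# Instead of fetch(): limit the rows with head(), then collect
first_rows <- lf |>
  head(5) |>
  collect()

# Instead of collect(streaming = TRUE): pick the engine explicitly
# ("streaming" is an assumed engine name, see the note above)
four_cyl <- lf |>
  filter(cyl == 4) |>
  collect(engine = "streaming")
```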

New features

Added support for various lubridate functions:
- force_tz() and with_tz() (@atsyplenkov, #170);
- date() (@atsyplenkov, #181);
- today() and now() (#183);
- weeks(), days(), hours(), minutes(), seconds(), milliseconds(),
  microseconds(), nanoseconds() (#184).
tidypolars can now use expressions that contain non-translated functions
if those expressions do not use columns from the data.

Example:
```
dat <- pl$DataFrame(foo = c(2, 1, 2))
a <- c("d", "e", "f")
dat |>
  filter(foo >= agrep("a", a))
```
agrep() is not a translated function so this used to error:
```
Error in `filter()`:
! `tidypolars` doesn't know how to translate this function: `agrep()`.
```
However, we see that agrep("a", a) doesn't use any column but instead an
object in the environment so it can be evaluated without caring whether
tidypolars knows this function or not:
```
shape: (1, 1)
┌─────┐
│ foo │
│ --- │
│ f64 │
╞═════╡
│ 2.0 │
└─────┘
```
Note that this is evaluated before running polars in the background so this
expression can't benefit from polars parallel evaluation for instance.
Thanks @mgacc0 for the suggestion.
Add support for as.Date() for character columns (#190).
Error messages due to untranslated functions now suggest opening an issue to
ask for their translation (#197).
Add support for %>% in expressions (#200).
Add support for dplyr::tally() (#203).
count() and add_count() now warn or error when argument wt is used
since it is not supported. The behavior depends on the global option
tidypolars_unknown_args (#204).
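
The interaction between wt and the global option can be sketched as follows. This assumes the behavior described for this release (wt unsupported) and uses made-up data; the option is restored afterwards.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

dat <- pl$DataFrame(g = c("a", "a", "b"), v = c(1, 2, 3))

# Turn unsupported-but-known arguments into errors instead of warnings
old <- options(tidypolars_unknown_args = "error")

# Capture the message rather than stopping the script
res <- tryCatch(
  count(dat, g, wt = v),
  error = function(e) conditionMessage(e)
)

# Restore the previous setting
options(old)
res
```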
tidypolars has experimental support for fallback to R when a function is not
internally translated to polars syntax. The default behavior is still to
error, but the user can now set options(tidypolars_fallback_to_r = TRUE)
to handle those unknown functions. See ?tidypolars_options for
details on the drawbacks of this approach (#205).
Large performance improvement when using selection helpers (such as
contains()) on data with many columns (#211).
tidypolars now exports rules to be used with flir for detecting deprecated
functions describe_plan() and describe_optimized_plan(). Those can be
used in your project by following this article.
Note that this requires flir 0.5.0.9000 or higher (#214).

Bug fixes

Fix behavior of mutate() and summarize() when they don't contain any
expression (#191).
Fix error in count() when it includes grouping variables (#193).
Passing . in an anonymous function in across() now works (#216).

Contributors

mgacc0 and atsyplenkov

10 Mar 18:01

etiennebacher

v0.13.0

4e08e45

tidypolars 0.13.0

New features

Added support for stringr::str_replace_na() (#153).
Better checks for unknown and unsupported arguments in compute(),
collect(), *_join(), pivot_*(), sink_*(), slice_sample() and
uncount() (#158, thanks @fkohrt for the report). Now, when those
functions receive:
- an argument that exists in the tidyverse implementation but is not supported
  by tidypolars, they warn the user. This default behaviour can be changed
  to error instead with options(tidypolars_unknown_args = "error").
- an argument that doesn't exist at all, they error.
Add support for argument explicit in tidyr::complete().
Add option to keep track of filenames in scan_csv_polars() (#171, @ginolhac).
Add partial support for seq() (argument length.out is not supported) and
seq_len().
complete() now accepts named elements, e.g. complete(df, group, value = 1:4)
(#176).
Add support for several lubridate functions:
- am(), pm(), leap_year(), days_in_month() (#178).

Bug fixes

Fix edge cases in the tidypolars implementation of stringr::str_sub()
and substr() compared to their original implementation (#159).
arrange() now places NA values last, like dplyr.

Contributors

ginolhac and fkohrt

19 Nov 16:03

etiennebacher

v0.12.0

4508d4d

tidypolars 0.12.0

tidypolars requires polars >= 0.21.0.

Breaking changes

summarize() now drops the last group of the output by default (for
consistency with dplyr). Previously it kept the same groups as in the input
data (#149).

New features

Add support for argument .groups in summarize(). Value "rowwise" is not
supported for now (#149).
Added support for dplyr::lead(). In dplyr::lead() and dplyr::lag(), the
arguments default and order_by are now supported (#151).
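
The grouping change and the new .groups argument can be sketched on a small, made-up DataFrame with two grouping columns:

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

dat <- pl$DataFrame(
  g = c("a", "a", "b", "b"),
  h = c("x", "y", "x", "x"),
  v = c(1, 2, 3, 4)
)

# Default: the last grouping variable (h) is dropped from the output groups
by_g <- dat |>
  group_by(g, h) |>
  summarize(m = mean(v))

# .groups = "drop" returns an ungrouped result
flat <- dat |>
  group_by(g, h) |>
  summarize(m = mean(v), .groups = "drop")
```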

17 Oct 10:20

etiennebacher

v0.11.0

9144fbf

tidypolars 0.11.0

tidypolars requires polars >= 0.20.0.

Breaking changes

arrange() now errors with unknown variable names (like dplyr::arrange()).
Previously, unknown variables were silently ignored. Using expressions (like
a + b) is now accepted (#144).
The parameter inherit_optimization is removed from all sink_*() functions.

New features

The power operators ^ and ** now work.
New function sink_ndjson() to write the results of a lazy query to a NDJSON
file without collecting it in memory.
inner_join() now accepts inequality joins in the by argument, including
the following helpers: between(), overlaps(), within() (#148).
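
An inequality join with the between() helper might look like the sketch below. The tables sales and promos are invented, and the condition is assumed to be written with dplyr::join_by(), as in dplyr itself.

```r
library(polars)
library(tidypolars)
library(dplyr, warn.conflicts = FALSE)

sales <- pl$DataFrame(id = 1:3, day = c(1, 5, 9))
promos <- pl$DataFrame(
  promo = c("A", "B"),
  start = c(0, 4),
  end = c(2, 6)
)

# Keep the sales whose day falls inside a promotion window
matched <- inner_join(
  sales, promos,
  by = join_by(between(day, start, end))
)
matched
```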

Bug fixes

Using an external object in case_when(), ifelse() and if_else() now works.
str_sub() doesn't error anymore when start is positive and end is negative.
read_*_polars() functions used to return a standard data.frame by mistake.
They now return a Polars DataFrame.
Using [ for subsetting in expressions now works. Thanks @ginolhac for the
report (#141).
bind_cols_polars() and bind_rows_polars() now error (as expected before) if
elements are a mix of Polars DataFrames and LazyFrames.

Contributors

ginolhac
