Releases: etiennebacher/tidypolars
tidypolars 0.18.0
tidypolars requires polars >= 1.10.0.
New features
- Added support for the following functions:
- Added support for the `decreasing` and `na.last` arguments in `sort()`
  (@Yousa-Mirage, #328).
- Added support for the `na.last` and `ties.method` arguments in `rank()`
  (@Yousa-Mirage, #329).
- Better error message in `filter()` when a condition uses `=` instead of `==`
  (#341).
- `count()` and `add_count()` now work with expressions, e.g.
  `count(mtcars, mpg + 1)` (#346).
- `as_tibble()` on grouped Polars DataFrames or LazyFrames now returns a grouped
  tibble (#348).
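
The `sort()` and `rank()` arguments listed above come from base R. As a point of reference, here is a minimal base R sketch of what they control. It does not call `tidypolars`, the vectors `x` and `y` are made up for illustration, and it is an assumption that the translated versions agree with base R in every edge case.

```r
# Illustrative vectors (hypothetical data, not taken from the release notes).
x <- c(3, NA, 1, 2)
y <- c(10, 20, 10, NA)

# decreasing = TRUE reverses the order; na.last = TRUE keeps NA at the end
# (the base R default, na.last = NA, drops missing values instead).
sorted_desc <- sort(x, decreasing = TRUE, na.last = TRUE)

# ties.method controls the rank given to tied values; na.last = "keep"
# leaves NA as NA instead of assigning it a rank.
ranked_min <- rank(y, ties.method = "min", na.last = "keep")

print(sorted_desc)
print(ranked_min)
```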
Bug fixes
- Fix `NA` handling in `cummin()`, `cumprod()`, `cumsum()` (@Yousa-Mirage, #326).
- Fix `NA` handling in `is.finite()`, `is.infinite()`, and `is.nan()` (#331).
- In `arrange()`, if the data was grouped, the order was never maintained even if
  `maintain_order = TRUE` was passed in `group_by()`. This is now fixed (#332).
- When exporting to CSV, `null_values` passed alone was not applied, and it could
  override an explicitly provided `null_value`. This is now fixed
  (@Yousa-Mirage, #334).
- Fix `sample()` to make it work correctly (@Yousa-Mirage, #338).
- Fix `unite()` behavior when `na.rm = TRUE` (#344).
- Fix a bug in `fill()` where groups set in `.by` would be preserved after the
  operation (hence returning a grouped output) (#348).
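
For context on the `NA` handling fixes above, this is a small base R sketch of how the same functions treat missing values. It runs without `tidypolars`. That the fixed translations are meant to reproduce exactly this behavior is an assumption, and the input vectors are illustrative only.

```r
# Cumulative functions: once an NA is met, every later value is NA.
cum_sum  <- cumsum(c(1, NA, 2))
cum_prod <- cumprod(c(2, NA, 3))
cum_min  <- cummin(c(3, NA, 1))

# Predicates: NA is neither finite, nor infinite, nor NaN.
x <- c(1, NA, Inf, NaN)
finite_flags   <- is.finite(x)
infinite_flags <- is.infinite(x)
nan_flags      <- is.nan(x)

print(cum_sum)
print(finite_flags)
print(infinite_flags)
print(nan_flags)
```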
tidypolars 0.17.0
tidypolars requires polars >= 1.9.0 and dplyr >= 1.2.0.
Breaking changes and deprecations
- The following functions (deprecated since 0.10.0, August 2024) are now removed
  (#303):
  - `describe()`, use `summary()` instead.
  - `describe_plan()` and `describe_optimized_plan()`, use
    `explain(optimized = TRUE/FALSE)` instead.
- `make_unique_id()` is deprecated and will be removed in a future version. This
  is because the underlying Polars function isn't guaranteed to give the same
  results across different versions. This function doesn't have a replacement in
  `tidypolars` (#304).
- In `partition_by_key()` and `partition_by_max_size()` (both already deprecated
  in 0.16.0), the argument `per_partition_sort_by` has been removed (#322).
New features
- Added support for `dplyr::near()` (#311).
- `pivot_wider()` now works with Polars LazyFrames (#318).
- Added support for several functions implemented in `dplyr` 1.2.0:
- `separate()` now supports regex in the `sep` argument (#320).
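
As a reminder of why `near()` is useful, the following base R sketch compares floating-point numbers within a tolerance. The helper `near_sketch()` is a hypothetical stand-in written for this example. It mirrors the documented definition of `dplyr::near()` (absolute difference below a tolerance) but is not the `tidypolars` translation.

```r
# Hypothetical helper: TRUE when x and y differ by less than `tol`.
near_sketch <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

# Exact comparison fails because of floating-point rounding...
exact_equal <- sqrt(2)^2 == 2

# ...while the tolerance-based comparison succeeds.
approx_equal <- near_sketch(sqrt(2)^2, 2)

print(exact_equal)
print(approx_equal)
```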
Other changes
- Several changes to make `tidypolars` more aligned with the `tidyverse` output
  in general (#316):
  - in `count()`, if `sort = TRUE` and there are some ties, then other variables
    are sorted in increasing order.
  - `coalesce()` no longer has a `default` argument. This was an implementation
    mistake since `dplyr::coalesce()` never had this argument.
  - `ungroup()` used to remove the group-specific attributes in the original
    grouped data, even if the result of the operation was not assigned. This is
    fixed.
  - `replace_na()` on a Polars DataFrame or LazyFrame now errors if `replacement`
    is not a list.
  - `slice_*()` functions on grouped data return columns in the same order as in
    the input.
  - `summarize()` with only `NULL` expressions now returns one row per unique
    group instead of the entire data.
  - `unite()` now returns columns in the correct order, and doesn't duplicate the
    `sep` in the output if some values are `NA`.
Bug fixes
- `bind_rows_polars()` now uses input names in `.id` if not all inputs are named,
  for example `bind_rows_polars(x1 = x1, x2, .id = "id")` (#317).
tidypolars 0.16.0
tidypolars requires polars >= 1.8.0.
New features
- New function `unnest_longer_polars()` to unnest list-columns into rows,
  equivalent to `tidyr::unnest_longer()`. It supports the parameters `values_to`,
  `indices_to`, `keep_empty`, as well as the `{col}` templates for column
  naming (#212, #281, @Yousa-Mirage).
- New functions `separate_longer_delim_polars()` and
  `separate_longer_position_polars()` to split string columns into rows by
  delimiter or fixed width, equivalent to `tidyr::separate_longer_delim()` and
  `tidyr::separate_longer_position()` (#57, #285, @Yousa-Mirage).
- New argument `.by` in `fill()` (this was introduced in `tidyr` 1.3.2) (#283).
- `wday()` now supports arbitrary `week_start` values (1 to 7), allowing for
  custom week start days (#292, @Yousa-Mirage).
- Add support for argument `type` in `nchar()` (#288).
- It is now possible to use translated functions without loading the package
  they come from. For example, the following code can run without loading
  `stringr` in the session:

      data |> mutate(y = .tp$str_extract_stringr(x, "\\d+"))

  This lets you benefit from `polars` speed while using the interface of
  `tidyverse` functions, without installing additional `tidyverse` dependencies.
  However, it is not the recommended usage because it makes it harder to convert
  `tidypolars` code to run with other `tidyverse`-based backends. More
  information with `?.tp` (#293).
- New argument `mkdir` in `write_parquet_polars()` (this already existed in
  `sink_parquet()`) (#298).
- New (experimental) function `partition_by()` to write partitioned output in
  `sink_*()` and `write_*_polars()`. The following functions are deprecated and
  will be removed in a future release (#299):
  - `partition_by_key()` can be replaced with `partition_by(key =)`
  - `partition_by_max_size()` can be replaced with
    `partition_by(max_rows_per_file =)`
Changes
- `collect()` now returns a `tibble` instead of a `data.frame`, for consistency
  with other `collect()` methods (#273).
Bug fixes
- `arrange()` now works with literal values, such as `arrange(x, 1:2)` (#296).
Documentation
- Removed the "FAQ" vignette, which was outdated and wasn't particularly helpful.
tidypolars 0.15.1
tidypolars requires polars >= 1.6.0.
tidypolars 0.15.0
Breaking changes
- For consistency with `dplyr`, `distinct()` now only keeps the selected columns.
  To keep all columns, use `.keep_all = TRUE` (#227, @ppanko).
New features
- New argument `mkdir` in all `sink_*()` functions to recursively create the
  folder(s) specified in the path(s) to files (#236).
- New functions `partition_by_key()` and `partition_by_max_size()` that can be
  used in the `path` argument of `sink_*()` functions. Those enable writing a
  LazyFrame to several files as partitioned output. See more details in
  `?sink_parquet()` (#237).
- `bind_cols_polars()` now works with more than two LazyFrames (#244).
- Add support for `gsub()` (#250).
- Add partial support for `stringr::str_equal()` (#228).
- Add support for `lubridate` functions `rollbackward()`, `rollback()`, and
  `rollforward()` (#252).
- Support `stringr::fixed()` in more `stringr` functions (#250).
- Add support for argument `ignore.case` in `grepl()` (#251).
- Add support for argument `.keep_all` in `distinct()` (#227, @ppanko).
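
To illustrate the base R functions mentioned above, here is a short sketch of `gsub()` and of `grepl()` with `ignore.case`. It uses base R only, on a made-up character vector. How closely the `tidypolars` translations follow base R for every regex feature is not stated in these notes, so treat this as a reference for the intended semantics.

```r
# Illustrative input (hypothetical data).
fruits <- c("Apple", "banana", "Cherry")

# Case-sensitive search: only "banana" contains a lowercase "a".
has_a_sensitive <- grepl("a", fruits)

# ignore.case = TRUE also matches the uppercase "A" in "Apple".
has_a_insensitive <- grepl("a", fruits, ignore.case = TRUE)

# gsub() replaces every match, not only the first one.
replaced <- gsub("an", "AN", fruits)

print(has_a_sensitive)
print(has_a_insensitive)
print(replaced)
```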
Bug fixes
- Better error message in `group_by()` for unsupported argument `.drop` (#230).
- Better error message in `group_by()` when passing named expressions in `...`.
  `dplyr` supports those but it is increasingly recommended to use the `.by` /
  `by` argument in individual functions rather than using `group_by()` and
  `ungroup()` (#238).
- Better error message in `count()` when passing named expressions in `...`
  (#239).
- Fix bug in `join_where()` when all common column names between two DataFrames
  are used in the join conditions (#254).
- Using `%in%` with `NA` now retains the `NA` in the data. Using `%in% NA` will
  error (#256).
- Remove occasional deprecation message coming from Polars when using `%in%`
  (#259, @ppanko).
- Better handling of functions prefixed with `<pkg>::` (#261).
- Fix wrong behavior of `paste()` and `paste0()` with `collapse` (#263).
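
The `%in%` and `collapse` entries above refer to base R behavior, sketched below without `tidypolars`. The data is invented for the example. Note that `tidypolars` deliberately differs from base R in one respect mentioned above: `%in% NA` on its own errors there, whereas base R accepts it.

```r
# Illustrative vector (hypothetical data).
x <- c(1, NA, 2)

# In base R, NA matches NA in %in%, so the missing value is kept.
kept <- x[x %in% c(1, NA)]

# collapse joins all elements of the result into a single string.
joined  <- paste(c("a", "b", "c"), collapse = "-")
joined0 <- paste0("x", 1:3, collapse = "+")

print(kept)
print(joined)
print(joined0)
```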
Documentation
tidypolars 0.14.1
`tidypolars` requires `polars` >= 1.1.0 (#222).
Bug fixes
- Fix a corner case when `filter()` was used in a custom function with missing
  arguments (#220).
- In `grepl()`, the argument `fixed` is now used correctly (thanks @gernophil
  for the report, #223).
- `if_else()` and `ifelse()` now work when using named arguments (#224).
tidypolars 0.14.0
`tidypolars` requires `polars` >= 1.0.0. This release of `polars` contains
many breaking changes. Those should be invisible to `tidypolars` users, with
the exception of deprecation messages (see below). However, if your code
contains user-defined functions that use `polars` syntax, you may need to
revise those (#194).
Deprecations and breaking changes
- The following arguments are deprecated and will be removed in a future
  version. The recommended replacement is indicated on the right of the arrow
  (#194):
  - in `compute()` and `collect()`: `streaming` -> `engine`;
  - in `read_csv_polars()` and `scan_csv_polars()`:
    - `dtypes` -> `schema_overrides`
    - `reuse_downloaded` -> no replacement
  - in `read_ndjson_polars()` and `scan_ndjson_polars()`:
    - `reuse_downloaded` -> no replacement
  - in `read_ipc_polars()` and `scan_ipc_polars()`:
    - `memory_map` -> no replacement
  - in `write_csv_polars()` and `sink_csv()`:
    - `null_values` -> `null_value`
    - `quote` -> `quote_char`
  - in `write_ndjson_polars()`:
    - `pretty` -> no replacement
    - `row_oriented` -> no replacement
  - in `write_ipc_polars()`:
    - `future` -> `compat_level`
- `fetch()` is deprecated, use `head()` before `collect()` instead (#194).
- `group_keys()` now returns a `tibble` and not a `data.frame` anymore (#194).
- `lubridate::make_date()`, `lubridate::make_datetime()`, and `ISOdatetime()`
  now error if some components go over their expected range, e.g. `month = 20`
  or `hour = 25`. Before, those functions were returning `NA` in this situation
  (#194).
- `summary()` returns an additional row for the 50% percentile (#194).
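
For comparison with the new erroring behavior of the date constructors described above, this base R sketch shows what plain `ISOdatetime()` does with an out-of-range component: it returns `NA` rather than raising an error. The example only uses base R, and the specific dates are illustrative.

```r
# A valid date-time: every component is within its expected range.
valid <- ISOdatetime(2020, 12, 1, 0, 0, 0, tz = "UTC")

# month = 20 is out of range: base R silently returns NA here, whereas the
# release notes say the tidypolars translation now errors in this situation.
out_of_range <- ISOdatetime(2020, 20, 1, 0, 0, 0, tz = "UTC")

print(valid)
print(is.na(out_of_range))
```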
New features
- Added support for various `lubridate` functions:
  - `force_tz()` and `with_tz()` (@atsyplenkov, #170);
  - `date()` (@atsyplenkov, #181);
  - `today()` and `now()` (#183);
  - `weeks()`, `days()`, `hours()`, `minutes()`, `seconds()`, `milliseconds()`,
    `microseconds()`, `nanoseconds()` (#184).
- `tidypolars` can now use expressions that contain non-translated functions
  if those expressions do not use columns from the data.

  Example:

      dat <- pl$DataFrame(foo = c(2, 1, 2))
      a <- c("d", "e", "f")
      dat |> filter(foo >= agrep("a", a))

  `agrep()` is not a translated function so this used to error:

      Error in `filter()`:
      ! `tidypolars` doesn't know how to translate this function: `agrep()`.

  However, we see that `agrep("a", a)` doesn't use any column but instead an
  object in the environment so it can be evaluated without caring whether
  `tidypolars` knows this function or not:

      shape: (1, 1)
      ┌─────┐
      │ foo │
      │ --- │
      │ f64 │
      ╞═════╡
      │ 2.0 │
      └─────┘

  Note that this is evaluated before running `polars` in the background so this
  expression can't benefit from `polars` parallel evaluation for instance.
  Thanks @mgacc0 for the suggestion.
- Add support for `as.Date()` for character columns (#190).
- Error messages due to untranslated functions now suggest opening an issue to
  ask for their translation (#197).
- Add support for `%>%` in expressions (#200).
- Add support for `dplyr::tally()` (#203).
- `count()` and `add_count()` now warn or error when argument `wt` is used
  since it is not supported. The behavior depends on the global option
  `tidypolars_unknown_args` (#204).
- `tidypolars` has experimental support for fallback to R when a function is not
  internally translated to polars syntax. The default behavior is still to
  error, but the user can now set `options(tidypolars_fallback_to_r = TRUE)`
  to handle those unknown functions. See `?tidypolars_options` for
  details on the drawbacks of this approach (#205).
- Large performance improvement when using selection helpers (such as
  `contains()`) on data with many columns (#211).
- `tidypolars` now exports rules to be used with `flir` for detecting deprecated
  functions `describe_plan()` and `describe_optimized_plan()`. Those can be
  used in your project by following this article.
  Note that this requires `flir` 0.5.0.9000 or higher (#214).
Bug fixes
tidypolars 0.13.0
New features
- Added support for `stringr::str_replace_na()` (#153).
- Better checks for unknown and unsupported arguments in `compute()`,
  `collect()`, `*_join()`, `pivot_*()`, `sink_*()`, `slice_sample()` and
  `uncount()` (#158, thanks @fkohrt for the report). Now, when those
  functions receive:
  - an argument that exists in the `tidyverse` implementation but is not
    supported by `tidypolars`, they warn the user. This default behaviour can be
    changed to error instead with `options(tidypolars_unknown_args = "error")`.
  - an argument that doesn't exist at all, they error.
- Add support for argument `explicit` in `tidyr::complete()`.
- Add option to keep track of filenames in `scan_csv_polars()` (#171, @ginolhac).
- Add partial support for `seq()` (argument `length.out` is not supported) and
  `seq_len()`.
- `complete()` now accepts named elements, e.g.
  `complete(df, group, value = 1:4)` (#176).
- Add support for several `lubridate` functions: `am()`, `pm()`, `leap_year()`,
  `days_in_month()` (#178).
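
As a quick reference for the `seq()` and `seq_len()` entry above, this base R sketch shows the two call forms that do not rely on `length.out` (which the notes say is unsupported). It does not call `tidypolars`, and the values are illustrative.

```r
# seq() with explicit bounds and step: 1, 4, 7, 10.
by_three <- seq(1, 10, by = 3)

# seq_len() builds 1..n and returns an empty vector for n = 0.
first_four <- seq_len(4)
empty      <- seq_len(0)

print(by_three)
print(first_four)
print(length(empty))
```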
Bug fixes
- Fix edge cases in the `tidypolars` implementation of `stringr::str_sub()`
  and `substr()` compared to their original implementation (#159).
- `arrange()` now places `NA` values last, like `dplyr`.
tidypolars 0.12.0
tidypolars requires polars >= 0.21.0.
Breaking changes
- `summarize()` now drops the last group of the output by default (for
  consistency with `dplyr`). Previously it kept the same groups as in the input
  data (#149).
New features
tidypolars 0.11.0
tidypolars requires polars >= 0.20.0.
Breaking changes
- `arrange()` now errors with unknown variable names (like `dplyr::arrange()`).
  Previously, unknown variables were silently ignored. Using expressions (like
  `a + b`) is now accepted (#144).
- The parameter `inherit_optimization` is removed from all `sink_*()` functions.
New features
- The power operators `^` and `**` now work.
- New function `sink_ndjson()` to write the results of a lazy query to an NDJSON
  file without collecting it in memory.
- `inner_join()` now accepts inequality joins in the `by` argument, including
  the following helpers: `between()`, `overlaps()`, `within()` (#148).
Bug fixes
- Using an external object in `case_when()`, `if_else()` and `ifelse()` now
  works.
- `str_sub()` doesn't error anymore when `start` is positive and `end` is
  negative.
- `read_*_polars()` functions used to return a standard `data.frame` by mistake.
  They now return a Polars DataFrame.
- Using `[` for subsetting in expressions now works. Thanks @ginolhac for the
  report (#141).
- `bind_cols_polars()` and `bind_rows_polars()` now error (as expected before) if
  elements are a mix of Polars DataFrames and LazyFrames.