Skip to content

feat!: Update for polars 1.0.0#194

Merged
etiennebacher merged 56 commits intomainfrom
neopolars
Jul 21, 2025
Merged

feat!: Update for polars 1.0.0#194
etiennebacher merged 56 commits intomainfrom
neopolars

Conversation

@etiennebacher
Copy link
Copy Markdown
Owner

@etiennebacher etiennebacher commented May 18, 2025

No description provided.

@etiennebacher etiennebacher marked this pull request as ready for review July 15, 2025 21:56
@etiennebacher etiennebacher changed the title fix: Prepare for neopolars feat!: Update for polars 1.0.0 Jul 16, 2025
@eitsupi
Copy link
Copy Markdown
Contributor

eitsupi commented Jul 16, 2025

Thanks for working on this. Some thoughts:

  • compute() and collect() can be rewritten with as_polars_df() and as_tibble(), and ... can be passed to these functions.
  • I recall that I had not implemented <lazyframe>$`_fetch`(). I will add it later. This is completely different from <lazyframe>$head().

@etiennebacher
Copy link
Copy Markdown
Owner Author

etiennebacher commented Jul 16, 2025

compute() and collect() can be rewritten with as_polars_df() and as_tibble(), and ... can be passed to these functions.

Thanks, but that doesn't simplify much IMO.

I recall that I had not implemented <lazyframe>$`_fetch`(). I will add it later. This is completely different from <lazyframe>$head().

This was deprecated one year ago, which is why it wasn't implemented in the rewrite. The deprecation message recommended using head + collect, hence my implementation: https://github.com/pola-rs/polars/pull/17278/files#diff-6f7ee6a0f7f99cb158f4d68d883a118b9fab6175602e8696739d900c8b0d864f

@eitsupi
Copy link
Copy Markdown
Contributor

eitsupi commented Jul 16, 2025

Thanks, but that doesn't simplify much IMO.

There are several additional arguments that are passed to.

hence my implementation

If so, why is it not marked as deprecated?

@etiennebacher
Copy link
Copy Markdown
Owner Author

If so, why is it not marked as deprecated?

It was an oversight on my side, I thought this was part of the functions exported by dplyr (like collect and compute) but it actually isn't, thanks for noticing.

@etiennebacher etiennebacher marked this pull request as draft July 16, 2025 12:39
@etiennebacher
Copy link
Copy Markdown
Owner Author

etiennebacher commented Jul 17, 2025

@eitsupi currently collect() checks empty dots. I'd like to allow passing arguments in the dots so that I don't have to keep the documentation up to date and can just say something like

... accepts all arguments to specify the conversion from polars to R. See polars:::?as_tibble.polars_lazy_frame() for accepted arguments.

The problem with this approach is that wrong arguments would be undetected since as_tibble.polars_lazy_frame() doesn't perform checks on dots:

library(polars)
library(tibble)

pl$LazyFrame(x = 1) |> as_tibble(foo = 2)
#> # A tibble: 1 × 1
#>       x
#>   <dbl>
#> 1     1

Should we add a check like that on as_tibble() for LazyFrame and DataFrame in polars?


Edit: in the meantime, I've manually added the arguments and point to ?polars:::as.data.frame.polars_lazy_frame in the docs.

@etiennebacher etiennebacher marked this pull request as ready for review July 17, 2025 21:43
@eitsupi
Copy link
Copy Markdown
Contributor

eitsupi commented Jul 18, 2025

@etiennebacher In my opinion, S3 methods should simply ignore empty dots. Otherwise, it becomes difficult to write common processes for different classes.
So <lazyframe>$collect() checks for the dots, but as_polars_df(<lazyframe>) simply ignores the dots.

As I wrote before, consider making compute() and collect() just wrappers around as_polars_df() and as_tibble() documenting dots with @inheritDotParams lazyframe__collect.
https://roxygen2.r-lib.org/articles/reuse.html#automatically-copy-tags

@etiennebacher
Copy link
Copy Markdown
Owner Author

I get an error with @inheritDotParams polars:::lazyframe__collect:

Error in FUN():
! argument "source" is missing, with no default

Not sure this works with functions accessed via :::.

In any case, I want to keep the dots check in collect(), so I won't just pass the dots around. Regarding compute(), I don't see any args in polars:::as_polars_df.polars_lazy_frame() that are not supported here so I'll just leave it like this.

Thanks for your help! I think this is ready for 1.0.0 now.

@eitsupi
Copy link
Copy Markdown
Contributor

eitsupi commented Jul 20, 2025

Sorry, I didn't understand the inheritance of arguments well.
As you say, lazyframe__collect is not a public function, so it cannot be inherited.

@etiennebacher
Copy link
Copy Markdown
Owner Author

Blocked by pola-rs/r-polars#1439, there's a substantial performance decrease on one of my test data (from 10s to 48s).

@eitsupi
Copy link
Copy Markdown
Contributor

eitsupi commented Jul 21, 2025

(from 10s to 48s)

Can such a large slowdown be explained by pl$col alone?
What about the results of profvis::profvis?

@etiennebacher
Copy link
Copy Markdown
Owner Author

etiennebacher commented Jul 21, 2025

Not all of it, but a pretty large part. Here's the flamegraph with 0.22.4:

image

and with 1.0.0:

image

(It's 36s on the picture and not 48s anymore, but the point is pl$col() is the culprit)

@eitsupi
Copy link
Copy Markdown
Contributor

eitsupi commented Jul 21, 2025

I see, I don't think this is a blocker anyway (it's caused by something already released upstream and there's nothing we can do about it here)

@etiennebacher
Copy link
Copy Markdown
Owner Author

Right, I can merge this now that the installation of polars from r-multiverse points to 1.0.0

@etiennebacher etiennebacher merged commit d1e7c90 into main Jul 21, 2025
10 checks passed
@etiennebacher etiennebacher deleted the neopolars branch July 21, 2025 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants