Skip to content

feat: timezone conversion and assertion#170

Merged
etiennebacher merged 19 commits intoetiennebacher:mainfrom
atsyplenkov:lubridate
Mar 31, 2025
Merged

feat: timezone conversion and assertion#170
etiennebacher merged 19 commits intoetiennebacher:mainfrom
atsyplenkov:lubridate

Conversation

@atsyplenkov
Copy link
Copy Markdown
Contributor

Hey @etiennebacher 👋

I decided to start by adding more support to lubridate package. I can see that it is a bit less structured than the other code in your package, so I am unsure how to best write it. If you are happy with the following approach, I will continue filling tidypolars with lubridate functions.

P.S. Note that I have added the check_timezone function, which throws an error earlier if the timezone code is not supported by polars. TIL that some polars functions allow a NULL timezone, while others do not.

@etiennebacher
Copy link
Copy Markdown
Owner

Thanks, just so you know I won't be able to review this for a few days

@etiennebacher
Copy link
Copy Markdown
Owner

etiennebacher commented Feb 1, 2025

Hi @atsyplenkov, the PR looks good but I'd like the testing to be a bit more thorough. I have been adding more property-based testing to tidypolars, including for lubridate functions. Those can be found in test-funs-date.R while regular unit tests are in test-funs_date.R (I know, those names could be made more explicit).

If you're not familiar with property-based testing in R, I have a blog post on this: https://www.etiennebacher.com/posts/2024-10-01-using-property-testing-in-r/

For example, for force_tz(), here's what it looks like (combined with parametric testing using patrick):

patrick::with_parameters_test_that(
  "changing timezone works", {
    for_all(
      tests = 20,
      datetime = posixct_(any_na = TRUE),
      property = function(datetime) {
        test_df <- data.frame(x1 = ymd_hms(datetime, tz = "UTC"))
        test <- pl$DataFrame(x1 = ymd_hms(datetime, tz = "UTC"))

        expect_equal_or_both_error(
          mutate(
            test,
            x1 = force_tz(x1, tz)
          ),
          mutate(
            test_df,
            x1 = force_tz(x1, tz)
          ),
          tolerance = 1e-6
        )
      }
    )
  },
  tz = list("Pacific/Auckland", "foobar", "", NA, c("Pacific/Auckland", "Africa/Abidjan"))
)

Running this may already uncover some bugs in the current implementation. It's more tedious to write and to pass but when it does then we can be confident that tidypolars and lubridate give the same results for a wide array of cases.

Could you rework the tests to include this type of testing? Don't hesitate to ask if you run into some issues while doing this. Note that if you need to convert some polars expressions to R values for the tzone argument then you can use polars_expr_to_r().

@atsyplenkov
Copy link
Copy Markdown
Contributor Author

Thanks for the review. I am not familiar with property-based testing, but your blog post looks promising. I'll revise the PR during this week and come back to you

@atsyplenkov
Copy link
Copy Markdown
Contributor Author

@etiennebacher I cannot yet write the property-based testing yet, because the current implementation of with_tz and force_tz don't work when tzone argument passed externally, i.e.:

library(tidyverse)
devtools::load_all()

TimeZone <- base::OlsonNames()[100]
TimeZone
#> "America/Creston"

test_df <- data.frame(
  dt_utc = c(
    ymd_hms(
      c(
        "2012-03-26 12:00:00",
        "2020-01-01 12:00:00",
        "2023-12-14 12:00:00",
        NA
      ),
      tz = "UTC"
    )
  )
)

as_polars_df(test_df) |>
  mutate(
    dt_utc = with_tz(dt_utc, "America/Creston")
  )

#> shape: (4, 1)
#> ┌───────────────────────────────┐
#> │ dt_utc                        │
#> │ ---                           │
#> │ datetime[ms, America/Creston] │
#> ╞═══════════════════════════════╡
#> │ 2012-03-26 05:00:00 MST       │
#> │ 2020-01-01 05:00:00 MST       │
#> │ 2023-12-14 05:00:00 MST       │
#> │ null                          │
#> └───────────────────────────────┘

as_polars_df(test_df) |>
  mutate(
    dt_utc = with_tz(dt_utc, TimeZone)
  )

#> Error in `mutate()`:
#>   ! Error while running function `with_tz()` in Polars.
#>   ✖ Invalid 'x' type in 'x && y'
#>   Hide Traceback
#>        ▆
#>     1. ├─dplyr::mutate(...)
#>     2. └─tidypolars:::mutate.RPolarsDataFrame(as_polars_df(test_df), dt_utc = with_tz(dt_utc, TimeZone))
#>     3.   └─tidypolars:::translate_dots(...) at tidypolars/R/mutate.R:117:3
#>     4.     └─base::lapply(...) at tidypolars/R/utils-expr.R:36:3
#>     5.       └─tidypolars (local) FUN(X[[i]], ...)
#>     6.         └─tidypolars:::translate_expr(...) at tidypolars/R/utils-expr.R:46:5
#>     7.           └─tidypolars:::translate(...) at tidypolars/R/utils-expr.R:145:5
#>     8.             └─base::tryCatch(...) at tidypolars/R/utils-expr.R:622:7
#>     9.               └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>    10.                 └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>    11.                   └─value[[3L]](cond)
#>    12.                     └─rlang::abort(...) at tidypolars/R/utils-expr.R:638:13

I cannot wrap my head around on how to solve this issue.

@etiennebacher
Copy link
Copy Markdown
Owner

etiennebacher commented Feb 7, 2025

The issue is that when we handle tz in check_timezone(), it has already been converted to a polars expression, e.g. pl$lit(tz). Therefore we cannot use length(tz) for instance. The solution is to convert this back to an R expression using polars_expr_to_r(). This is a bit unfortunate but I don't have better solution for now because some function arguments need to be translated (e.g. x in mean(x, na.rm = TRUE)) but others don't (e.g. na.rm in mean(x, na.rm = TRUE)), and I don't have a way of knowing this in advance.

I also just realized that lubridate allows several timezones in a single column, for instance by passing tz = c("Europe/London", "Pacific/Auckland"), but Polars doesn't. I have added a message for this, and it should be mentioned in the table notes.

This now works:

patrick::with_parameters_test_that(
  "changing timezone works", {
    for_all(
      tests = 20,
      datetime = posixct_(any_na = TRUE),
      property = function(datetime) {
        test_df <- data.frame(x1 = ymd_hms(datetime, tz = "UTC"))
        test <- pl$DataFrame(x1 = ymd_hms(datetime, tz = "UTC"))

        expect_equal_or_both_error(
          mutate(
            test,
            x1 = force_tz(x1, tz)
          ),
          mutate(
            test_df,
            x1 = force_tz(x1, tz)
          ),
          tolerance = 1e-6
        )
      }
    )
  },
  tz = list("Pacific/Auckland", "foobar", "", NA)
)

Can you also:

  • add tests for the various error messages in check_timezone()
  • add tests for with_tz()
  • add those two functions in the table of translated functions in the vignette linked above, mentioning the caveat regarding multiple timezones
  • add an item in NEWS

?

@atsyplenkov
Copy link
Copy Markdown
Contributor Author

Thanks! I'll do it.

@etiennebacher
Copy link
Copy Markdown
Owner

etiennebacher commented Feb 19, 2025

@atsyplenkov just FYI I plan on doing a new release before mid-March so let me know if you need some help to finish this PR before that.

@atsyplenkov
Copy link
Copy Markdown
Contributor Author

@etiennebacher Hey, sorry for taking so long. I have a very tight deadline at work and couldn't allocate enough time to finish this PR. I should have some time after the 25th of February, and I can submit my revisions then.

@etiennebacher
Copy link
Copy Markdown
Owner

No worries, I'm not in a rush at all, I just wanted to inform you in case those functions are important in some of your project . Take your time, good luck with work 🙂

@atsyplenkov
Copy link
Copy Markdown
Contributor Author

Hey, just a quick update—I haven't forgotten about this PR, I'm just still extremely busy.

@etiennebacher
Copy link
Copy Markdown
Owner

No worries, finish when you can. I'll make a new release since there are interesting fixes and features I think.

@atsyplenkov
Copy link
Copy Markdown
Contributor Author

Gee, what a rabbit hole of the unrecognized timezone handling. TBH, I don't know what to do.

The problem is that polars does not allow replacing the timezone with a gibberish, while lubridate (and R in general) allows. At the same time, with_tz() can handle it, but force_tz() cannot handle (see issue tidyverse/lubridate#1186).

Therefore, I am keen to throw an early error when an unrecognized timezone is passed. This will be reflected in the number of property-based tests we can implement. See my comment there

Apart from that (sorry for the useless commit name, I accidentally hit enter too early):

  • added tests for the various error messages in check_timezone()
  • added property tests for with_tz() and force_tz()
  • updated the documentation.

@@ -1,3 +1,6 @@
# TODO:
# Sometimes this test fails, e.g. try seed == 1
set.seed(123)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my machine, the "extracting date components works" test is failing with set.seed(1). Can you check if it the case with you as well?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an error in polars:

date <- as.POSIXct("2493-05-15 00:00:00 CEST")

data.frame(x1 = date)
#>           x1
#> 1 2493-05-15
polars::pl$DataFrame(x1 = date)
#> shape: (1, 1)
#> ┌─────────────────────┐
#> │ x1                  │
#> │ ---                 │
#> │ datetime[ms]        │
#> ╞═════════════════════╡
#> │ 2493-05-14 23:00:00 │ # <<<<<<<<< shows 2493-05-14 and not 2493-05-15
#> └─────────────────────┘

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, you've got an eagle eye!

# TODO: see https://github.com/tidyverse/lubridate/issues/1186
# If lubridate changed the behavior to error, then we can add "foo" and ""
tz = list("Pacific/Auckland", "US/Alaska", NA)
tz = list("Pacific/Auckland", "Europe/Rome", NA)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like "US/Alaska" in not in OlsonNames()

@etiennebacher
Copy link
Copy Markdown
Owner

The problem is that polars does not allow replacing the timezone with a gibberish, while lubridate (and R in general) allows. At the same time, with_tz() can handle it, but force_tz() cannot handle (see issue tidyverse/lubridate#1186).

Thanks for opening the upstream issue, I think this should error so polars does the right thing IMO. Let's uncomment the test if lubridate changes the behavior to error instead.

I'll wrap up the PR, many thanks for the contribution!

@atsyplenkov
Copy link
Copy Markdown
Contributor Author

Awesome, thanks for your guidance and help!

Just to let you know, I started wrapping some other lubridate functions. Hope that their implementation will go smoother

@etiennebacher
Copy link
Copy Markdown
Owner

Great, looking forward to seeing them!

@etiennebacher etiennebacher merged commit 8286564 into etiennebacher:main Mar 31, 2025
8 checks passed
@atsyplenkov atsyplenkov deleted the lubridate branch April 17, 2025 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants