snakecase::to_any_case() causes warnings on 1st run due to UTF-8 characters #191

@cjyetman

Honestly, I can only reliably replicate this in one specific environment (2dii/r-packages), but it seems to be related to how UTF-8 characters are defined in replace_special_characters_internal.R.

snakecase::to_any_case("\u00E4ngstlicher Has\u00EA", transliterations = c("german", "Latin-ASCII"))
warnings()
sessionInfo()
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

snakecase::to_any_case("\u00E4ngstlicher Has\u00EA", transliterations = c("german", "Latin-ASCII"))
[1] "aengstlicher_hase"
There were 13 warnings (use warnings() to see them)
warnings()
Warning messages:
1: In FUN(X[[i]], ...) : unable to translate '<U+00C4>' to native encoding
2: In FUN(X[[i]], ...) : unable to translate '<U+00D6>' to native encoding
3: In FUN(X[[i]], ...) : unable to translate '<U+00DC>' to native encoding
4: In FUN(X[[i]], ...) : unable to translate '<U+00E4>' to native encoding
5: In FUN(X[[i]], ...) : unable to translate '<U+00F6>' to native encoding
6: In FUN(X[[i]], ...) : unable to translate '<U+00FC>' to native encoding
7: In FUN(X[[i]], ...) : unable to translate '<U+00DF>' to native encoding
8: In FUN(X[[i]], ...) : unable to translate '<U+00C6>' to native encoding
9: In FUN(X[[i]], ...) : unable to translate '<U+00E6>' to native encoding
10: In FUN(X[[i]], ...) : unable to translate '<U+00D8>' to native encoding
11: In FUN(X[[i]], ...) : unable to translate '<U+00F8>' to native encoding
12: In FUN(X[[i]], ...) : unable to translate '<U+00C5>' to native encoding
13: In FUN(X[[i]], ...) : unable to translate '<U+00E5>' to native encoding
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.0.3 magrittr_1.5 snakecase_0.11.0 tools_4.0.3
[5] stringi_1.5.3 stringr_1.4.0

I can also replicate getting these warnings with...

snakecase:::replace_special_characters_internal

or...

get("replace_special_characters_internal", envir = asNamespace("snakecase"), inherits = FALSE)

In each case, the warnings appear only the first time the code is run in a session.

The warnings appear when snakecase:::replace_special_characters_internal() is loaded for the first time. I ended up here because janitor::make_clean_names() calls snakecase:::replace_special_characters_internal(), triggering these warnings.

Maybe, for instance, intToUtf8(196) would be safer than "\u00C4"? 🤷🏻
Any idea what's causing this?
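
To make the intToUtf8() idea concrete, here is a minimal sketch in base R. It only shows that a character built from its code point matches the escaped literal and is marked as UTF-8. Whether this would actually avoid the first-run warnings under a C locale is an assumption and is not verified here. The name `german_map` and its contents are hypothetical and are not taken from the snakecase source.

```r
# "Ä" is U+00C4, which is 196 in decimal (220 would be U+00DC, "Ü").
a_umlaut_escape <- "\u00C4"
a_umlaut_int    <- intToUtf8(196)

# Both spellings should describe the same single character.
identical(a_umlaut_escape, a_umlaut_int)

# The string built from the code point is explicitly marked as UTF-8.
Encoding(a_umlaut_int)

# Round trip back to the code point.
utf8ToInt(a_umlaut_int)

# Hypothetical lookup table: names are built from code points and attached
# with setNames() instead of being typed as literals in the source file.
german_map <- stats::setNames(
  c("Ae", "Oe", "Ue"),
  intToUtf8(c(196, 214, 220), multiple = TRUE)
)
unname(german_map[a_umlaut_int])
```

The point of the design is that the source file stays pure ASCII, and the non-ASCII names are only created at run time as UTF-8-marked strings.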
