Skip to content

[TASK] Deprecate Parser::setCharset() and Parser::getCharset() #688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 2, 2024

Conversation

oliverklee
Copy link
Contributor

Fixes #681

Copy link
Contributor

@JakeQZ JakeQZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jury's still out. I don't honestly know if these are useful methods, and thus whether they should be retained or not.

@sabberworm WDYT?

Not sure the deprecation comment should reference an issue number, but maybe it should.

@sabberworm
Copy link
Contributor

@JakeQZ I don’t know the current state of affairs regarding charset support. IIRC, the original idea was to accept lots of different charsets and convert them all to UTF-8 for parsing and but still use it when stringifying. Since the world has pretty much settled on UTF-8, I don’t think this is at all relevant anymore. Only handling UTF-8 would also rid us of the need to use the mbstring functions.

@oliverklee
Copy link
Contributor Author

Some notes on these two methods, and why I think we can deprecate (and then remove) them:

  • Neither method is used internally.
  • getCharset is missing the return statement and hence cannot work.
  • The recommended way to set the charset (as documented in the README) is to provide the parser with a corresponding Settings instance, but not by calling Parser::setCharset().

@JakeQZ
Copy link
Contributor

JakeQZ commented Sep 2, 2024

OK. It seems we're in agreement that these can be deprecated.

Regarding parsing different charsets, I think we should continue to support @charset as long as it's part of the CSS specification. When stringifying, we can render UTF-8. However, this approach won't allow us to stop using the mbstring functions until @charset is deprecated in the CSS spec.

@JakeQZ JakeQZ merged commit 1be844b into main Sep 2, 2024
21 checks passed
@JakeQZ JakeQZ deleted the task/deprecate branch September 2, 2024 23:11
oliverklee added a commit that referenced this pull request Sep 3, 2024
JakeQZ pushed a commit that referenced this pull request Sep 3, 2024
@sabberworm
Copy link
Contributor

sabberworm commented Sep 5, 2024

However, this approach won't allow us to stop using the mbstring functions until @charset is deprecated in the CSS spec.

I think we could still drop mbstring, if we use iconv to convert to UTF-8 before parsing.

In essence: have some heuristic to determine the input encoding (BOM, @charset, try a few common charsets and pick the first one that doesn’t produce errors), then convert to UTF-8 and, from that point on, all the tokens of interest to us will be ASCII-only and can be parsed using regular string functions.

@oliverklee
Copy link
Contributor Author

@sabberworm Sounds good! I've added #706 for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate Parser::setCharset/getCharset
3 participants