[TASK] Deprecate `Parser::setCharset()` and `Parser::getCharset()` #688

oliverklee · 2024-08-27T14:19:51Z

Fixes #681

JakeQZ

The jury's still out. I don't honestly know if these are useful methods, and thus whether they should be retained or not.

@sabberworm WDYT?

Not sure the deprecation comment should reference an issue number, but maybe it should.

sabberworm · 2024-08-29T23:08:17Z

@JakeQZ I don’t know the current state of affairs regarding charset support. IIRC, the original idea was to accept lots of different charsets and convert them all to UTF-8 for parsing, but still use the original charset when stringifying. Since the world has pretty much settled on UTF-8, I don’t think this is at all relevant anymore. Only handling UTF-8 would also rid us of the need to use the mbstring functions.

oliverklee · 2024-09-01T15:09:42Z

Some notes on these two methods, and why I think we can deprecate (and then remove) them:

- Neither method is used internally.
- `getCharset()` is missing the return statement and hence cannot work.
- The recommended way to set the charset (as documented in the README) is to provide the parser with a corresponding `Settings` instance, not to call `Parser::setCharset()`.

JakeQZ · 2024-09-02T23:09:49Z

OK. It seems we're in agreement that these can be deprecated.

Regarding parsing different charsets, I think we should continue to support @charset as long as it's part of the CSS specification. When stringifying, we can render UTF-8. However, this approach won't allow us to stop using the mbstring functions until @charset is deprecated in the CSS spec.

sabberworm · 2024-09-05T07:07:32Z

> However, this approach won't allow us to stop using the mbstring functions until @charset is deprecated in the CSS spec.

I think we could still drop mbstring, if we use iconv to convert to UTF-8 before parsing.

In essence: have some heuristic to determine the input encoding (BOM, @charset, try a few common charsets and pick the first one that doesn’t produce errors), then convert to UTF-8 and, from that point on, all the tokens of interest to us will be ASCII-only and can be parsed using regular string functions.
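The heuristic described above could be sketched roughly as follows. This is only an illustration, not the library's implementation: the function name `convertCssToUtf8()` and the default fallback list are hypothetical, and how strictly `iconv()` rejects invalid input depends on the underlying iconv implementation (glibc is assumed here).

```php
<?php

/**
 * Hypothetical helper (not part of the library): guesses the encoding of
 * the CSS input and returns the input converted to UTF-8.
 *
 * @param string[] $fallbackCharsets assumed list of common charsets to try
 */
function convertCssToUtf8(string $css, array $fallbackCharsets = ['UTF-8', 'Windows-1252']): string
{
    // Step 1: a byte order mark decides the encoding and is stripped.
    $boms = [
        "\xEF\xBB\xBF" => 'UTF-8',
        "\xFE\xFF" => 'UTF-16BE',
        "\xFF\xFE" => 'UTF-16LE',
    ];
    foreach ($boms as $bom => $charset) {
        if (strncmp($css, $bom, strlen($bom)) === 0) {
            $converted = @iconv($charset, 'UTF-8', substr($css, strlen($bom)));
            if ($converted !== false) {
                return $converted;
            }
        }
    }

    // Step 2: an @charset rule at the very start is tried before the fallbacks.
    $candidates = $fallbackCharsets;
    if (preg_match('/\A@charset "([^"]+)";/', $css, $matches) === 1) {
        array_unshift($candidates, $matches[1]);
    }

    // Step 3: the first candidate that converts without an error wins.
    // The @ operator hides the warning/notice iconv() raises on failure.
    foreach ($candidates as $charset) {
        $converted = @iconv($charset, 'UTF-8', $css);
        if ($converted !== false) {
            return $converted;
        }
    }

    throw new \RuntimeException('Could not determine the charset of the CSS input.');
}
```

After such a conversion, the structural tokens are plain ASCII bytes, so the parser could work with the regular string functions. Note that this sketch leaves any `@charset` rule in the output untouched, so a real implementation would probably need to rewrite or drop it.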

oliverklee · 2024-09-05T07:36:04Z

@sabberworm Sounds good! I've added #706 for this.

oliverklee added the to-backport label Aug 27, 2024

oliverklee added this to the 9.0.0: Drop support for PHP < 7.2 and clean up things milestone Aug 27, 2024

oliverklee requested review from sabberworm and JakeQZ August 27, 2024 14:19

oliverklee self-assigned this Aug 27, 2024

oliverklee mentioned this pull request Aug 27, 2024

[TASK] Mark parsing-internal classes and methods as @internal #674

Merged

JakeQZ reviewed Aug 28, 2024

oliverklee force-pushed the task/deprecate branch from d9edd5e to d07b100 Compare August 28, 2024 22:03

oliverklee force-pushed the task/deprecate branch from d07b100 to b1ccd56 Compare September 1, 2024 15:02

[TASK] Deprecate Parser::setCharset() and Parser::getCharset()

1ccd661

Fixes #681

oliverklee force-pushed the task/deprecate branch from b1ccd56 to 1ccd661 Compare September 1, 2024 15:07

JakeQZ approved these changes Sep 2, 2024

JakeQZ merged commit 1be844b into main Sep 2, 2024
21 checks passed

JakeQZ deleted the task/deprecate branch September 2, 2024 23:11

oliverklee added a commit that referenced this pull request Sep 3, 2024

[TASK] Deprecate Parser::setCharset() and Parser::getCharset()

421f64f

This is the backport of #688, which fixed #681.

oliverklee mentioned this pull request Sep 3, 2024

[TASK] Deprecate Parser::setCharset() and Parser::getCharset() #703

Merged

JakeQZ pushed a commit that referenced this pull request Sep 3, 2024

[TASK] Deprecate Parser::setCharset() and Parser::getCharset() (#703)

592f416

This is the backport of #688, which fixed #681.

oliverklee mentioned this pull request Sep 5, 2024

Add a heuristic for determining the charset #706

Open

JakeQZ mentioned this pull request Jan 25, 2025

[DOCS] Integrate the 8.x changelog into the main changelog #806

Merged
