Expand class-level documentation for WP_HTML_Tag_Processor
#44478
Merged
7 commits
85d23e5 Expand class-level documentation for `WP_HTML_Tag_Processor` (dmsnell)
c29fc77 Apply suggestions from code review (dmsnell)
cde4c87 Rework loop counter in example (dmsnell)
f47d189 Use array shorthand notation in example docs (dmsnell)
58b955b Fix upper-case/lower-case confusion with tag names. (dmsnell)
1c1d09f Reference $tags-> instead of $processor-> (dmsnell)
aa746e6 undo rename (dmsnell)
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -17,6 +17,9 @@ | |||||
* E.g. match having class `1<"2` needs to recognize `class="1<"2"`. | ||||||
* @TODO: Decode character references in `get_attribute()` | ||||||
* @TODO: Properly escape attribute value in `set_attribute()` | ||||||
* @TODO: Add slow mode to escape character entities in CSS class names? | ||||||
* (This requires a custom decoder since `html_entity_decode()` | ||||||
* doesn't handle attribute character reference decoding rules.) | ||||||
* | ||||||
* @package WordPress | ||||||
* @subpackage HTML | ||||||
|
@@ -28,6 +31,152 @@ | |||||
* of patches to that input. Tokenizes HTML but does not fully | ||||||
* parse the input document. | ||||||
* | ||||||
* ## Usage | ||||||
* | ||||||
* Use of this class requires three steps: | ||||||
* | ||||||
* 1. Create a new class instance with your input HTML document. | ||||||
* 2. Find the tag(s) you are looking for. | ||||||
* 3. Request changes to the attributes in those tag(s). | ||||||
* | ||||||
* Example: | ||||||
* ```php | ||||||
* $tags = new WP_HTML_Tag_Processor( $html ); | ||||||
* if ( $tags->next_tag( [ 'tag_name' => 'option' ] ) ) { | ||||||
* $tags->set_attribute( 'selected', true ); | ||||||
* } | ||||||
* ``` | ||||||
* | ||||||
* ### Finding tags | ||||||
* | ||||||
* The `next_tag()` method moves the internal cursor through | ||||||
* your input HTML document until it finds a tag meeting all of | ||||||
* the supplied restrictions in the optional query argument. If | ||||||
* no argument is provided then it will find the next HTML tag, | ||||||
* regardless of what kind it is. | ||||||
* | ||||||
* If you want to _find whatever the next tag is_: | ||||||
* ```php | ||||||
* $tags->next_tag(); | ||||||
* ``` | ||||||
* | ||||||
* | Goal | Query | | ||||||
* |-----------------------------------------------------------|----------------------------------------------------------------------------| | ||||||
* | Find any tag. | `$tags->next_tag();` | | ||||||
* | Find next image tag. | `$tags->next_tag( [ 'tag_name' => 'img' ] );` | | ||||||
* | Find next tag containing the `fullwidth` CSS class. | `$tags->next_tag( [ 'class_name' => 'fullwidth' ] );` | | ||||||
* | Find next image tag containing the `fullwidth` CSS class. | `$tags->next_tag( [ 'tag_name' => 'img', 'class_name' => 'fullwidth' ] );` | | ||||||
* | ||||||
* If a tag was found meeting your criteria then `next_tag()` | ||||||
* will return `true` and you can proceed to modify it. If it | ||||||
* returns `false`, however, it failed to find the tag and | ||||||
* moved the cursor to the end of the document. | ||||||
* | ||||||
* Once the cursor reaches the end of the document the processor | ||||||
* is done. If you want to reach an earlier tag you will | ||||||
* need to recreate the processor and start over. The internal | ||||||
* cursor can only proceed forward, never backing up. | ||||||
* | ||||||
* #### Custom queries | ||||||
* | ||||||
* Sometimes it's necessary to inspect an HTML tag more closely | ||||||
* than the query syntax here permits. In these cases you can | ||||||
* filter the search results yourself using the read-only methods | ||||||
* provided by the processor, along with external state or variables. | ||||||
* | ||||||
* Example: | ||||||
* ```php | ||||||
* // Paint up to the first five DIV or SPAN tags marked with the "jazzy" style. | ||||||
* $remaining_count = 5; | ||||||
* while ( $remaining_count > 0 && $tags->next_tag() ) { | ||||||
* if ( | ||||||
* ( 'DIV' === $tags->get_tag() || 'SPAN' === $tags->get_tag() ) && | ||||||
* 'jazzy' === $tags->get_attribute( 'data-style' ) | ||||||
* ) { | ||||||
* $tags->add_class( 'theme-style-everest-jazz' ); | ||||||
* $remaining_count--; | ||||||
* } | ||||||
* } | ||||||
* ``` | ||||||
* | ||||||
* `get_attribute()` will return `null` if the attribute wasn't present | ||||||
* on the tag when it was called. It may return `""` (the empty string) | ||||||
* in cases where the attribute was present but its value was empty. | ||||||
* For boolean attributes, those whose name is present but no value is | ||||||
* given, it will return `true` (the only way to set `false` for an | ||||||
* attribute is to remove it). | ||||||
* | ||||||
* ### Modifying HTML attributes for a found tag | ||||||
* | ||||||
* Once you've found the start of an opening tag you can modify | ||||||
* any number of the attributes on that tag. You can set a new | ||||||
* value for an attribute, remove the entire attribute, or do | ||||||
* nothing and move on to the next opening tag. | ||||||
* | ||||||
* Example: | ||||||
* ```php | ||||||
* if ( $tags->next_tag( [ 'class_name' => 'wp-group-block' ] ) ) { | ||||||
* $tags->set_attribute( 'title', 'This groups the contained content.' ); | ||||||
* $tags->remove_attribute( 'data-test-id' ); | ||||||
* } | ||||||
* ``` | ||||||
* | ||||||
* If `set_attribute()` is called for an existing attribute it will | ||||||
* overwrite the existing value. Similarly, calling `remove_attribute()` | ||||||
* for a non-existing attribute has no effect on the document. Both | ||||||
* of these methods are safe to call without knowing if a given attribute | ||||||
* exists beforehand. | ||||||
* | ||||||
* ### Modifying CSS classes for a found tag | ||||||
* | ||||||
* The tag processor treats the `class` attribute as a special case. | ||||||
* Because it's a common operation to add or remove CSS classes you | ||||||
* can do so using `add_class()` and `remove_class()`. | ||||||
* | ||||||
* As with attribute values, adding or removing CSS classes is a safe | ||||||
* operation that doesn't require checking if the attribute or class | ||||||
* exists before making changes. If removing the only class then the | ||||||
* entire `class` attribute will be removed. | ||||||
* | ||||||
* Example: | ||||||
* ```php | ||||||
* // from `<span>Yippee!</span>` | ||||||
* // to `<span class="is-active">Yippee!</span>` | ||||||
* $tags->add_class( 'is-active' ); | ||||||
* | ||||||
* // from `<span class="excited">Yippee!</span>` | ||||||
* // to `<span class="excited is-active">Yippee!</span>` | ||||||
* $tags->add_class( 'is-active' ); | ||||||
* | ||||||
* // from `<span class="is-active heavy-accent">Yippee!</span>` | ||||||
* // to `<span class="is-active heavy-accent">Yippee!</span>` | ||||||
* $tags->add_class( 'is-active' ); | ||||||
* | ||||||
* // from `<input type="text" class="is-active rugby not-disabled" length="24">` | ||||||
* // to `<input type="text" class="is-active not-disabled" length="24">` | ||||||
* $tags->remove_class( 'rugby' ); | ||||||
* | ||||||
* // from `<input type="text" class="rugby" length="24">` | ||||||
* // to `<input type="text" length="24">` | ||||||
* $tags->remove_class( 'rugby' ); | ||||||
* | ||||||
* // from `<input type="text" length="24">` | ||||||
* // to `<input type="text" length="24">` | ||||||
* $tags->remove_class( 'rugby' ); | ||||||
* ``` | ||||||
* | ||||||
* ## Design limitations | ||||||
* | ||||||
* @TODO: Expand this section | ||||||
* | ||||||
* - no nesting: cannot match open and close tag | ||||||
* - only move forward, never backward | ||||||
* - class names not decoded if they contain character references | ||||||
* - only secures against HTML escaping issues; requires | ||||||
* manually sanitizing or escaping values based on the needs of | ||||||
* each individual attribute, since different attributes have | ||||||
* different needs. | ||||||
* | ||||||
* @since 6.2.0 | ||||||
*/ | ||||||
class WP_HTML_Tag_Processor { | ||||||
|
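The docblock above distinguishes three kinds of `get_attribute()` result: `null`, `true`, and a string (possibly empty). The sketch below shows one way calling code might tell them apart. `describe_attribute_value()` is a hypothetical helper written for illustration, not part of `WP_HTML_Tag_Processor`, and the sample values are stand-ins assumed to have the shapes the docblock describes. Strict comparisons are used because PHP's loose `==` treats `null` and `""` as equal.

```php
<?php
/**
 * Hypothetical helper (not part of the class): classifies a value
 * assumed to have been returned by get_attribute().
 *
 * @param string|true|null $value Value shaped as the docblock describes.
 * @return string One of 'missing', 'boolean', 'empty', or 'value'.
 */
function describe_attribute_value( $value ) {
	// Strict checks: with loose `==`, null and "" compare equal.
	if ( null === $value ) {
		return 'missing'; // Attribute is not on the tag at all.
	}
	if ( true === $value ) {
		return 'boolean'; // Name present but no value given.
	}
	if ( '' === $value ) {
		return 'empty';   // Attribute present with an empty value.
	}
	return 'value';       // Any other string value.
}

// Stand-in values; in real code each would come from $tags->get_attribute().
$examples = [ null, true, '', 'jazzy' ];
$kinds    = array_map( 'describe_attribute_value', $examples );
```

Checking for `null` first keeps "attribute missing" separate from "attribute present but empty", which is the distinction the docblock draws.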
@@ -136,16 +285,16 @@ class WP_HTML_Tag_Processor { | |||||
* // and stops after recognizing the `id` attribute | ||||||
* // <div id="test-4" class=outline title="data:text/plain;base64=asdk3nk1j3fo8"> | ||||||
* // ^ parsing will continue from this point | ||||||
* $this->attributes = array( | ||||||
* $this->attributes = [ | ||||||
* 'id' => new WP_HTML_Attribute_Match( 'id', null, 6, 17 ) | ||||||
* ); | ||||||
* ]; | ||||||
* | ||||||
* // when picking up parsing again, or when asking to find the | ||||||
* // `class` attribute we will continue and add to this array | ||||||
* $this->attributes = array( | ||||||
* 'id' => new WP_HTML_Attribute_Match( 'id', null, 6, 17 ), | ||||||
* $this->attributes = [ | ||||||
* 'id' => new WP_HTML_Attribute_Match( 'id', null, 6, 17 ), | ||||||
* 'class' => new WP_HTML_Attribute_Match( 'class', 'outline', 18, 32 ) | ||||||
* ); | ||||||
* ]; | ||||||
* | ||||||
* // Note that only the `class` attribute value is stored in the index. | ||||||
* // That's because it is the only value used by this class at the moment. | ||||||
|
@@ -170,11 +319,11 @@ class WP_HTML_Tag_Processor { | |||||
* Example: | ||||||
* <code> | ||||||
* // Add the `WP-block-group` class, remove the `WP-group` class. | ||||||
* $class_changes = array( | ||||||
* $class_changes = [ | ||||||
* // Indexed by a comparable class name | ||||||
* 'wp-block-group' => new WP_Class_Name_Operation( 'WP-block-group', WP_Class_Name_Operation::ADD ), | ||||||
* 'wp-group' => new WP_Class_Name_Operation( 'WP-group', WP_Class_Name_Operation::REMOVE ) | ||||||
* ); | ||||||
* ]; | ||||||
* </code> | ||||||
* | ||||||
* @since 6.2.0 | ||||||
|
@@ -206,9 +355,9 @@ class WP_HTML_Tag_Processor { | |||||
* | ||||||
* // Correspondingly, something like this | ||||||
* // will appear in the replacements array. | ||||||
* $replacements = array( | ||||||
* $replacements = [ | ||||||
* new WP_HTML_Text_Replacement( 14, 28, 'https://my-site.my-domain/wp-content/uploads/2014/08/kittens.jpg' ) | ||||||
* ); | ||||||
* ]; | ||||||
* </code> | ||||||
* | ||||||
* @since 6.2.0 | ||||||
|
@@ -270,9 +419,9 @@ public function next_tag( $query = null ) { | |||||
if ( 's' === $t || 'S' === $t || 't' === $t || 'T' === $t ) { | ||||||
$tag_name = $this->get_tag(); | ||||||
|
||||||
if ( 'script' === $tag_name ) { | ||||||
if ( 'SCRIPT' === $tag_name ) { | ||||||
$this->skip_script_data(); | ||||||
} elseif ( 'textarea' === $tag_name || 'title' === $tag_name ) { | ||||||
} elseif ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) { | ||||||
$this->skip_rcdata( $tag_name ); | ||||||
} | ||||||
} | ||||||
|
@@ -318,7 +467,7 @@ private function skip_rcdata( $tag_name ) { | |||||
$tag_char = $tag_name[ $i ]; | ||||||
$html_char = $html[ $at + $i ]; | ||||||
|
||||||
if ( $html_char !== $tag_char && strtolower( $html_char ) !== $tag_char ) { | ||||||
if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) { | ||||||
$at += $i; | ||||||
continue 2; | ||||||
} | ||||||
|
@@ -937,7 +1086,7 @@ public function get_tag() { | |||||
|
||||||
$tag_name = substr( $this->html, $this->tag_name_starts_at, $this->tag_name_length ); | ||||||
|
||||||
return strtolower( $tag_name ); | ||||||
return strtoupper( $tag_name ); | ||||||
} | ||||||
|
||||||
/** | ||||||
|
@@ -1189,7 +1338,7 @@ private function matches() { | |||||
|
||||||
/* | ||||||
* Otherwise we have to check for each character if they | ||||||
* are the same, and only `strtolower()` if we have to. | ||||||
* are the same, and only `strtoupper()` if we have to. | ||||||
* Presuming that most people will supply lowercase tag | ||||||
* names and most HTML will contain lowercase tag names, | ||||||
* most of the time this runs we shouldn't expect to | ||||||
|
@@ -1199,7 +1348,7 @@ private function matches() { | |||||
$html_char = $this->html[ $this->tag_name_starts_at + $i ]; | ||||||
$tag_char = $this->sought_tag_name[ $i ]; | ||||||
|
||||||
if ( $html_char !== $tag_char && strtolower( $html_char ) !== $tag_char ) { | ||||||
if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) { | ||||||
return false; | ||||||
} | ||||||
} | ||||||
|
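The per-character comparison that the `strtolower()` to `strtoupper()` change touches in `skip_rcdata()` and `matches()` can be sketched in isolation. This is a minimal sketch, assuming the sought tag name has already been normalized to upper case; `tag_name_matches_at()` is a hypothetical standalone function, not a method of the class.

```php
<?php
/**
 * Hypothetical helper: checks whether the bytes of $html starting at
 * $offset spell $upper_tag_name, ignoring ASCII case.
 * Assumes $upper_tag_name is already upper case.
 */
function tag_name_matches_at( string $html, int $offset, string $upper_tag_name ): bool {
	$length = strlen( $upper_tag_name );

	// Not enough input left to hold the tag name.
	if ( $offset < 0 || $offset + $length > strlen( $html ) ) {
		return false;
	}

	for ( $i = 0; $i < $length; $i++ ) {
		$html_char = $html[ $offset + $i ];
		$tag_char  = $upper_tag_name[ $i ];

		// Compare bytes first; only call strtoupper() on a mismatch.
		if ( $html_char !== $tag_char && strtoupper( $html_char ) !== $tag_char ) {
			return false;
		}
	}

	return true;
}

$mixed_case_hit = tag_name_matches_at( '<Div class="a">', 1, 'DIV' );
$different_tag  = tag_name_matches_at( '<span>', 1, 'DIV' );
```

Because the case conversion is applied only to the character taken from the HTML, the comparison works only if the sought name is stored in the same case that the conversion produces, which is why the direction of the conversion has to match how tag names are normalized elsewhere.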