Draft
Changes from 12 commits
Commits
55 commits
805fb55
WIP: HTML API: Add HTML Spec class to convey information related to t…
dmsnell Feb 24, 2023
84e35ef
WIP: HTML API: Expose self-closing flag in Tag Processor
dmsnell Mar 2, 2023
84e25cf
Expand documentation
dmsnell Mar 2, 2023
c18a81a
Appease the linting gods
dmsnell Mar 2, 2023
af8d49d
Add extra tests for syntax peculiarities.
dmsnell Mar 2, 2023
72e5de0
Appease the linting gods
dmsnell Mar 2, 2023
095a110
Appease the linting gods
dmsnell Mar 2, 2023
9b3f90d
Appease the linting gods
dmsnell Mar 2, 2023
09e16b2
Merge branch 'html-api/tag-processor-self-closing-flag' into html-api…
dmsnell Mar 7, 2023
396285b
Add `ensure_support()` and tests
dmsnell Mar 7, 2023
b2e6856
WIP: Work on next_sibling
dmsnell Mar 7, 2023
65c9f5b
Add the rest of it
dmsnell Mar 10, 2023
f5e4d5e
Merge branch 'trunk' into html-api/add-html-processor
dmsnell Mar 21, 2023
247966b
Rename "next_child" to "first_child" to better match its purpose
dmsnell Mar 21, 2023
f1ccd9b
Ensure no children are found for void elements.
dmsnell Mar 22, 2023
fb31ea1
rename next_child to first_child in test names
dmsnell Mar 22, 2023
2af46d0
Merge branch 'trunk' into html-api/add-html-processor
dmsnell Mar 25, 2023
e945f16
Add Trac ticket reference to tests
dmsnell Mar 29, 2023
96053c9
fixup! Add Trac ticket reference to tests
dmsnell Mar 29, 2023
da7880a
Merge branch 'trunk' into html-api/add-html-processor
dmsnell Mar 29, 2023
264ae35
Merge branch 'html-api/tag-processor-self-closing-flag' into html-api…
dmsnell Mar 29, 2023
04a37b5
Linting issues
dmsnell Mar 29, 2023
0975e34
Wrap bookmarking to create special internal bookmarks used in HTML tr…
dmsnell Apr 20, 2023
d30f766
Merge remote-tracking branch 'upstream/trunk' into html-api/add-html-…
dmsnell Apr 20, 2023
cafee03
Remove `ensure_support`
dmsnell Apr 20, 2023
67bef48
Introduce step function and insertion mode
dmsnell Apr 20, 2023
be6e901
Create some IS_SPECIAL flags
dmsnell May 3, 2023
9980a36
Merge remote-tracking branch 'upstream/trunk' into html-api/add-html-…
dmsnell May 3, 2023
2e874e4
Some bookmarking stuff
dmsnell May 4, 2023
d0e3f4a
Merge branch 'trunk' into html-api/add-html-processor
dmsnell May 4, 2023
ece3a68
I think we need a separate actual stack for open elements
dmsnell May 4, 2023
a5284d6
Merge branch 'trunk' into html-api/add-html-processor
dmsnell May 8, 2023
74c9e6b
Merge branch 'trunk' into html-api/add-html-processor
dmsnell May 8, 2023
a4569cd
Merge branch 'trunk' into html-api/add-html-processor
dmsnell May 16, 2023
98180bc
Add WIP stack class, trap exceptions in `step()` to allow nested call…
dmsnell May 17, 2023
0e3ada5
Play with an alternate matching syntax
dmsnell May 17, 2023
739b4aa
Add reset_insertion_mode_appropriately
dmsnell May 17, 2023
2b263b5
Merge remote-tracking branch 'upstream/trunk' into html-api/add-html-…
dmsnell May 17, 2023
e1ec10c
Merge branch 'trunk' into html-api/add-html-processor
dmsnell May 20, 2023
bc61ea8
Add custom "Unsupported Exception" class to differentiate from other …
dmsnell May 20, 2023
6f58b70
Set template insertion mode stack to empty array, not null
dmsnell May 20, 2023
d82b146
Expand SPEC data for IS_SPECIAL and IS_OBSOLETE
dmsnell May 21, 2023
a6e5323
Docs for SPEC and some minor refactoring
dmsnell May 21, 2023
8a2215e
Add algorithms for handling list of active formatting elements.
dmsnell May 21, 2023
d96c05a
Add docs for HEAD and FORM pointers.
dmsnell May 21, 2023
c93d420
Generate implied end tags, poorly and naively.
dmsnell May 21, 2023
2e0e4f9
More docs, remove something not necessary.
dmsnell May 21, 2023
f5c7651
Introduce static factory methods
dmsnell May 22, 2023
3956ad9
Access private methods and properties of friend class
dmsnell May 23, 2023
37ce95e
Merge branch 'trunk' into html-api/add-html-processor
dmsnell May 24, 2023
613ae1a
Replace goto:ignored with recursion
dmsnell May 24, 2023
46e3777
Continue adding IN BODY handling.
dmsnell May 25, 2023
a1a095c
Merge branch 'trunk' into html-api/add-html-processor
dmsnell May 26, 2023
c2dcc37
Merge remote-tracking branch 'upstream/trunk' into html-api/add-html-…
dmsnell May 28, 2023
c29570e
More support
dmsnell May 29, 2023
259 changes: 259 additions & 0 deletions src/wp-includes/html-api/class-wp-html-processor.php
@@ -0,0 +1,259 @@
<?php

class WP_HTML_Processor extends WP_HTML_Tag_Processor {
public $fully_supported_input = null;
public $open_elements = array();

public function ensure_support() {
if ( null !== $this->fully_supported_input ) {
return $this->fully_supported_input;
}

$stack = array();

$p = new WP_HTML_Tag_Processor( $this->html );

Smart! I kind of don't like that it throws away the progress we have made so far, though

Owner Author

What do you mean? My vision for this is that it can grow as we add support.

As for running a first pass through the document, I'm not claiming this is strictly needed. I'm sure it's possible to work it in alongside the normal traversal.

For now, though, it helped quite a bit, and I figured the cost is meager given that it's a read-only scan.

while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) {
$tag_name = $p->get_tag();

if ( ! $p->is_tag_closer() ) {

Are void elements closers?

No, they're not; that's why this code branch works.

Owner Author

As you probably realized, tag closers only exist when the tag name is preceded by a "/", which is distinct from the self-closing flag that appears right before the ">" closing the tag opener. For example, </p> is a tag closer, while <br/> is a tag opener carrying the self-closing flag.
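A minimal string-level sketch of that distinction, assuming the raw text of a single tag is available; sketch_classify_tag() is a hypothetical helper, not an HTML API method. For HTML elements the flag doesn't change nesting (only void-ness does), which is why the code here consults has_self_closing_flag() only for non-HTML elements.

```php
<?php
// Hypothetical helper for illustration only: inspects the raw text of one tag.
// It ignores attribute values, so input such as <a href=/> (where the "/"
// belongs to the unquoted attribute value) would be misclassified.
function sketch_classify_tag( string $raw ): array {
	// A tag closer has "/" immediately after the opening "<".
	$is_closer = 0 === strpos( $raw, '</' );

	// The self-closing flag is a "/" immediately before the ">" of a tag opener.
	$has_self_closing_flag = ! $is_closer && '/>' === substr( $raw, -2 );

	return array(
		'closer'       => $is_closer,
		'self_closing' => $has_self_closing_flag,
	);
}
```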

$element = WP_HTML_Spec::element_info( $tag_name );

$self_closes = $element::is_void || ( ! $element::is_html && $p->has_self_closing_flag() );
if ( ! $self_closes ) {
$stack[] = $tag_name;
}
} else {
if ( end( $stack ) === $tag_name ) {
array_pop( $stack );
continue;
}

$this->fully_supported_input = false;
return false;
}
}

$this->fully_supported_input = 0 === count( $stack );

return $this->fully_supported_input;
}
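The read-only pre-scan in ensure_support() can be sketched in isolation. This is a minimal approximation under stated assumptions: the document is already tokenized into a flat list of tags, the void-element list is hard-coded from the HTML spec, and sketch_tag() and sketch_is_balanced() are hypothetical names, not part of the HTML API. Like the function above, it rejects any closer that doesn't match the top of the stack instead of applying the spec's error recovery.

```php
<?php
// Sketch only: the void-element list, token shape, and function names are
// assumptions for illustration.
const SKETCH_VOID_ELEMENTS = array(
	'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG',
	'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR',
);

// Builds one pre-tokenized tag.
function sketch_tag( string $tag, bool $closer = false, bool $self_closing = false, bool $foreign = false ): array {
	return array(
		'tag'          => $tag,
		'closer'       => $closer,
		'self_closing' => $self_closing,
		'foreign'      => $foreign,
	);
}

// True when every opened element is closed in order, without relying on
// implied end tags or any other error-recovery rule.
function sketch_is_balanced( array $tokens ): bool {
	$stack = array();

	foreach ( $tokens as $token ) {
		if ( ! $token['closer'] ) {
			// Void elements, and foreign elements carrying the self-closing flag,
			// cannot enclose other tags, so they never go on the stack.
			$self_closes = in_array( $token['tag'], SKETCH_VOID_ELEMENTS, true )
				|| ( $token['foreign'] && $token['self_closing'] );
			if ( ! $self_closes ) {
				$stack[] = $token['tag'];
			}
			continue;
		}

		// A closer must match the most recently opened element; otherwise bail.
		if ( end( $stack ) !== $token['tag'] ) {
			return false;
		}
		array_pop( $stack );
	}

	// Anything still open means the input needs more than this sketch supports.
	return 0 === count( $stack );
}
```

For example, <div><img></div> passes, while <div><p></div> is rejected even though a browser would recover from it by implicitly closing the <p>.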

public function next_tag( $query = null ) {

I wish we could make it private in the descendant class. It doesn't even make sense for it to be public: it can't support the same set of queries as the parent, and PHP's idea of polymorphism is a pain to deal with here, since the parent class will call the child's implementation of this method anyway.

Owner Author

Funnily enough, I added _doing_it_wrong() here before realizing that I call it from within this tag processor.

Is it the tag processor that's calling this?

The one thing I wish we could do is remove $query, but PHP wouldn't let me. There's still some thinking to do on this one.
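The polymorphism problem can be reproduced in a few lines of plain PHP. The class names below are hypothetical stand-ins, not the real processors: a method defined on the parent that calls $this->next_tag() dispatches to the child's override. PHP also rejects narrowing the override's visibility to private, and dropping the $query parameter makes the declaration incompatible with the parent's (a fatal error in PHP 8).

```php
<?php
// Hypothetical stand-ins for the two processors, for illustration only.
class Sketch_Tag_Processor {
	public function next_tag( $query = null ) {
		return 'parent';
	}

	// A parent method that advances the cursor internally.
	public function call_from_parent() {
		// Dynamic dispatch: $this is the runtime object, so a subclass
		// override of next_tag() runs here, not the parent's version.
		return $this->next_tag();
	}
}

class Sketch_HTML_Processor extends Sketch_Tag_Processor {
	// Declaring this private would be a fatal error, and removing $query
	// would make the signature incompatible with the parent's.
	public function next_tag( $query = null ) {
		return 'child';
	}
}
```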

if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
return false;
}

if ( 0 < count( $this->open_elements ) ) {
$element = WP_HTML_Spec::element_info( end( $this->open_elements ) );
// @TODO: Handle self-closing HTML foreign elements: must convey self-closing flag on stack.
if ( $element::is_void ) {
array_pop( $this->open_elements );

Shouldn't it say if ( ! $element::is_void )?

Or not? I don't get why void elements interact with the stack of open elements at all.

Oh, I see. Why are void elements landing on the stack of open elements, though?

Owner Author

This is something to document later; it came up towards the end of the day. I think I've seen this in the tree-building part of the HTML spec.

The problem is that we have to push the void element onto the stack of open elements in order to get proper depth information and to climb out of the nesting we're in. This looks odd here because we aren't currently tracking whether the last element on the stack requires no closer (i.e. it's a void element or a self-closing foreign element). That means whenever we advance to the next tag, we have to immediately pop any void element, because void elements cannot enclose other tags.

It's definitely unexpected until you reason through it or, as in my case, get reminded of it by failing tests.
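A minimal sketch of the deferred-pop approach described above, assuming a hypothetical pre-tokenized input of array( tag, is_closer ) pairs and a hard-coded void-element list. A void element is pushed when visited, so the depth reported at that tag matches its siblings', and it is popped at the start of the next step.

```php
<?php
// Sketch only: hard-coded void elements and hypothetical array( tag, is_closer ) tokens.
const SKETCH_DEFERRED_VOIDS = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );

// Returns the stack depth observed at each token.
function sketch_depths_deferred_pop( array $tokens ): array {
	$open   = array();
	$depths = array();

	foreach ( $tokens as list( $tag, $is_closer ) ) {
		// Deferred pop: a void element left on the stack by the previous step
		// cannot enclose this token, so remove it before doing anything else.
		if ( count( $open ) > 0 && in_array( end( $open ), SKETCH_DEFERRED_VOIDS, true ) ) {
			array_pop( $open );
		}

		if ( $is_closer ) {
			array_pop( $open );
		} else {
			// Void elements are pushed too, so their depth matches their siblings'.
			$open[] = $tag;
		}

		$depths[] = count( $open );
	}

	return $depths;
}
```

For <div><img><p></p></div> it reports depths 1, 2, 2, 1, 0, so the <img> and the <p> sit at the same depth.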

According to the spec, popping should happen in the same step as the insertion:

A start tag whose tag name is one of: "area", "br", "embed", "img", "keygen", "wbr"
Reconstruct the active formatting elements, if any.

Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.

Acknowledge the token's self-closing flag, if it is set.

Set the frameset-ok flag to "not ok".
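For comparison, a minimal sketch of the quoted rule's ordering, where the element is inserted and popped in the same step. It assumes a hypothetical token list of array( tag, is_closer ) pairs, uses only the element names from the quoted rule, and treats every closer as a plain pop rather than following the spec's end-tag rules. Recording the depth before the pop keeps the depth information while never leaving a void element on the stack.

```php
<?php
// Sketch only: element names come from the quoted "in body" rule; the token
// shape and function name are hypothetical.
const SKETCH_IN_BODY_VOIDS = array( 'AREA', 'BR', 'EMBED', 'IMG', 'KEYGEN', 'WBR' );

// Returns the stack depth observed at each token.
function sketch_depths_immediate_pop( array $tokens ): array {
	$open   = array();
	$depths = array();

	foreach ( $tokens as list( $tag, $is_closer ) ) {
		if ( $is_closer ) {
			array_pop( $open );
			$depths[] = count( $open );
			continue;
		}

		// Insert an element for the token and record the depth while it is
		// the current node.
		$open[]   = $tag;
		$depths[] = count( $open );

		// Then pop it in the same step, so it can never enclose the next token.
		if ( in_array( $tag, SKETCH_IN_BODY_VOIDS, true ) ) {
			array_pop( $open );
		}
	}

	return $depths;
}
```

For <div><img><p></p></div> it reports depths 1, 2, 2, 1, 0, so the <img> and the <p> still share a depth.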

}
}

if ( false === parent::next_tag( array( 'tag_closers' => 'visit' ) ) ) {
return false;
}

$tag_name = $this->get_tag();
$element = WP_HTML_Spec::element_info( $tag_name );

$self_closes = $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() );
if ( $self_closes ) {
$this->open_elements[] = $tag_name;

I don't like this part; it's not really an open element. Even in the parsing spec, these are only added to the stack for brevity of the document: they're always popped immediately, in the same step, before the next token is ingested.

Owner Author

I think it's essential; see above. I may not be able to articulate it well yet, but one case we might examine is finding the next_sibling() of an <img>. If we don't push void elements onto the stack, we'll be looking for a sibling of the img's parent instead of a sibling of the img.

I think this will also matter for CSS selection. We will need to know the full "tree path" (a term I'm making up here without intending to convey specific information about it) in order to match our current position against the selector.

Btw it's not what the spec says, is it? Depending on the insertion mode and element name, we may or may not have to push it to the stack of open elements. There may be other operations to do before pushing, too, and the order matters.
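The next_sibling() case for an <img> can be checked with a small sketch that mirrors the depth comparisons in next_sibling() below. It assumes a hypothetical token list of array( tag, is_closer ) pairs; the $push_voids switch is an illustration-only knob deciding whether a void element counts on the stack while it is the current tag.

```php
<?php
// Sketch only: hypothetical tokens, hard-coded void elements.
const SKETCH_SIBLING_VOIDS = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );

// Stack depth at each token, with or without counting void elements.
function sketch_depth_list( array $tokens, bool $push_voids ): array {
	$open   = array();
	$depths = array();
	foreach ( $tokens as list( $tag, $is_closer ) ) {
		// Drop a void element left over from the previous step.
		if ( count( $open ) > 0 && in_array( end( $open ), SKETCH_SIBLING_VOIDS, true ) ) {
			array_pop( $open );
		}
		if ( $is_closer ) {
			array_pop( $open );
		} elseif ( $push_voids || ! in_array( $tag, SKETCH_SIBLING_VOIDS, true ) ) {
			$open[] = $tag;
		}
		$depths[] = count( $open );
	}
	return $depths;
}

// Index of the next tag opener at the starting depth, or null.
function sketch_next_sibling( array $tokens, int $start, bool $push_voids ): ?int {
	$depths         = sketch_depth_list( $tokens, $push_voids );
	$starting_depth = $depths[ $start ];
	$n              = count( $tokens );

	for ( $i = $start + 1; $i < $n; $i++ ) {
		if ( $tokens[ $i ][1] ) {
			continue; // Only tag openers can be siblings.
		}
		if ( $depths[ $i ] === $starting_depth ) {
			return $i;
		}
		if ( $depths[ $i ] < $starting_depth ) {
			return null; // Climbed out of the parent.
		}
	}
	return null;
}
```

In <div><img><span></span></div><p></p>, starting from the <img> (index 1), counting the void element finds the <span> (index 2); without it, the search lands on the <p> (index 5), a sibling of the <div>.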

return true;
}

if ( $this->is_tag_closer() ) {
array_pop( $this->open_elements );
} else {
$this->open_elements[] = $tag_name;
}

return true;
}

public function next_sibling() {
if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
return false;
}

$starting_depth = count( $this->open_elements );

while ( $this->next_tag() ) {
$current_depth = count( $this->open_elements );

if ( ! $this->is_tag_closer() && $current_depth === $starting_depth ) {
return true;
}

if ( ! $this->is_tag_closer() && $current_depth < $starting_depth ) {
return false;
}
}

return false;
}

public function next_child() {
if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
return false;
}

$starting_depth = count( $this->open_elements );

while ( $this->next_tag() ) {

We should bail out if, between $starting_depth and the child, we see anything with $current_depth <= $starting_depth, as that would imply the initial tag was closed.

Owner Author

Given that ensure_support() runs first, I don't think it's possible for this to happen, but I think I can defer some of the support function's work. I've added a test to make sure, though I haven't yet added any non-HTML elements to the tests.
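A minimal sketch of the suggested bail-out, assuming a hypothetical token list of array( tag, is_closer ) pairs and ignoring void elements for brevity. For <div></div><section><p></p></section>, starting at the <div>, a check that only looks for an opener at $starting_depth + 1 would reach the <p> (depth 2) after the <div> has already closed; the bail-out stops at the </div> instead.

```php
<?php
// Sketch only: hypothetical tokens; void elements are not handled.
function sketch_first_child( array $tokens, int $start ): ?int {
	$open           = array();
	$starting_depth = null;

	foreach ( $tokens as $i => list( $tag, $is_closer ) ) {
		// Track open elements for every token, including those before $start,
		// so depths are measured from the document root.
		if ( $is_closer ) {
			array_pop( $open );
		} else {
			$open[] = $tag;
		}
		$current_depth = count( $open );

		if ( $i < $start ) {
			continue;
		}

		if ( $i === $start ) {
			$starting_depth = $current_depth;
			continue;
		}

		// A tag opener one level deeper is the first child.
		if ( ! $is_closer && $current_depth === $starting_depth + 1 ) {
			return $i;
		}

		// Bail out: reaching the starting depth or shallower means the
		// starting element has already been closed.
		if ( $current_depth <= $starting_depth ) {
			return null;
		}
	}

	return null;
}
```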

$current_depth = count( $this->open_elements );

if ( ! $this->is_tag_closer() && $current_depth === $starting_depth + 1 ) {
return true;
}
}

return false;
}

private function find_closing_tag() {
$starting_depth = count( $this->open_elements );

while ( $this->next_tag() ) {
$current_depth = count( $this->open_elements );

if ( $this->is_tag_closer() && $current_depth < $starting_depth ) {
return true;

Ditto here for the depth check I mentioned above

Owner Author

this is all garbage that will go away

}
}

return false;
}

public function get_inner_content() {
if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
return false;
}

if ( ! $this->get_tag() || $this->is_tag_closer() ) {
return false;
}

$element = WP_HTML_Spec::element_info( $this->get_tag() );
if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
return false;
}

// @TODO: Unique bookmark names
$this->set_bookmark( 'start' );
if ( ! $this->find_closing_tag() ) {
return false;
}
$this->set_bookmark( 'end' );

$start = $this->bookmarks['start']->end + 1;

Here's one use of the bookmark's end property, but we could do without it by storing this value in a variable a few lines above. (I'm referring to the idea of ditching the end property.)

$end = $this->bookmarks['end']->start - 1;
$inner_content = substr( $this->html, $start, $end - $start + 1 );

$this->release_bookmark( 'start' );
$this->release_bookmark( 'end' );

return $inner_content;
}
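get_inner_content() only needs two offsets: the byte after the opener's ">" and the byte before the closer's "<". A minimal sketch, assuming (as the arithmetic above implies) that a span's end is the inclusive index of its ">"; sketch_inner_content() and its parameters are hypothetical. Passing the opener's end as a plain integer is one way to avoid the bookmark's end property, as the review comment suggests.

```php
<?php
// Sketch only: byte offsets with inclusive span ends; names are hypothetical.
function sketch_inner_content( string $html, int $opener_end, int $closer_start ): string {
	$start = $opener_end + 1;   // First byte after the opener's ">".
	$end   = $closer_start - 1; // Last byte before the closer's "<".

	// An empty element yields $end = $start - 1, i.e. a zero-length slice.
	return substr( $html, $start, $end - $start + 1 );
}
```

For '<div><p>hi</p></div>', the <div> opener ends at index 4 and its closer starts at index 14, so the inner content is '<p>hi</p>'.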

public function set_inner_content( $new_html ) {
if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
return false;
}

if ( ! $this->get_tag() || $this->is_tag_closer() ) {
return false;
}

$element = WP_HTML_Spec::element_info( $this->get_tag() );
if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
return false;
}

// @TODO: Unique bookmark names
$this->set_bookmark( 'start' );
if ( ! $this->find_closing_tag() ) {
return false;
}
$this->set_bookmark( 'end' );

$start = $this->bookmarks['start']->end + 1;
$end = $this->bookmarks['end']->start;
$this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $new_html );
$this->get_updated_html();
$this->seek( 'start' );

$this->release_bookmark( 'start' );
$this->release_bookmark( 'end' );
}
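set_inner_content() hands WP_HTML_Text_Replacement an end offset equal to the closer's start, which suggests the end is exclusive there, unlike the inclusive arithmetic in get_inner_content(). Under that assumption, the edit amounts to a plain substr_replace(); sketch_replace_span() is a hypothetical name used only for illustration.

```php
<?php
// Sketch only: replaces bytes [$start, $end) of $html with $text.
function sketch_replace_span( string $html, int $start, int $end, string $text ): string {
	// $end is exclusive, so the replaced length is simply $end - $start.
	return substr_replace( $html, $text, $start, $end - $start );
}
```

Replacing bytes 5 through 13 of '<div><p>hi</p></div>' (start 5, exclusive end 14) with 'new' yields '<div>new</div>'.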

public function get_outer_content() {
if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
return false;
}

if ( ! $this->get_tag() || $this->is_tag_closer() ) {
return false;
}

$element = WP_HTML_Spec::element_info( $this->get_tag() );
if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
$this->set_bookmark( 'start' );
$here = $this->bookmarks['start'];
$this->release_bookmark( 'start' );
return substr( $this->html, $here->start, $here->end - $here->start + 1 );
}

// @TODO: Unique bookmark names
$this->set_bookmark( 'start' );
if ( ! $this->find_closing_tag() ) {
return false;
}
$this->set_bookmark( 'end' );

$start = $this->bookmarks['start']->start;
$end = $this->bookmarks['end']->end;
$inner_content = substr( $this->html, $start, $end - $start + 1 );
$this->seek( 'start' );

$this->release_bookmark( 'start' );
$this->release_bookmark( 'end' );

return $inner_content;
}
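get_outer_content() covers two cases: elements without a closer, whose outer content is just the tag, and everything else, whose outer content runs from the opener's "<" through the closer's ">". A minimal sketch with hypothetical names, assuming inclusive end offsets as in the code above:

```php
<?php
// Sketch only: byte offsets with inclusive ends; names are hypothetical.
function sketch_outer_content( string $html, int $opener_start, int $opener_end, ?int $closer_end ): string {
	// Void elements and self-closing foreign elements have no closer, so the
	// outer content is just the tag itself.
	$end = null === $closer_end ? $opener_end : $closer_end;

	return substr( $html, $opener_start, $end - $opener_start + 1 );
}
```

In '<div><br><p>hi</p></div>', the <br> spans indices 5 to 8 and yields '<br>', while the <p> opener starts at 9 and its closer ends at 17, yielding '<p>hi</p>'.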

public function set_outer_content( $new_html ) {
if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
return false;
}

if ( ! $this->get_tag() || $this->is_tag_closer() ) {
return false;
}

$element = WP_HTML_Spec::element_info( $this->get_tag() );
// @TODO: Replace void and self-closing tags.
if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
return false;
}

// @TODO: Unique bookmark names
$this->set_bookmark( 'start' );
if ( ! $this->find_closing_tag() ) {
return false;
}
$this->set_bookmark( 'end' );

$start = $this->bookmarks['start']->start;
$end = $this->bookmarks['end']->end + 1;
$this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $new_html );
$this->get_updated_html();
$this->bookmarks['start']->start = $start;

I've done the same in one of my attempts! Why is this one not getting released in get_updated_html, though? I remember I had to create a new WP_HTML_Text_Span and re-insert it into $this->bookmarks

Owner Author

not quite sure why it isn't released

$this->bookmarks['start']->end = $start;
$this->seek( 'start' );

$this->release_bookmark( 'start' );
$this->release_bookmark( 'end' );
}
}
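In set_outer_content() the 'start' bookmark is collapsed to $start by hand after the replacement, and the review thread mentions re-creating the span as a similar workaround. A minimal sketch of how a single byte offset (such as a bookmark boundary) could be adjusted after replacing bytes [$start, $end) with new text; the function name and the clamping policy are assumptions, not how get_updated_html() is known to behave.

```php
<?php
// Sketch only: adjusts one offset after replacing [$start, $end) with $text.
function sketch_shift_offset( int $offset, int $start, int $end, string $text ): int {
	// Offsets before the replaced range are unaffected.
	if ( $offset < $start ) {
		return $offset;
	}

	// Offsets at or after the old end move by the change in length.
	if ( $offset >= $end ) {
		return $offset + strlen( $text ) - ( $end - $start );
	}

	// Offsets inside the replaced range no longer point at their old content;
	// collapse them to the start of the replacement, as set_outer_content()
	// does for its 'start' bookmark.
	return $start;
}
```

Replacing the 20-byte '<div><p>hi</p></div>' at the front of '<div><p>hi</p></div><br>' with '<hr>' moves the trailing '<br>' from offset 20 to offset 4.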