Html api/add html processor #6
base: trunk
Changes from 12 commits
@@ -0,0 +1,259 @@

```php
<?php

class WP_HTML_Processor extends WP_HTML_Tag_Processor {
	public $fully_supported_input = null;
	public $open_elements = array();

	public function ensure_support() {
		if ( null !== $this->fully_supported_input ) {
			return $this->fully_supported_input;
		}

		$stack = array();

		$p = new WP_HTML_Tag_Processor( $this->html );
		while ( $p->next_tag( array( 'tag_closers' => 'visit' ) ) ) {
			$tag_name = $p->get_tag();

			if ( ! $p->is_tag_closer() ) {
```
Are void elements closers?

No, they're not; that's why this code branch works.

As you probably realized, tag closers only exist when the tag name is preceded by a
```php
				$element = WP_HTML_Spec::element_info( $tag_name );

				$self_closes = $element::is_void || ( ! $element::is_html && $p->has_self_closing_flag() );
				if ( ! $self_closes ) {
					$stack[] = $tag_name;
				}
			} else {
				if ( end( $stack ) === $tag_name ) {
					array_pop( $stack );
					continue;
				}

				$this->fully_supported_input = false;
				return false;
			}
		}

		$this->fully_supported_input = 0 === count( $stack );

		return $this->fully_supported_input;
	}

	public function next_tag( $query = null ) {
```
I wish we could make it private in the descendant class. It doesn't even make sense for it to be public: it can't support the same set of queries as the parent, and it's a pain to deal with PHP's idea of polymorphism, where the parent class will call the child implementation of this method anyway.

Funny enough, I added

Is it the tag processor that's calling this? The one thing I wish we could do is remove
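The comment above runs into two PHP rules: an overriding method cannot narrow visibility (declaring `private function next_tag()` in the subclass is a fatal error), and calls made through `$this` are late-bound, so parent-class code that calls `$this->next_tag()` runs the child's override. A minimal sketch with hypothetical stub classes, not the real WordPress processors:

```php
<?php
// Hypothetical stand-ins (not the real WordPress classes) showing that PHP
// dispatches $this->next_tag() to the subclass override, even when the call
// is made from a method defined on the parent class.
class Tag_Processor_Stub {
	public $calls = array();

	public function next_tag( $query = null ) {
		$this->calls[] = 'Tag_Processor_Stub::next_tag';
		return false;
	}

	// Parent-class helper that advances the cursor via next_tag().
	public function advance() {
		return $this->next_tag();
	}
}

class HTML_Processor_Stub extends Tag_Processor_Stub {
	// Must stay public: PHP does not allow an override to narrow visibility.
	public function next_tag( $query = null ) {
		$this->calls[] = 'HTML_Processor_Stub::next_tag';
		// parent:: skips dynamic dispatch and runs the parent's body directly.
		return parent::next_tag( $query );
	}
}

$processor = new HTML_Processor_Stub();
$processor->advance();
echo implode( ' -> ', $processor->calls ), "\n";
// Prints: HTML_Processor_Stub::next_tag -> Tag_Processor_Stub::next_tag
```

This is why an override in the HTML processor has to accept being called from any parent method that advances the cursor, whatever query it was designed for.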
```php
		if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
			return false;
		}

		if ( 0 < count( $this->open_elements ) ) {
			$element = WP_HTML_Spec::element_info( end( $this->open_elements ) );
			// @TODO: Handle self-closing HTML foreign elements: must convey self-closing flag on stack.
			if ( $element::is_void ) {
				array_pop( $this->open_elements );
```
Shouldn't it say

Or not? I don't get why void elements have any interaction with the open elements stack.

Oh, I see. Why are void elements landing on the open elements stack, though?

This is something to document later; it appeared towards the end of the day. I think I've seen this in the HTML spec for tree building. The problem is that we have to push the void element onto the open elements stack in order to get proper depth information and climb out of the nesting we're in. It looks funny here because we aren't currently tracking whether the last element on the stack requires no closer (is void, or is a self-closing foreign element). That means whenever we approach the next tag, we have to immediately remove any void elements, because they cannot enclose other tags. It's definitely unexpected until you reason through it or, as I did, get reminded of it by failing tests.

According to the spec, popping should happen in the same step as the insertion:
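The last comment notes that the spec pops a void element in the same step that inserts it. A minimal sketch, under stated assumptions, showing that this still yields the depth information the PR wants: the void element is counted while it is being visited and is gone before the next tag is read, so no deferred pop is needed at the top of `next_tag()`. The function `depths_at_openers`, the constant `VOID_ELEMENTS`, and the regex tokenizer are hypothetical illustrations, not part of the PR.

```php
<?php
// Sketch: record the depth of each opening tag, popping void elements in the
// same step they are pushed (spec style) rather than on the next call.
// Assumptions: tags are pulled out with a naive regex (no comments, scripts,
// or ">" inside attribute values), and only HTML void elements are handled.
const VOID_ELEMENTS = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );

function depths_at_openers( $html ) {
	preg_match_all( '~<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>~', $html, $tags, PREG_SET_ORDER );

	$stack  = array();
	$depths = array();
	foreach ( $tags as $tag ) {
		$name = strtoupper( $tag[2] );
		if ( '/' === $tag[1] ) {
			array_pop( $stack );
			continue;
		}

		// Push and record the depth while the element is being visited...
		$stack[]  = $name;
		$depths[] = array( $name, count( $stack ) );

		// ...then pop void elements at once: they can never contain anything.
		if ( in_array( $name, VOID_ELEMENTS, true ) ) {
			array_pop( $stack );
		}
	}

	return $depths;
}

foreach ( depths_at_openers( '<ul><li>One<br>two</li><li>Three</li></ul>' ) as list( $name, $depth ) ) {
	echo "$name: $depth\n";
}
```

The `BR` is reported one level below its `LI`, and the second `LI` is back at depth 2, the same numbers the deferred pop produces.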
```php
			}
		}

		if ( false === parent::next_tag( array( 'tag_closers' => 'visit' ) ) ) {
			return false;
		}

		$tag_name = $this->get_tag();
		$element = WP_HTML_Spec::element_info( $tag_name );

		$self_closes = $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() );
		if ( $self_closes ) {
			$this->open_elements[] = $tag_name;
```
I don't like this part; it's not really an open element. Even in the parsing spec, these are only added to the stack for brevity of the document: they're always popped immediately, in the same step, before the next token is ingested.

I think it's essential: see above. I may not be able to articulate it well yet, but I think one case we might examine is finding the

I think this will also matter for CSS selection. We will need to know the full "tree path" (a term I'm making up here without intending to convey specific information about it) in order to match our current position against the selector.

By the way, that's not what the spec says, is it? Depending on the insertion mode and element name, we may or may not have to push it onto the stack of open elements. There may be other operations to do before pushing, too, and the order matters.
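One way to read the "tree path" idea above: at every opening tag, snapshot the stack of open element names, which is what a descendant selector such as `li p` has to be checked against. A minimal sketch; `paths_at_openers`, `matches_descendant_selector`, and the regex tokenizer are hypothetical, and only the descendant combinator is handled (no child combinators, classes, or attributes).

```php
<?php
// Sketch: collect the "tree path" (stack of open element names) at each
// opening tag, then test it against a descendant selector such as "li p",
// written as array( 'LI', 'P' ). Assumptions: naive regex tag extraction,
// hypothetical helper names, HTML void elements only.
const VOID_ELEMENTS = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );

function paths_at_openers( $html ) {
	preg_match_all( '~<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>~', $html, $tags, PREG_SET_ORDER );

	$stack = array();
	$paths = array();
	foreach ( $tags as $tag ) {
		$name = strtoupper( $tag[2] );
		if ( '/' === $tag[1] ) {
			array_pop( $stack );
			continue;
		}

		$stack[] = $name;
		$paths[] = $stack; // PHP arrays are copied by value, so this is a snapshot.

		if ( in_array( $name, VOID_ELEMENTS, true ) ) {
			array_pop( $stack );
		}
	}

	return $paths;
}

// True when the last selector part names the current element and the other
// parts appear, in order, among its ancestors (greedy right-to-left scan).
function matches_descendant_selector( array $path, array $selector ) {
	if ( end( $selector ) !== end( $path ) ) {
		return false;
	}

	$want = count( $selector ) - 2;
	for ( $i = count( $path ) - 2; $i >= 0 && $want >= 0; $i-- ) {
		if ( $path[ $i ] === $selector[ $want ] ) {
			$want--;
		}
	}

	return $want < 0;
}

$paths = paths_at_openers( '<ul><li><p>Hi<br></p></li></ul><p>Bye</p>' );
foreach ( $paths as $path ) {
	$hit = matches_descendant_selector( $path, array( 'LI', 'P' ) ) ? 'match' : 'no match';
	echo implode( ' > ', $path ), ": $hit\n";
}
```

Only the `P` inside the `LI` matches; the trailing `P` has the right name but the wrong ancestry, which a name-only check at the cursor could not tell apart.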
```php
			return true;
		}

		if ( $this->is_tag_closer() ) {
			array_pop( $this->open_elements );
		} else {
			$this->open_elements[] = $tag_name;
		}

		return true;
	}

	public function next_sibling() {
		if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
			return false;
		}

		$starting_depth = count( $this->open_elements );

		while ( $this->next_tag() ) {
			$current_depth = count( $this->open_elements );

			if ( ! $this->is_tag_closer() && $current_depth === $starting_depth ) {
				return true;
			}

			if ( ! $this->is_tag_closer() && $current_depth < $starting_depth ) {
				return false;
			}
		}

		return false;
	}

	public function next_child() {
		if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
			return false;
		}

		$starting_depth = count( $this->open_elements );

		while ( $this->next_tag() ) {
```
We should bail out if between the $starting_depth and the child we see anything with

given the
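The bail-out suggestion above is cut off, so this sketch assumes the intended condition is "the depth falls below the starting depth", meaning the starting element closed without yielding a child, plus an early exit for void elements. It replays the PR's `count( $open_elements )` bookkeeping over a precomputed list of tag events; `tag_events`, `next_child_index`, and the regex tokenizer are hypothetical.

```php
<?php
// Sketch: next_child() over a precomputed list of tag events, with a bail-out
// once the starting element has closed. Assumptions: naive regex tokenizer,
// hypothetical helper names, and the cursor starts on an opening tag.
const VOID_ELEMENTS = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );

// Depth mirrors count( $open_elements ) in the PR: an opener counts itself,
// a closer reports the depth after popping.
function tag_events( $html ) {
	preg_match_all( '~<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>~', $html, $tags, PREG_SET_ORDER );

	$depth  = 0;
	$events = array();
	foreach ( $tags as $tag ) {
		$name   = strtoupper( $tag[2] );
		$closer = '/' === $tag[1];
		$void   = ! $closer && in_array( $name, VOID_ELEMENTS, true );
		$depth += $closer ? -1 : 1;

		$events[] = array( 'name' => $name, 'closer' => $closer, 'void' => $void, 'depth' => $depth );

		if ( $void ) {
			$depth--; // A void element is closed as soon as it is visited.
		}
	}

	return $events;
}

function next_child_index( array $events, $cursor ) {
	// Void elements cannot have children.
	if ( $events[ $cursor ]['void'] ) {
		return null;
	}

	$starting_depth = $events[ $cursor ]['depth'];
	for ( $i = $cursor + 1; $i < count( $events ); $i++ ) {
		$event = $events[ $i ];

		// Bail out: the starting element closed without yielding a child.
		if ( $event['depth'] < $starting_depth ) {
			return null;
		}

		if ( ! $event['closer'] && $event['depth'] === $starting_depth + 1 ) {
			return $i;
		}
	}

	return null;
}

$events = tag_events( '<div><p>One</p><ul><li>Two</li></ul><img><p><b>Three</b></p></div>' );
var_dump( next_child_index( $events, 1 ) ); // NULL: the first <p> has no element children.
```

Without the bail-out, the search from the empty `P` would keep going and report the `LI` inside the following `UL` as its child, because that `LI` also sits at the starting depth plus one.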
```php
			$current_depth = count( $this->open_elements );

			if ( ! $this->is_tag_closer() && $current_depth === $starting_depth + 1 ) {
				return true;
			}
		}

		return false;
	}

	private function find_closing_tag() {
		$starting_depth = count( $this->open_elements );

		while ( $this->next_tag() ) {
			$current_depth = count( $this->open_elements );

			if ( $this->is_tag_closer() && $current_depth < $starting_depth ) {
				return true;
```
Ditto here for the depth check I mentioned above.

This is all garbage that will go away.
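The review asks for the same kind of depth check in `find_closing_tag()`. Since the earlier comment is truncated, this sketch assumes that means refusing to search from closers or void elements and confirming that the first closer to drop below the starting depth carries the opener's name. `tag_events`, `find_closing_index`, and the regex tokenizer are hypothetical.

```php
<?php
// Sketch: find the closer matching the opener at $cursor by depth, with
// guards for void openers and mismatched names. Assumptions: naive regex
// tokenizer and hypothetical helper names.
const VOID_ELEMENTS = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );

// Depth mirrors count( $open_elements ) in the PR: an opener counts itself,
// a closer reports the depth after popping.
function tag_events( $html ) {
	preg_match_all( '~<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*>~', $html, $tags, PREG_SET_ORDER );

	$depth  = 0;
	$events = array();
	foreach ( $tags as $tag ) {
		$name   = strtoupper( $tag[2] );
		$closer = '/' === $tag[1];
		$void   = ! $closer && in_array( $name, VOID_ELEMENTS, true );
		$depth += $closer ? -1 : 1;

		$events[] = array( 'name' => $name, 'closer' => $closer, 'void' => $void, 'depth' => $depth );

		if ( $void ) {
			$depth--;
		}
	}

	return $events;
}

function find_closing_index( array $events, $cursor ) {
	$opener = $events[ $cursor ];

	// Closers and void elements have nothing to close.
	if ( $opener['closer'] || $opener['void'] ) {
		return null;
	}

	$starting_depth = $opener['depth'];
	for ( $i = $cursor + 1; $i < count( $events ); $i++ ) {
		$event = $events[ $i ];
		if ( $event['closer'] && $event['depth'] < $starting_depth ) {
			// The first closer to drop below the starting depth should be ours;
			// any other name means the input was not properly nested.
			return $event['name'] === $opener['name'] ? $i : null;
		}
	}

	return null;
}

$events = tag_events( '<div><p>One<br></p><p>Two</p></div>' );
var_dump( find_closing_index( $events, 1 ) ); // int(3): the first </p>
```

The name check is redundant for input that already passed `ensure_support()`, but it keeps the helper safe if it is ever called on unchecked input.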
```php
			}
		}

		return false;
	}

	public function get_inner_content() {
		if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
			return false;
		}

		if ( ! $this->get_tag() || $this->is_tag_closer() ) {
			return false;
		}

		$element = WP_HTML_Spec::element_info( $this->get_tag() );
		if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
			return false;
		}

		// @TODO: Unique bookmark names
		$this->set_bookmark( 'start' );
		if ( ! $this->find_closing_tag() ) {
			return false;
		}
		$this->set_bookmark( 'end' );

		$start = $this->bookmarks['start']->end + 1;
```
Here's one usage of the
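The inner and outer slicing in this file relies on off-by-one adjustments to bookmark offsets. A minimal sketch of that arithmetic, assuming (the diff does not confirm this) that a span's `start` is the offset of the tag's `<` and its `end` is the offset of its `>`, both inclusive; `inner_html`, `outer_html`, and `span_of` are hypothetical helpers.

```php
<?php
// Sketch of the offset arithmetic used by get_inner_content() and
// get_outer_content(). ASSUMPTION: a span's 'start' is the offset of the
// tag's "<" and its 'end' is the offset of its ">" (both inclusive).
function inner_html( $html, array $opener, array $closer ) {
	$start = $opener['end'] + 1;   // First byte after the opener's ">".
	$end   = $closer['start'] - 1; // Last byte before the closer's "<".
	return substr( $html, $start, $end - $start + 1 );
}

function outer_html( $html, array $opener, array $closer ) {
	$start = $opener['start'];
	$end   = $closer['end'];
	return substr( $html, $start, $end - $start + 1 );
}

// Hypothetical helper: inclusive span for the first occurrence of $tag.
function span_of( $html, $tag ) {
	$at = strpos( $html, $tag );
	return array( 'start' => $at, 'end' => $at + strlen( $tag ) - 1 );
}

$html   = '<div><p>Hi</p></div>';
$opener = span_of( $html, '<p>' );  // start 5, end 7
$closer = span_of( $html, '</p>' ); // start 10, end 13

echo inner_html( $html, $opener, $closer ), "\n"; // Hi
echo outer_html( $html, $opener, $closer ), "\n"; // <p>Hi</p>
```

For an empty element such as `<p></p>` the computed length is zero and the inner content is the empty string, so no special case is needed.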
```php
		$end = $this->bookmarks['end']->start - 1;
		$inner_content = substr( $this->html, $start, $end - $start + 1 );

		$this->release_bookmark( 'start' );
		$this->release_bookmark( 'end' );

		return $inner_content;
	}

	public function set_inner_content( $new_html ) {
		if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
			return false;
		}

		if ( ! $this->get_tag() || $this->is_tag_closer() ) {
			return false;
		}

		$element = WP_HTML_Spec::element_info( $this->get_tag() );
		if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
			return false;
		}

		// @TODO: Unique bookmark names
		$this->set_bookmark( 'start' );
		if ( ! $this->find_closing_tag() ) {
			return false;
		}
		$this->set_bookmark( 'end' );

		$start = $this->bookmarks['start']->end + 1;
		$end = $this->bookmarks['end']->start;
		$this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $new_html );
		$this->get_updated_html();
		$this->seek( 'start' );

		$this->release_bookmark( 'start' );
		$this->release_bookmark( 'end' );
	}

	public function get_outer_content() {
		if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
			return false;
		}

		if ( ! $this->get_tag() || $this->is_tag_closer() ) {
			return false;
		}

		$element = WP_HTML_Spec::element_info( $this->get_tag() );
		if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
			$this->set_bookmark( 'start' );
			$here = $this->bookmarks['start'];
			return substr( $this->html, $here->start, $here->end - $here->start + 1 );
		}

		// @TODO: Unique bookmark names
		$this->set_bookmark( 'start' );
		if ( ! $this->find_closing_tag() ) {
			return false;
		}
		$this->set_bookmark( 'end' );

		$start = $this->bookmarks['start']->start;
		$end = $this->bookmarks['end']->end;
		$inner_content = substr( $this->html, $start, $end - $start + 1 );
		$this->seek( 'start' );

		$this->release_bookmark( 'start' );
		$this->release_bookmark( 'end' );

		return $inner_content;
	}

	public function set_outer_content( $new_html ) {
		if ( false === $this->fully_supported_input || false === $this->ensure_support() ) {
			return false;
		}

		if ( ! $this->get_tag() || $this->is_tag_closer() ) {
			return false;
		}

		$element = WP_HTML_Spec::element_info( $this->get_tag() );
		// @TODO: Replace void and self-closing tags.
		if ( $element::is_void || ( ! $element::is_html && $this->has_self_closing_flag() ) ) {
			return false;
		}

		// @TODO: Unique bookmark names
		$this->set_bookmark( 'start' );
		if ( ! $this->find_closing_tag() ) {
			return false;
		}
		$this->set_bookmark( 'end' );

		$start = $this->bookmarks['start']->start;
		$end = $this->bookmarks['end']->end + 1;
		$this->lexical_updates[] = new WP_HTML_Text_Replacement( $start, $end, $new_html );
		$this->get_updated_html();
		$this->bookmarks['start']->start = $start;
```
I've done the same in one of my attempts! Why is this one not getting released in

Not quite sure why it isn't released.
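`set_outer_content()` patches the `'start'` bookmark by hand once the replacement has been applied. A minimal sketch of the general bookkeeping, assuming the replacement covers the half-open range `[start, end)` as the call sites in this diff appear to use it: offsets after the range shift by the length difference, and offsets inside it collapse to its start, which is what the hand-written `start = end = $start` does. `apply_replacement` and the array-based bookmarks are hypothetical, not the tag processor's internals.

```php
<?php
// Sketch: apply one replacement over the half-open range [start, end) and
// keep array-based bookmarks valid. Offsets at or after `end` shift by the
// length change; offsets inside the replaced range collapse to `start`.
function apply_replacement( $html, $start, $end, $text, array &$bookmarks ) {
	$delta = strlen( $text ) - ( $end - $start );

	// foreach by value iterates over a copy, so writing back is safe.
	foreach ( $bookmarks as $name => $span ) {
		foreach ( array( 'start', 'end' ) as $edge ) {
			if ( $span[ $edge ] >= $end ) {
				$bookmarks[ $name ][ $edge ] += $delta;
			} elseif ( $span[ $edge ] >= $start ) {
				$bookmarks[ $name ][ $edge ] = $start;
			}
		}
	}

	return substr( $html, 0, $start ) . $text . substr( $html, $end );
}

// Replace the outer HTML of <p> (bytes 5 through 13) in a small document.
$bookmarks = array(
	'p_open'    => array( 'start' => 5, 'end' => 7 ),   // <p>
	'div_close' => array( 'start' => 14, 'end' => 19 ), // </div>
);
$out = apply_replacement( '<div><p>Hi</p></div>', 5, 14, '<b>x</b>', $bookmarks );

echo $out, "\n"; // <div><b>x</b></div>
```

After the call, the `</div>` bookmark has moved back by one byte to match the shorter replacement, and the `<p>` bookmark is collapsed to the replacement's start, ready for a `seek()`.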
```php
		$this->bookmarks['start']->end = $start;
		$this->seek( 'start' );

		$this->release_bookmark( 'start' );
		$this->release_bookmark( 'end' );
	}
}
```
Smart! I kind of don't like that it throws away the progress we have made so far, though.

What do you mean? My vision for this is that it can grow with our added support.

As for running a first pass through the document, I'm not claiming it is certainly needed. I'm sure it's possible to work it in alongside the normal traversal.

For now, though, it helped quite a bit, and I figured the cost is meager given that it's a read-only scan.
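The thread above discusses the read-only first pass in `ensure_support()`. A minimal standalone sketch of the same balance check: accept the input only if every closer matches the innermost open element and nothing is left open. The regex tokenizer, `is_fully_supported`, and the partial `FOREIGN_ELEMENTS` list (standing in for `WP_HTML_Spec`'s `is_html` flag) are hypothetical.

```php
<?php
// Sketch of the first-pass support check: walk every tag, push openers that
// need a closer, and reject the input on any mismatched or stray closer.
// Assumptions: naive regex tag extraction; FOREIGN_ELEMENTS is a small,
// hypothetical subset of SVG/MathML names.
const VOID_ELEMENTS    = array( 'AREA', 'BASE', 'BR', 'COL', 'EMBED', 'HR', 'IMG', 'INPUT', 'LINK', 'META', 'SOURCE', 'TRACK', 'WBR' );
const FOREIGN_ELEMENTS = array( 'SVG', 'MATH', 'G', 'CIRCLE', 'PATH', 'RECT' );

function is_fully_supported( $html ) {
	// Group 1: "/" for closers; group 2: tag name; group 3: self-closing "/".
	preg_match_all( '~<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*?(/?)>~', $html, $tags, PREG_SET_ORDER );

	$stack = array();
	foreach ( $tags as $tag ) {
		$name = strtoupper( $tag[2] );

		if ( '/' !== $tag[1] ) {
			$is_void    = in_array( $name, VOID_ELEMENTS, true );
			$is_foreign = in_array( $name, FOREIGN_ELEMENTS, true );
			// The self-closing flag only counts on foreign elements.
			$self_closes = $is_void || ( $is_foreign && '/' === $tag[3] );
			if ( ! $self_closes ) {
				$stack[] = $name;
			}
			continue;
		}

		// end() returns false on an empty stack, so stray closers fail too.
		if ( end( $stack ) !== $name ) {
			return false;
		}
		array_pop( $stack );
	}

	return 0 === count( $stack );
}

foreach ( array( '<div><p>Hi<br></p></div>', '<div><p>Hi</div>', '<svg><circle r="1"/></svg>' ) as $html ) {
	echo $html, ' => ', is_fully_supported( $html ) ? 'supported' : 'unsupported', "\n";
}
```

Because the scan only reads, it can be cached the way `ensure_support()` caches `$fully_supported_input`; folding it into the main traversal, as suggested in the thread, would trade that simplicity for a single pass.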