NodeTraversor visits nodes twice in certain cases after removal #2355

@doublep

In short: if NodeVisitor.head() removes the last child, NodeTraversor revisits the previous children at the same level. In some cases this doesn't matter, e.g. if all that is expected of the NodeVisitor is to modify the DOM tree. For example, the test canRemoveDuringHead() only asserts how the tree looks afterwards. However, if the NodeVisitor acts as an accumulator, such duplicate visits can easily lead to incorrect results.

Reproducer in the form of a test case (e.g. for TraversorTest):

@ParameterizedTest
@ValueSource(strings = {"em", "b"})
void doesntVisitAgainAfterRemoving(String removeTag) {
    Document doc = Jsoup.parse("<div><em>first</em><b>last</b></div>");
    // Every node passed to head() is recorded; a second visit fails the test.
    Set<Node> visited = new HashSet<>();
    NodeTraversor.traverse((node, depth) -> {
        if (!visited.add(node))
            fail(String.format("node '%s' is being visited for the second time", node));
        // Remove the matching element from within head().
        if (removeTag.equals(node.nodeName()))
            node.remove();
    }, doc);
}
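
Until the traversal itself is fixed, an accumulating visitor could protect itself with the same kind of guard the reproducer uses. The following is only a sketch in plain Java, not jsoup code: `onceOnly` and `OnceOnlyVisitDemo` are hypothetical names, the `(node, depth)` callback is modelled with a `BiConsumer`, and strings stand in for nodes. It shows an identity-based "seen" set that drops repeated visits before they reach the accumulator.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;

class OnceOnlyVisitDemo {
    /**
     * Wraps a (node, depth) callback so that each node instance reaches it at most once.
     * Hypothetical helper for illustration; it is not part of jsoup.
     */
    static <T> BiConsumer<T, Integer> onceOnly(BiConsumer<T, Integer> delegate) {
        // Identity-based set, so the guard does not depend on equals()/hashCode() of the node type.
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return (node, depth) -> {
            if (seen.add(node)) {
                delegate.accept(node, depth);
            }
        };
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        BiConsumer<String, Integer> accumulate = onceOnly((node, depth) -> names.add(node));

        // Stand-ins for nodes; the same instance is reported twice, as in the bug.
        String em = "em";
        String b = "b";
        accumulate.accept(em, 1);
        accumulate.accept(b, 1);
        accumulate.accept(em, 1); // duplicate visit, ignored by the guard

        System.out.println(names); // prints [em, b]
    }
}
```

In a real visitor the same check would sit at the top of head(), much like the `visited` set in the reproducer above. This only hides the symptom for accumulators; it does not change which nodes the traversal reports.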

Labels: bug, fixed
