Parses weirdly if nested HTML elements (like a figure element in an anchor) are on one line #27

Closed
4 tasks done
talatkuyuk opened this issue Apr 18, 2025 · 7 comments
Labels
👀 no/external This makes more sense somewhere else 👎 phase/no Post cannot or will not be acted on

Comments

talatkuyuk commented Apr 18, 2025

Initial checklist

Affected package

hast-util-raw

Steps to reproduce

Here is a markdown input containing simple nested HTML elements on one line:

const md = `<a href="https://example.com"><figure><img src="image.png" alt=""></figure></a>`;

const unified = require('unified');
const remarkParse = require('remark-parse');
const remarkRehype = require('remark-rehype');
const rehypeRaw = require('rehype-raw');
const rehypeStringify = require('rehype-stringify');

const html = unified()
  .use(remarkParse)
  .use(remarkRehype, { allowDangerousHtml: true })
  .use(rehypeRaw)
  .use(rehypeStringify)
  .processSync(md);

console.log(String(html)); // processSync returns a VFile; String() yields the HTML

Actual behavior

If the HTML elements are on one line, it produces weird anchor behavior and an empty paragraph at the end:

<p><a href="https://example.com"></a></p><figure><a href="https://example.com"><img src="image.png" alt=""></a></figure><p></p>

But if the input is like:

const md = `<a href="https://example.com">
<figure><img src="image.png" alt=""></figure>
</a>`;

it behaves normally and produces the expected result.

Expected behavior

I expect it to handle this kind of simple nested HTML input on one line, with the result being:

<p><a href="https://example.com"><figure><img src="image.png" alt=""></figure></a></p>

I mean that hast-util-raw should handle nested HTML elements even if they are on one line.

Runtime

node

Package manager

npm

Operating system

macos

Build and bundle tools

No response

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Apr 18, 2025
Member

wooorm commented Apr 18, 2025

If the HTML elements are on one line, it produces weird anchor behavior and an empty paragraph at the end:

Hi! This has to do with markdown. Not with HTML, or this package. Here’s your input pasted on the CommonMark dingus: https://spec.commonmark.org/dingus/?text=%3Ca%20href%3D%22https%3A%2F%2Fexample.com%22%3E%3Cfigure%3E%3Cimg%20src%3D%22image.png%22%20alt%3D%22%22%3E%3C%2Ffigure%3E%3C%2Fa%3E%0A%0A%3Ca%20href%3D%22https%3A%2F%2Fexample.com%22%3E%0A%3Cfigure%3E%3Cimg%20src%3D%22image.png%22%20alt%3D%22%22%3E%3C%2Ffigure%3E%0A%3C%2Fa%3E. You should be able to see the same here on GH too.

@wooorm wooorm closed this as not planned Apr 18, 2025
@wooorm wooorm added the 👀 no/external This makes more sense somewhere else label Apr 18, 2025

@github-actions github-actions bot added 👎 phase/no Post cannot or will not be acted on and removed 🤞 phase/open Post is being triaged manually labels Apr 18, 2025
Author

talatkuyuk commented Apr 18, 2025

Here’s your input pasted on the CommonMark dingus:

When I open the dingus, I see the pasted input; when I then click the HTML tab in the right panel, the HTML result for it is on one line:

<p><a href="https://example.com"><figure><img src="image.png" alt=""></figure></a></p>

That is the expected output, as it should be (not the weird one). Am I doing something wrong, or am I right about the issue?

Member

wooorm commented Apr 19, 2025

It’s about the <p>s being added. There is a difference between the two test cases.
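
For readers following along, the difference can be sketched in code. In CommonMark, a line starts an HTML block if it begins with one of a fixed list of block-level tag names (start condition 6, which includes `figure` but not `a`), or if it consists of nothing but one complete open or closing tag (start condition 7). The one-line `<a>` input meets neither condition, so it is parsed as a paragraph with inline HTML and wrapped in `<p>`. The check below is a rough approximation under stated assumptions, not the spec's full grammar: `startsHtmlBlock` is a hypothetical helper, the tag list is abbreviated, and start conditions 1 to 5 and the tag-name exclusions of condition 7 are ignored.

```javascript
// Abbreviated subset of the CommonMark "start condition 6" tag names.
// Assumption: the real list is much longer; note that `a` is not on it.
const blockTagNames = ['div', 'figure', 'figcaption', 'p', 'section', 'table', 'ul']

// Condition 6: the line begins with `<` or `</`, a listed tag name, and then
// a space, a tab, `>`, `/>`, or the end of the line.
const condition6 = new RegExp(
  '^ {0,3}</?(?:' + blockTagNames.join('|') + ')(?:[ \\t]|/?>|$)',
  'i'
)

// Condition 7 (simplified): the line is exactly one complete open or closing
// tag, followed only by spaces or tabs.
const attribute =
  /[ \t]+[a-zA-Z_:][\w.:-]*(?:[ \t]*=[ \t]*(?:"[^"]*"|'[^']*'|[^ \t"'=<>`]+))?/.source
const openTag = '<[a-zA-Z][a-zA-Z0-9-]*(?:' + attribute + ')*[ \\t]*/?>'
const closingTag = '</[a-zA-Z][a-zA-Z0-9-]*[ \\t]*>'
const condition7 = new RegExp('^ {0,3}(?:' + openTag + '|' + closingTag + ')[ \\t]*$')

// Hypothetical helper: does this first line open an HTML block?
function startsHtmlBlock(firstLine) {
  return condition6.test(firstLine) || condition7.test(firstLine)
}

// One-line input: `<a ...>` is followed by more content, so it is a paragraph.
console.log(
  startsHtmlBlock('<a href="https://example.com"><figure><img src="image.png" alt=""></figure></a>')
) // false

// First line of the multi-line input: a lone open tag, so an HTML block starts.
console.log(startsHtmlBlock('<a href="https://example.com">')) // true

// `<figure>` first: a condition 6 tag name, so an HTML block starts.
console.log(
  startsHtmlBlock('<figure><a href="https://example.com"><img src="image.png" alt=""></a></figure>')
) // true
```

When the line starts an HTML block, the markdown parser passes it through without a wrapping `<p>`, which is why only the first test case ends up with a `<figure>` inside a paragraph.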

Member

wooorm commented Apr 19, 2025

Perhaps I am unclear on what you mean. Can you maybe make your input/actual/expected examples smaller?

Author

talatkuyuk commented Apr 19, 2025

It is not about whether <p> is added or not. It is about the behavior of the anchor <a> when it is the outer element. Here are two input/actual/expected examples in the simplest form. Assume the setup below:

const unified = require('unified');
const remarkParse = require('remark-parse');
const remarkRehype = require('remark-rehype');
const rehypeRaw = require('rehype-raw');
const rehypeStringify = require('rehype-stringify');

const html = unified()
  .use(remarkParse)
  .use(remarkRehype, { allowDangerousHtml: true })
  .use(rehypeRaw)
  .use(rehypeStringify)
  .processSync(input);

input markdown (on one line, outer <a>, inner <figure>):

<a href="https://example.com"><figure><img src="image.png" alt=""></figure></a>

actual output (weird: an empty anchor within <p> at the beginning, and an empty paragraph at the end):

<p><a href="https://example.com"></a></p><figure><a href="https://example.com"><img src="image.png" alt=""></a></figure><p></p>

expected output (I saw this output in the dingus HTML tab, and hast-util-raw should produce the same result!):

<p><a href="https://example.com"><figure><img src="image.png" alt=""></figure></a></p>

On the other hand, I changed the nesting order of the elements to see the behavior:

input markdown (on one line, outer <figure>, inner <a>):

<figure><a href="https://example.com"><img src="image.png" alt=""></a></figure>

actual output (not weird; it is as expected, no problem!):

<figure><a href="https://example.com"><img src="image.png" alt=""></a></figure>

I stress that both inputs are on one line; only the nesting order changed.

Member

wooorm commented Apr 21, 2025

Thanks for providing more info! I made your example smaller:

/**
 * @import {Root} from 'hast'
 */

import {raw} from 'hast-util-raw'

/** @type {Root} */
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        {type: 'raw', value: '<a>'},
        {type: 'raw', value: '<figure>'},
        {type: 'raw', value: '</figure>'},
        {type: 'raw', value: '</a>'}
      ]
    }
  ]
}

const reformatted = raw(tree)

console.dir(reformatted, {depth: null})

Yields:

{
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        { type: 'element', tagName: 'a', properties: {}, children: [] }
      ]
    },
    {
      type: 'element',
      tagName: 'figure',
      properties: {},
      children: []
    },
    { type: 'element', tagName: 'p', properties: {}, children: [] }
  ],
  data: { quirksMode: false }
}

Perhaps this smaller example will make it more visible: the “problem” is that there is a <figure> inside a <p>. That cannot be. When <figure> is seen, the <a> is closed first, and then the <p> is closed. Then the <figure> is opened and closed, the stray </a> is ignored, and the stray </p> causes a <p> to be opened and then immediately closed.
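
The sequence of steps described above can be replayed with a toy model. This is only a sketch of a few "in body" tree-construction rules from the HTML standard, not how parse5 or hast-util-raw is implemented; `buildTree` and `serialize` are hypothetical names. It deliberately leaves out the list of active formatting elements, which in a real parser is what re-opens the `<a>` inside `<figure>` once the figure receives content such as the `<img>` in the full example.

```javascript
// Toy model of a few HTML "in body" tree-construction rules.
// Assumption: illustration only; a real parser handles far more cases.
function buildTree(tokens) {
  const root = {tagName: '#root', children: []}
  const stack = [root] // stack of open elements; the last entry is the current node

  const isOpen = (name) => stack.some((node) => node.tagName === name)

  // Pop elements off the stack until one named `name` has been popped.
  const popUntil = (name) => {
    let node
    do {
      node = stack.pop()
    } while (node.tagName !== name)
  }

  // Append a new element to the current node and make it the current node.
  const insert = (name) => {
    const node = {tagName: name, children: []}
    stack[stack.length - 1].children.push(node)
    stack.push(node)
  }

  for (const token of tokens) {
    const closing = token.startsWith('</')
    const name = token.replace(/[<\/>]/g, '')

    if (!closing) {
      // Start tags such as <figure> or <p> first close any open <p>,
      // together with everything still open inside it (here: the <a>).
      if ((name === 'figure' || name === 'p') && isOpen('p')) popUntil('p')
      insert(name)
    } else if (isOpen(name)) {
      popUntil(name)
    } else if (name === 'p') {
      // A stray </p> acts as if <p></p> had been seen.
      insert('p')
      stack.pop()
    }
    // Any other stray end tag, such as </a> here, is ignored.
  }

  return root
}

// Serialize the toy tree back to tags so the result is easy to compare.
const serialize = (node) =>
  node.children
    .map((child) => `<${child.tagName}>${serialize(child)}</${child.tagName}>`)
    .join('')

const tree = buildTree(['<p>', '<a>', '<figure>', '</figure>', '</a>', '</p>'])
console.log(serialize(tree))
// → <p><a></a></p><figure></figure><p></p>
```

With the nesting reversed (`<figure>` outside, `<a>` inside, and no surrounding `<p>`), none of the closing rules fire, which matches the second test case in the thread.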

You can see the same behavior in a browser by pasting this into the console of an empty new tab: document.body.innerHTML = '<p><a><figure></figure></a></p>', and then inspecting the DOM.
