Pandoc drops “unknown” HTML elements when converting to markdown #1756

@dullroar

Consider the following simple HTML:

<!DOCTYPE html>
<body>
<p>Test
  <object height="355" width="425">
    <param name="movie" value="http://www.youtube.com/v/DKk9rv2hUfA&amp;rel=1">
    <param name="wmode" value="transparent">
    <embed height="355" src="http://www.youtube.com/v/DKk9rv2hUfA&amp;rel=1" type="application/x-shockwave-flash" width="425">
  </object>
</p>
</body>

I want to convert this to markdown and have the elements with no markdown equivalent (object, etc.) passed through unchanged as HTML. However, when I run it through pandoc (v1.13.1 on Windows) with the following command line:

pandoc --from=html --to=markdown --output=C:\Temp\test.md C:\Temp\test.html

...the only output I get in test.md is:

Test

Note: I have already seen this question and answer, but when I try --parse-raw, the entire document is passed through as raw HTML, which is not what I want. In the StackExchange thread I started about this, I was told that --parse-raw is indeed the right option and should work, but that the embed element seems to trigger a bug that causes everything to be carried over as raw HTML. It was suggested that I post a bug report, so here I am.
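
For anyone who wants to reproduce this, here is a minimal sketch in Python that runs the same conversion with and without --parse-raw so the two outputs can be compared side by side. The helper names (build_command, run_both) are made up for illustration, the output is read from stdout rather than written to C:\Temp\test.md, and it assumes pandoc is on the PATH. A pandoc version that does not accept --parse-raw will return its error text instead of markdown.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Same HTML as the test case in this report.
TEST_HTML = """<!DOCTYPE html>
<body>
<p>Test
  <object height="355" width="425">
    <param name="movie" value="http://www.youtube.com/v/DKk9rv2hUfA&amp;rel=1">
    <param name="wmode" value="transparent">
    <embed height="355" src="http://www.youtube.com/v/DKk9rv2hUfA&amp;rel=1" type="application/x-shockwave-flash" width="425">
  </object>
</p>
</body>
"""


def build_command(html_path, parse_raw=False):
    """Return the pandoc argument list (hypothetical helper); output goes to stdout."""
    cmd = ["pandoc", "--from=html", "--to=markdown"]
    if parse_raw:
        # The option discussed in this report (pandoc 1.x).
        cmd.append("--parse-raw")
    cmd.append(str(html_path))
    return cmd


def run_both(html=TEST_HTML):
    """Run pandoc with and without --parse-raw.

    Returns a dict keyed by the parse_raw flag, or None if pandoc is not installed.
    """
    if shutil.which("pandoc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "test.html"
        src.write_text(html, encoding="utf-8")
        results = {}
        for parse_raw in (False, True):
            proc = subprocess.run(
                build_command(src, parse_raw),
                capture_output=True,
                text=True,
            )
            # Keep the error text if pandoc rejects the arguments.
            results[parse_raw] = proc.stdout if proc.returncode == 0 else proc.stderr
        return results


if __name__ == "__main__":
    outputs = run_both()
    if outputs is None:
        print("pandoc not found on PATH")
    else:
        for parse_raw, text in outputs.items():
            print(f"--- parse_raw={parse_raw} ---")
            print(text)
```

Removing the embed line from TEST_HTML and running the script again should show whether that element is what triggers the all-raw-HTML output.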
