
parse_html ignoring whitespace and newlines for <pre><code> ... </code></pre> html #4

@qknight


I ran into ivanceras/sauron#107 and thought this parser was the cause.

However, recreating the cases in tests/html.rs puts the issue in a different light: only the last test fails. The <pre><code>...</code></pre> cases are fine, because your parser does not remove spaces, newlines or (presumably) tabs inside them.

So you may want to add these tests as well and fix the implementation, if you think the remaining failure is worth fixing.
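
The expected string in the failing test below implies a specific rule for `minify_spaces`: every run of whitespace inside a text node collapses to a single space, and leading/trailing whitespace is dropped. Here is a minimal sketch of just that rule. `minify_text` is a hypothetical helper for illustration, not a function the parser provides.

```rust
/// Hypothetical helper (not part of the parser's API): minify a text node by
/// collapsing every run of whitespace to one space and trimming both ends.
fn minify_text(text: &str) -> String {
    // `split_whitespace` splits on any Unicode whitespace (spaces, tabs,
    // newlines) and yields no empty pieces, so joining with " " both
    // collapses inner runs and drops leading/trailing whitespace.
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    // Text nodes taken from the input of `test_pre_code3`.
    assert_eq!(minify_text(" test "), "test");
    assert_eq!(minify_text("\n0\n  1\n  2\n3\n"), "0 1 2 3");
    // Whitespace-only nodes between tags vanish, as `test_no_pre_no_code_2` expects.
    assert_eq!(minify_text("\n  "), "");
    println!("ok");
}
```

Trimming the ends of every text node can glue words together across inline tags (for example `a <b>b</b>`), so a real implementation might keep one space at the edges instead. The sketch trims because that is what the expected output in this issue asks for.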

Correct and working (these tests pass):

#[test]
fn test_pre_code() {
    let html = r#"<div><p> test </p>
<pre><code>
0
  1
  <p>foo</p>
  2
3</code></pre>
</div>"#;
    let expected = "<div><p> test </p>\n<pre><code>\n0\n  1\n  <p>foo</p>\n  2\n3</code></pre>\n</div>";
    let doc = parse(html).unwrap();
    println!("html: {}", html);
    println!("render: {}", render(&doc));
    assert_eq!(expected, render(&doc));
}

#[test]
fn test_pre_code_2() {
    let html = r#"<pre><code>
<span>asdf</span>
  <span>asdf</span>
  <span>asdf</span>
</code></pre>"#;
    let expected = r#"<pre><code>
<span>asdf</span>
  <span>asdf</span>
  <span>asdf</span>
</code></pre>"#;

    let doc = parse(html).unwrap();
    println!("html: {}", html);
    println!("render: {}", render(&doc));
    assert_eq!(expected, render(&doc));
}

#[test]
fn test_no_pre_no_code_2() {
    let html = r#"<span>asdf</span>
  <span>asdf</span>
  <span>asdf</span>"#;

    let expected = r#"<span>asdf</span><span>asdf</span><span>asdf</span>"#;

    let options = RenderOptions {
        lowercase_tagname: true,
        minify_spaces: true,
        ..Default::default()
    };
    let doc = parse(html).unwrap();
    println!("html: {}", html);
    println!("render: {}", doc.render(&options));
    assert_eq!(expected, doc.render(&options));
}

Incorrect (this test fails):

#[test]
fn test_pre_code3() {
    let html = r#"<div><p> test </p>
0
  1
  2
3
</div>"#;
    // Currently this renders:
    // "<div><p> test </p>\n0\n1\n2\n3\n</div>"
    // but it should render:
    let expected = r#"<div><p>test</p>0 1 2 3</div>"#;
    let options = RenderOptions {
        lowercase_tagname: true,
        minify_spaces: true,
        decode_entity: true,
        encode_content: true,
        remove_endtag_space: true,
        always_close_void: true,
        remove_attr_quote: true,
        remove_comment: true,
        ..Default::default()
    };
    let doc = parse(html).unwrap();
    println!("html: {}", html);
    println!("render: {}", doc.render(&options));
    assert_eq!(expected, doc.render(&options));
}
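
To make this test pass without breaking the `<pre><code>` tests above, the minifier has to know whether a text node sits inside `<pre>`. One way to do that is a recursive renderer that carries an `in_pre` flag down the tree, sketched below. `Node`, `render_node`, `el`, `text` and `render_minified` are hypothetical names for this sketch only; they are not the parser's actual types or API, and attributes, void elements and entity handling are left out.

```rust
/// Hypothetical minimal DOM for this sketch; not the parser's own node type.
enum Node {
    Element { tag: String, children: Vec<Node> },
    Text(String),
}

/// Render `node` into `out`. With `minify` on, text outside any `<pre>` is
/// collapsed and trimmed; text inside `<pre>` (including a nested `<code>`)
/// is written verbatim.
fn render_node(node: &Node, minify: bool, in_pre: bool, out: &mut String) {
    match node {
        Node::Text(content) => {
            if minify && !in_pre {
                // Collapse whitespace runs to one space and trim the ends.
                out.push_str(&content.split_whitespace().collect::<Vec<_>>().join(" "));
            } else {
                out.push_str(content);
            }
        }
        Node::Element { tag, children } => {
            // Once inside <pre>, every descendant keeps its whitespace.
            let child_in_pre = in_pre || tag.eq_ignore_ascii_case("pre");
            out.push('<');
            out.push_str(tag);
            out.push('>');
            for child in children {
                render_node(child, minify, child_in_pre, out);
            }
            out.push_str("</");
            out.push_str(tag);
            out.push('>');
        }
    }
}

/// Hypothetical convenience constructors for building test trees.
fn el(tag: &str, children: Vec<Node>) -> Node {
    Node::Element { tag: tag.to_string(), children }
}

fn text(s: &str) -> Node {
    Node::Text(s.to_string())
}

fn render_minified(node: &Node) -> String {
    let mut out = String::new();
    render_node(node, true, false, &mut out);
    out
}

fn main() {
    // Tree for the input of `test_pre_code3`.
    let doc = el("div", vec![
        el("p", vec![text(" test ")]),
        text("\n0\n  1\n  2\n3\n"),
    ]);
    assert_eq!(render_minified(&doc), "<div><p>test</p>0 1 2 3</div>");

    // A <pre><code> block keeps its newlines and indentation.
    let pre = el("pre", vec![el("code", vec![text("\n0\n  1\n  2\n3")])]);
    assert_eq!(render_minified(&pre), "<pre><code>\n0\n  1\n  2\n3</code></pre>");
    println!("ok");
}
```

Passing the flag down rather than checking only the direct parent is what keeps `<code>` (and anything nested in it) verbatim. A real minifier would likely treat other whitespace-sensitive elements such as `<textarea>` the same way.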
