LaTeX reader: Improve \noindent and \textgreek parsing #1783

@adunning

Description

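In a LaTeX document (credit to one of my students), I have noticed that Pandoc is dropping text from some \it and \textgreek commands: in the example below, the {\it ...} group immediately after \noindent and the argument of \textgreek both disappear from the output. Try, for instance: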

pandoc -f latex -t markdown << EOT

\medskip
\noindent {\it Hypothesize} is composed of the noun {\it hypothesis} and the verb forming the suffix {\it -ize} (OED s.v. 'hypothesize, v.'). The suffix could be traced from late Latin {\it -iz\={a}re, -\={i}z\={a}re}, to Greek \textgreek{-ίζειν} (formative of verbs) (OED s.v. '-ize, suffix').

EOT

Expected output:

*Hypothesize* is composed of the noun *hypothesis* and the verb forming the
suffix *-ize* (OED s.v. ’hypothesize, v.’). The suffix could be traced from late 
Latin *-izāre, -īzāre*, to Greek ίζειν (formative of verbs) (OED s.v. ’-ize, suffix’).

Actual output:

is composed of the noun <span>*hypothesis*</span> and the verb forming the
suffix <span>*-ize*</span> (OED s.v. ’hypothesize, v.’). The suffix
could be traced from late Latin <span>*-izāre, -īzāre*</span>, to Greek
(formative of verbs) (OED s.v. ’-ize, suffix’).
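
To make the difference between the two outputs easier to see, here is a minimal sketch in Python that lists the words present in the expected output but absent from the actual one. It does not call Pandoc; the two strings are copied from this report, and the helper names `words` and `dropped_words` are made up for illustration. It ignores the `<span>` wrappers and the emphasis asterisks, since those are not the text loss being reported.

```python
import re
from collections import Counter

# Both strings are copied from the expected and actual output in this report.
EXPECTED = """*Hypothesize* is composed of the noun *hypothesis* and the verb forming the
suffix *-ize* (OED s.v. ’hypothesize, v.’). The suffix could be traced from late
Latin *-izāre, -īzāre*, to Greek ίζειν (formative of verbs) (OED s.v. ’-ize, suffix’)."""

ACTUAL = """is composed of the noun <span>*hypothesis*</span> and the verb forming the
suffix <span>*-ize*</span> (OED s.v. ’hypothesize, v.’). The suffix
could be traced from late Latin <span>*-izāre, -īzāre*</span>, to Greek
(formative of verbs) (OED s.v. ’-ize, suffix’)."""


def words(text):
    """Strip <span> tags and emphasis asterisks, then split on whitespace."""
    text = re.sub(r"</?span>", "", text).replace("*", "")
    return text.split()


def dropped_words(expected, actual):
    """Return the words of `expected` that have no counterpart in `actual`, in order."""
    remaining = Counter(words(actual))
    missing = []
    for word in words(expected):
        if remaining[word] > 0:
            remaining[word] -= 1  # consume one matching occurrence
        else:
            missing.append(word)
    return missing


if __name__ == "__main__":
    print(dropped_words(EXPECTED, ACTUAL))  # → ['Hypothesize', 'ίζειν']
```

The two reported losses are the word following \noindent and the argument of \textgreek; everything else differs only in line wrapping and the `<span>` wrappers.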
