Escape LaTeX special characters #61

Merged
merged 5 commits into from
Jul 17, 2020

csemken
Contributor

@csemken csemken commented Jun 28, 2020

Fixes #60.

Uses the list of LaTeX special characters in pandas.io.formats.latex.

Currently escapes special characters in variable names and table notes. Later, the method could be applied to all cells in a _gen_row method, as suggested in #41 and #46.
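
As an illustration of the approach described above, here is a minimal sketch of such an escaping helper. The function name `escape_latex` and the exact replacement table are assumptions modeled on the usual set of LaTeX special characters; this is not code taken from this PR or copied from pandas.

```python
# Hypothetical helper: the name and the mapping are illustrative assumptions.
_LATEX_SPECIAL = {
    "\\": r"\textbackslash ",
    "_": r"\_",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde ",
    "^": r"\textasciicircum ",
    "&": r"\&",
}


def escape_latex(text):
    """Escape LaTeX special characters in a single pass.

    Translating character by character avoids re-escaping the backslashes
    that earlier replacements would otherwise introduce.
    """
    return "".join(_LATEX_SPECIAL.get(ch, ch) for ch in text)


print(escape_latex("sepal_lenght"))   # sepal\_lenght
print(escape_latex("50% & more"))     # 50\% \& more
```

A single pass matters here: chained `str.replace` calls would have to handle the backslash first, or they would escape the backslashes they had just inserted.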

@toobaz
Collaborator

toobaz commented Jun 28, 2020

This is a subtle issue, see pandas-dev/pandas#21673 (comment).

In short, if we escape automatically, we do fix the sepal_lenght example you give in #60, but we break code doing, for instance

table.rename_covariates({'bi': '$b_i$'})

... and while this has a workaround (take the render_latex output and replace the escaped version of '$b_i$' back to '$b_i$' before saving it as a file), it is much less trivial than a simple

table.rename_covariates({'sepal_lenght': r'sepal\_lenght'})

(which on the other hand has the obvious inconvenience that this table wouldn't look as good in html, or other formats).
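
The workaround mentioned above can be sketched as a plain string replacement on the rendered output. The `rendered` string below is a made-up stand-in for what `render_latex` might return with automatic escaping; it is not actual Stargazer output.

```python
# Hypothetical fragment of escaped LaTeX output (assumed for illustration).
rendered = r"\$b\_i\$ & 0.52 \\"

# Undo the escaping for the one label that was meant as LaTeX math.
escaped_label = r"\$b\_i\$"
restored = rendered.replace(escaped_label, "$b_i$")

print(restored)  # $b_i$ & 0.52 \\
```

The user has to know the escaped form of every label they want to restore, which is why this is less convenient than escaping a single name by hand.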

I think the long term solution is to create a str subclass that is aware of what must, or must not, be escaped.
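
One way such a class could look, purely as a sketch: a `str` subclass used as a marker for text that is already valid LaTeX. The names `RawLatex` and `to_latex` are hypothetical and do not come from Stargazer.

```python
class RawLatex(str):
    """Marker type (hypothetical): text that is already valid LaTeX."""


def to_latex(label):
    """Escape plain strings, but pass RawLatex instances through unchanged."""
    if isinstance(label, RawLatex):
        return str(label)
    # Only underscores and dollar signs are handled here, to keep it short.
    return label.replace("_", r"\_").replace("$", r"\$")


print(to_latex("sepal_lenght"))       # sepal\_lenght
print(to_latex(RawLatex("$b_i$")))    # $b_i$
```

With this kind of marker, plain names are escaped by default while deliberately formatted labels keep their markup.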

In the short term, I'm OK with the feature you propose, but it should be disabled by default, its documentation should mention the problem above (e.g. the '$b_i$' example), and it could be enabled via a bool attribute of the Stargazer class.

@csemken
Contributor Author

csemken commented Jun 28, 2020

Ok, I can see you have given this a lot of thought already. My thinking was that the simplest and probably most common usage would be variable names as text without formatting – where I would expect Stargazer to escape all characters, so that the table looks the same in HTML and LaTeX – and that users would turn off escaping if they apply “render-specific” formatting ($b_i$ or b<sub>i</sub> with escape=False). But I agree that a formatting-aware “intelligent” str class would be even better.

Anyway, I have added an escape option and set the default to False.

BTW, the render arguments are currently passed on to the render method. This leads to some redundant argument defaults. Alternatively, making the options attributes – as I did – raises a Pylint warning. To avoid both issues we could pass the arguments to the Renderer.__init__ and make them attributes (as does pandas.io.formats.format.LatexFormatter).

@toobaz
Collaborator

toobaz commented Jul 10, 2020

> BTW, the render arguments are currently passed on to the render method. This leads to some redundant argument defaults. Alternatively, making the options attributes – as I did – raises a Pylint warning. To avoid both issues we could pass the arguments to the Renderer.__init__ and make them attributes (as does pandas.io.formats.format.LatexFormatter).

The fact that an attribute is set outside of __init__ doesn't bother me too much in terms of tidiness... but it does a bit in terms of expected behavior. I want to avoid that (maybe in future code changes, when the renderers might not be disposable any more) the fact that you called render or not affects the behavior of the renderer. So indeed I would prefer if you could either pass escape around without ever making it an attribute, or instead pass it in the LaTeXRenderer(self) and store it inside __init__. To be honest I even have a weak preference for the former (implemented via **kwargs), but both are fine to me.

@toobaz
Collaborator

toobaz commented Jul 10, 2020

> pass it in the LaTeXRenderer(self) and store it inside __init__

In turn, this can be done either by overriding LaTeXRenderer.__init__ (so that it still calls Renderer.__init__), or (better) by making Renderer.__init__ store a dict with all the **kwargs it gets.
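
A rough sketch of the second option, using simplified stand-in classes rather than the actual Stargazer code: the base class stores whatever keyword arguments it receives, and the subclass reads `escape` from that dict with a default of `False`. The attribute name `kwargs` and the method `_escape` are assumptions.

```python
class Renderer:
    """Simplified stand-in for the base renderer."""

    def __init__(self, table, **kwargs):
        self.table = table
        # Keep every rendering option in one place, set once at construction.
        self.kwargs = kwargs


class LaTeXRenderer(Renderer):
    """Simplified stand-in; only underscore escaping is shown."""

    def _escape(self, text):
        # Escaping is opt-in: the default mirrors escape=False.
        if self.kwargs.get("escape", False):
            return text.replace("_", r"\_")
        return text


print(LaTeXRenderer(None)._escape("sepal_lenght"))               # sepal_lenght
print(LaTeXRenderer(None, escape=True)._escape("sepal_lenght"))  # sepal\_lenght
```

Because the options are fixed in `__init__`, whether `render` has been called cannot change how the renderer behaves later.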

@csemken
Contributor Author

csemken commented Jul 13, 2020

> this can be done either by overriding LaTeXRenderer.__init__ (so that it still calls Renderer.__init__), or (better) by making Renderer.__init__ store a dict with all the **kwargs it gets.

Okay, I implemented the second solution.

I also added a test and docstrings.

Collaborator

@toobaz toobaz left a comment

Great, just one comment left

@toobaz toobaz merged commit c2c2b3e into StatsReporting:master Jul 17, 2020
@toobaz
Collaborator

toobaz commented Jul 17, 2020

Thanks @csemken!

@toobaz
Collaborator

toobaz commented Jul 29, 2023

> I think the long term solution is to create a str subclass that is aware of what must, or must not, be escaped.

Just for the record, the newly introduced Label does this (and a bit more); see the examples notebook.

Successfully merging this pull request may close these issues.

Escape LaTeX special characters