Skip to content
This repository was archived by the owner on Apr 14, 2022. It is now read-only.

Consider not converting spaces to   entities in kind=markdown #422

Closed
cpbotha opened this issue Nov 20, 2018 · 13 comments
Closed

Consider not converting spaces to   entities in kind=markdown #422

cpbotha opened this issue Nov 20, 2018 · 13 comments
Assignees
Milestone

Comments

@cpbotha
Copy link

cpbotha commented Nov 20, 2018

In getting the Microsoft PyLS working with Emacs, see https://vxlabs.com/2018/11/19/configuring-emacs-lsp-mode-and-microsofts-visual-studio-code-python-language-server/ -- I had to hack around the fact that PyLS converts all spaces in the Python source docstrings to   elements.

As far as I can tell, this is strictly speaking valid Markdown, although it's making life difficult for non-HTML based MarkDown renderers, like those in VIM and Emacs.

Would you consider disabling this function, or at least making it configurable?

In short, I would like contents with kind=markdown to use simple spaces instead of   entities.

@jakebailey
Copy link
Member

This might actually get a bit more dicey than it seems, unfortunately. I believe there are a couple cases where we have to "cheat" with using HTML-style whitespace (spaces/line breaks) in order to work around issues we have with tooltips in VS Code. For example, in some random cases, triple-quoted code strings (which we use for the first line to give the function signature before the docs) won't end and will cause intermittent issues like #328. In other cases, I believe using HTML allows us to position documentation as it is written in docstrings, whereas markdown is of course a language which does not care about whitespace.

That being said, there are definitely a few cases I've seen where we do goofy things like encode some HTML bits twice, which might be part of the problem you're seeing. It'd certainly be great to make things consistent and revisit the behavior to see if anything we have is no longer needed.

I noticed in that post that you've set the preferred format to plaintext. Is that option not working for you, or are you just talking about the case of specifying markdown and the editor providing that functionality (then not being able to deal with HTML-style stuff)?

@cpbotha
Copy link
Author

cpbotha commented Nov 20, 2018

I understand that you're being constrained to a certain extent by Visual Studio Code.

However, we've all seen zillions of markdown files, most of which don't contain this many   entities. Markdown renderers the world over seem to do a great job of showing these  -free files. :)

I would already be very happy if you could make an exception or configuration for only  , but I of course understand if you make the call that's not going to be reasonable for you to add.

With regard to your last question: Even when I specify preferredFormat as plaintext, I get those   entities, but only in some cases. For example, with import pandas as pd, hovering over the pandas will get me plaintext, whilst hovering over the pd will get me the same docs as markdown. Super strange.

@jakebailey
Copy link
Member

jakebailey commented Nov 20, 2018

Oh, I didn't mean that we wouldn't want to change things; I just wanted to write down my thoughts as to why I believe this happens to be the current behavior and some places to look and challenges to consider when writing a fix.

Those  s shouldn't be showing up in the plaintext output, for sure. Are you also certain that the client is sending plaintext in the client capabilities? We currently have two different places to set the documentation formatting (see #319). EDIT: Eh, I forgot that #319 is to actually use the typical LSP way to indicate content formatting type. I don't think setting that will work currently.

@cpbotha
Copy link
Author

cpbotha commented Nov 21, 2018

I'm sending preferredFormat: plaintext as part of the initializationOptions, just like vs.code is doing (but they are sending markdown)

Doing this affects some things but not others. I can confirm that I still get markdown documentation as a response to textDocument/hover -- I looked at the raw json coming back from the LS.

@MikhailArkhipov
Copy link

Unfortunately, a lot of Python documentation in files is formatted with space/tab indentation - such as samples. If whitespace is not converted to &nbsp, the text does not appear properly (as in the original doc string) formatted.

@jakebailey
Copy link
Member

Is that actually an issue if the language server is configured to output plaintext? The documentation in files is by definition plaintext, because it's the source code. I think it'd be better to not do any sort of modification to docstrings when in plaintext mode. Formatting for showing in markdown is likely a different story.

@MikhailArkhipov
Copy link

Plaintext means no colors in VS Code. Since extension with Jedi traditionally provided colored code (as do other extensions), we should be doing the same.

@cpbotha
Copy link
Author

cpbotha commented Nov 22, 2018

@MikhailArkhipov The point was that even when the LS is instructed via its API to return plaintext, it still returns markdown and   entities, sometimes.

Also, I don't understand your reasoning with regards to TABs and &nbsp vs just spaces. In both cases you have to make the call how many spaces to use.  -entities only serve to act as further LS-imposed constraints on the vs.code markdown renderer (and other more compliant markdown renderers). (or are you translating TABs into another more specific html entity?)

Furthermore, from a software engineering standpoint, this really goes against separation of concerns. Ideally the MS LS should generate fully compliant, textbook markdown, whilst vs.code and other clients simply should simply render that compliant, textbook markdown.

Although markdown strictly speaking might allow for   and other html entities, that was never its intention. Remember that it started as a plaintext markup that could be equally well parsed by humans reading emails.

As I mentioned earlier: Fundamentally, most (all?) markdown renderers render  -less markdown without a hitch. I find it hard to believe that vs.code is the only one that has trouble with this.

All of that being said: I don't really care how you decide to deviate from what everybody else is doing, as long as you stay consistent in how you do it so I don't have to keep on updating my patches to Emacs. ;P

@MikhailArkhipov
Copy link

LS sends   in markdown triple-tick blocks with html language specified so they are part of HTML fragment. Works same way triples work here at GH. Typically markdown looks like block of triple-tick-python followed by triple-tick-html. See microsoft/vscode#49018 for an example.

@jakebailey
Copy link
Member

jakebailey commented Nov 23, 2018

VS Code always requests markdown, never plaintext, so I don't think fixing plaintext output to actually be plaintext will affect VS Code (not that plaintext can't contain   if it means it). I'm not sure how it would affect VS, though, if at all.

I'm a bit confused what triple backtick html blocks have to do with this, though. That's for code formatting HTML (like triple backtick then python would be). Example:

<body>
<p>Hello &nbsp; world!
</body>

Is:

```html
<body>
<p>Hello &nbsp; world!
</body>
```

That doesn't mean that HTML is rendered, but that it's highlighted as HTML. When I was looking into #328, I didn't see us outputting triple backtick html blocks at any point. Searching the repo for "html" doesn't net anything either.

If we want that bottom docstring to be spaced as plaintext with the original characters, I'd think that a plain triple backtick block would work, or explicitly stating plaintext. IIRC we do the former.

@MikhailArkhipov
Copy link

HTML is to keep documentation text preformatted/indented with significant whitespace.

@jakebailey
Copy link
Member

All backtick blocks should be able to do that, like:

        I'm prefaced by 8 spaces!

Which is:

```
        I'm prefaced by 8 spaces!
```

@jakebailey
Copy link
Member

With #596, the docstring handling has been completely rewritten for the new language server. There is no longer any manual spacing of elements, instead converting from the docstring to something markdown will be able to format. I don't think we'll be inserting any more HTML-specific whitespace (unless it can't be avoided, which I can't see happening). If plaintext is specified, all docstrings returned will come through unmodified, other than removing consistent indents.

I'll file another bug for handling LSP capability-based MarkupKind support, but that's a further improvement.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants