Snippet templates that contain UTF-8 characters are corrupted on platforms with a non-UTF-8 default encoding #585

Closed
RudiKlassen-zz opened this issue Feb 19, 2019 · 12 comments

@RudiKlassen-zz

Our project is set up as follows:

Build Management: Gradle
IDE project encoding: ISO-8859-1
Snippet file encoding: UTF-8
Configuration for MockMvcBuilders: snippets().withDefaults().withEncoding("UTF-8");

We generate the snippets via Spring MVC test.
When I check the generated snippets in the build directory, umlauts and certain other special characters are broken. To me it looks like the encoding configured on the MockMvcBuilder is ignored and the project encoding is used instead, which should not be the case.
(Or am I doing something wrong?)

@wilkinsona
Member

wilkinsona commented Feb 19, 2019

I can't tell if you're doing something wrong as I don't know enough about what you're doing. REST Docs could be causing the problem, or the problem could be happening before the request and response reach REST Docs. If you'd like me to spend some time investigating, please take the time to provide a complete and minimal sample that reproduces the problem.

@wilkinsona wilkinsona added the status: waiting-for-feedback Feedback is required before progress can be made label Feb 19, 2019
@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback Feedback is required before progress can be made labels Feb 19, 2019
@wilkinsona wilkinsona added status: waiting-for-feedback Feedback is required before progress can be made and removed status: feedback-provided Feedback has been provided labels Feb 19, 2019
@RudiKlassen-zz
Author

Hi @wilkinsona ,
I have just prepared a small sample project, which can be found here: https://github.com/RudiKlassen/SpringRestDocsExampleProject

To reproduce the error, proceed as follows: (I used IntelliJ)

Test 1
Set the Project Encoding to ISO-8859-1
Run the JUnit test CrudControllerTest
Open build/generated-snippets/read/fullDocumentation.adoc

Result: In the document you can see that the inserted umlauts broke during the conversion.

Test 2
Change the Project Encoding to UTF-8,
clean up the build directory and run the test again.

Result: The umlauts are now displayed correctly in the generated document.

However, as far as I understand it, this should not happen, because the snippet encoding is determined by the documentation configuration in MockMvcRestDocumentation. There I set the encoding to UTF-8. I'm not sure whether this is my mistake or a bug. Can you please take a look?

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback Feedback is required before progress can be made labels Feb 25, 2019
@wilkinsona
Member

Thanks for the additional information. The snippet encoding only affects the encoding used to write out the snippets. It does not and cannot affect the encoding of anything before it reaches the snippets. If you set your project's encoding to ISO-8859-1 and use UTF-8 characters in its source code, the encoding problem will have occurred before REST Docs is involved.
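
To illustrate the general point that damage done upstream cannot be undone by a correct UTF-8 write, here is a minimal, self-contained sketch using only the JDK. It is not REST Docs code; the class and method names (`MojibakeDemo`, `misread`, `roundTripUtf8`) are made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {

    /** Decodes the UTF-8 bytes of the text as ISO-8859-1, as a misconfigured reader or compiler would. */
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes the text as UTF-8 and decodes it again as UTF-8 (a "correct" write followed by a correct read). */
    static String roundTripUtf8(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three umlauts, written as escapes so this source file is safe under any source encoding.
        String original = "\u00e4\u00f6\u00fc";
        String damaged = misread(original);

        // Each umlaut is two bytes in UTF-8, so the misread string has six characters.
        System.out.println(damaged.length());
        // Writing the damaged string as UTF-8 preserves the damage; it does not repair it.
        System.out.println(roundTripUtf8(damaged).equals(damaged));
    }
}
```

Once a string has been decoded with the wrong charset, every later step sees six ordinary characters and has no way to know that three umlauts were intended.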

@wilkinsona wilkinsona added status: invalid Suggestion or bug report that we don't feel is valid and removed status: feedback-provided Feedback has been provided status: waiting-for-triage Untriaged issue labels Feb 25, 2019
@ghost

ghost commented Apr 28, 2020

Hey,

I have built the example project with 'gradlew build' both in Bash on Linux and in CMD on Windows. The generated snippet is at ./build/generated-snippets/read/fullDocumentation.adoc. Between the two builds I ran 'gradlew clean'.

Now to the build results:

  • under Windows with the CMD: umlauts are broken.
  • under Linux with the Bash: umlauts are correct.

For me it looks like restdocs is using the encoding of the operating system. The file is UTF-8 encoded, but the content is ISO-8859-1 encoded when I created the file in Windows.

But under Linux both are UTF-8.
Can you look at this again and explain how to fix this?
My complete IDE is set to UTF-8.


@wilkinsona
Member

wilkinsona commented Apr 28, 2020

@pDiller REST Docs uses UTF-8 by default to write the snippets and does not use the operating system's encoding. I would guess that something else is using the operating system's encoding and breaking the umlauts before the data reaches REST Docs. If you are running Gradle from the command line, your IDE's configuration won't have any effect. I'd recommend checking how your build is configured and what encoding it is using.

@ghost

ghost commented Apr 28, 2020

@wilkinsona Of course my IDE settings have nothing to do with the Gradle build. I only mentioned them because @RudiKlassen did. All the classes are in UTF-8 and, as you can see, the configuration in our tests specifies UTF-8 as well. I don't really understand what our build has to do with configuration that is made programmatically in the tests. Can you give us a little hint? All we do is let restdocs generate the snippets...

@wilkinsona
Member

> Everything we do is just let restdocs generate the snippets...

REST Docs has to generate the snippets from some input and I suspect it's that input that contains broken umlauts. Once they're broken REST Docs can't do anything to fix them.

> Can you give us a little hint?

As with the original problem, I can't give any more hints without some more information. If you can provide a small sample project that reproduces the problem, I can take a look. If you want to investigate yourself, I'd recommend debugging MockMvcRequestConverter (or the equivalent class for WebFlux or REST Assured) and checking the input into REST Docs in the convert method.

@ghost

ghost commented Apr 28, 2020

> REST Docs has to generate the snippets from some input and I suspect it's that input that contains broken umlauts. Once they're broken REST Docs can't do anything to fix them.

I don't see where we are manipulating our input. We just have a simple API.

> As with the original problem, I can't give any more hints without some more information. If you can provide a small sample project that reproduces the problem, I can take a look. If you want to investigate yourself, I'd recommend debugging MockMvcRequestConverter (or the equivalent class for WebFlux or REST Assured) and checking the input into REST Docs in the convert method.

@RudiKlassen already posted a project where you can test this: https://github.com/RudiKlassen/SpringRestDocsExampleProject

I will debug this, but I would prefer it if you could have a look too.

@wilkinsona
Member

The project from @RudiKlassen was user error. ISO-8859-1 encoding was being used, and as a result UTF-8 characters were broken, before the data reached REST Docs. If you want me to spend some time investigating your problem you're going to have to spend some time providing a sample that reproduces it.

@ghost

ghost commented Apr 28, 2020

I just cloned the project and ran the Gradle builds I described in my earlier comments. There is NO setting for ISO-8859-1 anywhere; everything is in UTF-8. So I don't understand why you can't use the example. If you clone the project and simply run the Gradle commands in the two environments, you will see the wrong output from the CMD console, while the Unix console produces the correct output, without anything in the project having been changed.

@wilkinsona
Member

wilkinsona commented Apr 28, 2020

Thanks. It wasn't clear to me that "the example project" was referring to the project provided by Rudi. Having looked more closely, I was mistaken before. There's a snippet template in the sample project that uses UTF-8 characters. REST Docs' snippet encoding only applies to the encoding of the output that REST Docs produces. It does not apply to the snippet templates when they're read and passed to JMustache. It should be possible to fix that.
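
The failure mode described here can be reproduced with the JDK alone. The following is a sketch under assumptions, not REST Docs' or JMustache's actual code: `TemplateReadDemo` and `readTemplate` are hypothetical names, the template text is invented, and ISO-8859-1 stands in for whatever non-UTF-8 default a platform might have.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TemplateReadDemo {

    /** Reads a template file by decoding all of its bytes with the given charset. */
    static String readTemplate(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    public static void main(String[] args) throws IOException {
        Path template = Files.createTempFile("example", ".snippet");
        try {
            // A UTF-8 template containing two non-ASCII characters (o-umlaut and sharp s).
            String content = "Gr\u00f6\u00dfe: {{size}}";
            Files.write(template, content.getBytes(StandardCharsets.UTF_8));

            // Simulates reading with a non-UTF-8 platform default: the text is corrupted.
            String viaOtherCharset = readTemplate(template, StandardCharsets.ISO_8859_1);
            // Reading with an explicit UTF-8 charset recovers the original text on any platform.
            String viaUtf8 = readTemplate(template, StandardCharsets.UTF_8);

            System.out.println(viaOtherCharset.equals(content)); // false
            System.out.println(viaUtf8.equals(content));         // true
        } finally {
            Files.deleteIfExists(template);
        }
    }
}
```

Passing the charset explicitly at the point where the template is read removes the dependency on the platform default, which is why the output differs between Windows and Linux only in the first case.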

In the meantime, you can make your build more robust and platform independent by configuring the test tasks to run with the file.encoding system property set to UTF-8.
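
In a Gradle build, one common way to do this is to set the property on the `Test` task, for example with `systemProperty 'file.encoding', 'UTF-8'`. Whether the setting actually reached the forked test JVM can be checked from Java. The sketch below is only a diagnostic aid with hypothetical names (`DefaultEncodingCheck`, `describe`, `defaultIsUtf8`); it is not part of REST Docs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DefaultEncodingCheck {

    /** Summarizes the encoding-related state of the running JVM. */
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    /** Returns true when the JVM's default charset is UTF-8. */
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println(describe());
        if (!defaultIsUtf8()) {
            System.out.println("Default charset is not UTF-8; files read without an explicit charset may be corrupted.");
        }
    }
}
```

Calling `defaultIsUtf8()` from a test setup method and failing when it returns false would make a misconfigured build visible immediately, rather than through broken characters in the generated snippets.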

@wilkinsona wilkinsona reopened this Apr 28, 2020
@wilkinsona wilkinsona changed the title Generated snippets contain broken characters Snippet templates that contain UTF-8 characters are corrupted on platforms with a non-UTF-8 default encoding Apr 28, 2020
@wilkinsona wilkinsona added type: bug A bug and removed status: invalid Suggestion or bug report that we don't feel is valid labels Apr 28, 2020
@wilkinsona wilkinsona added this to the 2.0.5.RELEASE milestone Apr 28, 2020
@ghost

ghost commented Apr 28, 2020

Thank you for looking at this bug again! Next time I will write it more clearly ;-)
