Snippet templates that contain UTF-8 characters are corrupted on platforms with a non-UTF-8 default encoding #585

RudiKlassen-zz · 2019-02-19T10:38:46Z

Our project is set up as follows:

Build Management: Gradle
IDE project encoding: ISO-8859-1
Snippet file encoding: UTF-8
Configuration for MockMvcBuilders: snippets().withDefaults().withEncoding("UTF-8");

We generate the snippets via Spring MVC test.
If I check the contents of the snippets in the build directory after generation, the umlauts or certain special characters are broken there. For me it looks like the encoding configuration of the MockMvcBuilder is ignored and the project encoding is used. But this should not be the case.
(Or am I doing something wrong?)

wilkinsona · 2019-02-19T11:14:35Z

I can't tell if you're doing something wrong as I don't know enough about what you're doing. REST Docs could be causing the problem, or the problem could be happening before the request and response reach REST Docs. If you'd like me to spend some time investigating, please take the time to provide a complete and minimal sample that reproduces the problem.

RudiKlassen-zz · 2019-02-25T16:41:44Z

Hi @wilkinsona,
I have just prepared a small sample project, which can be found here: https://github.com/RudiKlassen/SpringRestDocsExampleProject

To reproduce the error, proceed as follows: (I used IntelliJ)

Test 1
Set the Project Encoding to ISO-8859-1
Run the JUnit test CrudControllerTest
Open build/generated-snippets/read/fullDocumentation.adoc

Result: In the document you can see that the inserted umlauts broke during the conversion.

Test 2
Change the Project Encoding to UTF-8
Clean up the build directory and run the test again.

Result: The umlauts are now displayed correctly in the generated document.

However, as far as I understand it, this should not happen, because the snippet encoding is determined by the document settings in MockMvcRestDocumentation. There I set the encoding to UTF-8. But I'm not sure if it's my fault or a bug. Can you please take a look?

wilkinsona · 2019-02-25T16:52:28Z

Thanks for the additional information. The snippet encoding only affects the encoding used to write out the snippets. It does not and cannot affect the encoding of anything before it reaches the snippets. If you set your project's encoding to ISO-8859-1 and use UTF-8 characters in its source code, the encoding problem will have occurred before REST Docs is involved.
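
As a minimal illustration of that failure mode (a hypothetical sketch, not code from REST Docs or the sample project), the following decodes the UTF-8 bytes of an umlaut as ISO-8859-1 and then writes the result back out as UTF-8. The class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Simulates UTF-8 bytes being decoded as ISO-8859-1, which is what happens
    // when UTF-8 text is read with ISO-8859-1 as the assumed encoding.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00e4" is a-umlaut; the escape keeps this file ASCII-only.
        String original = "\u00e4";
        String broken = misread(original);

        // One character has become two: U+00C3 followed by U+00A4.
        System.out.println(broken.length());
        broken.chars().forEach(c -> System.out.printf("U+%04X%n", c));

        // Writing the already-broken text as UTF-8 preserves the damage:
        // the output is 4 bytes long instead of the original 2.
        System.out.println(broken.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Once the text has been decoded with the wrong charset, choosing UTF-8 for the output cannot restore it, because the output step only sees the already-corrupted characters.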

ghost · 2020-04-28T05:53:30Z

Hey,

I have run the example project with 'gradlew build' in Bash on Linux and in CMD on Windows. The generated snippet is at ./build/generated-snippets/read/fullDocumentation.adoc. Between the two builds I ran 'gradlew clean'.

Now to the build results:

under Windows with the CMD: umlauts are broken.
under Linux with the Bash: umlauts are correct.

For me it looks like restdocs is using the encoding of the operating system. The file is UTF-8 encoded, but the content is ISO-8859-1 encoded when I created the file in Windows.

But under Linux both are UTF-8.
Can you look at this again and explain how to fix this?
My complete IDE is set to UTF-8.

wilkinsona · 2020-04-28T07:59:45Z

@pDiller REST Docs uses UTF-8 by default to write the snippets and does not use the operating system's encoding. I would guess that something is using the operating system's encoding, which is breaking the umlauts before the data reaches REST Docs. If you are running Gradle from the command line, your IDE's configuration won't have any effect. I'd recommend checking how your build is configured and what encoding it is using.

ghost · 2020-04-28T08:08:41Z

@wilkinsona of course my IDE settings have nothing to do with the Gradle build. I just mentioned it because @RudiKlassen did. All the classes are in UTF-8 and, as you can see, the configuration in our tests says that we have configured UTF-8 as well. I don't really understand what our build has to do with the configuration which is made programmatically in the tests. Can you give us a little hint? Everything we do is just let restdocs generate the snippets...

wilkinsona · 2020-04-28T08:13:18Z

> Everything we do is just let restdocs generate the snippets...

REST Docs has to generate the snippets from some input and I suspect it's that input that contains broken umlauts. Once they're broken REST Docs can't do anything to fix them.

> Can you give us a little hint?

As with the original problem, I can't give any more hints without some more information. If you can provide a small sample project that reproduces the problem, I can take a look. If you want to investigate yourself, I'd recommend debugging MockMvcRequestConverter (or the equivalent class for WebFlux or REST Assured) and checking the input into REST Docs in the convert method.

ghost · 2020-04-28T08:19:35Z

> REST Docs has to generate the snippets from some input and I suspect it's that input that contains broken umlauts. Once they're broken REST Docs can't do anything to fix them.

I don't see where we are manipulating our input. We just have a simple API.

> As with the original problem, I can't give any more hints without some more information. If you can provide a small sample project that reproduces the problem, I can take a look. If you want to investigate yourself, I'd recommend debugging MockMvcRequestConverter (or the equivalent class for WebFlux or REST Assured) and checking the input into REST Docs in the convert method.

@RudiKlassen already posted a project where you can test this: https://github.com/RudiKlassen/SpringRestDocsExampleProject

I will debug this, but I would prefer it if you could have a look too.

wilkinsona · 2020-04-28T08:35:11Z

The project from @RudiKlassen was user error. ISO-8859-1 encoding was being used, and as a result UTF-8 characters were broken, before the data reached REST Docs. If you want me to spend some time investigating your problem you're going to have to spend some time providing a sample that reproduces it.

ghost · 2020-04-28T08:40:22Z

I just cloned the project and ran the Gradle builds I described in my earlier comments. There is NO! setting for ISO-8859-1. Everything is in UTF-8... So I don't understand why you can't use the example. If you clone the project and just execute the Gradle commands in the different environments, you will see the wrong output with the CMD console. The Unix console will produce the correct output, even though nothing in the project has changed at all.

wilkinsona · 2020-04-28T09:17:36Z

Thanks. It wasn't clear to me that "the example project" was referring to the project provided by Rudi. Having looked more closely, I was mistaken before. There's a snippet template in the sample project that uses UTF-8 characters. REST Docs' snippet encoding only applies to the encoding of the output that REST Docs produces. It does not apply to the snippet templates when they're read and passed to JMustache. It should be possible to fix that.

In the meantime, you can make your build more robust and platform independent by configuring the test tasks to run with the file.encoding system property set to UTF-8.
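
To make the difference concrete, here is a minimal sketch (not the REST Docs implementation; the class name, method name, and template text are hypothetical) that decodes the same UTF-8 template bytes once with an explicit UTF-8 charset and once with ISO-8859-1, which stands in here for a non-UTF-8 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TemplateDecodingDemo {

    // Decodes raw template bytes with an explicitly chosen charset.
    static String decodeTemplate(byte[] templateBytes, Charset charset) {
        return new String(templateBytes, charset);
    }

    public static void main(String[] args) {
        // Hypothetical template text containing o-umlaut and sharp s,
        // stored on disk as UTF-8.
        String template = "Gr\u00f6\u00dfe: {{size}}";
        byte[] onDisk = template.getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8 decoding is independent of the platform.
        String explicit = decodeTemplate(onDisk, StandardCharsets.UTF_8);
        // ISO-8859-1 is an assumption standing in for a non-UTF-8 default.
        String legacy = decodeTemplate(onDisk, StandardCharsets.ISO_8859_1);

        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("explicit UTF-8 intact: " + explicit.equals(template));
        System.out.println("non-UTF-8 decode intact: " + legacy.equals(template));
    }
}
```

The first comparison is true and the second is false. Starting the test JVM with -Dfile.encoding=UTF-8 (in Gradle, for example, through the test task's systemProperty or jvmArgs settings) is one way to apply the workaround described above, since it makes the default charset UTF-8 regardless of the operating system.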

ghost · 2020-04-28T09:21:46Z

Thank you for looking at this bug again! Next time I will write it more clearly ;-)

spring-projects-issues added the status: waiting-for-triage Untriaged issue label Feb 19, 2019

wilkinsona added the status: waiting-for-feedback Feedback is required before progress can be made label Feb 19, 2019

spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback Feedback is required before progress can be made labels Feb 19, 2019

wilkinsona added status: waiting-for-feedback Feedback is required before progress can be made and removed status: feedback-provided Feedback has been provided labels Feb 19, 2019

RudiKlassen-zz closed this as completed Feb 25, 2019

RudiKlassen-zz reopened this Feb 25, 2019

spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback Feedback is required before progress can be made labels Feb 25, 2019

wilkinsona closed this as completed Feb 25, 2019

wilkinsona added status: invalid Suggestion or bug report that we don't feel is valid and removed status: feedback-provided Feedback has been provided status: waiting-for-triage Untriaged issue labels Feb 25, 2019

wilkinsona reopened this Apr 28, 2020

wilkinsona changed the title ~~Generated snippets contain broken characters~~ Snippet templates that contain UTF-8 characters are corrupted on platforms with a non-UTF-8 default encoding Apr 28, 2020

wilkinsona added type: bug A bug and removed status: invalid Suggestion or bug report that we don't feel is valid labels Apr 28, 2020

wilkinsona added this to the 2.0.5.RELEASE milestone Apr 28, 2020

wilkinsona closed this as completed in 9162888 Sep 1, 2020
