feat: Add automatic charset=UTF-8 to text content types#3071

Merged: ahopkins merged 7 commits into main from content-charset on Dec 4, 2025

Tronic commented Jun 25, 2025

Member

Refactor content type handling to automatically append charset=UTF-8 to text/* MIME types when serving static files and file responses.

  • Add new guess_content_type() utility function that wraps mimetypes.guess_type()
  • Automatically append '; charset=UTF-8' to text content types
  • Replace direct mimetypes.guess_type() usage with new utility
  • Update static file serving and file() response functions to use new utility
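
The behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the name guess_content_type() comes from the description, but the signature, the `fallback` parameter, and the return type are assumptions made for the example.

```python
import mimetypes
from os import PathLike
from typing import Optional, Union


def guess_content_type(
    path: Union[str, PathLike],
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Guess a Content-Type value, adding a charset to text/* types.

    Hypothetical sketch: the signature and the ``fallback`` parameter
    are assumptions, not taken from the Sanic source.
    """
    # mimetypes.guess_type() returns (type, encoding); only the type is used.
    media_type, _encoding = mimetypes.guess_type(str(path))
    if media_type is None:
        media_type = fallback
    # Only text/* types get the charset parameter; others pass through as-is.
    if media_type is not None and media_type.startswith("text/"):
        media_type += "; charset=UTF-8"
    return media_type


print(guess_content_type("index.html"))  # text/html; charset=UTF-8
print(guess_content_type("logo.png"))    # image/png
```

Keeping the charset logic in one wrapper means every call site that previously used mimetypes.guess_type() directly picks up the same rule.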

Fix #2987

Tronic added 2 commits June 25, 2025 14:45
Tronic requested a review from a team as a code owner on June 25, 2025 20:51

Tronic commented Jun 25, 2025

Member Author

It looks like the tests are (still) broken. I cannot run them on my own machine either without plenty of unrelated failures. Further work may be required on this PR, in particular on any existing tests that the change may have broken.

Tronic commented Jun 26, 2025

Member Author

I ran only the relevant tests manually, because so many tests fail with the Sanic test server error "address already in use" that the full test results are unmanageable. I fixed a few issues and made the tests run a bit faster by avoiding overlap.

I note that file() should probably fall back to the HTTP default (i.e. application/octet-stream) rather than to text/plain, as it is hardcoded to do. This matters in particular when data files of some arbitrary format are sent (e.g. test.dat would be served as plain text), although one can see the reasoning for sending a bare README or similar (legacy) files as text, so that they are displayed in the browser rather than downloaded. This is, however, in conflict with how Sanic's static file handler behaves.

To maintain compatibility with prior versions, this PR does not change the file() fallback type, other than by appending the charset to it as well. app.static() defaults to the HTTP default, just as before, which has no charset.
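
To make the difference between the two fallbacks concrete, here is a small standalone model of the behavior described in this comment. It does not call Sanic: file_response_type() and static_response_type() are hypothetical helper names invented for the illustration, and only the fallback values (text/plain and application/octet-stream) come from the discussion above.

```python
import mimetypes

# Fallback values as described in the comment above.
FILE_FALLBACK = "text/plain"                  # hardcoded fallback of file()
STATIC_FALLBACK = "application/octet-stream"  # HTTP default, used by app.static()


def with_charset(media_type: str) -> str:
    """Append the charset parameter to text/* types only."""
    if media_type.startswith("text/"):
        return f"{media_type}; charset=UTF-8"
    return media_type


def file_response_type(filename: str) -> str:
    """Hypothetical model of the type chosen for a file() response."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return with_charset(guessed or FILE_FALLBACK)


def static_response_type(filename: str) -> str:
    """Hypothetical model of the type chosen by the static file handler."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return with_charset(guessed or STATIC_FALLBACK)


# A bare README has no extension, so the type cannot be guessed
# and each function uses its own fallback.
print(file_response_type("README"))    # text/plain; charset=UTF-8
print(static_response_type("README"))  # application/octet-stream
```

Under this model, the same unrecognized file is displayed as text when sent through file() but offered as a download by the static handler, which is the inconsistency noted above.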

…() interface as it is related. Removed duplicate tests, clarified naming.
ahopkins previously approved these changes Nov 30, 2025
ahopkins merged commit f7002a3 into main Dec 4, 2025
27 checks passed
ahopkins deleted the content-charset branch December 4, 2025 05:51
Development

Successfully merging this pull request may close these issues.

Content type on static files lacks charset=UTF-8

2 participants