configparser accepts invalid keys and sections when writing #65697

Open
The-Compiler mannequin opened this issue May 13, 2014 · 8 comments
Labels
stdlib Python modules in the Lib dir

Comments

@The-Compiler
Mannequin

The-Compiler mannequin commented May 13, 2014

BPO 21498
Nosy @ambv, @The-Compiler, @isidentical

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/ambv'
closed_at = None
created_at = <Date 2014-05-13.14:47:29.111>
labels = ['type-bug', 'library']
title = 'configparser accepts keys beginning with comment_chars when writing'
updated_at = <Date 2020-01-11.15:02:37.046>
user = 'https://github.com/The-Compiler'

bugs.python.org fields:

activity = <Date 2020-01-11.15:02:37.046>
actor = 'BTaskaya'
assignee = 'lukasz.langa'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2014-05-13.14:47:29.111>
creator = 'The Compiler'
dependencies = []
files = []
hgrepos = []
issue_num = 21498
keywords = []
message_count = 2.0
messages = ['218463', '359797']
nosy_count = 3.0
nosy_names = ['lukasz.langa', 'The Compiler', 'BTaskaya']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue21498'
versions = []

Linked PRs

@The-Compiler
Mannequin Author

The-Compiler mannequin commented May 13, 2014

When a key beginning with a comment character is added to a configparser instance, the data is written to a file without any error. When the file is read back, the data is obviously different, because that line is parsed as a comment:

    >>> import configparser
    >>> import sys
    >>> cp = configparser.ConfigParser()
    >>> cp.read_dict({'DEFAULT': {';foo': 'bar'}})
    >>> cp.write(sys.stdout)
    [DEFAULT]
    ;foo = bar

This was discussed on python-dev here:
https://mail.python.org/pipermail/python-dev/2014-April/134293.html

Of course there are other corner cases as well, like having a key like "[foo]" or "=bar".

I think whatever data I pass into a configparser should also come out again when reading the file back.

Since there's no escaping in configparser, I think the ideal solution would be configparser refusing to write ambiguous values.

While this is technically a backwards-incompatible change, applications doing this were broken in the first place, so validation while writing will not break anything.

Validating when setting values would be better of course, but this can potentially break applications where configparser is used without actually writing a file.
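The round-trip expectation described above can be checked mechanically. The following is a minimal sketch, not part of configparser: `snapshot` and `round_trips` are hypothetical helper names, and any `configparser.Error` raised while writing or re-reading is simply treated as a failed round trip.

```python
import configparser
import io


def snapshot(cfg):
    """Return the parser's contents as plain dicts (hypothetical helper)."""
    data = {cfg.default_section: dict(cfg.defaults())}
    for section in cfg.sections():
        # raw=True avoids interpolation so values are compared as stored.
        data[section] = dict(cfg.items(section, raw=True))
    return data


def round_trips(cfg):
    """Return True if writing cfg and reading it back yields the same data."""
    buf = io.StringIO()
    try:
        cfg.write(buf)
        reread = configparser.ConfigParser()
        reread.read_string(buf.getvalue())
    except configparser.Error:
        # A rejected write or an unparsable file also counts as a failure.
        return False
    return snapshot(reread) == snapshot(cfg)


cp = configparser.ConfigParser()
cp.read_dict({'main': {';foo': 'bar', 'keep': '1'}})
# The ';foo' line is re-read as a comment, so the data no longer matches.
print(round_trips(cp))
```

A check like this only detects the problem after the fact; it does not decide whether the right fix is to reject the key or to escape it.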

@The-Compiler The-Compiler mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels May 13, 2014
@ambv ambv self-assigned this May 27, 2014
@isidentical
Member

Any update on this? The discussion on python-dev appears to have ended without a consensus or resolution.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@jaraco jaraco changed the title configparser accepts keys beginning with comment_chars when writing configparser accepts invalid keys when writing Jan 15, 2025
@jaraco jaraco removed the type-bug An unexpected behavior, bug, or error label Jan 15, 2025
@marc-hb

marc-hb commented Jan 15, 2025

#69909 is similar - and much more detailed.

@jaraco
Member

jaraco commented Jan 15, 2025

See also #69909, #67490, #65122, where the prior consensus seems to have been to be lenient to what the user provides, and leave it up to them to provide valid inputs. I can see the flip side that it would be valuable to provide some level of validation and reasonableness checks. However, it appears there are cases where other parsers might accept inputs that Python's configparser might not accept (e.g. section names like [Test[2]_foo] as mentioned in #69909), where it might be valuable for a Python script to be able to generate such outputs even if it can't read it. It won't be possible with a simple implementation to please both users that want to generate idiosyncratic output and those that want to reject it.

I've re-classified this as a feature and not a bug as the current approach is stable and defended and any change is going to be some kind of enhancement.

I think the next thing that needs to happen is to draft a design that addresses the concerns laid out in the various discussions. Such a design should probably start by synthesizing all of the concerns brought up in these different threads (the issues mentioned above and other discussions linked from those). An LLM might be helpful in performing such a synthesis.

@jaraco jaraco marked this as a duplicate of #69909 Jan 15, 2025
@jaraco jaraco changed the title configparser accepts invalid keys when writing configparser accepts invalid keys and sections when writing Jan 15, 2025
@jaraco jaraco unassigned ambv Jan 15, 2025
@jaraco
Member

jaraco commented Jan 15, 2025

I'm happy to sponsor this work (guide, direct, mentor), but I won't be working on it myself and I don't have a strong opinion on the design. The key is going to be to garner consensus.

@marc-hb

marc-hb commented Jan 16, 2025

> where it might be valuable for a Python script to be able to generate such outputs even if it can't read it.

A debatable feature but not a bug, granted.

> I've re-classified this as a feature and not a bug as the current approach is stable

Classifying this entire issue as a feature does not make sense to me... The demos I gave in:

... both create very basic, valid .ini files that return different data, no matter which .ini parser you read them back with. This is basically the same as SQL injection, which I don't think anyone would ever classify as a "feature". Also, there are no backwards-compatibility considerations in cases where something clearly never worked.

It's pretty common to turn everything upside down and rewrite everything to fix a collection of bugs all at once (many developers love to write new code and prefer that sort of approach), yet that does not make each of those bugs a "feature".

As mentioned in #69909, it is possible to use configparser without serializing to an .ini file and parsing anything back. Probably not a common use case for "somethingparser" but still a valid use case. No parsing bug in such a use case and no need to change that either.

@lincolnj1
Contributor

lincolnj1 commented Jan 20, 2025

> inputs that Python's configparser might not accept (e.g. section names like [Test[2]_foo] as mentioned in

This appears to have been fixed. The regex for parsing section names (RawConfigParser._SECT_TMPL = \[.+\]) will greedily consume brackets on a given line. See below:

>>> import configparser
>>> import sys
>>> cfg = configparser.ConfigParser()
>>> cfg.read_dict({'T[e]st' : {'foo' : 'bar'}})
>>> cfg.write(sys.stdout)
[T[e]st]
foo = bar

>>> with open('example.ini', 'w') as fp:
...     cfg.write(fp)
...
>>> file2cfg = configparser.ConfigParser()
>>> with open('example.ini', 'r') as fp:
...     file2cfg.read_file(fp)
...
>>> file2cfg.write(sys.stdout)
[T[e]st]
foo = bar

>>>

> Such a design should probably start by synthesizing all of the concerns brought up in these different threads

Since our primary concern is writing and reading back different data, I think we should only sanitize/reject data in the write() function, not prevent users from setting potentially bad data in the configparser. As far as I understand, the remaining concerns are as follows: line comments are not escaped, newlines are not escaped (which allows for injection), and key names can contain delimiters which corrupt the result of a following read.

On comments, I'm not sure what we'd gain from escaping them aside from making it more difficult to comment files programmatically. I'd suggest leaving existing functionality as is: leading comment prefixes in keys and trailing prefixes in values cause the parser to ignore the rest of the line, otherwise they are read back.

>>> import configparser
>>> import sys
>>> cfg = configparser.ConfigParser()
>>> cfg.read_dict({'test' : {';foo' : 'bar', '\\;foo' : 'bar'}})
>>> cfg.set('test', 'c;', 'd')
>>> cfg.set('test', 'e', 'f ; this is a comment')
>>> cfg.read_dict({';test2' : {'a' : 'b'}})
>>> cfg.write(sys.stdout)
[test]
;foo = bar
\;foo = bar
c; = d
e = f ; this is a comment

[;test2]
a = b

>>> with open('example.ini', 'w') as fp:
...     cfg.write(fp)
...
>>> file2cfg = configparser.ConfigParser(inline_comment_prefixes=(';', '#'))
>>> with open('example.ini', 'r') as fp:
...     file2cfg.read_file(fp)
...
>>> file2cfg.write(sys.stdout)
[test] #note the parser skipped the first element since it was a comment
\;foo = bar
c; = d
e = f

[;test2]
a = b

Things get interesting with newlines. Using marc’s example in (#69909 (comment)) but escaping the newline in ‘[privileged_section]\nevil’ produces:

[controlled_section]

[privileged_section]\nevil

Which reads back as:

[controlled_section]

[privileged_section]

It seems escaping newlines is not sufficient to defend against this particular attack, but escaping the leading ‘[’ does prevent a malicious user from adding new sections in this manner.
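For reference, the unescaped form of this attack can be reproduced with a short script. This is a sketch under assumptions: the key shape (a leading newline followed by a section header) is inferred from the output quoted above rather than copied from the linked comment, and `sections_after_round_trip` is a hypothetical helper that returns `None` if the interpreter in use rejects the write or the re-read with a `configparser.Error`.

```python
import configparser
import io


def sections_after_round_trip(data):
    """Write data with configparser, re-read it, and return the section names.

    Returns None if writing or re-reading raises configparser.Error.
    """
    cfg = configparser.ConfigParser()
    cfg.read_dict(data)
    buf = io.StringIO()
    try:
        cfg.write(buf)
        reread = configparser.ConfigParser()
        reread.read_string(buf.getvalue())
    except configparser.Error:
        return None
    return reread.sections()


# Assumed shape of the malicious key: newline, section header, newline, option.
injected = {'controlled_section': {'\n[privileged_section]\nevil': 'yes'}}
result = sections_after_round_trip(injected)
# Where the write is accepted, the re-read parser gains a section that was
# never added through the API.
print(result)
```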

On delimiters, since they are user-defined and cannot necessarily be escaped, I'd suggest creating a new InvalidKeyError class and raising one if a user attempts to write a key with a delimiter to a file. This would halt file writing since write() does not try to catch an error on _write_section(), which is probably fine.
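The delimiter proposal above could look roughly like the sketch below. Everything here is illustrative rather than an existing configparser API: `InvalidKeyError`, `find_keys_with_delimiters`, and `checked_write` are hypothetical names, and the delimiters are passed in explicitly because this sketch avoids reading the parser's private attributes.

```python
import configparser
import io


class InvalidKeyError(configparser.Error):
    """Hypothetical error for keys that cannot be written unambiguously."""


def find_keys_with_delimiters(cfg, delimiters=("=", ":")):
    """Return sorted (section, key) pairs whose key contains a delimiter."""
    bad = set()
    for section in [cfg.default_section, *cfg.sections()]:
        # Iterating a section also yields keys inherited from the defaults,
        # so an offending default key is reported for every section.
        for key in cfg[section]:
            if any(delim in key for delim in delimiters):
                bad.add((section, key))
    return sorted(bad)


def checked_write(cfg, fp, delimiters=("=", ":")):
    """Validate all keys first, then write; nothing is written on failure."""
    bad = find_keys_with_delimiters(cfg, delimiters)
    if bad:
        raise InvalidKeyError(f"keys containing a delimiter: {bad!r}")
    cfg.write(fp)


cfg = configparser.ConfigParser()
cfg.read_dict({'test': {'a=b': 'c', 'ok': '1'}})
try:
    checked_write(cfg, io.StringIO())
except InvalidKeyError as exc:
    print(exc)
```

Validating before any output is produced is a design choice of this sketch: it avoids leaving a partially written file behind, whereas raising from inside the per-section writer would halt midway through the file.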

@jaraco
Member

jaraco commented Jan 26, 2025

> Classifying this entire issue as a feature does not make sense to me...

Fair enough. I'm not opposed to there being selective fixes for uncontested flaws.

jaraco pushed a commit that referenced this issue Feb 23, 2025
…ead (#129270)

---------

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
…erly read (python#129270)

---------

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>

5 participants