url.parseQuery supporting & but not ";" as separator #2210
I am aware of the "recommendation".

However, I am unaware of any commonly used clients that send ; instead of &, and given the lack of use in any clients I don't see much point to supporting it on the server.

What web server libraries support ; ?

You pointed at HTML 4.01. I would be more inclined if the HTML 5 spec said something.

Russ

Owner changed to @rsc. Status changed to WaitingForReply.
[seems replying to [email protected] did not work]

> I am aware of the "recommendation".
>
> However, I am unaware of any commonly used clients that
> send ; instead of &, and given the lack of use in any
> clients I don't see much point to supporting it on the server.
>
> What web server libraries support ; ?

I've checked a few. Some do, some do not:

Python: yes
http://hg.python.org/cpython/file/2.7/Lib/urlparse.py#l379
    pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]

Ruby (CGI): yes
http://ruby-doc.org/stdlib/libdoc/cgi/rdoc/classes/CGI.html#M000108
    query.split(/[&;]/).each do |pairs|

Ruby (/usr/lib/ruby/1.8/webrick/httputils.rb): yes
    def parse_query(str) ... str.split(/[&;]/).each{|x|

Haskell (CGI): no
http://hackage.haskell.org/packages/archive/cgi/3001.1.8.2/doc/html/src/Network-CGI-Protocol.html#CGIRequest
    where (nv,rs) = break (=='&') s
          (n,v) = break (=='=') nv

Inferno: no
    inferno/appl/svc/httpd/cgiparse.b

Android: no
    core/java/android/net/Uri.java getQueryParameters

Could not look into:

.NET Framework 4 (HttpUtility.ParseQueryString): ?
http://msdn.microsoft.com/de-de/library/ms150046.aspx

As for myself I am using a set of proprietary C libraries (cgi, html templating) I wrote >10 years ago, which form a web application running in embedded systems. I'm trying to extend this system, or replace parts of it, using Go, and tried to examine how it would fit together.

I agree that there shouldn't be a problem with web clients and form data, as they always use "&". What I had in mind are hyperlinks containing query strings, like `<a href="foo.cgi?sort=1;limit=20;columns=3">...</a>'. Such links are emitted by some of my cgi programs as part of html pages. In such cases I used to use ";" as separator, as ascii-only query strings were easy to construct even from shell scripts.

> You pointed at HTML 4.01. I would be more inclined if
> the HTML 5 spec said something.
Apparently the HTML5 spec only says, in the url-encoded form data section, that `&` has to be used ("append a single U+0026 AMPERSAND character"), and I found nothing about form-data-like query strings as part of URIs used in href attributes. So one might suppose that such strings should also contain &, not ";".

I can't tell how common the use of ";" still is. As one can see from sites like the google search page, in many cases & is properly escaped as &amp;, but not always:

    <a href="/advanced_search?q=form+data+parse_query&hl=de&ie=UTF-8&prmd=ivns" class="gl nobr" id="sflas">Erweiterte Suche</a>

vs.

    <a class=gb1 href="http://www.google.de/search?q=form+data+parse_query&um=1&ie=UTF-8&tbm=isch&source=og&sa=N&hl=de&tab=wi">Bilder</a>

Same for amazon.com, nytimes.com. One can get an idea why ";" was used as an alternative: in many generated pages both &amp; and & get inserted as separators in href attributes, depending on whether a programmer took care or not.

Perhaps it is best to forget about the semicolon for now, and see if there will be a section in a new revision of the HTML5 spec. Besides, it is probably better to fix broken query-string generators in cgi programs (with increasing use of utf-8 strings in query values there has to be proper escaping anyway, so one should be able to insert the &amp;s easily).

Michael
For Perl there exist multiple ways to parse a query string; two of them are provided by CGI.pm from the Perl core, and Apache2::Request from libapreq. Both support "&" and ";":

CGI.pm:
http://codesearch.google.com/#E4XixW5gvCc/pub/CPAN/src/latest.tar.bz2%7CNU9eyGOUCk8/perl-5.12.1/cpan/CGI/lib/CGI.pm&type=cs&l=792
    sub parse_params ...
    my(@pairs) = split(/[&;]/,$tosplit);

libapreq:
http://svn.apache.org/viewvc/httpd/apreq/trunk/library/param.c?view=markup#l158
    APREQ_DECLARE(apr_status_t) apreq_parse_query_string ...
    {
        ...
        for (;;++qs) {
            switch (*qs) {
            ...
            case '&':
            case ';':
                ...
                s = apreq_param_decode(...);

In PHP, query string parsing is done by function ext/standard/string.c:parse_str() and ext/mbstring/mb_gpc.c:mbstr_treat_data(), which is using a configurable parameter "arg_separator.input" from php.ini, which is "&" per default. If the semicolon needs to be supported too, one will have to edit arg_separator.input appropriately. Php.ini says:

    ; List of separator(s) used by PHP to parse input URLs into variables.
    ; PHP's default setting is "&".
    ; NOTE: Every character in this directive is considered as separator!
    ; http://php.net/arg-separator.input
    ; Example:
    ;arg_separator.input = ";&"

Drupal, btw, is also using PHP's parse_str, in drupal_parse_url().
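For comparison, the split-on-either-separator behaviour quoted above for Python, Ruby and Perl, combined with a configurable separator set in the spirit of PHP's arg_separator.input, could be sketched in Go roughly as follows. This is not the code from the Go standard library or from the revision that closed this issue; parseQuerySeps and its seps parameter are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseQuerySeps is a hypothetical helper: every character in seps is treated
// as a pair separator, so seps = "&;" accepts both "&" and ";".
func parseQuerySeps(query, seps string) (url.Values, error) {
	values := make(url.Values)
	// FieldsFunc splits on any separator and drops empty pairs ("a=1&&b=2").
	pairs := strings.FieldsFunc(query, func(r rune) bool {
		return strings.ContainsRune(seps, r)
	})
	for _, pair := range pairs {
		key, value := pair, ""
		if i := strings.IndexByte(pair, '='); i >= 0 {
			key, value = pair[:i], pair[i+1:]
		}
		k, err := url.QueryUnescape(key)
		if err != nil {
			return nil, err
		}
		v, err := url.QueryUnescape(value)
		if err != nil {
			return nil, err
		}
		values.Add(k, v)
	}
	return values, nil
}

func main() {
	v, err := parseQuerySeps("sort=1;limit=20;columns=3", "&;")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(v.Get("sort"), v.Get("limit"), v.Get("columns")) // prints: 1 20 3
}
```

Passing "&" alone as seps gives the stricter behaviour, in which a literal ";" stays part of the value.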
This issue was closed by revision 686181e. Status changed to Fixed.
So the one issue I've found so far... The RFC does not require form data posted in a body to be URL-encoded, as far as I can tell. I'm running into this issue because I have an unencoded ";" in a string form value. parseQuery treats the body and the URL query as being similar, where one has restrictions that the other doesn't. I could be wrong.
Never mind. After some more digging, the wording in RFC 1866 is such that the default encoding is URL-encoded. I actually found it to be a bug in my http client library, which explicitly did not encode ";".

8.2.1. The form-urlencoded Media Type

    The default encoding for all forms is `application/x-www-form-urlencoded'. A form data set is represented in this media type as follows:

8.2.3. Forms with Side-Effects: METHOD=POST

    If the service associated with the processing of a form has side effects (for example, modification of a database or subscription to a service), the method should be `POST'.

    To process a form whose action URL is an HTTP URL and whose method is `POST', the user agent conducts an HTTP POST transaction using the action URI, and a message body of type `application/x-www-form-urlencoded' format as above. The user agent should display the response from the HTTP POST interaction just as it would display the response from an HTTP GET above.
This issue was closed.
by mt4swm: