Skip to content

mod_security causes CGI scripts to timeout #2101

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
markblackman opened this issue May 22, 2019 · 20 comments
Open

mod_security causes CGI scripts to timeout #2101

markblackman opened this issue May 22, 2019 · 20 comments
Assignees
Labels
2.x Related to ModSecurity version 2.x Platform - Apache
Milestone

Comments

@markblackman
Copy link

markblackman commented May 22, 2019

Describe the bug

After enabling mod_security in our Apache 2.4 configuration, more and more, but not all, CGI scripts will timeout and not usually immediately after restart but after a few minutes. As soon as mod_security is disabled, the issue goes away and all CGI scripts behave normally.

Logs and dumps

There are no mod_security logs, only Apache error logs like so....

[Wed May 22 12:40:46.262612 2019] [cgi:warn] [pid 198045:tid 139887210845952] [client 10.235.31.231:0] AH01220: Timeout waiting for output from CGI script /var/www/global-cgi-bin/web-info
[Wed May 22 12:40:46.262653 2019] [cgi:error] [pid 198045:tid 139887210845952] [client 10.235.31.231:0] Script timed out before returning headers: web-info

To Reproduce

Steps to reproduce the behavior:

A curl command line that mimics the original request and reproduces the problem.

curl -v http://somesite.corp.com/cgi-bin/any-shell-script

Expected behavior

I expected the CGI script to return it's output

Server (please complete the following information):

Rule Set (please complete the following information):

  • Running any public or commercial rule set? Owasp
  • What is the version number? Owasp 3.0

Additional context

This may be connected to filter processing issues seen in issues 2091, 2093

Re-reading the traces and the source code, everything points to the httpd parent never seeing any output from the CGI script, not even headers, in cgi_read_bucket.

https://github.com/apache/httpd/blob/2.4.39/modules/generators/mod_cgi.c#L694

How could mod_security interfere with the mod_cgi buckets in the output bucket brigade?

@victorhora victorhora added 2.x Related to ModSecurity version 2.x Platform - Apache labels May 23, 2019
@victorhora victorhora added this to the v2.9.4 milestone May 23, 2019
@victorhora
Copy link
Contributor

@markblackman, I'm wondering if any other Apache module is interfering? Do you have any other module enabled other than mod_cgi and mod_security? It could be great if you could test with a very minimal Apache configuration and a minimal set of modules to exclude the possibility of some other module interfering...

@victorhora victorhora self-assigned this May 23, 2019
@markblackman
Copy link
Author

@victorhora Hi, we certainly have a large number of modules, however, the behaviour we see is very binary. When we enable mod_security, our CGI scripts start failing, although not immediately, and when we disable mod_security, our CGI scripts start working again. This is a very repeatable and reproducible pattern and we have seen it many times in the last year. Unfortunately, what we cannot do is reproduce this outside of our large-scale end-user environments. So far we simply cannot reproduce it in the lab, although we have not tried very hard yet, other than simple high-concurrency loading. I suspect we might have to run continuous loads over an hour or so to see the issue in the lab. I suspect there is a problem in the handling of the output bucket brigade in some cases. cgi_bucket_read just times out on the child script output (stdout) file descriptor. Maybe a mod_security filter is not putting the file descriptor in the right place in the bucket brigade?

@zimmerle
Copy link
Contributor

Hi @markblackman,

Now that #2091 got merged, can you test it again within the v2/master branch?

@markblackman
Copy link
Author

You suspect that the changes in #2091 might have influenced the CGI timeouts? We have been running Rainer's patches for about a month now and we know the CGI problem will return even with Rainer's deltas.

@markblackman
Copy link
Author

markblackman commented Jul 9, 2019

We have made some partial progress tracking this down. We can see that the CGI child process is experiencing a segmentation fault, presumably after forking, but before exec'ing either suexec or the child.

The backtraces always look like this and we can see the pconf pool is always the pool with the damaged child clean-up function pointer.

Core was generated by /apache24/bin/httpd -f /apache24/conf/dynamic/apache24/httpd.conf -Dnick=uki1n2.

Program terminated with signal SIGSEGV, Segmentation fault.

#0 0x0000000000000000 in ?? ()
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x00007f8c08a2d50e in run_child_cleanups (cref=0x79f158) at memory/unix/apr_pools.c:2642
#2 0x00007f8c08a2d53f in cleanup_pool_for_exec (p=0x79f138) at memory/unix/apr_pools.c:2649
#3 0x00007f8c08a2d559 in cleanup_pool_for_exec (p=0x79f138) at memory/unix/apr_pools.c:2652
#4 0x00007f8c08a2d559 in cleanup_pool_for_exec (p=0x79d128) at memory/unix/apr_pools.c:2652
#5 0x00007f8c08a2d582 in apr_pool_cleanup_for_exec () at memory/unix/apr_pools.c:2657
#6 0x00007f8c08a3a819 in apr_proc_create (new=0x7f8ad01b77d0, progname=0x5715e0 "/versions64/apache-2.4.39/bin/suexec", args=0x7f8ad01b7800, env=0x7f8ad01b6c00, attr=0x7f8ad01b7378, pool=0x7f8b0038fb78)
at threadproc/unix/proc.c:425
#7 0x000000000053fb70 in ap_unix_create_privileged_process (newproc=0x7f8ad01b77d0, progname=0x7f8b00299110 "/var/www/global-cgi-bin/some-cgi-script", args=0x7f8ad01b6be8, env=0x7f8ad01b6c00, attr=0x7f8ad01b7378, ugid=0xf64b230,
p=0x7f8b0038fb78) at unixd.c:194
#8 0x000000000053fbfb in ap_os_create_privileged_process (r=0x7f8b0038fbf0, newproc=0x7f8ad01b77d0, progname=0x7f8b00299110 "/var/www/global-cgi-bin/some-cgi-script", args=0x7f8ad01b6be8, env=0x7f8ad01b6c00, attr=0x7f8ad01b7378,
p=0x7f8b0038fb78) at unixd.c:210
#9 0x00007f8c06dd24a6 in run_cgi_child (script_out=0x7f8b45595af0, script_in=0x7f8b45595ae8, script_err=0x7f8b45595ae0, command=0x7f8b00299110 "/var/www/global-cgi-bin/some-cgi-script", argv=0x7f8ad01b6be8, r=0x7f8b0038fbf0,
p=0x7f8b0038fb78, e_info=0x7f8b45595aa0, do_trace=1) at mod_cgi.c:543
#10 0x00007f8c06dd4073 in cgi_handler (r=0x7f8b0038fbf0) at mod_cgi.c:1100
#11 0x000000000047a78c in ap_run_handler (r=0x7f8b0038fbf0) at config.c:170
#12 0x000000000047b2b4 in ap_invoke_handler (r=0x7f8b0038fbf0) at config.c:444
#13 0x00000000004ade8a in ap_process_async_request (r=0x7f8b0038fbf0) at http_request.c:453
#14 0x00000000004a971d in ap_process_http_async_connection (c=0x7f8b00377de8) at http_core.c:154
#15 0x00000000004a9937 in ap_process_http_connection (c=0x7f8b00377de8) at http_core.c:248
#16 0x0000000000489a57 in ap_run_process_connection (c=0x7f8b00377de8) at connection.c:42
#17 0x00007f8c07c17fab in process_socket (thd=0x1682eb8, p=0x7f8b00377ab8, sock=0x7f8b00377b40, cs=0x7f8b00377d40, my_child_num=14, my_thread_num=20) at event.c:1050
#18 0x00007f8c07c1ac96 in worker_thread (thd=0x1682eb8, dummy=0x74032080) at event.c:2083
#19 0x00007f8c08a3b808 in dummy_worker (opaque=0x1682eb8) at threadproc/unix/thread.c:142
#20 0x00007f8c083a9806 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f8c07f0062d in clone () from /lib64/libc.so.6
#22 0x0000000000000000 in ?? ()

@zimmerle
Copy link
Contributor

zimmerle commented Jul 9, 2019

Hi @markblackman

Did you have a chance to report this on mod_cgi as well? I am asking because as per the backtrace, it seems like the crash is reverberating from there. Even if the problem is not there, mod_cgi devs may be able to tell more about what could cause such behavior.

@markblackman
Copy link
Author

We're working directly with Rainer Jung of the Apache team to track it down. I just wondered if anyone in the ModSecurity team might have any ideas when/why a pconf pool would get corrupted or damaged.

@markblackman
Copy link
Author

We have reason to believe the corruption/damage might be happening on a request with a body (the POST case).

@zimmerle
Copy link
Contributor

There are some optimizations on the pmMatch that may try to update a memory area that was previously allocated during the configuration load. On those tests, are you running with or without rules? unload the rules cause any impact?

@markblackman
Copy link
Author

Running with OWASP 3 rules plus trivial tweaks.

@rainerjung
Copy link

@zimmerle By pmMatch you probably mean the @pm operator. By "may try to update a memory area that was previously allocated during the configuration load" you are referring to the use of memcpy(), strcat(), strncat() or strncpy() copying data to a target that was allocated from a pool? So a potentially wrong or missing length check?

@markblackman
Copy link
Author

So our current hypothesis is that the pconf pool is corrupted by the ap_varbuf_grow routine (in child pool ptemp) during start-up configuration and more than one kind of rule triggers it, but maybe not all rules trigger it.

Not yet tested the 'no-rules' case, but using the apr pool debug option is pointing to an issue associated with ap_varbuf_grow

@markblackman
Copy link
Author

This is in the Apache configuration code though, not the mod_security code. Possibly these very long configuration lines trigger some configuration processing bugs.

@markblackman
Copy link
Author

markblackman commented Jul 15, 2019

The general configuration corruption issue is reproducible with ease as far as I can tell with the APR pool debugging on. I built APR (--enable-pool-debug=all), APR-util, httpd-2.4.39 and mod_security 2.9.3 from scratch on a stock, plain FreeBSD 12 and got a segmentation violation during the apache start-up phase with the core rule set 3.1.1 configured. The segmentation violation points to a particular rule, but removing that rule just seems to trigger it elsewhere.

Here's the lldb stack trace from start-up

POOL DEBUG: [56197/34368745472]  PALLOC (    150860/    167174/    285044) 0x80089d1e0 "pconf" <strings/apr_strings.c:118> (2173/2173/0) 
POOL DEBUG: [56197/34368745472]  PALLOC (     16403/     16403/    285133) 0x80089d6e0 "ptemp" <core.c:1399> (323/323/0) 
POOL DEBUG: [56197/34368745472]  PALLOC (     16411/     16411/    285141) 0x80089d6e0 "ptemp" <util.c:762> (324/324/0) 
POOL DEBUG: [56197/34368745472] PCALLOC (    150932/    167343/    285213) 0x80089d1e0 "pconf" <config.c:1155> (2174/2174/0) 
POOL DEBUG: [56197/34368745472]  PALLOC (    151013/    167424/    285294) 0x80089d1e0 "pconf" <strings/apr_strings.c:118> (2175/2175/0) 
Process 56197 stopped
* thread #1, name = 'httpd', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x000000080041260f libapr-1.so.0`apr_allocator_alloc [inlined] allocator_alloc(allocator=0x0000000000000000, in_size=3343) at apr_pools.c:306
   303 	    /* First see if there are any nodes in the area we know
   304 	     * our node will fit into.
   305 	     */
-> 306 	    if (index <= allocator->max_index) {
   307 	#if APR_HAS_THREADS
   308 	        if (allocator->mutex)
   309 	            apr_thread_mutex_lock(allocator->mutex);
(lldb) thread backtrace
* thread #1, name = 'httpd', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x000000080041260f libapr-1.so.0`apr_allocator_alloc [inlined] allocator_alloc(allocator=0x0000000000000000, in_size=3343) at apr_pools.c:306
    frame #1: 0x00000008004125b4 libapr-1.so.0`apr_allocator_alloc(allocator=0x0000000000000000, size=3343) at apr_pools.c:515
    frame #2: 0x0000000000266142 httpd`ap_varbuf_grow(vb=0x00007fffffffdc98, new_len=3327) at util.c:2901
    frame #3: 0x0000000000265eb2 httpd`ap_varbuf_cfg_getline(vb=0x00007fffffffdc98, cfp=0x00000008009f74b0, max_len=16777216) at util.c:1192
    frame #4: 0x000000000025c6f4 httpd`ap_build_config(parms=0x00007fffffffdd10, p=0x000000080089d1e0, temp_pool=0x000000080089d6e0, conftree=0x00007fffffffe538) at config.c:1422
    frame #5: 0x000000000025ce7a httpd`ap_process_resource_config(s=<unavailable>, fname="/opt/httpd/etc/crs/rules/REQUEST-903.9002-WORDPRESS-EXCLUSION-RULES.conf", conftree=0x00007fffffffe538, p=0x000000080089d1e0, ptemp=0x000000080089d6e0) at config.c:1885
    frame #6: 0x000000000025d0d0 httpd`process_resource_config_nofnmatch(s=0x0000000800902000, fname="/opt/httpd/etc/crs/rules/REQUEST-903.9002-WORDPRESS-EXCLUSION-RULES.conf", conftree=0x00007fffffffe538, p=0x000000080089d1e0, ptemp=0x000000080089d6e0, depth=0, optional=0) at config.c:1979
    frame #7: 0x000000000025d479 httpd`process_resource_config_fnmatch(s=<unavailable>, path=<unavailable>, fname=<unavailable>, conftree=<unavailable>, p=<unavailable>, ptemp=<unavailable>, depth=0, optional=<unavailable>) at config.c:2071
    frame #8: 0x000000000025d30e httpd`process_resource_config_fnmatch(s=<unavailable>, path=<unavailable>, fname="rules", conftree=<unavailable>, p=<unavailable>, ptemp=<unavailable>, depth=0, optional=<unavailable>) at config.c:2015
    frame #9: 0x000000000025d30e httpd`process_resource_config_fnmatch(s=<unavailable>, path=<unavailable>, fname="crs", conftree=<unavailable>, p=<unavailable>, ptemp=<unavailable>, depth=0, optional=<unavailable>) at config.c:2015
    frame #10: 0x000000000025d30e httpd`process_resource_config_fnmatch(s=<unavailable>, path=<unavailable>, fname="etc", conftree=<unavailable>, p=<unavailable>, ptemp=<unavailable>, depth=0, optional=<unavailable>) at config.c:2015
    frame #11: 0x000000000025d30e httpd`process_resource_config_fnmatch(s=<unavailable>, path=<unavailable>, fname="httpd", conftree=<unavailable>, p=<unavailable>, ptemp=<unavailable>, depth=0, optional=<unavailable>) at config.c:2015
    frame #12: 0x000000000025d30e httpd`process_resource_config_fnmatch(s=<unavailable>, path=<unavailable>, fname="opt", conftree=<unavailable>, p=<unavailable>, ptemp=<unavailable>, depth=0, optional=<unavailable>) at config.c:2015
    frame #13: 0x000000000025d00d httpd`ap_process_fnmatch_configs(s=0x0000000800902000, fname="/opt/httpd/etc/crs/rules/*.conf", conftree=0x00007fffffffe538, p=0x000000080089d1e0, ptemp=0x000000080089d6e0, optional=0) at config.c:2135
    frame #14: 0x0000000000282abf httpd`include_config(cmd=0x00007fffffffe8c0, dummy=0x00007fffffffe7e0, name=<unavailable>) at core.c:3306
    frame #15: 0x000000000025f11d httpd`invoke_cmd(cmd=0x0000000000210a68, parms=0x00007fffffffe8c0, mconfig=0x00007fffffffe7e0, args=<unavailable>) at config.c:928
    frame #16: 0x000000000025c213 httpd`ap_build_config_sub at config.c:1703
    frame #17: 0x000000000025c1b4 httpd`ap_build_config_sub(p=0x000000080089d1e0, temp_pool=<unavailable>, l=<unavailable>, parms=0x00007fffffffe8c0, current=0x00007fffffffe878, curr_parent=0x00007fffffffe880, conftree=0x00000000002ab100) at config.c:1167
    frame #18: 0x000000000025c71b httpd`ap_build_config(parms=0x00007fffffffe8c0, p=0x000000080089d1e0, temp_pool=0x000000080089d6e0, conftree=0x00000000002ab100) at config.c:1424
    frame #19: 0x000000000025ce7a httpd`ap_process_resource_config(s=<unavailable>, fname="/opt/httpd/conf/httpd.conf", conftree=0x00000000002ab100, p=0x000000080089d1e0, ptemp=0x000000080089d6e0) at config.c:1885
    frame #20: 0x000000000025e2dd httpd`ap_read_config(process=<unavailable>, ptemp=0x000000080089d6e0, filename=<unavailable>, conftree=0x00000000002ab100) at config.c:2522
    frame #21: 0x000000000025962b httpd`main(argc=1, argv=0x00007fffffffeb28) at main.c:659
    frame #22: 0x000000000025911b httpd`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1.c:76
(lldb)

Exactly the same stack trace we see from gdb on an ancient SLES11 SP4 system

@markblackman
Copy link
Author

For some reason the allocator (in the ptemp pool) has become null

@markblackman
Copy link
Author

If it helps, here's vb, but I think you can reproduce this with ease.

(lldb) print *vb
(ap_varbuf) $0 = {
  buf = 0x00000008012b7000 "        SecRule &ARGS:action "@eq 1"             "t:none,            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[0][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[1][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[2][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[3][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[4][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[5][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[6][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[7][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[8][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[9][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[10][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[11][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[12][text],            ctl:ruleRemoveTargetByTag=CRS;ARGS:widget-text[13][text],          "
  avail = 1663
  strlen = 1662
  pool = 0x000000080089d6e0
  info = 0x0000000000000000
}

@markblackman
Copy link
Author

These all seem to be side effects of using pool debugging and are not related to the original issue.

@markblackman
Copy link
Author

Based on what we're seeing with cleanup function issues, I would say we have been seeing precisely what was seen in #890 and the corresponding PR 2049

@markblackman
Copy link
Author

Applying the PR in #2049 has solved our problem.

@markblackman
Copy link
Author

Although PR #2049 deals with our coredump problem (due to corrupted cleanup functions), we're seeing a lot more CPU utilization than we expected and attaching to the active httpd processes shows a lot of time spent in poll()

$ fgrep '#0' /tmp/gdb*.out  | awk -Fin '{print $2}' | sort | uniq -c | sort -nr | head -30
   1845  pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    587  poll () from /lib64/libc.so.6
     64  __pthread_mutex_lock_full () from /lib64/libpthread.so.0
     47  epoll_wait () from /lib64/libc.so.6
      8  select () from /lib64/libc.so.6
      8  read () from /lib64/libpthread.so.0
      8  nanosleep () from /lib64/libc.so.6
      5  pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      4  match () from /versions64/pcre-8.41/lib/libpcre.so.1
      3  rsaz_1024_sqr_avx2 () from /versions64/openssl-1.0.2-latest/lib/libcrypto.so.1.0.0
      1  pcre_exec () from /versions64/pcre-8.41/lib/libpcre.so.1
      1  normalize_path_
      1  msre_action_setvar_parse (msr=0x7fb4f8610088, mptmp=0x7fb504448c38, rule=0x52d8b90, action=0x52d9810)
      1  getaddr
      1  apr_vformatter (flush_func=0x7fb6616178c6 <pspr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to ModSecurity version 2.x Platform - Apache
Projects
None yet
Development

No branches or pull requests

4 participants