nginx reload crashes with a core dump because the lua module's initialization flow modifies ngx_cycle. #2421

Open
zhangqiang-01 opened this issue May 12, 2025 · 0 comments

zhangqiang-01 commented May 12, 2025

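We ran into the following problem during regression testing:
During an nginx reload, the nginx master process receives the child-exit signal (SIGCHLD, signo 17 in the trace below) and calls ngx_process_get_status, which force-releases the shared memory locks. The master crashes inside that call: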

#0 0x000055e2ccdd6a9a in ngx_shmtx_force_unlock (mtx=0x68, pid=3086274) at src/core/ngx_shmtx.c:155
#1 0x000055e2cce03804 in ngx_unlock_mutexes (pid=3086274) at src/os/unix/ngx_process.c:647
#2 0x000055e2cce0371a in ngx_process_get_status () at src/os/unix/ngx_process.c:596
#3 0x000055e2cce03378 in ngx_signal_handler (signo=17, siginfo=0x7f43389fbcf0, ucontext=0x7f43389fbbc0) at src/os/unix/ngx_process.c:493

Switching to frame 1 (ngx_unlock_mutexes) and dumping the shared_memory list:

(gdb) p i
$1 = 45
(gdb) p shm_zone[44]
$2 = {data = 0x7f433706fee0, shm = {addr = 0x7f428adb0000 "", size = 1048576, name = {len = 14,
data = 0x7f433706fe98 "cc_rule_shared"}, log = 0x7f43374d8080, exists = 0},
init = 0x55e2ccf5b7a4 <ngx_http_lua_shared_memory_init>, tag = 0x55e2cd4bd500 <ngx_http_lua_module>, sync = 0x0,
noreuse = 0}
(gdb) p shm_zone[45]
$3 = {data = 0x7f4338123f78, shm = {addr = 0x0, size = 32768, name = {len = 11, data = 0x7f43378d1c41 "tlog_server"},
log = 0x7f425f70c080, exists = 0}, init = 0x55e2ccfd144c <ngx_http_tlog_check_shm_zone_init>,
tag = 0x55e2cd4c2640 <ngx_http_tlog_upstream_module>, sync = 0x0, noreuse = 0}
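
A plausible reading of the two dumps above is that the bogus mtx=0x68 in frame 0 is simply the address of a mutex field inside a slab pool whose base address is still NULL, as with shm_zone[45] (shm.addr = 0x0). The sketch below only illustrates that arithmetic. demo_slab_pool_t and demo_mutex_address are hypothetical names, and the 0x68 offset is chosen to match the trace rather than taken from the real ngx_slab_pool_t layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a slab pool header. The size of the leading
 * filler is an assumption picked so that the mutex lands at offset 0x68,
 * the value seen in the backtrace; it is not the real nginx layout. */
typedef struct {
    unsigned char  header[0x68];
    int            mutex;
} demo_slab_pool_t;

static_assert(offsetof(demo_slab_pool_t, mutex) == 0x68,
              "demo layout must place the mutex at offset 0x68");

/* Address that "&pool->mutex" would evaluate to for a pool mapped at
 * shm_addr. With shm_addr == 0 (zone not mapped yet) the result is the
 * bare field offset, which is not a usable pointer. */
uintptr_t
demo_mutex_address(uintptr_t shm_addr)
{
    return shm_addr + offsetof(demo_slab_pool_t, mutex);
}
```

Calling demo_mutex_address(0) models the unmapped tlog_server zone, while passing a real mapping address such as the one shown for cc_rule_shared yields a pointer inside that mapping.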

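In theory, the log pointers in the two shm_zone entries should be the same address (ngx_cycle->log), yet they differ. After ruling out memory corruption such as out-of-bounds writes (checked with ASan), we instrumented the code and found that the ngx_cycle seen by the child signal handler was a cycle that was still being initialized. Further investigation showed that ngx_cycle is modified in ngx_http_lua_shared_memory_init. We added a flag around that code to record whether the crash happens while ngx_cycle is swapped, and this confirmed the problem.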

ngx_http_lua_shared_memory_init

if (lmcf->shm_zones_inited == lmcf->shm_zones->nelts
    && lmcf->init_handler && !ngx_test_config)
{
    saved_cycle = ngx_cycle;

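    in_update_cycle = 0;  // we added the in_update_cycle lines; at the time of the crash, in_update_cycle was 1
    ngx_cycle = ctx->cycle;
    in_update_cycle = 1;
    rc = lmcf->init_handler(ctx->log, lmcf, lmcf->lua);  // if this call takes a long time, the crash is very easy to reproduce; we have a large number of Lua code files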
    ngx_cycle = saved_cycle;
    in_update_cycle = 0;

    if (rc != NGX_OK) {
        /* an error happened */
        return NGX_ERROR;
    }
}

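Root cause:
When nginx receives the reconfigure signal, it only sets a reconfigure flag; the reload itself is performed later in the master's for loop. The child signal handler, however, interrupts whatever code is currently executing and accesses ngx_cycle. If ngx_cycle has been switched to a cycle that is still being initialized at that moment, the handler walks a half-built shared memory list and the crash above occurs.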
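
The race can be reproduced in miniature with the self-contained sketch below, which also shows one possible mitigation: blocking SIGCHLD while the global pointer is swapped. This is not nginx or lua-nginx-module code. demo_cycle_t, demo_cycle, demo_slow_init and demo_swap_cycle are hypothetical stand-ins for ngx_cycle_t, ngx_cycle, lmcf->init_handler and the swap in ngx_http_lua_shared_memory_init, and the sigprocmask approach is an assumption, not a fix confirmed by the maintainers.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for ngx_cycle_t: only the state the demo needs. */
typedef struct {
    int  initialized;            /* 1 once the cycle is fully set up */
} demo_cycle_t;

static demo_cycle_t  old_cycle = { 1 };
static demo_cycle_t  new_cycle = { 0 };   /* still being initialized */

/* Stand-in for the global ngx_cycle pointer. */
static demo_cycle_t *volatile  demo_cycle = &old_cycle;

/* Set by the handler if it ever observes the half-built cycle. */
static volatile sig_atomic_t  saw_partial_cycle;

static void
demo_sigchld_handler(int signo)
{
    (void) signo;

    if (!demo_cycle->initialized) {
        saw_partial_cycle = 1;
    }
}

/* Stand-in for a slow init handler: a child "exits" in the middle of it. */
static void
demo_slow_init(void)
{
    raise(SIGCHLD);
}

/* Swaps demo_cycle around demo_slow_init(), optionally with SIGCHLD
 * blocked. Returns 1 if the handler saw the partial cycle, 0 if not,
 * and -1 on setup failure. */
int
demo_swap_cycle(int block_sigchld)
{
    sigset_t          set, old;
    demo_cycle_t     *saved;
    struct sigaction  sa;

    assert(demo_cycle == &old_cycle);
    saw_partial_cycle = 0;

    sa.sa_handler = demo_sigchld_handler;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0) {
        return -1;
    }

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    if (block_sigchld && sigprocmask(SIG_BLOCK, &set, &old) != 0) {
        return -1;
    }

    saved = demo_cycle;
    demo_cycle = &new_cycle;     /* handlers now see the new cycle */
    demo_slow_init();
    demo_cycle = saved;

    /* A SIGCHLD that arrived while blocked is delivered here, after
     * the old cycle has been restored. */
    if (block_sigchld && sigprocmask(SIG_SETMASK, &old, NULL) != 0) {
        return -1;
    }

    return saw_partial_cycle;
}
```

Calling demo_swap_cycle(0) models the reported behavior, where the handler runs against the partially initialized cycle. Calling demo_swap_cycle(1) defers the handler until the pointer is restored. Whether delaying SIGCHLD handling for the duration of a long init handler is acceptable in the real master process is a separate design question that this sketch does not answer.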
