tl;dr: From what I can tell, mod_proxy_cluster tries to select a node based on the longest context match. However, very rarely, this does not work. Maybe a race condition due to missing locks?
Our Problem
We have a mod_proxy_cluster loadbalancer, where two nodes register, each with two contexts:
node1: /
node1: /onlyOnNode1
node2: /
node2: /onlyOnNode2
When I send a request for /onlyOnNode1 to Apache, this request is sometimes mistakenly forwarded to node2 by mod_proxy_cluster.
The system will run fine for days or weeks, and only then will the problematic behavior occur.
Once the problematic behavior occurs, it doesn't seem to go away until at least Apache or the affected Tomcat is restarted.
However, because the problematic state is very rare, I haven't been able to investigate it much.
Looking at the code
From what I've been able to tell from the code, this part determines how much of the URL each context matches and tracks the longest match:
mod_proxy_cluster/native/common/common.c
Lines 426 to 435 in 3cd7ad6
```c
        if (strncmp(uri, context->context, len) == 0) {
            if (uri[len] == '\0' || uri[len] == '/' || len == 1) {
                status[j] = context->status;
                length[j] = len;
                if (len > max) {
                    max = len;
                }
            }
        }
    }
```
So in my case, given the `context_table` above, `max` should be set to `strlen("/onlyOnNode1") == 12`, and the `length` array should look like this: `1, 12, 1, 0`.
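To convince myself of this, here is a minimal standalone sketch (not the module code; the hard-coded context list, `uri`, and array names are mine) that mimics the matching loop above for the four contexts from this issue:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Hypothetical stand-ins for the context table of this issue. */
    const char *contexts[] = { "/", "/onlyOnNode1", "/", "/onlyOnNode2" };
    const char *uri = "/onlyOnNode1";
    int length[4] = { 0, 0, 0, 0 };
    int max = 0;

    for (int j = 0; j < 4; j++) {
        size_t len = strlen(contexts[j]);
        /* Same prefix test and boundary check as in common.c */
        if (strncmp(uri, contexts[j], len) == 0) {
            if (uri[len] == '\0' || uri[len] == '/' || len == 1) {
                length[j] = (int)len;
                if ((int)len > max) {
                    max = (int)len;
                }
            }
        }
    }

    /* Expected output: max=12 length=1,12,1,0 */
    printf("max=%d length=%d,%d,%d,%d\n",
           max, length[0], length[1], length[2], length[3]);
    return 0;
}
```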
Then, we do this:
mod_proxy_cluster/native/common/common.c
Lines 440 to 480 in 3cd7ad6
```c
/* find the best matching contexts */
nbest = 1;
for (j = 0; j < sizecontext; j++) {
    if (length[j] == max) {
        nbest++;
    }
}
best = apr_palloc(r->pool, sizeof(node_context) * nbest);
nbest = 0;
for (j = 0; j < sizecontext; j++) {
    if (length[j] == max) {
        contextinfo_t *context;
        int ok = 0;
        context = &context_table->context_info[j];
        /* Check status */
        switch (status[j]) {
        case ENABLED:
            ok = 1;
            break;
        case DISABLED:
            /* Only the request with sessionid ok for it */
            if (hassession_byname(r, context->node, route, node_table)) {
                ok = 1;
            }
            break;
        }
        if (ok) {
            best[nbest].node = context->node;
            best[nbest].context = context->id;
            nbest++;
        }
    }
}
if (nbest == 0) {
    return NULL;
}
best[nbest].node = -1;
/* Save the result */
apr_table_setn(r->notes, "proxy-context", (char *)best);
return best;
```
Here, we ignore any context whose corresponding `length` entry isn't `== max == 12`, so the only one remaining would be node1: /onlyOnNode1.
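For illustration, here is a continuation of the sketch above (again made-up arrays and values, not the module code; in particular the ENABLED/DISABLED constants and the node numbering are mine) showing that the `length[j] == max` filter plus the status check leaves exactly one candidate, node1's /onlyOnNode1 context:

```c
#include <stdio.h>

/* Hypothetical status values, only for this sketch. */
enum { ENABLED = 1, DISABLED = 2 };

int main(void)
{
    int length[4] = { 1, 12, 1, 0 };   /* result of the previous sketch      */
    int status[4] = { ENABLED, ENABLED, ENABLED, ENABLED };
    int node[4]   = { 1, 1, 2, 2 };    /* owning node of each context        */
    int max = 12;
    int best[4] = { 0 };
    int nbest = 0;

    for (int j = 0; j < 4; j++) {
        /* Keep only the longest matches whose context is usable. */
        if (length[j] == max && status[j] == ENABLED) {
            best[nbest++] = node[j];
        }
    }

    /* Expected output: nbest=1 best[0]=node1 */
    printf("nbest=%d best[0]=node%d\n", nbest, best[0]);
    return 0;
}
```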
But this can't have happened, otherwise I wouldn't be here.
If my context didn't pass the status check on lines 456-465, we should end up here instead:
```c
ap_log_rerror(APLOG_MARK, APLOG_TRACE6, 0, r, "proxy_cluster_trans: DECLINED %s uri: %s unparsed_uri: %s",
```
But this didn't happen either. Instead, the request reaches Tomcat node2.
Any ideas how this could happen?
Possible race condition?
One idea is that, for some reason, maybe a race condition, the entry node1: /onlyOnNode1 is missing completely from the context table. In that case, the code might select node2: / instead, depending on the order of the context table.
However, I wasn't able to make sense of the locking code and how it protects (calls to) read_context_table, or whether such locking is actually missing.
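To illustrate the suspicion, here is a sketch (purely speculative, with made-up names; not the module code) of what the matching loop would compute if the node1: /onlyOnNode1 entry were temporarily absent from the context table: `max` drops to 1, both / contexts tie as "best", and node2 becomes a legitimate pick depending on table order and load balancing:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* node1:/onlyOnNode1 deliberately absent from this hypothetical table. */
    const char *contexts[] = { "/", "/", "/onlyOnNode2" };
    int node[] = { 1, 2, 2 };
    const char *uri = "/onlyOnNode1";
    int length[3] = { 0, 0, 0 };
    int max = 0;

    for (int j = 0; j < 3; j++) {
        size_t len = strlen(contexts[j]);
        if (strncmp(uri, contexts[j], len) == 0 &&
            (uri[len] == '\0' || uri[len] == '/' || len == 1)) {
            length[j] = (int)len;
            if ((int)len > max) {
                max = (int)len;
            }
        }
    }

    /* Both "/" contexts tie at max == 1, so node2 is a valid candidate. */
    for (int j = 0; j < 3; j++) {
        if (length[j] == max) {
            printf("candidate: node%d context %s\n", node[j], contexts[j]);
        }
    }
    return 0;
}
```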
MCM Status Servlet
Current dump from the MCM servlet (we only enabled it after running into this issue a few times; we haven't seen the issue since):
```
Node myapp1 (https://myapp1.mycompany:8443):
Balancer: myapp,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: -1,Ttl: 60000000,Status: OK,Elected: 34929,Read: 967163,Transferred: 590290274,Connected: 0,Load: 100
Virtual Host 1:
  Contexts:
    /, Status: ENABLED Request: 0
    /onlyOnNode1, Status: ENABLED Request: 0
  Aliases:
    localhost

Node myapp2 (https://myapp2.mycompany:8443):
Balancer: myapp,LBGroup: ,Flushpackets: Off,Flushwait: 10000,Ping: 10000000,Smax: -1,Ttl: 60000000,Status: OK,Elected: 22738,Read: 679877,Transferred: 371948068,Connected: 0,Load: 100
Virtual Host 1:
  Contexts:
    /, Status: ENABLED Request: 0
    /onlyOnNode2, Status: ENABLED Request: 0
  Aliases:
    localhost
```
Enabling tracing does not seem to print anything about the context selection, and it doesn't seem to be possible to restrict tracing to the mod_proxy_cluster module.