Description
Preconditions (*)
- Magento 2.4.3-p1
- PWA Studio 12.1.0 (potentially optional)
- Set up Magento with Redis sessions
Helpful Insights
- New Relic APM Transaction Traces
- Tested tech stack versions (though these may not be relevant)
  - PHP 7.4
  - MySQL 8.0
  - Varnish 6.4
  - Redis 5
  - Elasticsearch 7.12
Steps to reproduce (*)
Configure Magento
- Ensure that Redis session locking is enabled (the default configuration) by setting disable_locking to 0:
cat app/etc/env.php | grep disable_locking
bin/magento -n setup:config:set --session-save-redis-disable-locking=0
Load the PWA storefront home page
- Load the home page (example: https://syseng-seldon.cldev.io/), which makes multiple GraphQL calls to the MAGENTO_BACKEND_URL
- In Chrome Developer Tools on the Network tab you can use a filter to only display requests for the specific domain that are graphql calls
domain:syseng-seldon.cldev.io -media -static graphql
- Multiple (15) /graphql requests are sent concurrently to the MAGENTO_BACKEND_URL configured for the PWA app
Simulate the PWA storefront requests on ANY Magento site
- Create a static HTML file (example: graphqlrepeat.html) in /pub/ of the Magento root containing a fetch() call:
<!DOCTYPE html>
<html lang="en">
<head><title>JavaScript HTTP Requests</title></head>
<body><script>
var domain = "syseng-seldon.cldev.io";
fetch("https://" + domain + "/graphql?query=query+GetStoreConfigForCarouselEE%7BstoreConfig%7Bid+product_url_suffix+magento_wishlist_general_is_enabled+enable_multiple_wishlists+__typename%7D%7D&operationName=GetStoreConfigForCarouselEE&variables=%7B%7D", {
"headers": {
"accept": "*/*",
"accept-language": "en-US,en;q=0.9",
"authorization": "",
"cache-control": "no-cache",
"content-type": "application/json",
"pragma": "no-cache",
"sec-ch-ua": "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"96\", \"Google Chrome\";v=\"96\"",
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": "\"macOS\"",
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"store": "default",
"x-magento-cache-id": "null"
},
"referrer": "https://" + domain + "/",
"referrerPolicy": "strict-origin-when-cross-origin",
"body": null,
"method": "GET",
"mode": "cors",
"credentials": "include"
});
</script></body>
</html>
- Replace the domain variable with the domain of the Magento site you're testing.
- Load the HTML page; its inline JavaScript sends the GraphQL requests concurrently.
- To capture a real request, open the Network tab in Chrome Developer Tools, right-click one of the GraphQL calls, and choose "Copy as fetch" to get a JavaScript fetch() statement that makes the same request from an inline script in an HTML page.
- To re-create all the requests for a specific page, add each fetch() call to the HTML file. Alternatively, duplicate the exact same fetch() GraphQL request (40x) to reproduce the session locking behavior described here.
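Instead of pasting 40 copies of the fetch() call, the duplication can be scripted. The following is a minimal sketch and not part of the original reproduction: `fireConcurrent` and its `fetchFn` and `now` parameters are hypothetical names introduced here, and the request options are trimmed to the method and credentials from the page above.

```javascript
// Hypothetical helper: start `count` identical requests at once and record
// when each one started and how long it took.
// `fetchFn` defaults to the global fetch and `now` to performance.now();
// both are injectable so the helper can be exercised without a network.
async function fireConcurrent(url, count, fetchFn = fetch, now = () => performance.now()) {
  const requests = Array.from({ length: count }, async (_, index) => {
    const startedAt = now();
    // Every call reaches this line before any response arrives,
    // so all requests are in flight together.
    await fetchFn(url, { method: "GET", credentials: "include" });
    return { index, startedAt, duration: now() - startedAt };
  });
  // Resolves once every request has finished.
  return Promise.all(requests);
}
```

In the inline script of graphqlrepeat.html this could be called as `fireConcurrent(graphqlUrl, 40).then(console.table)`, where `graphqlUrl` is assumed to hold the full request URL used in the fetch() call above. With session locking enabled, the recorded durations would be expected to climb from one request to the next.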
Simulate SOME of the expected behavior by globally disabling redis session locking on ALL requests
- Disable Redis session locking for all requests by setting disable_locking to 1:
cat app/etc/env.php | grep disable_locking
bin/magento -n setup:config:set --session-save-redis-disable-locking=1
Expected result (*)
When a visitor makes concurrent GraphQL GET requests, the requests should be able to complete in parallel to keep page load time minimal. Requests that write important session data should still be able to lock the session, so that no other request can overwrite that data. Important data getting overwritten in a session can negatively affect critical application behavior.
In simulating some of the correct behavior with Redis session locking disabled, I was able to load the home page and all 15 GraphQL requests within a window of 600 ms. Other pages may issue many more GraphQL requests, which makes it even more important for concurrent requests to complete in parallel.
Home Page GraphQL Measurements - with Session Locking Disabled
First started at: 393ms
Last started at: 586ms
Last duration was: 401ms
Last ended at: 586ms + 401ms = 987ms
Total window of execution: 586ms + 401ms - 393ms = 594ms
GraphQL Requests Total: 15
Page Finished 1020ms
A waterfall view of the concurrent GraphQL requests shows that multiple requests complete at or close to the same time.
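The window arithmetic in the measurements above can be reproduced from per-request timings. This is a small sketch under stated assumptions: `summarizeWindow` is a hypothetical helper, its input objects use the `startTime` and `duration` field names of browser Resource Timing entries, and only the first and last of the 15 requests are modelled because those are the only ones listed.

```javascript
// Summarize timing entries shaped like { startTime, duration } (milliseconds).
function summarizeWindow(entries) {
  const starts = entries.map((e) => e.startTime);
  const ends = entries.map((e) => e.startTime + e.duration);
  const firstStart = Math.min(...starts);
  const lastStart = Math.max(...starts);
  const lastEnd = Math.max(...ends);
  // The window runs from the earliest start to the latest end.
  return { total: entries.length, firstStart, lastStart, lastEnd, window: lastEnd - firstStart };
}

// The 200 ms duration of the first request is a made-up placeholder;
// the other values come from the measurements above.
const summary = summarizeWindow([
  { startTime: 393, duration: 200 },
  { startTime: 586, duration: 401 },
]);
console.log(summary.lastEnd, summary.window); // 987 594
```

Using the latest end time rather than "last started plus last duration" gives the same result here and also covers the case where an earlier request finishes last.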
Actual result (*)
With Redis session locking enabled, which is the default and recommended safe behavior for Redis session configs, the concurrent requests queue up: each one waits in sequence for the Redis session lock to clear before it can complete.
Home Page GraphQL Measurements - with Session Locking Enabled
First started at: 313ms
Last started at: 323ms
Last duration was: 2800ms
Last ended at: 323ms + 2800ms = 3123ms
Total window of execution: 323ms + 2800ms - 313ms = 2810ms
GraphQL Requests Total: 15
Page Finished 3120ms
As a result, several of the GraphQL requests appear to take an excessive amount of time while others complete quickly. The requests all started at close to the same time, but most of them spent the bulk of their duration waiting for the session lock to clear, so they completed in sequence rather than in parallel.
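The queueing effect can be illustrated with a simplified model. This sketch is not Magento or Redis code: `simulateFinishTimes` is a hypothetical function, the 180 ms of work per request is an illustrative value, and the model assumes the lock is handed over in arrival order with no polling delay, which a real session lock does not guarantee.

```javascript
// Model each request as { start, work } in milliseconds. With locking, a
// request cannot begin its work until the previous lock holder has finished.
function simulateFinishTimes(requests, locking) {
  let lockFreeAt = 0;
  return [...requests]
    .sort((a, b) => a.start - b.start)
    .map(({ start, work }) => {
      const begin = locking ? Math.max(start, lockFreeAt) : start;
      const end = begin + work;
      if (locking) lockFreeAt = end;
      return end;
    });
}

// 15 requests arriving 1 ms apart, each needing 180 ms of work (illustrative).
const requests = Array.from({ length: 15 }, (_, i) => ({ start: 300 + i, work: 180 }));
const parallel = simulateFinishTimes(requests, false);
const queued = simulateFinishTimes(requests, true);
console.log(Math.max(...parallel)); // 494
console.log(Math.max(...queued)); // 3000
```

Each request does the same amount of work in both runs; with locking, the last one simply finishes after the combined work of all 15, which matches the shape of the measurements above.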
Cause of Behavior
With PWA Studio (a client-side React app) running as the frontend "storefront" of Magento and sending GraphQL calls to the Magento backend, requests are sent to and processed by the web server in a different pattern from what we have seen when using the Magento "theme" as the frontend.
The PWA sends multiple concurrent AJAX calls to /graphql endpoints on the web server, and all of these requests are processed by the Magento backend PHP application. It turns out that Magento's architecture creates a session and locks it regardless of the type of request or response being issued.
Some /graphql requests can be cached by Varnish. Cached responses are served without executing application code, so they never interact with Redis sessions. This can mask the fact that every request that reaches the backend locks the session for the duration of the request, which causes concurrent requests to complete in sequence. Many of the GraphQL requests cannot be cached in Varnish at this time.
We have also seen this behavior in the Magento "theme" frontend, but there only a few requests typically happen concurrently. It shows up frequently in New Relic transaction traces for AJAX calls that are likely to run concurrently with other requests for the same session.
The problem is not isolated to AJAX calls like customer/section/load on the Magento "theme" frontend; it also occasionally interferes with other requests on product pages and with AJAX calls in the checkout. In one example, an unlucky visitor waited 8.8s for the initial page response, which should have taken 300ms, because another request was locking their session. It doesn't happen often, but it's not pleasant when it does.
The big behavioral difference is that most requests in the Magento "theme" frontend do NOT happen concurrently, so session locking on a couple of concurrent requests does not affect things much overall. Delays are the exception: on average, most requests (even customer/section/load AJAX requests) are not held up waiting for session locks to release. This known deficiency therefore causes only mild problems with the Magento "theme" frontend, and limiting concurrent AJAX calls is a way of working around it.
With a PWA, session locking causes many more problems: concurrent AJAX calls are delayed, waiting in line and finishing sequentially as the locks ahead of them clear. Requests randomly appear to take a very long time to complete, while under the hood they are mostly waiting for the session to become available so that they can lock it and complete their own work. This makes the problem very hard to identify from the client side, because it only arises with concurrent requests for the same session; running the same requests independently results in a very fast response.
While it is possible to disable session locking, doing so can cause serious issues when requests that change session data overwrite each other. On checkout this can result in payments being captured while the order fails to save in Magento. The customer may then re-submit the order and, if it eventually completes, be charged multiple times.
The library used by Magento (v2.4.3) is the latest available (colinmollenhour/php-redis-session-abstract v1.4.4), and it does support processing read-only requests without locking the session even when the global config enables session locking. However, Magento does not use this functionality: it opens all session connections without specifying whether it needs to write to the session, and so defaults to write mode and session locking.
- https://github.com/magento/magento2/blob/2.4.3-p1/composer.lock#L321
- https://github.com/colinmollenhour/php-redis-session-abstract/releases
- https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L434
- https://github.com/magento/magento2/blob/2.4.3-p1/lib/internal/Magento/Framework/Session/SaveHandler/Redis.php#L63
- https://github.com/colinmollenhour/php-redis-session-abstract/blob/v1.4.4/src/Cm/RedisSession/Handler.php#L263
Possible Solution
Requests that come into Magento as GET requests are typically expected to return a generic response that is identical for every visitor and can be cached by Varnish. POST requests are explicitly not allowed to be cached by Varnish, because their responses are expected to contain visitor/session/customer private data. Important write operations should typically happen in POST requests, and those requests would be expected to need and use session locking. GET requests would generally return generic data that is not visitor/session/customer specific: the same response is returned to all requests and no write to the session is involved. If any session write activity were to occur on GET requests, it would likely be to update the timestamp of the most recent request to indicate that the session is still active and has not expired yet (this is an assumption that should be verified).
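The reasoning above amounts to choosing the session access mode from the HTTP method. The following is only an illustration of that rule, not Magento or library code: `sessionAccessMode` and the two mode strings are hypothetical names, and the rule inherits the unverified assumption that GET requests never need to write meaningful session data.

```javascript
// Hypothetical policy: GET requests open the session read-only (no lock);
// every other method keeps the current locked read-write behavior.
function sessionAccessMode(method) {
  const normalized = String(method).toUpperCase();
  return normalized === "GET" ? "read-only" : "read-write-locked";
}

console.log(sessionAccessMode("GET")); // read-only
console.log(sessionAccessMode("POST")); // read-write-locked
```

Defaulting every method other than GET to the locked mode keeps the safe behavior for anything that might write to the session.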
- Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.