Support multiple workers for NODEFS /wordpress mounts #2231

Merged
brandonpayton merged 230 commits into trunk from add-fcntl-for-nodejs
Jun 22, 2025

Conversation

brandonpayton
Member

@brandonpayton brandonpayton commented Jun 4, 2025

Motivation for the change, related issues

We want to support concurrent php-wasm workers, but concurrent workers can corrupt the SQLite DB without file locking.

Emscripten's libc provides dummy fcntl() and flock() lock implementations, so the API calls succeed without any locking taking place.

The PR implements:

  • Advisory file locking with custom fcntl() and flock() functions.
    • Currently this is JSPI-only, but we plan to implement broader support in a follow-up PR.
  • An experimental multiple php-wasm worker feature that can be enabled when a real, shared directory is mounted as the /wordpress dir.
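For the multi-worker feature, the PR states that requests are routed to the worker with the fewest in-progress requests. A minimal sketch of that idea follows; the `WorkerHandle` shape and the function names are hypothetical and are not taken from the PR's code.

```typescript
// Hypothetical worker bookkeeping; the PR's real types may differ.
interface WorkerHandle {
	id: number;
	inProgress: number;
}

// Pick the worker with the fewest in-progress requests.
// Ties go to the earliest worker in the list.
function pickLeastBusyWorker(workers: WorkerHandle[]): WorkerHandle {
	if (workers.length === 0) {
		throw new Error('No workers available');
	}
	return workers.reduce((best, candidate) =>
		candidate.inProgress < best.inProgress ? candidate : best
	);
}

// Route one request, keeping the in-progress counter accurate
// even when the handler throws.
async function dispatch<T>(
	workers: WorkerHandle[],
	handle: (worker: WorkerHandle) => Promise<T>
): Promise<T> {
	const worker = pickLeastBusyWorker(workers);
	worker.inProgress++;
	try {
		return await handle(worker);
	} finally {
		worker.inProgress--;
	}
}
```

The counter is decremented in `finally` so that a failed request does not make a worker look permanently busy.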

Implementation details

This is a big PR that:

  • Adds experimental support for multiple php-wasm workers
    • Adds an --experimentalMultiWorker arg
    • Can be used when JSPI is enabled and a real FS dir is mounted as the /wordpress directory.
    • Default worker count is CPU_COUNT - 1.
    • Specific number can be passed like --experimentalMultiWorker=7.
    • Requests are naively routed to the worker with the fewest in-progress requests.
    • Emscripten mods:
      • We override NODEFS.createNode() to add an isSharedFS flag to all NODEFS nodes. This way we can tell whether file-locking is needed and possible for an FS node, even if wrapped with PROXYFS.
      • We override FS.hashAddNode() to skip caching of isSharedFS nodes. Otherwise, multiple workers will have separate NODEFS caches and can have different, conflicting views of the underlying FS. For example, one may believe a file exists based on its cache, even though the underlying file was deleted by another worker.
  • Supports both whole-file locks via flock() and byte-range locks via fcntl().
    • Our libsqlite3 build uses fcntl() for locking.
    • WordPress itself uses the PHP flock() function a small amount (and that function unsurprisingly appears to connect to the platform flock() implementation)
    • Before granting a lock to php-wasm, we make sure we hold a native file lock (with the host OS or platform) that is sufficient for the requested lock. We require an exclusive native lock in order to grant any exclusive php-wasm locks. If there are only shared php-wasm locks, we require a native shared lock.
    • If we cannot obtain a comparable native lock, we deny the php-wasm lock.
    • When a lock is released, we review the remaining php-wasm locks and downgrade the native file lock if its current level is no longer needed. For example, if we hold an exclusive whole-file lock at the OS level but only shared php-wasm locks remain, we downgrade the OS lock to shared.
    • When a file descriptor is closed, all its locks are released.
    • When a PHP request exits, all its locks are released.
  • Implements a getpid() override that actually returns the process ID we assign via the @php-wasm/node loader. (Emscripten's getpid() always returns 42 😅)
  • Adds an --experimentalTrace arg that enables detailed tracing messages. Currently, these messages are just for logging, but as long as there are no issues with the trace facility, we could add others.
    • In JS, the trace function in the Emscripten JS library is a printf-style function, js_wasm_trace(format, ...args). The point of the printf style is that no formatting takes place when js_wasm_trace() is called unless tracing is actually enabled.
    • In C, the trace function in php_wasm.c is wasm_trace(). It relays its messages to js_wasm_trace().
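The native-lock rule described in the locking bullets above (an exclusive native lock if any exclusive php-wasm lock exists, a shared one if only shared locks remain, and release on fd close or request exit) could be sketched roughly as follows. This is an illustration under assumptions: the `GrantedLock` record and both function names are hypothetical, and byte ranges are omitted for brevity.

```typescript
type NativeLockLevel = 'unlocked' | 'shared' | 'exclusive';

// Hypothetical record of a lock granted to a php-wasm process.
interface GrantedLock {
	pid: number;
	fd: number;
	type: 'shared' | 'exclusive';
}

// The native lock level needed to back the currently granted locks.
function requiredNativeLevel(locks: GrantedLock[]): NativeLockLevel {
	if (locks.some((lock) => lock.type === 'exclusive')) {
		return 'exclusive';
	}
	return locks.length > 0 ? 'shared' : 'unlocked';
}

// Drop every lock matching the predicate (for example, all locks for a
// closed fd or an exited PHP request), then report the native level the
// remaining locks still need so the caller can downgrade if possible.
function releaseLocks(
	locks: GrantedLock[],
	shouldRelease: (lock: GrantedLock) => boolean
): { remaining: GrantedLock[]; nativeLevel: NativeLockLevel } {
	const remaining = locks.filter((lock) => !shouldRelease(lock));
	return { remaining, nativeLevel: requiredNativeLevel(remaining) };
}
```

For example, releasing by `lock.fd === closedFd` models a file descriptor being closed, and releasing by `lock.pid === exitedPid` models a PHP request exiting.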

Testing Instructions (or ideally a Blueprint)

  • CI
  • Manual test
    • Try all major CLI commands. Bonus: Convert to automated test.
    • TODO: instructions to run in multi-worker mode

brandonpayton and others added 30 commits March 12, 2025 14:33
Adjusts a slight mistake in the condition:

```ts
// This returns true if activeSite is undefined since
// undefined is different than "none"
activeSite?.metadata.storage !== 'none'
```

The goal is to only return `true` from selectSitesLoaded() if we have an
active, non-temporary site
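
One way the stricter check could look is sketched below. The `SiteInfo` shape and the `hasActiveStoredSite` name are hypothetical, and the commit's actual fix may differ.

```typescript
// Hypothetical, simplified site shape for illustration only.
interface SiteInfo {
	metadata: { storage: string };
}

// True only when a site is active AND it is not a temporary
// ("none" storage) site. An undefined site yields false.
function hasActiveStoredSite(activeSite: SiteInfo | undefined): boolean {
	return activeSite !== undefined && activeSite.metadata.storage !== 'none';
}
```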
…sions (#83)

## Motivation for the change, related issues

Our changelog workflow is currently broken because it is missing extra
GitHub token secrets. If we add these secrets, we will have to update
them occasionally. I think we may be able to do without and would like
to try that.

In addition, we need to backfill changelog entries while omitting links
to PRs (since we cannot link to our private PRs).

## Implementation details

This PR:
- Updates our changelog update workflow to persist credentials in the
local git config and attempts to follow the example of pushing a commit
after actions/checkout:

https://github.com/actions/checkout?tab=readme-ov-file#push-a-commit-using-the-built-in-token
- Omits PR links from new changelog entries unless they point to
github.com/WordPress/wordpress-playground
- Backfills changelog entries for v1.0.25 to v1.0.29.

## Testing Instructions (or ideally a Blueprint)

- Temporarily disable protections on running the workflow
- Make the workflow target the PR branch
- Manually run the workflow and see if it commits a changelog update to
the PR branch
- If successful, remove the changelog commit from the PR branch
- Re-enable protections
- Re-enable checking out trunk only
- Merge
## Motivation for the change, related issues

Testing Playground CLI with bun (via `npx nx dev playground-cli`) is
much faster than building and running via node (via `npx nx start
playground-cli`). This PR adds the option to run `@php-wasm/cli` the
same way.

## Implementation details

This PR adds a `dev` target to the php-wasm-cli project. The `dev`
target runs `@php-wasm/cli` using `bun --watch`.

## Testing Instructions (or ideally a Blueprint)

- CI
- Manually try the new target with `npx nx dev php-wasm-cli "-r 'echo
\"huzzah\n\";'"`
- Note: For some reason, extra args need to be quoted because they are
not escaped when forwarded, at least with our current version of nx and
nx:run-commands.
Also add support for Node.js workers
@brandonpayton
Member Author

brandonpayton commented Jun 21, 2025

Would it be at all possible to port it as a C library or so? Or is fs-ext our only option? Either way, with typescript loader for node we're good.

I think the os-lock package will do fine. It doesn't crash bun and does real OS locking for *nix and Windows. The API is promise-based, so I'll need to adjust the declared FileLockManager interface to make its methods async. Should be no problem.

The os-lock package seemed to work, but I started seeing failures and stuttering behavior after switching to it. I also discovered that it resolved the Bun crash but then exposed another issue:
Bun's magic doesn't include resolving Vite ?worker&url imports, and those imports are the only way I've been able to keep Vite from inlining the worker script as a data URI starting with data:video/mp2t; (the .ts file extension is also associated with being an "MPEG transport stream" 🤷‍♂️ ).

So I just punted and switched test-built-npm-packages from Bun to Node. It looks like Playground CLI will not be Bun-friendly, at least for the near future (cc @bgrgicak).

Because no changes were required, the lock manager API remains synchronous rather than promise-based for now.

If we want folks to be able to run production builds of Playground CLI with Bun, it probably wouldn't be too hard to roll our own addon if we can't find a suitable fs-ext alternative that runs on Bun. The addon is a simple lock/unlock passthrough to platform APIs.

@brandonpayton
Member Author

I need to step away for the day but plan to resume in the morning. Will see what kind of compromises we can make to keep the big merges moving.

@brandonpayton
Member Author

I'm making some progress with the test-built-npm-packages tests.

I've been able to run the CommonJS tests with the CLI server cleaning up and no longer hanging, but the tests for ES modules apparently conflict with vitest and tinypool. IIUC, when a Playground worker is terminated, tinypool detects this as an unexpected exit.

To avoid this issue and complexity, I'm working on just using the Node.js test framework which should be simpler and contain fewer surprises. It is working for a single PHP version but crashes when trying multiple PHP versions.

Maybe testing built packages with ES modules for a single PHP version is good enough to merge this PR, especially since the CommonJS tests are testing all supported PHP versions. We could continue debugging this afterward.

Will push my changes after a bit more troubleshooting.

@brandonpayton
Member Author

brandonpayton commented Jun 22, 2025

The built npm package tests are passing because I switched the ES module tests to a manual test runner that runs one test per process. Without that, the second invocation of runCLI() crashes the test process in both Vitest (regardless of configured pool type) and the builtin Node test runner.

Using a manual test runner with a one-test-per-process approach seems a bit silly to me. But it works around the issue of Workers conflicting with the test runners.
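
A rough sketch of the one-test-per-process idea follows. It is not the runner from this PR: the function names are made up, and the process-spawning step is injected as a callback so the sketch stays self-contained. The only external fact relied on is that `node --test <file>` runs a file with the Node.js builtin test runner.

```typescript
// Runs a command and returns its exit code. Injected so the sketch
// does not depend on a specific process API.
type RunCommand = (command: string, args: string[]) => number;

interface TestRunResult {
	file: string;
	exitCode: number;
}

// Launch one fresh process per test file so that worker state from
// one test cannot leak into, or crash, the next.
function runEachTestInOwnProcess(
	testFiles: string[],
	run: RunCommand,
	nodeBinary = 'node'
): TestRunResult[] {
	return testFiles.map((file) => ({
		file,
		exitCode: run(nodeBinary, ['--test', file]),
	}));
}

// The whole suite passes only if every process exited with code 0.
function allPassed(results: TestRunResult[]): boolean {
	return results.every((result) => result.exitCode === 0);
}
```

In Node.js, `run` could wrap `spawnSync(command, args, { stdio: 'inherit' }).status ?? 1` from `node:child_process`.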

What is left:

  1. Some very recent bug is breaking multi-worker setup (or maybe just my test script)
    • I will track this down first.
    • We can also add a multi-worker test to the "unbuilt Playground CLI tests" and maybe to the built package tests. But these could be done in a follow-up PR.
  2. The asyncify sqlite3 tests that are consistently failing
  3. The Playground CLI automated tests that are currently disabled entirely because Vitest doesn't work well with the embedded Playground Workers.

@adamziel, if you are still interested in helping with this PR, 2 and 3 are up for grabs at the moment. We could actually fix these things or punt for a short time to enable merging XDebug and Blueprints v2 for Playground.

@brandonpayton
Member Author

brandonpayton commented Jun 22, 2025

I also haven't started looking at review comments because I've been digging into test failures. IIRC @adamziel said they weren't blockers, but I still intend to address them, even if in a follow-up PR.

@brandonpayton
Member Author

brandonpayton commented Jun 22, 2025

Recent work:

  • To address issues in the test-built-npm-packages tests:
    • Added an async disposal chain from RunCLIServer -> PHPWorker -> PHPRequestHandler -> PHPProcessManager
    • Reworked the ES module tests to run a separate process per test.
      • Uses a custom runner script that invokes the tests using the Node.js builtin test framework
      • No test cases were passing with Vitest. There appeared to be errors related to Vitest's use of tinypool, regardless of what pool type was selected.
      • I don't like the custom runner, but it appears to work so we don't have to give up the tests while merging this, XDebug, and the Blueprints v2 PRs.
      • We could do more debugging to find the issue with repeated calls to runCLI() in the same process, but I suspect we have more important things to work on. The most annoying thing about the issue is that it makes deeper unit testing harder. But we could write more scripts to do integration testing using the actual CLI program rather than the runCLI() function.
  • Extended the unbuilt-Playground-CLI tests to test single-worker Asyncify, single-worker JSPI, and multi-worker JSPI runs.
    • It seems like we might have some kind of race going on with multi-worker init, and @adamziel's comment about where we might use bootPHPRequestHandler() instead of bootWordPress() may be a hint about the reason.
    • That said, the unbuilt CLI tests run the same multi-worker invocation twice and pass, so I'm not sure what may be going on yet.
  • Started skipping the asyncify-sqlite3 tests because they appear to be leading to GH Actions crashes.
  • Left the cli-run.spec.ts tests disabled. There appear to be conflicts between Vitest and the PHP worker threads. We'll need to find a solution to re-enable these.
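
The async disposal chain mentioned above might be pictured like this. It is a generic sketch, not the PR's classes: `ChainLink` is a hypothetical stand-in for each of RunCLIServer, PHPWorker, PHPRequestHandler, and PHPProcessManager, and the inner-first disposal order is an assumption.

```typescript
interface AsyncCleanup {
	dispose(): Promise<void>;
}

// One link in a disposal chain. Disposing a link first awaits the
// disposal of the link it owns, then records its own cleanup.
class ChainLink implements AsyncCleanup {
	readonly name: string;
	private readonly log: string[];
	private readonly inner?: AsyncCleanup;
	private disposed = false;

	constructor(name: string, log: string[], inner?: AsyncCleanup) {
		this.name = name;
		this.log = log;
		this.inner = inner;
	}

	async dispose(): Promise<void> {
		if (this.disposed) {
			return; // Repeated calls are harmless.
		}
		this.disposed = true;
		if (this.inner) {
			await this.inner.dispose();
		}
		this.log.push(this.name);
	}
}
```

Awaiting the outermost link's `dispose()` then resolves only after every inner link has finished cleaning up, which is what lets a test process exit without hanging.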

@adamziel I think we are in a place where we could merge this and then create a follow-up PR to address review comments. I also have an idea about what might be leading to Asyncify fd_close() crashes, so we can see about that as well. What do you think? Should we merge this, merge XDebug and Blueprints v2 support, and refine after?

@brandonpayton
Member Author

Edited the previous comment to add this line:

  • Left the cli-run.spec.ts tests disabled. There appear to be conflicts between Vitest and the PHP worker threads. We'll need to find a solution to re-enable these.

@adamziel
Collaborator

Let's merge :)

@adamziel
Collaborator

adamziel commented Jun 22, 2025

Also, if vitest is so problematic, we could move to another library entirely (in a follow-up pr)

@brandonpayton brandonpayton merged commit ff727fb into trunk Jun 22, 2025
73 of 75 checks passed
@brandonpayton brandonpayton deleted the add-fcntl-for-nodejs branch June 22, 2025 17:07
adamziel added a commit that referenced this pull request Jun 25, 2025
## Motivation for the change, related issues

#2231 overrides
`FS.hashAddNode` with `function hashAddNodeIfNotSharedFS(node)` where
additional logic applies if `is_shared_fs_node(node)` is true. Only
NODEFS nodes were supposed to be considered as coming from a shared fs.
Unfortunately, the internal logic of `is_shared_fs_node()` also returned
true for MEMFS nodes. This caused a FS error 44 for the following
operation where `runtime2` attempts to create a directory in a
`/wordpress` directory mounted from `runtime1`:

```ts
import { loadNodeRuntime } from "@php-wasm/node";
import { getLoadedRuntime } from "@php-wasm/universal";

const opts = {
	emscriptenOptions: { ENV: { DOCROOT: '/wordpress' } }
};
const runtime1 = getLoadedRuntime(await loadNodeRuntime('8.3', opts));
runtime1.FS.mkdir("/wordpress");

const runtime2 = getLoadedRuntime(await loadNodeRuntime('8.3', opts));
runtime2.FS.mkdir("/wordpress");

runtime2.FS.mount(
	runtime2.PROXYFS,
	{ root: '/wordpress', fs: runtime1.FS },
	'/wordpress'
);

// This works:
// runtime1.FS.mkdir("/wordpress/wp-content");

// This doesn't:
runtime2.FS.mkdir("/wordpress/wp-content");
```

Specifically, the FS error 44 was triggered inside `is_shared_fs_node()`
when calling NODEFS operations on these non-NODEFS nodes.

## Implementation details

Adds a check confirming the shared node comes from NODEFS.
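
A simplified sketch of such a guard is shown below. The `FsType` and `FsNode` shapes are hypothetical stand-ins for Emscripten's internal objects, the function name is illustrative, and PROXYFS unwrapping is left out entirely.

```typescript
// Hypothetical stand-ins for Emscripten filesystem objects.
interface FsType {
	readonly name: string;
}

interface FsNode {
	mount: { type: FsType };
	isSharedFS?: boolean;
}

// Compare the node's filesystem type first, so that NODEFS-only
// operations are never attempted on MEMFS (or other) nodes.
function isSharedFsNode(node: FsNode, nodefs: FsType): boolean {
	if (node.mount.type !== nodefs) {
		return false;
	}
	return node.isSharedFS === true;
}
```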

## Testing Instructions (or ideally a Blueprint)

* Confirm the reproduction above works without errors.
* Once #2285 lands, we'll be able to add a unit test

cc @brandonpayton
Member Author

@brandonpayton brandonpayton left a comment

I grabbed the unresolved concerns from this PR and added them to the follow up issue here:
#2293

Planning to work on that issue next.
