Use the NODEFS filesystem in statfs when available #23912

bgrgicak
Contributor

@bgrgicak bgrgicak commented Mar 13, 2025

🚧 WIP

statfs always returns default values with NODEFS, because the default file system is MEMFS, which doesn't support statfs.

In this PR we detect if NODEFS is available and use it as the node when calling statfsNode.
This ensures statfsNode has access to NODEFS and the requested path.

statfs is only supported in NODEFS and NODERAWFS, but this change won't affect NODERAWFS because, when used, it's the default file system and it has its own implementation of statfsNode.

Testing instructions

I tested this by running ./test/runner other.test_unistd_fstatfs, but that only ensures we are getting values; the test doesn't check for real values.

@hoodmane
Collaborator

Can you give an example of the behavior that you are trying to fix? test_unistd_fstatfs runs in NODEFS as well as NODERAWFS so there is basic coverage for this there. A good start point would be to update that test to add whatever additional requirements you're after and show it failing on NODEFS but succeeding on NODERAWFS.

@hoodmane
Collaborator

> the test doesn't check for real values

Right, if you could update test/unistd/fstatfs.c to have assertions about the values returned that'd be great. You may need something like:

#if NODEFS || NODERAWFS
// In nodefs and noderawfs assert that the values returned are correct.
// In wasmfs and memfs, `fstatfs` is a stub so maybe the values there are expected to be wrong?
#endif

@bgrgicak
Contributor Author

> Can you give an example of the behavior that you are trying to fix?

I'm running into this issue in WordPress Playground. When PHP calls disk_total_space, it uses statfs to fetch stats, but we always get the default value.

> Right, if you could update test/unistd/fstatfs.c to have assertions about the values returned that'd be great. You may need something like:

I would love to add a test, but I'm not sure how to do it in Emscripten.
I tested it in WordPress Playground by comparing the result of statfs with the result of statfsSync('/') (from node:fs).

In the test, each FS returns the default values. I checked this by adding assert(buf.f_blocks == 1000000); to the test.

To properly test NODEFS and NODERAWFS I would need real OS data in the response, but I'm not sure how to test that.

> test_unistd_fstatfs runs in NODEFS as well as NODERAWFS so there is basic coverage for this there. A good start point would be to update that test to add whatever additional requirements you're after and show it failing on NODEFS but succeeding on NODERAWFS.

The issue doesn't come up in NODERAWFS because when used it's the default FS, unlike NODEFS which isn't the default.

@bgrgicak
Contributor Author

I see ./test/runner other.test_unistd_fstatfs_nodefs is failing; this is probably a good place for me to look. 👀

@bgrgicak
Contributor Author

I definitely need to research this more.
"Test number of block in statfs" adds tests that pass on main and confirm that fstatfs and statfs can return real values, but on this branch fstatfs(f, &buf) starts failing.

@bgrgicak
Contributor Author

I will get back to this PR next week.
First I want to demonstrate in a test how NODEFS is inaccessible when not mounted and how this breaks statfs.

After I have the test, I will explore how it could be improved.

src/lib/libfs.js Outdated
// file system which doesn't have a statfs function.
// To ensure statfsNode has access to the statfs function, we pass in
// the NODEFS object.
if (this.filesystems.NODEFS) {
Collaborator

statfs should figure out which filesystem the path is in and then return the filesystem info for that filesystem.

i.e. if path is in the MEMFS you should get the memfs info. If path is a NODEFS you should get the node FS info.

The presence of NODEFS alone should not be a factor, right? The question should be "is path part of a node FS mount", I think.
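
A minimal sketch of the lookup this comment describes, assuming a toy mount table rather than Emscripten's real FS internals. Every name here (`mounts`, `findMount`, `hostRoot`, the stub filesystems and their numbers) is hypothetical and exists only to illustrate "resolve the path to its mount, then ask that filesystem".

```javascript
// Toy model only: none of these names come from Emscripten.
const DEFAULT_STATS = { bsize: 4096, blocks: 1000000 }; // placeholder defaults

const memfs = { name: 'MEMFS' }; // no statfs, so callers keep the defaults
const nodefs = {
  name: 'NODEFS',
  // A real implementation would query the host; this stub returns fixed numbers.
  statfs: (hostPath) => ({ bsize: 512, blocks: 2000000, hostPath }),
};

const mounts = [
  { mountpoint: '/', fs: memfs, hostRoot: undefined },
  { mountpoint: '/wordpress/plugins', fs: nodefs, hostRoot: '/home/user/plugins' },
];

// Pick the mount whose mountpoint is the longest prefix of `path`.
function findMount(path) {
  let best = null;
  for (const m of mounts) {
    const prefix = m.mountpoint === '/' ? '/' : m.mountpoint + '/';
    const matches = path === m.mountpoint || path.startsWith(prefix);
    if (matches && (!best || m.mountpoint.length > best.mountpoint.length)) {
      best = m;
    }
  }
  return best;
}

// Start from the defaults and let the owning filesystem override them.
function statfs(path) {
  const mount = findMount(path);
  const stats = { ...DEFAULT_STATS, fsName: mount.fs.name };
  if (mount.fs.statfs) {
    Object.assign(stats, mount.fs.statfs(mount.hostRoot));
  }
  return stats;
}

const rootStats = statfs('/');
const pluginStats = statfs('/wordpress/plugins/example');
```

In this model `statfs('/')` keeps the defaults because `/` belongs to the in-memory filesystem, while a path under the NODEFS mountpoint gets the stubbed host values. That mirrors why the mere presence of NODEFS does not change the answer for `/`.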

Contributor Author

I've been thinking about this for a week and I'm not sure how to handle it, so let me start with our use case and why the above approach doesn't work for us.

WordPress Playground mounts all files into MEMFS because some files, like the WordPress code, are downloaded from a remote URL, while a plugin might be imported from a local folder. So the resulting codebase that we run inside Emscripten is composed of multiple data sources.

Currently, when the code asks for available disk space using statfs("/"), it gets the default values because / is a MEMFS directory.

I was able to work around this by overriding statfs during onRuntimeInitialized.

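// Give the MEMFS root node the NODEFS statfs implementation.
PHPRuntime.FS.root.node_ops = {
	...PHPRuntime.FS.root.node_ops,
	statfs: PHPRuntime.FS.filesystems.NODEFS.node_ops.statfs,
};
// statfsNode passes mount.opts.root to statfs, so it must be defined.
PHPRuntime.FS.root.mount.opts.root = '/';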

Checking which FS the path belongs to makes sense overall, but in some cases like the one above it would be great if we could override it and force a specific FS to be used.

For statfs specifically, it makes sense to force NODEFS, because statfs only works in NODEFS (and NODERAWFS). If the requested path is available in that NODEFS, it would be reasonable to override MEMFS and return the result from NODEFS.node_ops.statfs.

What do you think? How would you approach our use case?

Collaborator

> Currently, when the code asks for available disk space using statfs("/"), it gets the default values because / is a MEMFS directory.

Which code is doing this? Asking for the available disk space by using statfs("/") seems odd, since normally the root FS has very little space in it, right? Where is the code that is doing this? Why is it using /?

Collaborator

But won't the statfs from node give very different results if you stat / vs /tmp vs /home, for example? What is the code in question interested in exactly?

} catch (e) {
// Older versions of Node don't support statfsSync
}
};
Collaborator

If this is only needed for the one test I would just do this in fstatfs.c itself. You can just do it in an EM_ASM or EM_JS block on startup?

@bgrgicak
Contributor Author

I found a way to work around this issue in our project.

The root cause of the issue is that if I run disk_total_space('/') it calls FS.statfs('/').
Because we use MEMFS+NODEFS, the root node (/) is a MEMFS node which by default doesn't have statfs in FS.root.node_ops.

To work around this I used the onRuntimeInitialized function from Emscripten to add statfs from NODEFS to FS.root.node_ops and set FS.root.mount.opts.root to '.' because it was undefined but is required by the statfsNode function.

The resulting solution looks similar to this:

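onRuntimeInitialized() {
	// Add the NODEFS statfs implementation to the MEMFS root node.
	FS.root.node_ops = {
		...FS.root.node_ops,
		statfs: FS.filesystems.NODEFS.node_ops.statfs,
	};
	// The root mount has no root path by default; statfsNode requires one.
	FS.root.mount.opts.root = '.';
}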
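
To see why both steps of the workaround matter, here is a small self-contained model. It is a sketch under stated assumptions, not the Emscripten source: `statfsNode`, `root`, and `fakeHostStatfs` are simplified stand-ins, the `bsize` default and the host block count are made up, and only the `1000000` block default comes from this thread.

```javascript
// Simplified stand-ins for FS.root and statfsNode; not the Emscripten source.
// blocks mirrors the 1000000 default mentioned in this thread; bsize is a placeholder.
const DEFAULTS = { bsize: 4096, blocks: 1000000 };

// Defaults are used unless the node's filesystem provides its own statfs,
// which is called with the root path of the node's mount.
function statfsNode(node) {
  const result = { ...DEFAULTS };
  if (node.node_ops.statfs) {
    Object.assign(result, node.node_ops.statfs(node.mount.opts.root));
  }
  return result;
}

// Stand-in for the NODEFS statfs, which forwards its argument to fs.statfsSync.
// It rejects an undefined path, as the thread reports for the real call.
function fakeHostStatfs(hostPath) {
  if (typeof hostPath !== 'string') {
    throw new TypeError('path must be a string');
  }
  return { bsize: 4096, blocks: 123456789 }; // made-up host values
}

// A MEMFS-like root node: no statfs, and no root path on its mount.
const root = { node_ops: {}, mount: { opts: {} } };

// 1. Without the override, only the defaults are returned.
const blocksBefore = statfsNode(root).blocks;

// 2. With statfs overridden but no root path, the host call fails.
root.node_ops = { ...root.node_ops, statfs: fakeHostStatfs };
let errorWithoutRoot = null;
try {
  statfsNode(root);
} catch (e) {
  errorWithoutRoot = e;
}

// 3. With the root path set as well, host values replace the defaults.
root.mount.opts.root = '.';
const blocksAfter = statfsNode(root).blocks;
```

In this model the override alone is not enough: the second step throws until `mount.opts.root` is defined, which matches the reason given above for setting it to `'.'`.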

@bgrgicak
Contributor Author

The way Emscripten handles the root path (/) makes sense, because the node is MEMFS, so I don't think that there is anything that we need to change in Emscripten.

It would be reasonable to always force the use of NODEFS.statfs in the statfsNode function because statfs only works in NODEFS/NODERAWFS, but that would make statfs an exception, so it's probably better to avoid it.

Thank you for helping me @hoodmane @sbc100

@bgrgicak bgrgicak closed this Mar 28, 2025
@sbc100
Collaborator

sbc100 commented Mar 28, 2025

But why do you expect disk_total_space('/') to return the amount of free space on anything but the root FS?

The root FS is normally not very big, so it's not a good proxy for how much free space there is, for example, in writable locations such as /home or /tmp.

@bgrgicak
Contributor Author

> But why do you expect disk_total_space('/') to return the amount of free space on anything but the root FS?

This might be specific to our use case. Playground allows you to run WordPress sites in WASM and it needs to work just as if you installed Apache/Nginx on your machine.
In the case of statfs, this means that when WordPress asks for the disk space of /, it's actually referring to the host machine root and not the VFS root.

This specific case could be resolved by us using NODERAWFS, but we need MEMFS for some Playground features, so NODERAWFS isn't an option for us.

> The root FS is normally not very big, so it's not a good proxy for how much free space there is, for example, in writable locations such as /home or /tmp.

In our case, fs.statfsSync('/') returns the actual host machine disk size which is what PHP expects.

@sbc100
Collaborator

sbc100 commented Mar 31, 2025

> In our case, fs.statfsSync('/') returns the actual host machine disk size which is what PHP expects.

But you are aware that it returns the info on the root partition only, right? This partition is not normally writable by normal users, whose data often lives in a separate /home partition (or /tmp partition). I'm not sure why WordPress would be interested in the space available on the root partition, as this info is not very useful.

@bgrgicak
Contributor Author

bgrgicak commented Apr 7, 2025

> But you are aware that it returns the info on the root partition only, right?

What do you mean by info on the root partition only?
When I run this locally using Node, I get the disk size of my machine, which is exactly what we need.

From what I understand, fs.statfsSync returns the disk space of the underlying volume, so both /home and /tmp will return the same result if they are on the same volume.

@sbc100
Collaborator

sbc100 commented Apr 7, 2025

> How do you mean info on the root partition only? When I run this locally using Node, I get the disk size of my machine, which is exactly what we need.
>
> From what I understand fs.statfsSync returns the disk space of the underlying volume, so both /home and /tmp will return the same result in case they are on the same volume.

Right, but for most/many UNIX/Linux machines the root volume / and /home and /tmp are all separate filesystems. It's more portable to be more specific. For example, if you are interested in how much space is available in /home/foo/myproject, it is not a good idea to do fs.statfsSync('/'); it is much better to do fs.statfsSync('/home/foo/myproject').
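
One way to check this on a given machine is to query several paths and compare the results. The following is a minimal sketch, assuming a Node version that provides `fs.statfsSync`; the helper name `volumeUsage` and the choice of paths are illustrative only.

```javascript
const fs = require('node:fs');
const os = require('node:os');

// Returns total and available bytes for the volume holding `path`, or null
// when fs.statfsSync is missing or the path cannot be queried.
function volumeUsage(path) {
  if (typeof fs.statfsSync !== 'function') {
    return null;
  }
  try {
    const s = fs.statfsSync(path);
    return {
      path,
      totalBytes: s.bsize * s.blocks,     // block size times total blocks
      availableBytes: s.bsize * s.bavail, // blocks available to unprivileged users
    };
  } catch (e) {
    return null;
  }
}

// Paths on the same volume report the same totals; paths on separate
// partitions can differ, which is why querying the specific path is safer.
const report = ['/', os.tmpdir(), process.cwd()].map(volumeUsage);
console.log(report);
```

If all entries show the same `totalBytes`, the paths most likely share one volume on that machine. On a host with separate partitions the numbers would diverge.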

@bgrgicak
Contributor Author

bgrgicak commented Apr 8, 2025

That's ok. /home/foo/myproject would in my case provide the disk space of the /home volume.
My only issue was with / because Emscripten would correctly detect it as a MEMFS directory and return default values.

This is why my initial suggestion was to use the NODEFS statfs implementation by default and fall back on MEMFS only if the directory doesn't exist in the underlying FS.
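
The fallback policy described here could look roughly like the sketch below. It is an illustration only, not code from this PR or from Emscripten: `statfsPreferHost` and `MEMFS_DEFAULTS` are hypothetical names, and the default values are placeholders apart from the `1000000` block count mentioned earlier in the thread.

```javascript
const fs = require('node:fs');

// Placeholder defaults standing in for what MEMFS reports.
const MEMFS_DEFAULTS = { bsize: 4096, blocks: 1000000 };

// Hypothetical policy: prefer host statistics, and fall back to the defaults
// when the host lookup fails (for example, the path does not exist on the
// host, or this Node version has no fs.statfsSync).
function statfsPreferHost(hostPath) {
  try {
    const s = fs.statfsSync(hostPath);
    return { source: 'host', bsize: s.bsize, blocks: s.blocks };
  } catch (e) {
    return { source: 'defaults', ...MEMFS_DEFAULTS };
  }
}

const missing = statfsPreferHost('/no/such/dir/for/statfs-sketch');
const existing = statfsPreferHost(process.cwd());
```

The trade-off is the one raised in the review: such a policy makes statfs behave differently from other operations, which resolve a path to the filesystem that actually owns it.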

bgrgicak added a commit to WordPress/wordpress-playground that referenced this pull request May 7, 2025
## Motivation for the change, related issues

This PR adds `statfs` support to the Emscripten root file system to
enable PHP-wasm Node to get real filesystem stats for the `/` path.

Currently, in PHP-wasm Node, PHP functions like `disk_total_space('/')`
return the default hardcoded value for MEMFS instead of the actual disk
space.

This happens because Emscripten automatically detects the filesystem for
a given path, and because the root path always uses the MEMFS
filesystem, Emscripten will use the MEMFS `statfs` implementation.

PHP-wasm Node includes two Emscripten filesystems, MEMFS and NODEFS.
MEMFS doesn't have access to the operating system's filesystem, so it
can only return a hardcoded value.
NODEFS has access to the OS filesystem through the `fs` module, and
specifically for `statfs` it uses `fs.statfsSync` to get real data.

## Implementation details

This PR uses Emscripten's `onRuntimeInitialized` event to override the
`statfs` implementation in the Emscripten root FS with the NODEFS
implementation of `statfs`.

It also defines the `FS.root.mount.opts.root` path as `.` to ensure the
root node has a path value defined.
Otherwise when we call `FS.statfs('/')` it would use the root node's
root path which is `undefined` and this would throw an error because it
would call `fs.statfsSync(undefined)` in the NODEFS implementation of
`statfs`.

I explored fixing this [issue in
Emscripten](emscripten-core/emscripten#23912),
but I wasn't able to find a working solution there, so I ended up
patching it in Playground.

## Testing Instructions (or ideally a Blueprint)

- CI