Description
A new system call, "cachestat", is being introduced in Linux 6.5 (currently at RC5). It lets user-space applications retrieve detailed page cache statistics for a byte range of a specific file, which can inform decisions about file operations and resource management.
While Linux already has the "mincore" syscall (which we already have in x/sys/unix and the runtime) for checking page residency in memory, "cachestat" offers more extensive page cache statistics and aims to scale better. Its output includes the number of cached pages, dirty pages, pages marked for writeback, evicted pages, and recently evicted pages.
The new syscall also appears to be considerably faster than mincore in some benchmarks shared here.
Some possible use cases for it:
- Databases can make query decisions based on the in-memory cache state of their indexes.
- Offering visibility into the writeback algorithm to help diagnose performance issues.
- Implementing workload-aware writeback pacing: estimating the IO served by the page cache allows smarter synchronization and batching.
- Computing memory usage for large files and directories, analogous to the disk usage analysis performed by the 'du' tool.
Copied from Phoronix
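As a rough illustration of the 'du'-like use case, the sketch below turns a cached-page count into a resident fraction for a file. The helper name `residentFraction` and its parameters are hypothetical and only for illustration; it assumes the cached-page count is expressed in units of the system page size.

```go
package main

import "fmt"

// residentFraction estimates what share of a file sits in the page cache,
// given the number of cached pages reported for the whole file.
// Hypothetical helper for illustration; not part of any existing package.
func residentFraction(nrCache uint64, fileSize int64, pageSize int) float64 {
	if fileSize <= 0 || pageSize <= 0 {
		return 0
	}
	ps := uint64(pageSize)
	totalPages := (uint64(fileSize) + ps - 1) / ps // round up to whole pages
	if nrCache > totalPages {
		nrCache = totalPages // the file may have shrunk since it was measured
	}
	return float64(nrCache) / float64(totalPages)
}

func main() {
	// 256 cached pages of a 4 MiB file with 4 KiB pages (1024 pages total).
	fmt.Printf("%.2f\n", residentFraction(256, 4<<20, 4096)) // prints 0.25
}
```

Summing such fractions (or the raw page counts) over a directory tree would give a cache-usage report similar in spirit to what 'du' gives for disk usage.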
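struct cachestat_range {
	__u64 off;	/* start offset of the queried range, in bytes */
	__u64 len;	/* length in bytes; 0 means up to the end of the file */
};

struct cachestat {
	__u64 nr_cache;			/* number of cached pages */
	__u64 nr_dirty;			/* number of dirty pages */
	__u64 nr_writeback;		/* number of pages marked for writeback */
	__u64 nr_evicted;		/* number of pages evicted from the cache */
	__u64 nr_recently_evicted;	/* number of pages recently evicted */
};

asmlinkage long sys_cachestat(unsigned int fd,
		struct cachestat_range __user *cstat_range,
		struct cachestat __user *cstat, unsigned int flags);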