
meta/redis: implement batch clone#6715

Open
vyalamar wants to merge 4 commits into juicedata:main from vyalamar:revive/redis-batchclone

@vyalamar

@vyalamar vyalamar commented Mar 8, 2026

Summary

Implement Redis doBatchClone for recursive clone traversal.

The SQL backend already batches non-directory clones. This change adds the corresponding Redis path so recursive clone can process files and symlinks in bounded sub-batches instead of falling back to per-entry Redis transactions for every non-directory child.

Behavior

  • batch source reads with pipelined Redis operations
  • batch destination writes in a single transactional pipeline per sub-batch
  • aggregate sliceRef increments by slice key
  • return ENOTSUP for unsupported or uncertain cases so the caller can safely fall back to the existing per-entry path
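
The two batching ideas in the list above (bounded sub-batches and per-key aggregation of sliceRef increments) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `sliceRef`, `sliceKey`, `aggregateSliceRefs`, `subBatches`, and the key format are all hypothetical names chosen for the example.

```go
package main

import "fmt"

// sliceRef is a hypothetical stand-in for one slice referenced by a source file.
type sliceRef struct {
	ID   uint64
	Size uint32
}

// sliceKey builds an illustrative refcount key. The key layout used by
// pkg/meta/redis.go may differ; this format is an assumption.
func sliceKey(s sliceRef) string {
	return fmt.Sprintf("k%d_%d", s.ID, s.Size)
}

// aggregateSliceRefs counts references per slice key across a batch, so each
// key needs one increment of N instead of N increments of one.
func aggregateSliceRefs(files [][]sliceRef) map[string]int64 {
	incr := make(map[string]int64)
	for _, slices := range files {
		for _, s := range slices {
			incr[sliceKey(s)]++
		}
	}
	return incr
}

// subBatches splits n entries into half-open [start, end) ranges of at most
// size entries each.
func subBatches(n, size int) [][2]int {
	if size <= 0 {
		size = 1
	}
	var out [][2]int
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		out = append(out, [2]int{start, end})
	}
	return out
}

func main() {
	files := [][]sliceRef{
		{{ID: 1, Size: 4096}},
		{{ID: 1, Size: 4096}, {ID: 2, Size: 4096}},
	}
	fmt.Println(aggregateSliceRefs(files)) // map[k1_4096:2 k2_4096:1]
	fmt.Println(subBatches(5, 2))          // [[0 2] [2 4] [4 5]]
}
```

With aggregation, two files sharing slice 1 produce a single increment of 2 for that key, which is where part of the command-count saving would come from.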

Notes

  • hardlinks are not preserved as hardlinks; cloned entries become independent files with nlink = 1, matching existing clone behavior
  • the Redis batch path watches source inode/xattr keys and destination parent keys, but not chunk list keys
  • the existing fallback remains in the base clone path when Redis batch clone returns ENOTSUP
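
The ENOTSUP fallback contract described in these notes might look like the sketch below. It assumes a simplified caller: `cloneEntries`, `unsupportedBatch`, and the callback signatures are hypothetical and use plain `error` values, whereas the real clone path has its own types and arguments.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// cloneEntries tries the batch path once. Only ENOTSUP triggers the per-entry
// fallback; any other error is returned unchanged.
func cloneEntries(names []string, batch func([]string) error, single func(string) error) (fellBack bool, err error) {
	err = batch(names)
	if err == nil {
		return false, nil
	}
	if !errors.Is(err, syscall.ENOTSUP) {
		return false, err
	}
	for _, name := range names {
		if err := single(name); err != nil {
			return true, err
		}
	}
	return true, nil
}

// unsupportedBatch simulates a backend that declines the whole batch.
func unsupportedBatch(_ []string) error { return syscall.ENOTSUP }

func main() {
	cloned := 0
	fellBack, err := cloneEntries([]string{"file_A", "file_B"}, unsupportedBatch,
		func(string) error { cloned++; return nil })
	fmt.Println(fellBack, err, cloned) // true <nil> 2
}
```

Treating ENOTSUP as "decline, nothing written" keeps the fallback safe only if the batch path returns it before committing any destination writes, which is the behavior the summary describes.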

Tests

go test -run TestRedisBatchClone ./pkg/meta -count=1

Added coverage for:

  • shared chunk refs
  • mixed files and symlinks
  • space accounting
  • multi-chunk files
  • partial failure behavior

Benchmark

I posted the local benchmark method, commands, and raw output in a follow-up PR comment.

@CLAassistant
Copy link


CLAassistant commented Mar 8, 2026

CLA assistant check
All committers have signed the CLA.

@vyalamar vyalamar marked this pull request as ready for review March 8, 2026 10:31
@vyalamar
Author

vyalamar commented Mar 8, 2026

Here is the exact local benchmark method and the temporary harness used for the current numbers, measured on the latest pushed branch state.

Method:

  • baseline checkout: /tmp/juicefs-bench-main from origin/main
  • current checkout: this branch at a0926162
  • Redis: 127.0.0.1:6379, DB 10
  • workload: create a source directory with 1000 files, write one 4 KiB slice per file, then call Meta.Clone(...) with clone concurrency 4
  • measurements: clone wall time and Redis total_commands_processed delta from INFO stats
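
For readers who want to reproduce the command-count measurement, the delta can be derived from two `INFO stats` snapshots roughly like this. The parser below is a sketch that works on the raw text reply; `commandsProcessed` is a hypothetical helper and is not part of the `.tmp_clonebench` harness.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// commandsProcessed extracts total_commands_processed from the text that
// Redis returns for INFO stats (lines of "field:value" separated by CRLF).
func commandsProcessed(info string) (int64, error) {
	const field = "total_commands_processed:"
	for _, line := range strings.Split(info, "\n") {
		line = strings.TrimSpace(line) // also drops the trailing \r
		if strings.HasPrefix(line, field) {
			return strconv.ParseInt(strings.TrimPrefix(line, field), 10, 64)
		}
	}
	return 0, fmt.Errorf("total_commands_processed not found")
}

func main() {
	// Snapshot values taken from the baseline run reported in this thread.
	before, _ := commandsProcessed("# Stats\r\ntotal_commands_processed:1475319\r\n")
	after, _ := commandsProcessed("# Stats\r\ntotal_commands_processed:1488375\r\n")
	fmt.Println("cmd_delta =", after-before) // cmd_delta = 13056
}
```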

Commands:

cd /tmp/juicefs-bench-main
go run ./.tmp_clonebench -redis-addr 127.0.0.1:6379 -db 10 -files 1000 -concurrency 4 -run-id baseline_1000_clean

cd /Users/vyalamar/Documents/juicefs
go run ./.tmp_clonebench -redis-addr 127.0.0.1:6379 -db 10 -files 1000 -concurrency 4 -run-id pr_1000_clean

Results:

  • baseline: clone_ms=472, cmd_delta=13056
  • current: clone_ms=87, cmd_delta=7050
  • time speedup: 5.43x
  • time reduction: 81.6%
  • Redis command reduction: 46.0%
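
The derived figures follow directly from the raw `clone_ms` and `cmd_delta` values. A small sketch of the arithmetic, with `speedup` and `reductionPct` as illustrative helper names:

```go
package main

import "fmt"

// speedup is the ratio of baseline to current (higher is better).
func speedup(baseline, current float64) float64 {
	return baseline / current
}

// reductionPct is the relative decrease from baseline to current, in percent.
func reductionPct(baseline, current float64) float64 {
	return (baseline - current) / baseline * 100
}

func main() {
	fmt.Printf("time speedup: %.2fx\n", speedup(472, 87))                         // 5.43x
	fmt.Printf("time reduction: %.1f%%\n", reductionPct(472, 87))                 // 81.6%
	fmt.Printf("Redis command reduction: %.1f%%\n", reductionPct(13056, 7050))    // 46.0%
}
```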

@vyalamar
Author

vyalamar commented Mar 8, 2026

These tests are passing.

```bash
go test -v -run TestRedisClient -count=1 -timeout 15m ./pkg/meta
go test -v -run TestRedisBatchClone -count=1 -timeout 10m ./pkg/meta
```


@vyalamar
Author

vyalamar commented Mar 8, 2026

@zhijian-pro @eakman-datadog please help review.

Contributor

Copilot AI left a comment

Pull request overview

Adds a Redis backend implementation of doBatchClone so recursive clone traversal can clone non-directory entries in bounded batches using pipelined reads and transactional batched writes, matching the SQL backend’s batching behavior and reducing per-entry Redis transaction overhead.

Changes:

  • Implement (*redisMeta).doBatchClone with pipelined source reads and transactional batched destination writes, including aggregated slice refcount updates.
  • Add Redis-specific batch-clone tests covering shared chunk refs, mixed files/symlinks, space accounting, multi-chunk files, and partial-failure behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pkg/meta/redis.go | Implements Redis doBatchClone batching logic (read pipeline + TxPipelined writes + sliceRef aggregation). |
| pkg/meta/redis_batchclone_test.go | Adds targeted tests for Redis batch clone correctness and edge cases. |

@vyalamar
Author

vyalamar commented Mar 9, 2026

~/Downloads > /tmp/juicefs_baseline_bench_clean.log
~/Downloads > cat /tmp/juicefs_baseline_bench_clean.log
+/tmp/run_baseline_bench_clean.sh:3> export GOCACHE=/tmp/juicefs-go-build-cache
+/tmp/run_baseline_bench_clean.sh:4> mkdir -p /tmp/juicefs-go-build-cache
+/tmp/run_baseline_bench_clean.sh:5> redis-cli FLUSHALL
OK
+/tmp/run_baseline_bench_clean.sh:6> cd /tmp/juicefs-bench-main
+/tmp/run_baseline_bench_clean.sh:7> go run ./.tmp_clonebench -redis-addr 127.0.0.1:6379 -db 10 -files 1000 -concurrency 4 -run-id baseline_1000_clean
2026/03/08 20:30:38.347614 juicefs[82188] <INFO>: Meta address: redis://127.0.0.1:6379/10 [NewClient@interface.go:627]
2026/03/08 20:30:38.350953 juicefs[82188] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly. [checkRedisInfo@info.go:84]
2026/03/08 20:30:38.351250 juicefs[82188] <INFO>: Ping redis latency: 192.041µs [checkServerConfig@redis.go:4301]
redis: 2026/03/08 20:30:39 redis.go:478: auto mode fallback: maintnotifications disabled due to handshake error: ERR unknown subcommand 'maint_notifications'. Try CLIENT HELP.
RESULT run_id=baseline_1000_clean arity=10 files=1000 clone_ms=472 count=1001 total=1 cmd_before=1475319 cmd_after=1488375 cmd_delta=13056 net_in_delta=526386 net_out_delta=335781
~/Downloads > cat /tmp/juicefs_pr_bench_clean.log
+/tmp/run_pr_bench_clean.sh:3> export GOCACHE=/tmp/juicefs-go-build-cache
+/tmp/run_pr_bench_clean.sh:4> mkdir -p /tmp/juicefs-go-build-cache
+/tmp/run_pr_bench_clean.sh:5> redis-cli FLUSHALL
OK
+/tmp/run_pr_bench_clean.sh:6> cd /Users/vyalamar/Documents/juicefs
+/tmp/run_pr_bench_clean.sh:7> go run ./.tmp_clonebench -redis-addr 127.0.0.1:6379 -db 10 -files 1000 -concurrency 4 -run-id pr_1000_clean
2026/03/08 20:31:05.853998 juicefs[82256] <INFO>: Meta address: redis://127.0.0.1:6379/10 [NewClient@interface.go:627]
2026/03/08 20:31:06.028092 juicefs[82256] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly. [checkRedisInfo@info.go:84]
2026/03/08 20:31:06.028736 juicefs[82256] <INFO>: Ping redis latency: 479.792µs [checkServerConfig@redis.go:4301]
redis: 2026/03/08 20:31:07 redis.go:478: auto mode fallback: maintnotifications disabled due to handshake error: ERR unknown subcommand 'maint_notifications'. Try CLIENT HELP.
RESULT run_id=pr_1000_clean arity=10 files=1000 clone_ms=87 count=1001 total=1 cmd_before=1506413 cmd_after=1513463 cmd_delta=7050 net_in_delta=420052 net_out_delta=285086

@vyalamar
Author

vyalamar commented Mar 9, 2026

@jiefenghuang please help review.

@vyalamar
Author

@jiefenghuang can you please help review.

if n == "." || n == ".." {
	continue
}
if n == "file_A" || n == "file_B" {
Contributor

Not major, but for good measure, I would remove this if-statement. It shouldn't be needed with the skip on line 95.

Comment on lines +42 to +57
metaClient, err := newRedisMeta("redis", "127.0.0.1:6379/11", testConfig())
if err != nil {
	t.Fatalf("create meta: %v", err)
}
m, ok := metaClient.(*redisMeta)
if !ok {
	t.Fatalf("expected *redisMeta, got %T", metaClient)
}
defer m.Shutdown()

if err := m.Reset(); err != nil {
	t.Fatalf("reset meta: %v", err)
}
if err := m.Init(testFormat(), true); err != nil {
	t.Fatalf("init meta: %v", err)
}
Contributor

@eakman-datadog eakman-datadog Mar 11, 2026

Not major. For readability and for easier reuse, I would shove this stuff in a helper function assuming it's the same for each test function--looked like it was.
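
One possible shape for such a helper is sketched below. It is written against small hypothetical interfaces (`lifecycle`, `tb`) so that it compiles on its own; in the actual test file the helper would take `*testing.T`, call `newRedisMeta(...)`, and pass the real arguments to `Init`. The fakes exist only to exercise the sketch.

```go
package main

import "fmt"

// lifecycle is a hypothetical subset of what the tests call on the meta
// client. The real methods take arguments, e.g. Init(testFormat(), true).
type lifecycle interface {
	Reset() error
	Init() error
	Shutdown() error
}

// tb lists the *testing.T methods the helper relies on, so this sketch
// compiles outside a _test.go file. *testing.T satisfies it.
type tb interface {
	Helper()
	Cleanup(func())
	Fatalf(format string, args ...interface{})
}

// setupMeta creates, resets, and initializes a client, and registers its
// shutdown as test cleanup so callers do not need their own defer.
func setupMeta(t tb, create func() (lifecycle, error)) lifecycle {
	t.Helper()
	m, err := create()
	if err != nil {
		t.Fatalf("create meta: %v", err)
		return nil
	}
	t.Cleanup(func() { _ = m.Shutdown() })
	if err := m.Reset(); err != nil {
		t.Fatalf("reset meta: %v", err)
		return nil
	}
	if err := m.Init(); err != nil {
		t.Fatalf("init meta: %v", err)
		return nil
	}
	return m
}

// fakeMeta and fakeT are stand-ins used only to run the sketch.
type fakeMeta struct{ calls []string }

func (f *fakeMeta) Reset() error    { f.calls = append(f.calls, "reset"); return nil }
func (f *fakeMeta) Init() error     { f.calls = append(f.calls, "init"); return nil }
func (f *fakeMeta) Shutdown() error { f.calls = append(f.calls, "shutdown"); return nil }

type fakeT struct {
	cleanups []func()
	failed   string
}

func (t *fakeT) Helper()           {}
func (t *fakeT) Cleanup(fn func()) { t.cleanups = append(t.cleanups, fn) }
func (t *fakeT) Fatalf(format string, args ...interface{}) {
	t.failed = fmt.Sprintf(format, args...)
}

func main() {
	t := &fakeT{}
	fm := &fakeMeta{}
	setupMeta(t, func() (lifecycle, error) { return fm, nil })
	for _, fn := range t.cleanups {
		fn() // the real testing package runs these when the test ends
	}
	fmt.Println(fm.calls, t.failed == "")
}
```

Calling `t.Helper()` makes failures report the line in the calling test, and `t.Cleanup` replaces the repeated `defer m.Shutdown()`.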

Contributor

@eakman-datadog eakman-datadog left a comment

@vyalamar did a first pass. Looks great! A few very minor comments. Going to take a second pass most likely tomorrow. But looks good so far.

@jiefenghuang
Contributor

@vyalamar Sorry for the slow response. I’ll review it in the next few days.

@vyalamar
Author

> @vyalamar Sorry for the slow response. I’ll review it in the next few days.

Sure, thank you.


5 participants