Commit 4d5965d

ZIL: "crash" the ZIL if the pool suspends during fallback
If the ZIL runs into trouble, it calls txg_wait_synced(), which blocks on suspend. We want it to not block on suspend, instead returning an error. On the surface, this is simple: change all calls to txg_wait_synced_flags(TXG_WAIT_SUSPEND), and then thread the error return back to the zil_commit() caller.

Handling suspension means returning an error to all commit waiters. This is relatively straightforward, as zil_commit_waiter_t already has zcw_zio_error to hold the write IO error, which signals a fallback to txg_wait_synced_flags(TXG_WAIT_SUSPEND), which will fail, and so the waiter can now return an error from zil_commit().

However, commit waiters are normally signalled when their associated write (LWB) completes. If the pool has suspended, those IOs may not return for some time, or maybe not at all. We still want to signal those waiters so they can return from zil_commit(). We have a list of those in-flight LWBs on zl_lwb_list, so we can run through those, detach them and signal them. The LWB itself is still in-flight, but no longer has attached waiters, so when it returns there will be nothing to do.

(As an aside, ITXs can also supply completion callbacks, which are called when they are destroyed. These are directly connected to LWBs though, so are passed the error code and destroyed there too).

At this point, all ZIL waiters have been ejected, so we only have to consider the internal state. We potentially still have ITXs that have not been committed, LWBs still open, and LWBs in-flight. The on-disk ZIL is in an unknown state; some writes may have been written but not returned to us. We really can't rely on any of it; the best thing to do is abandon it entirely and start over when the pool returns to service. But, since we may have IO out that won't return until the pool resumes, we need something for it to return to.

The simplest solution I could find, implemented here, is to "crash" the ZIL: accept no new ITXs, make no further updates, and let it empty out on its normal schedule, that is, as txgs complete and zil_sync() and zil_clean() are called. We set a "restart txg" to four txgs in the future (syncing + TXG_SIZE), at which point all the internal state will have been cleared out, and the ZIL can resume operation (handled at the top of zil_clean()).

This commit adds zil_crash(), which handles all of the above:

 - sets the restart txg
 - capture and signal all waiters
 - zero the header

zil_crash() is called when txg_wait_synced_flags(TXG_WAIT_SUSPEND) returns because the pool suspended (ESHUTDOWN).

The rest of the commit is just threading the errors through, and related housekeeping.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
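As an illustration of the waiter-ejection step described in the commit message, here is a minimal user-space sketch. It is not the code from this commit: `waiter_t`, `lwb_t`, `eject_waiters()` and `lwb_complete()` are hypothetical, simplified stand-ins for zil_commit_waiter_t, the LWB and the real signalling paths, and the real code uses locks and condition variables that are omitted here. The point it tries to show is only the ordering: waiters are detached and completed with an error first, so that a write returning later finds nothing attached.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real OpenZFS structures. */
typedef struct waiter {
	struct waiter *next;	/* next waiter attached to the same LWB */
	bool done;		/* set once the waiter has been signalled */
	int error;		/* 0 on success, else an errno-style code */
} waiter_t;

typedef struct lwb {
	struct lwb *next;	/* next in-flight LWB */
	waiter_t *waiters;	/* waiters attached to this LWB */
} lwb_t;

/*
 * Walk the in-flight list, detach every waiter from its LWB and complete
 * it with `err`. Returns the number of waiters signalled.
 */
static int
eject_waiters(lwb_t *inflight, int err)
{
	int n = 0;

	assert(err != 0);	/* ejection always reports a failure */
	for (lwb_t *lwb = inflight; lwb != NULL; lwb = lwb->next) {
		waiter_t *w = lwb->waiters;
		lwb->waiters = NULL;	/* LWB stays in-flight, now unattended */
		while (w != NULL) {
			waiter_t *next = w->next;
			w->next = NULL;
			w->error = err;
			w->done = true;
			n++;
			w = next;
		}
	}
	return (n);
}

/*
 * Model of a write finally returning: signal whoever is still attached.
 * After eject_waiters() there is nobody left, so this does nothing.
 */
static int
lwb_complete(lwb_t *lwb, int err)
{
	int n = 0;

	for (waiter_t *w = lwb->waiters; w != NULL; w = w->next) {
		w->error = err;
		w->done = true;
		n++;
	}
	lwb->waiters = NULL;
	return (n);
}
```

In this model, calling `eject_waiters(list, ESHUTDOWN)` marks every waiter done with ESHUTDOWN, and a subsequent `lwb_complete()` on any of those LWBs signals zero waiters.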
1 parent 25bb5b1 commit 4d5965d

5 files changed: +366 -31 lines changed
cmd/zilstat.in

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ cols = {
     "cec": [5, 1000, "zil_commit_error_count"],
     "csc": [5, 1000, "zil_commit_stall_count"],
     "cSc": [5, 1000, "zil_commit_suspend_count"],
+    "cCc": [5, 1000, "zil_commit_crash_count"],
     "ic": [5, 1000, "zil_itx_count"],
     "iic": [5, 1000, "zil_itx_indirect_count"],
     "iib": [5, 1024, "zil_itx_indirect_bytes"],

include/sys/zil.h

Lines changed: 4 additions & 0 deletions
@@ -498,10 +498,13 @@ typedef struct zil_stats {
 	 *    (see zil_commit_writer_stall())
 	 * - suspend: ZIL suspended
 	 *    (see zil_commit(), zil_get_commit_list())
+	 * - crash: ZIL crashed
+	 *    (see zil_crash(), zil_commit(), ...)
 	 */
 	kstat_named_t zil_commit_error_count;
 	kstat_named_t zil_commit_stall_count;
 	kstat_named_t zil_commit_suspend_count;
+	kstat_named_t zil_commit_crash_count;
 
 	/*
 	 * Number of transactions (reads, writes, renames, etc.)
@@ -549,6 +552,7 @@ typedef struct zil_sums {
 	wmsum_t zil_commit_error_count;
 	wmsum_t zil_commit_stall_count;
 	wmsum_t zil_commit_suspend_count;
+	wmsum_t zil_commit_crash_count;
 	wmsum_t zil_itx_count;
 	wmsum_t zil_itx_indirect_count;
 	wmsum_t zil_itx_indirect_bytes;

include/sys/zil_impl.h

Lines changed: 4 additions & 0 deletions
@@ -221,6 +221,7 @@ struct zilog {
 	uint64_t zl_cur_left; /* current burst remaining size */
 	uint64_t zl_cur_max; /* biggest record in current burst */
 	list_t zl_lwb_list; /* in-flight log write list */
+	list_t zl_lwb_crash_list; /* log writes in-flight at crash */
 	avl_tree_t zl_bp_tree; /* track bps during log parse */
 	clock_t zl_replay_time; /* lbolt of when replay started */
 	uint64_t zl_replay_blks; /* number of log blocks replayed */
@@ -245,6 +246,9 @@ struct zilog {
 	 */
 	uint64_t zl_max_block_size;
 
+	/* After crash, txg to restart zil */
+	uint64_t zl_restart_txg;
+
 	/* Pointer for per dataset zil sums */
 	zil_sums_t *zl_sums;
 };
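
To make the role of zl_restart_txg concrete, the following is a small model of the "restart txg" gate the commit message describes. It is a sketch under assumptions, not the commit's zil.c changes (which are not shown on this page): `zil_model_t` and the `model_*` functions are hypothetical names, `SKETCH_TXG_SIZE` is set to 4 only because the message says "four txgs in the future (syncing + TXG_SIZE)", and the exact comparison used to decide when to resume is a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, taken from the "four txgs in the future" wording. */
#define	SKETCH_TXG_SIZE	4

/* Hypothetical model of the one field that gates a crashed ZIL. */
typedef struct zil_model {
	uint64_t restart_txg;	/* 0 means "not crashed" in this model */
} zil_model_t;

/* On crash: remember the txg at which the ZIL may resume. */
static void
model_crash(zil_model_t *z, uint64_t syncing_txg)
{
	assert(syncing_txg != 0);
	z->restart_txg = syncing_txg + SKETCH_TXG_SIZE;
}

/* While crashed, no new ITXs are accepted. */
static bool
model_accepts_itxs(const zil_model_t *z)
{
	return (z->restart_txg == 0);
}

/*
 * Called as each txg finishes syncing (the commit does this at the top
 * of zil_clean()). Once the restart txg is reached, clear the crashed
 * state. The ">=" test is an assumption of this sketch.
 */
static void
model_clean(zil_model_t *z, uint64_t synced_txg)
{
	if (z->restart_txg != 0 && synced_txg >= z->restart_txg)
		z->restart_txg = 0;
}
```

With this model, a crash while txg 100 is syncing sets the restart txg to 104; the ZIL refuses ITXs through txg 103 and accepts them again once `model_clean()` is called for txg 104.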

module/zfs/dataset_kstats.c

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ static dataset_kstat_values_t empty_dataset_kstats = {
 	{ "zil_commit_error_count", KSTAT_DATA_UINT64 },
 	{ "zil_commit_stall_count", KSTAT_DATA_UINT64 },
 	{ "zil_commit_suspend_count", KSTAT_DATA_UINT64 },
+	{ "zil_commit_crash_count", KSTAT_DATA_UINT64 },
 	{ "zil_itx_count", KSTAT_DATA_UINT64 },
 	{ "zil_itx_indirect_count", KSTAT_DATA_UINT64 },
 	{ "zil_itx_indirect_bytes", KSTAT_DATA_UINT64 },
