core, triedb/pathdb: final integration (snapshot integration pt 5) #30661

rjl493456442 · 2024-10-23T07:15:49Z

In this pull request, snapshot generation in the pathdb is ported from the legacy
state snapshot. Additionally, in path mode, the legacy state snapshot is now
handled by the pathdb-based snapshot implementation.

Note: the existing snapshot data will be re-generated regardless of whether it was fully constructed.

core/blockchain.go

core/blockchain_repair_test.go

core/blockchain_snapshot_test.go

core/state/database.go

core/state/statedb_test.go

tests/block_test_util.go

MariusVanDerWijden · 2024-12-30T12:04:30Z

triedb/pathdb/database.go

@@ -362,6 +430,7 @@ func (db *Database) Enable(root common.Hash) error {
 	// reset the persistent state id back to zero.
 	batch := db.diskdb.NewBatch()
 	rawdb.DeleteTrieJournal(batch)
+	rawdb.DeleteSnapshotRoot(batch)


Could you please explain why we need to delete the snapshot root here? I don't really understand.

This database entry indicates that the snapshot data on disk is associated with the specified state root.

Calling Enable(root) declares that the trie node data on disk is associated with the given state root, so any leftover snapshot data should be purged and regenerated.

Deleting the SnapshotRoot entry signifies that all leftover storage data is considered obsolete.
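The reasoning above can be illustrated with a toy model. This is only a sketch: `kvStore`, the key names, `enable`, and `snapshotUsable` are hypothetical stand-ins written for this example, not the pathdb or rawdb API. It assumes that snapshot data is trusted only when a stored root marker matches the current state root, so removing the marker is enough to make all leftover data count as stale.

```go
package main

import "fmt"

// Hypothetical key names, chosen for this sketch only.
const (
	trieJournalKey  = "TrieJournal"
	snapshotRootKey = "SnapshotRoot"
)

// kvStore is a toy stand-in for the on-disk key-value database.
type kvStore map[string][]byte

// snapshotUsable reports whether the flat snapshot data on disk can be
// trusted for the given state root. Without a root marker it never can.
func snapshotUsable(db kvStore, root []byte) bool {
	stored, ok := db[snapshotRootKey]
	return ok && string(stored) == string(root)
}

// enable mimics the reset step discussed above: both the trie journal
// and the snapshot root marker are dropped, so whatever snapshot data
// remains is treated as obsolete and has to be regenerated.
func enable(db kvStore) {
	delete(db, trieJournalKey)
	delete(db, snapshotRootKey)
}

func main() {
	db := kvStore{
		trieJournalKey:  []byte("journal"),
		snapshotRootKey: []byte("old-root"),
	}
	fmt.Println(snapshotUsable(db, []byte("old-root"))) // true
	enable(db)
	fmt.Println(snapshotUsable(db, []byte("old-root"))) // false
}
```

In this model the leftover data does not need to be wiped eagerly; dropping the marker alone prevents it from being trusted.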

MariusVanDerWijden · 2024-12-30T12:20:17Z

triedb/pathdb/flush.go

+			}
+		}
+	}
+	for addrHash, storages := range storageData {


I'm wondering whether sorting the keys before inserting them would make a difference in flushing speed; IIRC some databases handle ordered key insertions better than random ones. Sorting would also let us break at the first key that exceeds the genMarker.

The random insertion order here won't significantly impact performance, as the data receiver is an in-memory batch. But I would love to measure the performance difference later.
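A minimal sketch of the sorted-flush idea raised in this thread, not the code in flush.go. `flushSorted` and its `write` callback are hypothetical names, and the sketch assumes that a nil marker means generation has finished, so every key is written.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// flushSorted writes entries in ascending key order and stops at the
// first key that lies beyond genMarker. A nil marker is treated as
// "generation finished" (an assumption of this sketch), in which case
// all entries are written. It returns the number of entries written.
func flushSorted(accounts map[string][]byte, genMarker []byte, write func(key string, val []byte)) int {
	keys := make([]string, 0, len(accounts))
	for k := range accounts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // byte-wise order, consistent with bytes.Compare

	written := 0
	for _, k := range keys {
		if genMarker != nil && bytes.Compare([]byte(k), genMarker) > 0 {
			break // keys are sorted, so every remaining key is larger too
		}
		write(k, accounts[k])
		written++
	}
	return written
}

func main() {
	accounts := map[string][]byte{
		"\x09": {9},
		"\x01": {1},
		"\x05": {5},
	}
	n := flushSorted(accounts, []byte{0x05}, func(k string, v []byte) {
		fmt.Printf("write %x -> %x\n", k, v)
	})
	fmt.Println("written:", n)
}
```

The trade-off is an extra sort per flush against an early exit; as noted in the reply, the benefit is uncertain when the receiver is an in-memory batch, so it would need measuring.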

triedb/pathdb/context.go

MariusVanDerWijden · 2024-12-30T12:48:20Z

triedb/pathdb/generate.go

+// into two parts.
+func splitMarker(marker []byte) ([]byte, []byte) {
+	var accMarker []byte
+	if len(marker) > 0 { // []byte{} is the start, use nil for that


Do we need to check here that len(marker) >= HashLength?

The generation marker can only be one of:

[]byte{}

[]byte{common.AddressLength}

[]byte{common.AddressLength + common.HashLength}

If the marker is corrupted, then the system state could be badly wrong, and I don't mind panicking here.
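For comparison, here is a sketch of what an explicit length check could look like if a panic were not acceptable. It is not the PR's `splitMarker`. `splitMarkerChecked`, `errBadMarker`, and the `hashLength` constant are hypothetical, and the sketch assumes a well-formed marker is empty, one hash long (account part only), or two hashes long (account part plus storage part).

```go
package main

import (
	"errors"
	"fmt"
)

// hashLength is the assumed size of each marker component in this sketch.
const hashLength = 32

var errBadMarker = errors.New("malformed generation marker")

// splitMarkerChecked returns the account part of the marker together
// with the full marker. An empty marker denotes the start of generation
// and yields a nil account part. Any length other than 0, hashLength or
// 2*hashLength is rejected instead of being sliced blindly.
func splitMarkerChecked(marker []byte) (acc, full []byte, err error) {
	switch len(marker) {
	case 0:
		return nil, marker, nil
	case hashLength, 2 * hashLength:
		return marker[:hashLength], marker, nil
	default:
		return nil, nil, errBadMarker
	}
}

func main() {
	for _, m := range [][]byte{
		{},
		make([]byte, hashLength),
		make([]byte, 2*hashLength),
		{0x01, 0x02, 0x03}, // corrupted
	} {
		acc, full, err := splitMarkerChecked(m)
		fmt.Println(len(acc), len(full), err)
	}
}
```

Returning an error pushes the decision to the caller; panicking, as preferred in the reply above, treats a corrupted marker as unrecoverable.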

MariusVanDerWijden · 2024-12-30T13:37:38Z

Took a first look now and it looks pretty good so far. I added a few questions/things I didn't understand while reviewing. The biggest changes (generator, generator_test, context) are modified copies from the snapshot package.

rjl493456442 · 2025-05-14T13:29:11Z

triedb/pathdb/disklayer.go

+		if dl.generator != nil {
+			dl.generator.stop()
+			progress = dl.generator.progressMarker()
+			log.Info("Terminated snapshot generation")


MariusVanDerWijden

LGTM

fatal error: concurrent map iteration and map write

goroutine 3503687081 [running]:
internal/runtime/maps.fatal({0x1c0e410?, 0xc0d9bbe000?})
	runtime/panic.go:1053 +0x18
internal/runtime/maps.(*Iter).Next(0xc003b47280?)
	internal/runtime/maps/table.go:683 +0x86
github.com/ethereum/go-ethereum/triedb/pathdb.(*stateSet).accountList.Keys[...].func1()
	maps/iter.go:27 +0x71
slices.AppendSeq[...](...)
	slices/iter.go:50
slices.Collect[...](...)
	slices/iter.go:58
slices.SortedFunc[...](0xc003b473d0, 0x1f96be0)
	slices/iter.go:72 +0xd0
github.com/ethereum/go-ethereum/triedb/pathdb.(*stateSet).accountList(0xc003d26120)
	github.com/ethereum/go-ethereum/triedb/pathdb/states.go:179 +0x13a
github.com/ethereum/go-ethereum/triedb/pathdb.newDiffAccountIterator({0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...)
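The trace shows a map being iterated (to build a sorted key list) while another goroutine writes to it. One common way to avoid this class of crash is to guard both the write and the iteration with the same lock. The sketch below shows that general pattern only: `accountSet`, `update`, and `sortedKeys` are hypothetical names, and it makes no claim about how pathdb's `stateSet` is or should be synchronized.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// accountSet is a toy stand-in for a state set whose keys are
// occasionally collected into a sorted list by readers.
type accountSet struct {
	mu       sync.RWMutex
	accounts map[string][]byte
}

// update mutates the map under the write lock.
func (s *accountSet) update(key string, val []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.accounts[key] = val
}

// sortedKeys iterates the map under the read lock, so no writer can
// modify the map while the iteration is in progress.
func (s *accountSet) sortedKeys() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	keys := make([]string, 0, len(s.accounts))
	for k := range s.accounts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	s := &accountSet{accounts: make(map[string][]byte)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < 200; j++ {
				s.update(fmt.Sprintf("%d-%04d", i, j), nil)
				_ = s.sortedKeys() // safe alongside concurrent writers
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(len(s.sortedKeys()))
}
```

Removing the locks from this example typically reproduces the same fatal error, since the Go runtime detects unsynchronized map access during iteration.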

MariusVanDerWijden

LGTM

…thereum#30661) In this pull request, snapshot generation in pathdb has been ported from the legacy state snapshot implementation. Additionally, when running in path mode, legacy state snapshot data is now managed by the pathdb based snapshot logic. Note: Existing snapshot data will be re-generated, regardless of whether it was previously fully constructed.

rjl493456442 requested review from karalabe and holiman as code owners October 23, 2024 07:15

rjl493456442 force-pushed the snapshot-integration-p5 branch from d249035 to 133900e Compare October 23, 2024 07:24

holiman changed the title ~~Snapshot integration p5~~ core, triedb/pathdb: final integration (snapshot integration pt 5) Oct 23, 2024

MariusVanDerWijden reviewed Oct 28, 2024


holiman mentioned this pull request Nov 8, 2024

all: unify the trie database and snapshot in path mode #30159

Closed

fjl added the pbss-archive label Nov 28, 2024

rjl493456442 force-pushed the snapshot-integration-p5 branch 5 times, most recently from b879386 to 985a84e Compare December 23, 2024 03:47

rjl493456442 mentioned this pull request Dec 30, 2024

triedb/pathdb: introduce lookup structure to optimize state access #30971

Open