Skip to content

feat: Add firewood as a database #1029

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

alarso16
Copy link
Contributor

@alarso16 alarso16 commented Jul 2, 2025

Why this should be merged

This adds much of the Firewood state database implementation, without adding it as a configurable option in coreth.

How this works

This reimplements everything with core/state/database.go by adding the Firewood backend for both reading and hashing from the tries (the Trie implementation) and for writing changes (the triedb.Backend implementation).

Note: starting this PR, Windows can no longer be built on coreth.

How this was tested

This is broken off from #959, which is currently bootstrapping to mainnet and Fuji. Additionally, there is a small amount of fuzz testing to ensure that hashing, reading, and committing works as expected.

Need to be documented?

No.

Need to update RELEASES.md?

No - this code is not yet available to consumers.

@alarso16 alarso16 force-pushed the alarso16/firewood-trie branch from 0506448 to b9ac3fa Compare July 2, 2025 17:51
@alarso16 alarso16 marked this pull request as ready for review July 2, 2025 17:54
@alarso16 alarso16 requested a review from a team as a code owner July 2, 2025 17:54
@alarso16 alarso16 added enhancement New feature or request go Pull requests that update go code labels Jul 2, 2025
case *firewood.AccountTrie:
return t.Copy()
case *firewood.StorageTrie:
return nil // The storage trie just wraps the account trie, so we don't need to copy it separately.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not we still call t.Copy (or t.AccountTrie.Copy) though? (bc AccountTrie is not an interface but struct)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't - all storage tries just point to the same account trie. If we copy the account trie (all the real information) and then copy the storage trie (just the pointer), then how will they know what account trie to use? They really should be re-allocated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason there's a storage trie at all in the first place is this interface - because we don't need to explicitly hash the storage tries (firewood does that internally), we only need the 1-tier trie

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see but I sense there is an invariant that this depends on:
an AccountTrie for the related StorageTrie will always be copied. If that so we should comment it somewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments (and an explicit method).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could still use the Copy method and return nil, but I think this is more clear.

)

var (
_ Database = (*firewoodAccessorDb)(nil)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this check is needed. Since the first member of FirewoodAccessorDb is Database already, isn't it 100% guaranteed to be true?

Is this just a guard to prevent someone from removing Database from firewoodAccessorDb`?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much. My understanding is that this is good practice in general.

Comment on lines 67 to 68
// It checks the rawdb package for HashDB or PathDB schemes.
// If it's neither of those, it checks to see if the string is set to FirewoodScheme.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments don't seem to match the code, which checks for FirewoodScheme first, and then just delegates to ParseStateScheme, which might have it's own restrictions, possibly obsoleting this comment.

Suggest renaming to CheckForMismatchedStorageScheme or explaining why this is a bad name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a perfectly reasonable name, but I think it should attempt to match the eth-specific name, like our other overloaded methods from libevm in these custom* packages. It does exactly the same thing as ParseStateScheme, except also checks for Firewood

Comment on lines 93 to 96
// If the account has been deleted, we should return nil
if val, exists := a.dirtyKeys[string(accountKey)]; exists && len(val) == 0 {
return nil, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we can do this once we have the accountKey on line 88

return nil, nil
}
// Decode and return the updated storage value
_, decoded, _, err := rlp.Split(updateValue)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't check if rlp.Split returns a list and always assume it's a byte array. Not sure if this is okay or not, but it does seem strange.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The byte array is provided. This is following the example in libevm's trie/secure_trie.go

a.dirtyKeys[string(combinedKey[:])] = data
a.updateKeys = append(a.updateKeys, combinedKey[:])
a.updateValues = append(a.updateValues, data)
a.hasChanges = true // Mark that there are changes to commit
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't this be inferred from a.DirtyKeys.len()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a "small" optimization to avoid rehashing if we haven't made any changes. This is called 2-3 times every block without any additional changes, so this is just to avoid the call. As Cey noted, the dirty keys are never reset, and instead pile on top of each other. If we ever implement the AddToProposal API (which creates a proposal and commits on top of it in memory, tracking to ensure we drop properly), then we could just use the length

// We already have this proposal, but should create a new context with the correct hash.
// This solves the case of a unique block hash, but the same underlying proposal.
if _, exists := existing.Parent.Hashes[parentHash]; exists {
log.Debug("firewood: proposal already exists, updating hash", "root", root.Hex(), "parent", parentRoot.Hex(), "block", block, "hash", hash.Hex())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want a metric here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, but it really shouldn't ever happen. This is just technically permitted, so it should be enforced

Comment on lines 198 to 199
db.proposalLock.Lock()
defer db.proposalLock.Unlock()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this too early? Can't we wait until we called ExtractTrieDBUpdatePayload?

Also when is this released? Can't we release it while firewood is processing the proposal?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could move it a couple lines later.
However, we have to wait to release this until the end of the function. If we are proposing from a proposal during a commit, we have a race, so until we address that deeper in Firewood, we need to exclusively guarantee that only one of these operations is happening

@alarso16 alarso16 requested review from rkuris and ceyonur July 7, 2025 20:31
Co-authored-by: Ceyhun Onur <[email protected]>
Signed-off-by: Austin Larson <[email protected]>
Copy link
Collaborator

@ceyonur ceyonur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few nits and more cleanups + Initialized seems always returning true even for "uninitialized" dbs.


type Config struct {
FilePath string // Path to the directory where the Firewood database will be stored
FileName string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use filepath as full filepath rather than taking filename+filepath?


type Database struct {
fwDisk *ffi.Database // The underlying Firewood database, used for storing proposals and revisions.
ethdb ethdb.Database // The underlying disk database, used for storing genesis and the path.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems this can be removed

config = Defaults
}

fwConfig, path, err := validateConfig(config)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this more looks like a getPath

func (db *Database) Initialized(root common.Hash) bool {
// If the database is open, then the file is necessarily initialized.
// TODO: This would be more informative to have a flag to check that we've state synced later.
return db.fwDisk != nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we check fw.Root() != common.Hash?, this looks like we actually always return Initialized=true since fwDisk is never nil after New but if we haven't commited anything fw.Root still can be empty (and actually not initialized)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request go Pull requests that update go code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants