Confusion/Issue using Workspace-snapshot.sh with Pruned Snapshot (pruneancient) #417

@hakiran16

Description

Hello BSC Snapshot Maintainers,

I am trying to set up a pruned BSC mainnet node on Ubuntu (on a ~3.65 TB NVMe SSD) with the latest official geth client binary (1.5.9, downloaded from bnb-chain/bsc releases). I am using the Workspace-snapshot.sh script from this repository, but am running into issues specifically with the pruned snapshots.

Environment:

OS: Ubuntu 22.04/24.04 (flexible; I can use either)
Client: geth-bin (from bnb-chain/bsc) version 1.5.9
Disk: ~3.65 TB NVMe SSD
Target Snapshot Type: Pruned Mainnet (latest available, seems to be pruneancient)
Workspace-snapshot.sh: Latest version downloaded from this repo on Apr 10/11, 2025.
Snapshot Identifiers (from README around Apr 10/11, 2025):

Full: mainnet-geth-pbss-20250404 (~3TB)
Pruned: mainnet-geth-pbss-20250404-pruneancient (~900GB, requires >=v1.5.5)
Attempts and Issues:

Attempt 1: Using -p flag with -pruneancient identifier:

Command:

```bash
bash ./fetch-snapshot.sh -d -e -c -p --auto-delete -D "/home/bsc_data/downloads" -E "/home/bsc_data/node/geth/chaindata" "mainnet-geth-pbss-20250404-pruneancient"
```

Result: Failed immediately with a 404 error while trying to download the metadata CSV file. The script seems to generate the URL incorrectly by duplicating the suffix: .../mainnet-geth-pbss-20250404-pruneancient-pruneancient.csv

Error log snippet:

```
try to download mainnet-geth-pbss-20250404-pruneancient-pruneancient.csv ...
... ERROR 404: Not Found.
```
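A guess at what may be happening inside the script (I have not read the source closely, so the suffix handling below is an assumption, not the actual fetch-snapshot.sh logic): if the -p flag unconditionally appends -pruneancient when building the CSV name, then passing an identifier that already carries the suffix would produce exactly the duplicated filename above. A minimal sketch:

```shell
# Hypothetical reconstruction of the suspected URL bug (assumed logic, not the
# actual fetch-snapshot.sh source): appending the suffix unconditionally
# duplicates it when the identifier already ends in "-pruneancient".
SUFFIX="-pruneancient"

buggy_csv_name() { printf '%s%s.csv' "$1" "$SUFFIX"; }             # always appends
fixed_csv_name() { printf '%s%s.csv' "${1%"$SUFFIX"}" "$SUFFIX"; } # strips any existing suffix first

buggy_csv_name "mainnet-geth-pbss-20250404-pruneancient"; echo
# -> mainnet-geth-pbss-20250404-pruneancient-pruneancient.csv (the observed 404)
fixed_csv_name "mainnet-geth-pbss-20250404-pruneancient"; echo
# -> mainnet-geth-pbss-20250404-pruneancient.csv
```

If this guess is right, either fix would work: guard the suffix append in the script, or document that -p must only be combined with the base identifier.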
Attempt 2: Using -p flag with BASE identifier (as per script examples):

Command:

```bash
bash ./fetch-snapshot.sh -d -e -c -p --auto-delete -D "/home/bsc_data/downloads" -E "/home/bsc_data/node/geth/chaindata" "mainnet-geth-pbss-20250404"
```
Result (on first try): The script ran, but it downloaded and extracted the Full Snapshot data: ~3.5 TB ended up in the chaindata directory (confirmed with du -sh), filling the 3.65 TB disk and causing the node to fail later. This contradicts the expectation of getting the ~900 GB pruned data when using the -p flag. (Note: I am currently re-running this exact command one last time while carefully monitoring disk usage.)
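Given how Attempt 2 silently filled the disk, a defensive free-space check before extraction might be worth adding to the workflow. This helper is my own sketch, not part of the script; the GNU df flags are an assumption about the environment (BSD/macOS df would need different options):

```shell
# Sketch: refuse to proceed unless the target filesystem has at least the
# expected snapshot size free (e.g. ~3500 GB for the full snapshot,
# ~1000 GB for pruneancient). Uses GNU df.
require_free_gb() {
  local dir="$1" need_gb="$2" avail_gb
  avail_gb=$(df -BG --output=avail "$dir" | tail -n 1 | tr -dc '0-9')
  if [ "${avail_gb:-0}" -lt "$need_gb" ]; then
    echo "ERROR: need ${need_gb}G free in $dir, only ${avail_gb:-0}G available" >&2
    return 1
  fi
}

# e.g.: require_free_gb /home/bsc_data/node/geth/chaindata 1000 || exit 1
```

With a guard like this, extracting the wrong (full) snapshot would abort up front instead of failing the node hours later.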
Attempt 3: Manual Download & Extract of pruneancient files:

Manually downloaded the two files associated with pruneancient (...base-48055283.tar.lz4 and ...blocks-pruneancient-47965283.tar.lz4) using their direct URLs.
Manually extracted using lz4 -d ... | tar xvf - ... (base first, then blocks-pruneancient into the same target directory). Extraction completed without errors, du -sh showed ~970GB.
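For reference, the manual steps above can be wrapped in a small helper that makes the extraction order explicit (base archive first, then blocks-pruneancient into the same target directory). The function name and argument layout are my own; the archive names in this report are truncated, so the usage example uses the short forms:

```shell
# Sketch of the manual extraction order described above. lz4 -dc streams the
# decompressed archive to tar; both archives must land in the same target.
extract_pruned_snapshot() {
  local target="$1" base_archive="$2" blocks_archive="$3"
  mkdir -p "$target"
  lz4 -dc "$base_archive"   | tar -xf - -C "$target"   # base/state data first
  lz4 -dc "$blocks_archive" | tar -xf - -C "$target"   # then pruned blocks
}

# e.g.:
# extract_pruned_snapshot /home/bsc_data/node/geth/chaindata \
#   base-48055283.tar.lz4 blocks-pruneancient-47965283.tar.lz4
```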
Started node using this data:
```bash
# in /home/bsc_data/node
nohup ./geth-bin --config ../config/config.toml --datadir ./geth --cache 18000 --txlookuplimit 0 --http --http.addr --http.port 8545 --ws --ws.addr --ws.port 8546 &> bsc_node.log &
```

Result: The node started and RPC was responsive, but eth_syncing showed currentBlock: 0x0, highestBlock: 0x0, and all state counters at zero initially. After a while, highestBlock updated correctly (~48M), but currentBlock stayed very low (~157k) with startingBlock: 0x0 and zero state counters, indicating the node defaulted to syncing from genesis and did not use the manually extracted pruned data. eth_getBlockByNumber returned null for blocks such as 10M and 47M.
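For anyone reproducing the checks in Attempt 3, the RPC probes can be scripted as below. This is a sketch only: `rpc_payload` is my own helper name, and the endpoint localhost:8545 matches the --http.port in the start command; the curl invocations are shown as comments so the sketch runs without a live node.

```shell
# Build a JSON-RPC 2.0 request body for a given method and raw params string.
rpc_payload() {
  printf '{"jsonrpc":"2.0","method":"%s","params":[%s],"id":1}' "$1" "$2"
}

# Sync status (the eth_syncing call referenced above):
#   curl -s -H 'Content-Type: application/json' \
#     --data "$(rpc_payload eth_syncing '')" http://localhost:8545

# Probe a historical block; a null result means the extracted pruned data was
# not picked up. Block 10,000,000 is 0x989680 in hex:
#   curl -s -H 'Content-Type: application/json' \
#     --data "$(rpc_payload eth_getBlockByNumber '"0x989680", false')" http://localhost:8545
```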
Questions:

What is the correct combination of snapshot identifier and flags (specifically -p) to reliably use Workspace-snapshot.sh to download and extract the pruned (pruneancient) snapshot?
Is there a known issue with the script's handling of the -p flag or the pruneancient identifier for the mainnet-geth-pbss-20250404 version?
Is the expected final size of the chaindata directory after using the pruneancient snapshot really around 900 GB–1 TB?
Could the manual extraction failure indicate an issue with the pruneancient snapshot files themselves, even if they extract without tool errors?
Any clarification or guidance would be greatly appreciated. Thank you!
