Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Migrate Bitswap CID handling to py-cid for standards compliance and better DX
Summary
Current state: py-libp2p uses a custom CID implementation in `libp2p/bitswap/cid.py` that:
- Supports only SHA-256 hashing
- Has limited codec support (`CODEC_DAG_PB` and `CODEC_RAW` only)
- Uses a hex-only string representation (`cid.hex()`), with no multibase/base58
- Returns raw bytes, with no proper CID objects and no CID string parsing
Goal: adopt py-cid for full CID spec support, including proper string encoding (base58btc for v0, multibase for v1), a builder pattern, `CIDSet`, JSON/IPLD formatting, and path parsing (e.g. `/ipfs/...`).
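For reference, "proper encoding" boils down to a small binary layout plus a string encoding. The stdlib-only sketch below is not py-cid; it is limited to sha2-256 with the raw and dag-pb codecs, and all function names are hypothetical. It builds a CIDv1 as `varint(version) + varint(codec) + multihash`, renders it with the multibase `b` (base32) prefix, and renders a CIDv0 as a bare base58btc multihash.

```python
import base64
import hashlib

# Hypothetical stdlib-only sketch; assumes sha2-256 and the raw/dag-pb codecs only
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
CODEC_RAW, CODEC_DAG_PB = 0x55, 0x70
SHA2_256 = 0x12


def varint(n: int) -> bytes:
    """Encode n as an unsigned LEB128 varint (the multiformats varint)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def sha256_multihash(block: bytes) -> bytes:
    digest = hashlib.sha256(block).digest()
    return varint(SHA2_256) + varint(len(digest)) + digest


def b58encode(raw: bytes) -> str:
    n = int.from_bytes(raw, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as '1'
    return "1" * (len(raw) - len(raw.lstrip(b"\x00"))) + out


def cid_v0_string(block: bytes) -> str:
    # CIDv0: bare multihash of the dag-pb-encoded block, base58btc, no multibase prefix
    return b58encode(sha256_multihash(block))


def cid_v1_bytes(block: bytes, codec: int = CODEC_RAW) -> bytes:
    return varint(1) + varint(codec) + sha256_multihash(block)


def cid_v1_string(block: bytes, codec: int = CODEC_RAW) -> str:
    # Multibase 'b' prefix = lowercase RFC 4648 base32 without padding
    encoded = base64.b32encode(cid_v1_bytes(block, codec)).decode("ascii")
    return "b" + encoded.lower().rstrip("=")
```

For a raw block, `cid_v1_string(data)` starts with `bafkrei` and a dag-pb CIDv1 starts with `bafybei`, while CIDv0 strings are 46 characters beginning with `Qm`. These are the forms py-cid should produce and parse, in contrast to the hex strings logged today.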
Codebase locations where new CID features apply
Bitswap (primary)
| File | Current usage | py-cid opportunity |
|---|---|---|
| `libp2p/bitswap/cid.py` | `compute_cid_v0`/`compute_cid_v1`, `verify_cid`, `get_cid_prefix`, `reconstruct_cid_from_prefix_and_data`, `cid_to_string` (hex only) | Replace with py-cid; add backward-compat wrappers; use `Prefix`, builder, `make_cid`, `from_string` |
| `libp2p/bitswap/block_store.py` | `dict[bytes, bytes]`; APIs take `cid: bytes` | CID objects as keys; accept `str`/CID in `get`/`put` |
| `libp2p/bitswap/client.py` | Wantlist, prefix, verify, many `cid.hex()` logs | `CIDSet` for wantlist; proper CID strings in logs; validation via py-cid |
| `libp2p/bitswap/dag.py` | `compute_cid_v1`, `verify_cid`, hex logging | Builder for creation; CID objects and better logging |
| `libp2p/bitswap/dag_pb.py` | `Link.cid: bytes` | Optional CID type; doc examples with py-cid |
| `libp2p/bitswap/__init__.py` | Exports from `cid` | Update exports after migration |
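`get_cid_prefix` and `reconstruct_cid_from_prefix_and_data` exist because Bitswap 1.1+ sends each block with a CID prefix (version, codec, multihash type, and multihash length, each a varint) instead of the full CID, and the receiver rehashes the data to rebuild the CID. The stdlib-only sketch below shows that round trip, which is what a py-cid `Prefix` would need to cover. The names `make_prefix` and `cid_from_prefix` are hypothetical, only sha2-256 is handled, and it does not reflect py-cid's actual API.

```python
import hashlib

SHA2_256 = 0x12  # hypothetical sketch: only sha2-256 is supported


def encode_varint(n: int) -> bytes:
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode one unsigned varint; return (value, offset just past it)."""
    value = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def make_prefix(version: int, codec: int, mh_type: int = SHA2_256, mh_len: int = 32) -> bytes:
    # Bitswap 1.1+ prefix: version, codec, multihash type, multihash length
    return b"".join(encode_varint(x) for x in (version, codec, mh_type, mh_len))


def cid_from_prefix(prefix: bytes, data: bytes) -> bytes:
    version, off = decode_varint(prefix)
    codec, off = decode_varint(prefix, off)
    mh_type, off = decode_varint(prefix, off)
    mh_len, _ = decode_varint(prefix, off)
    if mh_type != SHA2_256 or not 0 < mh_len <= 32:
        raise ValueError(f"unsupported multihash 0x{mh_type:x} / length {mh_len}")
    digest = hashlib.sha256(data).digest()[:mh_len]
    multihash = encode_varint(mh_type) + encode_varint(mh_len) + digest
    if version == 0:
        return multihash  # CIDv0 is the bare multihash
    return encode_varint(version) + encode_varint(codec) + multihash
```

For a CIDv1 with a full-length sha2-256 digest, the first four bytes of the rebuilt CID are the prefix itself, which is why the prefix plus the block data is enough to reconstruct and verify it.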
Examples
`examples/bitswap/bitswap.py`: currently `root_cid = bytes.fromhex(root_cid_hex)` (line ~176). Support standard CID strings (base58, multibase, `/ipfs/...`) via py-cid parsing, and replace `cid.hex()` in logs.
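The example CLI would need to accept an `/ipfs/<cid>` path, a base32 CIDv1 (`b...`), a base58btc CIDv0 (`Qm...`), and, for backward compatibility, the hex strings it accepts today. The sketch below shows that input handling. It uses only the standard library, so it assumes nothing about py-cid's parser. `parse_cid_arg` and `b58decode` are hypothetical names; in the real change, the decoding would be delegated to py-cid.

```python
import base64
import re

# Hypothetical stdlib-only sketch; the real change would delegate decoding to py-cid
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
HEX_RE = re.compile(r"(?:[0-9a-fA-F]{2})+")


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)  # ValueError on a non-base58 char
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Each leading '1' stands for one leading zero byte
    return b"\x00" * (len(text) - len(text.lstrip("1"))) + body


def parse_cid_arg(arg: str) -> bytes:
    """Turn a CLI argument (CID string or /ipfs/ path) into binary CID bytes."""
    text = arg.strip()
    if text.startswith("/ipfs/"):
        # Keep only the root CID of /ipfs/<cid>/optional/sub/path
        text = text[len("/ipfs/"):].split("/", 1)[0]
    if text.startswith("b"):
        # Multibase 'b': lowercase base32 without padding (the usual CIDv1 form)
        body = text[1:].upper()
        return base64.b32decode(body + "=" * (-len(body) % 8))
    if text.startswith("Qm") and len(text) == 46:
        return b58decode(text)  # CIDv0: base58btc sha2-256 multihash
    if HEX_RE.fullmatch(text):
        # Legacy input: hex-encoded CID bytes, as the example accepts today
        return bytes.fromhex(text)
    raise ValueError(f"unrecognized CID string: {arg!r}")
```

The order of the checks is safe for legacy input. Hex CIDs produced by the current code start with `01` (CIDv1) or `12` (CIDv0 multihash), so they never collide with the `b` or `Qm` prefixes.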
Tests
`tests/core/bitswap/`: `test_cid.py`, `test_block_store.py`, `test_client.py`, `test_dag.py`, `test_dag_pb.py`, `test_messages.py`, `test_integration.py`, and `test_protocol_versions.py` all use `compute_cid_v1`, `verify_cid`, and `CODEC_*`. Update them to the py-cid-backed API and add tests for the new parsing/encoding.
Docs
`docs/libp2p.bitswap.rst`: examples using `compute_cid`; update them to show py-cid usage where relevant.
Host/DHT (secondary)
- `libp2p/abc.py`: `provide(cid: bytes)` and `find_provider_iter(cid: bytes)` could later accept CID strings or CID objects for better UX.
- `libp2p/kad_dht/provider_store.py`: the content key is `bytes` (a multihash); optional CID-aware helpers could accept a CID string for `provide`/`find_providers`.
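If the provider store keeps keying records by multihash, a CID-aware `provide`/`find_providers` wrapper mostly needs to strip the version and codec from a binary CID. Below is a stdlib-only sketch that assumes binary CIDs as input; a CID string would first be parsed by py-cid. `cid_to_multihash` is a hypothetical name, not an existing py-libp2p function.

```python
def _read_varint(buf: bytes, offset: int) -> tuple[int, int]:
    """Decode one unsigned varint; return (value, offset just past it)."""
    value = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def cid_to_multihash(cid: bytes) -> bytes:
    """Hypothetical helper: binary CIDv0/CIDv1 -> multihash bytes used as a DHT key."""
    if len(cid) == 34 and cid[:2] == b"\x12\x20":
        return cid  # CIDv0 is already a sha2-256 multihash
    version, off = _read_varint(cid, 0)
    if version != 1:
        raise ValueError(f"unsupported CID version {version}")
    _codec, off = _read_varint(cid, off)  # the codec is not part of the DHT key
    multihash = cid[off:]
    _mh_code, mh_off = _read_varint(multihash, 0)
    digest_len, mh_off = _read_varint(multihash, mh_off)
    if len(multihash) - mh_off != digest_len:
        raise ValueError("truncated or malformed multihash")
    return multihash
```

One consequence of keying by multihash is that a CIDv0 and a raw or dag-pb CIDv1 over the same digest map to the same provider key.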
Code examples: improving the codebase with py-cid
1. Replace custom CID with py-cid (backward-compatible wrapper)
Before (current):
```python
import logging
from libp2p.bitswap.cid import CODEC_RAW, compute_cid_v1, verify_cid

logger = logging.getLogger(__name__)
cid_bytes = compute_cid_v1(data, codec=CODEC_RAW)  # data: the block bytes
logger.info(f"CID: {cid_bytes.hex()}")  # hex only
is_valid = verify_cid(cid_bytes, data)
```
After (with py-cid):
```python
import logging

from cid import V1Builder, make_cid

logger = logging.getLogger(__name__)
CODEC_RAW, CODEC_DAG_PB = 0x55, 0x70  # multicodec codes used today


def _codec_int_to_name(codec: int) -> str:
    return {CODEC_RAW: "raw", CODEC_DAG_PB: "dag-pb"}[codec]


# Backward-compatible wrapper (keeps the existing bytes-returning API)
def compute_cid_v1(data: bytes, codec: int = CODEC_RAW) -> bytes:
    builder = V1Builder(codec=_codec_int_to_name(codec), mh_type="sha2-256")
    return builder.sum(data).buffer


# Usage: log a proper CID string instead of hex
cid_bytes = compute_cid_v1(data, codec=CODEC_RAW)  # data: the block bytes
cid = make_cid(cid_bytes)
logger.info(f"CID: {cid}")  # multibase-encoded CIDv1 string
logger.info(f"CID: {cid.loggable()['cid']}")  # for structured logging
```
2. Block store accepting CID object or string
Before:
```python
async def get_block(self, cid: bytes) -> bytes | None:
    return self._blocks.get(cid)
```
After:
```python
from cid import CIDv0, CIDv1, make_cid


async def get_block(self, cid: bytes | str | CIDv0 | CIDv1) -> bytes | None:
    # Normalize strings and CID objects to the binary CID used as the dict key
    if not isinstance(cid, (bytes, bytearray)):
        cid = make_cid(cid).buffer
    # bytearray is unhashable, so convert to bytes before the lookup
    return self._blocks.get(bytes(cid))
```
3. Example CLI: parse standard CID strings
Before (hex only):
```python
root_cid = bytes.fromhex(root_cid_hex)  # Only supports hex
```
After (base58, multibase, /ipfs/ paths):
```python
from cid import make_cid

# root_cid_str: the CLI argument, no longer necessarily hex
cid = make_cid(root_cid_str)  # Accepts base58 (Qm...), multibase (bafy...), /ipfs/Qm...
root_cid = cid.buffer  # bytes for existing APIs
```
4. Logging: use proper CID representation
Before:
```python
logger.info(f"Root CID: {root_cid.hex()}")
logger.info(f"Block {cid.hex()[:16]}...")
```
After:
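```python
from cid import make_cid

cid_obj = make_cid(cid)  # cid: binary CID bytes of a block
logger.info(f"Root CID: {make_cid(root_cid)}")
logger.info(f"Block {cid_obj}")  # the full CID string replaces the truncated hex
```
Migration strategy (brief)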
- Add the `py-cid` dependency (e.g. in `pyproject.toml`).
- Introduce a compatibility layer in `libp2p/bitswap/cid.py` (py-cid inside, same public API initially).
- Migrate logging and examples first (low risk).
- Then block store and client (CID objects / CIDSet where beneficial).
- Update tests to use py-cid-backed API and add tests for new parsing/encoding.
- Deprecate old helpers in a later release if desired.
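For the last step, one low-friction option is to keep the old helpers importable but emit a `DeprecationWarning` when they are called. The sketch below uses only the standard library. The `deprecated` decorator name and the suggested replacement text are illustrative assumptions; on Python 3.13+, the standard library also provides `warnings.deprecated` (PEP 702).

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(replacement: str) -> Callable[[F], F]:
    """Hypothetical decorator: warn on every call and point to the replacement."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__}() is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator


@deprecated("str(make_cid(cid))")
def cid_to_string(cid: bytes) -> str:
    """Legacy hex-only formatter, kept so existing imports keep working."""
    return cid.hex()
```

Python hides `DeprecationWarning` by default outside `__main__`, while pytest displays it. Downstream test suites therefore surface the deprecation without it reaching end users at runtime.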
References
- py-cid repository
- CID specification
- Full analysis and more examples: py-libp2p CID Status and py-cid Integration Opportunities #1174