Migrate Bitswap CID handling to py-cid #1181

@acul71

Migrate Bitswap CID handling to py-cid for standards compliance and better DX

Summary

Current state: py-libp2p uses a custom CID implementation in libp2p/bitswap/cid.py that:

  • Supports only SHA-256 hashing
  • Has limited codec support (CODEC_DAG_PB, CODEC_RAW only)
  • Uses hex-only string representation (cid.hex()), no multibase/base58
  • Returns raw bytes; no proper CID objects, no CID string parsing
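
For orientation, the byte layout that a SHA-256-only implementation of this kind produces can be sketched with the standard library alone. This is an illustrative approximation, not the actual contents of libp2p/bitswap/cid.py; the names `encode_varint`, `sketch_cid_v1`, and `sketch_verify` are hypothetical.

```python
import hashlib

# Multicodec / multihash codes from the multiformats tables.
CODEC_RAW = 0x55
CODEC_DAG_PB = 0x70
SHA2_256 = 0x12


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def sketch_cid_v1(data: bytes, codec: int = CODEC_RAW) -> bytes:
    """Return <version><codec><multihash> as raw bytes (SHA-256 only)."""
    digest = hashlib.sha256(data).digest()
    multihash = encode_varint(SHA2_256) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash


def sketch_verify(cid: bytes, data: bytes, codec: int = CODEC_RAW) -> bool:
    """Recompute the CID from the data and compare byte-for-byte."""
    return sketch_cid_v1(data, codec) == cid
```

The result is an opaque 36-byte value, which is why the only readable form available today is `cid.hex()`.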

Goal: Adopt py-cid for full CID spec support: proper encoding (base58 for v0, multibase for v1), builder pattern, CIDSet, JSON/IPLD format, and path parsing (e.g. /ipfs/...).
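
To make the encoding difference concrete, here is a standard-library sketch of the two string forms: base58btc for v0 and multibase base32 (prefix `b`) for v1. py-cid would handle this internally; the helper names below are hypothetical and exist only for illustration.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(raw: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(raw, "big")
    chars = []
    while n > 0:
        n, rem = divmod(n, 58)
        chars.append(B58_ALPHABET[rem])
    # Each leading zero byte is represented by a leading "1".
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))


def cid_v0_string(multihash: bytes) -> str:
    """CIDv0 string form: bare base58btc of the SHA-256 multihash."""
    return base58btc_encode(multihash)


def cid_v1_string(cid_bytes: bytes) -> str:
    """CIDv1 string form: multibase prefix 'b' + lowercase unpadded base32."""
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32


digest = hashlib.sha256(b"hello").digest()
multihash = bytes([0x12, 0x20]) + digest  # sha2-256, 32-byte digest

v0 = cid_v0_string(multihash)                            # starts with "Qm"
v1_raw = cid_v1_string(bytes([0x01, 0x55]) + multihash)     # starts with "bafkrei"
v1_dag_pb = cid_v1_string(bytes([0x01, 0x70]) + multihash)  # starts with "bafybei"
```

These are the forms that other IPFS tooling emits and expects, which the current hex-only representation cannot interoperate with.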


Codebase locations where new CID features apply

Bitswap (primary)

| File | Current usage | py-cid opportunity |
| --- | --- | --- |
| libp2p/bitswap/cid.py | compute_cid_v0/v1, verify_cid, get_cid_prefix, reconstruct_cid_from_prefix_and_data, cid_to_string (hex only) | Replace with py-cid; add backward-compat wrappers; use Prefix, builder, make_cid, from_string |
| libp2p/bitswap/block_store.py | dict[bytes, bytes]; APIs take cid: bytes | CID objects as keys; accept str/CID in get/put |
| libp2p/bitswap/client.py | Wantlist, prefix, verify, many cid.hex() logs | CIDSet for wantlist; proper CID strings in logs; validation via py-cid |
| libp2p/bitswap/dag.py | compute_cid_v1, verify_cid, hex logging | Builder for creation; CID objects and better logging |
| libp2p/bitswap/dag_pb.py | Link.cid: bytes | Optional CID type; doc examples with py-cid |
| libp2p/bitswap/__init__.py | Exports from cid | Update exports after migration |

Examples

  • examples/bitswap/bitswap.py — Currently root_cid = bytes.fromhex(root_cid_hex) (line ~176). Support standard CID strings (base58, multibase, /ipfs/...) via py-cid parsing; replace cid.hex() in logs.

Tests

  • tests/core/bitswap/test_cid.py, test_block_store.py, test_client.py, test_dag.py, test_dag_pb.py, test_messages.py, test_integration.py, test_protocol_versions.py all use compute_cid_v1, verify_cid, CODEC_*. Update to py-cid-backed API and add tests for new parsing/encoding.

Docs

  • docs/libp2p.bitswap.rst — Examples using compute_cid; update to show py-cid usage where relevant.

Host/DHT (secondary)

  • libp2p/abc.py: provide(cid: bytes) and find_provider_iter(cid: bytes) could later accept CID strings or CID objects for better UX.
  • libp2p/kad_dht/provider_store.py: the content key is bytes (multihash); optional CID-aware helpers could accept a CID string for provide/find_providers.
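
Since the provider store keys on the multihash rather than the full CID, a CID-aware helper would need to strip the version and codec prefix first. The following is a standard-library sketch of that step; `read_varint` and `cid_bytes_to_multihash` are hypothetical names, and a py-cid-based version could presumably read the multihash from the parsed CID object instead.

```python
import hashlib


def read_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def cid_bytes_to_multihash(cid: bytes) -> bytes:
    """Return the multihash portion of raw CID bytes (v0 or v1)."""
    # CIDv0 is a bare SHA-256 multihash: 0x12 0x20 followed by 32 digest bytes.
    if len(cid) == 34 and cid[0] == 0x12 and cid[1] == 0x20:
        return cid
    version, offset = read_varint(cid)
    if version != 1:
        raise ValueError(f"unsupported CID version: {version}")
    _codec, offset = read_varint(cid, offset)  # codec is not part of the key
    return cid[offset:]


# The same content addressed as v0 and as v1 maps to one provider key.
mh = bytes([0x12, 0x20]) + hashlib.sha256(b"hello").digest()
key_from_v0 = cid_bytes_to_multihash(mh)
key_from_v1 = cid_bytes_to_multihash(bytes([0x01, 0x70]) + mh)
```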

Code examples: improving the codebase with py-cid

1. Replace custom CID with py-cid (backward-compatible wrapper)

Before (current):

from libp2p.bitswap.cid import compute_cid_v1, verify_cid, CODEC_RAW

cid_bytes = compute_cid_v1(data, codec=CODEC_RAW)
logger.info(f"CID: {cid_bytes.hex()}")
is_valid = verify_cid(cid_bytes, data)

After (with py-cid):

from cid import V1Builder, make_cid

# Backward-compatible wrapper (keep existing API)
def compute_cid_v1(data: bytes, codec: int = CODEC_RAW) -> bytes:
    codec_name = _codec_int_to_name(codec)  # 0x55 -> "raw", 0x70 -> "dag-pb"
    builder = V1Builder(codec=codec_name, mh_type="sha2-256")
    cid = builder.sum(data)
    return cid.buffer

# Usage: proper CID string in logs
cid_bytes = compute_cid_v1(data, codec=CODEC_RAW)
cid = make_cid(cid_bytes)
logger.info(f"CID: {cid}")  # Properly encoded string (multibase for v1)
logger.info(f"CID: {cid.loggable()['cid']}")  # For structured logging

2. Block store accepting CID object or string

Before:

async def get_block(self, cid: bytes) -> bytes | None:
    return self._blocks.get(cid)

After:

from cid import CIDv0, CIDv1, make_cid

async def get_block(self, cid: bytes | str | CIDv0 | CIDv1) -> bytes | None:
    if isinstance(cid, str):
        cid = make_cid(cid)  # parse a CID string into a CID object
    if isinstance(cid, (CIDv0, CIDv1)):
        cid = cid.buffer  # normalize to bytes for dict lookup, or use CID as key
    return self._blocks.get(bytes(cid))  # bytes() also covers bytearray input

3. Example CLI: parse standard CID strings

Before (hex only):

root_cid = bytes.fromhex(root_cid_hex)  # Only supports hex

After (base58, multibase, /ipfs/ paths):

from cid import make_cid

cid = make_cid(root_cid_hex)  # Accepts base58 (Qm...), multibase (bafy...), /ipfs/Qm...
root_cid = cid.buffer  # bytes for existing APIs
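
Switching the CLI straight to string parsing would drop support for the hex input that existing users may still pass. A possible fallback is sketched below under stated assumptions: the prefix check is a heuristic (legacy hex CIDs begin with `01` for v1 or `1220` for v0), the function names are hypothetical, and the actual parser is injected as a callable so the sketch does not depend on py-cid being installed.

```python
import string
from collections.abc import Callable

_HEX_DIGITS = set(string.hexdigits)


def looks_like_legacy_hex(arg: str) -> bool:
    """Heuristic: even-length hex that starts like raw CID bytes."""
    s = arg.strip()
    return (
        len(s) % 2 == 0
        and s.startswith(("01", "1220"))
        and all(c in _HEX_DIGITS for c in s)
    )


def parse_root_cid_arg(arg: str, parse_cid_string: Callable[[str], bytes]) -> bytes:
    """Return CID bytes from either legacy hex or a standard CID string."""
    s = arg.strip()
    if looks_like_legacy_hex(s):
        return bytes.fromhex(s)
    # Anything else is handed to the real parser (base58, multibase, paths).
    return parse_cid_string(s)
```

In the example CLI, the callable would wrap the py-cid call shown above, for instance `lambda s: make_cid(s).buffer`.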

4. Logging: use proper CID representation

Before:

logger.info(f"Root CID: {root_cid.hex()}")
logger.info(f"Block {cid.hex()[:16]}...")

After:

from cid import make_cid

# root_cid and cid are raw CID bytes, as in the "Before" snippet
logger.info(f"Root CID: {make_cid(root_cid)}")
logger.info(f"Block {make_cid(cid)}")

Migration strategy (brief)

  1. Add py-cid dependency (e.g. in pyproject.toml).
  2. Introduce a compatibility layer in libp2p/bitswap/cid.py (py-cid inside, same public API initially).
  3. Migrate logging and examples first (low risk).
  4. Then block store and client (CID objects / CIDSet where beneficial).
  5. Update tests to use py-cid-backed API and add tests for new parsing/encoding.
  6. Deprecate old helpers in a later release if desired.
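
For step 6, one common way to phase out the old helpers is a decorator that emits a `DeprecationWarning` while preserving behavior. The sketch below uses only the standard library; `deprecated_alias` is a hypothetical name, and `cid_to_string` stands in for the existing hex-only helper.

```python
import functools
import warnings


def deprecated_alias(replacement: str):
    """Mark a helper as deprecated and point callers at its replacement."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated_alias("str() on a py-cid CID object")
def cid_to_string(cid: bytes) -> str:
    """Legacy hex-only representation, kept for backward compatibility."""
    return cid.hex()
```

Existing callers keep working unchanged, and test suites run with warnings enabled will surface the remaining uses.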

Labels: enhancement, help wanted
