
Commit ed4a438

dtkav and claude committed
feat: Migrate from bincode to CBOR for extensible data persistence
Implements the Architecture Change Request for a CBOR-based data format with extensible metadata support. This enables tool readability and future enhancements without breaking changes.

- **Extension Trait**: `CborBTreeMapExt` for clean BTreeMap ↔ CBOR conversion
- **Metadata Container**: `YSweetData` struct with version, timestamps, and extensible metadata
- **Backward Compatibility**: Auto-migration from bincode to CBOR format
- **Enhanced SyncKv**: Tracks `created_at` timestamp and preserves metadata across persist cycles

- `set_metadata()`: Replace all document metadata
- `get_metadata()`: Retrieve current metadata
- `update_metadata()`: Update specific metadata fields

- Format detection: Try CBOR first, fall back to bincode
- Lazy migration: Existing bincode data migrates on next persist
- Zero data loss with comprehensive error handling

- **Tool Readability**: CBOR maps inspectable by external tools
- **Extensibility**: Rich metadata without breaking existing deployments
- **Interoperability**: Cross-language data access via standard CBOR format
- **Future-Proof**: Version management and structured metadata framework

Includes a comprehensive test suite with 8 tests covering:
- CBOR serialization round-trips
- Bincode → CBOR migration
- Metadata persistence and API
- Backward compatibility verification

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 799e219 commit ed4a438

11 files changed

Lines changed: 1541 additions & 33 deletions


Lines changed: 328 additions & 0 deletions
@@ -0,0 +1,328 @@
# Architecture Change Request: Migration from Bincode to CBOR for Data Persistence

**Status**: Draft
**Author**: System Analysis
**Date**: 2025-08-28
**Related Components**: `y-sweet-core::sync_kv`, `y-sweet-core::store`

## Executive Summary

This ACR proposes migrating Y-Sweet's data persistence layer from bincode to CBOR (Concise Binary Object Representation) format to enable extensible metadata storage and improve interoperability. The change primarily affects the `SyncKv` implementation in `y-sweet-core/src/sync_kv.rs` and introduces an extension trait for clean BTreeMap serialization.

## Background

### Current Implementation

Y-Sweet currently uses bincode for serializing key-value data in the `SyncKv` component:

- **Location**: `y-sweet-core/src/sync_kv.rs:33,61`
- **Current format**: `bincode::serialize()` and `bincode::deserialize()`
- **Data structure**: `BTreeMap<Vec<u8>, Vec<u8>>` serialized as binary blob
- **Storage key**: `{doc_id}/data.ysweet`

### Analysis of Current Data Flow

1. **Persistence**: `SyncKv::persist()` serializes in-memory `BTreeMap` using bincode
2. **Loading**: `SyncKv::new()` deserializes bincode data back to `BTreeMap`
3. **Storage**: Data stored via `Store::set()` and `Store::get()` as `Vec<u8>`
4. **Backends**: Supports filesystem and S3-compatible storage

## Problem Statement

The current bincode implementation has several limitations:

1. **Lack of Extensibility**: No mechanism to add metadata without breaking existing data
2. **Schema Dependency**: Bincode requires compile-time knowledge of data structure
3. **Limited Interoperability**: Bincode is Rust-specific, limiting cross-language access
4. **No Version Management**: Unable to handle schema evolution gracefully
5. **Missing Metadata**: Cannot store creation timestamps, version info, or other metadata
6. **Tool Visibility**: External tools cannot inspect or modify the BTreeMap structure

## Proposed Solution

### CBOR Format Benefits

CBOR provides several advantages over bincode:

- **IETF Standard**: RFC 8949 specification with broad language support
- **Self-Describing**: Schema-less decoding capability
- **Extensibility**: Built-in tag system for metadata and future expansion
- **Interoperability**: Cross-language compatibility and tool readability
- **Metadata Support**: Rich type information embedded in format
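
To show what "self-describing" means at the byte level, the sketch below hand-encodes a `BTreeMap<Vec<u8>, Vec<u8>>` using RFC 8949's layout. Every item begins with a head byte. Its top three bits give the major type (5 for a map, 2 for a byte string), and its remaining bits, plus up to 8 following bytes, give the length. The sketch uses only the standard library. `write_head` and `encode_byte_map` are hypothetical helpers for illustration; they are not part of `ciborium` or Y-Sweet.

```rust
use std::collections::BTreeMap;

/// Append a CBOR item head: the 3-bit major type plus a length argument,
/// using the shortest encoding RFC 8949 allows for that length.
fn write_head(out: &mut Vec<u8>, major: u8, len: u64) {
    let mt = major << 5;
    if len < 24 {
        out.push(mt | len as u8);
    } else if len <= u8::MAX as u64 {
        out.push(mt | 24);
        out.push(len as u8);
    } else if len <= u16::MAX as u64 {
        out.push(mt | 25);
        out.extend_from_slice(&(len as u16).to_be_bytes());
    } else if len <= u32::MAX as u64 {
        out.push(mt | 26);
        out.extend_from_slice(&(len as u32).to_be_bytes());
    } else {
        out.push(mt | 27);
        out.extend_from_slice(&len.to_be_bytes());
    }
}

/// Encode a byte-keyed map as a CBOR map (major type 5) of byte strings (major type 2).
fn encode_byte_map(map: &BTreeMap<Vec<u8>, Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    write_head(&mut out, 5, map.len() as u64);
    for (k, v) in map {
        write_head(&mut out, 2, k.len() as u64);
        out.extend_from_slice(k);
        write_head(&mut out, 2, v.len() as u64);
        out.extend_from_slice(v);
    }
    out
}

fn main() {
    let map: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::from([(b"k".to_vec(), vec![0x01, 0x02])]);
    let bytes = encode_byte_map(&map);
    // A1 = map(1), 41 = bytes(1) holding 'k' (0x6B), 42 = bytes(2)
    assert_eq!(bytes, vec![0xA1, 0x41, 0x6B, 0x42, 0x01, 0x02]);
    println!("{:02X?}", bytes);
}
```

The single entry `b"k" => [0x01, 0x02]` encodes to `A1 41 6B 42 01 02`. A generic CBOR tool can render this as `{h'6b': h'0102'}` in RFC 8949 diagnostic notation without knowing any Rust types. Bincode output carries no such type markers.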

### Architecture Design

The solution uses a two-layer approach:

1. **Extension Trait**: Handles BTreeMap ↔ CBOR map conversion
2. **Metadata Wrapper**: SyncKv manages versioning, timestamps, and extensible metadata

#### Layer 1: BTreeMap CBOR Extension

```rust
use std::collections::BTreeMap;

use ciborium::value::Value;

/// Converts a byte-keyed BTreeMap to and from a CBOR map value.
/// `Sized` is required because `from_cbor_value` returns `Self` inside a `Result`.
trait CborBTreeMapExt: Sized {
    fn to_cbor_value(&self) -> Value;
    fn from_cbor_value(value: Value) -> Result<Self, String>;
}

impl CborBTreeMapExt for BTreeMap<Vec<u8>, Vec<u8>> {
    fn to_cbor_value(&self) -> Value {
        // Each entry becomes a (byte string, byte string) pair in a CBOR map.
        let cbor_map = self
            .iter()
            .map(|(k, v)| (Value::Bytes(k.clone()), Value::Bytes(v.clone())))
            .collect();
        Value::Map(cbor_map)
    }

    fn from_cbor_value(value: Value) -> Result<Self, String> {
        let cbor_map = match value {
            Value::Map(entries) => entries,
            _ => return Err("expected CBOR map".to_string()),
        };
        let mut btree = BTreeMap::new();
        for (k, v) in cbor_map {
            match (k, v) {
                (Value::Bytes(key), Value::Bytes(val)) => {
                    btree.insert(key, val);
                }
                // Reject non-byte entries instead of silently dropping them.
                _ => return Err("expected byte-string key and value".to_string()),
            }
        }
        Ok(btree)
    }
}
```

#### Layer 2: Metadata Container

```rust
use std::collections::BTreeMap;

use ciborium::value::Value;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct YSweetData {
    /// Format version for future compatibility
    version: u32,

    /// Creation timestamp (milliseconds since epoch)
    created_at: u64,

    /// Last modified timestamp (milliseconds since epoch)
    modified_at: u64,

    /// Optional metadata for future extensions
    metadata: Option<BTreeMap<String, Value>>,

    /// The actual key-value data as a CBOR map of byte strings.
    /// `cbor_btreemap` is a serde `with` module (not defined in this ACR) assumed
    /// to delegate to `CborBTreeMapExt`; without it, serde would encode each
    /// `Vec<u8>` as an array of integers rather than a CBOR byte string.
    #[serde(with = "cbor_btreemap")]
    data: BTreeMap<Vec<u8>, Vec<u8>>,
}
```
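
The commit message describes three accessors: `set_metadata()` replaces all metadata, `get_metadata()` retrieves it, and `update_metadata()` updates specific fields. The sketch below shows one way to implement those semantics. It assumes metadata is held as an `Option<BTreeMap<String, V>>`, like `YSweetData::metadata`. The `DocMetadata` type and the method signatures are hypothetical. `V` stands in for `ciborium::value::Value` so the example needs only the standard library.

```rust
use std::collections::BTreeMap;

/// Hypothetical holder mirroring `YSweetData::metadata`; `V` stands in for
/// `ciborium::value::Value` so the sketch needs only the standard library.
#[derive(Debug, Default)]
struct DocMetadata<V> {
    metadata: Option<BTreeMap<String, V>>,
}

impl<V: Clone> DocMetadata<V> {
    /// Replace all metadata (the described `set_metadata()` behavior).
    fn set_metadata(&mut self, metadata: BTreeMap<String, V>) {
        self.metadata = Some(metadata);
    }

    /// Return a copy of the current metadata, if any (`get_metadata()`).
    fn get_metadata(&self) -> Option<BTreeMap<String, V>> {
        self.metadata.clone()
    }

    /// Insert or overwrite only the given fields (`update_metadata()`),
    /// leaving every other existing field untouched.
    fn update_metadata(&mut self, updates: BTreeMap<String, V>) {
        self.metadata.get_or_insert_with(BTreeMap::new).extend(updates);
    }
}

fn main() {
    let mut doc: DocMetadata<String> = DocMetadata::default();
    doc.set_metadata(BTreeMap::from([
        ("owner".to_string(), "alice".to_string()),
        ("title".to_string(), "Draft".to_string()),
    ]));
    doc.update_metadata(BTreeMap::from([("title".to_string(), "Final".to_string())]));

    // prints {"owner": "alice", "title": "Final"}
    println!("{:?}", doc.get_metadata().unwrap());
}
```

With `extend`, `update_metadata()` overwrites a field only if that field appears in `updates`. This is one reasonable reading of "update specific metadata fields".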

### Benefits of This Approach

1. **Clean Separation**: Extension trait handles serialization, SyncKv handles metadata
2. **Tool Readability**: CBOR maps can be inspected by generic CBOR tools
3. **Performance**: Maintains BTreeMap's O(log n) operations in memory
4. **Extensibility**: Easy to add new metadata without breaking changes
5. **Reusability**: Extension trait can be used elsewhere in the codebase

## Implementation Plan

### Phase 1: Add Extension Trait
- Create `CborBTreeMapExt` trait in `y-sweet-core/src/sync_kv.rs`
- Implement CBOR map conversion functions
- Add comprehensive unit tests

### Phase 2: Implement Metadata Container
- Define `YSweetData` struct with version and metadata fields
- Update `SyncKv::persist()` to use new format
- Update `SyncKv::new()` with migration logic

### Phase 3: Migration Strategy
- Implement format detection (CBOR vs bincode)
- Auto-migrate existing bincode data on first write
- Add migration logging and metrics

### Phase 4: Testing & Cleanup
- Integration tests with real-world data
- Performance benchmarking
- Remove bincode dependency after migration period

### Code Changes Required

#### Modified Files
- `y-sweet-core/src/sync_kv.rs`: Core implementation
- `y-sweet-core/Cargo.toml`: Dependencies (ciborium already present)

#### New Implementation

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

use anyhow::{Context, Result};

use crate::store::Store; // Store trait from y-sweet-core::store

impl SyncKv {
    async fn persist(&self) -> Result<(), Box<dyn std::error::Error>> {
        let now = chrono::Utc::now().timestamp_millis() as u64;

        // Clone the map in a single statement so the std::sync::MutexGuard is
        // dropped before the `.await` below; holding it across an await point
        // would make the future non-Send and could block other tasks.
        let data = self.data.lock().unwrap().clone();

        let y_data = YSweetData {
            version: 1,
            // Resolved once in `new()`, so it stays stable across persists.
            created_at: self.created_at,
            modified_at: now,
            // This sketch does not carry metadata forward; a full implementation
            // would keep loaded metadata on `SyncKv` and write it back here.
            metadata: None,
            data,
        };

        // ciborium 0.2 serializes into a writer; Vec<u8> implements std::io::Write.
        let mut bytes = Vec::new();
        ciborium::ser::into_writer(&y_data, &mut bytes)?;
        tracing::info!(size = bytes.len(), "Persisting CBOR snapshot");

        if let Some(store) = &self.store {
            store.set(&self.key, bytes).await?;
        }
        self.dirty.store(false, Ordering::Relaxed);
        Ok(())
    }

    async fn new<Callback: Fn() + Send + Sync + 'static>(
        store: Option<Arc<Box<dyn Store>>>,
        key: &str,
        callback: Callback,
    ) -> Result<Self> {
        let key = format!("{}/data.ysweet", key);
        let mut created_at = None;

        let data = if let Some(store) = &store {
            if let Some(snapshot) = store.get(&key).await.context("Failed to get from store.")? {
                tracing::info!(size = snapshot.len(), "Loading snapshot");

                // Try CBOR format first; ciborium reads from any reader, including &[u8].
                let decoded: Result<YSweetData, _> = ciborium::de::from_reader(snapshot.as_slice());
                match decoded {
                    Ok(y_data) => {
                        created_at = Some(y_data.created_at);
                        tracing::info!("Loaded CBOR format data");
                        y_data.data
                    }
                    Err(_) => {
                        // Fall back to bincode for backward compatibility.
                        tracing::info!("Falling back to bincode format, will migrate on next persist");
                        bincode::deserialize(&snapshot).context("Failed to deserialize.")?
                    }
                }
            } else {
                BTreeMap::new()
            }
        } else {
            BTreeMap::new()
        };

        Ok(Self {
            data: Arc::new(Mutex::new(data)),
            store,
            key,
            dirty: AtomicBool::new(false),
            dirty_callback: Box::new(callback),
            // `created_at` is a plain u64 on SyncKv: pin it once here so new or
            // migrated documents don't get a fresh timestamp on every persist.
            created_at: created_at
                .unwrap_or_else(|| chrono::Utc::now().timestamp_millis() as u64),
        })
    }
}
```

## Migration Strategy

### Backward Compatibility

1. **Format Detection**: Attempt CBOR deserialization first, and fall back to bincode if it fails
2. **Lazy Migration**: Convert bincode data to CBOR on the first write operation
3. **Graceful Degradation**: Return a descriptive error, rather than panicking, when data matches neither format
4. **Migration Logging**: Track migration progress and performance
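
The detection order in step 1 can be sketched independently of any codec. Try the CBOR decoder first, and only if it fails, try the bincode decoder. Record which one succeeded so that step 2 (lazy migration) and step 4 (logging) can act on it. The decoders are passed in as closures, as hypothetical stand-ins for `ciborium` and `bincode`. `SnapshotFormat` and `load_with_fallback` are illustrative names, not existing Y-Sweet APIs.

```rust
/// Which on-disk format a snapshot was decoded from.
#[derive(Debug, PartialEq)]
enum SnapshotFormat {
    Cbor,
    LegacyBincode,
}

/// Try the CBOR decoder first; only if it fails, try the legacy bincode decoder.
/// If both fail, the legacy decoder's error is returned.
fn load_with_fallback<T, E1, E2>(
    bytes: &[u8],
    decode_cbor: impl Fn(&[u8]) -> Result<T, E1>,
    decode_legacy: impl Fn(&[u8]) -> Result<T, E2>,
) -> Result<(T, SnapshotFormat), E2> {
    match decode_cbor(bytes) {
        Ok(value) => Ok((value, SnapshotFormat::Cbor)),
        // The CBOR error is discarded here; a real loader might log it.
        Err(_) => decode_legacy(bytes).map(|value| (value, SnapshotFormat::LegacyBincode)),
    }
}

fn main() {
    // Toy decoders standing in for ciborium and bincode: each accepts a text prefix.
    let cbor = |b: &[u8]| -> Result<String, &'static str> {
        std::str::from_utf8(b)
            .ok()
            .and_then(|s| s.strip_prefix("cbor:"))
            .map(String::from)
            .ok_or("not CBOR")
    };
    let legacy = |b: &[u8]| -> Result<String, &'static str> {
        std::str::from_utf8(b)
            .ok()
            .and_then(|s| s.strip_prefix("bin:"))
            .map(String::from)
            .ok_or("not bincode")
    };

    let (doc, format) = load_with_fallback(b"bin:hello", &cbor, &legacy).unwrap();
    println!("{doc} loaded as {format:?}"); // prints "hello loaded as LegacyBincode"
}
```

When both decoders fail, the sketch returns the bincode error. This mirrors the proposed `SyncKv::new()`, which surfaces the bincode error with `.context("Failed to deserialize.")`. In that proposal, a legacy load leaves `dirty` false, so the CBOR rewrite happens only when a later write triggers a persist.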

### Data Structure Evolution

The CBOR format enables future enhancements:
- **Document Metrics**: Track operation counts, size changes
- **User Attribution**: Store author information for collaborative features
- **Compression**: Add compression metadata and handling
- **Encryption**: Metadata-aware encryption strategies

## Performance Impact

### Expected Changes
- **Serialization**: CBOR ~2x slower than bincode (acceptable for persistence operations)
- **Deserialization**: CBOR ~2x slower than bincode
- **Size**: CBOR data ~10-20% larger due to self-describing format and metadata
- **Memory**: Minimal additional memory for metadata fields
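
For a `BTreeMap<Vec<u8>, Vec<u8>>`, the size difference comes mostly from per-entry framing. The back-of-the-envelope sketch below makes two assumptions:

- Bincode 1.x's default `bincode::serialize` configuration uses fixed-width u64 length prefixes, so 8 bytes precede the map, each key, and each value.
- CBOR uses shortest-form heads of 1 to 9 bytes, depending on length.

The sketch counts framing only. Key and value payload bytes are identical in both formats, and the `YSweetData` wrapper adds a fixed overhead (field names, version, timestamps) that does not grow with document size. The entry sizes are hypothetical.

```rust
/// Size in bytes of a shortest-form CBOR head for a given length (RFC 8949).
fn cbor_head_len(len: u64) -> u64 {
    match len {
        0..=23 => 1,
        24..=0xFF => 2,
        0x100..=0xFFFF => 3,
        0x1_0000..=0xFFFF_FFFF => 5,
        _ => 9,
    }
}

/// Framing overhead (excluding payload bytes) for a map of (key_len, value_len) entries.
/// Returns (bincode_bytes, cbor_bytes).
fn framing_overhead(entries: &[(u64, u64)]) -> (u64, u64) {
    // Assumed bincode 1.x fixint layout: 8-byte map length + 8 bytes per key and per value.
    let bincode = 8 + 16 * entries.len() as u64;
    // CBOR: one map head plus one byte-string head per key and per value.
    let cbor = cbor_head_len(entries.len() as u64)
        + entries
            .iter()
            .map(|&(k, v)| cbor_head_len(k) + cbor_head_len(v))
            .sum::<u64>();
    (bincode, cbor)
}

fn main() {
    // Hypothetical document: 1,000 entries with 40-byte keys and 200-byte values.
    let entries: Vec<(u64, u64)> = vec![(40, 200); 1000];
    let (bincode, cbor) = framing_overhead(&entries);
    // prints "bincode framing: 16008 bytes, CBOR framing: 4003 bytes"
    println!("bincode framing: {bincode} bytes, CBOR framing: {cbor} bytes");
}
```

Under these assumptions, framing alone does not make CBOR larger for byte-string maps. The ~10-20% size estimate is therefore worth checking against real snapshots during Phase 4 benchmarking.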

### Mitigation Strategies
- Lazy evaluation of metadata fields
- Optional metadata to minimize overhead for small documents
- Background persistence to avoid blocking operations

## Risk Assessment

### High Risk
- **Data Corruption**: During migration from bincode to CBOR
- **Performance Regression**: Slower persistence operations

### Medium Risk
- **Migration Complexity**: Large existing datasets with edge cases
- **Cross-Version Compatibility**: Mixed format environments

### Low Risk
- **Dependency Issues**: ciborium library already included
- **Tool Integration**: CBOR tooling availability

### Risk Mitigation
- Extensive testing with production data samples
- Staged rollout with feature flags
- Comprehensive backup strategy
- Monitoring and alerting for migration issues

## Testing Strategy

1. **Unit Tests**: Extension trait serialization round-trips
2. **Integration Tests**: Full SyncKv persistence cycle with metadata
3. **Migration Tests**: Bincode → CBOR conversion validation
4. **Performance Tests**: Benchmark against current bincode implementation
5. **Compatibility Tests**: Cross-version data access scenarios
6. **Tool Tests**: Verify external CBOR tool can read/modify data

## Success Metrics

- Zero data loss during migration
- Performance degradation <50% (acceptable for persistence operations)
- Successful metadata extensibility demonstration
- External CBOR tools can inspect data structure
- Backward compatibility maintained for migration period

## Timeline

- **Week 1**: Extension trait implementation and unit tests
- **Week 2**: YSweetData container and SyncKv integration
- **Week 3**: Migration logic and integration testing
- **Week 4**: Performance testing and optimization
- **Weeks 5-6**: Production deployment and monitoring

## Dependencies

- `ciborium = "0.2.2"` (already available in Cargo.toml:34)
- `serde` with derive features (already available)
- `chrono` for timestamps (already available)

## Future Opportunities

1. **Rich Metadata**: Document versioning, collaborative author tracking
2. **Tool Ecosystem**: CBOR-based debugging and analysis tools
3. **Cross-Language Access**: Python/JavaScript tools for Y-Sweet data
4. **Compression**: CBOR-tagged compression for large documents
5. **Analytics**: Document usage patterns and performance metrics
6. **Backup/Export**: Human-readable document exports via CBOR tools

## Alternatives Considered

1. **Keep Bincode**: No risk, but forgoes extensibility and tool integration
2. **MessagePack**: Similar benefits but less standardized than CBOR
3. **Custom Binary Format**: Full control but significant development overhead
4. **JSON**: Human-readable, but much larger and slower
5. **Protocol Buffers**: Good performance but requires schema management

## Conclusion

Migrating to CBOR provides essential extensibility for Y-Sweet's evolution while maintaining reasonable performance characteristics. The two-layer architecture cleanly separates concerns: the extension trait handles efficient BTreeMap serialization, while SyncKv manages metadata and versioning.

The self-describing nature of CBOR maps enables external tool integration, crucial for debugging, analysis, and cross-platform access. Combined with the extensible metadata framework, this migration positions Y-Sweet for future enhancements without breaking existing deployments.

The proposed backward-compatible migration strategy minimizes deployment risk while providing a clear path toward a more maintainable and extensible data persistence layer.
