perf: improve data export speed for large datasets #188

@cptkoolbeenz

Description

Exporting data from large logging sessions takes too long and uses far too much memory. Issue #78 addressed some memory problems and added progress indicators, but significant performance improvements are still needed.

Current Performance Metrics

Recent test case:

  • Configuration: 16 channels at 100Hz
  • Duration: ~9 hours (2025-08-06T03:02:19 to 2025-08-06T12:05:01)
  • Data points: ~52.1 million samples (16 channels × 100Hz × 32,562 seconds)
  • Export time: ~1 hour 15 minutes
  • Output size: ~300MB CSV file
  • CPU usage: ~10% (reasonable)
  • Memory usage: >32GB (excessive)

Performance Goals

  • Export time reduced by 10x (from 75 minutes to ~7.5 minutes for similar datasets)
  • Memory usage capped at reasonable levels (e.g., 4-8GB max)
  • Maintain low CPU usage
  • No data loss or corruption

Potential Optimization Strategies

  1. Streaming writes: Write data directly to disk without buffering entire dataset in memory
  2. Batch processing: Process data in chunks rather than loading all at once
  3. Parallel processing: Utilize multiple CPU cores for data transformation
  4. Optimized CSV writing: Use high-performance CSV libraries or custom implementation
  5. Memory pooling: Reuse memory buffers to reduce allocation overhead
  6. Database query optimization: Ensure efficient data retrieval from SQLite
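Strategies 1 and 2 can be combined in a few lines. The following is a minimal Python sketch, not the application's actual export code: the `samples` table, its `timestamp`, `channel`, and `value` columns, and the `export_to_csv` helper are all hypothetical names chosen for illustration. It reads rows from SQLite with `fetchmany` and writes each batch straight to disk, so roughly one batch is held in memory at a time.

```python
import csv
import sqlite3


def export_to_csv(conn, out_path, batch_size=50_000):
    """Stream rows from SQLite to a CSV file in fixed-size batches.

    Assumes a hypothetical table `samples(timestamp, channel, value)`.
    Only `batch_size` rows are held in Python memory at any time.
    """
    cursor = conn.execute(
        "SELECT timestamp, channel, value FROM samples "
        "ORDER BY timestamp, channel"
    )
    header = [col[0] for col in cursor.description]
    rows_written = 0
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        while True:
            batch = cursor.fetchmany(batch_size)  # pull the next chunk only
            if not batch:
                break
            writer.writerows(batch)  # write the chunk, then let it go
            rows_written += len(batch)
    return rows_written
```

With this shape, peak memory should depend on `batch_size` rather than on session length. The `ORDER BY` is only cheap if the table has a matching index; without one, SQLite has to sort the full result before returning the first row.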

Technical Considerations

  • Current implementation appears to load significant data into memory despite commit c3fd1a4 claiming to fix memory issues
  • 32GB+ memory usage suggests entire dataset may still be loaded into memory
  • Export rate: ~11.6K samples/second (16 × 100Hz × 32,562 s ≈ 52.1M samples / 4,500 seconds)
  • Target rate: ~116K samples/second for 10x improvement
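The throughput figures can be recomputed from the test case's timestamps and configuration. This short Python sketch uses only the numbers stated in this issue; the variable names are illustrative.

```python
from datetime import datetime

# Session boundaries and configuration taken from the test case above.
start = datetime.fromisoformat("2025-08-06T03:02:19")
end = datetime.fromisoformat("2025-08-06T12:05:01")
channels = 16
sample_rate_hz = 100
export_seconds = 75 * 60  # ~1 hour 15 minutes

duration_s = int((end - start).total_seconds())   # 32562
samples = channels * sample_rate_hz * duration_s  # 52,099,200
current_rate = samples / export_seconds           # about 11,578 samples/second
target_rate = current_rate * 10                   # about 115,776 samples/second

print(f"{samples:,} samples, {current_rate:,.0f}/s now, {target_rate:,.0f}/s target")
```

Recomputing from the raw inputs gives roughly 52.1 million samples and about 11.6K samples/second, so the 10x target works out to roughly 116K samples/second.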

Success Criteria

  • Large dataset exports (50M+ samples) complete in under 10 minutes
  • Memory usage stays below 8GB during export
  • Progress indicator remains responsive
  • Export can be cancelled cleanly
  • No regression in data accuracy or completeness
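The progress and cancellation criteria fit naturally into a batched export loop. The sketch below is one possible shape under stated assumptions, not the project's implementation: the `samples` table, the `export_with_cancel` function, the `on_progress` callback, and the `.partial` temp-file suffix are all hypothetical. Cancellation is checked between batches, and a cancelled export removes its partial file so no truncated CSV is left behind.

```python
import csv
import os
import sqlite3
import threading


def export_with_cancel(conn, out_path, cancel_event, on_progress=None,
                       batch_size=50_000):
    """Batched export that reports progress and can be cancelled.

    Assumes a hypothetical table `samples(timestamp, channel, value)`.
    Returns True if the export finished, False if it was cancelled.
    """
    total = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    cursor = conn.execute(
        "SELECT timestamp, channel, value FROM samples "
        "ORDER BY timestamp, channel"
    )
    tmp_path = out_path + ".partial"  # write here, rename only on success
    done = 0
    completed = False
    try:
        with open(tmp_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cursor.description])
            while not cancel_event.is_set():  # checked once per batch
                batch = cursor.fetchmany(batch_size)
                if not batch:
                    completed = True
                    break
                writer.writerows(batch)
                done += len(batch)
                if on_progress:
                    on_progress(done, total)
        if completed:
            os.replace(tmp_path, out_path)
    finally:
        # A cancelled or failed export leaves no partial file behind.
        if not completed and os.path.exists(tmp_path):
            os.remove(tmp_path)
    return completed
```

A caller would typically run this on a worker thread, set `cancel_event` (a `threading.Event`) from the UI's cancel button, and marshal `on_progress` updates back to the UI thread. Smaller batches make cancellation and progress updates more responsive at some cost in throughput.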

Related Issues

Labels: enhancement (New feature or request)
