fix: improve database schemas #100
Conversation
- Add duplicate ID detection and validation in ImportDataCommandHandler
- Prevent imports with duplicate IDs and throw BusinessException
- Enhance database migration to handle duplicate IDs by keeping most recent records
- Add backup and validation logic for safe migration with duplicate removal
- Include debug logging for duplicate detection and migration verification
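A minimal sketch of the kind of duplicate-ID guard that commit describes, using illustrative names (the real ImportDataCommandHandler and BusinessException live elsewhere in the codebase and may differ):

// Illustrative only: stand-in for the app's BusinessException and import models.
class BusinessException implements Exception {
  final String message;
  BusinessException(this.message);

  @override
  String toString() => 'BusinessException: $message';
}

/// Collects IDs that appear more than once in the records to be imported.
Set<String> findDuplicateIds(Iterable<Map<String, dynamic>> records) {
  final seen = <String>{};
  final duplicates = <String>{};
  for (final record in records) {
    final id = record['id'] as String;
    if (!seen.add(id)) duplicates.add(id);
  }
  return duplicates;
}

/// Validates an import payload before any rows are written.
void validateNoDuplicateIds(Iterable<Map<String, dynamic>> records, String tableName) {
  final duplicates = findDuplicateIds(records);
  if (duplicates.isNotEmpty) {
    // Abort the import rather than silently overwriting or double-inserting rows.
    throw BusinessException(
        'Import aborted: duplicate IDs in $tableName: ${duplicates.join(', ')}');
  }
}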
Add proactive idle connection recycling every heartbeat to free slots for paginated sync requests. Implement graceful WebSocket closure after successful paginated responses and errors to avoid lingering connections. Increase per-IP connection limit from 3 to 5 and track last activity for accurate idle detection. Enhance logging, validation, and defensive cleanup to prevent leaks and aid diagnostics.
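A rough sketch of the idle-recycling idea, under assumed field names and thresholds; the actual DesktopServerSyncService differs in detail:

import 'dart:io';

// Sketch only: constants and fields are illustrative, not the service's real ones.
const int maxConnectionsPerIP = 5; // raised from 3, per the change above
const Duration idleTimeout = Duration(minutes: 2); // assumed idle threshold

final Map<WebSocket, DateTime> _lastActivity = {};

/// Record activity whenever a message is sent or received on [socket].
void touch(WebSocket socket) => _lastActivity[socket] = DateTime.now();

/// Run on every heartbeat: close and forget connections idle past the threshold,
/// freeing slots for new paginated sync requests.
Future<void> recycleIdleConnections() async {
  final now = DateTime.now();
  final idle = _lastActivity.entries
      .where((entry) => now.difference(entry.value) > idleTimeout)
      .map((entry) => entry.key)
      .toList();
  for (final socket in idle) {
    _lastActivity.remove(socket);
    await socket.close(WebSocketStatus.normalClosure, 'idle connection recycled');
  }
}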
Add custom database constructors to Drift repositories for better test isolation. Enhance AppDatabase migration validation to handle version mismatches differently in test vs production modes with clearer error messages. Fix SQL syntax in habit migration tests by using standard backticks for `order` column. Update generated mapper imports and minor test formatting.
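In Drift, the custom-constructor pattern for test isolation usually amounts to accepting an arbitrary QueryExecutor so tests can inject an in-memory database. A sketch with placeholder names (the project's real AppDatabase is code-generated and has tables):

import 'package:drift/drift.dart';
import 'package:drift/native.dart';

// Placeholder database: the real class is generated by Drift and declares tables.
class ExampleDatabase extends GeneratedDatabase {
  // Production-style constructor (the executor here is a placeholder).
  ExampleDatabase() : super(NativeDatabase.memory());

  // Custom constructor for tests: callers supply their own executor, typically
  // NativeDatabase.memory(), so every test gets an isolated database instance.
  ExampleDatabase.withExecutor(QueryExecutor executor) : super(executor);

  @override
  Iterable<TableInfo<Table, dynamic>> get allTables => const [];

  @override
  int get schemaVersion => 1;
}

Future<void> main() async {
  final db = ExampleDatabase.withExecutor(NativeDatabase.memory());
  // ... exercise the database in a test ...
  await db.close();
}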
Add PRIMARY KEY constraints to tables in all schema versions to prevent duplicate IDs during data import and synchronization. This ensures data integrity by enforcing unique identifiers and resolves issues with multiple records sharing the same ID. Updated related repository classes to use the new schema structure and disabled outdated migration tests that relied on the old format.
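At the schema level, the constraint is declared directly on each Drift table; a simplified example of the primaryKey override referenced in the PR summary (illustrative columns only):

import 'package:drift/drift.dart';

// Illustrative table: the real habit/task tables carry more columns.
class HabitTable extends Table {
  TextColumn get id => text()();
  TextColumn get name => text()();
  DateTimeColumn get createdDate => dateTime()();
  DateTimeColumn get modifiedDate => dateTime().nullable()();

  // Explicit primary key: SQLite rejects rows with a duplicate id, which is
  // what blocks duplicate IDs during data import and synchronization.
  @override
  Set<Column> get primaryKey => {id};
}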
Summary of Changes

Hello @ahmet-cetinkaya, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the database's robustness by enforcing primary key constraints across all tables. This foundational change improves data integrity, optimizes query performance through automatic indexing, and ensures consistency across the application's data model. The extensive refactoring includes updates to historical schema versions, a comprehensive overhaul of the test suite to align with the new constraints, and defensive programming additions to the desktop sync service for better connection management.
Code Review
This is a substantial and well-executed refactoring to enforce primary key constraints across the database, which is a crucial step for improving data integrity and performance. The changes are comprehensive, touching not only the database schema definitions but also handling historical data migration, fixing related bugs in the sync service, and updating tests accordingly.
The introduction of explicit primary keys is a fundamental improvement. The migration logic in drift_app_context.dart to handle potential historical duplicates is thoughtful, although there's a point about the de-duplication strategy that needs clarification to prevent potential data loss.
The enhancements to DesktopServerSyncService are particularly impressive, demonstrating robust, defensive programming to address connection pool exhaustion. The addition of idle connection recycling and graceful connection closing will significantly improve the reliability of the sync server.
The test suite has been diligently updated, with tests being redesigned to work with the new constraints and a new comprehensive test file added for the sync service. This shows a strong commitment to quality.
I have a couple of suggestions to further refine the implementation, mainly concerning the data migration strategy and a minor logging improvement. Overall, this is an excellent pull request that significantly strengthens the application's data layer.
// Backup habit_table with deduplication: keep only the most recently modified record per ID
await customStatement('''
  CREATE TEMPORARY TABLE habit_table_backup AS
  SELECT * FROM habit_table
  WHERE rowid IN (
    SELECT MAX(rowid)
    FROM habit_table
    GROUP BY id
  )
''');
There's a potential data loss issue here due to a discrepancy between the comment and the de-duplication logic. The comment states the goal is to keep the 'most recently modified record', but the query uses MAX(rowid), which keeps the most recently inserted record.
This can lead to data loss. For example, if a user has duplicate records from a sync and then modifies the older of the two records, this migration would discard that change in favor of the newer (but unmodified) record.
To align with the comment's intent and make the de-duplication safer, I recommend selecting the record with the latest modified_date (or created_date if modified_date is null). A window function is a robust way to achieve this.
// Backup habit_table with deduplication: keep the most recently modified record per ID
await customStatement('''
CREATE TEMPORARY TABLE habit_table_backup AS
SELECT * FROM habit_table
WHERE rowid IN (
SELECT rowid FROM (
SELECT
rowid,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY COALESCE(modified_date, created_date) DESC) as rn
FROM habit_table
)
WHERE rn = 1
)
''');

Logger.warning('🚫 Connection rejected: limits exceeded from $clientIP');
Logger.warning(' 📊 Current active connections: ${_activeConnections.length}/$maxConcurrentConnections');
Logger.warning(
    ' 📊 Connections from $clientIP: ${_ipConnectionCounts[clientIP] ?? 0}/$maxConnectionsPerIP');
For better log readability and to ensure the related pieces of information are logged together as a single atomic entry, consider combining these Logger.warning calls into a single call with newline characters. This can be particularly helpful when analyzing logs in an aggregation system.
Suggested change:

Logger.warning('🚫 Connection rejected: limits exceeded from $clientIP\n'
    ' 📊 Current active connections: ${_activeConnections.length}/$maxConcurrentConnections\n'
    ' 📊 Connections from $clientIP: ${_ipConnectionCounts[clientIP] ?? 0}/$maxConnectionsPerIP');
Address code review feedback:
- Use ROW_NUMBER() window function with COALESCE(modified_date, created_date) ordering instead of MAX(rowid) to preserve the most recently modified record rather than the most recently inserted
- Prevents potential data loss when duplicate records exist and the older record was modified after the newer record was inserted
- Combine multiple Logger.warning calls into a single atomic log entry for better log aggregation and readability
Thank you @gemini-code-assist for the thorough code review! I've addressed both suggestions:
- HIGH PRIORITY - Deduplication Logic Fixed
- MEDIUM PRIORITY - Logging Improvement
All migration tests continue to pass. Changes pushed in commit 62af5b6.
Thank you for the update, @ahmet-cetinkaya! I appreciate you addressing both the deduplication logic and the logging improvements so quickly. It's great to see these enhancements, especially the refined deduplication strategy which is crucial for data integrity during migrations. The improved logging will also be very beneficial for diagnostics. Keep up the excellent work!
ahmet-cetinkaya left a comment
I've addressed both review comments from @gemini-code-assist:
✅ HIGH PRIORITY (drift_app_context.dart:882-891): Changed deduplication logic to use ROW_NUMBER() OVER(PARTITION BY id ORDER BY COALESCE(modified_date, created_date) DESC) as suggested, preventing data loss.
✅ MEDIUM PRIORITY (desktop_server_sync_service.dart:75-78): Combined three separate Logger.warning() calls into a single atomic log entry with newline characters.
Fixed in commit 62af5b6. All tests pass.
/gemini review
Code Review
This is an excellent and comprehensive pull request that significantly improves the database's data integrity and robustness. The addition of explicit primary keys is a critical improvement, and the effort to update all historical schemas and write a defensive data migration to handle potential duplicates is commendable. The fixes and redesigns of the tests, especially for the DesktopServerSyncService and the integrity validation, demonstrate a strong commitment to quality and a deep understanding of the system's new constraints. The connection management improvements in the sync service are particularly well-implemented with robust, defensive programming. I have one minor suggestion regarding a potential race condition, but overall, this is a high-quality contribution.
// Brief delay to ensure any pending messages are transmitted
// This prevents premature closure before response reaches client
await Future.delayed(const Duration(milliseconds: 100));
While adding a delay is a pragmatic approach to ensure pending messages are transmitted before closing the socket, it can introduce flakiness as the duration might not always be sufficient under high load or on slow networks. A more deterministic approach would be preferable if the API allows for it.
Have you considered if the WebSocket implementation offers a way to flush the send buffer or await the completion of the send operation before closing? If not, this is a reasonable workaround. In that case, I'd suggest adding a more detailed comment explaining why this delay is necessary (e.g., to prevent a race condition where close() is called before the add() operation completes sending data over the wire).
Address code review feedback:
- Expand comment to explain the race condition being prevented
- Document that dart:io WebSocket lacks flush() or send completion await
- Acknowledge limitation that delay may not always be sufficient under high load
- Clarify this is a pragmatic workaround for API limitation
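For reference, the documented close sequence that commit describes would look roughly like this (a sketch assuming dart:io's WebSocket; function and variable names are illustrative, not the actual service code):

import 'dart:convert';
import 'dart:io';

// Sketch of the graceful-close pattern discussed above.
Future<void> sendFinalResponseAndClose(WebSocket socket, Map<String, dynamic> response) async {
  socket.add(jsonEncode(response));

  // Pragmatic workaround for an API limitation: dart:io's WebSocket exposes no
  // flush(), and add() gives no completion signal, so calling close() right away
  // can race with the frame still being written to the wire. A fixed delay gives
  // the response time to be transmitted; under heavy load or slow networks it may
  // not always be sufficient.
  await Future.delayed(const Duration(milliseconds: 100));

  await socket.close(WebSocketStatus.normalClosure, 'paginated response complete');
}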
Thank you for the additional review @gemini-code-assist!
✅ MEDIUM PRIORITY - WebSocket Close Delay Documentation Enhanced: I've improved the comment to provide a detailed explanation of why the delay is necessary:
The implementation remains unchanged (still using the 100ms delay), but the comment now makes the rationale and limitations crystal clear for future maintainers. Fixed in commit 6167271. All 76 sync service tests continue to pass.
Thank you for the detailed update, @ahmet-cetinkaya! I appreciate the enhanced documentation regarding the WebSocket close delay. Providing clear rationale, acknowledging the workaround nature, and detailing the limitations is excellent for future maintainability and understanding. It's great to see such thoroughness in explaining critical implementation details. Keep up the fantastic work!
fix: improve database schemas
Summary

- explicit_pk: ["id"] for 253 table entries

Changes

Database Schema
- Set<Column> get primaryKey => {id}; added to 8 repository table definitions

Test Fixes

Documentation
- PRIMARY_KEY_ENFORCEMENT.md - Primary key implementation details
- MIGRATION_TEST_FIX.md - Migration test solutions
- DATABASE_INTEGRITY_TEST_FIX.md - Unit test redesign rationale

Benefits

Verification

All 390+ tests pass: