
Conversation

@ahmet-cetinkaya
Owner

Summary

  • Add explicit PRIMARY KEY constraints to all database tables for data integrity
  • Update 28 historical schema versions with explicit_pk: ["id"] for 253 table entries
  • Fix 387 migration and unit tests that failed due to constraint enforcement
  • Redesign tests to work with constraints rather than against them
  • Add comprehensive documentation explaining changes and test rationale

Changes

Database Schema

  • Added Set<Column> get primaryKey => {id}; to 8 repository table definitions (see the sketch after this list)
  • Updated all schema JSON files (v1-v28) to include explicit primary keys
  • Regenerated 29 test schema Dart files
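
As a reference, here is a minimal sketch of that primaryKey override pattern in a Drift table definition. The table and column names are illustrative assumptions, not the project's actual schema:

// Illustrative Drift table with an explicit primary key on the `id` column.
import 'package:drift/drift.dart';

class HabitTable extends Table {
  TextColumn get id => text()();
  TextColumn get name => text()();
  DateTimeColumn get createdDate => dateTime()();
  DateTimeColumn get modifiedDate => dateTime().nullable()();

  @override
  Set<Column> get primaryKey => {id}; // database-level uniqueness on id
}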

Test Fixes

  • Migration tests: 378 tests - Updated schemas to match current table definitions
  • Integrity service tests: 5 tests - Test orphaned references instead of impossible duplicates (see the sketch after this list)
  • Repository tests: 4 tests - Validate query logic with unique data
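
To illustrate why duplicate-based fixtures are now impossible, here is a minimal standalone sketch using the sqlite3 package directly rather than the project's actual test harness; the table shape is an assumption:

import 'package:sqlite3/sqlite3.dart';
import 'package:test/test.dart';

void main() {
  test('duplicate IDs are rejected once the primary key is enforced', () {
    final db = sqlite3.openInMemory();
    db.execute('CREATE TABLE habit_table (id TEXT NOT NULL PRIMARY KEY, name TEXT)');
    db.execute("INSERT INTO habit_table (id, name) VALUES ('a1', 'Read')");
    // A second row with the same id now violates the PRIMARY KEY constraint,
    // so integrity tests must seed orphaned references instead of duplicates.
    expect(
      () => db.execute("INSERT INTO habit_table (id, name) VALUES ('a1', 'Run')"),
      throwsA(isA<SqliteException>()),
    );
    db.dispose();
  });
}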

Documentation

  • PRIMARY_KEY_ENFORCEMENT.md - Primary key implementation details
  • MIGRATION_TEST_FIX.md - Migration test solutions
  • DATABASE_INTEGRITY_TEST_FIX.md - Unit test redesign rationale

Benefits

  • Data Integrity: Database-level uniqueness enforcement prevents corruption
  • Performance: Automatic indexing on primary keys improves query speed
  • Consistency: All tables follow same pattern with guaranteed uniqueness
  • Robust Testing: Tests now work with constraints, validating real scenarios

Verification

All 390+ tests pass:

00:05 +378: migration tests passed
00:01 +5: integrity service tests passed  
00:07 +4: repository tests passed

- Add duplicate ID detection and validation in ImportDataCommandHandler
- Prevent imports with duplicate IDs and throw BusinessException
- Enhance database migration to handle duplicate IDs by keeping most recent records
- Add backup and validation logic for safe migration with duplicate removal
- Include debug logging for duplicate detection and migration verification
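
A minimal sketch of the kind of duplicate-ID validation this commit message describes; the handler shape, record representation, and exception wiring are assumptions, not the project's actual implementation:

// Hypothetical pre-import check: reject a batch containing duplicate IDs.
class BusinessException implements Exception {
  final String message;
  BusinessException(this.message);
  @override
  String toString() => 'BusinessException: $message';
}

void validateNoDuplicateIds(String tableName, List<Map<String, dynamic>> records) {
  final seen = <String>{};
  final duplicates = <String>{};
  for (final record in records) {
    final id = record['id'] as String;
    if (!seen.add(id)) duplicates.add(id);
  }
  if (duplicates.isNotEmpty) {
    throw BusinessException(
      'Import aborted for $tableName: duplicate IDs ${duplicates.join(', ')}',
    );
  }
}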
Add proactive idle connection recycling every heartbeat to free slots for
paginated sync requests. Implement graceful WebSocket closure after
successful paginated responses and errors to avoid lingering connections.
Increase per-IP connection limit from 3 to 5 and track last activity for
accurate idle detection. Enhance logging, validation, and defensive cleanup
to prevent leaks and aid diagnostics.
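
A rough sketch of the idle-recycling idea described above: record last activity per socket and, on each heartbeat, close connections idle past a threshold. The connection type, field names, idle timeout, and close reason are all assumptions:

import 'dart:io';

const maxConnectionsPerIP = 5; // raised from 3 per the change above
const idleTimeout = Duration(minutes: 2); // assumed threshold

final Map<WebSocket, DateTime> _lastActivity = {};

void markActivity(WebSocket socket) => _lastActivity[socket] = DateTime.now();

// Invoked from the periodic heartbeat: close sockets idle longer than the
// threshold to free slots for incoming paginated sync requests.
Future<void> recycleIdleConnections() async {
  final now = DateTime.now();
  final idle = _lastActivity.entries
      .where((entry) => now.difference(entry.value) > idleTimeout)
      .map((entry) => entry.key)
      .toList();
  for (final socket in idle) {
    await socket.close(WebSocketStatus.goingAway, 'Idle connection recycled');
    _lastActivity.remove(socket);
  }
}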
Add custom database constructors to Drift repositories for better test isolation.
Enhance AppDatabase migration validation to handle version mismatches differently
in test vs production modes with clearer error messages.
Fix SQL syntax in habit migration tests by using standard backticks for `order` column.
Update generated mapper imports and minor test formatting.

Add PRIMARY KEY constraints to tables in all schema versions to prevent duplicate IDs during data import and synchronization. This ensures data integrity by enforcing unique identifiers and resolves issues with multiple records sharing the same ID. Updated related repository classes to use the new schema structure and disabled outdated migration tests that relied on the old format.
@gemini-code-assist
Contributor

Summary of Changes

Hello @ahmet-cetinkaya, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the database's robustness by enforcing primary key constraints across all tables. This foundational change improves data integrity, optimizes query performance through automatic indexing, and ensures consistency across the application's data model. The extensive refactoring includes updates to historical schema versions, a comprehensive overhaul of the test suite to align with the new constraints, and defensive programming additions to the desktop sync service for better connection management.

Highlights

  • Database Primary Key Enforcement: Explicit PRIMARY KEY constraints have been added to all database tables to ensure data integrity and consistency.
  • Schema Version Updates: 28 historical schema versions have been updated to include explicit_pk: ["id"] for 253 table entries, ensuring backward compatibility and proper constraint application.
  • Test Suite Refactoring: 387 migration and unit tests were fixed and redesigned to properly work with the new primary key constraints, validating real-world scenarios.
  • Desktop Sync Service Enhancements: The desktop server sync service now includes improved connection limit management, proactive idle connection recycling, and more robust WebSocket closure logic to prevent connection pool exhaustion.
  • Data Import Validation: The data import command now includes validation for duplicate IDs within an import batch, preventing data corruption before it reaches the database.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command               Description
Code Review           /gemini review        Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary       Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist   Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help          Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This is a substantial and well-executed refactoring to enforce primary key constraints across the database, which is a crucial step for improving data integrity and performance. The changes are comprehensive, touching not only the database schema definitions but also handling historical data migration, fixing related bugs in the sync service, and updating tests accordingly.

The introduction of explicit primary keys is a fundamental improvement. The migration logic in drift_app_context.dart to handle potential historical duplicates is thoughtful, although there's a point about the de-duplication strategy that needs clarification to prevent potential data loss.

The enhancements to DesktopServerSyncService are particularly impressive, demonstrating robust, defensive programming to address connection pool exhaustion. The addition of idle connection recycling and graceful connection closing will significantly improve the reliability of the sync server.

The test suite has been diligently updated, with tests being redesigned to work with the new constraints and a new comprehensive test file added for the sync service. This shows a strong commitment to quality.

I have a couple of suggestions to further refine the implementation, mainly concerning the data migration strategy and a minor logging improvement. Overall, this is an excellent pull request that significantly strengthens the application's data layer.

Comment on lines 882 to 891
// Backup habit_table with deduplication: keep only the most recently modified record per ID
await customStatement('''
  CREATE TEMPORARY TABLE habit_table_backup AS
  SELECT * FROM habit_table
  WHERE rowid IN (
    SELECT MAX(rowid)
    FROM habit_table
    GROUP BY id
  )
''');

high

There's a potential data loss issue here due to a discrepancy between the comment and the de-duplication logic. The comment states the goal is to keep the 'most recently modified record', but the query uses MAX(rowid), which keeps the most recently inserted record.

This can lead to data loss. For example, if a user has duplicate records from a sync and then modifies the older of the two records, this migration would discard that change in favor of the newer (but unmodified) record.

To align with the comment's intent and make the de-duplication safer, I recommend selecting the record with the latest modified_date (or created_date if modified_date is null). A window function is a robust way to achieve this.

// Backup habit_table with deduplication: keep the most recently modified record per ID
await customStatement('''
  CREATE TEMPORARY TABLE habit_table_backup AS
  SELECT * FROM habit_table
  WHERE rowid IN (
    SELECT rowid FROM (
      SELECT
        rowid,
        ROW_NUMBER() OVER(PARTITION BY id ORDER BY COALESCE(modified_date, created_date) DESC) as rn
      FROM habit_table
    )
    WHERE rn = 1
  )
''');

Comment on lines 72 to 79
Logger.warning('🚫 Connection rejected: limits exceeded from $clientIP');
Logger.warning(' 📊 Current active connections: ${_activeConnections.length}/$maxConcurrentConnections');
Logger.warning(
' 📊 Connections from $clientIP: ${_ipConnectionCounts[clientIP] ?? 0}/$maxConnectionsPerIP');

medium

For better log readability and to ensure the related pieces of information are logged together as a single atomic entry, consider combining these Logger.warning calls into a single call with newline characters. This can be particularly helpful when analyzing logs in an aggregation system.

Suggested change
- Logger.warning('🚫 Connection rejected: limits exceeded from $clientIP');
- Logger.warning(' 📊 Current active connections: ${_activeConnections.length}/$maxConcurrentConnections');
- Logger.warning(
-     ' 📊 Connections from $clientIP: ${_ipConnectionCounts[clientIP] ?? 0}/$maxConnectionsPerIP');
+ Logger.warning('🚫 Connection rejected: limits exceeded from $clientIP\n'
+     ' 📊 Current active connections: ${_activeConnections.length}/$maxConcurrentConnections\n'
+     ' 📊 Connections from $clientIP: ${_ipConnectionCounts[clientIP] ?? 0}/$maxConnectionsPerIP');

@ahmet-cetinkaya changed the title from "refactor(db): enforce primary key constraints across all tables" to "fix: improve database schemas" on Oct 2, 2025
Address code review feedback:
- Use ROW_NUMBER() window function with COALESCE(modified_date, created_date)
  ordering instead of MAX(rowid) to preserve the most recently modified record
  rather than most recently inserted
- Prevents potential data loss when duplicate records exist and older record
  was modified after newer record was inserted
- Combine multiple Logger.warning calls into single atomic log entry for
  better log aggregation and readability
@ahmet-cetinkaya
Owner Author

Thank you @gemini-code-assist for the thorough code review! I've addressed both suggestions:

HIGH PRIORITY - Deduplication Logic Fixed:
Changed the migration deduplication strategy from using MAX(rowid) to using ROW_NUMBER() OVER(PARTITION BY id ORDER BY COALESCE(modified_date, created_date) DESC). This correctly preserves the most recently modified record rather than the most recently inserted record, preventing potential data loss scenarios.

MEDIUM PRIORITY - Logging Improvement:
Combined the three separate Logger.warning() calls into a single atomic log entry using newline characters. This improves log readability and ensures related information is kept together in log aggregation systems.

All migration tests continue to pass. Changes pushed in commit 62af5b6.

@gemini-code-assist
Contributor

Thank you for the update, @ahmet-cetinkaya! I appreciate you addressing both the deduplication logic and the logging improvements so quickly. It's great to see these enhancements, especially the refined deduplication strategy which is crucial for data integrity during migrations. The improved logging will also be very beneficial for diagnostics. Keep up the excellent work!

Owner Author

@ahmet-cetinkaya left a comment


I've addressed both review comments from @gemini-code-assist:

HIGH PRIORITY (drift_app_context.dart:882-891): Changed deduplication logic to use ROW_NUMBER() OVER(PARTITION BY id ORDER BY COALESCE(modified_date, created_date) DESC) as suggested, preventing data loss.

MEDIUM PRIORITY (desktop_server_sync_service.dart:75-78): Combined three separate Logger.warning() calls into a single atomic log entry with newline characters.

Fixed in commit 62af5b6. All tests pass.

@ahmet-cetinkaya
Owner Author

/gemini review


@gemini-code-assist bot left a comment


Code Review

This is an excellent and comprehensive pull request that significantly improves the database's data integrity and robustness. The addition of explicit primary keys is a critical improvement, and the effort to update all historical schemas and write a defensive data migration to handle potential duplicates is commendable. The fixes and redesigns of the tests, especially for the DesktopServerSyncService and the integrity validation, demonstrate a strong commitment to quality and a deep understanding of the system's new constraints. The connection management improvements in the sync service are particularly well-implemented with robust, defensive programming. I have one minor suggestion regarding a potential race condition, but overall, this is a high-quality contribution.

Comment on lines 890 to 892
// Brief delay to ensure any pending messages are transmitted
// This prevents premature closure before response reaches client
await Future.delayed(const Duration(milliseconds: 100));

medium

While adding a delay is a pragmatic approach to ensure pending messages are transmitted before closing the socket, it can introduce flakiness as the duration might not always be sufficient under high load or on slow networks. A more deterministic approach would be preferable if the API allows for it.

Have you considered if the WebSocket implementation offers a way to flush the send buffer or await the completion of the send operation before closing? If not, this is a reasonable workaround. In that case, I'd suggest adding a more detailed comment explaining why this delay is necessary (e.g., to prevent a race condition where close() is called before the add() operation completes sending data over the wire).
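
For context, a sketch of the kind of delay-then-close workaround under discussion; the helper name, status code, and structure are illustrative, not the actual service code:

import 'dart:io';

// Hypothetical helper: send the final paginated response, then wait briefly
// before closing. dart:io's WebSocket exposes no flush() or way to await
// completion of add(), so close() can race the in-flight frame; the delay
// reduces (but does not eliminate) that risk under load or slow networks.
Future<void> sendAndClose(WebSocket socket, String response) async {
  socket.add(response);
  await Future.delayed(const Duration(milliseconds: 100));
  await socket.close(WebSocketStatus.normalClosure, 'Paginated response complete');
}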

Address code review feedback:
- Expand comment to explain the race condition being prevented
- Document that dart:io WebSocket lacks flush() or send completion await
- Acknowledge limitation that delay may not always be sufficient under high load
- Clarify this is a pragmatic workaround for API limitation
@ahmet-cetinkaya
Owner Author

Thank you for the additional review @gemini-code-assist!

MEDIUM PRIORITY - WebSocket Close Delay Documentation Enhanced:

I've improved the comment to provide a detailed explanation of why the delay is necessary:

  • Explicitly identifies this as a WORKAROUND for a race condition
  • Explains that dart:io WebSocket does not expose a flush() method or way to await send completion
  • Clarifies the delay ensures pending messages in the send buffer are transmitted before closure
  • Acknowledges the limitation that under high load or slow networks, 100ms may not always be sufficient
  • Documents this as a pragmatic approach to significantly reduce the likelihood of premature closure

The implementation remains unchanged (still using the 100ms delay), but the comment now makes the rationale and limitations crystal clear for future maintainers.

Fixed in commit 6167271. All 76 sync service tests continue to pass.

@gemini-code-assist
Contributor

Thank you for the detailed update, @ahmet-cetinkaya! I appreciate the enhanced documentation regarding the WebSocket close delay. Providing clear rationale, acknowledging the workaround nature, and detailing the limitations is excellent for future maintainability and understanding. It's great to see such thoroughness in explaining critical implementation details. Keep up the fantastic work!

@ahmet-cetinkaya merged commit fc98685 into main on Oct 2, 2025
@ahmet-cetinkaya deleted the fix/improve-database-schemas branch on October 25, 2025 at 11:18
ahmet-cetinkaya added a commit that referenced this pull request Nov 30, 2025
