fix: improve database schemas #100
Conversation
- Add duplicate ID detection and validation in ImportDataCommandHandler
- Prevent imports with duplicate IDs and throw BusinessException
- Enhance database migration to handle duplicate IDs by keeping most recent records
- Add backup and validation logic for safe migration with duplicate removal
- Include debug logging for duplicate detection and migration verification
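A minimal sketch of the kind of duplicate-ID guard that commit describes, using illustrative names (the real ImportDataCommandHandler and BusinessException live elsewhere in the codebase and may differ):

// Illustrative only: stand-in for the app's BusinessException and import models.
class BusinessException implements Exception {
  final String message;
  BusinessException(this.message);

  @override
  String toString() => 'BusinessException: $message';
}

/// Collects IDs that appear more than once in the records to be imported.
Set<String> findDuplicateIds(Iterable<Map<String, dynamic>> records) {
  final seen = <String>{};
  final duplicates = <String>{};
  for (final record in records) {
    final id = record['id'] as String;
    if (!seen.add(id)) duplicates.add(id);
  }
  return duplicates;
}

/// Validates an import payload before any rows are written.
void validateNoDuplicateIds(Iterable<Map<String, dynamic>> records, String tableName) {
  final duplicates = findDuplicateIds(records);
  if (duplicates.isNotEmpty) {
    // Abort the import rather than silently overwriting or double-inserting rows.
    throw BusinessException(
        'Import aborted: duplicate IDs in $tableName: ${duplicates.join(', ')}');
  }
}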
Add proactive idle connection recycling every heartbeat to free slots for paginated sync requests. Implement graceful WebSocket closure after successful paginated responses and errors to avoid lingering connections. Increase per-IP connection limit from 3 to 5 and track last activity for accurate idle detection. Enhance logging, validation, and defensive cleanup to prevent leaks and aid diagnostics.
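A rough sketch of the idle-recycling idea, under assumed field names and thresholds; the actual DesktopServerSyncService differs in detail:

import 'dart:io';

// Sketch only: constants and fields are illustrative, not the service's real ones.
const int maxConnectionsPerIP = 5; // raised from 3, per the change above
const Duration idleTimeout = Duration(minutes: 2); // assumed idle threshold

final Map<WebSocket, DateTime> _lastActivity = {};

/// Record activity whenever a message is sent or received on [socket].
void touch(WebSocket socket) => _lastActivity[socket] = DateTime.now();

/// Run on every heartbeat: close and forget connections idle past the threshold,
/// freeing slots for new paginated sync requests.
Future<void> recycleIdleConnections() async {
  final now = DateTime.now();
  final idle = _lastActivity.entries
      .where((entry) => now.difference(entry.value) > idleTimeout)
      .map((entry) => entry.key)
      .toList();
  for (final socket in idle) {
    _lastActivity.remove(socket);
    await socket.close(WebSocketStatus.normalClosure, 'idle connection recycled');
  }
}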
Add custom database constructors to Drift repositories for better test isolation. Enhance AppDatabase migration validation to handle version mismatches differently in test vs production modes with clearer error messages. Fix SQL syntax in habit migration tests by using standard backticks for `order` column. Update generated mapper imports and minor test formatting.
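In Drift, the custom-constructor pattern for test isolation usually amounts to accepting an arbitrary QueryExecutor so tests can inject an in-memory database. A sketch with placeholder names (the project's real AppDatabase is code-generated and has tables):

import 'package:drift/drift.dart';
import 'package:drift/native.dart';

// Placeholder database: the real class is generated by Drift and declares tables.
class ExampleDatabase extends GeneratedDatabase {
  // Production-style constructor (the executor here is a placeholder).
  ExampleDatabase() : super(NativeDatabase.memory());

  // Custom constructor for tests: callers supply their own executor, typically
  // NativeDatabase.memory(), so every test gets an isolated database instance.
  ExampleDatabase.withExecutor(QueryExecutor executor) : super(executor);

  @override
  Iterable<TableInfo<Table, dynamic>> get allTables => const [];

  @override
  int get schemaVersion => 1;
}

Future<void> main() async {
  final db = ExampleDatabase.withExecutor(NativeDatabase.memory());
  // ... exercise the database in a test ...
  await db.close();
}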
Add PRIMARY KEY constraints to tables in all schema versions to prevent duplicate IDs during data import and synchronization. This ensures data integrity by enforcing unique identifiers and resolves issues with multiple records sharing the same ID. Updated related repository classes to use the new schema structure and disabled outdated migration tests that relied on the old format.
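At the schema level, the constraint is declared directly on each Drift table; a simplified example of the primaryKey override referenced in the PR summary (illustrative columns only):

import 'package:drift/drift.dart';

// Illustrative table: the real habit/task tables carry more columns.
class HabitTable extends Table {
  TextColumn get id => text()();
  TextColumn get name => text()();
  DateTimeColumn get createdDate => dateTime()();
  DateTimeColumn get modifiedDate => dateTime().nullable()();

  // Explicit primary key: SQLite rejects rows with a duplicate id, which is
  // what blocks duplicate IDs during data import and synchronization.
  @override
  Set<Column> get primaryKey => {id};
}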
Summary of Changes

Hello @ahmet-cetinkaya, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the database's robustness by enforcing primary key constraints across all tables. This foundational change improves data integrity, optimizes query performance through automatic indexing, and ensures consistency across the application's data model. The extensive refactoring includes updates to historical schema versions, a comprehensive overhaul of the test suite to align with the new constraints, and defensive programming additions to the desktop sync service for better connection management.
Code Review
This is a substantial and well-executed refactoring to enforce primary key constraints across the database, which is a crucial step for improving data integrity and performance. The changes are comprehensive, touching not only the database schema definitions but also handling historical data migration, fixing related bugs in the sync service, and updating tests accordingly.
The introduction of explicit primary keys is a fundamental improvement. The migration logic in drift_app_context.dart to handle potential historical duplicates is thoughtful, although there's a point about the de-duplication strategy that needs clarification to prevent potential data loss.
The enhancements to DesktopServerSyncService are particularly impressive, demonstrating robust, defensive programming to address connection pool exhaustion. The addition of idle connection recycling and graceful connection closing will significantly improve the reliability of the sync server.
The test suite has been diligently updated, with tests being redesigned to work with the new constraints and a new comprehensive test file added for the sync service. This shows a strong commitment to quality.
I have a couple of suggestions to further refine the implementation, mainly concerning the data migration strategy and a minor logging improvement. Overall, this is an excellent pull request that significantly strengthens the application's data layer.
// Backup habit_table with deduplication: keep only the most recently modified record per ID
await customStatement('''
  CREATE TEMPORARY TABLE habit_table_backup AS
  SELECT * FROM habit_table
  WHERE rowid IN (
    SELECT MAX(rowid)
    FROM habit_table
    GROUP BY id
  )
''');
There's a potential data loss issue here due to a discrepancy between the comment and the de-duplication logic. The comment states the goal is to keep the 'most recently modified record', but the query uses MAX(rowid), which keeps the most recently inserted record.
This can lead to data loss. For example, if a user has duplicate records from a sync and then modifies the older of the two records, this migration would discard that change in favor of the newer (but unmodified) record.
To align with the comment's intent and make the de-duplication safer, I recommend selecting the record with the latest modified_date (or created_date if modified_date is null). A window function is a robust way to achieve this.
// Backup habit_table with deduplication: keep the most recently modified record per ID
await customStatement('''
CREATE TEMPORARY TABLE habit_table_backup AS
SELECT * FROM habit_table
WHERE rowid IN (
SELECT rowid FROM (
SELECT
rowid,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY COALESCE(modified_date, created_date) DESC) as rn
FROM habit_table
)
WHERE rn = 1
)
''');

Logger.warning('🚫 Connection rejected: limits exceeded from $clientIP');
Logger.warning(' 📊 Current active connections: ${_activeConnections.length}/$maxConcurrentConnections');
Logger.warning(
    ' 📊 Connections from $clientIP: ${_ipConnectionCounts[clientIP] ?? 0}/$maxConnectionsPerIP');
For better log readability and to ensure the related pieces of information are logged together as a single atomic entry, consider combining these Logger.warning calls into a single call with newline characters. This can be particularly helpful when analyzing logs in an aggregation system.
Suggested change:

Logger.warning('🚫 Connection rejected: limits exceeded from $clientIP\n'
    ' 📊 Current active connections: ${_activeConnections.length}/$maxConcurrentConnections\n'
    ' 📊 Connections from $clientIP: ${_ipConnectionCounts[clientIP] ?? 0}/$maxConnectionsPerIP');
Address code review feedback:
- Use ROW_NUMBER() window function with COALESCE(modified_date, created_date) ordering instead of MAX(rowid) to preserve the most recently modified record rather than the most recently inserted
- Prevents potential data loss when duplicate records exist and the older record was modified after the newer record was inserted
- Combine multiple Logger.warning calls into a single atomic log entry for better log aggregation and readability
Thank you @gemini-code-assist for the thorough code review! I've addressed both suggestions:
- HIGH PRIORITY - Deduplication Logic Fixed
- MEDIUM PRIORITY - Logging Improvement
All migration tests continue to pass. Changes pushed in commit 62af5b6.
Thank you for the update, @ahmet-cetinkaya! I appreciate you addressing both the deduplication logic and the logging improvements so quickly. It's great to see these enhancements, especially the refined deduplication strategy which is crucial for data integrity during migrations. The improved logging will also be very beneficial for diagnostics. Keep up the excellent work!
ahmet-cetinkaya left a comment
I've addressed both review comments from @gemini-code-assist:
✅ HIGH PRIORITY (drift_app_context.dart:882-891): Changed deduplication logic to use ROW_NUMBER() OVER(PARTITION BY id ORDER BY COALESCE(modified_date, created_date) DESC) as suggested, preventing data loss.
✅ MEDIUM PRIORITY (desktop_server_sync_service.dart:75-78): Combined three separate Logger.warning() calls into a single atomic log entry with newline characters.
Fixed in commit 62af5b6. All tests pass.
/gemini review
Code Review
This is an excellent and comprehensive pull request that significantly improves the database's data integrity and robustness. The addition of explicit primary keys is a critical improvement, and the effort to update all historical schemas and write a defensive data migration to handle potential duplicates is commendable. The fixes and redesigns of the tests, especially for the DesktopServerSyncService and the integrity validation, demonstrate a strong commitment to quality and a deep understanding of the system's new constraints. The connection management improvements in the sync service are particularly well-implemented with robust, defensive programming. I have one minor suggestion regarding a potential race condition, but overall, this is a high-quality contribution.
// Brief delay to ensure any pending messages are transmitted
// This prevents premature closure before response reaches client
await Future.delayed(const Duration(milliseconds: 100));
While adding a delay is a pragmatic approach to ensure pending messages are transmitted before closing the socket, it can introduce flakiness as the duration might not always be sufficient under high load or on slow networks. A more deterministic approach would be preferable if the API allows for it.
Have you considered if the WebSocket implementation offers a way to flush the send buffer or await the completion of the send operation before closing? If not, this is a reasonable workaround. In that case, I'd suggest adding a more detailed comment explaining why this delay is necessary (e.g., to prevent a race condition where close() is called before the add() operation completes sending data over the wire).
Address code review feedback:
- Expand comment to explain the race condition being prevented
- Document that dart:io WebSocket lacks flush() or send completion await
- Acknowledge limitation that delay may not always be sufficient under high load
- Clarify this is a pragmatic workaround for API limitation
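For reference, the documented close sequence that commit describes would look roughly like this (a sketch assuming dart:io's WebSocket; function and variable names are illustrative, not the actual service code):

import 'dart:convert';
import 'dart:io';

// Sketch of the graceful-close pattern discussed above.
Future<void> sendFinalResponseAndClose(WebSocket socket, Map<String, dynamic> response) async {
  socket.add(jsonEncode(response));

  // Pragmatic workaround for an API limitation: dart:io's WebSocket exposes no
  // flush(), and add() gives no completion signal, so calling close() right away
  // can race with the frame still being written to the wire. A fixed delay gives
  // the response time to be transmitted; under heavy load or slow networks it may
  // not always be sufficient.
  await Future.delayed(const Duration(milliseconds: 100));

  await socket.close(WebSocketStatus.normalClosure, 'paginated response complete');
}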
Thank you for the additional review @gemini-code-assist!
✅ MEDIUM PRIORITY - WebSocket Close Delay Documentation Enhanced: I've improved the comment to provide a detailed explanation of why the delay is necessary:
The implementation remains unchanged (still using the 100ms delay), but the comment now makes the rationale and limitations crystal clear for future maintainers. Fixed in commit 6167271. All 76 sync service tests continue to pass.
Thank you for the detailed update, @ahmet-cetinkaya! I appreciate the enhanced documentation regarding the WebSocket close delay. Providing clear rationale, acknowledging the workaround nature, and detailing the limitations is excellent for future maintainability and understanding. It's great to see such thoroughness in explaining critical implementation details. Keep up the fantastic work!
fix: improve database schemas
Summary

- explicit_pk: ["id"] for 253 table entries

Changes

Database Schema
- Set<Column> get primaryKey => {id}; added to 8 repository table definitions

Test Fixes

Documentation
- PRIMARY_KEY_ENFORCEMENT.md - Primary key implementation details
- MIGRATION_TEST_FIX.md - Migration test solutions
- DATABASE_INTEGRITY_TEST_FIX.md - Unit test redesign rationale

Benefits

Verification

All 390+ tests pass: