Create ETL to sync FDUs from eCAPRIS EDP dataset to Moped DB #1716
| updated_by_user_id: x-hasura-user-db-id | ||
| columns: | ||
| - dept_unit | ||
| - ecapris_funding_id |
There was a problem hiding this comment.
This is the same column as fao_id in the eCAPRIS data and will be populated with the unique id that we can trace back to a specific FDU's data if needed. I'm torn about the rename but fao_id is hard to parse imo. I'd love thoughts!
There was a problem hiding this comment.
I don't know what fao stands for but can ask. FSD just let us know that this is the unique id for FDUs in their DB. What I do know is that it always makes me think of FAO Schwarz 🧸 😆
There was a problem hiding this comment.
I was just curious. I also think of FAO Schwarz 🚂
| - funding_source_id | ||
| - funding_status_id | ||
| - is_deleted | ||
| - is_editable |
There was a problem hiding this comment.
No longer needed. All records in the moped_proj_funding table should be editable just like they are today since we aren't mixing with immutable eCAPRIS data in the same table.
| -- Index fdu since we'll be querying by it to avoid duplicates in the combined view (using NOT EXISTS) | ||
| CREATE INDEX idx_moped_proj_funding_fdu_not_deleted |
There was a problem hiding this comment.
I've added many indexes that I paired with Claude about. Let me know if this jumps out as over-indexing to anyone but I included my comments of how I interpreted these benefiting us in queries and views needed for the UI.
I also sped-run potential performance impacts we could see on project_list_view since we aggregate funding information in it. Here are those details.
There was a problem hiding this comment.
Nice! these look great to me!
|
|
||
| -- Create a combined_project_funding_view for the project funding UI to consume. This view combines | ||
| -- both moped_proj_funding and ecapris_subproject_funding data and removes duplicates based on FDU. | ||
| CREATE OR REPLACE VIEW combined_project_funding_view AS |
There was a problem hiding this comment.
This view follows the same pattern that we used for notes to bring notes together in the UI.
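For anyone who wants to poke at the dedup idea outside of Postgres, here is a rough sketch of the NOT EXISTS pattern using Python's built-in sqlite3 module. This is not the migration in this PR: the column lists are trimmed down, the `is_deleted` filter and the join through `moped_project` are assumptions about how the view is wired, `SUBPROJECT-A` is a made-up id, and the sample rows are invented for illustration.

```python
import sqlite3

# Simplified, hypothetical schema: only the columns needed to show the dedup logic.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE moped_project (project_id INTEGER, ecapris_subproject_id TEXT);
    CREATE TABLE moped_proj_funding (
        project_id INTEGER, fdu TEXT, amount INTEGER, is_deleted INTEGER DEFAULT 0
    );
    CREATE TABLE ecapris_subproject_funding (
        ecapris_subproject_id TEXT, fdu TEXT, amount INTEGER
    );

    -- User-maintained rows always appear; eCAPRIS rows appear only when the
    -- project has no live user-maintained row with the same FDU.
    CREATE VIEW combined_project_funding_view AS
    SELECT mpf.project_id, mpf.fdu, mpf.amount, 0 AS is_synced_from_ecapris
    FROM moped_proj_funding mpf
    WHERE mpf.is_deleted = 0
    UNION ALL
    SELECT mp.project_id, esf.fdu, esf.amount, 1 AS is_synced_from_ecapris
    FROM ecapris_subproject_funding esf
    JOIN moped_project mp
        ON mp.ecapris_subproject_id = esf.ecapris_subproject_id
    WHERE NOT EXISTS (
        SELECT 1 FROM moped_proj_funding mpf
        WHERE mpf.project_id = mp.project_id
          AND mpf.fdu = esf.fdu
          AND mpf.is_deleted = 0
    );

    -- Invented sample data: one FDU exists in both tables, one only in eCAPRIS.
    INSERT INTO moped_project VALUES (1, 'SUBPROJECT-A');
    INSERT INTO moped_proj_funding VALUES (1, '800J 2507 9719', 999, 0);
    INSERT INTO moped_proj_funding VALUES (1, NULL, 100000, 0);
    INSERT INTO ecapris_subproject_funding VALUES ('SUBPROJECT-A', '800J 2507 9719', 500);
    INSERT INTO ecapris_subproject_funding VALUES ('SUBPROJECT-A', '820B 2507 D382', 0);
    """
)

rows = conn.execute(
    "SELECT fdu, amount, is_synced_from_ecapris FROM combined_project_funding_view "
    "ORDER BY is_synced_from_ecapris, amount"
).fetchall()
for row in rows:
    print(row)
```

In this sketch the eCAPRIS row for `800J 2507 9719` is suppressed because a user-maintained row already carries that FDU, while `820B 2507 D382` comes through as synced. Rows with a null FDU never match the subquery, so they cannot hide an eCAPRIS record.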
| -- Disable Hasura triggers temporarily to allow direct updates to moped_proj_funding without generating activity log entries | ||
| DO $$ |
There was a problem hiding this comment.
I don't love this block to enable/disable triggers, but we've been looking for a way to do this besides SET session_replication_role = replica;/SET session_replication_role = default; since we ran into permissions issues with the moped_admin user.
I used the following SQL on the read replica to verify that moped_admin can operate on these triggers when applying the migration but please double-check me. 🙏
SELECT tableowner FROM pg_tables WHERE schemaname = 'public' AND tablename = 'moped_proj_funding';
The IF/ELSE logic is needed because I realized these triggers only exist when replicating from a production snapshot and ./hasura-cluster start breaks without the checks. We could avoid this if DISABLE TRIGGER IF EXISTS was a Postgres thing but it isn't unfortunately. 🫠
There was a problem hiding this comment.
Looks great to me—thanks for working around this and establishing a new pattern 🚀
I guess it's always worth double-checking—are you sure we don't want to have these changes in the activity log?
There was a problem hiding this comment.
It is a good question, and I did overlook the need for some initial activity for the activity log to show that sync is switched on when we roll this out cityofaustin/atd-data-tech#24689 🙏
I'm going to take another look at these since I was distracted with preventing clutter and reshuffling the project list view. Thanks!
There was a problem hiding this comment.
This helped me catch a bug with the update to switch should_sync_ecapris_funding to true for all projects with subproject ids. Since the metadata change needed to make the Hasura event trigger work applies after this migration, the statement below in the migration in this PR doesn't generate activity rows. I need to think about it more next week. 🤔
-- Switch on sync for projects with ecapris_subproject_id set
UPDATE moped_project SET should_sync_ecapris_funding = TRUE
WHERE ecapris_subproject_id IS NOT NULL;
There was a problem hiding this comment.
After splitting eCAPRIS synced records into their own table, this script became more about creating a cached layer of all records available in the upstream dataset. This gives us the benefit of not needing to make Socrata API calls from the UI at all, but I am curious if anyone sees downsides to this.
FSD told us earlier this year that the eCAPRIS funding data that we are consuming is up-to-date as of EOB of the previous day. So, this ETL will only need to run once per day and after Charlie's FDU tagging ETL. Moped users will always have the latest and greatest at the start of the day.
There was a problem hiding this comment.
Very cool. Love the short and sweet ETL code 🚀
| 1. Dry run the script via: | ||
| ```bash | ||
| docker compose run ecapris-funding -n | ||
| ``` |
There was a problem hiding this comment.
h/t @frankhereford for pushing for dry run modes and motivating me to include here. Next step would be to parameterize in the DAG.
johnclary
left a comment
There was a problem hiding this comment.
This looks great to me. Very exciting to see funding get the ecapris sync treatment. Thanks for excellent test instructions and documentation 🚀 🚢
Hey all, late last week, I caught a bug with the order in which we need to make updates to the Hasura metadata, which affects our ability to generate activity log entries showing that funding sync is turned on for projects. More here but TLDR: I split out launch day updates into their own issue and future migration.
chiaberry
left a comment
There was a problem hiding this comment.
Nice refactor and I think this will give us fewer headaches in the future!
Side note, I cannot figure out why, but every time I ran this I'd end up with an orphan container.
|
|
||
| if args.dry_run: | ||
| logger.info( | ||
| f"[DRY RUN] Would upsert chunk of {len(chunk_payload)} funding records into Moped DB..." |
There was a problem hiding this comment.
i like the [DRY RUN] part of the logs a lot, hard to miss!
There was a problem hiding this comment.
awesome - h/t @frankhereford for starting this pattern!
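For readers skimming the thread, here is a minimal sketch of how a dry run flag like this can gate chunked writes. Only the `-n` flag, `args.dry_run`, and the log message come from the diff above; the `--dry-run` long form, the `chunks` and `sync` helpers, the chunk size, and the `upsert_chunk` stand-in are assumptions for illustration and not the script's actual code.

```python
import argparse
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # "-n" matches the README; the "--dry-run" long form is assumed.
    parser.add_argument(
        "-n", "--dry-run", action="store_true", help="Log what would be written"
    )
    return parser.parse_args(argv)


def chunks(records, size):
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start : start + size]


def upsert_chunk(chunk_payload):
    """Hypothetical stand-in for the real database write; returns rows written."""
    return len(chunk_payload)


def sync(records, dry_run, chunk_size=2):
    """Upsert records chunk by chunk, or only log the intent in dry run mode."""
    written = 0
    for chunk_payload in chunks(records, chunk_size):
        if dry_run:
            logger.info(
                f"[DRY RUN] Would upsert chunk of {len(chunk_payload)} funding records into Moped DB..."
            )
            continue
        written += upsert_chunk(chunk_payload)
    return written


if __name__ == "__main__":
    args = parse_args()
    # Placeholder records; the real script would pull these from the dataset.
    total = sync([{"fdu": "a"}, {"fdu": "b"}, {"fdu": "c"}], dry_run=args.dry_run)
    logger.info(f"Wrote {total} records")
```

Keeping the dry run check inside the loop means the fetch, transform, and chunking paths run exactly as they would in a real run, and only the write is skipped.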
| ## Testing the script locally using Docker Compose | ||
|
|
||
| 1. Ensure the local Moped stack is running with a current snapshot. | ||
| 1. Configure an `env_file` according to the `env_template` example. Find the Socrata (ODP) secrets in the secret store entry called `Socrata Key ID, Secret, and Token`. |
There was a problem hiding this comment.
not sure if this is the place to put it, but I had to go searching for the mapping between the secrets in the 1pw entry and the env_file, since the names of the variables don't match the entries. I had to look in airflow to remind me what goes where
There was a problem hiding this comment.
Hard agree here. Could the env_template values align with our keys in the pw manager?
There was a problem hiding this comment.
Thanks, y'all! This was pure copypasta so I'll update to include the full word Socrata and the matching secret name like SOCRATA_ENDPOINT. Agree that this would reduce friction when getting started here. 🙏
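One way to cut down on this kind of friction, sketched loosely here, is to have the script fail fast with the names of any missing variables. Only `SOCRATA_ENDPOINT` is named in this thread; the other three variable names are hypothetical placeholders loosely modeled on the `Socrata Key ID, Secret, and Token` entry, and `missing_env_vars` is an illustrative helper, not something in the PR.

```python
import os

# SOCRATA_ENDPOINT comes from the thread; the remaining names are hypothetical.
REQUIRED_ENV_VARS = (
    "SOCRATA_ENDPOINT",
    "SOCRATA_API_KEY_ID",
    "SOCRATA_API_KEY_SECRET",
    "SOCRATA_APP_TOKEN",
)


def missing_env_vars(environ=os.environ, required=REQUIRED_ENV_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```

A check like this would surface a misnamed key immediately instead of leaving the person guessing whether the credentials or the file location is wrong.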
mateoclarke
left a comment
There was a problem hiding this comment.
This is great Mike. I'm less familiar with ETL testing in Moped but I appreciate the patience you've shown with your testing instructions, README, and console messages along the way.
This is probably user error on my part, but the mutation query didn't quite work as expected for me...
After running the mutation, I see 4 records, not 3. And only one of them shows "is_synced_from_ecapris": true. My guess is somehow I messed up the mutation or the state of my snapshot is weird.
Here is the full graphql response after mutation (I've added comments for emphasis):
{
"data": {
"combined_project_funding_view": [
{
"amount": 999,
"fdu": "800J 2507 9719",
"description": null,
"fao_id": null,
"status_name": "Set up",
"is_synced_from_ecapris": false
},
{
"amount": 498563,
"fdu": null,
"description": "800J 2507 9719\t", // <--- This \t is weird.
"fao_id": null,
"status_name": "Tentative",
"is_synced_from_ecapris": false
},
{
"amount": 100000,
"fdu": null,
"description": null,
"fao_id": null,
"status_name": "Tentative",
"is_synced_from_ecapris": false
},
{
"amount": 0,
"fdu": "820B 2507 D382",
"description": "Synced from eCAPRIS",
"fao_id": 130162,
"status_name": "Set up",
"is_synced_from_ecapris": true // <--- y u true?
}
]
}
}
Again, this is probably me being naive and messing up the mutation somehow, but I wanted to share my experience just in case it helps.
Update from 10 minutes later: I just ran a fresh snapshot and it seems better but still a minor discrepancy. My first ProjectFundingTableQuerySyncOn query returns two records, not 3 as you assert. But after I run the mutation, 3 records return "is_synced_from_ecapris": false. I hope this doesn't confuse your testing.
|
|
||
| 1. Ensure the local Moped stack is running with a current snapshot. | ||
| 1. Configure an `env_file` according to the `env_template` example. Find the Socrata (ODP) secrets in the secret store entry called `Socrata Key ID, Secret, and Token`. | ||
| 1. `docker compose build` to build the container. |
There was a problem hiding this comment.
Maybe I'm a dummy, but it would've unstuck me faster if there was a bullet after this like:
1. cd moped-etl/ecapris-funding.
I thought I had my credentials wrong or misnamed or misplaced the env_file
There was a problem hiding this comment.
sorry! i totally overlooked this comment. I'll take another look at this readme when I'm working on cityofaustin/atd-data-tech#25665. 🙏
@mateoclarke thanks for testing and raising the flag on the counts in the steps not matching up. I'll keep this in mind next time around, since the prod Moped data can change over time, which makes fixed counts like this in the steps unreliable. 🙏
Associated issues
Closes cityofaustin/atd-data-tech#24674
This PR adds an ETL script to move eCAPRIS funding records into the Moped DB from the Charlie-enriched ODP funding dataset. The schema is also updated to isolate user-maintained records in moped_proj_funding and ETL-maintained records in a new table called ecapris_subproject_funding.
Last, a new combined_project_funding_view, similar to the existing combined_project_notes_view, brings funding records together to display in the funding view and other places where we need to show the overall funding picture of a project. One difference is that this view only brings in eCAPRIS funding records with FDUs that don't already exist in the user-maintained rows of a project.
This is to preserve existing user data and to prevent duplicates. It also opens the door to "overrides" that are really user records that use eCAPRIS data as a template. In addition, we could display eCAPRIS reference values for records that have an ecapris_funding_id, which ties back to the records in the ecapris_subproject_funding table.
Testing
URL to test:
Local only
Steps to test:
1. /atd-moped/moped-etl/ecapris-funding and follow the readme to get ready to run the script
1. docker compose run ecapris-funding -n to test the dry run mode
1. docker compose run ecapris-funding for the real deal
UI queries for project funding table
UI mutation and query combined view afterwards
UI query for future FDU dropdown
UI query for importing one or many FDUs by eCAPRIS subproject id
Ship list
[ ] Product manager added to QA test script if applicable