Create ETL to sync FDUs from eCAPRIS EDP dataset to Moped DB #1716

Merged: mddilley merged 42 commits into mike/24593_fund_sync_db from mike/24674_fund_sync_etl on Nov 20, 2025

@mddilley mddilley commented Nov 12, 2025

Associated issues

Closes cityofaustin/atd-data-tech#24674

This PR adds an ETL script to move eCAPRIS funding records into the Moped DB from the Charlie-enriched ODP funding dataset. The schema is also updated to isolate user-maintained records in moped_proj_funding and ETL-maintained records in a new table called ecapris_subproject_funding.

Last, a new combined_project_funding_view, similar to the existing combined_project_notes_view, brings funding records together for display in the funding view and any other places where we need to show the overall funding picture of a project. One difference is that this view only brings in eCAPRIS funding records with FDUs that don't already exist in the user-maintained rows of a project.

This preserves existing user data and prevents duplicates. It also opens the door to "overrides", which are really user records that use eCAPRIS data as a template. In addition, we could display eCAPRIS reference values for records that have an ecapris_funding_id, which ties back to the records in the ecapris_subproject_funding table.
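To make the de-duplication rule concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the view definition from this PR: the pared-down schema, the view name combined_funding_sketch, and the join through moped_project are assumptions for illustration. Only the table names, the FDU-based NOT EXISTS idea, and the sample values (project 517, subproject "13223.018", FDU "800J 2507 9719") come from the description and test steps.

```python
import sqlite3

# Hypothetical, simplified schema. Columns are assumptions except for the
# names mentioned in the PR (fdu, project_id, ecapris_subproject_id, is_deleted).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE moped_project (
        project_id INTEGER PRIMARY KEY,
        ecapris_subproject_id TEXT
    );
    CREATE TABLE moped_proj_funding (
        proj_funding_id INTEGER PRIMARY KEY,
        project_id INTEGER,
        fdu TEXT,
        funding_amount INTEGER,
        is_deleted INTEGER DEFAULT 0
    );
    CREATE TABLE ecapris_subproject_funding (
        id INTEGER PRIMARY KEY,
        ecapris_subproject_id TEXT,
        fdu TEXT,
        amount INTEGER
    );
    CREATE VIEW combined_funding_sketch AS
    -- User-maintained rows always appear
    SELECT f.project_id, f.fdu, f.funding_amount AS amount,
           0 AS is_synced_from_ecapris
    FROM moped_proj_funding f
    WHERE f.is_deleted = 0
    UNION ALL
    -- eCAPRIS rows appear only when no user row on the project has that FDU
    SELECT p.project_id, e.fdu, e.amount, 1 AS is_synced_from_ecapris
    FROM ecapris_subproject_funding e
    JOIN moped_project p ON p.ecapris_subproject_id = e.ecapris_subproject_id
    WHERE NOT EXISTS (
        SELECT 1 FROM moped_proj_funding f
        WHERE f.project_id = p.project_id
          AND f.fdu = e.fdu
          AND f.is_deleted = 0
    );
    """
)

conn.execute("INSERT INTO moped_project VALUES (517, '13223.018')")
conn.execute(
    "INSERT INTO ecapris_subproject_funding VALUES (1, '13223.018', '800J 2507 9719', 0)"
)


def synced_flags(connection):
    """Return (fdu, is_synced_from_ecapris) pairs from the sketch view."""
    return connection.execute(
        "SELECT fdu, is_synced_from_ecapris FROM combined_funding_sketch ORDER BY fdu"
    ).fetchall()


# Before any user row exists, the FDU is supplied by the eCAPRIS table.
before = synced_flags(conn)

# A user-maintained row with the same FDU takes over and hides the synced row.
conn.execute(
    "INSERT INTO moped_proj_funding (project_id, fdu, funding_amount) "
    "VALUES (517, '800J 2507 9719', 999)"
)
after = synced_flags(conn)
print(before, after)
```

The second half mirrors the mutation test below in miniature: once a user row carries the same FDU, the eCAPRIS row drops out of the combined result instead of appearing as a duplicate.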

Testing

URL to test:

Local only

Steps to test:

  1. Start your local stack from a production snapshot
  2. Find your way to /atd-moped/moped-etl/ecapris-funding and follow the readme to get ready to run the script
  3. Run docker compose run ecapris-funding -n to test the dry run mode
  4. Then, run docker compose run ecapris-funding for the real deal
  5. Open the Hasura console (or API client of choice) to step through the following queries

UI queries for project funding table

-- sync off (displays only user-maintained records)
query ProjectFundingTableQuerySyncOff {
  combined_project_funding_view(where: {project_id: {_eq: 517}}) {
    amount
    fdu
    description
    fao_id
    status_name
    is_synced_from_ecapris
  }
}

-- sync on (supplements funding with any additional FDUs from the eCAPRIS dataset without duplicates)
query ProjectFundingTableQuerySyncOn {
  combined_project_funding_view(where: {_or: [{ecapris_subproject_id: {_eq: "13223.018"}}, {project_id: {_eq: 517}}]}) {
    amount
    fdu
    description
    fao_id
    status_name
    is_synced_from_ecapris
  }
}

UI mutation and query combined view afterwards

-- test this query again and note there are 3 results with 1 synced from eCAPRIS
-- (two out of three records have is_synced_from_ecapris = false)
query ProjectFundingTableQuerySyncOn {
  combined_project_funding_view(where: {_or: [{ecapris_subproject_id: {_eq: "13223.018"}}, {project_id: {_eq: 517}}]}) {
    amount
    fdu
    description
    fao_id
    status_name
    is_synced_from_ecapris
  }
}

-- use this mutation to insert a duplicate FDU with the query variables right below
mutation AddUserMaintainedFDU($objects: [moped_proj_funding_insert_input!]!) {
  insert_moped_proj_funding(objects: $objects)
  {
    returning {
      proj_funding_id
    }
  }
}

-- query variable with duplicate record payload
{
  "objects": [
    {
      "project_id": 517,
      "funding_amount": 999,
      "ecapris_funding_id": 130268,
      "fdu": "800J 2507 9719",
      "funding_status_id": 5
    }
  ]
}

-- now, remove the query variables and repeat the first query and note that 
-- all three records show is_synced_from_ecapris = false

UI query for future FDU dropdown

-- fetch all FDUs and their unit long name
query ProjectFundingTableFDUDropdown {
  ecapris_subproject_funding {
    id
    fdu
    unit_long_name
  }
}

UI query for importing one or many FDUs by eCAPRIS subproject id

-- fetch FDUs by eCAPRIS subproject id
query ProjectFundingTableFDUImport {
  ecapris_subproject_funding(where:{ecapris_subproject_id: {_eq: "13223.018"}}) {
    id
    fdu
    unit_long_name
  }
}

Ship list

mddilley and others added 30 commits October 30, 2025 17:55
updated_by_user_id: x-hasura-user-db-id
columns:
- dept_unit
- ecapris_funding_id
Collaborator Author

@mddilley mddilley Nov 13, 2025

This is the same column as fao_id in the eCAPRIS data and will be populated with the unique id that we can trace back to a specific FDU's data if needed. I'm torn about the rename but fao_id is hard to parse imo. I'd love thoughts!

Member

what does fao stand for?

Collaborator Author

I don't know what fao stands for but can ask. FSD just let us know that this is the unique id for FDUs in their DB. What I do know is that it always makes me think of FAO Schwarz 🧸 😆

Member

I was just curious. I also think of FAO Schwarz 🚂

- funding_source_id
- funding_status_id
- is_deleted
- is_editable
Collaborator Author

No longer needed. All records in the moped_proj_funding table should be editable just like they are today since we aren't mixing with immutable eCAPRIS data in the same table.

Comment on lines +16 to +17
-- Index fdu since we'll be querying by it to avoid duplicates in the combined view (using NOT EXISTS)
CREATE INDEX idx_moped_proj_funding_fdu_not_deleted
Collaborator Author

I've added many indexes after pairing with Claude on them. Let me know if this jumps out as over-indexing to anyone, but I included comments on how I expect each one to benefit the queries and views needed for the UI.

I also speedran the potential performance impacts we could see on project_list_view since we aggregate funding information in it. Here are those details.

Member

Nice! these look great to me!


-- Create a combined_project_funding_view for the project funding UI to consume. This view combines
-- both moped_proj_funding and ecapris_subproject_funding data and removes duplicates based on FDU.
CREATE OR REPLACE VIEW combined_project_funding_view AS
Collaborator Author

This view follows the same pattern that we used for notes to bring notes together in the UI.

Comment on lines +139 to +140
-- Disable Hasura triggers temporarily to allow direct updates to moped_proj_funding without generating activity log entries
DO $$
Collaborator Author

I don't love this block to enable/disable triggers, but we've been looking for a way to do this besides SET session_replication_role = replica;/SET session_replication_role = default; since we ran into permissions issues with the moped_admin user.

I used the following SQL on the read replica to verify that moped_admin can operate on these triggers when applying the migration but please double-check me. 🙏

SELECT tableowner FROM pg_tables WHERE schemaname = 'public' AND tablename = 'moped_proj_funding';

The IF/ELSE logic is needed because I realized these triggers only exist when replicating from a production snapshot, and ./hasura-cluster start breaks without the checks. We could avoid this if DISABLE TRIGGER IF EXISTS was a Postgres thing but it isn't unfortunately. 🫠
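For readers who haven't opened the migration, here is a rough sketch of what an existence-guarded trigger toggle can look like, written as a small Python helper that renders the SQL. This is not the block from the migration: the helper name guarded_trigger_sql and the trigger name notify_hasura_example_UPDATE are hypothetical. The SQL relies only on the pg_trigger catalog and ALTER TABLE ... DISABLE/ENABLE TRIGGER, which requires owning the table (hence the tableowner check above).

```python
def guarded_trigger_sql(table: str, trigger: str, action: str) -> str:
    """Render a DO block that toggles a trigger only if it exists.

    Names are interpolated directly, so pass trusted constants only.
    """
    if action not in ("ENABLE", "DISABLE"):
        raise ValueError("action must be ENABLE or DISABLE")
    return f"""
DO $$
BEGIN
    IF EXISTS (
        SELECT 1 FROM pg_trigger
        WHERE tgname = '{trigger}'
          AND tgrelid = '{table}'::regclass
    ) THEN
        ALTER TABLE {table} {action} TRIGGER "{trigger}";
    ELSE
        RAISE NOTICE 'Trigger {trigger} not found on {table}; skipping';
    END IF;
END
$$;
""".strip()


# Hypothetical trigger name, for illustration only
disable_sql = guarded_trigger_sql(
    "public.moped_proj_funding", "notify_hasura_example_UPDATE", "DISABLE"
)
enable_sql = guarded_trigger_sql(
    "public.moped_proj_funding", "notify_hasura_example_UPDATE", "ENABLE"
)
print(disable_sql)
```

The ELSE branch only raises a notice, so a fresh local database without the Hasura event triggers can still apply the migration.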

Member

Looks great to me—thanks for working around this and establishing a new pattern 🚀

I guess it's always worth double-checking—are you sure we don't want to have these changes in the activity log?

Collaborator Author

@mddilley mddilley Nov 14, 2025

It is a good question, and I did overlook the need for some initial activity for the activity log to show that sync is switched on when we roll this out cityofaustin/atd-data-tech#24689 🙏

I'm going to take another look at these since I was distracted with preventing clutter and reshuffling the project list view. Thanks!

Collaborator Author

This helped me catch a bug with the update to switch should_sync_ecapris_funding to true for all projects with subproject ids. Since the metadata change needed to make the Hasura event trigger work applies after this migration, the statement below in the migration in this PR doesn't generate activity rows. I need to think about it more next week. 🤔

-- Switch on sync for projects with ecapris_subproject_id set
UPDATE moped_project SET should_sync_ecapris_funding = TRUE
WHERE ecapris_subproject_id IS NOT NULL;

Collaborator Author

@mddilley mddilley Nov 13, 2025

After splitting eCAPRIS synced records into their own table, this script became more about creating a cached layer of all records available in the upstream dataset. This gives us the benefit of not needing to make Socrata API calls from the UI at all, but I am curious if anyone sees downsides to this.

FSD told us earlier this year that the eCAPRIS funding data that we are consuming is up-to-date as of EOB of the previous day. So, this ETL will only need to run once per day and after Charlie's FDU tagging ETL. Moped users will always have the latest and greatest at the start of the day.

Member

Very cool. Love the short and sweet ETL code 🚀

Comment on lines +19 to +22
1. Dry run the script via:
```bash
docker compose run ecapris-funding -n
```
Collaborator Author

h/t @frankhereford for pushing for dry run modes and motivating me to include one here. The next step would be to parameterize it in the DAG.
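As a rough illustration of the dry run pattern, here is a minimal, self-contained sketch. It is not the ETL script from this PR: the chunks and upsert_chunk helpers, the chunk size, the --dry-run long option, and the sample records are assumptions. Only the -n flag, args.dry_run, and the [DRY RUN] log message come from the PR.

```python
import argparse
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ecapris_funding_sketch")


def chunks(records, size):
    """Yield successive slices of `records` with at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start : start + size]


def upsert_chunk(chunk_payload):
    """Hypothetical placeholder for the real write to the Moped DB."""
    logger.info(f"Upserting chunk of {len(chunk_payload)} funding records...")


def run(records, dry_run, chunk_size=2):
    """Process records in chunks and return how many were actually written."""
    written = 0
    for chunk_payload in chunks(records, chunk_size):
        if dry_run:
            logger.info(
                f"[DRY RUN] Would upsert chunk of {len(chunk_payload)} funding records into Moped DB..."
            )
            continue
        upsert_chunk(chunk_payload)
        written += len(chunk_payload)
    return written


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="eCAPRIS funding sync (sketch)")
    # -n matches the flag in the test steps; the long option name is an assumption
    parser.add_argument("-n", "--dry-run", action="store_true")
    return parser.parse_args(argv)


# Sample records are made up apart from the two FDUs quoted in this PR
sample = [{"fdu": "800J 2507 9719"}, {"fdu": "820B 2507 D382"}, {"fdu": "example"}]
args = parse_args(["-n"])
written = run(sample, dry_run=args.dry_run)
```

Passing the flag through as a plain boolean keeps the write path and the dry run path in the same loop, which should also make it straightforward to drive from a DAG parameter later.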

Member

@johnclary johnclary left a comment

This looks great to me. Very exciting to see funding get the ecapris sync treatment. Thanks for excellent test instructions and documentation 🚀 🚢

@mddilley
Collaborator Author

Hey all, late last week I caught a bug in the order in which we need to update the Hasura metadata; it affects our ability to generate activity log entries showing that funding sync is turned on for projects. More here but TLDR: I split out launch day updates into their own issue and future migration.

Member

@chiaberry chiaberry left a comment

Nice refactor and I think this will give us fewer headaches in the future!

Side note, I cannot figure out why, but every time I ran this I'd end up with an orphan container.

if args.dry_run:
    logger.info(
        f"[DRY RUN] Would upsert chunk of {len(chunk_payload)} funding records into Moped DB..."
Member

i like the [DRY RUN] part of the logs a lot, hard to miss!

Collaborator Author

awesome - h/t @frankhereford for starting this pattern!

## Testing the script locally using Docker Compose

1. Ensure the local Moped stack is running with a current snapshot.
1. Configure an `env_file` according to the `env_template` example. Find the Socrata (ODP) secrets in the secret store entry called `Socrata Key ID, Secret, and Token`.
Member

not sure if this is the place to put it, but I had to go searching for the mapping between the secrets in the 1pw entry and the env_file, since the names of the variables don't match the entries. I had to look in airflow to remind me what goes where

Contributor

Hard agree here. Could the env_template values align with our keys in the pw manager?

Collaborator Author

Thanks, y'all! This was pure copypasta so I'll update to include the full word Socrata and the matching secret name like SOCRATA_ENDPOINT. Agree that this would reduce friction when getting started here. 🙏

Contributor

@mateoclarke mateoclarke left a comment

This is great Mike. I'm less familiar with ETL testing in Moped but I appreciate the patience you've shown with your testing instructions, README, and console messages along the way.

This is probably user error on my part, but the mutation query didn't quite work as expected for me...

After running the mutation, I see 4 records, not 3. And only one of them shows "is_synced_from_ecapris": true. My guess is somehow I messed up the mutation or the state of my snapshot is weird.

Here is the full graphql response after mutation (I've added comments for emphasis):

{
  "data": {
    "combined_project_funding_view": [
      {
        "amount": 999,
        "fdu": "800J 2507 9719",
        "description": null,
        "fao_id": null,
        "status_name": "Set up",
        "is_synced_from_ecapris": false
      },
      {
        "amount": 498563,
        "fdu": null,
        "description": "800J 2507 9719\t", // <--- This \t is weird.
        "fao_id": null,
        "status_name": "Tentative",
        "is_synced_from_ecapris": false
      },
      {
        "amount": 100000,
        "fdu": null,
        "description": null,
        "fao_id": null,
        "status_name": "Tentative",
        "is_synced_from_ecapris": false
      },
      {
        "amount": 0,
        "fdu": "820B 2507 D382",
        "description": "Synced from eCAPRIS",
        "fao_id": 130162,
        "status_name": "Set up",
        "is_synced_from_ecapris": true // <--- y u true?
      }
    ]
  }
}

Again, this is probably me being naive and messing up the mutation somehow, but I wanted to share my experience just in case it helps.

Update from 10 minutes later: I just ran a fresh snapshot and it seems better but still a minor discrepancy. My first ProjectFundingTableQuerySyncOn query returns two records, not 3 as you assert. But after I run the mutation, 3 records return "is_synced_from_ecapris": false. I hope this doesn't confuse your testing.


1. Ensure the local Moped stack is running with a current snapshot.
1. Configure an `env_file` according to the `env_template` example. Find the Socrata (ODP) secrets in the secret store entry called `Socrata Key ID, Secret, and Token`.
1. `docker compose build` to build the container.
Contributor

Maybe I'm a dummy, but it would've unstuck me faster if there was a bullet after this like:
1. cd moped-etl/ecapris-funding.

I thought I had my credentials wrong or misnamed or misplaced the env_file

Collaborator Author

sorry! i totally overlooked this comment. I'll take another look at this readme when I'm working on cityofaustin/atd-data-tech#25665. 🙏

@mddilley
Collaborator Author

@mateoclarke thanks for testing and raising the flag on the counts in the steps not matching up. I'll keep this in mind next time around, since the prod Moped data changes over time, which makes fixed counts like these in the test steps unreliable. 🙏

@mddilley mddilley merged commit 7ff21e9 into mike/24593_fund_sync_db Nov 20, 2025
@mddilley mddilley deleted the mike/24674_fund_sync_etl branch November 20, 2025 16:51