Add delivery costs code and data #4

dfsnow · 2025-04-21T06:37:47Z

This PR adds the initial queries, data cleaning, and analysis scripts for a 2025-04 national delivery rates analysis. The PR looks massive, but most of the line additions are actually lockfiles and other setup code. To make review easier, I've commented on sections or files which I think need the most attention.

~~There are still some outstanding plots I need to add to analysis.qmd, but I figure it's better to get the ball rolling on review than to put it off longer.~~

Questions for review

No need to answer these explicitly (although that would be helpful!), just some guiding questions for what kind of feedback would be helpful to me:

- Are the methods sound? Do the filtering, cleaning, and aggregation steps make sense?
- Is the code easy to understand? Is the project structure clear?
- Does the code show an understanding of the subject, i.e., domain expertise?


projects/2025_04_delivery_costs/README.md

projects/2025_04_delivery_costs/ingest.py

projects/2025_04_delivery_costs/analysis.qmd

dfsnow · 2025-04-28T17:56:18Z

projects/2025_04_delivery_costs/README.md

+- **Case rate** - Negotiated dollar amount taken as-is (no transformation)
+- **Percent of total billed charges** - Negotiated percentage is multiplied by
+  the DRG list price
+- **Per diem** - Negotiated dollar amount is multiplied by CMS' geometric
+  mean length of stay (GLOS) for each DRG
+- **Estimated allowed amount** - Used as a fallback if no other rate
+  types exist
+- **Fee schedule** - Negotiated dollar amount taken as-is (no transformation)
+- **Other** - Rate type isn't specified, but the negotiated dollar amount
+  is taken as-is, if it exists.


A lot of the cleaning steps for this project are adapted from techniques used for CLD.

We may want to consider adding the case rate imputation we have in CLD https://cld.turquoise.health/components/imputations/tiers

@mayanajarian You mean this section? If so, I'm definitely happy to replicate it. I tried to make a "lite" version of CLD here just to keep things a little simpler.
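The rate-type conversions in the README diff above can be sketched in Python as follows. This is a minimal illustration, not the code in `ingest.py` or `rates.sql`. The `Rate` class and the methodology label strings (other than `'per diem'`, which appears in `rates.sql`) are hypothetical. It also assumes that negotiated percentages are stored on a 0-100 scale and that an "estimated allowed amount" row, when used as the fallback, contributes its dollar amount as-is.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical methodology labels; only "per diem" is confirmed by rates.sql.
AS_IS_TYPES = {"case rate", "fee schedule", "other"}
FALLBACK_TYPE = "estimated allowed amount"


@dataclass
class Rate:
    methodology: str
    negotiated_dollar: Optional[float] = None
    # Assumption: stored on a 0-100 scale (e.g. 50.0 means 50%).
    negotiated_percentage: Optional[float] = None


def to_dollars(rate: Rate, drg_list_price: float, glos: float) -> Optional[float]:
    """Convert one rate to a per-stay dollar amount, or None if not possible."""
    m = rate.methodology
    if m in AS_IS_TYPES or m == FALLBACK_TYPE:
        # Dollar amount taken as-is (None if it doesn't exist).
        return rate.negotiated_dollar
    if m == "percent of total billed charges":
        if rate.negotiated_percentage is None:
            return None
        return rate.negotiated_percentage / 100 * drg_list_price
    if m == "per diem":
        if rate.negotiated_dollar is None:
            return None
        # Daily rate times CMS geometric mean length of stay for the DRG.
        return rate.negotiated_dollar * glos
    return None


def normalize_group(rates: List[Rate], drg_list_price: float, glos: float) -> List[float]:
    """Normalize all rates for one provider-payer-plan-DRG combination."""
    primary = [r for r in rates if r.methodology != FALLBACK_TYPE]
    # Estimated allowed amounts are only used if no other rate types exist.
    candidates = primary if primary else rates
    dollars = [to_dollars(r, drg_list_price, glos) for r in candidates]
    return [d for d in dollars if d is not None]
```

For example, with a DRG list price of $20,000 and a GLOS of 2.5 days, a $3,000 per diem becomes $7,500 and a 50% of billed charges rate becomes $10,000. An estimated allowed amount in the same group is ignored because other rate types exist.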

projects/2025_04_delivery_costs/analysis.qmd

projects/2025_04_delivery_costs/ingest.py

Accidentally dropping cases where the negated logic returned null
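The bug described above comes from SQL's three-valued logic. If a column is `NULL`, both `col LIKE pattern` and `NOT (col LIKE pattern)` evaluate to `NULL`, and `WHERE` keeps only rows where the predicate is `TRUE`, so negating a filter silently drops `NULL` rows as well. The sketch below reproduces this with Python's built-in `sqlite3`. The `plans` table, `plan_name` column, and `'%exchange%'` pattern are hypothetical stand-ins, since the actual filter in `ingest.py` isn't shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (id INTEGER, plan_name TEXT)")
conn.executemany(
    "INSERT INTO plans VALUES (?, ?)",
    [(1, "Acme PPO"), (2, "Acme Exchange Silver"), (3, None)],
)

# Buggy: for id 3, NOT (NULL LIKE ...) is NULL, so WHERE drops the row.
buggy = conn.execute(
    "SELECT id FROM plans WHERE NOT (plan_name LIKE '%exchange%') ORDER BY id"
).fetchall()

# Fixed: keep NULL plan names explicitly.
fixed = conn.execute(
    "SELECT id FROM plans "
    "WHERE plan_name IS NULL OR NOT (plan_name LIKE '%exchange%') ORDER BY id"
).fetchall()

print(buggy)  # [(1,)]
print(fixed)  # [(1,), (3,)]
```

`COALESCE(plan_name, '')` inside the negated predicate is an equivalent fix. The `NULL` behavior is the same in DuckDB, but note that SQLite's `LIKE` is case-insensitive for ASCII by default while DuckDB's `LIKE` is case-sensitive (`ILIKE` is the case-insensitive form).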

projects/2025_04_delivery_costs/ingest.py

projects/2025_04_delivery_costs/README.md

Missing condition in the WHERE clause removed a subset of valid rates

projects/2025_04_delivery_costs/analysis.qmd

projects/2025_04_delivery_costs/queries/rates.sql

zfx0726

Partial review, looks good so far!

projects/2025_04_delivery_costs/README.md

zfx0726 · 2025-05-01T04:29:13Z

projects/2025_04_delivery_costs/README.md

+
+The following additional data sources are used:
+
+- Policy Reporter data, which is used to weight different payers when


It may be worth mentioning that it's weighting payers by market share / # of covered lives, and why (i.e., to make sure rates are representative of the rates people would tend to see on their hospital bills).

Specifically for Policy Reporter (and other reference datasets we pay for) we should make sure it's OK for us to include here (not sure if you've done that already @dfsnow). I don't anticipate any issues necessarily but it's worth a check. All of the other reference data below is public so shouldn't be an issue.

Good thinking @mayanajarian. I started an ask about this in Slack here.
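As a concrete illustration of the covered-lives weighting suggested above, here is a minimal sketch of a covered-lives-weighted mean rate across payers. It is not the project's weighting code: the README snippet is truncated, and the PR may use a different statistic (e.g. a weighted median). The payer names and numbers are made up, and skipping payers with no covered-lives data is an assumption.

```python
from typing import Dict


def weighted_rate(rates_by_payer: Dict[str, float], covered_lives: Dict[str, float]) -> float:
    """Covered-lives-weighted mean of per-payer rates for one provider-DRG."""
    # Assumption: payers without (positive) covered-lives data are dropped.
    payers = [p for p in rates_by_payer if covered_lives.get(p, 0) > 0]
    if not payers:
        raise ValueError("no payer has positive covered lives")
    total_lives = sum(covered_lives[p] for p in payers)
    return sum(rates_by_payer[p] * covered_lives[p] for p in payers) / total_lives


rates = {"Payer A": 10000.0, "Payer B": 20000.0}
lives = {"Payer A": 3000, "Payer B": 1000}
print(weighted_rate(rates, lives))  # 12500.0
```

The unweighted mean here would be $15,000; weighting pulls the result toward the payer that covers more people, which is the representativeness argument made in the comment.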

projects/2025_04_delivery_costs/README.md

projects/2025_04_delivery_costs/analysis.qmd

zfx0726 · 2025-05-05T15:49:59Z

projects/2025_04_delivery_costs/queries/rates.sql

+    hp.hq_longitude AS lon,
+    hp.hq_latitude AS lat,
+    cmsq.hospital_overall_rating AS star_rating
+FROM glue.hospital_data.hospital_rates AS hr


It might be good to lock these in to a specific date eventually, i.e., once we have a schema for the historical, dated version of this data. Or just mention the as-of date.

projects/2025_04_delivery_costs/queries/rates.sql

zfx0726 · 2025-05-05T16:14:45Z

projects/2025_04_delivery_costs/queries/rates.sql

+        -- per diem rates at 3x the Medicare "day rate" for the same DRG
+        WHEN hr.contract_methodology = 'per diem'
+            AND (
+                hr.negotiated_dollar < (hr.medicare_rate / drg.glos) * 3


This assumes that these rates are at most 300% of Medicare, right? I'm not sure that's a fair assumption, given the RAND study finding that commercial rates average 255% of Medicare prices (so presumably a decent proportion end up above 300%). That's based on claims data.

https://www.rand.org/health-care/projects/hospital-pricing.html

Chatted about this offline: the goal here is to split out case rates that are incorrectly labeled as per diem, so it's a delicate balance between false positives and false negatives. I came up with 300% by eyeballing histograms of the per diem rates and Medicare rates, but I'm definitely open to a better/more empirical boundary.
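The per diem filter discussed in this thread can be sketched as a single threshold check. This simplified version ignores the rest of the (truncated) `CASE` expression in `rates.sql` and assumes that rows failing the check are treated as mislabeled case rates, i.e. taken as-is instead of being multiplied by GLOS. The function names are hypothetical. Exposing the multiplier as a parameter makes it easy to test how sensitive results are to the 300% cutoff, e.g. against the 255% RAND average raised above.

```python
def looks_like_per_diem(negotiated_dollar: float, medicare_rate: float, glos: float,
                        multiplier: float = 3.0) -> bool:
    """True if the rate is plausibly a daily rate: below multiplier x Medicare's day rate."""
    medicare_day_rate = medicare_rate / glos
    return negotiated_dollar < medicare_day_rate * multiplier


def per_diem_stay_amount(negotiated_dollar: float, medicare_rate: float, glos: float,
                         multiplier: float = 3.0) -> float:
    """Per-stay dollar amount for a rate labeled 'per diem'."""
    if looks_like_per_diem(negotiated_dollar, medicare_rate, glos, multiplier):
        return negotiated_dollar * glos  # genuine per diem: scale by GLOS
    return negotiated_dollar  # assumed mislabeled case rate: take as-is
```

With a $12,000 Medicare rate and a 3-day GLOS, the Medicare day rate is $4,000 and the cutoff is $12,000: a $2,500 "per diem" becomes a $7,500 stay, while a $15,000 "per diem" is treated as a case rate.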

projects/2025_04_delivery_costs/queries/rates.sql

mayanajarian · 2025-05-05T21:58:29Z

projects/2025_04_delivery_costs/README.md

+
+The following additional data sources are used:
+
+- Policy Reporter data, which is used to weight different payers when


Specifically for Policy Reporter (and other reference datasets we pay for) we should make sure it's OK for us to include here (not sure if you've done that already @dfsnow). I don't anticipate any issues necessarily but it's worth a check. All of the other reference data below is public so shouldn't be an issue.

mayanajarian · 2025-05-05T22:01:59Z

projects/2025_04_delivery_costs/README.md

+the following logic applies:
+
+- If the provider has a PPO *and* an HMO plan with the same payer, take the
+  median of all PPO/HMO rates (drop the rest).


+1, CLD prioritizes PPO plans. I think it's reasonable to do that here. You could do a hierarchy like: PPO, HMO, everything else.
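The PPO, then HMO, then everything else hierarchy suggested above could look like the sketch below, which takes the median of the highest-priority plan type present for one provider-payer-DRG. This is the reviewer's proposal, not the README's current rule (which pools PPO and HMO rates together when both exist), and the `(plan_type, rate)` tuple layout is a hypothetical simplification.

```python
from statistics import median
from typing import List, Tuple

# Highest priority first; anything else is the final fallback tier.
PLAN_PRIORITY = ("PPO", "HMO")


def select_plan_rate(rates: List[Tuple[str, float]]) -> float:
    """Median rate of the highest-priority plan type present."""
    for plan_type in PLAN_PRIORITY:
        tier = [rate for ptype, rate in rates if ptype == plan_type]
        if tier:
            return median(tier)
    return median([rate for _, rate in rates])
```

For `[("HMO", 9000.0), ("PPO", 10000.0), ("PPO", 12000.0), ("EPO", 5000.0)]` only the PPO rates are used, giving 11000.0. Treating PPO and HMO as a single first tier would be closer to the README's current rule.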

projects/2025_04_delivery_costs/README.md

mayanajarian · 2025-05-05T22:04:33Z

projects/2025_04_delivery_costs/README.md

+- **Case rate** - Negotiated dollar amount taken as-is (no transformation)
+- **Percent of total billed charges** - Negotiated percentage is multiplied by
+  the DRG list price
+- **Per diem** - Negotiated dollar amount is multiplied by CMS' geometric
+  mean length of stay (GLOS) for each DRG
+- **Estimated allowed amount** - Used as a fallback if no other rate
+  types exist
+- **Fee schedule** - Negotiated dollar amount taken as-is (no transformation)
+- **Other** - Rate type isn't specified, but the negotiated dollar amount
+  is taken as-is, if it exists.


We may want to consider adding the case rate imputation we have in CLD https://cld.turquoise.health/components/imputations/tiers

mayanajarian · 2025-05-05T22:07:58Z

projects/2025_04_delivery_costs/README.md

+
+1. Collapse negotiated rates across all revenue codes associated
+   with a provider-payer-plan-DRG combination, taking the mean of only
+   `NULL` revenue code rates first (if there are any).


I'd want to get sign-off that this approach makes sense: specifically, taking the mean of only NULL rev code rates as the first step.

@mayanajarian For what it's worth, I snagged this approach from CLD.
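The revenue-code step quoted above (mean of `NULL`-revenue-code rates first, falling back to all rates) is small enough to sketch directly. This assumes the rates for one provider-payer-plan-DRG group arrive as `(revenue_code, rate)` pairs with `None` for a missing code; the revenue codes in the example are illustrative only.

```python
from statistics import mean
from typing import List, Optional, Tuple


def collapse_revenue_codes(rates: List[Tuple[Optional[str], float]]) -> float:
    """Collapse one provider-payer-plan-DRG group to a single rate."""
    null_code_rates = [rate for code, rate in rates if code is None]
    # Prefer rates with no revenue code; otherwise average across all codes.
    pool = null_code_rates if null_code_rates else [rate for _, rate in rates]
    return mean(pool)
```

For `[(None, 10000.0), ("0720", 14000.0), ("0722", 8000.0)]` this returns 10000.0, and without the `None` row it returns 11000.0. One consequence worth weighing during sign-off: a single `NULL`-code rate overrides any number of code-specific rates.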

projects/2025_04_delivery_costs/ingest.py

mayanajarian · 2025-05-05T22:24:17Z

projects/2025_04_delivery_costs/ingest.py

+    )
+)
+
+# Drop rates related to exchange and indemnity plans, per Arian


nit: remove "per Arian"

projects/2025_04_delivery_costs/queries/rates.sql

mayanajarian · 2025-05-05T22:28:20Z

projects/2025_04_delivery_costs/queries/rates.sql

+    SELECT DISTINCT
+        cbsa,
+        npi
+    FROM redshift.reference.provider_demographics


Customers may not have access to this exact table in this location - would be good to confirm that 1) they have access and 2) what the path is.

@mayanajarian Got it. Is there an easy way to find a list of customer-facing tables and their paths? I don't see anything in Notion/Zendesk. I imagine some of the other tables in this query have the same issue (hive.labps.quality_cms_hospital_ratings_v0).

dfsnow added 25 commits April 8, 2025 12:41

Copy project template to delivery costs project

4c28b2e

Update deps

df21e39

Add initial provider pulls

20a9877

Add cleanup steps to ingest and query

40e5f26

Finalize rates pull

942ea13

Add clean hospital rates data with Census additions

7b5b115

Update deps, drop pandas, add duck

f0fa130

Finalize ingest script with travel times, per diem filter

475a081

Add initial R files

10cac30

Ignore .Rhistory files

c9d854f

Add initial plots

85ae064

Add Census insurance rates to ingest

f2a752f

Remove outliers using Rate Search methods

6ffa412

https://turquoise-co.slack.com/archives/C04PT9GNJ4A/p1744839291339909

Fix truncation when using untyped constants

a8e9934

Revert "Fix truncation when using untyped constants"

442992d

This reverts commit a8e9934.

Fix type issue for negotiated percentages

1b12539

Add APR-DRGs to rates sample

4fb2d69

Add zip and fertility plots

f02dc56

Fix payer stats join key

e9820bc

Ignore HTML and Quarto files

f80fec9

Snapshot all R dependencies

930da2b

Pull all rates, not just most common DRGs

a8bed42

Add maps, boxplots, fertility scatter

3fe875c

Merge branch 'main' into dfsnow/delivery-costs

9ea3e54

Drop analysis.py

802746c

dfsnow requested a review from Copilot April 21, 2025 06:42


dfsnow added 3 commits April 22, 2025 10:52

Add ridgeline plots

503e998

Add ridgeline for severity/DRG

11916c4

Update ZIP weighting function and boxplot

e212c68

Add README

7f1f545

turquoisehealth deleted a comment from Copilot AI Apr 28, 2025

dfsnow commented Apr 28, 2025


dfsnow assigned mayanajarian Apr 28, 2025

dfsnow marked this pull request as ready for review April 28, 2025 18:21

mayanajarian requested review from zfx0726 and mayanajarian April 28, 2025 22:55

Fix regex / filter logic

0346e55

Accidentally dropping cases where the negated logic returned null

dfsnow commented Apr 28, 2025

projects/2025_04_delivery_costs/ingest.py (resolved)

dfsnow commented Apr 29, 2025

projects/2025_04_delivery_costs/README.md (outdated, resolved)

dfsnow added 3 commits April 29, 2025 12:14

Add MMR and NCHS plots

9a4e610

Style and lint file

1dd6051

Fix SQL error and update analysis/plots

209a30d

Missing condition in the WHERE clause removed a subset of valid rates

dfsnow commented Apr 30, 2025

projects/2025_04_delivery_costs/analysis.qmd (outdated, resolved)

dfsnow commented Apr 30, 2025

projects/2025_04_delivery_costs/queries/rates.sql (resolved)

Update R dependencies

0abeffc

zfx0726 reviewed May 1, 2025

projects/2025_04_delivery_costs/README.md (resolved)

zfx0726 reviewed May 5, 2025


Tweak plot titles, settings

6c62084

mayanajarian reviewed May 5, 2025


dfsnow added 7 commits May 5, 2025 20:21

Update README with PR suggestions

5a509f1

Fix PR nits

12ad28a

Update PR citations

fc29633

Drop negotiated rate max value

138bf1c

Drop unused datasets and Orlando

5bd93f7

Add data link

f6a0ae6

Update legend titles

9b3c79f

dfsnow merged commit d825372 into main May 7, 2025
5 checks passed

dfsnow deleted the dfsnow/delivery-costs branch May 7, 2025 14:03

