Add delivery costs code and data #4

Merged
merged 51 commits into from
May 7, 2025

Conversation

@dfsnow (Member) commented Apr 21, 2025

This PR adds the initial queries, data cleaning, and analysis scripts for a 2025-04 national delivery rates analysis. The PR looks massive, but most of the line additions are actually lockfiles and other setup code. To make review easier, I've commented on sections or files which I think need the most attention.

There are still some outstanding plots I need to add to analysis.qmd, but I figure it's better to get the ball rolling on review than to put it off longer.

Questions for review

No need to answer these explicitly (although that would be helpful!). These are just guiding questions for the kind of feedback that would be most useful to me:

  • Are the methods sound? Do the filtering, cleaning, and aggregation steps make sense?
  • Is the code easy to understand? Is the project structure clear?
  • Does the code show an understanding of the subject, i.e., domain expertise?

@dfsnow dfsnow requested a review from Copilot April 21, 2025 06:42
Copilot

This comment was marked as resolved.

@turquoisehealth turquoisehealth deleted a comment from Copilot AI Apr 28, 2025
Comment on lines +133 to +142
- **Case rate** - Negotiated dollar amount taken as-is (no transformation)
- **Percent of total billed charges** - Negotiated percentage is multiplied by
the DRG list price
- **Per diem** - Negotiated dollar amount is multiplied by CMS' geometric
mean length of stay (GLOS) for each DRG
- **Estimated allowed amount** - Used as a fallback if no other rate
types exist
- **Fee schedule** - Negotiated dollar amount taken as-is (no transformation)
- **Other** - Rate type isn't specified, but the negotiated dollar amount
is taken as-is, if it exists.
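
For illustration only, the rate-type transformations listed above could be sketched in Python as follows. This is not the PR's code: the function names, the lowercase rate-type labels, and the assumption that percentages are stored on a 0-100 scale are all hypothetical.

```python
from typing import Optional


def normalize_rate(
    rate_type: str,
    negotiated_dollar: Optional[float] = None,
    negotiated_percentage: Optional[float] = None,
    drg_list_price: Optional[float] = None,
    glos: Optional[float] = None,
) -> Optional[float]:
    """Convert one negotiated rate into a per-stay dollar amount, or None."""
    # Case rate, fee schedule, and "other": dollar amount taken as-is.
    if rate_type in ("case rate", "fee schedule", "other"):
        return negotiated_dollar
    # Percent of total billed charges: percentage times the DRG list price.
    if rate_type == "percent of total billed charges":
        if negotiated_percentage is None or drg_list_price is None:
            return None
        # Assumption: percentage is on a 0-100 scale, not 0-1.
        return negotiated_percentage / 100.0 * drg_list_price
    # Per diem: daily dollar amount times CMS' geometric mean length of stay.
    if rate_type == "per diem":
        if negotiated_dollar is None or glos is None:
            return None
        return negotiated_dollar * glos
    return None


def rate_or_fallback(
    normalized: Optional[float], estimated_allowed_amount: Optional[float]
) -> Optional[float]:
    """Use the estimated allowed amount only when no other rate exists."""
    return normalized if normalized is not None else estimated_allowed_amount
```

For example, a per diem of 2000 with a GLOS of 2.5 days would normalize to 5000 per stay under these assumptions.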
Member Author

A lot of the cleaning steps for this project are adapted from techniques used for CLD.

We may want to consider adding the case rate imputation we have in CLD https://cld.turquoise.health/components/imputations/tiers

Member Author

@mayanajarian You mean this section? If so, definitely happy to replicate it. I tried to make a "lite" version of CLD here just to keep things a little simpler.

@dfsnow dfsnow marked this pull request as ready for review April 28, 2025 18:21
Accidentally dropping cases where the negated logic returned null
dfsnow added 3 commits April 29, 2025 12:14
Missing condition in the WHERE clause removed a subset of valid rates
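
The commit message about negated logic returning null refers to a general SQL pitfall: under three-valued logic, `NOT (col = 'x')` evaluates to NULL when `col` is NULL, and a `WHERE` clause drops such rows. The affected query is not shown in this conversation, so the sketch below uses a hypothetical `rates` table and `plan_type` column, with SQLite standing in for the warehouse engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (id INTEGER, plan_type TEXT)")
conn.executemany(
    "INSERT INTO rates VALUES (?, ?)",
    [(1, "PPO"), (2, "EXCHANGE"), (3, None)],
)

# Naive negation: the row with a NULL plan_type is silently dropped,
# because NOT (NULL = 'EXCHANGE') is NULL rather than TRUE.
naive = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM rates WHERE NOT (plan_type = 'EXCHANGE') ORDER BY id"
    )
]

# Explicit NULL handling keeps the row whose plan_type is unknown.
safe = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM rates "
        "WHERE plan_type IS NULL OR plan_type <> 'EXCHANGE' ORDER BY id"
    )
]

print(naive)  # [1]
print(safe)   # [1, 3]
```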
@zfx0726 left a comment

Partial review, looks good so far!


The following additional data sources are used:

- Policy Reporter data, which is used to weight different payers when

May be worth mentioning that it's weighting payers by market share / # of covered lives, and why (i.e., to make sure rates are representative of the rates that people would tend to see on their hospital bills).
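
As a rough illustration of weighting payers by covered lives, a weighted mean could look like the sketch below. Whether the analysis uses a weighted mean, a weighted median, or something else is not stated here, and the function and argument names are hypothetical.

```python
from typing import Dict, Optional


def weighted_mean_rate(
    rates_by_payer: Dict[str, float],
    covered_lives_by_payer: Dict[str, float],
) -> Optional[float]:
    """Average payer rates, weighting each payer by its covered lives."""
    total = 0.0
    weight_sum = 0.0
    for payer, rate in rates_by_payer.items():
        weight = covered_lives_by_payer.get(payer)
        # Skip payers with no usable market-share weight.
        if weight is None or weight <= 0:
            continue
        total += rate * weight
        weight_sum += weight
    return total / weight_sum if weight_sum > 0 else None
```

With rates of 10000 and 20000 and covered lives in a 3:1 ratio, the weighted mean is 12500 rather than the unweighted 15000, which pulls the result toward the payer more patients are likely to have.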

Specifically for Policy Reporter (and other reference datasets we pay for) we should make sure it's OK for us to include here (not sure if you've done that already @dfsnow). I don't anticipate any issues necessarily but it's worth a check. All of the other reference data below is public so shouldn't be an issue.

Member Author

Good thinking @mayanajarian. I started an ask about this in Slack here.

hp.hq_longitude AS lon,
hp.hq_latitude AS lat,
cmsq.hospital_overall_rating AS star_rating
FROM glue.hospital_data.hospital_rates AS hr

Might be good to lock these in to a specific date eventually, i.e., when we have a schema for the historical dated version of this data. Or just mention the as-of date.

-- per diem rates at 3x the Medicare "day rate" for the same DRG
WHEN hr.contract_methodology = 'per diem'
AND (
hr.negotiated_dollar < (hr.medicare_rate / drg.glos) * 3

This assumes that these rates are at most 300% of Medicare, right? I'm not sure that's a fair assumption, given the RAND study finding that, on average, commercial rates are 255% of Medicare prices (so presumably a decent proportion end up above 300%). That finding is based on claims data.

https://www.rand.org/health-care/projects/hospital-pricing.html

Member Author

Chatted about this offline: goal here is to split out case rates that are incorrectly labeled as per diem, so it's kind of a delicate balance between false positives and negatives. I came up with 300% by eyeballing histograms of the per diem rates and Medicare rates, but definitely open to a better/more empirical boundary.
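
To make the heuristic concrete, here is a minimal sketch of the comparison in the quoted SQL: a rate labeled per diem is compared against a multiple of the Medicare "day rate" (Medicare rate divided by GLOS). What the query does with each branch is not shown in the snippet, so the returned labels, the function name, and the `multiplier` parameter are assumptions for illustration.

```python
def classify_per_diem(
    negotiated_dollar: float,
    medicare_rate: float,
    glos: float,
    multiplier: float = 3.0,
) -> str:
    """Guess whether a rate labeled 'per diem' is really a daily rate."""
    if glos <= 0:
        raise ValueError("glos must be positive")
    # Medicare "day rate": the DRG's Medicare payment spread over the
    # geometric mean length of stay.
    medicare_day_rate = medicare_rate / glos
    if negotiated_dollar < medicare_day_rate * multiplier:
        return "per diem"
    # At or above the threshold, the amount looks more like a whole-stay
    # (case) rate that was mislabeled as per diem.
    return "likely case rate"
```

With a Medicare rate of 12000 and a GLOS of 4 days, the day rate is 3000 and the threshold is 9000, so a 2500 per diem is kept as a daily rate while 11000 is flagged. Raising `multiplier` trades false positives for false negatives, which is the balance discussed above.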


the following logic applies:

- If the provider has a PPO *and* an HMO plan with the same payer, take the
median of all PPO/HMO rates (drop the rest).

+1, CLD prioritizes PPO plans. I think it's reasonable to do that here. You could do a hierarchy like: PPO, HMO, everything else.
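
One way to read the suggested hierarchy (PPO, then HMO, then everything else) is sketched below. This illustrates the reviewer's suggestion rather than the rule quoted from the README, and taking the median within the chosen tier is an assumption; the names are hypothetical.

```python
from statistics import median
from typing import List, Optional, Tuple

# Plan types in priority order; anything else falls into the last tier.
PLAN_PRIORITY = ("PPO", "HMO")


def select_rate(rates: List[Tuple[str, float]]) -> Optional[float]:
    """Pick one rate per provider-payer pair using a plan-type hierarchy."""
    for plan_type in PLAN_PRIORITY:
        matches = [rate for plan, rate in rates if plan == plan_type]
        if matches:
            # Use only the highest-priority tier that has any rates.
            return median(matches)
    remaining = [rate for _, rate in rates]
    return median(remaining) if remaining else None
```

Under this sketch, a payer with PPO rates of 10000 and 12000 and an HMO rate of 8000 resolves to 11000, because the HMO rate is ignored once any PPO rate exists.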


1. Collapse negotiated rates across all revenue codes associated
with a provider-payer-plan-DRG combination, taking the mean of only
`NULL` revenue code rates first (if there are any).

I'd want to get sign-off that this approach makes sense - specifically, taking the mean of only NULL rev code rates as a first test

Member Author

@mayanajarian For what it's worth, I snagged this approach from CLD.
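
For reviewers weighing that sign-off, the collapsing step quoted at the top of this thread can be sketched roughly as follows. The quoted text only says that `NULL` revenue code rates are averaged first if any exist; falling back to the mean of all coded rates otherwise is an assumption, as are the names used here.

```python
from statistics import mean
from typing import List, Optional, Tuple


def collapse_revenue_codes(
    rows: List[Tuple[Optional[str], float]],
) -> Optional[float]:
    """Collapse rates for one provider-payer-plan-DRG combination.

    Each row is (revenue_code, rate); revenue_code is None for SQL NULL.
    """
    null_code_rates = [rate for code, rate in rows if code is None]
    if null_code_rates:
        # Prefer rates with no revenue code attached.
        return mean(null_code_rates)
    # Assumed fallback: average across all revenue-coded rates.
    all_rates = [rate for _, rate in rows]
    return mean(all_rates) if all_rates else None
```

In this sketch, rows of `(None, 10000.0)`, `(None, 12000.0)`, and `("0120", 30000.0)` collapse to 11000, since the revenue-coded rate is ignored when any `NULL`-code rates are present.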

)
)

# Drop rates related to exchange and indemnity plans, per Arian

nit: remove "per Arian"

SELECT DISTINCT
cbsa,
npi
FROM redshift.reference.provider_demographics

Customers may not have access to this exact table in this location - would be good to confirm that 1) they have access and 2) what the path is.

Member Author

@mayanajarian Got it. Is there an easy way to find a list of customer-facing tables and their paths? I don't see anything in Notion/ZenDesk. I imagine some of the other tables in this query have the same issue (hive.labps.quality_cms_hospital_ratings_v0).

@dfsnow dfsnow merged commit d825372 into main May 7, 2025
5 checks passed
@dfsnow dfsnow deleted the dfsnow/delivery-costs branch May 7, 2025 14:03
3 participants