
Conversation

@krivard (Contributor) commented Feb 28, 2025

Overview

Three tables require a group_mean_continuity_check, which does the following:

  1. Group by a specified column
  2. Compute the mean of each group
  3. Compute the percent change between successive groups
  4. Check against a per-column threshold

This PR contains a draft for a possible implementation.
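The four steps above can be sketched in plain Python (a hypothetical illustration only; the PR itself implements the check as a dbt macro in SQL/Jinja, and the function name, argument names, and sample columns here are assumptions, not the PR's actual code):

```python
from statistics import mean

def group_mean_continuity_check(rows, group_key, value_key, max_pct_change):
    """Hypothetical sketch: flag groups whose mean changes by more than
    max_pct_change relative to the previous group (groups sorted by key)."""
    # 1. Group rows by the specified column
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    # 2. Compute the mean of each group, in group-key order
    means = [(key, mean(values)) for key, values in sorted(groups.items())]
    # 3. Percent change between successive groups,
    # 4. checked against the per-column threshold
    violations = []
    for (_, prev_mean), (key, cur_mean) in zip(means, means[1:]):
        pct_change = abs(cur_mean - prev_mean) / prev_mean
        if pct_change > max_pct_change:
            violations.append((key, pct_change))
    return violations

# Example data (invented): 2021's group mean (42.0) jumps 2.5x from 2020's (12.0)
rows = [
    {"report_year": 2019, "capacity_mw": 10.0},
    {"report_year": 2019, "capacity_mw": 12.0},
    {"report_year": 2020, "capacity_mw": 11.0},
    {"report_year": 2020, "capacity_mw": 13.0},
    {"report_year": 2021, "capacity_mw": 40.0},
    {"report_year": 2021, "capacity_mw": 44.0},
]
print(group_mean_continuity_check(rows, "report_year", "capacity_mw", 0.5))
# → [(2021, 2.5)]
```

Note that step 3 assumes the grouping column has a meaningful order (here, years), which is the point raised in the review discussion below.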

What did you change?

  • New macro, group_mean_continuity_check(group_column, max_pct_change)
  • Demo of macro in use for _core_eia923__cooling_system_information, using columns and thresholds from transform/eia923.py

Debatable design choices

  • This is written as a column-level check
    • pros: macro source is simple (no loops); keeps all checks for a column together in the column entry
  • cons: the grouping column and number of acceptable outliers are replicated in each data-test entry
  • options: could rewrite as a table-level check with a list of columns and thresholds; see #4116 (Add model/table-level macro for group_mean_continuity_check)

@krivard added the dbt (Issues related to the data build tool aka dbt) and data-validation (Issues related to checking whether data meets our quality expectations.) labels Feb 28, 2025
@krivard krivard requested a review from marianneke February 28, 2025 18:51
@krivard krivard linked an issue Mar 4, 2025 that may be closed by this pull request
@marianneke (Collaborator) left a comment

I have a question: you mention pct change between successive groups. This suggests the group column is inherently ordered, right? Is this something you would generally expect to be the case?

I guess what I'm saying is: to me it is not intuitive that a "group" is always an ordered thing. Maybe this variable name should be changed to something that makes it clear that it is both a partition and something that has an inherent order to it.

@aesharpe (Member) commented

I'm inclined to say that this test should remain a column-level test (rather than a table test as in #4116). I think this because it doesn't have to do with the relationship between columns in a table or the relationship between a table and another table. For that reason, I think it makes sense to have tests that implicate a particular column be column level (unless, perhaps, it's something that implicates every single column in a table). But this to me feels like it should remain at the column-level.

@krivard (Contributor, Author) commented Mar 12, 2025

@aesharpe even though 2 of the 3 arguments for the test are intended to be identical across all the column tests within a table? and if they ever change, we have to update N lines instead of just one?

I can live with that; I just want to make sure the impact is clear before we decide.

@aesharpe (Member) commented

> @aesharpe even though 2 of the 3 arguments for the test are intended to be identical across all the column tests within a table? and if they ever change, we have to update N lines instead of just one?
>
> I can live with that; I just want to make sure the impact is clear before we decide.

Yes, this is good clarification. I think it's important to be able to customize things like max_fr_change and n_outliers_allowed to each column, hence, column level test :)

@krivard force-pushed the krivard/dbt-migrations_group_mean_continuity branch from e0da49a to 4c8c677 on March 19, 2025 16:55


Projects

Status: Icebox

Development

Successfully merging this pull request may close these issues.

Migrate group_mean_continuity_check validation tests to dbt

4 participants