Merged
Changes from 6 commits
1 change: 1 addition & 0 deletions pipeline/00-ingest.R
@@ -45,6 +45,7 @@ training_data <- dbGetQuery(
sale.buyer_name AS meta_sale_buyer_name,
sale.sv_is_outlier,
sale.sv_outlier_type,
+ sale.sv_run_id,
res.*
FROM model.vw_card_res_input res
INNER JOIN default.vw_pin_sale sale
20 changes: 18 additions & 2 deletions pipeline/02-assess.R
@@ -362,15 +362,31 @@ sales_data_ratio_study <- sales_data %>%
sales_data_two_most_recent <- sales_data %>%
distinct(
meta_pin, meta_year,
- meta_sale_price, meta_sale_date, meta_sale_document_num
+ meta_sale_price, meta_sale_date, meta_sale_document_num,
+ sv_outlier_type, sv_run_id
Member

[Nitpick, non-blocking] You can leave out the sv_outlier_type stuff since it's only pertinent to my PR (#199); one of us can just resolve merge conflicts if the other merges first!

Member

Also, a question for @dfsnow: Do we actually need sale_recent_{n}_run_id in the assessment data? I figure it's most pertinent in the training data, right?

Contributor

@dfsnow, Feb 2, 2024

I suppose we could always merge the training data back to the assessment data to get the run ID. Alright, @wagnerlmichael I think @jeancochrane is right, let's actually nix this from the assess stage. Sorry for the extra work.
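
A minimal sketch of the merge described above, for illustration only. The toy values and the join keys (`meta_pin` plus `meta_sale_document_num`) are assumptions; this diff does not show whether the real training data carries `meta_sale_document_num`. Base R `merge()` with `all.x = TRUE` performs a left join, so every assessment row is kept.

```r
# Hypothetical toy data: column names follow the diff, values are invented
training_data <- data.frame(
  meta_pin = c("A", "B"),
  meta_sale_document_num = c("doc1", "doc2"),
  sv_run_id = c("run1", "run2")
)
assessment_sales <- data.frame(
  meta_pin = c("A", "B", "C"),
  meta_sale_document_num = c("doc1", "doc2", "doc3"),
  meta_sale_price = c(100000, 250000, 90000)
)

# Left join: keep every assessment row and attach the run ID where a
# training sale matches on the (assumed) keys; unmatched rows get NA
with_run_id <- merge(
  assessment_sales,
  training_data[, c("meta_pin", "meta_sale_document_num", "sv_run_id")],
  by = c("meta_pin", "meta_sale_document_num"),
  all.x = TRUE
)
```

Sales that never appeared in the training data (PIN "C" here) come back with `sv_run_id` set to `NA`.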

) %>%
+ rename(
+ meta_sale_outlier_type = sv_outlier_type,
+ meta_sale_sv_run_id = sv_run_id
+ ) %>%
+ mutate(
+ meta_sale_outlier_type = ifelse(
+ meta_sale_outlier_type == "Not outlier", NA, meta_sale_outlier_type
+ )
+ ) %>%
group_by(meta_pin) %>%
slice_max(meta_sale_date, n = 2) %>%
mutate(mr = paste0("sale_recent_", row_number())) %>%
tidyr::pivot_wider(
id_cols = meta_pin,
names_from = mr,
- values_from = c(meta_sale_date, meta_sale_price, meta_sale_document_num),
+ values_from = c(
+ meta_sale_date,
+ meta_sale_price,
+ meta_sale_document_num,
+ meta_sale_outlier_type,
+ meta_sale_sv_run_id
+ ),
names_glue = "{mr}_{gsub('meta_sale_', '', .value)}"
) %>%
select(meta_pin, contains("1"), contains("2")) %>%
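
To see which column names the reshaping in this hunk produces, here is a small sketch on invented data. It reuses the `slice_max()`, `pivot_wider()` and `names_glue` calls from the diff; the toy values are hypothetical, only a subset of the value columns is included, and the `ungroup()` step is added here for clarity and is not in the diff.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy sales: column names mirror the diff, values are invented
sales <- data.frame(
  meta_pin = c("A", "A", "A", "B"),
  meta_sale_date = as.Date(
    c("2021-01-05", "2022-03-10", "2023-07-01", "2020-06-15")
  ),
  meta_sale_price = c(100000, 120000, 150000, 90000),
  meta_sale_sv_run_id = c("run1", "run1", "run2", "run1")
)

wide <- sales %>%
  group_by(meta_pin) %>%
  # Keep the two latest sales per PIN, most recent first
  slice_max(meta_sale_date, n = 2) %>%
  mutate(mr = paste0("sale_recent_", row_number())) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = meta_pin,
    names_from = mr,
    values_from = c(meta_sale_date, meta_sale_price, meta_sale_sv_run_id),
    # Strip the "meta_sale_" prefix from each value column name
    names_glue = "{mr}_{gsub('meta_sale_', '', .value)}"
  )

names(wide)
```

The result has one row per PIN with columns such as `sale_recent_1_date`, `sale_recent_1_price` and `sale_recent_1_sv_run_id`; a PIN with a single sale gets `NA` in its `sale_recent_2_*` columns. `slice_max()` keeps ties by default, so a PIN whose sales share the same date could yield more than two rows before the pivot.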