Skip to content

Conversation

@wagnerlmichael
Copy link
Member

@wagnerlmichael wagnerlmichael commented Feb 1, 2024

Adds sv_run_id to outputs from 00-ingest.R and 02-assess.R. This will make it easier for us to look back on a sale's outlier status in previous model runs. I believe these changes are consistent with #199.

I've double checked that this column persists through the following locations:

  • write_parquet(paths$input$training$local) in line 354 in 00-ingest.R
  • write_parquet(paths$output$assessment_pin$local) on line 570 in 02-assess.R

Want to double check with @dfsnow if I need to do something with the dvc lock file per @jeancochrane's advice.

Closes #201

@wagnerlmichael wagnerlmichael linked an issue Feb 1, 2024 that may be closed by this pull request
@wagnerlmichael wagnerlmichael marked this pull request as ready for review February 1, 2024 16:18
@wagnerlmichael wagnerlmichael marked this pull request as draft February 1, 2024 19:08
@wagnerlmichael wagnerlmichael changed the title Add sv_version to training data ingest Add sv_run_id to training data ingest Feb 1, 2024
@wagnerlmichael wagnerlmichael marked this pull request as ready for review February 1, 2024 22:36
Copy link
Member

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good to me! You'll indeed need to re-run ingest and update dvc.lock for the changes to the training data to be persisted. Happy to pair on that if it would be helpful (and we should wait for Dan to review this as well to double-check my understanding, I think).

meta_pin, meta_year,
meta_sale_price, meta_sale_date, meta_sale_document_num
meta_sale_price, meta_sale_date, meta_sale_document_num,
sv_outlier_type, sv_run_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Nitpick, non-blocking] You can leave out the sv_outlier_type stuff since it's only pertinent to my PR (#199), one of us can just resolve merge conflicts if the other merges first!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a question for @dfsnow: Do we actually need sale_recent_{n}_run_id in the assessment data? I figure it's most pertinent in the training data, right?

Copy link
Contributor

@dfsnow dfsnow Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could always merge the training data back to the assessment data to get the run ID. Alright, @wagnerlmichael I think @jeancochrane is right, let's actually nix this from the assess stage. Sorry for the extra work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add sales val run_id to res model outputs

3 participants