Add option to convert nulls for BQ keys #472
Thanks for the contribution! Would like a few changes. Should also look into why the tests are failing and fix that if possible.
ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala
case (None, true) => "null"
case (None, false) => {
logger.error(
s"""Null value found for key: ${xs.mkString(".")}.
Should match the existing behaviour with Avro. https://github.com/spotify/ratatool/blob/master/ratatool-diffy/src/main/scala/com/spotify/ratatool/diffy/BigDiffy.scala#L465
Probably can reuse most of the existing Avro code.
Do you also want to warn here? I was just thinking in our case it was like a valid category, so we had hundreds of thousands of rows like that (thus wanting to make it a command line argument before).
Yea I think it makes sense to warn here as well, I'd rather err on the side of being too noisy first and course correct later if it's a problem
Codecov Report
@@ Coverage Diff @@
## master #472 +/- ##
==========================================
- Coverage 70.55% 70.44% -0.11%
==========================================
Files 35 35
Lines 1440 1445 +5
Branches 114 122 +8
==========================================
+ Hits 1016 1018 +2
- Misses 424 427 +3
Btw I had some issues with the formatting (and there are still some small differences vs. before). Is there a scalafmt config for the repo? And do you want me to squash the commits when it is ready to merge?
We don't have a scalafmt config right now, but you can probably re-use the one from scio in the meantime; we follow the same style guide. Squashing is preferred.
85224de to aa27756
Follows same logic as in avroKeyFn
aa27756 to bab0e87
Fixed. Btw, using the scalafmt config from the golden path resulted in a lot of changes though, so for now I just fixed it manually.
Yea that's alright, I can address that separately afterwards
Adds the convert-nulls option to convert nulls to "null" for use as keys in BigQuery diffs. This is very useful for comparing output datasets when null is an expected value/category for a string key, e.g. in datasets where null country codes might be expected.
From the code it seems this is already the behaviour in Avro diffs; it is only BQ diffs that would fail previously.
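To make the behaviour described above concrete, here is a minimal sketch of a key function that turns a null key field into the string "null" when a flag is set. It is not the code from this PR: records are modelled as nested Scala Maps rather than BigQuery rows so the example has no dependencies, and the names `NullKeySketch`, `lookup`, `keyFn` and `convertNulls` are hypothetical. Throwing in the non-converting branch is also an assumption, since the diff above only shows that branch logging an error.

```scala
// Hypothetical sketch, not ratatool's implementation.
object NullKeySketch {
  // Assumption: a record is a nested Map instead of a BigQuery row.
  type Record = Map[String, Any]

  // Walk a dotted path such as List("user", "country").
  // Returns None if any step is missing or the final value is null.
  def lookup(record: Record, path: List[String]): Option[Any] = path match {
    case Nil => None
    case field :: Nil => record.get(field).flatMap(Option(_))
    case field :: rest =>
      record.get(field) match {
        case Some(nested: Map[_, _]) => lookup(nested.asInstanceOf[Record], rest)
        case _ => None
      }
  }

  // Build the diff key for one record.
  // With convertNulls = true a null key becomes the literal string "null";
  // otherwise the sketch fails with a message naming the offending key path.
  def keyFn(path: List[String], convertNulls: Boolean)(record: Record): String =
    (lookup(record, path), convertNulls) match {
      case (Some(value), _) => value.toString
      case (None, true) => "null"
      case (None, false) =>
        throw new IllegalArgumentException(
          s"Null value found for key: ${path.mkString(".")}")
    }
}
```

With the flag on, rows whose key is null all share the key "null" and can still be paired up across the two datasets; with it off, the first such row stops the sketch.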