Skip to content

Add option to convert nulls for BQ keys #472

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 10, 2021

Conversation

jamesmcm
Copy link
Contributor

Adds the convert-nulls option to convert nulls to "null" for use as keys in BigQuery diffs. This is very useful for comparing output
datasets when null is an expected value/category for a string key e.g. in datasets where null country codes might be expected for example.

From the code it seems this is already the behaviour in Avro diffs, it is only BQ diffs that would fail previously.

@idreeskhan idreeskhan self-requested a review August 30, 2021 14:19
Copy link
Contributor

@idreeskhan idreeskhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Would like a few changes. Should also look into why the tests are failing and fix that if possible.

case (None, true) => "null"
case (None, false) => {
logger.error(
s"""Null value found for key: ${xs.mkString(".")}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably can reuse most of the existing avro code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also want to warn here? I was just thinking in our case it was like a valid category, so we had hundreds of thousands of rows like that (thus wanting to make it a command line argument before).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I think it makes sense to warn here as well, I'd rather err on the side of being too noisy first and course correct later if it's a problem

@codecov
Copy link

codecov bot commented Sep 7, 2021

Codecov Report

Merging #472 (bab0e87) into master (09f1654) will decrease coverage by 0.10%.
The diff coverage is 50.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #472      +/-   ##
==========================================
- Coverage   70.55%   70.44%   -0.11%     
==========================================
  Files          35       35              
  Lines        1440     1445       +5     
  Branches      114      122       +8     
==========================================
+ Hits         1016     1018       +2     
- Misses        424      427       +3     
Flag Coverage Δ
ratatoolCli 3.17% <0.00%> (-0.02%) ⬇️
ratatoolCommon ?
ratatoolDiffy 30.82% <50.00%> (+0.03%) ⬆️
ratatoolExamples 18.79% <0.00%> (-0.08%) ⬇️
ratatoolSampling 62.01% <ø> (ø)
ratatoolScalacheck 81.81% <ø> (ø)
ratatoolShapeless 5.27% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...in/scala/com/spotify/ratatool/diffy/BigDiffy.scala 55.11% <50.00%> (-0.45%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 09f1654...bab0e87. Read the comment docs.

@jamesmcm
Copy link
Contributor Author

jamesmcm commented Sep 7, 2021

Btw I had some issues with the formatting (and there are still some small differences vs. before - is there a scalafmt config for the repo?

And do you want me to squash the commits when it is ready to merge?

@idreeskhan
Copy link
Contributor

We don't have a scalafmt config right now but you can probably re-use the one from scio in the meantime, we follow the same style guide.

Squashing is preferred

@jamesmcm jamesmcm force-pushed the convertnulls branch 3 times, most recently from 85224de to aa27756 Compare September 9, 2021 09:53
Follows same logic as in avroKeyFn
@jamesmcm
Copy link
Contributor Author

jamesmcm commented Sep 9, 2021

Fixed, btw using the scalafmt config from the golden path:

version = "2.4.2"
maxColumn = 100

binPack.literalArgumentLists = true

continuationIndent {
  callSite = 2
  defnSite = 2
}

newlines {
  afterImplicitKWInVerticalMultiline = true
  beforeImplicitKWInVerticalMultiline = true
  sometimesBeforeColonInMethodReturnType = true
}

docstrings = JavaDoc

project.git = false

rewrite {
  rules = [PreferCurlyFors, RedundantBraces, RedundantParens, SortImports]
  redundantBraces.maxLines = 1
}

Resulted in a lot of changes though so for now I just fixed it manually.

@idreeskhan
Copy link
Contributor

Yea that's alright, I can address that separately afterwards

@idreeskhan idreeskhan merged commit bfead75 into spotify:master Sep 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants