Skip to content

Fix filepath wildcard for avro/parquet #564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 20, 2022

Conversation

steveroy0226
Copy link
Contributor

Below changes have been tested and confirmed to work. Also proposing alternative approach since I'm not a big fan of guessing whether we are dealing with Avro vs Parquet:

Approach (1)

  • This PR, minimal change. Basically checking to see if the matching files has the relevant extensions and routing it to either AvroSampler or ParquetSample

Approach (2)

  • Not in this PR, but I think we should create a new arg called mode (which can be either parquet, avro or bigquery) and have that be a required field. It will be a breaking change, but then it will help avoid guessing which Sampler to use.

@codecov
Copy link

codecov bot commented Jul 18, 2022

Codecov Report

Merging #564 (670b28c) into master (1d816bd) will increase coverage by 0.32%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #564      +/-   ##
==========================================
+ Coverage   70.34%   70.66%   +0.32%     
==========================================
  Files          40       40              
  Lines        1703     1708       +5     
  Branches      150      146       -4     
==========================================
+ Hits         1198     1207       +9     
+ Misses        505      501       -4     
Flag Coverage Δ
ratatoolCli 3.07% <0.00%> (-0.02%) ⬇️
ratatoolCommon ?
ratatoolDiffy 29.89% <0.00%> (-0.11%) ⬇️
ratatoolExamples 15.96% <0.00%> (-0.06%) ⬇️
ratatoolSampling 62.57% <100.00%> (+0.57%) ⬆️
ratatoolScalacheck 81.53% <ø> (ø)
ratatoolShapeless 4.41% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ala/com/spotify/ratatool/samplers/BigSampler.scala 80.41% <100.00%> (+5.41%) ⬆️

@benkonz
Copy link
Contributor

benkonz commented Jul 18, 2022

I'm a fan of adding an optional parameter called --input-mode, similar to how we do it in BigDiffy https://github.com/spotify/ratatool/tree/master/ratatool-diffy#usage, just to keep the names consistent. In general, I'm not a fan of "guessing" what the input type is, but I also don't want to break existing users' workflows, so adding an optional parameter that we use, and inferring the input type if that parameter isn't there would work for me!

@steveroy0226
Copy link
Contributor Author

I'm a fan of adding an optional parameter called --input-mode, similar to how we do it in BigDiffy https://github.com/spotify/ratatool/tree/master/ratatool-diffy#usage, just to keep the names consistent. In general, I'm not a fan of "guessing" what the input type is, but I also don't want to break existing users' workflows, so adding an optional parameter that we use, and inferring the input type if that parameter isn't there would work for me!

@benkonz, I like the approach. I will create a ticket to add --input-mode. I agree that we shouldn't break existing users' workflows.

Copy link
Contributor

@idreeskhan idreeskhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the one typo

input match {
case avroPath if input.endsWith("avro") =>
case avroPath if fileNames.exists(_.endsWith("avro")) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we log which one we resolve it as? might help us later on

@steveroy0226 steveroy0226 merged commit 1e25fa7 into master Jul 20, 2022
@steveroy0226 steveroy0226 deleted the sroyce/support-wildcard-extensions branch July 20, 2022 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants