This repository was archived by the owner on May 17, 2024. It is now read-only.
Refactor Artifacts Parser to be Native so it's less brittle with each dbt version change #688
Merged
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
dlawin
reviewed
Aug 31, 2023
Approach looks great, this will simplify things in an elegant way. Appreciate it @sungchun12
dlawin
approved these changes
Sep 12, 2023
This is great, just need to remove the submodule before we merge
dlawin
reviewed
Sep 12, 2023
sungchun12
added a commit
that referenced
this pull request
Oct 2, 2023
… dbt version change (#688) * helpful notes for sung * v1 of native run results parser * remove debug comments * remove from import * Update data_diff/dbt_parser.py Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Update data_diff/dbt_parser.py Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * remove an import * remove another print * add schema validation for specific fields * stricter validation * replaced manifest parser with native one * Apply suggestions from code review for spacing Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * Apply suggestions from code review for double quotes Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * create space * Apply suggestions from code review for more formatting Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * add more necessary fields * something to think through * better type hints * remove comment * separation of duties * remove mock call * draft unit tests * first draft of unit tests * passing tests * more pythonic * remove nested git repo * require name * add strictness * black formatting * reduce scope of changes * fix imports * update patches * fix mocking * fix test failure * fix mock tests * remove submodule * update toml * remove submodule again * add pydantic back in --------- Co-authored-by: Sung Won Chung <sung@datafold.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Dan Lawin <daniel@datafold.com>
Problem: Each new dbt release triggers fire drills to fix breaking changes in the dbt integration, specifically because of version upgrades to `run_results.json` and `manifest.json`. This is pronounced because `data-diff` relies on a package maintained by a solo developer to parse the JSON docs into Python objects: https://github.com/yu-iskw/dbt-artifacts-parser

Solution: Build native JSON/dictionary parsing within `data-diff` and focus on stable fields so that new version upgrades are immaterial to how `data-diff` interacts with dbt.

Technical Approach
I create native pydantic schema enforcers for both `manifest.json` and `run_results.json` with only the fields necessary for `data-diff` to work properly. The fields `data-diff` reads are relatively stable across artifact schema changes, and the enforcers can handle malformed artifact schemas if they're manipulated by a process outside of dbt. This makes this piece of the codebase lighter weight, as `dbt-artifacts-parser` enforces more artifact schema validation than needed. When future bugs arise because of breaking changes to the artifact schemas, we can simply adjust the native schema validator in `data-diff` instead of relying on the solo-developer-maintained package.

Testing Approach
I installed multiple versions of `dbt-snowflake` and `dbt-core` to produce `manifest.json` and `run_results.json` artifacts for versions `>=1.0.0`. dbt versions `<1.0.0` are no longer supported by dbt Labs, so they are not tested. Testing project: https://github.com/datafold/datafold-demo-sung

Note: Only one version of `run_results.json` is maintained by dbt Labs, which is v4.
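
As a rough illustration of the Technical Approach described above, here is a minimal sketch of validating only a small subset of artifact fields with pydantic. The class names (`RunResultsSubset`, `ManifestSubset`, and friends) and the exact selection of fields are assumptions made for this example, not the code merged in this PR; the JSON keys themselves (`metadata.dbt_version`, `results[].unique_id`, `results[].status`, `nodes`) are standard parts of the dbt artifacts.

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, ValidationError


class Metadata(BaseModel):
    # Present in both artifacts; everything else under "metadata" is ignored.
    dbt_version: str


class RunResult(BaseModel):
    # Hypothetical minimal subset of one entry in run_results.json "results".
    unique_id: str
    status: str


class RunResultsSubset(BaseModel):
    metadata: Metadata
    results: List[RunResult]


class ManifestNode(BaseModel):
    # Hypothetical minimal subset of one entry in manifest.json "nodes".
    unique_id: str
    name: str
    database: Optional[str] = None


class ManifestSubset(BaseModel):
    metadata: Metadata
    nodes: Dict[str, ManifestNode]


# Keys that are not declared on the models (e.g. "timing", "elapsed_time")
# are ignored by pydantic's default settings, so new fields added by a
# later dbt version do not break parsing.
raw_run_results = {
    "metadata": {"dbt_version": "1.6.0", "generated_at": "2023-09-12T00:00:00Z"},
    "results": [
        {"unique_id": "model.demo.orders", "status": "success", "timing": []}
    ],
    "elapsed_time": 1.2,
}

run_results = RunResultsSubset(**raw_run_results)
print(run_results.results[0].unique_id)

# A required field that is missing raises a ValidationError, which is how a
# malformed artifact would surface.
try:
    RunResultsSubset(**{"metadata": {}, "results": []})
except ValidationError as exc:
    print(f"invalid run_results.json: {exc}")
```

Because only the declared fields are validated, adapting to a breaking artifact change would mean editing one of these small models rather than waiting on an upstream parser release.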