Parallel pipelines can create entities in DB #2446

avishniakov · 2024-02-19T16:06:16Z

Describe changes

This PR solves a few parallelization issues we had:

The save_artifact logic is improved: it now tolerates parallel creation of an Artifact and retries the creation of new Artifact Versions for those without an explicit version name.
Model Version creation now also tolerates parallel execution and has retry logic to ensure that parallel runs do not get dumped into the same Model Version.
Both cases are covered by heavily parallel test cases to prove that the fix holds up in the long run.
A new MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION constant is introduced and set to 10 retries for now, with a cooldown of 0.2 seconds that grows with the retry count (e.g. 0.2 * retry_num). The value 10 is empirical and might need further tuning.
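
The retry-with-growing-cooldown idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `create_with_retries`, `COOLDOWN_SECONDS`, and the local `EntityExistsError` class are hypothetical stand-ins, and the values 10 and 0.2 are taken from the description only (later commits may have tuned them).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Values from the PR description; the merged code may use different ones.
MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION = 10
COOLDOWN_SECONDS = 0.2  # hypothetical name for the base cooldown


class EntityExistsError(Exception):
    """Local stand-in for the error raised when an entity already exists."""


def create_with_retries(
    create_fn: Callable[[], T],
    max_retries: int = MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION,
    cooldown: float = COOLDOWN_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call create_fn until it succeeds or the retries are exhausted."""
    for retry_num in range(1, max_retries + 1):
        try:
            return create_fn()
        except EntityExistsError:
            if retry_num == max_retries:
                # Out of retries: let the caller see the conflict.
                raise
            # Cooldown grows linearly with the retry count: 0.2s, 0.4s, ...
            sleep(cooldown * retry_num)
    raise ValueError("max_retries must be at least 1")
```

Passing `sleep` as a parameter is only a convenience so that the backoff schedule can be inspected without actually waiting.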

Tiny side improvement:

Lowered the log level of the Model config mismatch warning: based on user feedback, it does not make sense in production usage and is only annoying (OSSK-364).

Pre-requisites

Please ensure you have done the following:

I have read the CONTRIBUTING.md document.
If my change requires a change to docs, I have updated the documentation accordingly.
I have added tests to cover my changes.
I have based my new branch on develop and the open PR is targeting develop. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.
If my changes require changes to the dashboard, these changes are communicated/requested.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Other (add details above)

Summary by CodeRabbit

New Features
- Enhanced artifact and model version creation with improved error handling and a retry mechanism.
- Implemented efficient pipeline registration and reuse functionality.
- Added a new test case for verifying parallel artifact registration in pipelines.
- Introduced a script for running pipelines with parallel steps and registering artifacts.
Bug Fixes
- Refined error handling for existing artifacts and model versions to prevent duplicate entries.
Refactor
- Streamlined artifact creation logic for better performance and reliability.
- Removed outdated model configuration comparison logic.
Tests
- Added integration tests for parallel model version creation and pipeline execution.

coderabbitai · 2024-02-19T16:07:10Z

Important

Auto Review Skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

The updates focus on enhancing the artifact and model versioning system in ZenML, introducing a retry mechanism for creation processes, and improving error handling for entity existence. Changes include the addition of a new constant for maximum retries, refactoring of artifact creation logic, and implementation of efficient registration and reuse strategies for pipelines. Additionally, integration tests have been expanded to cover parallel creation scenarios for models and pipelines, ensuring robustness in heavily parallelized environments.

Changes

| File(s) | Summary |
| --- | --- |
| `src/zenml/.../utils.py`, `src/zenml/.../model.py`, `src/zenml/.../pipeline.py`, `src/zenml/.../sql_zen_store.py` | Enhanced artifact and model versioning with retry mechanisms and improved error handling. Introduced a new constant `MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION`. |
| `src/zenml/constants.py` | Added `MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION` constant. |
| `tests/integration/functional/model/test_model_version.py`, `tests/integration/functional/pipelines/test_pipeline_parallel.py`, `tests/integration/functional/pipelines/util_parallel_pipeline_script.py` | Expanded integration tests for parallel model and pipeline version creation, including artifact registration verification. |

avishniakov · 2024-02-19T16:08:24Z

@coderabbitai review

coderabbitai

Review Status

Actionable comments generated: 10

Configuration used: .coderabbit.yaml

Commits

Files that changed from the base of the PR and between dfe71ed and a88c99f.

Files selected for processing (8)

src/zenml/artifacts/utils.py (4 hunks)
src/zenml/constants.py (1 hunks)
src/zenml/model/model.py (5 hunks)
src/zenml/new/pipelines/pipeline.py (3 hunks)
src/zenml/zen_stores/sql_zen_store.py (4 hunks)
tests/integration/functional/model/test_model_version.py (5 hunks)
tests/integration/functional/pipelines/test_pipeline_parallel.py (1 hunks)
tests/integration/functional/pipelines/util_parallel_pipeline_script.py (1 hunks)

Additional comments: 15

tests/integration/functional/pipelines/util_parallel_pipeline_script.py (3)

8-10: The register_artifact step function is correctly defined with caching disabled, which is suitable for testing parallel executions where caching could interfere with the test's integrity. The return value simulates an artifact registration process.

13-16: The parallel_ pipeline function iterates over a range of steps_count and calls register_artifact for each iteration. This setup is appropriate for testing parallel executions of artifact registration. However, it's important to ensure that the steps_count and run_id parameters are correctly passed and used, especially in a parallel execution context.

19-21: The script execution entry point correctly parses command line arguments to extract run_prefix, i, and steps_count. It's crucial that these arguments are validated and correctly converted to their expected types (e.g., steps_count and i should be integers) to avoid runtime errors.
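
One way to get the validation and type conversion that the comment on lines 19-21 asks for is `argparse`. The sketch below is an assumption, not the script from this PR: the positional order (`run_prefix`, `i`, `steps_count`) and the `parse_args` helper are hypothetical, and the real script may read `sys.argv` directly.

```python
import argparse
from typing import List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> Tuple[str, int, int]:
    """Parse and validate the three positional arguments of the script."""
    parser = argparse.ArgumentParser(
        description="Run one pipeline with parallel steps."
    )
    parser.add_argument("run_prefix", help="prefix shared by all runs")
    # type=int makes argparse reject non-integer input with a clear error.
    parser.add_argument("i", type=int, help="index of this run")
    parser.add_argument("steps_count", type=int, help="steps per run")
    args = parser.parse_args(argv)
    if args.steps_count < 1:
        parser.error("steps_count must be a positive integer")
    return args.run_prefix, args.i, args.steps_count
```

With this shape, a malformed invocation exits early with a usage message instead of failing somewhere inside the pipeline.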

tests/integration/functional/pipelines/test_pipeline_parallel.py (1)

22-59: The test method test_parallel_runs_can_register_same_artifact is well-structured and follows a clear logic to test parallel artifact registration. It uses subprocesses to execute the pipeline script in parallel, which is a suitable approach for this test scenario. The assertions at the end of the test method are comprehensive, checking for the completion status of pipeline runs, the registration of all artifacts, their values, and unique versions. This thorough approach ensures that the parallel execution logic works as expected.
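
The subprocess-based approach described in this comment could look roughly like the sketch below. It is illustrative only: `WORKER_CODE` is a hypothetical stand-in that echoes its arguments instead of running the real pipeline script, and `run_in_parallel` is not a function from the test file.

```python
import subprocess
import sys
from typing import List

# Hypothetical stand-in for the pipeline script: it only echoes its arguments.
WORKER_CODE = "import sys; print(sys.argv[1], sys.argv[2])"


def run_in_parallel(run_prefix: str, runs_count: int) -> List[str]:
    """Start all workers at once, then collect their output in order."""
    # Launch every process before waiting on any, so the runs overlap.
    processes = [
        subprocess.Popen(
            [sys.executable, "-c", WORKER_CODE, run_prefix, str(i)],
            stdout=subprocess.PIPE,
            text=True,
        )
        for i in range(runs_count)
    ]
    outputs = []
    for process in processes:
        stdout, _ = process.communicate(timeout=60)
        if process.returncode != 0:
            raise RuntimeError(f"worker exited with {process.returncode}")
        outputs.append(stdout.strip())
    return outputs
```

Separate processes, unlike threads, do not share an in-process client or cache, which is closer to how independent pipeline runs hit the same database.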

src/zenml/constants.py (1)

318-320: The introduction of MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION with a value of 10 is a sensible addition to handle parallelized tests for versioned entity creation. The comment "empirical value to pass heavy parallelized tests" provides context for the choice of value, though it might be beneficial to include more detail on how this value was determined or any specific scenarios it addresses.

tests/integration/functional/model/test_model_version.py (2)

14-14: The import of multiprocessing is necessary for the new test that validates parallel model version creation. This aligns with the PR's objective to improve parallel handling.

119-120: The function parallel_model_version_creation is introduced to simulate the parallel creation of model versions. It directly calls a method on the Model class to either get an existing model version or create a new one. This function is crucial for the new test that assesses the system's ability to handle parallel model version creation without conflicts or errors.
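
To show why retrying makes parallel version creation safe, here is a toy model of the race. It is a sketch under stated assumptions: it uses threads instead of `multiprocessing` for brevity, and `InMemoryVersionStore`, `create_next_version`, and `run_parallel` are hypothetical names standing in for the real store and the Model class.

```python
import threading
import time
from typing import List, Set

MAX_RETRIES = 10  # mirrors the value of 10 from the PR description


class EntityExistsError(Exception):
    """Local stand-in for the error raised on a duplicate version."""


class InMemoryVersionStore:
    """Toy store that rejects duplicates, like a unique DB constraint."""

    def __init__(self) -> None:
        self._versions: Set[int] = set()
        self._lock = threading.Lock()

    def latest(self) -> int:
        with self._lock:
            return max(self._versions, default=0)

    def create(self, version: int) -> None:
        with self._lock:
            if version in self._versions:
                raise EntityExistsError(f"version {version} already exists")
            self._versions.add(version)


def create_next_version(store: InMemoryVersionStore) -> int:
    """Claim the next free version, retrying when another worker wins."""
    for retry_num in range(1, MAX_RETRIES + 1):
        candidate = store.latest() + 1  # may be stale by the time we write
        try:
            store.create(candidate)
            return candidate
        except EntityExistsError:
            time.sleep(0.001 * retry_num)  # short, growing cooldown
    raise EntityExistsError("could not create a unique version")


def run_parallel(workers: int) -> List[int]:
    """Let several workers create versions concurrently."""
    store = InMemoryVersionStore()
    results: List[int] = []
    results_lock = threading.Lock()

    def worker() -> None:
        version = create_next_version(store)
        with results_lock:
            results.append(version)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(results)
```

Each worker can lose the race at most once per competing worker, so with five workers the ten retries are never exhausted and every worker ends up with its own version.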

src/zenml/model/model.py (2)

16-16: The import of the time module is correctly added to support the sleep functionality used in the retry mechanism. This is a necessary addition for implementing delays between retries.

29-29: The import of MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION is correctly added and is essential for defining the maximum number of retries in the retry mechanism for creating model versions. This constant plays a crucial role in controlling the retry behavior.

src/zenml/artifacts/utils.py (5)

20-20: The import of the time module is correctly added to support the sleep functionality used in the retry mechanism for artifact version creation. This is a necessary addition for implementing delays between retries.

27-30: The addition of the MAX_RETRIES_FOR_VERSIONED_ENTITY_CREATION constant is correctly implemented. It's well-placed within the imports section, ensuring that it's available throughout the file. This constant is crucial for controlling the retry behavior in artifact and model version creation processes.

38-41: The inclusion of the EntityExistsError in the imports section is appropriate, given its usage in the updated save_artifact function to handle cases where an artifact version already exists. This change aligns with the PR's objective to improve error handling in parallel execution scenarios.

118-118: The documentation for the save_artifact function has been updated to include EntityExistsError under the Raises section. This accurately reflects the changes made to the function's implementation, ensuring that users are aware of the potential exceptions that can be raised.

248-250: Raising EntityExistsError when the artifact version creation fails after all retries is appropriate and aligns with the PR's objectives to improve error handling. This ensures that the caller is informed of the failure to create a unique artifact version, which is crucial in parallel execution environments.

src/zenml/new/pipelines/pipeline.py (1)

57-57: The import of EntityExistsError is correctly added to handle specific exceptions related to entity existence conflicts during pipeline registration. This aligns with the PR objectives of improving error handling for parallel operations.

strickvl

LGTM. Let's let the CAB member know whenever it's merged.

bcdurak

Aside from a small nitpick (feel free to ignore), everything looks good!

src/zenml/new/pipelines/pipeline.py

* fix parallel artifacts registration
* remove excessive warnings
* parallel safe model versions
* increase cool down a bit
* coderabbitai
* coderabbitai
* update test signature
* PR suggestions from Alex
* kudos to windows
* give some more retries for docker CIs
* try to fix test case
* fix parallel tests

avishniakov added 4 commits February 19, 2024 16:11

fix parallel artifacts registration

4647753

remove excessive warnings

d186731

parallel safe model versions

716a2c4

increase cool down a bit

a88c99f

avishniakov requested review from bcdurak and strickvl February 19, 2024 16:08

Merge branch 'develop' into bugfix/OSSK-438-parallel-pipelines-can-fa…

26e17f9

…il-to-create-artifacts

coderabbitai bot reviewed Feb 19, 2024

github-actions bot added the internal and bug labels Feb 19, 2024

avishniakov added 2 commits February 20, 2024 09:10

coderabbitai

93f9777

coderabbitai

f475b82

avishniakov added the run-slow-ci label Feb 20, 2024

avishniakov added 2 commits February 20, 2024 09:19

Merge branch 'develop' into bugfix/OSSK-438-parallel-pipelines-can-fa…

f295f90

…il-to-create-artifacts

update test signature

37d8aef

strickvl reviewed Feb 20, 2024

PR suggestions from Alex

87e1441

avishniakov requested a review from strickvl February 20, 2024 09:05

strickvl approved these changes Feb 20, 2024

kudos to windows

c1ea4d6

strickvl changed the title ~~Parallel pipelines can create entites in DB~~ Parallel pipelines can create entities in DB Feb 20, 2024

avishniakov added 3 commits February 20, 2024 13:41

give some more retries for docker CIs

8ea4e4c

try to fix test case

76998ef

fix parallel tests

9e46fca

bcdurak approved these changes Feb 21, 2024

src/zenml/new/pipelines/pipeline.py (resolved)

avishniakov merged commit 1ffe038 into develop Feb 21, 2024

avishniakov deleted the bugfix/OSSK-438-parallel-pipelines-can-fail-to-create-artifacts branch February 21, 2024 13:11
