@djudjuu
Contributor

@djudjuu djudjuu commented Dec 10, 2025

  • test for bad behavior
  • fix
  • another test for the new mechanism

@djudjuu djudjuu linked an issue Dec 10, 2025 that may be closed by this pull request
@djudjuu djudjuu force-pushed the 3353-normalize-start_method-spawn-seems-to-ignore-environment-variables branch from e19526d to 0d50b31 Compare December 10, 2025 12:40
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Dec 10, 2025

Deploying with Cloudflare Workers

The latest updates on your project.

Status: ✅ Deployment successful!
Name: docs
Latest Commit: 04ca239
Updated (UTC): Dec 18 2025, 09:27 AM

@djudjuu djudjuu force-pushed the 3353-normalize-start_method-spawn-seems-to-ignore-environment-variables branch from 0d50b31 to 85aa0f5 Compare December 10, 2025 12:42
@@ -0,0 +1,143 @@
"""Test that ConfigSectionContext is properly restored in spawned worker processes."""
Contributor Author

@djudjuu djudjuu Dec 10, 2025

do we want this test? I wanted to have something that is closer to the actual change, whereas the other test is more of a regression test for the exact issue

@djudjuu djudjuu requested a review from rudolfix December 10, 2025 13:12
@djudjuu djudjuu self-assigned this Dec 11, 2025
Collaborator

@rudolfix rudolfix left a comment

code is good. tests are missing a case and some of them are not needed

),
ids=lambda x: x.name,
)
def test_normalize_compression_with_spawn_workers(
Collaborator

cool test! but running it for all destinations is overkill. pls. move it to tests/pipeline, it does not need the load step.


# Check that normalized files are not compressed
load_storage = p._get_load_storage()
normalized_packages = load_storage.list_normalized_packages()
Collaborator

this method is already on the pipeline



@configspec
class SectionedTestConfig(BaseConfiguration):
Collaborator

makes sense to have those tests. pls. move them to test_runners.py. (tbh. worker init tests were missing altogether)

return config.test_value, section_ctx.sections


def test_config_section_context_restored_in_spawn_worker() -> None:
Collaborator

actually sections should be visible both in case of spawn and fork. so test both (parametrize)


# Store it in container
container = Container()
container[ConfigSectionContext] = section_context
Collaborator

we are also missing the case when there's no section context in the container (do `del container[ConfigSectionContext]`)
you can parametrize this test to check that
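
The two review points above (cover both start methods, and cover a container with no section context) could share one parameter matrix. A minimal stdlib-only sketch, assuming a module-level variable as a stand-in for the container entry; `set_sections`, `init_worker`, `read_sections`, `sections_seen_by_worker` and `CASES` are hypothetical names for illustration, not dlt API:

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from typing import Optional, Tuple

# Hypothetical stand-in for the container entry; this is NOT dlt's Container.
_SECTIONS: Optional[Tuple[str, ...]] = None


def set_sections(sections: Optional[Tuple[str, ...]]) -> None:
    """Set (or clear, with None) the sections visible in the current process."""
    global _SECTIONS
    _SECTIONS = sections


def init_worker(sections: Optional[Tuple[str, ...]]) -> None:
    """Runs once in each worker and restores whatever the parent passed in."""
    set_sections(sections)


def read_sections() -> Tuple[str, ...]:
    """Task body: report the sections this process sees, () when there are none."""
    return _SECTIONS if _SECTIONS is not None else ()


def sections_seen_by_worker(start_method: str) -> Tuple[str, ...]:
    """Start a one-worker pool with an explicit start method and ask what it sees."""
    ctx = multiprocessing.get_context(start_method)
    with ProcessPoolExecutor(
        max_workers=1, mp_context=ctx, initializer=init_worker, initargs=(_SECTIONS,)
    ) as pool:
        return pool.submit(read_sections).result()


# Only keep start methods this platform offers ("fork" is absent on Windows).
START_METHODS = [
    m for m in ("spawn", "fork") if m in multiprocessing.get_all_start_methods()
]
# Every start method, once with a section context and once without.
CASES = [(m, s) for m in START_METHODS for s in (("sources", "my_section"), None)]
```

Each `(start_method, sections)` pair in `CASES` could feed `pytest.mark.parametrize`: the test would call `set_sections(sections)` and assert that `sections_seen_by_worker(start_method)` returns `sections or ()`. With spawn, the functions must live in an importable module and the pool must not be created at import time.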

@djudjuu
Contributor Author

djudjuu commented Dec 16, 2025

all failing tests are:
[image]
but they work locally. is github ci somehow different?

@djudjuu
Contributor Author

djudjuu commented Dec 16, 2025

In GitHub CI, the Python process typically has:

  • more active threads (pytest, coverage, logging, executor internals),
  • different import timing,
  • and lazier worker creation by ProcessPoolExecutor.

Forking in the presence of threads and lazy initialization is explicitly unsafe in Python. As a result, workers may be forked before the section context is set, or the container may be re-initialized in the worker process, causing the inherited ConfigSectionContext to be missing. This leads workers to fall back to the default (empty) sections, which is exactly what the failing test shows.

chatgpt-5
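
The thread-related part of that explanation can be checked directly: since Python 3.12, calling `os.fork()` in a process that has more than one thread emits a `DeprecationWarning`. A small sketch, with a hypothetical helper `other_threads_alive`, that a test could log right before creating a fork-based pool to compare CI with a local run:

```python
import threading


def other_threads_alive() -> bool:
    """True when threads besides the calling one are running in this process."""
    return threading.active_count() > 1
```

This only reports whether the precondition for the problem exists. It does not show that the missing section context is caused by it.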

so maybe we set the pytest expectation for fork-method to fail and do a follow up issue?

are people running dlt-pipelines in github-actions? those would be affected...

Collaborator

@rudolfix rudolfix left a comment

looking at failed tests:

  1. fork is not supported on windows. what's weird is that creation of the process pool is not failing. instead something else happens that skips passing the section context to your test. maybe somehow spawn is used? then obviously no context is passed (you think we do fork but the system does spawn)

        if start_method != "fork":
            ctx = Container()[PluggableRunContext]

     you can see what happens if multiprocessing.get_context(method=start_method), force=True is passed

  2. another thing for fork: if you start the pool BEFORE adding the section to the container then obviously you get an empty container in the fork - because the process forks when the pool starts

  3. on mac the default method is spawn. so I think the problem is the same as on windows. for some reason the start type is not really set

you need to debug carefully how pool is created
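
One way to debug this is to resolve the multiprocessing context explicitly and look at what the platform actually hands back. A minimal sketch using only the standard library; `effective_start_method` is a hypothetical helper, not part of dlt:

```python
import multiprocessing
from typing import Optional


def effective_start_method(requested: str) -> Optional[str]:
    """Return the start method a context for `requested` would really use.

    None means the platform does not offer it (for example "fork" on Windows),
    in which case multiprocessing.get_context raises ValueError.
    """
    try:
        ctx = multiprocessing.get_context(requested)
    except ValueError:
        return None
    return ctx.get_start_method()
```

Passing the resolved context to the pool as `mp_context`, instead of relying on the process-wide default, makes the start method visible at the point where the pool is created. For the fork case, the section context has to be in the container before that point.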

@djudjuu djudjuu force-pushed the 3353-normalize-start_method-spawn-seems-to-ignore-environment-variables branch from 58a797a to 55cbf17 Compare December 17, 2025 09:24
@rudolfix
Collaborator

@djudjuu this looks good now. you can remove explicit start method from failing tests.

@djudjuu djudjuu force-pushed the 3353-normalize-start_method-spawn-seems-to-ignore-environment-variables branch from 00b7e85 to 04ca239 Compare December 18, 2025 09:18
Development

Successfully merging this pull request may close these issues.

normalize start_method spawn seems to ignore environment variables

3 participants