Remove tune stat from steps and fix non-discarding of tuning draws from trace#8015

Merged
ricardoV94 merged 7 commits into pymc-devs:main from eclipse1605:remove-tune-sampler-stat
Mar 8, 2026

Conversation

@eclipse1605
Contributor

Description

I tried to make “tuning vs. draws” a driver-owned concept again. Right now, parts of sampling/postprocessing infer the warmup length from a per-step "tune" sampler stat, which can get out of sync with the driver (e.g. a step method returning "tune": False everywhere makes PyMC think n_tune == 0, so warmup isn't discarded and the logs look wrong).
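The failure mode can be sketched in a few lines (illustrative names and numbers only, not PyMC's actual internals):

```python
import numpy as np

# A step method with a buggy "tune" stat that reports False on every
# iteration: inferring warmup length from the stat then yields zero.
tune_stat = np.zeros(1500, dtype=bool)   # never reports tuning
n_tune_inferred = int(tune_stat.sum())   # stat-based inference: 0 warmup draws
assert n_tune_inferred == 0              # so nothing would be discarded

# Driver-owned bookkeeping instead: the driver already knows how many
# tuning iterations it requested, so it slices the trace directly.
n_tune_requested = 500
draws = np.arange(1500)
posterior = draws[n_tune_requested:]     # warmup discarded unconditionally
assert len(posterior) == 1000
```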

Related Issues

Fixes: #7997
Context: #7776 (progressbar/stat refactor that exposed the mismatch)
Related discussion/attempts: #7730, #7721, #7724, #8014

@ricardoV94
Member

@OriolAbril / @aloctavodia does any part of Arviz require the step samples to have a tune flag? Is it enough that we have warmup / posterior distinction, each with their number of draws?

@ricardoV94
Member

ricardoV94 commented Dec 21, 2025

Taking a step back, would it make sense for a tune=None mode where the sampler(s) decide how much tune they need? In that case it would make sense for the individual steps to report back whether they're tuning or not.

Even if that's the case, I think it still makes sense to remove this currently useless stat and reintroduce in a separate PR (provided nobody finds a reason why it is actually useful/needed).

CC @aloctavodia, @lucianopaz @aseyboldt

@codecov

codecov bot commented Dec 21, 2025

Codecov Report

❌ Patch coverage is 90.32258% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.56%. Comparing base (9082a04) to head (e9c066b).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
pymc/backends/ndarray.py 50.00% 1 Missing ⚠️
pymc/sampling/parallel.py 0.00% 1 Missing ⚠️
pymc/sampling/population.py 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #8015      +/-   ##
==========================================
+ Coverage   84.55%   84.56%   +0.01%     
==========================================
  Files         124      124              
  Lines       19872    19866       -6     
==========================================
- Hits        16802    16799       -3     
+ Misses       3070     3067       -3     
Files with missing lines Coverage Δ
pymc/backends/arviz.py 96.04% <100.00%> (ø)
pymc/backends/base.py 88.26% <100.00%> (-0.44%) ⬇️
pymc/backends/mcbackend.py 99.28% <100.00%> (+0.02%) ⬆️
pymc/backends/zarr.py 93.87% <100.00%> (+0.05%) ⬆️
pymc/sampling/mcmc.py 90.61% <100.00%> (+0.27%) ⬆️
pymc/smc/sampling.py 96.55% <100.00%> (ø)
pymc/step_methods/compound.py 98.68% <100.00%> (+0.81%) ⬆️
pymc/step_methods/hmc/base_hmc.py 92.25% <ø> (ø)
pymc/step_methods/hmc/hmc.py 94.59% <ø> (ø)
pymc/step_methods/hmc/nuts.py 97.61% <ø> (ø)
... and 6 more

@michaelosthege
Member

Taking a step back, would it make sense for a tune=None mode where the sampler(s) decide how much tune they need? In that case it would make sense for the individual steps to report back whether they're tuning or not.

Automatically stopping the warmup early would be nice. I think we should agree on cleanly separated definitions of warmup, burn-in and tuning. Samplers not needing to tune parameters doesn't mean that there's no need for a warmup phase of burn-in iterations (however one might call it).

Our current implementation is bad because it doesn't separate the concepts.

@aloctavodia
Member

ArviZ does not require or use a "tune" stat anywhere.

@eclipse1605
Contributor Author

#7997 (comment)

Member

@michaelosthege michaelosthege left a comment


I like where this is going!

Using a slightly different naming I think we can simplify a bit more.

@eclipse1605
Contributor Author

@michaelosthege does this make sense?

@eclipse1605
Contributor Author

@michaelosthege check this out

Member

@michaelosthege michaelosthege left a comment


I'm not familiar with how the progress bar gets updated. Possibly my two comments on that matter are invalid, but please check them.

I'll also trigger the CI tests

Comment on lines -296 to -302
tune = mtrace._straces[0].get_sampler_stats("tune")
assert isinstance(tune, np.ndarray)
# warmup is tracked by the sampling driver
if discard_warmup:
    assert tune.shape == (7, 3)
    assert len(mtrace) == 7
else:
    assert tune.shape == (12, 3)
    pass
Member


can this test remain as before, but using the in_warmup stat instead?

Member


@eclipse1605 this comment still sounds relevant though

@eclipse1605
Contributor Author

I'm not familiar with how the progress bar gets updated. Possibly my two comments on that matter are invalid, but please check them.

Hey, sorry for the delay, but I think they're valid because warmup bookkeeping is now explicitly driver-owned.

@eclipse1605
Contributor Author

@michaelosthege I've made the tests consistent with the changes; if you re-run the CI tests they should mostly pass now.

Member

@michaelosthege michaelosthege left a comment


Looks good to me!

Thanks @eclipse1605 for your endurance with this!

@eclipse1605
Contributor Author

Looks good to me!

Thanks @eclipse1605 for your endurance with this!

Thanks a ton for the reviews and guidance @michaelosthege and @ricardoV94, I really appreciate the patience since I'm still getting my bearings here :)

test_dict = {
    "posterior": ["u1", "n1"],
-   "sample_stats": ["~tune", "accept"],
+   "sample_stats": ["~in_warmup", "accept"],
Member

@ricardoV94 ricardoV94 Jan 7, 2026


I'm not sure about changing the output variable name, this seems like a breaking change for users?

The specific line I pointed to may not be relevant. The general question is whether we changed anything in MultiTrace/InferenceData output with this PR other than the tune flag not existing per step.

Contributor Author


It now writes the warmup flag once as in_warmup, but nothing new shows up for users. When we persist sampler stats (e.g. in mcbackend) we store that boolean and keep trace.get_sampler_stats("tune") working by aliasing it to the new field. The default NDArray backend still omits both names, just like before, and to_inference_data continues to drop whichever warmup marker exists, so the resulting InferenceData matches main; the test only switches the "absent" check to the new internal name. No other MultiTrace/InferenceData variables changed.
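The aliasing idea can be sketched roughly like this (a hypothetical class, not PyMC's actual backend code — the point is only that the legacy "tune" name resolves to the stored in_warmup boolean):

```python
import numpy as np

class StatsStore:
    """Toy stand-in for a backend that stores sampler stats."""

    # legacy stat names redirected to their replacements
    _aliases = {"tune": "in_warmup"}

    def __init__(self, in_warmup):
        # the warmup flag is stored exactly once, under the new name
        self._stats = {"in_warmup": np.asarray(in_warmup)}

    def get_sampler_stats(self, name):
        # old call sites asking for "tune" transparently get in_warmup
        return self._stats[self._aliases.get(name, name)]

store = StatsStore([True, True, False, False])
assert np.array_equal(
    store.get_sampler_stats("tune"),
    store.get_sampler_stats("in_warmup"),
)
```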

Member

@ricardoV94 ricardoV94 left a comment


This looks sleek, I just want to do a manual integration test locally before merging

@eclipse1605
Contributor Author

This looks sleek, I just want to do a manual integration test locally before merging

sounds good!

@eclipse1605
Contributor Author

Hey @ricardoV94, I tried to understand the failed test but didn't really get very far with it. Is it failing because JAX spits out NaNs when the Dirichlet concentration is super skewed, so the multinomial never sees a clean prob vector?

@ricardoV94
Member

That one fails now and then, don't worry about it

@eclipse1605
Contributor Author

@eclipse1605 did you see the missed comment above about doing more minimal changes to the pre-existing test?

Do you mean this?

@ricardoV94
Member

#8015 (comment)

@eclipse1605
Contributor Author

I saw that, but I wanted clarification on whether we want to add an explicit tune alias assertion to preserve that compatibility.

Comment on lines +304 to +305
assert all(len(s) == 7 for s in in_warmup)
assert all(not np.any(s) for s in in_warmup)
Member


What is this in_warmup object we're seeing here? From the test alone I have a hard time figuring it out. Is it a numpy array?

Member


It's unclear to me why this changed, it seemed like we just moved the source of tune/warmup, not the final stored contents?

Member


@eclipse1605 ^ comment

@eclipse1605
Contributor Author

@ricardoV94 any changes required in this?

@ricardoV94
Member

I have a question about why the test changed, thought the output would still be the same. Also we merged another PR so this one now has conflicts that need to be solved. Let me know if you need help

@eclipse1605
Contributor Author

As I said above, the test changed because the tune sampler stat has been removed: warmup tracking is now handled by the sampling driver, and the backend no longer stores tune. Previously, the test retrieved tune via get_sampler_stats("tune") and checked its shape to verify the number of warmup and posterior samples.

The test asserted that tune was a NumPy array and checked its shape:

  • if discard_warmup was True, the shape was (7, 3).
  • if discard_warmup was False, the shape was (12, 3).

Because warmup tracking is now managed directly by the sampling driver, tune no longer needs to be stored in the backend, so the test cannot retrieve it. Instead of checking the shape of tune, the test now checks the length of the MultiTrace object (len(mtrace)) to determine the number of posterior samples.

let me know if that helps clarify things.
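For concreteness, the arithmetic behind those shapes can be written out (illustrative numbers taken from the test: 12 total iterations, 5 of them warmup):

```python
total_iters, n_tune = 12, 5

def expected_trace_len(discard_warmup: bool) -> int:
    # the driver discards warmup itself, so the trace length alone
    # tells us how many posterior draws survived
    return total_iters - n_tune if discard_warmup else total_iters

assert expected_trace_len(True) == 7    # old check: tune.shape == (7, 3)
assert expected_trace_len(False) == 12  # old check: tune.shape == (12, 3)
```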

@eclipse1605
Contributor Author

also, if we move on to merging this, I'll likely need help fixing the merge conflicts

@jessegrabowski
Member

also, if we move on to merging this, I'll likely need help fixing the merge conflicts

I'm happy to help but I'd want to let #8047 go in first because it will change things around again.

@eclipse1605
Contributor Author

also, if we move on to merging this, I'll likely need help fixing the merge conflicts

I'm happy to help but I'd want to let #8047 go in first because it will change things around again.

Sure, makes sense. Let me know when we want to merge this, given there are no more changes required :)

@michaelosthege michaelosthege force-pushed the remove-tune-sampler-stat branch from 4f537f0 to e9c066b on March 8, 2026 at 12:48

@ricardoV94
Member

Thanks @michaelosthege. The failing tests are a known issue. I'll try to fix them after, but need not block this PR.

@ricardoV94 ricardoV94 merged commit b41a3bd into pymc-devs:main Mar 8, 2026
37 of 42 checks passed
ricardoV94 pushed a commit to ricardoV94/pymc that referenced this pull request Mar 8, 2026
…from trace (pymc-devs#8015)

Co-authored-by: Michael Osthege <michael.osthege@outlook.com>
@michaelosthege
Member

Thanks @michaelosthege. The failing tests are a known issue. I'll try to fix them after, but need not block this PR.

😅 I just fixed them. Will do another PR then ;)

ricardoV94 pushed a commit that referenced this pull request Mar 8, 2026
…from trace (#8015)

Co-authored-by: Michael Osthege <michael.osthege@outlook.com>
@ricardoV94 ricardoV94 mentioned this pull request Mar 27, 2026


Development

Successfully merging this pull request may close these issues.

BUG: CategoricalGibbsMetropolis doesn't respect the tune parameter

5 participants