-
Notifications
You must be signed in to change notification settings - Fork 3
Add automatic standardization of continuous predictors in JAGS_formula #57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add automatic standardization of continuous predictors in JAGS_formula #57
Conversation
- Add formula_scale parameter to JAGS_formula function - Implement standardization logic for continuous predictors - Store mean and SD for back-transformation - Propagate formula_scale through JAGS_fit - Add transform_scale_samples function for posterior transformation - Add comprehensive tests for standardization Co-authored-by: FBartos <[email protected]>
Co-authored-by: FBartos <[email protected]>
FBartos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot improve the unit tests. All model fittings should be done in the designated "00" file. Extend it with one model that exactly copies an existing model that includes a continuous predictor and its interaction. Then, the test file should load the model and compare the output (and the downstream functions) with an output of unscaled and scaled fit. Some of the tests can be visual, I will assess the correspondence and assess the figures manually
(First check that the continuous predictor was not scaled before, if so, update the fit)
- Add scaled version of interaction model to test-00-model-fits.R - Replace simple JAGS fitting test with comparison tests using pre-fitted models - Add visual comparison tests using vdiffr for manual assessment - Tests compare unscaled vs scaled models and verify transform_scale_samples() - All tests pass successfully Co-authored-by: FBartos <[email protected]>
Done! I've improved the unit tests as requested:
All tests pass (108 tests total). The visual snapshots are ready for your manual review. Commit: b3fb7ac |
FBartos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually not the correct way to test the behavior. Create two new fits in the "00" file. They use the same generated data (far from being scaled).
First scales the input manually
Second used the automatic scaling
The later tests then verify that the manual and automatic approach produce the same results
- Replace scaled vs unscaled comparison with manual vs automatic scaling - Create data with large scale differences (mean=1000, sd=500 vs mean=0.5, sd=0.1) - Fit two models with same seed: manual scaling and automatic scaling - Tests verify both approaches produce equivalent results - Visual comparisons show manual and automatic scaling produce identical posteriors - Remove old snapshot files, create new ones for manual vs auto comparison Co-authored-by: FBartos <[email protected]>
Corrected! I've refactored the tests to properly compare manual vs automatic scaling:
The tests now correctly verify that manual and automatic approaches produce the same results. Commit: f1ee8e7 |
|
@copilot I don't think that include tests for this feature in the tests |
- Update JAGS_evaluate_formula to apply scaling when predictors were scaled during fitting - Check for formula_scale attribute in fit object - Automatically scale new data using stored mean/sd for each predictor - Add comprehensive tests comparing manual vs automatic scaling in predictions - Tests verify predictions match when using scaled data vs automatic scaling - All tests pass (119 total) Co-authored-by: FBartos <[email protected]>
Done! Added scaling functionality to Implementation:
Tests:
Commit: 1c386af |
|
@copilot The marglik functionality should take advantage of the scale argument too, now these two outputs don't match fit_manual <- readRDS(file.path(temp_fits_dir, "fit_formula_manual_scaled_marglik.RDS")) |
- Update JAGS_bridgesampling to accept formula_scale_list parameter - Pass formula_scale to JAGS_formula calls during marginal likelihood computation - Ensures marginal likelihoods match between manual and automatic scaling - Add test verifying marginal likelihoods are equivalent - Update documentation - All tests pass (120 total) Co-authored-by: FBartos <[email protected]>
Fixed! Added Implementation:
Test:
Commit: c4bfa92 |
Ensures that class and attributes of posterior samples are preserved after applying the unscaling transformation in update.BayesTools_table. Updates tests to check correct unscaling of coefficients in the presence of interaction terms, including proper handling of main effects and interactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements automatic standardization of continuous predictors in JAGS_formula to improve MCMC sampling efficiency and numerical stability. The implementation includes a formula_scale parameter for specifying which predictors to standardize, automatic storage of standardization parameters (mean/sd), and a transformation function to convert posterior samples back to the original scale.
Key Changes
- Added
formula_scaleparameter toJAGS_formula()andJAGS_fit()for automatic predictor standardization - Implemented
transform_scale_samples()function using a combinatorial approach that correctly handles interactions of any order - Extended scaling support to
JAGS_evaluate_formula()andJAGS_bridgesampling()for predictions and marginal likelihood computation - Added comprehensive test suite with 120+ tests validating equivalence between manual and automatic scaling
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| R/JAGS-formula.R | Core implementation of standardization logic in JAGS_formula(), transform_scale_samples() function with combinatorial unscaling algorithm, and automatic scaling in JAGS_evaluate_formula() |
| R/JAGS-fit.R | Added formula_scale_list parameter and propagation of scaling information through fit object attributes |
| R/JAGS-marglik.R | Extended JAGS_bridgesampling() to accept formula_scale_list for correct marginal likelihood computation |
| R/summary-tables.R | Added transform_scaled parameter to table functions with helper function for transforming list-format samples |
| R/tools.R | Fixed bug in .is.wholenumber() where na.rm = TRUE was incorrectly applying na.omit() |
| tests/testthat/test-JAGS-formula-scale.R | Comprehensive test suite (1191 lines) with validation tests, lm-based verification, and comparison of manual vs automatic scaling |
| tests/testthat/test-00-model-fits.R | Added two fitted models (manual and automatic scaling) with marginal likelihoods for integration testing |
| man/*.Rd | Updated documentation for new parameters and functions |
| NEWS.md | Documented new features and bug fix |
| NAMESPACE | Exported new transform_scale_samples() function |
| #' @param formula_scale named list specifying whether to standardize continuous predictors. | ||
| #' If \code{NULL} (default), no standardization is applied. If a named list is provided, | ||
| #' continuous predictors with \code{TRUE} values will be standardized (mean-centered and | ||
| #' scaled by standard deviation). The intercept is never standardized. |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation structure is incorrect. The @param formula_scale documentation (lines 16-19) is inserted in the middle of the @param prior_list documentation, breaking up the description and the \describe block. The \describe block starting at line 20 should be part of the prior_list parameter documentation, not the formula_scale parameter. The formula_scale parameter should be documented after the complete prior_list documentation.
| scale_info[[continuous]] <- list( | ||
| mean = mean(data[, continuous], na.rm = TRUE), | ||
| sd = stats::sd(data[, continuous], na.rm = TRUE) | ||
| ) |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no validation for the case where a continuous predictor has zero variance (constant values). When sd is 0 or very close to 0, line 244 will result in division by zero or numerical instability. Consider adding a check after line 241 to validate that sd > 0 and provide a helpful error message suggesting that constant predictors cannot be standardized.
| ) | |
| ) | |
| # check that the standard deviation is positive to avoid division by zero | |
| if(is.na(scale_info[[continuous]]$sd) || scale_info[[continuous]]$sd <= .Machine$double.eps){ | |
| stop(paste0( | |
| "Continuous predictor '", continuous, "' has zero or near-zero variance and cannot be standardized. ", | |
| "Consider removing this predictor from the model or setting 'formula_scale[[\"", | |
| continuous, "\"]]' to FALSE." | |
| )) | |
| } |
| }) | ||
|
|
||
| test_that("runjags_estimates_table with transform_scaled unscales coefficients", { | ||
| # TODO: something is wrong here with the intercept handling |
Copilot
AI
Dec 29, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a TODO comment indicating a known issue with intercept handling. While TODO comments are useful during development, they should not remain in production code, especially if they indicate a potential bug. Either the issue should be fixed before merging, or if it's working correctly, the TODO should be removed or converted to a regular explanatory comment.
| # TODO: something is wrong here with the intercept handling | |
| # This test verifies that transform_scaled correctly unscales all coefficients, | |
| # including the intercept, when predictors are standardized and interact. |
Implementation Complete: Automatic Standardization of Continuous Predictors
Overview
Successfully implemented automatic standardization of continuous predictors in
JAGS_formulato improve MCMC sampling efficiency and numerical stability.Core Features
formula_scaleparameter in JAGS_formula: Named list to specify which continuous predictors should be standardizedtransform_scale_samples()function: Transforms posterior samples back to original scaleJAGS_evaluate_formulascaling support: Automatically applies scaling to new data when making predictionsJAGS_bridgesamplingscaling support: Marginal likelihood computation now respects scalingUnit Tests - Manual vs Automatic Scaling Validation
The tests verify that manual and automatic scaling produce equivalent results:
Test Data with Large Scale Differences
Two Model Fits in test-00-model-fits.R
Comparison Tests
Visual Tests for Manual Assessment
JAGS_evaluate_formula Scaling Support
Added automatic scaling functionality to
JAGS_evaluate_formulafor making predictions:JAGS_evaluate_formulaautomatically applies scaling to predictors that were scaled during model fittingformula_scaleattribute in fit object and applies stored mean/sd transformationsJAGS_bridgesampling Scaling Support
Added automatic scaling functionality to
JAGS_bridgesamplingfor computing marginal likelihoods:JAGS_bridgesamplingnow acceptsformula_scale_listparameterJAGS_formulacalls during marginal likelihood computationTest Results
✅ All tests pass (120 tests)
✅ Visual snapshots created in
tests/testthat/_snaps/JAGS-formula-scale/✅ Tests verify manual and automatic scaling produce identical results
✅ JAGS_evaluate_formula correctly applies scaling to new data
✅ Marginal likelihoods match between manual and automatic scaling
Files Changed
R/JAGS-formula.R: Added standardization logic, transform_scale_samples(), and JAGS_evaluate_formula scaling supportR/JAGS-fit.R: Added formula_scale_list parameter and propagationR/JAGS-marglik.R: Added formula_scale_list support to JAGS_bridgesamplingtests/testthat/test-00-model-fits.R: Added manual and auto scaled models with marginal likelihoodstests/testthat/test-JAGS-formula-scale.R: Tests comparing manual vs auto scaling, JAGS_evaluate_formula predictions, and marginal likelihoodsNEWS.md: Documented new featuresOriginal prompt
Start implementation
The user has attached the following file paths as relevant context:
- .github\copilot-instructions.md
[Chronological Review: The conversation began with the user inquiring about the automatic standardization of continuous predictors in the `JAGS_formula` call. The user suggested including standardization logic within a specific conditional block and discussed the use of the `scale` function in R. They emphasized the need to propagate the `scale` argument to the `JAGS_fit()` function. The user then outlined additional requirements regarding intercept priors, the use of named lists for flexibility, and the implementation of a function for transforming posterior samples. Finally, the user requested to start the implementation of these features.][Intent Mapping:
[Technical Inventory:
JAGS_formula,JAGS_fit,scalefunction in R[Code Archaeology:
JAGS-fit.RJAGS_formulafunction to include standardization logic and propagate thescaleargument toJAGS_fit().JAGS_fit.][Progress Assessment:
[Context Validation: All critical information for continuation is captured, including user requests, technical details, and the current state of the implementation.]
[Recent Commands Analysis:
JAGS-fit.Rfile when the token budget was exceeded.1. Conversation Overview: - Primary Objectives: - "is there a way of automatically standardizing continuous predictors in the JAGS_formula call?" - "It would make sense to include the standardization logic within the if(any(predictors_type == "continuous")){ section" - "the intercept prior should not be modified" - "use named list for flexibility" - "do not add warning, implement function for transforming the posterior samples" - "Start implementation" - Session Context: The user is enhancing the `JAGS-fit.R` file to include automatic standardization of continuous predictors and related functionalities. - User Intent Evolution: The user moved from inquiry about standardization to outlining specific implementation details and finally requesting to start coding.- Technical Foundation:
- R: The programming language used for the implementation.
- JAGS: A tool for Bayesian analysis that the user is working with.
- Functions:
- Codebase Status:
- File Name:
- Purpose: To fit Bayesian models using JAGS and manage predictors.
- Current State: User is preparing to implement standardization logic and related features.
- Key Code Segments: Discussion around modifying the
- Dependencies: The implementation will affect how downstream functions handle the fit object.
- Problem Resolution:
- Issues Encountered: Need for automatic standardization and handling of predictors.
- Solutions Implemented: User has proposed a structured approach to implement these features.
- Debugging Context: No specific debugging issues were mentioned, but the user is focused on implementation.
- Lessons Learned: Importance of flexibility ...
JAGS_formula,JAGS_fit, and thescalefunction for standardization.JAGS-fit.RJAGS_formulafunction and propagating arguments toJAGS_fit.Created from VS Code.
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.