
Conversation

Contributor

Copilot AI commented Dec 27, 2025

Implementation Complete: Automatic Standardization of Continuous Predictors

Overview

Successfully implemented automatic standardization of continuous predictors in JAGS_formula to improve MCMC sampling efficiency and numerical stability.

Core Features

  • formula_scale parameter in JAGS_formula: Named list to specify which continuous predictors should be standardized
  • Automatic scaling: Stores mean and SD for each standardized predictor for back-transformation
  • transform_scale_samples() function: Transforms posterior samples back to original scale
  • JAGS_evaluate_formula scaling support: Automatically applies scaling to new data when making predictions
  • JAGS_bridgesampling scaling support: Marginal likelihood computation now respects scaling
  • Seamless integration: Propagates through JAGS_fit workflow
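The core idea in the list above (standardize the flagged predictors and keep their mean and SD for later back-transformation) can be sketched as follows. This is a minimal illustration, not the package's code: `standardize_predictors()` is a hypothetical helper, and attaching the stored values to the data frame as a `formula_scale` attribute is an assumption made only for this example.

```r
# Hypothetical helper (not part of the package): standardize the continuous
# predictors flagged TRUE in a named list and remember their mean and SD.
standardize_predictors <- function(data, formula_scale) {
  scale_info <- list()
  for (name in names(formula_scale)) {
    if (!isTRUE(formula_scale[[name]])) next   # skip predictors not flagged TRUE
    m <- mean(data[[name]], na.rm = TRUE)
    s <- stats::sd(data[[name]], na.rm = TRUE)
    data[[name]] <- (data[[name]] - m) / s     # mean-center and scale by SD
    scale_info[[name]] <- list(mean = m, sd = s)
  }
  # Assumption: keep the scaling information alongside the data
  attr(data, "formula_scale") <- scale_info
  data
}

set.seed(1)
df <- data.frame(
  x_cont1 = rnorm(100, mean = 1000, sd = 500),
  x_cont2 = rnorm(100, mean = 0.5, sd = 0.1)
)
scaled <- standardize_predictors(df, list(x_cont1 = TRUE, x_cont2 = FALSE))
str(attr(scaled, "formula_scale"))
```

Only `x_cont1` is standardized here; `x_cont2` is left untouched because its entry in the named list is `FALSE`.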

Unit Tests - Manual vs Automatic Scaling Validation

The tests verify that manual and automatic scaling produce equivalent results:

  1. Test Data with Large Scale Differences

    • x_cont1: mean=1000, sd=500 (large scale)
    • x_cont2: mean=0.5, sd=0.1 (small scale)
    • The model includes both predictors and their interaction term
    • Data are far from standardized, so the scaling behavior is actually exercised
  2. Two Model Fits in test-00-model-fits.R

    • fit_formula_manual_scaled: Data manually scaled before fitting
    • fit_formula_auto_scaled: Same data with automatic scaling via formula_scale_list
    • Both fitted with same seed for direct comparison
    • Marginal likelihoods computed for both models
  3. Comparison Tests

    • Verify scaling parameters match exactly (tolerance: 1e-10)
    • Compare posterior means and SDs (tolerance: 0.05)
    • Tests confirm manual and automatic approaches produce equivalent results
    • Verify downstream functions work correctly (summary, JAGS_estimates_table)
    • JAGS_evaluate_formula tests: Verify automatic scaling is applied to new data during prediction
    • Marginal likelihood tests: Verify marginal likelihoods match between manual and automatic scaling
  4. Visual Tests for Manual Assessment

    • 4 comparison plots using vdiffr (manual vs automatic):
      • x_cont1 coefficient
      • x_cont2 coefficient
      • x_cont1:x_cont2 interaction term
      • intercept
    • Visual snapshots show overlapping posteriors, confirming equivalence
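As a rough illustration of the "scaling parameters match exactly" check, the sketch below compares a hand-written standardization with base R's `scale()` on simulated data resembling the large-scale predictor. The simulated values and variable names are assumptions for this example only; they are not the test data from the PR.

```r
set.seed(42)
# Simulated predictor on a large scale (illustrative values only)
x_cont1 <- rnorm(200, mean = 1000, sd = 500)

# Manual standardization
manual <- (x_cont1 - mean(x_cont1)) / sd(x_cont1)

# Base R scale() returns a one-column matrix and stores the centering and
# scaling constants as attributes
auto <- scale(x_cont1)

# The two versions should agree up to floating-point error
same_values <- isTRUE(all.equal(manual, as.numeric(auto), tolerance = 1e-10))
same_center <- isTRUE(all.equal(attr(auto, "scaled:center"), mean(x_cont1), tolerance = 1e-10))
same_scale  <- isTRUE(all.equal(attr(auto, "scaled:scale"), sd(x_cont1), tolerance = 1e-10))
c(same_values, same_center, same_scale)
```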

JAGS_evaluate_formula Scaling Support

Added automatic scaling functionality to JAGS_evaluate_formula for making predictions:

  • Automatic Scaling in Predictions: When evaluating formulas on new data, JAGS_evaluate_formula automatically applies scaling to predictors that were scaled during model fitting
  • Implementation: Checks for formula_scale attribute in fit object and applies stored mean/sd transformations
  • User Benefit: Users can pass unscaled data when making predictions; scaling is handled automatically based on the model's stored scaling information
  • Tests: Comprehensive tests verify that predictions match between manually scaled data and automatic scaling
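The prediction-time behavior described above amounts to reusing the mean and SD stored at fitting time rather than recomputing them from the new data. A minimal sketch, assuming a hypothetical helper `apply_stored_scaling()` and a `scale_info` list with illustrative values:

```r
# Hypothetical helper: scale new data with the mean/SD stored during fitting.
# Predictors without stored scaling information are left unchanged.
apply_stored_scaling <- function(new_data, scale_info) {
  for (name in intersect(names(scale_info), names(new_data))) {
    new_data[[name]] <- (new_data[[name]] - scale_info[[name]]$mean) /
      scale_info[[name]]$sd
  }
  new_data
}

# Illustrative stored values (assumed, not taken from a real fit)
scale_info <- list(x_cont1 = list(mean = 1000, sd = 500))

new_data <- data.frame(
  x_cont1 = c(500, 1000, 1500),
  x_cont2 = c(0.4, 0.5, 0.6)
)
scaled_new <- apply_stored_scaling(new_data, scale_info)
scaled_new$x_cont1
# -1  0  1
```

Using the stored constants matters: standardizing the new data by its own mean and SD would place it on a different scale than the one the coefficients were estimated on.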

JAGS_bridgesampling Scaling Support

Added automatic scaling functionality to JAGS_bridgesampling for computing marginal likelihoods:

  • Marginal Likelihood with Scaling: JAGS_bridgesampling now accepts formula_scale_list parameter
  • Implementation: Passes scaling information to internal JAGS_formula calls during marginal likelihood computation
  • User Benefit: Marginal likelihoods are computed correctly for scaled models; manual and automatic scaling produce matching results
  • Tests: Tests verify marginal likelihoods match between manual and automatic scaling approaches

Test Results

✅ All tests pass (120 tests)
✅ Visual snapshots created in tests/testthat/_snaps/JAGS-formula-scale/
✅ Tests verify manual and automatic scaling produce equivalent results (within the stated tolerances)
✅ JAGS_evaluate_formula correctly applies scaling to new data
✅ Marginal likelihoods match between manual and automatic scaling

Files Changed

  • R/JAGS-formula.R: Added standardization logic, transform_scale_samples(), and JAGS_evaluate_formula scaling support
  • R/JAGS-fit.R: Added formula_scale_list parameter and propagation
  • R/JAGS-marglik.R: Added formula_scale_list support to JAGS_bridgesampling
  • tests/testthat/test-00-model-fits.R: Added manual and auto scaled models with marginal likelihoods
  • tests/testthat/test-JAGS-formula-scale.R: Tests comparing manual vs auto scaling, JAGS_evaluate_formula predictions, and marginal likelihoods
  • NEWS.md: Documented new features
Original prompt

Start implementation

The user has attached the following file paths as relevant context:

  • .github\copilot-instructions.md
[Chronological Review: The conversation began with the user inquiring about the automatic standardization of continuous predictors in the `JAGS_formula` call. The user suggested including standardization logic within a specific conditional block and discussed the use of the `scale` function in R. They emphasized the need to propagate the `scale` argument to the `JAGS_fit()` function. The user then outlined additional requirements regarding intercept priors, the use of named lists for flexibility, and the implementation of a function for transforming posterior samples. Finally, the user requested to start the implementation of these features.]

[Intent Mapping:

  1. "is there a way of automatically standardizing continuous predictors in the JAGS_formula call?" - User seeks a method for automatic standardization.
  2. "It would make sense to include the standardization logic within the if(any(predictors_type == "continuous")){ section" - User suggests a specific implementation approach.
  3. "the intercept prior should not be modified" - User specifies a requirement regarding intercept priors.
  4. "use named list for flexibility" - User requests a flexible approach for handling standardization.
  5. "do not add warning, implement function for transforming the posterior samples" - User wants to avoid warnings and implement a transformation function.
  6. "Start implementation" - User initiates the coding phase for the discussed features.]

[Technical Inventory:

  • R programming language
  • JAGS (Just Another Gibbs Sampler) for Bayesian analysis
  • Functions: JAGS_formula, JAGS_fit, scale function in R
  • Concepts: Standardization of predictors, handling of posterior samples, named lists for argument flexibility.]

[Code Archaeology:

  • File: JAGS-fit.R
  • Recent focus on modifying the JAGS_formula function to include standardization logic and propagate the scale argument to JAGS_fit().
  • Discussion on how to manage intercept priors and the structure of the output from JAGS_fit.]

[Progress Assessment:

  • Completed: User has outlined the requirements for standardization and the handling of predictors.
  • Pending: Implementation of the discussed features, including the integration of standardization logic and the transformation function for posterior samples.]

[Context Validation: All critical information for continuation is captured, including user requests, technical details, and the current state of the implementation.]

[Recent Commands Analysis:

  • Last Agent Commands: User requested to "Start implementation."
  • Tool Results Summary: No specific tool results were returned as the conversation was focused on planning and outlining requirements.
  • Pre-Summary State: The agent was actively discussing the implementation of standardization features in the JAGS-fit.R file when the token budget was exceeded.
  • Operation Context: The commands were executed in the context of enhancing the functionality of the JAGS modeling framework to include automatic standardization of predictors.]
  1. Conversation Overview:
  • Primary Objectives:
    • "is there a way of automatically standardizing continuous predictors in the JAGS_formula call?"
    • "It would make sense to include the standardization logic within the if(any(predictors_type == "continuous")){ section"
    • "the intercept prior should not be modified"
    • "use named list for flexibility"
    • "do not add warning, implement function for transforming the posterior samples"
    • "Start implementation"
  • Session Context: The user is enhancing the `JAGS-fit.R` file to include automatic standardization of continuous predictors and related functionalities.
  • User Intent Evolution: The user moved from inquiry about standardization to outlining specific implementation details and finally requesting to start coding.
  2. Technical Foundation:
  • R: The programming language used for the implementation.
  • JAGS: A tool for Bayesian analysis that the user is working with.
  • Functions: JAGS_formula, JAGS_fit, and the scale function for standardization.
  3. Codebase Status:
  • File Name: JAGS-fit.R
  • Purpose: To fit Bayesian models using JAGS and manage predictors.
  • Current State: User is preparing to implement standardization logic and related features.
  • Key Code Segments: Discussion around modifying the JAGS_formula function and propagating arguments to JAGS_fit.
  • Dependencies: The implementation will affect how downstream functions handle the fit object.
  4. Problem Resolution:
  • Issues Encountered: Need for automatic standardization and handling of predictors.
  • Solutions Implemented: User has proposed a structured approach to implement these features.
  • Debugging Context: No specific debugging issues were mentioned, but the user is focused on implementation.
  • Lessons Learned: Importance of flexibility ...

Created from VS Code.



Copilot AI and others added 2 commits December 27, 2025 11:43
- Add formula_scale parameter to JAGS_formula function
- Implement standardization logic for continuous predictors
- Store mean and SD for back-transformation
- Propagate formula_scale through JAGS_fit
- Add transform_scale_samples function for posterior transformation
- Add comprehensive tests for standardization

Co-authored-by: FBartos <[email protected]>
Copilot AI changed the title from "[WIP] Add automatic standardization for continuous predictors in JAGS_formula" to "Add automatic standardization of continuous predictors in JAGS_formula" Dec 27, 2025
Copilot AI requested a review from FBartos December 27, 2025 11:50
Owner

@FBartos FBartos left a comment


@copilot improve the unit tests. All model fittings should be done in the designated "00" file. Extend it with one model that exactly copies an existing model that includes a continuous predictor and its interaction. Then, the test file should load the model and compare the output (and the downstream functions) with an output of unscaled and scaled fit. Some of the tests can be visual, I will assess the correspondence and assess the figures manually
(First check that the continuous predictor was not scaled before, if so, update the fit)

- Add scaled version of interaction model to test-00-model-fits.R
- Replace simple JAGS fitting test with comparison tests using pre-fitted models
- Add visual comparison tests using vdiffr for manual assessment
- Tests compare unscaled vs scaled models and verify transform_scale_samples()
- All tests pass successfully

Co-authored-by: FBartos <[email protected]>
Contributor Author

Copilot AI commented Dec 27, 2025

@copilot improve the unit tests. All model fittings should be done in the designated "00" file. Extend it with one model that exactly copies an existing model that includes a continuous predictor and ...

Done! I've improved the unit tests as requested:

  1. Model fitting centralized in test-00-model-fits.R:

    • Added fit_formula_interaction_cont_scaled - a scaled version of the existing fit_formula_interaction_cont model
    • Both models use continuous predictors (x_cont1, x_cont2) with interaction term
    • Fitted with same seed for comparison
  2. Comparison tests in test-JAGS-formula-scale.R:

    • Replaced JAGS fitting test with tests that load pre-fitted models
    • Compare unscaled vs scaled posteriors
    • Verify transform_scale_samples() correctly back-transforms coefficients
    • Test downstream functions (summary, JAGS_estimates_table)
  3. Visual tests for manual assessment:

    • Created 3 vdiffr comparison plots showing posterior distributions
    • Each plot compares: Unscaled | Scaled (raw) | Scaled (transformed)
    • Plots for: x_cont1, x_cont2, and interaction term
    • SVG snapshots saved in tests/testthat/_snaps/JAGS-formula-scale/

All tests pass (108 tests total). The visual snapshots are ready for your manual review.

Commit: b3fb7ac

Copilot AI requested a review from FBartos December 27, 2025 14:03
Owner

@FBartos FBartos left a comment


This is actually not the correct way to test the behavior. Create two new fits in the "00" file. They use the same generated data (far from being scaled).

The first scales the input manually.
The second uses the automatic scaling.

The later tests then verify that the manual and automatic approaches produce the same results.

- Replace scaled vs unscaled comparison with manual vs automatic scaling
- Create data with large scale differences (mean=1000, sd=500 vs mean=0.5, sd=0.1)
- Fit two models with same seed: manual scaling and automatic scaling
- Tests verify both approaches produce equivalent results
- Visual comparisons show manual and automatic scaling produce identical posteriors
- Remove old snapshot files, create new ones for manual vs auto comparison

Co-authored-by: FBartos <[email protected]>
Contributor Author

Copilot AI commented Dec 27, 2025

This is actually not the correct way to test the behavior. Create two new fits in the "00" file. They use the same generated data (far from being scaled).

First scales the input manually ...

Corrected! I've refactored the tests to properly compare manual vs automatic scaling:

  1. Created data with large scale differences:

    • x_cont1: mean=1000, sd=500 (large scale)
    • x_cont2: mean=0.5, sd=0.1 (small scale)
    • Far from being standardized as requested
  2. Two new fits in test-00-model-fits.R:

    • fit_formula_manual_scaled: Data manually scaled before fitting
    • fit_formula_auto_scaled: Same data with automatic scaling
    • Both use same seed for direct comparison
  3. Tests verify equivalence:

    • Scaling parameters match exactly (tolerance 1e-10)
    • Posterior means and SDs nearly identical (tolerance 0.05)
    • Visual comparisons show overlapping distributions

The tests now correctly verify that manual and automatic approaches produce the same results.

Commit: f1ee8e7

Copilot AI requested a review from FBartos December 27, 2025 14:14
Owner

FBartos commented Dec 27, 2025

@copilot I don't think that JAGS_evaluate_formula incorporates the scaling functionality yet. When evaluating formula for new data, scale the predictors that were scaled previously (according to the stored attributes)

include tests for this feature in the tests

- Update JAGS_evaluate_formula to apply scaling when predictors were scaled during fitting
- Check for formula_scale attribute in fit object
- Automatically scale new data using stored mean/sd for each predictor
- Add comprehensive tests comparing manual vs automatic scaling in predictions
- Tests verify predictions match when using scaled data vs automatic scaling
- All tests pass (119 total)

Co-authored-by: FBartos <[email protected]>
Contributor Author

Copilot AI commented Dec 27, 2025

@copilot I don't think that JAGS_evaluate_formula incorporates the scaling functionality yet. When evaluating formula for new data, scale the predictors that were scaled previously (according to the...

Done! Added scaling functionality to JAGS_evaluate_formula:

Implementation:

  • JAGS_evaluate_formula now checks for formula_scale attribute in fit object
  • Automatically applies scaling to new data using stored mean/sd for each predictor
  • Users can pass unscaled data; scaling is applied automatically based on model fitting

Tests:

  • Added test verifying predictions match between manual and automatic scaling
  • Test compares:
    • Manual approach: manually scaling new data before prediction
    • Automatic approach: passing unscaled data, letting JAGS_evaluate_formula handle scaling
  • Also verifies that predictions differ when scaling is not applied (confirms transformation is working)
  • All tests pass (119 total)

Commit: 1c386af

Owner

FBartos commented Dec 27, 2025

@copilot The marglik functionality should take advantage of the scale argument too, now these two outputs don't match

fit_manual <- readRDS(file.path(temp_fits_dir, "fit_formula_manual_scaled_marglik.RDS"))
fit_auto <- readRDS(file.path(temp_fits_dir, "fit_formula_auto_scaled_marglik.RDS"))

- Update JAGS_bridgesampling to accept formula_scale_list parameter
- Pass formula_scale to JAGS_formula calls during marginal likelihood computation
- Ensures marginal likelihoods match between manual and automatic scaling
- Add test verifying marginal likelihoods are equivalent
- Update documentation
- All tests pass (120 total)

Co-authored-by: FBartos <[email protected]>
Contributor Author

Copilot AI commented Dec 27, 2025

@copilot The marglik functionality should take advantage of the scale argument too, now these two outputs don't match

fit_manual <- readRDS(file.path(temp_fits_dir, "fit_formula_manual_scaled_marg...

Fixed! Added formula_scale_list support to JAGS_bridgesampling:

Implementation:

  • Updated JAGS_bridgesampling to accept formula_scale_list parameter
  • Passes scaling information to internal JAGS_formula calls during marginal likelihood computation
  • Marginal likelihoods now match between manual and automatic scaling

Test:

  • Added test verifying marglik_manual$logml equals marglik_auto$logml (tolerance 0.1)
  • Test passes, confirming marginal likelihoods are now equivalent

Commit: c4bfa92

@FBartos FBartos marked this pull request as ready for review December 29, 2025 15:41
Copilot AI review requested due to automatic review settings December 29, 2025 15:41
Ensures that class and attributes of posterior samples are preserved after applying the unscaling transformation in update.BayesTools_table. Updates tests to check correct unscaling of coefficients in the presence of interaction terms, including proper handling of main effects and interactions.
@FBartos FBartos merged commit a07390b into major-refactoring Dec 29, 2025
@FBartos FBartos deleted the copilot/implement-standardization-logic branch December 29, 2025 15:41

Copilot AI left a comment


Pull request overview

This PR implements automatic standardization of continuous predictors in JAGS_formula to improve MCMC sampling efficiency and numerical stability. The implementation includes a formula_scale parameter for specifying which predictors to standardize, automatic storage of standardization parameters (mean/sd), and a transformation function to convert posterior samples back to the original scale.

Key Changes

  • Added formula_scale parameter to JAGS_formula() and JAGS_fit() for automatic predictor standardization
  • Implemented transform_scale_samples() function using a combinatorial approach that correctly handles interactions of any order
  • Extended scaling support to JAGS_evaluate_formula() and JAGS_bridgesampling() for predictions and marginal likelihood computation
  • Added comprehensive test suite with 120+ tests validating equivalence between manual and automatic scaling
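To make the back-transformation concrete for the two-predictor case with an interaction, here is a least-squares analogue using `lm()`. It is a sketch under stated assumptions: the simulated data and variable names are illustrative, and the explicit formulas below are the algebraic expansion for one two-way interaction, not the package's `transform_scale_samples()` implementation. With z_j = (x_j - m_j) / s_j, expanding b0 + b1*z1 + b2*z2 + b12*z1*z2 in terms of x1 and x2 gives the original-scale coefficients used in the code.

```r
set.seed(123)
n  <- 500
x1 <- rnorm(n, mean = 1000, sd = 500)
x2 <- rnorm(n, mean = 0.5, sd = 0.1)
y  <- 2 + 0.003 * x1 + 4 * x2 + 0.002 * x1 * x2 + rnorm(n)

# Standardize both predictors and keep the constants
m1 <- mean(x1); s1 <- sd(x1)
m2 <- mean(x2); s2 <- sd(x2)
z1 <- (x1 - m1) / s1
z2 <- (x2 - m2) / s2

b <- coef(lm(y ~ z1 * z2))   # coefficients on the standardized scale
a <- coef(lm(y ~ x1 * x2))   # coefficients on the original scale

# Back-transform: expand the standardized model in terms of x1 and x2
unscaled <- c(
  b[["(Intercept)"]] - b[["z1"]] * m1 / s1 - b[["z2"]] * m2 / s2 +
    b[["z1:z2"]] * m1 * m2 / (s1 * s2),          # intercept
  b[["z1"]] / s1 - b[["z1:z2"]] * m2 / (s1 * s2), # x1 main effect
  b[["z2"]] / s2 - b[["z1:z2"]] * m1 / (s1 * s2), # x2 main effect
  b[["z1:z2"]] / (s1 * s2)                        # x1:x2 interaction
)

matches <- isTRUE(all.equal(unscaled, unname(a), tolerance = 1e-6))
matches
```

Note that the main effects and the intercept pick up contributions from the interaction coefficient; dividing each coefficient by its own SD alone would not recover the original-scale values. For posterior samples, the same map would be applied draw by draw.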

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| R/JAGS-formula.R | Core implementation of standardization logic in JAGS_formula(), transform_scale_samples() function with combinatorial unscaling algorithm, and automatic scaling in JAGS_evaluate_formula() |
| R/JAGS-fit.R | Added formula_scale_list parameter and propagation of scaling information through fit object attributes |
| R/JAGS-marglik.R | Extended JAGS_bridgesampling() to accept formula_scale_list for correct marginal likelihood computation |
| R/summary-tables.R | Added transform_scaled parameter to table functions with helper function for transforming list-format samples |
| R/tools.R | Fixed bug in .is.wholenumber() where na.rm = TRUE was incorrectly applying na.omit() |
| tests/testthat/test-JAGS-formula-scale.R | Comprehensive test suite (1191 lines) with validation tests, lm-based verification, and comparison of manual vs automatic scaling |
| tests/testthat/test-00-model-fits.R | Added two fitted models (manual and automatic scaling) with marginal likelihoods for integration testing |
| man/*.Rd | Updated documentation for new parameters and functions |
| NEWS.md | Documented new features and bug fix |
| NAMESPACE | Exported new transform_scale_samples() function |

Comment on lines +16 to +19
#' @param formula_scale named list specifying whether to standardize continuous predictors.
#' If \code{NULL} (default), no standardization is applied. If a named list is provided,
#' continuous predictors with \code{TRUE} values will be standardized (mean-centered and
#' scaled by standard deviation). The intercept is never standardized.

Copilot AI Dec 29, 2025


The documentation structure is incorrect. The @param formula_scale documentation (lines 16-19) is inserted in the middle of the @param prior_list documentation, breaking up the description and the \describe block. The \describe block starting at line 20 should be part of the prior_list parameter documentation, not the formula_scale parameter. The formula_scale parameter should be documented after the complete prior_list documentation.

scale_info[[continuous]] <- list(
  mean = mean(data[, continuous], na.rm = TRUE),
  sd   = stats::sd(data[, continuous], na.rm = TRUE)
)

Copilot AI Dec 29, 2025


There's no validation for the case where a continuous predictor has zero variance (constant values). When sd is 0 or very close to 0, line 244 will result in division by zero or numerical instability. Consider adding a check after line 241 to validate that sd > 0 and provide a helpful error message suggesting that constant predictors cannot be standardized.

Suggested change
)
)
# check that the standard deviation is positive to avoid division by zero
if(is.na(scale_info[[continuous]]$sd) || scale_info[[continuous]]$sd <= .Machine$double.eps){
  stop(paste0(
    "Continuous predictor '", continuous, "' has zero or near-zero variance and cannot be standardized. ",
    "Consider removing this predictor from the model or setting 'formula_scale[[\"",
    continuous, "\"]]' to FALSE."
  ))
}
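
The failure mode behind this suggestion can be reproduced in isolation. The sketch below shows what standardizing a constant predictor yields and how a guard of the suggested shape would stop early; `check_scalable()` is a hypothetical stand-alone helper written for this illustration, not code from the PR.

```r
# A constant predictor has SD exactly 0, so standardizing computes 0 / 0
x_const <- rep(3, 10)
s <- stats::sd(x_const)
z <- (x_const - mean(x_const)) / s
all_nan <- all(is.nan(z))
all_nan

# Hypothetical guard mirroring the suggested check
check_scalable <- function(x, name) {
  s <- stats::sd(x, na.rm = TRUE)
  if (is.na(s) || s <= .Machine$double.eps) {
    stop("Continuous predictor '", name,
         "' has zero or near-zero variance and cannot be standardized.")
  }
  invisible(s)
}

# The guard turns silent NaN values into an informative error
msg <- tryCatch(
  check_scalable(x_const, "x_const"),
  error = function(e) conditionMessage(e)
)
msg
```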

})

test_that("runjags_estimates_table with transform_scaled unscales coefficients", {
# TODO: something is wrong here with the intercept handling

Copilot AI Dec 29, 2025


There's a TODO comment indicating a known issue with intercept handling. While TODO comments are useful during development, they should not remain in production code, especially if they indicate a potential bug. Either the issue should be fixed before merging, or if it's working correctly, the TODO should be removed or converted to a regular explanatory comment.

Suggested change
# TODO: something is wrong here with the intercept handling
# This test verifies that transform_scaled correctly unscales all coefficients,
# including the intercept, when predictors are standardized and interact.


2 participants