Add automatic standardization of continuous predictors in JAGS_formula #57

Copilot · 2025-12-27T11:35:26Z

Implementation Complete: Automatic Standardization of Continuous Predictors

Overview

Successfully implemented automatic standardization of continuous predictors in JAGS_formula to improve MCMC sampling efficiency and numerical stability.

Core Features

formula_scale parameter in JAGS_formula: Named list to specify which continuous predictors should be standardized
Automatic scaling: Stores mean and SD for each standardized predictor for back-transformation
transform_scale_samples() function: Transforms posterior samples back to original scale
JAGS_evaluate_formula scaling support: Automatically applies scaling to new data when making predictions
JAGS_bridgesampling scaling support: Marginal likelihood computation now respects scaling
Seamless integration: Propagates through JAGS_fit workflow
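
The first two features can be illustrated in base R. This is a minimal sketch, not the package's implementation: `standardize_predictors()` is a hypothetical helper, and storing the result in a `formula_scale` attribute on the data is an assumption made for illustration. Only the named list of `TRUE`/`FALSE` flags and the stored mean/SD come from the description above.

```r
# Hypothetical helper (not part of the package): standardize the columns
# flagged TRUE in a named list and keep the mean/SD for back-transformation.
standardize_predictors <- function(data, formula_scale) {
  scale_info <- list()
  for (name in names(formula_scale)) {
    if (!isTRUE(formula_scale[[name]])) next   # skip predictors not flagged for scaling
    m <- mean(data[[name]], na.rm = TRUE)
    s <- stats::sd(data[[name]], na.rm = TRUE)
    data[[name]]       <- (data[[name]] - m) / s
    scale_info[[name]] <- list(mean = m, sd = s)
  }
  attr(data, "formula_scale") <- scale_info
  data
}

set.seed(1)
df <- data.frame(
  x_cont1 = rnorm(100, mean = 1000, sd = 500),
  x_cont2 = rnorm(100, mean = 0.5,  sd = 0.1)
)

# standardize x_cont1 only; x_cont2 is left on its original scale
scaled <- standardize_predictors(df, list(x_cont1 = TRUE, x_cont2 = FALSE))
str(attr(scaled, "formula_scale"))
```

Keeping the mean and SD next to the transformed data is what makes the later back-transformation and prediction steps possible without access to the original data.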

Unit Tests - Manual vs Automatic Scaling Validation

The tests verify that manual and automatic scaling produce equivalent results:

Test Data with Large Scale Differences
- x_cont1: mean=1000, sd=500 (large scale)
- x_cont2: mean=0.5, sd=0.1 (small scale)
- Both predictors include interaction term
- Data far from being standardized to properly test scaling behavior
Two Model Fits in test-00-model-fits.R
- fit_formula_manual_scaled: Data manually scaled before fitting
- fit_formula_auto_scaled: Same data with automatic scaling via formula_scale_list
- Both fitted with same seed for direct comparison
- Marginal likelihoods computed for both models
Comparison Tests
- Verify scaling parameters match exactly (tolerance: 1e-10)
- Compare posterior means and SDs (tolerance: 0.05)
- Tests confirm manual and automatic approaches produce equivalent results
- Verify downstream functions work correctly (summary, JAGS_estimates_table)
- JAGS_evaluate_formula tests: Verify automatic scaling is applied to new data during prediction
- Marginal likelihood tests: Verify marginal likelihoods match between manual and automatic scaling
Visual Tests for Manual Assessment
- 4 comparison plots using vdiffr (manual vs automatic):
  - x_cont1 coefficient
  - x_cont2 coefficient
  - x_cont1:x_cont2 interaction term
  - intercept
- Visual snapshots show overlapping posteriors, confirming equivalence
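
The data setup and the scaling-parameter check described above might look roughly like the following. The seed, the sample size, and all object names are assumptions for this sketch. It compares a manual standardization against base R's `scale()`, which stands in for the automatic path.

```r
set.seed(42)   # seed and sample size are arbitrary choices for this sketch
n <- 200
data_raw <- data.frame(
  x_cont1 = rnorm(n, mean = 1000, sd = 500),   # large scale
  x_cont2 = rnorm(n, mean = 0.5,  sd = 0.1)    # small scale
)

# manual scaling: subtract the sample mean, divide by the sample SD
manual_info <- lapply(data_raw, function(x) list(mean = mean(x), sd = sd(x)))
data_manual <- as.data.frame(Map(
  function(x, info) (x - info$mean) / info$sd,
  data_raw, manual_info
))

# reference: base::scale() stores the same quantities as attributes
ref <- scale(data_raw)
max_diff_center <- max(abs(sapply(manual_info, `[[`, "mean") - attr(ref, "scaled:center")))
max_diff_scale  <- max(abs(sapply(manual_info, `[[`, "sd")   - attr(ref, "scaled:scale")))
max_diff_values <- max(abs(as.matrix(data_manual) - ref))

# all three differences should sit well below the 1e-10 tolerance used for scaling parameters
c(center = max_diff_center, scale = max_diff_scale, values = max_diff_values) < 1e-10
```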

JAGS_evaluate_formula Scaling Support

Added automatic scaling functionality to JAGS_evaluate_formula for making predictions:

Automatic Scaling in Predictions: When evaluating formulas on new data, JAGS_evaluate_formula automatically applies scaling to predictors that were scaled during model fitting
Implementation: Checks for formula_scale attribute in fit object and applies stored mean/sd transformations
User Benefit: Users can pass unscaled data when making predictions; scaling is handled automatically based on the model's stored scaling information
Tests: Comprehensive tests verify that predictions match between manually scaled data and automatic scaling
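
A minimal sketch of the prediction-time step, assuming the scaling information is available as a named list of `mean`/`sd` pairs. `apply_stored_scaling()` is a hypothetical helper and the stored values are made up. The key point is that new data is rescaled with the mean and SD from model fitting, not with its own.

```r
# Hypothetical helper: rescale new data with the mean/SD stored at fit time.
apply_stored_scaling <- function(new_data, scale_info) {
  for (name in names(scale_info)) {
    new_data[[name]] <- (new_data[[name]] - scale_info[[name]]$mean) / scale_info[[name]]$sd
  }
  new_data
}

# scaling information as it might be stored with a fit (values are made up)
scale_info <- list(
  x_cont1 = list(mean = 1000, sd = 500),
  x_cont2 = list(mean = 0.5,  sd = 0.1)
)

# new data on the original scale
new_data <- data.frame(
  x_cont1 = c(500, 1000, 2000),
  x_cont2 = c(0.4, 0.5, 0.7)
)

scaled_new <- apply_stored_scaling(new_data, scale_info)
scaled_new
```

Rescaling the new data with its own mean and SD instead would shift the predictions whenever the new sample differs from the fitting sample.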

JAGS_bridgesampling Scaling Support

Added automatic scaling functionality to JAGS_bridgesampling for computing marginal likelihoods:

Marginal Likelihood with Scaling: JAGS_bridgesampling now accepts formula_scale_list parameter
Implementation: Passes scaling information to internal JAGS_formula calls during marginal likelihood computation
User Benefit: Marginal likelihoods are computed correctly for scaled models; manual and automatic scaling produce matching results
Tests: Tests verify marginal likelihoods match between manual and automatic scaling approaches

Test Results

✅ All tests pass (120 tests)
✅ Visual snapshots created in tests/testthat/_snaps/JAGS-formula-scale/
✅ Tests verify manual and automatic scaling produce equivalent results (within the stated tolerances)
✅ JAGS_evaluate_formula correctly applies scaling to new data
✅ Marginal likelihoods match between manual and automatic scaling

Files Changed

R/JAGS-formula.R: Added standardization logic, transform_scale_samples(), and JAGS_evaluate_formula scaling support
R/JAGS-fit.R: Added formula_scale_list parameter and propagation
R/JAGS-marglik.R: Added formula_scale_list support to JAGS_bridgesampling
tests/testthat/test-00-model-fits.R: Added manual and auto scaled models with marginal likelihoods
tests/testthat/test-JAGS-formula-scale.R: Tests comparing manual vs auto scaling, JAGS_evaluate_formula predictions, and marginal likelihoods
NEWS.md: Documented new features

Original prompt

Start implementation

The user has attached the following file paths as relevant context:

.github\copilot-instructions.md

[Chronological Review: The conversation began with the user inquiring about the automatic standardization of continuous predictors in the `JAGS_formula` call. The user suggested including standardization logic within a specific conditional block and discussed the use of the `scale` function in R. They emphasized the need to propagate the `scale` argument to the `JAGS_fit()` function. The user then outlined additional requirements regarding intercept priors, the use of named lists for flexibility, and the implementation of a function for transforming posterior samples. Finally, the user requested to start the implementation of these features.]

[Intent Mapping:

"is there a way of automatically standardizing continuous predictors in the JAGS_formula call?" - User seeks a method for automatic standardization.
"It would make sense to include the standardization logic within the if(any(predictors_type == "continuous")){ section" - User suggests a specific implementation approach.
"the intercept prior should not be modified" - User specifies a requirement regarding intercept priors.
"use named list for flexibility" - User requests a flexible approach for handling standardization.
"do not add warning, implement function for transforming the posterior samples" - User wants to avoid warnings and implement a transformation function.
"Start implementation" - User initiates the coding phase for the discussed features.]

[Technical Inventory:

R programming language
JAGS (Just Another Gibbs Sampler) for Bayesian analysis
Functions: JAGS_formula, JAGS_fit, scale function in R
Concepts: Standardization of predictors, handling of posterior samples, named lists for argument flexibility.]

[Code Archaeology:

File: JAGS-fit.R
Recent focus on modifying the JAGS_formula function to include standardization logic and propagate the scale argument to JAGS_fit().
Discussion on how to manage intercept priors and the structure of the output from JAGS_fit.]

[Progress Assessment:

Completed: User has outlined the requirements for standardization and the handling of predictors.
Pending: Implementation of the discussed features, including the integration of standardization logic and the transformation function for posterior samples.]

[Context Validation: All critical information for continuation is captured, including user requests, technical details, and the current state of the implementation.]

[Recent Commands Analysis:

Last Agent Commands: User requested to "Start implementation."
Tool Results Summary: No specific tool results were returned as the conversation was focused on planning and outlining requirements.
Pre-Summary State: The agent was actively discussing the implementation of standardization features in the JAGS-fit.R file when the token budget was exceeded.
Operation Context: The commands were executed in the context of enhancing the functionality of the JAGS modeling framework to include automatic standardization of predictors.]

1. Conversation Overview:
- Primary Objectives:
  - "is there a way of automatically standardizing continuous predictors in the JAGS_formula call?"
  - "It would make sense to include the standardization logic within the if(any(predictors_type == "continuous")){ section"
  - "the intercept prior should not be modified"
  - "use named list for flexibility"
  - "do not add warning, implement function for transforming the posterior samples"
  - "Start implementation"
- Session Context: The user is enhancing the `JAGS-fit.R` file to include automatic standardization of continuous predictors and related functionalities.
- User Intent Evolution: The user moved from inquiry about standardization to outlining specific implementation details and finally requesting to start coding.

Technical Foundation:

R: The programming language used for the implementation.
JAGS: A tool for Bayesian analysis that the user is working with.
Functions: JAGS_formula, JAGS_fit, and the scale function for standardization.

Codebase Status:

File Name: JAGS-fit.R
Purpose: To fit Bayesian models using JAGS and manage predictors.
Current State: User is preparing to implement standardization logic and related features.
Key Code Segments: Discussion around modifying the JAGS_formula function and propagating arguments to JAGS_fit.
Dependencies: The implementation will affect how downstream functions handle the fit object.

Problem Resolution:

Issues Encountered: Need for automatic standardization and handling of predictors.
Solutions Implemented: User has proposed a structured approach to implement these features.
Debugging Context: No specific debugging issues were mentioned, but the user is focused on implementation.
Lessons Learned: Importance of flexibility ...

Created from VS Code.


- Add formula_scale parameter to JAGS_formula function
- Implement standardization logic for continuous predictors
- Store mean and SD for back-transformation
- Propagate formula_scale through JAGS_fit
- Add transform_scale_samples function for posterior transformation
- Add comprehensive tests for standardization

Co-authored-by: FBartos <[email protected]>

FBartos

@copilot improve the unit tests. All model fittings should be done in the designated "00" file. Extend it with one model that exactly copies an existing model that includes a continuous predictor and its interaction. Then, the test file should load the model and compare the output (and the downstream functions) with an output of unscaled and scaled fit. Some of the tests can be visual, I will assess the correspondence and assess the figures manually
(First check that the continuous predictor was not scaled before, if so, update the fit)

- Add scaled version of interaction model to test-00-model-fits.R
- Replace simple JAGS fitting test with comparison tests using pre-fitted models
- Add visual comparison tests using vdiffr for manual assessment
- Tests compare unscaled vs scaled models and verify transform_scale_samples()
- All tests pass successfully

Co-authored-by: FBartos <[email protected]>

Copilot · 2025-12-27T14:01:39Z

@copilot improve the unit tests. All model fittings should be done in the designated "00" file. Extend it with one model that exactly copies an existing model that includes a continuous predictor and ...

Done! I've improved the unit tests as requested:

Model fitting centralized in test-00-model-fits.R:
- Added fit_formula_interaction_cont_scaled - a scaled version of the existing fit_formula_interaction_cont model
- Both models use continuous predictors (x_cont1, x_cont2) with interaction term
- Fitted with same seed for comparison
Comparison tests in test-JAGS-formula-scale.R:
- Replaced JAGS fitting test with tests that load pre-fitted models
- Compare unscaled vs scaled posteriors
- Verify transform_scale_samples() correctly back-transforms coefficients
- Test downstream functions (summary, JAGS_estimates_table)
Visual tests for manual assessment:
- Created 3 vdiffr comparison plots showing posterior distributions
- Each plot compares: Unscaled | Scaled (raw) | Scaled (transformed)
- Plots for: x_cont1, x_cont2, and interaction term
- SVG snapshots saved in tests/testthat/_snaps/JAGS-formula-scale/

All tests pass (108 tests total). The visual snapshots are ready for your manual review.

Commit: b3fb7ac

FBartos

This is actually not the correct way to test the behavior. Create two new fits in the "00" file. They use the same generated data (far from being scaled).

First scales the input manually
Second uses the automatic scaling

The later tests then verify that the manual and automatic approaches produce the same results

- Replace scaled vs unscaled comparison with manual vs automatic scaling
- Create data with large scale differences (mean=1000, sd=500 vs mean=0.5, sd=0.1)
- Fit two models with same seed: manual scaling and automatic scaling
- Tests verify both approaches produce equivalent results
- Visual comparisons show manual and automatic scaling produce identical posteriors
- Remove old snapshot files, create new ones for manual vs auto comparison

Co-authored-by: FBartos <[email protected]>

Copilot · 2025-12-27T14:13:35Z

This is actually not the correct way to test the behavior. Create two new fits in the "00" file. They use the same generated data (far from being scaled).

First scales the input manually ...

Corrected! I've refactored the tests to properly compare manual vs automatic scaling:

Created data with large scale differences:
- x_cont1: mean=1000, sd=500 (large scale)
- x_cont2: mean=0.5, sd=0.1 (small scale)
- Far from being standardized as requested
Two new fits in test-00-model-fits.R:
- fit_formula_manual_scaled: Data manually scaled before fitting
- fit_formula_auto_scaled: Same data with automatic scaling
- Both use same seed for direct comparison
Tests verify equivalence:
- Scaling parameters match exactly (tolerance 1e-10)
- Posterior means and SDs nearly identical (tolerance 0.05)
- Visual comparisons show overlapping distributions

The tests now correctly verify that manual and automatic approaches produce the same results.

Commit: f1ee8e7

FBartos · 2025-12-27T15:55:05Z

@copilot I don't think that JAGS_evaluate_formula incorporates the scaling functionality yet. When evaluating formula for new data, scale the predictors that were scaled previously (according to the stored attributes)

include tests for this feature in the tests

- Update JAGS_evaluate_formula to apply scaling when predictors were scaled during fitting
- Check for formula_scale attribute in fit object
- Automatically scale new data using stored mean/sd for each predictor
- Add comprehensive tests comparing manual vs automatic scaling in predictions
- Tests verify predictions match when using scaled data vs automatic scaling
- All tests pass (119 total)

Co-authored-by: FBartos <[email protected]>

Copilot · 2025-12-27T16:00:51Z

@copilot I don't think that JAGS_evaluate_formula incorporates the scaling functionality yet. When evaluating formula for new data, scale the predictors that were scaled previously (according to the...

Done! Added scaling functionality to JAGS_evaluate_formula:

Implementation:

JAGS_evaluate_formula now checks for formula_scale attribute in fit object
Automatically applies scaling to new data using stored mean/sd for each predictor
Users can pass unscaled data; scaling is applied automatically based on model fitting

Tests:

Added test verifying predictions match between manual and automatic scaling
Test compares:
- Manual approach: manually scaling new data before prediction
- Automatic approach: passing unscaled data, letting JAGS_evaluate_formula handle scaling
Also verifies that predictions differ when scaling is not applied (confirms transformation is working)
All tests pass (119 total)

Commit: 1c386af

FBartos · 2025-12-27T16:01:42Z

@copilot The marglik functionality should take advantage of the scale argument too, now these two outputs don't match

fit_manual <- readRDS(file.path(temp_fits_dir, "fit_formula_manual_scaled_marglik.RDS"))
fit_auto <- readRDS(file.path(temp_fits_dir, "fit_formula_auto_scaled_marglik.RDS"))

- Update JAGS_bridgesampling to accept formula_scale_list parameter
- Pass formula_scale to JAGS_formula calls during marginal likelihood computation
- Ensures marginal likelihoods match between manual and automatic scaling
- Add test verifying marginal likelihoods are equivalent
- Update documentation
- All tests pass (120 total)

Co-authored-by: FBartos <[email protected]>

Copilot · 2025-12-27T16:08:54Z

@copilot The marglik functionality should take advantage of the scale argument too, now these two outputs don't match

fit_manual <- readRDS(file.path(temp_fits_dir, "fit_formula_manual_scaled_marg...

Fixed! Added formula_scale_list support to JAGS_bridgesampling:

Implementation:

Updated JAGS_bridgesampling to accept formula_scale_list parameter
Passes scaling information to internal JAGS_formula calls during marginal likelihood computation
Marginal likelihoods now match between manual and automatic scaling

Test:

Added test verifying marglik_manual$logml equals marglik_auto$logml (tolerance 0.1)
Test passes, confirming marginal likelihoods are now equivalent

Commit: c4bfa92

Ensures that class and attributes of posterior samples are preserved after applying the unscaling transformation in update.BayesTools_table. Updates tests to check correct unscaling of coefficients in the presence of interaction terms, including proper handling of main effects and interactions.

Copilot

Pull request overview

This PR implements automatic standardization of continuous predictors in JAGS_formula to improve MCMC sampling efficiency and numerical stability. The implementation includes a formula_scale parameter for specifying which predictors to standardize, automatic storage of standardization parameters (mean/sd), and a transformation function to convert posterior samples back to the original scale.

Key Changes

Added formula_scale parameter to JAGS_formula() and JAGS_fit() for automatic predictor standardization
Implemented transform_scale_samples() function using a combinatorial approach that correctly handles interactions of any order
Extended scaling support to JAGS_evaluate_formula() and JAGS_bridgesampling() for predictions and marginal likelihood computation
Added comprehensive test suite with 120+ tests validating equivalence between manual and automatic scaling
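
The PR text does not show the unscaling algorithm itself, so the following is only a worked two-predictor case, not the combinatorial implementation in `transform_scale_samples()`. With z_i = (x_i - m_i) / s_i, expanding the product z1 * z2 shows that the interaction coefficient feeds back into both main effects and the intercept. The sketch uses `lm()` point estimates in place of posterior samples (an assumption made so the example runs without JAGS) and checks the result against a direct fit on the original scale.

```r
set.seed(123)
n  <- 500
x1 <- rnorm(n, mean = 1000, sd = 500)
x2 <- rnorm(n, mean = 0.5,  sd = 0.1)
y  <- 2 + 0.003 * x1 + 4 * x2 + 0.01 * x1 * x2 + rnorm(n)

m1 <- mean(x1); s1 <- sd(x1)
m2 <- mean(x2); s2 <- sd(x2)
z1 <- (x1 - m1) / s1
z2 <- (x2 - m2) / s2

# coefficients on the standardized scale; order: intercept, z1, z2, z1:z2
b <- unname(coef(lm(y ~ z1 * z2)))

# back-transformation obtained by expanding
# b1*z1 + b2*z2 + b12*z1*z2 in terms of x1 and x2
unscaled <- c(
  intercept = b[1] - b[2] * m1 / s1 - b[3] * m2 / s2 + b[4] * m1 * m2 / (s1 * s2),
  x1        = b[2] / s1 - b[4] * m2 / (s1 * s2),
  x2        = b[3] / s2 - b[4] * m1 / (s1 * s2),
  `x1:x2`   = b[4] / (s1 * s2)
)

# reference: fit directly on the original scale
direct <- coef(lm(y ~ x1 * x2))
all.equal(unname(unscaled), unname(direct))
```

For posterior samples the same formulas would be applied draw by draw. Higher-order interactions add further cross terms, which is presumably what the combinatorial approach handles in general.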

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| R/JAGS-formula.R | Core implementation of standardization logic in `JAGS_formula()`, `transform_scale_samples()` function with combinatorial unscaling algorithm, and automatic scaling in `JAGS_evaluate_formula()` |
| R/JAGS-fit.R | Added `formula_scale_list` parameter and propagation of scaling information through fit object attributes |
| R/JAGS-marglik.R | Extended `JAGS_bridgesampling()` to accept `formula_scale_list` for correct marginal likelihood computation |
| R/summary-tables.R | Added `transform_scaled` parameter to table functions with helper function for transforming list-format samples |
| R/tools.R | Fixed bug in `.is.wholenumber()` where `na.rm = TRUE` was incorrectly applying `na.omit()` |
| tests/testthat/test-JAGS-formula-scale.R | Comprehensive test suite (1191 lines) with validation tests, lm-based verification, and comparison of manual vs automatic scaling |
| tests/testthat/test-00-model-fits.R | Added two fitted models (manual and automatic scaling) with marginal likelihoods for integration testing |
| man/*.Rd | Updated documentation for new parameters and functions |
| NEWS.md | Documented new features and bug fix |
| NAMESPACE | Exported new `transform_scale_samples()` function |

Copilot · 2025-12-29T15:46:03Z

R/JAGS-formula.R

+#' @param formula_scale named list specifying whether to standardize continuous predictors.
+#' If \code{NULL} (default), no standardization is applied. If a named list is provided,
+#' continuous predictors with \code{TRUE} values will be standardized (mean-centered and
+#' scaled by standard deviation). The intercept is never standardized.


The documentation structure is incorrect. The @param formula_scale documentation (lines 16-19) is inserted in the middle of the @param prior_list documentation, breaking up the description and the \describe block. The \describe block starting at line 20 should be part of the prior_list parameter documentation, not the formula_scale parameter. The formula_scale parameter should be documented after the complete prior_list documentation.

Copilot · 2025-12-29T15:46:03Z

R/JAGS-formula.R

+          scale_info[[continuous]] <- list(
+            mean = mean(data[, continuous], na.rm = TRUE),
+            sd   = stats::sd(data[, continuous], na.rm = TRUE)
+          )


There's no validation for the case where a continuous predictor has zero variance (constant values). When sd is 0 or very close to 0, line 244 will result in division by zero or numerical instability. Consider adding a check after line 241 to validate that sd > 0 and provide a helpful error message suggesting that constant predictors cannot be standardized.

Suggested change

-          )
+          )
+          # check that the standard deviation is positive to avoid division by zero
+          if(is.na(scale_info[[continuous]]$sd) || scale_info[[continuous]]$sd <= .Machine$double.eps){
+            stop(paste0(
+              "Continuous predictor '", continuous, "' has zero or near-zero variance and cannot be standardized. ",
+              "Consider removing this predictor from the model or setting 'formula_scale[[\"",
+              continuous, "\"]]' to FALSE."
+            ))
+          }
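
The failure mode behind this review comment can be reproduced in isolation. The sketch below uses a hypothetical standalone helper, `compute_scale_info()`, that mirrors the suggested check. It is not the code in `JAGS_formula()`, and the error message is shortened.

```r
# Hypothetical helper mirroring the suggested zero-variance check.
compute_scale_info <- function(x, name) {
  info <- list(
    mean = mean(x, na.rm = TRUE),
    sd   = stats::sd(x, na.rm = TRUE)
  )
  if (is.na(info$sd) || info$sd <= .Machine$double.eps) {
    stop("Continuous predictor '", name,
         "' has zero or near-zero variance and cannot be standardized.")
  }
  info
}

constant <- rep(3, 10)

# without the check: 0 / 0 gives NaN for every element
unchecked <- (constant - mean(constant)) / sd(constant)
unchecked

# with the check: the problem surfaces as an informative error instead
result <- tryCatch(
  compute_scale_info(constant, "x_const"),
  error = function(e) conditionMessage(e)
)
result
```

One design point worth weighing: an absolute threshold such as `.Machine$double.eps` also rejects a predictor that legitimately varies on a very small scale. A threshold relative to the predictor's magnitude would be an alternative.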

Copilot · 2025-12-29T15:46:04Z

tests/testthat/test-JAGS-formula-scale.R

+})
+
+test_that("runjags_estimates_table with transform_scaled unscales coefficients", {
+  # TODO: something is wrong here with the intercept handling


There's a TODO comment indicating a known issue with intercept handling. While TODO comments are useful during development, they should not remain in production code, especially if they indicate a potential bug. Either the issue should be fixed before merging, or if it's working correctly, the TODO should be removed or converted to a regular explanatory comment.

Suggested change

-  # TODO: something is wrong here with the intercept handling
+  # This test verifies that transform_scaled correctly unscales all coefficients,
+  # including the intercept, when predictors are standardized and interact.

Initial plan

684481c

Copilot AI assigned Copilot and FBartos Dec 27, 2025

Copilot started work on behalf of FBartos December 27, 2025 11:35 View session

Copilot AI and others added 2 commits December 27, 2025 11:43

Update NEWS.md with standardization feature documentation

d06da72

Co-authored-by: FBartos <[email protected]>

Copilot AI changed the title ~~[WIP] Add automatic standardization for continuous predictors in JAGS_formula~~ Add automatic standardization of continuous predictors in JAGS_formula Dec 27, 2025

Copilot AI requested a review from FBartos December 27, 2025 11:50

Copilot finished work on behalf of FBartos December 27, 2025 11:50

FBartos requested changes Dec 27, 2025


Copilot started work on behalf of FBartos December 27, 2025 13:54 View session

Copilot AI requested a review from FBartos December 27, 2025 14:03

Copilot finished work on behalf of FBartos December 27, 2025 14:03

FBartos requested changes Dec 27, 2025


Copilot started work on behalf of FBartos December 27, 2025 14:07 View session

Copilot AI requested a review from FBartos December 27, 2025 14:14

Copilot finished work on behalf of FBartos December 27, 2025 14:14

improve tests

4fdab7b

Copilot started work on behalf of FBartos December 27, 2025 15:55 View session

Copilot finished work on behalf of FBartos December 27, 2025 16:01

Copilot started work on behalf of FBartos December 27, 2025 16:02 View session

Copilot finished work on behalf of FBartos December 27, 2025 16:09

FBartos added 4 commits December 27, 2025 20:09

improve tests

49ee914

fix input

d84d3c9

catch changes

711c365

push to PC

b242e34

FBartos marked this pull request as ready for review December 29, 2025 15:41

Copilot AI review requested due to automatic review settings December 29, 2025 15:41

FBartos merged commit a07390b into major-refactoring Dec 29, 2025

FBartos deleted the copilot/implement-standardization-logic branch December 29, 2025 15:41

Copilot started reviewing on behalf of FBartos December 29, 2025 15:41 View session

Copilot AI reviewed Dec 29, 2025


