Clarifications on CytoNorm behaviour for various datasets

Hi -
  
I've been testing the package on various datasets (QC beads and biological data) and have run into a few things that require expert guidance.

### Artefacts introduced through normalization, their behaviour and how to manage them.
Below you can see the normalization of QC beads that were measured on different instruments (same model) under identical PMTV settings. The input FCS data (`fcs_merge`) are normalized (µ = 0, sd = 1).

Regardless of the SOM settings (or even when skipping SOM), artefacts emerge in bivariate scatter plots that distort the original multivariate distribution. These were not picked up when inspecting univariate histograms.

Is this known behaviour, and if so, how does one manage these distortions?

```R
cytonorm_obj <- suppressMessages(
  CytoNorm.train(
    files = fcs_merge,
    labels = ref_data$batch_id,
    channels = param,
    FlowSOM.params = list(
      nCells = 1e6,
      xdim = 15,
      ydim = 15,
      nClus = 5,
      scale = FALSE
    ),
    normMethod.train = QuantileNorm.train,
    normParams = list(goal = "mean"),
    seed = 777,
    transformList = NULL,
    clean = TRUE,
    recompute = TRUE,
    verbose = FALSE
  )
)
``` 
**One sample from one instrument, normalized according to batch effects** (**red** after normalization, **black** before normalization)

![image](https://github.com/user-attachments/assets/b8612f95-382a-4f92-abb4-90ab1d3d4ccb)

**Two samples, one from each instrument, aligned.** (**red** sample 1 normalized, **black** sample 2 normalized)

![image](https://github.com/user-attachments/assets/c73d44fc-a8fe-4256-8028-3c55feec2dbd)
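
To quantify this kind of distortion rather than judging it by eye, a check on the joint structure may help. The following is a minimal base R sketch on simulated data, not CytoNorm code: `joint_distortion` is a hypothetical helper, and the "cluster-wise shift" is only an assumed stand-in for any normalization that applies different corrections to different groups of cells. It shows how a correction can leave a reasonable-looking marginal while changing the pairwise correlation.

```R
# Hypothetical helper: compare pairwise correlations before and after a
# normalization step. `before` and `after` are numeric matrices
# (cells x channels) holding the same cells in the same order.
joint_distortion <- function(before, after) {
  stopifnot(identical(dim(before), dim(after)))
  list(
    # Largest absolute change in any pairwise Pearson correlation
    max_cor_shift = max(abs(cor(after) - cor(before))),
    # Largest absolute change in any pairwise Spearman (rank) correlation
    max_rank_cor_shift = max(abs(
      cor(after, method = "spearman") - cor(before, method = "spearman")
    ))
  )
}

# Simulated two-channel data with a correlation of about 0.8
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- 0.8 * x + sqrt(1 - 0.8^2) * rnorm(n)
before <- cbind(ch1 = x, ch2 = y)

# (a) One affine correction applied to every cell: correlations are unchanged
after_global <- before
after_global[, "ch2"] <- before[, "ch2"] * 1.1 + 0.2

# (b) A different correction per group of cells (assumed stand-in for
#     cluster-wise normalization): the joint distribution is distorted
cluster <- ifelse(before[, "ch1"] > 0, 1, 2)
after_cluster <- before
after_cluster[cluster == 1, "ch2"] <- before[cluster == 1, "ch2"] - 1

res_global <- joint_distortion(before, after_global)
res_cluster <- joint_distortion(before, after_cluster)
print(res_global)
print(res_cluster)
```

On real data, `before` and `after` would be the expression matrices of the same file before and after normalization, restricted to the normalized channels.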


### Goal-based normalization still normalizes goal batch data

How is goal-based alignment implemented for `batch_ids`? My interpretation is that all other batches are aligned to the goal batch, meaning that the goal batch itself is left unchanged by `CytoNorm.normalize`. However, when I tried this, the goal batch was still normalized. See the figures below.

Can this be clarified? 

An example of a biological sample is shown below (**red** after normalization, **black** before normalization):

![image](https://github.com/user-attachments/assets/d4968c65-f508-47c4-948f-5e3ee3e74d4f)

An example of a bead sample is shown below (**red** after normalization, **black** before normalization):

![image](https://github.com/user-attachments/assets/05e9f13b-7ce0-48b9-8fd8-8518c5d1f7fb)
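
For reference, the behaviour I expected can be sketched in base R as follows. This is only an illustration of my interpretation, not CytoNorm's implementation: `align_to_goal` is a hypothetical helper, the data are simulated for a single channel without clustering, and the choice of 99 quantiles with a monotone spline is an assumption. Under these assumptions, mapping the goal batch onto its own quantiles is the identity, while the other batch is pulled onto the goal.

```R
# Hypothetical helper: map `values` so that the quantiles of `batch_ref`
# land on the quantiles of `goal_ref`. Illustration only.
align_to_goal <- function(values, batch_ref, goal_ref, n_q = 99) {
  probs <- seq(0.01, 0.99, length.out = n_q)
  q_batch <- quantile(batch_ref, probs, names = FALSE)
  q_goal <- quantile(goal_ref, probs, names = FALSE)
  # Monotone spline through the (batch quantile, goal quantile) pairs
  f <- splinefun(q_batch, q_goal, method = "monoH.FC")
  f(values)
}

set.seed(42)
batch_a <- rnorm(2000, mean = 0, sd = 1)     # goal batch
batch_b <- rnorm(2000, mean = 0.5, sd = 1.2) # batch with a simulated shift

# The goal batch is aligned to itself, the other batch to the goal
a_norm <- align_to_goal(batch_a, batch_ref = batch_a, goal_ref = batch_a)
b_norm <- align_to_goal(batch_b, batch_ref = batch_b, goal_ref = batch_a)

# Change in the goal batch (expected to be numerically zero)
goal_change <- max(abs(a_norm - batch_a))
# Remaining median offset of the other batch relative to the goal
median_offset <- abs(median(b_norm) - median(batch_a))
print(c(goal_change = goal_change, median_offset = median_offset))
```

In my results the goal batch does change, which is what I would like to understand.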

### Advice on SOM clustering (yes/no)

The instructions mention that SOM clustering can be skipped when there is only a low number of discrete populations. I've tried this, but I see similar performance and similar artefacts. Could there be more precise guidance on when to use SOM and when not to? Have any benchmarks been done in this regard?

![image](https://github.com/user-attachments/assets/d6c7b8ae-a5c0-4a76-81e4-4dc652bcb3e3)

I can supply input data via email if required.

Thanks in advance for taking the time to address my remarks!

Ruben




