[Bug]: AnomalyScoreThreshold is incompatible with multi-GPU training

### Describe the bug

When I try multi-GPU training of `fastflow` by setting `strategy: ddp` and `accelerator: gpu` in the config, I get the following error:
```
Traceback (most recent call last):
  File "/home/sean/combinedpipe/run_anomalib.py", line 40, in <module>
    trainer.fit(model=model, datamodule=datamodule)
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.on_advance_end()
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 250, in on_advance_end
    self._run_validation()
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 308, in _run_validation
    self.val_loop.run()
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/loop.py", line 206, in run
    output = self.on_run_end()
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 180, in on_run_end
    self._evaluation_epoch_end(self._outputs)
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 288, in _evaluation_epoch_end
    self.trainer._call_lightning_module_hook(hook_name, output_or_outputs)
  File "/home/sean/sean/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1356, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/sean/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 145, in validation_epoch_end
    self._compute_adaptive_threshold(outputs)
  File "/home/sean/anomalib/src/anomalib/models/components/base/anomaly_module.py", line 162, in _compute_adaptive_threshold
    self.image_threshold.compute()
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/metric.py", line 529, in wrapped_func
    with self.sync_context(
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/metric.py", line 500, in sync_context
    self.sync(
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/metric.py", line 452, in sync
    self._sync_dist(dist_sync_fn, process_group=process_group)
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/metric.py", line 364, in _sync_dist
    output_dict = apply_to_collection(
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/utilities/data.py", line 203, in apply_to_collection
    return elem_type({k: apply_to_collection(v, dtype, function, *args, **kwargs) for k, v in data.items()})
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/utilities/data.py", line 203, in <dictcomp>
    return elem_type({k: apply_to_collection(v, dtype, function, *args, **kwargs) for k, v in data.items()})
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/utilities/data.py", line 209, in apply_to_collection
    return elem_type([apply_to_collection(d, dtype, function, *args, **kwargs) for d in data])
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/utilities/data.py", line 209, in <listcomp>
    return elem_type([apply_to_collection(d, dtype, function, *args, **kwargs) for d in data])
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/utilities/data.py", line 199, in apply_to_collection
    return function(data, *args, **kwargs)
  File "/home/sean/sean/lib/python3.10/site-packages/torchmetrics/utilities/distributed.py", line 131, in gather_all_tensors
    torch.distributed.all_gather(local_sizes, local_size, group=group)
  File "/home/sean/sean/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1451, in wrapper
    return func(*args, **kwargs)
  File "/home/sean/sean/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 2450, in all_gather
    work = group.allgather([tensor_list], [tensor])
RuntimeError: Tensors must be CUDA and dense
```
Removing all `.cpu()` calls in `src/anomalib/models/components/base/anomaly_module.py` avoids the `RuntimeError: Tensors must be CUDA and dense` error, but `image_F1Score` is then `0.0` during both validation and testing.

Why is `AnomalyScoreThreshold` incompatible with multi-GPU training, and how could it be modified to be compatible?
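
One plausible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: with `strategy: ddp` on GPUs the process group typically uses the NCCL backend, whose `all_gather` accepts only dense CUDA tensors, so metric state filled from `.cpu()` outputs fails when torchmetrics syncs it inside `compute()`. The helper names below (`states_rejected_by_nccl`, `move_states`) and the `states` dictionary are hypothetical and not part of anomalib or torchmetrics; they only illustrate checking and relocating state tensors before a sync.

```python
import torch


def states_rejected_by_nccl(states):
    """Return the names of states holding tensors NCCL collectives would reject.

    `states` maps a state name to a tensor or a list of tensors, mimicking
    (as an assumption) how a metric stores its accumulated values.
    """
    rejected = []
    for name, value in states.items():
        tensors = value if isinstance(value, (list, tuple)) else [value]
        # NCCL requires dense CUDA tensors; CPU or sparse tensors fail.
        if any((not t.is_cuda) or t.is_sparse for t in tensors):
            rejected.append(name)
    return rejected


def move_states(states, device):
    """Return a copy of `states` with every tensor moved to `device`."""
    moved = {}
    for name, value in states.items():
        if isinstance(value, (list, tuple)):
            moved[name] = [t.to(device) for t in value]
        else:
            moved[name] = value.to(device)
    return moved


# Hypothetical state, as it might look after updating with `.cpu()` outputs.
states = {"preds": [torch.rand(4).cpu()], "target": [torch.zeros(4).cpu()]}
print(states_rejected_by_nccl(states))  # both names are listed while on CPU

if torch.cuda.is_available():
    on_gpu = move_states(states, torch.device("cuda"))
    print(states_rejected_by_nccl(on_gpu))
```

In anomalib the analogous change would be to keep the tensors passed to the threshold metric's `update` on the module's device instead of calling `.cpu()` first. Whether that alone accounts for the `0.0` F1 score is not established here.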

### Dataset

Other (please specify in the text field below)

### Model

FastFlow

### Steps to reproduce the behavior

See bug description.

### OS information

OS information:
- OS: Ubuntu 22.04.3
- Python version: 3.10.12
- Anomalib version: main on GitHub
- PyTorch version: 2.0.1
- CUDA/cuDNN version: 12.2
- GPU models and configuration: 2x NVIDIA RTX 6000 Ada
- Any other relevant information: I'm using the hazelnut toy dataset


### Expected behavior

I expected multi-GPU training with FastFlow to run without errors and to produce a non-zero F1 score.

### Screenshots

_No response_

### Pip/GitHub

GitHub

### What version/branch did you use?

main

### Configuration YAML

```yaml
See bug description.
```


### Logs

```shell
See bug description.
```


### Code of Conduct

- [X] I agree to follow this project's Code of Conduct
