Description
scANVI runs out of CPU memory when training on 6.6 million cells.
The AnnData object is 22 GB with 5,000 highly variable genes, but htop shows CPU memory peaking during training
at 164 GB virtual and 145 GB resident (RES) towards the end of the epoch. I have worked around the issue by requesting 400 GB of memory, but that should not be necessary.
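For scale, here is a back-of-the-envelope sketch of what a fully dense copy of the expression matrix would occupy, using the cell and gene counts from this report. It is only arithmetic under the assumption of a dense float32 or float64 array; it does not show that scANVI densifies the data anywhere.

```python
# Counts taken from this report.
n_cells = 6_558_109
n_genes = 5_000


def dense_gb(n_rows, n_cols, bytes_per_value):
    """Size in GB (10**9 bytes) of a dense n_rows x n_cols array."""
    return n_rows * n_cols * bytes_per_value / 1e9


float32_gb = dense_gb(n_cells, n_genes, 4)
float64_gb = dense_gb(n_cells, n_genes, 8)

print(f"dense float32: {float32_gb:.1f} GB")  # dense float32: 131.2 GB
print(f"dense float64: {float64_gb:.1f} GB")  # dense float64: 262.3 GB
```

The float32 figure is in the same range as the 145 GB RES peak, which may or may not be a coincidence.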
I am training scANVI from a pre-trained scVI model:
import scvi

# `adata` is the AnnData object described above (6.6 million cells, 5,000 HVGs).
model = scvi.model.SCVI(
    adata, n_latent=50, dropout_rate=0.2, n_layers=2, gene_likelihood="nb"
)
tparams = {
    "max_epochs": 30,
    "early_stopping": True,
    "early_stopping_patience": 5,
    "simple_progress_bar": True,
    "batch_size": 1024,
    "check_val_every_n_epoch": 1,
    "enable_model_summary": True,
    "enable_checkpointing": True,
}
model.train(**tparams)

scvi.settings.seed = 0
# Initialize scANVI from the trained scVI model.
scanvi_model = scvi.model.SCANVI.from_scvi_model(
    model,
    adata=adata,
    labels_key="lineage_2",
    unlabeled_category="nan",
)
scanvi_tparams = {
    "batch_size": 2048,
    "early_stopping": True,
    "early_stopping_patience": 3,
    "check_val_every_n_epoch": 1,
    "early_stopping_monitor": "validation_loss",
    "max_epochs": 20,
}
scanvi_model.train(**scanvi_tparams)
"lineage_2" contains the following cell type counts:
celltype                 counts
T                       3900338
Monocyte/DC/basophil    1329707
NK/ILC                   770840
B/plasma                 376316
nan                      177093   # unlabeled_category
HSC_MPP                    3728
Platelet/erythroid           87
In total: 6,558,109 cells
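As a quick sanity check of the label distribution, the counts above can be summed and turned into shares. This is a small sketch using only the numbers from this report; the dictionary is typed in by hand rather than read from `adata`.

```python
# Cell counts per "lineage_2" category, copied from the table above.
counts = {
    "T": 3_900_338,
    "Monocyte/DC/basophil": 1_329_707,
    "NK/ILC": 770_840,
    "B/plasma": 376_316,
    "nan": 177_093,  # unlabeled_category
    "HSC_MPP": 3_728,
    "Platelet/erythroid": 87,
}

total = sum(counts.values())
unlabeled_share = counts["nan"] / total

# Print categories from largest to smallest with their share of all cells.
for name, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{name:<22}{n:>10,}{n / total:>10.4%}")

print(f"total: {total:,}")  # total: 6,558,109
print(f"unlabeled share: {unlabeled_share:.2%}")  # unlabeled share: 2.70%
```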
The error likely occurs in the validation loss loop, but I have not pinpointed it to a specific line of code.
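One way to narrow this down would be to record the process's peak resident memory around each phase. Below is a minimal, standard-library sketch for Unix systems; `peak_rss_gb` and `track_peak` are hypothetical helper names, not part of scvi-tools, and the `bytearray` allocation only stands in for a real training or validation step.

```python
import resource
import sys
from contextlib import contextmanager


def peak_rss_gb():
    """Peak resident set size of this process so far, in GB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return peak * scale / 1e9


@contextmanager
def track_peak(label, log):
    """Append (label, peak before, peak after) to `log` for the wrapped block."""
    before = peak_rss_gb()
    try:
        yield
    finally:
        after = peak_rss_gb()
        log.append((label, before, after))
        print(f"{label}: peak RSS {before:.2f} GB -> {after:.2f} GB")


log = []
with track_peak("stand-in workload", log):
    # Placeholder for the phase under suspicion; writes to the buffer so the
    # pages are actually touched.
    buf = bytearray(50 * 1024 * 1024)
    buf[::4096] = b"\x01" * len(buf[::4096])
```

Because the peak value never decreases, a jump between "before" and "after" points at the phase that raised it. The same helper could be called from the validation hooks of a Lightning `Callback` (for example `on_validation_epoch_start` and `on_validation_epoch_end`), assuming `train` forwards a `callbacks` argument to the underlying trainer.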
Out of CPU memory
Thanks a lot for your help!
Versions:
1.3.2