
Fix memory leak #307

Merged

seba-1511 merged 9 commits into learnables:master from kzhang2:fix_memory_leak on Feb 10, 2022

Conversation

@kzhang2 (Contributor) commented Feb 10, 2022

Description

Fixes #284

This PR fixes a memory leak in maml.py and meta-sgd.py, and adds tests to maml_test.py and metasgd_test.py to guard against future memory leaks. One pre-existing test involving cloning parameters fails, but the failure is unrelated to these changes.
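The new tests check that repeated adaptation does not keep old computation graphs alive. Below is a framework-free sketch of the underlying idea, using `weakref` to assert that intermediate clones become collectible. All names here are hypothetical; the actual tests in maml_test.py use torch and measure tensor memory instead.

```python
import gc
import weakref

class FakeModule:
    """Stand-in for a model whose intermediate clones should be freed."""
    def __init__(self):
        self.buffer = [0.0] * 10_000  # simulated parameter storage

def adapt(module, steps=3):
    """Simulate an inner loop that re-clones the module each step."""
    refs = []
    current = module
    for _ in range(steps):
        current = FakeModule()           # a leak would retain a reference here
        refs.append(weakref.ref(current))
    return refs  # only weak references escape the function

refs = adapt(FakeModule())
gc.collect()
# With no leak, every intermediate clone is unreachable after adaptation.
assert all(r() is None for r in refs)
```

A leaking implementation (e.g. one that appends each clone to a list held by the original module) would keep the weak references alive, making the final assertion fail.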


Contribution Checklist

If your contribution modifies code in the core library (not docs, tests, or examples), please fill the following checklist.

  • My contribution is listed in CHANGELOG.md with attribution.
  • My contribution modifies code in the main library.
  • My modifications are tested.
  • My modifications are documented.

Optional

If you make major changes to the core library, please run make alltests and copy-paste the content of alltests.txt below.

make[1]: Entering directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'
OMP_NUM_THREADS=1 \
MKL_NUM_THREADS=1 \
python -W ignore -m unittest discover -s 'tests' -p '*_test.py' -v
9464832it [00:01, 4735385.84it/s]                             otIntegrationTests) ... 
6463488it [00:01, 4715230.93it/s]                             
ok
test_adaptation (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_allow_nograd (unit.algorithms.gbml_test.TestGBMLgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/optim/parameter_update.py", line 119, in forward
    gradients = torch.autograd.grad(
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_clone_module (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_graph_connection (unit.algorithms.gbml_test.TestGBMLgorithm) ... ok
test_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_allow_nograd (unit.algorithms.maml_test.TestMAMLAlgorithm) ... Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/learn2learn/algorithms/maml.py", line 159, in adapt
    gradients = grad(loss,
  File "/home/kevin/anaconda3/envs/research/lib/python3.8/site-packages/torch/autograd/__init__.py", line 234, in grad
    return Variable._execution_engine.run_backward(
RuntimeError: One of the differentiated Tensors does not require grad
ok
test_allow_unused (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_clone_module (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_first_order_adaptation (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_graph_connection (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_memory_consumption (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_module_shared_params (unit.algorithms.maml_test.TestMAMLAlgorithm) ... ok
test_adaptation (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_clone_module (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_graph_connection (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_memory_consumption (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
test_meta_lr (unit.algorithms.metasgd_test.TestMetaSGDAlgorithm) ... ok
9464832it [00:02, 3925553.32it/s]                             
6463488it [00:01, 4160425.13it/s]                             
test_data_labels_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_labels_values (unit.data.metadataset_test.TestMetaDataset) ... ok
test_data_length (unit.data.metadataset_test.TestMetaDataset) ... ok
test_fails_with_non_torch_dataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_filtered_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_get_item (unit.data.metadataset_test.TestMetaDataset) ... ok
test_labels_to_indices (unit.data.metadataset_test.TestMetaDataset) ... ok
test_union_metadataset (unit.data.metadataset_test.TestMetaDataset) ... ok
test_dataloader (unit.data.task_dataset_test.TestTaskDataset) ... Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to ./data/omniglot-py/images_background.zip
Extracting ./data/omniglot-py/images_background.zip to ./data/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to ./data/omniglot-py/images_evaluation.zip
Extracting ./data/omniglot-py/images_evaluation.zip to ./data/omniglot-py
0 Meta Train Accuracy 0.42500000912696123
1 Meta Train Accuracy 0.5062500112690032
2 Meta Train Accuracy 0.537500012665987
3 Meta Train Accuracy 0.43125001015141606
4 Meta Train Accuracy 0.5187500142492354
learn2learn: Maybe try with allow_nograd=True and/orallow_unused=True ?
learn2learn: Maybe try with allow_nograd=True and/or allow_unused=True ?
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_background.zip to /tmp/datasets/omniglot-py/images_background.zip
Extracting /tmp/datasets/omniglot-py/images_background.zip to /tmp/datasets/omniglot-py
Downloading https://raw.githubusercontent.com/brendenlake/omniglot/master/python/images_evaluation.zip to /tmp/datasets/omniglot-py/images_evaluation.zip
Extracting /tmp/datasets/omniglot-py/images_evaluation.zip to /tmp/datasets/omniglot-py
Downloading FC100. (160Mb)
Downloading CIFARFS to  /home/kevin/data
Creating CIFARFS splits
ok
test_infinite_tasks (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_instanciation (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_caching (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_task_transforms (unit.data.task_dataset_test.TestTaskDataset) ... ok
test_filter_labels (unit.data.transforms_test.TestTransforms) ... ok
test_k_shots (unit.data.transforms_test.TestTransforms) ... ok
test_load_data (unit.data.transforms_test.TestTransforms) ... ok
test_n_ways (unit.data.transforms_test.TestTransforms) ... ok
test_remap_labels (unit.data.transforms_test.TestTransforms) ... ok
test_infinite_iterator (unit.data.utils_test.DataUtilsTests) ... ok
test_partition_task (unit.data.utils_test.DataUtilsTests) ... ok
test_illegal_dimensions (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_illegal_dimensions_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_m_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_n_edge_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_simple_1d (unit.nn.kroneckers_test.KroneckerLinearTests) ... ok
test_cosine_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_euclidean_distance (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_simple (unit.nn.protonet_test.PrototypicalClassifierTests) ... ok
test_clone_module_basics (unit.utils_test.UtilTests) ... ok
test_clone_module_models (unit.utils_test.UtilTests) ... ok
test_clone_module_nomodule (unit.utils_test.UtilTests) ... ok
test_distribution_clone (unit.utils_test.UtilTests) ... ok
test_distribution_detach (unit.utils_test.UtilTests) ... ok
test_module_clone_shared_params (unit.utils_test.UtilTests) ... ok
test_module_detach (unit.utils_test.UtilTests) ... ok
test_module_detach_keep_requires_grad (unit.utils_test.UtilTests) ... ok
test_module_update_shared_params (unit.utils_test.UtilTests) ... FAIL
test_rnn_clone (unit.utils_test.UtilTests) ... ok

======================================================================
FAIL: test_module_update_shared_params (unit.utils_test.UtilTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/kevin/Documents/umd_cp/research/open-source/learn2learn/tests/unit/utils_test.py", line 268, in test_module_update_shared_params
    self.assertTrue(
AssertionError: False is not true : clone and original do not have same number of parameters.

----------------------------------------------------------------------
Ran 62 tests in 128.143s

FAILED (failures=1)
make[1]: *** [Makefile:31: tests] Error 1
make[1]: Leaving directory '/home/kevin/Documents/umd_cp/research/open-source/learn2learn'

@seba-1511 (Member) commented:

Thanks a lot @kzhang2 -- this looks great (incl. Meta-SGD!). I'll merge and cut a new release as soon as it passes the tests.

Comment thread on tests/unit/algorithms/maml_test.py (outdated):
N_STEPS = 5
N_EVAL = 2

device = torch.device('cuda:0')
Member:
GitHub actions don't have cuda. Can we shield this test with: if torch.cuda.is_available(): so it doesn't run if the machine doesn't have cuda?
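One way to implement that shield is `unittest.skipUnless`, so the test is reported as skipped rather than silently absent on CPU-only runners. The class skeleton below is a hypothetical sketch, not the actual test code in maml_test.py:

```python
import unittest

try:
    import torch
    HAS_CUDA = torch.cuda.is_available()
except ImportError:  # keep the sketch importable without torch installed
    HAS_CUDA = False

class TestMAMLAlgorithm(unittest.TestCase):
    @unittest.skipUnless(HAS_CUDA, 'CUDA not available')
    def test_memory_consumption_cuda(self):
        device = torch.device('cuda:0')
        # run adaptation on `device` and assert allocated memory stays bounded
        self.assertGreaterEqual(torch.cuda.memory_allocated(device), 0)

    def test_runs_everywhere(self):
        # CPU-only counterpart that always runs
        self.assertTrue(True)
```

With the decorator in place, CPU-only CI machines report the CUDA test as skipped instead of erroring out.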

@kzhang2 (Contributor, Author) commented Feb 10, 2022:

Fixed.

self.assertTrue(hasattr(p, 'grad'))
self.assertTrue(p.grad.norm(p=2).item() > 0.0)

def test_memory_consumption(self):
Member:
@kzhang2 and shield this test too.

@kzhang2 (Contributor, Author):

Fixed.

@seba-1511 (Member) commented:

OK, it took a bit of elbow grease but it seems to work now (I also took the opportunity to rewrite some flaky tests). Thanks for contributing this.

@seba-1511 seba-1511 merged commit 883d36a into learnables:master Feb 10, 2022


Development

Successfully merging this pull request may close these issues.

Potential Memory Leak Error

2 participants