Closed
Description
Hi Kyle,
I'm running your avocado code as part of putting together the PLAsTiCC validation paper, and I'm hitting an error when generating the augmented dataset: pandas claims a file isn't open.
(avocado) gnarayan@dhcp194|~/work/plasticc> avocado_augment plasticc_train plasticc_augment
Loading augmentor...
Processing the dataset in 100 chunks...
Chunk: 0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/gnarayan/miniconda3/envs/avocado/bin/avocado_augment", line 4, in <module>
    __import__('pkg_resources').run_script('avocado==0.1', 'avocado_augment')
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/EGG-INFO/scripts/avocado_augment", line 84, in <module>
    process_chunk(augmentor, chunk, args, verbose=False)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/EGG-INFO/scripts/avocado_augment", line 16, in process_chunk
    num_chunks=args.num_chunks,
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/avocado/dataset.py", line 182, in load
    num_chunks=num_chunks, **kwargs)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/avocado/utils.py", line 167, in read_dataframes
    key_store = store.get_storer(key)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pandas/io/pytables.py", line 1249, in get_storer
    group = self.get_node(key)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pandas/io/pytables.py", line 1239, in get_node
    self._check_if_open()
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pandas/io/pytables.py", line 1360, in _check_if_open
    raise ClosedFileError("{0} file is not open!".format(self._path))
pandas.io.pytables.ClosedFileError: ./data/plasticc_train.h5 file is not open!
Now the data very much does exist, and the files are about the right size:
(avocado) gnarayan@dhcp194|~/work/plasticc> ls -lh data/
total 57899032
drwxr-xr-x 19 gnarayan staff 608B Sep 5 11:49 plasticc_raw
-rw-r--r-- 1 gnarayan staff 28G Sep 5 12:17 plasticc_test.h5
-rw-r--r-- 1 gnarayan staff 88M Sep 5 11:50 plasticc_train.h5
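File size alone does not show that a file is a readable HDF5 container, so a quick format check can help rule out a truncated or mis-written file. The following is a minimal sketch using only the standard library; the helper name `looks_like_hdf5` is hypothetical and is not part of avocado or pandas. It only looks for the 8-byte HDF5 signature at the offsets the format allows (0, 512, 1024, and so on, doubling), and it does not by itself explain the `ClosedFileError`.

```python
import os

# The 8-byte signature that starts every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 signature sits at a valid superblock offset.

    Hypothetical helper for illustration; not part of avocado or pandas.
    """
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        # Valid offsets are 0, 512, 1024, 2048, ... (doubling each time).
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False


if __name__ == "__main__":
    # Path taken from the error message above; adjust as needed.
    path = "./data/plasticc_train.h5"
    if os.path.exists(path):
        print(path, looks_like_hdf5(path))
```

If this returns `True` for `plasticc_train.h5`, the file at least starts like a valid HDF5 file, which would point back toward how the store is opened and closed rather than toward the data on disk.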
I don't believe this is an environment issue, but I'm attaching the conda env below.
Have you run into this before? Alternatively, if you are willing to provide a copy of the augmented dataset you used, that would be fine as well. We're just trying to get a sense of how classification performance compares across the original training sample, the augmented sample, and an effectively infinitely large simulated sample.
Best,
-Gautham