
Error when augmenting dataset (single machine) #2

@gnarayan

Hi Kyle,

I'm running your avocado code as part of putting together the PLAsTiCC validation paper, and I'm hitting an error while generating the augmented dataset: pandas claims the HDF5 file isn't open.

(avocado) gnarayan@dhcp194|~/work/plasticc> avocado_augment plasticc_train plasticc_augment
Loading augmentor...
Processing the dataset in 100 chunks...
Chunk:   0%|          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/gnarayan/miniconda3/envs/avocado/bin/avocado_augment", line 4, in <module>
    __import__('pkg_resources').run_script('avocado==0.1', 'avocado_augment')
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/EGG-INFO/scripts/avocado_augment", line 84, in <module>
    process_chunk(augmentor, chunk, args, verbose=False)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/EGG-INFO/scripts/avocado_augment", line 16, in process_chunk
    num_chunks=args.num_chunks,
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/avocado/dataset.py", line 182, in load
    num_chunks=num_chunks, **kwargs)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/avocado-0.1-py3.7.egg/avocado/utils.py", line 167, in read_dataframes
    key_store = store.get_storer(key)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pandas/io/pytables.py", line 1249, in get_storer
    group = self.get_node(key)
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pandas/io/pytables.py", line 1239, in get_node
    self._check_if_open()
  File "/Users/gnarayan/miniconda3/envs/avocado/lib/python3.7/site-packages/pandas/io/pytables.py", line 1360, in _check_if_open
    raise ClosedFileError("{0} file is not open!".format(self._path))
pandas.io.pytables.ClosedFileError: ./data/plasticc_train.h5 file is not open!
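
For reference, pandas raises this exact exception whenever get_storer() is called on an HDFStore handle that has already been closed, so presumably the store is being closed (or never successfully opened) before read_dataframes calls get_storer(). A minimal sketch of the failure mode, with hypothetical file/key names:

import pandas as pd

# Write a tiny table-format HDF5 file to demonstrate with.
df = pd.DataFrame({"a": [1, 2, 3]})
df.to_hdf("demo.h5", key="demo", format="table")

# Calling get_storer() on a handle that has already been closed raises
# the same pandas.io.pytables.ClosedFileError seen in the traceback above.
store = pd.HDFStore("demo.h5", mode="r")
store.close()
store.get_storer("demo")  # ClosedFileError: demo.h5 file is not open!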

Now the data very much does exist, and the files are about the right size:

(avocado) gnarayan@dhcp194|~/work/plasticc> ls -lh data/
total 57899032
drwxr-xr-x  19 gnarayan  staff   608B Sep  5 11:49 plasticc_raw
-rw-r--r--   1 gnarayan  staff    28G Sep  5 12:17 plasticc_test.h5
-rw-r--r--   1 gnarayan  staff    88M Sep  5 11:50 plasticc_train.h5
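
A quick way to double-check the file independently of avocado would be something like the following (the key names shown are just illustrative; I haven't checked exactly what avocado stores):

import pandas as pd

# Open the same file from the traceback read-only and list its contents.
with pd.HDFStore("./data/plasticc_train.h5", mode="r") as store:
    print(store.keys())  # e.g. ['/metadata', '/observations'] (illustrative)
    for key in store.keys():
        print(key, store.get_storer(key))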

I don't believe this is an environment issue, but I'm attaching the conda environment below anyway.

avocado.txt

Have you run into this before? Alternatively, if you're willing to provide a copy of the augmented dataset you used, that would work just as well. We're just trying to get a sense of classification performance on the original training sample vs. the augmented sample vs. an effectively infinite simulated sample.

Best,
-Gautham
