It seems that the File object returned by tables.openFile is not thread-safe. An example:
import threading
import tables
import os


class OpenFileThread(threading.Thread):
    def run(self):
        f = tables.openFile('test.h5', mode='r')
        arr = f.root.testArray[8:12, 18:22]
        f.close()


if __name__ == '__main__':
    if not os.path.exists('test.h5'):
        testFile = tables.openFile('test.h5', mode='w')
        group = testFile.createCArray(testFile.root, 'testArray', tables.Int64Atom(), (200, 300))
        testFile.close()

    threads = []
    for i in xrange(10):
        t = OpenFileThread()
        t.start()
        threads.append(t)

    for t in threads:
        t.join()
I got many exceptions that look like:
Exception in thread Thread-7:
Traceback (most recent call last):
...
File ".../site-packages/tables/file.py", line 2162, in close
del _open_files[filename]
KeyError: 'test.h5'
It seems that PyTables tries to cache and reuse the file handle when openFile is called on a file that is already open: the same File object is returned. This is not intuitive, and it is unclear what the expected behavior of sharing the same HDF5 handle across threads is supposed to be.
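As a quick single-threaded way to check that claim (a hypothetical diagnostic, not part of the original report; it assumes the test.h5 file created by the example above already exists), one can open the same file twice and compare what comes back:

import tables

# Open the same file twice and see whether the same File instance is handed back.
f1 = tables.openFile('test.h5', mode='r')
f2 = tables.openFile('test.h5', mode='r')
print 'same File object:', f1 is f2
f2.close()
if f1 is not f2:
    try:
        f1.close()
    except KeyError:
        # Same symptom as the threaded example: the per-filename registry
        # entry was already removed by the other close() call.
        print 'KeyError on second close'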
It also looks like the _open_files dict is keyed by file name, so one thread can remove that key while another thread is still closing the same file handle. I think it would be better not to cache the File object at all, or alternatively, to synchronize all calls that touch the underlying File object (see the sketch below).
I'm running PyTables 2.3.1 with Python 2.7.2.
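For what it's worth, here is a minimal sketch of the synchronization workaround suggested above, assuming the whole open/read/close sequence is serialized behind a module-level lock (SafeOpenFileThread and pytables_lock are names I made up for this sketch, not part of the PyTables API):

import threading
import tables

# A single module-level lock used to serialize all PyTables file access.
pytables_lock = threading.Lock()


class SafeOpenFileThread(threading.Thread):
    def run(self):
        # Hold the lock around the whole open/read/close sequence so that
        # no two threads touch the per-filename registry at the same time.
        with pytables_lock:
            f = tables.openFile('test.h5', mode='r')
            try:
                arr = f.root.testArray[8:12, 18:22]
            finally:
                f.close()

Swapping OpenFileThread for SafeOpenFileThread in the __main__ block above should avoid the race on _open_files, at the cost of serializing all HDF5 access.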