File object is not thread-safe #130

Closed
@will133

It seems that the File object you get back from tables.openFile is not thread-safe. An example:

import os
import threading

import tables


class OpenFileThread(threading.Thread):
    def run(self):
        # Each thread opens the same file read-only, reads a slice, and closes it.
        f = tables.openFile('test.h5', mode='r')
        arr = f.root.testArray[8:12, 18:22]
        f.close()


if __name__ == '__main__':
    # Create the test file once, with a 200 x 300 Int64 CArray.
    if not os.path.exists('test.h5'):
        testFile = tables.openFile('test.h5', mode='w')
        carray = testFile.createCArray(testFile.root, 'testArray', tables.Int64Atom(), (200, 300))
        testFile.close()

    # Start 10 threads that open and close the file concurrently.
    threads = []
    for i in xrange(10):
        t = OpenFileThread()
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

I got many exceptions that look like:

Exception in thread Thread-7:
Traceback (most recent call last):
...
  File ".../site-packages/tables/file.py", line 2162, in close
    del _open_files[filename]
KeyError: 'test.h5'

It seems that PyTables caches and reuses the file handle when openFile is called: the same File object is returned each time. This is not intuitive, and I do not know what the expected behavior is when the same HDF5 handle is shared across threads.

It also seems that the _open_files dict is keyed by file name. Thus, one thread can remove that name while another thread is closing the same file handle, and the second removal raises the KeyError shown above. I think the better approach is not to cache this File object at all. Otherwise, all calls to the underlying File object need to be synchronized.
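As a possible caller-side workaround in the meantime, every open/read/close sequence could be serialized behind one process-wide lock. This is a sketch under assumptions, not a fix inside PyTables: locked_read and _hdf5_lock are hypothetical names, and the opener is passed in as a parameter so the helper does not depend on any particular PyTables version.

```python
import threading

# One process-wide lock (hypothetical name); every open/read/close
# sequence runs entirely under it, so no two threads overlap.
_hdf5_lock = threading.Lock()


def locked_read(opener, filename, reader):
    """Open filename with opener, apply reader to the handle, then close it.

    The whole sequence is held under _hdf5_lock, so a handle that is
    shared through a cache is never closed while another thread uses it.
    """
    with _hdf5_lock:
        handle = opener(filename)
        try:
            return reader(handle)
        finally:
            handle.close()


# Possible usage with the reproduction script above (PyTables 2.x naming):
#   arr = locked_read(
#       lambda name: tables.openFile(name, mode='r'),
#       'test.h5',
#       lambda f: f.root.testArray[8:12, 18:22],
#   )
```

The trade-off is that reads no longer run concurrently. The lock also only helps if every access to the file in the process goes through the same helper.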

I'm running PyTables 2.3.1 with Python 2.7.2.
