Performance improvement for Reader.__shapeIndex
#52
x2.0 speedup over master
x1.7 speedup over previous commit
x1.3 speedup over previous commit
x1.7 speedup over previous commit
…elements x1.5 speedup over previous commit Due to unexplained reasons, this seems to significantly speed up the apparently untouched read().
🐇 💨 🏃 💨 🏇 💨 If only all performance improvements were this easy to achieve.
Tell me about it. Could not resist.
This PR was intended to judge how performance tweaks would be received by the maintainer(s?) of this project. If they are viewed favourably, I am planning to apply the same principles to
Does this project have a maintainer who could give a tentative indication?
Also: does this project have a test suite? If I blunder about in the code, I would feel better if I knew that I was not breaking things inadvertently. I see a
I think @GeospatialPython is the only maintainer.
Yes, there are some good doctests in the README.txt, which is great. I had to do a little work to get them running under Travis in PR #43 (mainly ran
Absolutely! I'm open to all improvements and even other maintainers. In fact I would love to have more maintainers. My only guiding principles for this library are to keep it as a single file and to only use Python standard libraries. No folder module structure or multiple files, no compiled C code outside the standard library, no libraries outside a standard Python install.
@GeospatialPython Great! Thanks for the feedback. Re additional maintainers: I am probably not the best person for the job as I am only solving a one-off spatial problem. Normally I have absolutely nothing to do with shapefiles, etc. Edit: never mind re Python 2.6 compatibility. We can have
x2.0 speedup over previous commit
eec5efe
to
4c0b9b4
Compare
@ARF1 how exactly did you use memoryview to get those kinds of gains? It seems to me the only way it would help is when calling "records()" or "shapes()" to load all items at once and return them in a list. In that case the items would be loaded as usual, but instead of being returned in a list they would be returned in a memoryview wrapped around the list.
As I understand it, the benefit of memoryviews is that they allow faster slicing and indexing of some preexisting buffer. If I am correct, any speedup would depend on what one does with the returned list, so loading or iteration would not necessarily be faster. Or were you thinking of another way?
Re py26, it should still be possible to implement memoryview without losing 2.6: return the data as the original list on 2.6, or wrapped in a memoryview on later versions.
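For readers unfamiliar with memoryview, here is a minimal sketch of the slicing behaviour described in the comment above, written for Python 3. It is not code from this PR: the helper name `unpack_pair` and the sample buffer are made up for illustration. It only shows that a slice of a memoryview refers to the existing buffer instead of copying it, and makes no claim about speed.

```python
import struct


def unpack_pair(buf, start):
    """Unpack two big-endian int32 values starting at byte `start`.

    Hypothetical helper for illustration only. Slicing the memoryview
    does not copy the underlying bytes; struct.unpack reads straight
    from the original buffer through the view.
    """
    view = memoryview(buf)
    return struct.unpack(">2i", view[start:start + 8])


# Sample buffer holding four big-endian integers (made-up data).
sample = struct.pack(">4i", 10, 20, 30, 40)
middle = unpack_pair(sample, 4)  # the second and third integers
```

Whether this helps in practice depends on how the buffer is used afterwards, which is exactly the question raised above.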
x378 speedup over master with numpy available
x22 speedup over master without numpy
c42ad69
to
fb25a04
Compare
@karimbahgat The performance gains stem not primarily from the use of
Edit: The memoryview speed-up and the two identified points above were artefacts of my profiler. Use of memoryview has no performance implications whatsoever. All other performance improvements hold up.
@ARF1 Can you provide the code that you used to benchmark this?
@micahcochran I am sorry but I have moved on from this project. As I explained above, I was only solving a one-off spatial problem. I also found that ShapeIndex is much easier to optimize than efficient pipelining to Shapely. The latter (where the true performance gains lie) requires non-trivial changes to the codebase. But you can rig up the test code yourself easily:
This should first build the ShapeIndex before returning the data. Voilà...
It seems they just might be. Based on the principle of batch reading/unpacking as in this PR, with only a few lines of trivial changes it was possible to get a 5-8x performance boost for shapes, and 15-20x for listrecords. See #62. And in case @ARF1's PR stalls due to the numpy inclusion (albeit optional), I also added an even more minimal version of the shapeIndex speedup.
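To make the batch reading/unpacking principle concrete, here is a rough standard-library sketch. It is not the code from this PR or from #62. It assumes the standard .shx layout (a 100-byte header followed by 8-byte records, each holding a big-endian int32 offset and content length measured in 16-bit words), and the function names and synthetic data are hypothetical.

```python
import io
import struct


def make_shx(records):
    """Build a synthetic .shx byte string from (offset, length) pairs."""
    header = b"\x00" * 100  # header contents are irrelevant for this sketch
    body = b"".join(struct.pack(">ii", off, length) for off, length in records)
    return header + body


def read_offsets_per_record(shx):
    """One read() and one unpack() call per index record."""
    shx.seek(100)
    offsets = []
    while True:
        rec = shx.read(8)
        if len(rec) < 8:
            break
        offset, _length = struct.unpack(">ii", rec)
        offsets.append(offset * 2)  # 16-bit words -> bytes
    return offsets


def read_offsets_batched(shx):
    """A single read() and a single unpack() call for all index records."""
    shx.seek(100)
    data = shx.read()
    count = len(data) // 8
    values = struct.unpack(">%di" % (2 * count), data[:8 * count])
    # Even positions hold the offsets, odd positions the content lengths.
    return [value * 2 for value in values[::2]]


# Synthetic index with three records (made-up numbers).
sample = io.BytesIO(make_shx([(50, 10), (64, 20), (88, 4)]))
slow = read_offsets_per_record(sample)
fast = read_offsets_batched(sample)
```

Both functions return the same byte offsets; the batched version simply replaces many small read and unpack calls with one of each, and it stays within the standard library.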
Actually, if you're still open to additional maintainers @GeospatialPython, I wouldn't mind helping out. I can't commit to anything specific, but I'd likely be able to sort out some of the issues and PRs from time to time. I rely on pyshp on an almost daily basis, and like you I want pyshp to stay simple and would try to keep any changes minimal.
Implemented without numpy in #62.
Improves the performance of Reader.__shapeIndex.
x305 speedup over master with numpy available
x19 speedup over master without numpy
(Benchmarked on CPython 3.4.4 on Windows 8.1)