Commit b6263a1

fix list (+6 squashed commits)
Squashed commits: [c45b871] update for Pillow-SIMD 3.4.0 [bedd83f] no alpha compositing in this release [e8fe730] update results for latest version add Skia results [a16ff97] add SIMD changes [82ffbd6] fix readme (+4 squashed commits) Squashed commits: [85677f9] fix error [f44ebb1] update results for unrolled implementation [83968c3] fix #4 [cd73c51] update link (+11 squashed commits) Squashed commits: [5882178] correct spelling [a0e5956] Why Pillow-SIMD is even faster [108e72e] Why Pillow itself is so fast [e8eeda1] spelling fixes [e816e9c] spelling [d2eefef] methodology, why not contributed [2e55786] installation and conclusion [9f6415e] more info [67e55b7] more benchmarks test files [471d4c5] remove spaces [904d89d] add performance tests [4fe17fe] simple readme
1 parent c28bf86 commit b6263a1

3 files changed

+251
-1
lines changed

CHANGES.SIMD.rst

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
Changelog (Pillow-SIMD)
2+
=======================
3+
4+
3.3.0.post1
5+
-----------
6+
7+
Alpha compositing
8+
~~~~~~~~~~~~~~~~~
9+
10+
- SSE4 and AVX2 fixed-point full loading implementation.
11+
Up to 4.6x faster.
12+
13+
3.3.0.post0
14+
-----------
15+
16+
Resampling
17+
~~~~~~~~~~
18+
19+
- SSE4 and AVX2 fixed-point full loading horizontal pass.
20+
- SSE4 and AVX2 fixed-point full loading vertical pass.
21+
22+
Conversion
23+
~~~~~~~~~~
24+
25+
- RGBA -> RGBa SSE4 and AVX2 fixed-point full loading implementations.
26+
Up to 2.6x faster.
27+
- RGBa -> RGBA AVX2 implementation using gather instructions.
28+
Up to 5x faster.
29+
30+
31+
3.2.0.post3
32+
-----------
33+
34+
Resampling
35+
~~~~~~~~~~
36+
37+
- SSE4 and AVX2 float full loading horizontal pass.
38+
- SSE4 float full loading vertical pass.
39+
40+
41+
3.2.0.post2
42+
-----------
43+
44+
Resampling
45+
~~~~~~~~~~
46+
47+
- SSE4 and AVX2 float full loading horizontal pass.
48+
- SSE4 float per-pixel loading vertical pass.
49+
50+
51+
2.9.0.post1
52+
-----------
53+
54+
Resampling
55+
~~~~~~~~~~
56+
57+
- SSE4 and AVX2 float per-pixel loading horizontal pass.
58+
- SSE4 float per-pixel loading vertical pass.
59+
- SSE4: Up to 2x for downscaling. Up to 3.5x for upscaling.
60+
- AVX2: Up to 2.7x for downscaling. Up to 3.5x for upscaling.
61+
62+
63+
Box blur
64+
~~~~~~~~
65+
66+
- Simple SSE4 fixed-point implementations with per-pixel loading.
67+
- Up to 2.1x faster.

README.md

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
1+
# Pillow-SIMD
2+
3+
Pillow-SIMD is a "following" fork of Pillow (which is itself a fork of PIL).
4+
5+
For more information about the original Pillow, please
6+
[read the documentation][original-docs],
7+
[check the changelog][original-changelog] and
8+
[find out how to contribute][original-contribute].
9+
10+
11+
## Why SIMD
12+
13+
There are many ways to improve the performance of image processing.
14+
You can use better algorithms for the same task, you can make a better
15+
implementation of the current algorithms, or you can use more processing unit
16+
resources. It is ideal when you can simply use a more efficient algorithm, as
17+
when Gaussian blur based on convolutions [was replaced][gaussian-blur-changes]
18+
by sequential box filters. But the number of such improvements is very limited.
19+
It is also very tempting to use more processor unit resources
20+
(via parallelization) when they are available. But it is handier just
21+
to make things faster on the same resources. And that is where SIMD works better.
22+
23+
SIMD stands for "single instruction, multiple data". This is a way to perform
24+
the same operation on a large amount of homogeneous data.
25+
Modern CPUs have different SIMD instruction sets, such as
26+
MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON.
27+
28+
Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default)
29+
and AVX2 support.
30+
31+
32+
## Status
33+
34+
[![Uploadcare][uploadcare.logo]][uploadcare.com]
35+
36+
Pillow-SIMD can be used in production. Pillow-SIMD has been operating on
37+
[Uploadcare][uploadcare.com] servers for more than 1 year.
38+
Uploadcare is a SaaS for storing and processing images in the cloud
39+
and the main sponsor of the Pillow-SIMD project.
40+
41+
Currently, the following operations are accelerated:
42+
43+
- Resize (convolution-based resampling): SSE4, AVX2
44+
- Gaussian and box blur: SSE4
45+
- Alpha composition: SSE4, AVX2
46+
- RGBA → RGBa (alpha premultiplication): SSE4, AVX2
47+
- RGBa → RGBA (division by alpha): AVX2
48+
49+
See [CHANGES](CHANGES.SIMD.rst).
50+
51+
52+
## Benchmarks
53+
54+
The numbers in the table represent megapixels of the source RGB 2560x1600
55+
image processed per second. For example, if a resize of a 2560x1600 image takes
56+
0.5 seconds, the result is 8.2 Mpx/s.
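
As a rough illustration of how one table entry is derived (a sketch only, not the benchmark script itself; the function name `megapixels_per_second` is made up for this example), the throughput is the source pixel count divided by the elapsed time:

```python
def megapixels_per_second(width, height, seconds):
    """Throughput in Mpx/s for one operation on a width x height source image."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Pixel count of the *source* image, in megapixels, per second of work.
    return width * height / 1e6 / seconds


# The example from the text: a 2560x1600 source processed in 0.5 seconds.
print(round(megapixels_per_second(2560, 1600, 0.5), 1))  # 8.2
```

Note that the target size does not enter the formula: every row of the table is normalized to the same 2560x1600 source.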
57+
58+
- Skia 53
59+
- ImageMagick 6.9.3-8 Q8 x86_64
60+
- Pillow 3.3.0
61+
- Pillow-SIMD 3.3.0.post1
62+
63+
Operation | Filter | IM | Pillow| SIMD SSE4| SIMD AVX2| Skia 53
64+
------------------------|---------|------|-------|----------|----------|--------
65+
**Resize to 16x16** | Bilinear| 41.37| 337.12| 571.67| 903.40| 809.49
66+
| Bicubic | 20.58| 185.79| 305.72| 552.85| 453.10
67+
| Lanczos | 14.17| 113.27| 189.19| 355.40| 292.57
68+
**Resize to 320x180** | Bilinear| 29.46| 209.06| 366.33| 558.57| 592.76
69+
| Bicubic | 15.75| 124.43| 224.91| 353.53| 327.68
70+
| Lanczos | 10.80| 82.25| 153.10| 244.22| 196.92
71+
**Resize to 1920x1200** | Bilinear| 17.80| 55.87| 131.27| 152.11| 192.30
72+
| Bicubic | 9.99| 43.64| 90.20| 112.34| 112.84
73+
| Lanczos | 6.95| 34.51| 72.55| 103.16| 104.76
74+
**Resize to 7712x4352** | Bilinear| 2.54| 6.71| 16.06| 20.33| 20.58
75+
| Bicubic | 1.60| 5.51| 12.65| 16.46| 16.52
76+
| Lanczos | 1.09| 4.62| 9.84| 13.38| 12.05
77+
**Blur** | 1px | 6.60| 16.94| 35.16| |
78+
| 10px | 2.28| 16.94| 35.47| |
79+
| 100px | 0.34| 16.93| 35.53| |
80+
81+
82+
### Some conclusions
83+
84+
Pillow is always faster than ImageMagick, and Pillow-SIMD is 2 to 2.5 times
85+
faster than Pillow. In general, Pillow-SIMD with AVX2 is always
86+
**8-20 times faster** than ImageMagick and almost equal to the results of Skia,
87+
a high-speed graphics library used in Chromium.
88+
89+
### Methodology
90+
91+
All tests were performed on Ubuntu 14.04 64-bit running on
92+
an Intel Core i5 4258U CPU with AVX2, in a single thread.
93+
94+
ImageMagick performance was measured with the command-line tool `convert` with
95+
`-verbose` and `-bench` arguments. I use the command line because
96+
I need to test the latest version, and this is the easiest way to do that.
97+
98+
All operations produce exactly the same results.
99+
Resizing filters correspond as follows:
100+
101+
- PIL.Image.BILINEAR == Triangle
102+
- PIL.Image.BICUBIC == Catrom
103+
- PIL.Image.LANCZOS == Lanczos
104+
105+
In ImageMagick, the radius of Gaussian blur is called sigma, and the second
106+
parameter is called radius. In fact, there should not be an additional parameter
107+
for *Gaussian blur*, because if the radius is too small, it is *not*
108+
a Gaussian blur anymore. And if the radius is too big, it gives no
109+
advantage but makes the operation slower. For the test, I set the radius
110+
to sigma × 2.5.
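
A minimal sketch of that rule, assuming ImageMagick's `-gaussian-blur` option takes its geometry as `{radius}x{sigma}`. The helper name `gaussian_blur_args` is hypothetical and is not taken from the testing script:

```python
RADIUS_FACTOR = 2.5  # the text sets ImageMagick's radius to sigma x 2.5


def gaussian_blur_args(pillow_radius):
    """Build `convert` arguments for a blur comparable to a Pillow blur radius.

    What Pillow calls the blur radius is what ImageMagick calls sigma.
    """
    sigma = float(pillow_radius)
    radius = sigma * RADIUS_FACTOR
    # Assumed geometry order for -gaussian-blur: radius first, then sigma.
    return ["-gaussian-blur", "{:g}x{:g}".format(radius, sigma)]


print(gaussian_blur_args(10))  # ['-gaussian-blur', '25x10']
```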
111+
112+
The following script was used for testing:
113+
https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63
114+
115+
116+
## Why Pillow itself is so fast
117+
118+
There are no cheats. High-quality resize and blur methods are used for all
119+
benchmarks. Results are almost pixel-perfect. The only difference is efficient
120+
algorithms. Resampling in Pillow was rewritten in version 2.7 with
121+
minimal use of floating-point numbers, precomputed coefficients, and
122+
cache-aware transposition.
123+
124+
125+
## Why Pillow-SIMD is even faster
126+
127+
Because of SIMD, of course. There are also some ideas for achieving even better
128+
performance.
129+
130+
- **Efficient work with memory** Currently, each pixel is read from
131+
memory into an SSE register separately, while each SSE register can hold
132+
four pixels at once.
133+
- **Integer-based arithmetic** Experiments show that integer-based arithmetic
134+
does not affect the quality and increases the performance of non-SIMD code
135+
by up to 50%.
136+
- **Aligned pixel allocation** It is well known that SIMD load and store
137+
instructions work better with aligned memory.
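
To make the integer-based arithmetic idea concrete, here is a small sketch in plain Python (not Pillow's C code). The names and the 16-bit `PRECISION_BITS` value are assumptions chosen for illustration. Filter coefficients are scaled to integers once, and the per-pixel work then uses only integer multiplies, adds, and a shift:

```python
PRECISION_BITS = 16  # hypothetical fixed-point precision for this sketch


def to_fixed(coeffs):
    """Scale float coefficients to integers with PRECISION_BITS fractional bits."""
    return [int(round(c * (1 << PRECISION_BITS))) for c in coeffs]


def convolve_float(pixels, coeffs):
    """Reference: weighted sum in floating point, rounded and clamped to 0..255."""
    acc = sum(p * c for p, c in zip(pixels, coeffs))
    return max(0, min(255, int(acc + 0.5)))


def convolve_fixed(pixels, fixed_coeffs):
    """Same weighted sum using only integer operations."""
    acc = 1 << (PRECISION_BITS - 1)  # adds 0.5 in fixed point, for rounding
    for p, k in zip(pixels, fixed_coeffs):
        acc += p * k
    return max(0, min(255, acc >> PRECISION_BITS))


pixels = [10, 200, 30]
coeffs = [0.25, 0.5, 0.25]
print(convolve_float(pixels, coeffs), convolve_fixed(pixels, to_fixed(coeffs)))
# 110 110
```

With coefficients that are not exactly representable, the two results can differ by one level in rare rounding cases, which is consistent with the "almost pixel-perfect" wording above.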
138+
139+
140+
## Why not contribute SIMD to the original Pillow
141+
142+
Well, it's not that simple. First of all, Pillow supports a large number
143+
of architectures, not only x86. But even for x86 platforms, Pillow is often
144+
distributed via precompiled binaries. To integrate SIMD into precompiled binaries,
145+
we need to do runtime checks of CPU capabilities.
146+
To compile the code with runtime checks, we need to pass the `-mavx2` option
147+
to the compiler. However, this automatically activates all `if (__AVX2__)`
148+
conditions and those for lower instruction sets. SIMD instructions under such
149+
conditions exist even in the standard C library, and they do not have any runtime checks.
150+
Currently, I don't know how to allow SIMD instructions in the code
151+
while *disallowing* such instructions without runtime checks.
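
For readers unfamiliar with the term, a "runtime check of CPU capabilities" means choosing a code path from what the CPU reports at run time rather than at compile time. The sketch below illustrates only that idea, in Python, by parsing Linux `/proc/cpuinfo`-style text. The function names are hypothetical, and it does not solve the compiler-level problem described above:

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from Linux /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return set(value.split())
    return set()


def pick_backend(flags):
    """Choose the widest SIMD code path the CPU reports support for."""
    if "avx2" in flags:
        return "avx2"
    if "sse4_1" in flags:
        return "sse4"
    return "generic"


# Made-up sample input; on Linux the real text comes from /proc/cpuinfo.
sample = "model name\t: Example CPU\nflags\t\t: fpu sse2 sse4_1 sse4_2 avx avx2\n"
print(pick_backend(parse_cpu_flags(sample)))  # avx2
```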
152+
153+
154+
## Installation
155+
156+
In general, you need to run `pip install pillow-simd` as usual, and if you
157+
are using an SSE4-capable CPU, everything should run smoothly.
158+
Do not forget to remove the original Pillow package first.
159+
160+
If you want the AVX2-enabled version, you need to pass an additional flag to the C
161+
compiler. The easiest way to do that is to define the `CC` variable during compilation.
162+
163+
```bash
164+
$ pip uninstall pillow
165+
$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
166+
```
167+
168+
169+
## Contributing to Pillow-SIMD
170+
171+
Pillow-SIMD and Pillow are two separate projects.
172+
Please submit bugs and improvements not related to SIMD to
173+
the [original Pillow][original-issues]. All fixes made in Pillow
174+
will appear in the next Pillow-SIMD version automatically.
175+
176+
177+
[original-docs]: http://pillow.readthedocs.io/
178+
[original-issues]: https://github.com/python-pillow/Pillow/issues/new
179+
[original-changelog]: https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst
180+
[original-contribute]: https://github.com/python-pillow/Pillow/blob/master/.github/CONTRIBUTING.md
181+
[gaussian-blur-changes]: http://pillow.readthedocs.io/en/3.2.x/releasenotes/2.7.0.html#gaussian-blur-and-unsharp-mask
182+
[uploadcare.com]: https://uploadcare.com/?utm_source=github&utm_medium=description&utm_campaign=pillow-simd
183+
[uploadcare.logo]: https://ucarecdn.com/dc4b8363-e89f-402f-8ea8-ce606664069c/-/preview/

setup.py

Lines changed: 1 addition & 1 deletion
@@ -760,7 +760,7 @@ def debug_build():
760760
setup(name=NAME,
761761
version=PILLOW_VERSION,
762762
description='Python Imaging Library (Fork)',
763-
long_description=_read('README.rst').decode('utf-8'),
763+
long_description=_read('README.md').decode('utf-8'),
764764
author='Alex Clark (Fork Author)',
765765
author_email='[email protected]',
766766
url='http://python-pillow.org',
