@@ -10,19 +10,19 @@ For more information about original Pillow, please
 
 ## Why SIMD
 
-There are many ways to improve performance of image processing.
+There are many ways to improve the performance of image processing.
 You can use better algorithms for the same task, you can make better
 implementation for current algorithms, or you can use more processing unit
-resources. It is perfect when you can just use more efficient algirithm like
+resources. It is perfect when you can just use more efficient algorithm like
 when gaussian blur based on convolutions [was replaced][gaussian-blur-changes]
-by sequential box filters. But a number of such improvements is very limited.
+by sequential box filters. But a number of such improvements are very limited.
 It is also very tempting to use more processor unit resources
-(via parallelization), when they are available. But it is more handy just
+(via parallelization) when they are available. But it is handier just
 to make things faster on the same resources. And that is where SIMD works better.
 
 SIMD stands for "single instruction, multiple data". This is a way to perform
 same operations against the huge amount of homogeneous data.
-Modern CPU have differnt SIMD instructions sets like
+Modern CPU have different SIMD instructions sets like
 MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON.
 
 Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default)
@@ -38,7 +38,7 @@ and the main sponsor of Pillow-SIMD project.
 
 Currently, following operations are accelerated:
 
-- Resize (convolustion-based resample): SSE4, AVX2
+- Resize (convolution-based resample): SSE4, AVX2
 - Gaussian and box blur: SSE4
 
 
@@ -83,17 +83,17 @@ Source | Operation | Filter | IM | Pillow | SIMD SSE4 | SIMD
 ### Some conclusion
 
 Pillow is always faster than ImageMagick. And Pillow-SIMD is faster
-than Pillow in 2—2.5 time. In general, Pillow-SIMD with AVX2 almost always
+than Pillow in 2—2.5 times. In general, Pillow-SIMD with AVX2 almost always
 **10 times faster** than ImageMagick.
 
 ### Methodology
 
-All tests were performed on Ubuntu 14.04 64-bit runing on
-Intel Core i5 4258U with AVX2 CPU on single thread.
+All tests were performed on Ubuntu 14.04 64-bit running on
+Intel Core i5 4258U with AVX2 CPU on the single thread.
 
 ImageMagick performance was measured with command-line tool `convert` with
 `-verbose` and `-bench` arguments. I use command line because
-I need to test latest version and this is the easist way to do that.
+I need to test the latest version and this is the easiest way to do that.
 
 All operations produce exactly the same results.
 Resizing filters compliance:
@@ -102,11 +102,12 @@ Resizing filters compliance:
 - PIL.Image.BICUBIC == Catrom
 - PIL.Image.LANCZOS == Lanczos
 
-In ImageMagick the radius of gaussian blur is called sigma and second parameter
-is called radius. In fact, there should not be additional parameters for
-*gaussian blur*, because if the radius is too small, this is *not*
+In ImageMagick, the radius of gaussian blur is called sigma and the second
+parameter is called radius. In fact, there should not be additional parameters
+for *gaussian blur*, because if the radius is too small, this is *not*
 gaussian blur anymore. And if the radius is big this does not give any
-advantages, but makes operation slower. For test I set radius to sigma × 2.5.
+advantages but makes operation slower. For the test, I set the radius
+to sigma × 2.5.
 
 Following script was used for testing:
 https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63
@@ -115,9 +116,9 @@ https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63
 ## Why Pillow itself is so fast
 
 There are no cheats. High-quality resize and blur methods are used for all
-benchmarks. Results are almost pixel-perfect. The difference only in effective
-algorithms. Resampling in Pillow was rewriten in version 2.7 with
-minimal usage on floating point numbers, precomputed coefficients and
+benchmarks. Results are almost pixel-perfect. The difference is only effective
+algorithms. Resampling in Pillow was rewritten in version 2.7 with
+minimal usage of floating point numbers, precomputed coefficients and
 cache-awareness transposition.
 
 
@@ -126,25 +127,25 @@ cache-awareness transposition.
 Because of SIMD, of course. There are some ideas how to achieve even better
 performance.
 
-- **Efficient work with memory** Currently, each pixel is readed from
+- **Efficient work with memory** Currently, each pixel is read from
   memory to the SSE register, while every SSE register can handle
   four pixels at once.
 - **Integer-based arithmetic** Experiments show that integer-based arithmetic
-  does not affects the quality and increases performance of non-SIMD code
-  up to 50%, but unfortunately give no advantages on SIMD version.
+  does not affect the quality and increases the performance of non-SIMD code
+  up to 50%.
 - **Aligned pixels allocation** Well-known that the SIMD load and store
-  commands works better with aligned memory.
+  commands work better with aligned memory.
 
 
 ## Why do not contribute SIMD to the original Pillow
 
 Well, it's not that simple. First of all, Pillow supports a large number
-of architectures, not only x86. But even for x86 platforms Pillow is often
+of architectures, not only x86. But even for x86 platforms, Pillow is often
 distributed via precompiled binaries. To integrate SIMD in precompiled binaries
-we need to do runtime checks of CPU capabilites.
-To compile code with runtime checks we need to pass `-mavx2` option
-to the compiler. However this automaticaly activates all `if (__AVX2__)`
-and below conditions. And SIMD instructions under such conditions are exist
+we need to do runtime checks of CPU capabilities.
+To compile the code with runtime checks we need to pass `-mavx2` option
+to the compiler. However this automatically activates all `if (__AVX2__)`
+and below conditions. And SIMD instructions under such conditions exist
 even in standard C library and they do not have any runtime checks.
 Currently, I don't know how to allow SIMD instructions in the code
 but *do not allow* such instructions without runtime checks.
@@ -156,7 +157,7 @@ In general, you need to do `pip install pillow-simd` as always and if you
 are using SSE4-capable CPU everything should run smoothly.
 Do not forget to remove original Pillow package first.
 
-If you want AVX2-enabled version, you need to pass additional flag to C
+If you want the AVX2-enabled version, you need to pass the additional flag to C
 compiler. The easiest way to do that is define `CC` variable while compilation.
 
 ```bash