| 1 | +# Pillow-SIMD |
| 2 | + |
| 3 | +Pillow-SIMD is a "following" fork of Pillow (which is itself a fork of PIL). |
| 4 | + |
| 5 | +For more information about the original Pillow, please |
| 6 | +[read the documentation][original-docs], |
| 7 | +[check the changelog][original-changelog] and |
| 8 | +[find out how to contribute][original-contribute]. |
| 9 | + |
| 10 | + |
| 11 | +## Why SIMD |
| 12 | + |
| 13 | +There are many ways to improve the performance of image processing. |
| 14 | +You can use better algorithms for the same task, you can write a better |
| 15 | +implementation of the current algorithms, or you can use more processing unit |
| 16 | +resources. Ideally, you can simply switch to a more efficient algorithm, as |
| 17 | +when gaussian blur based on convolutions [was replaced][gaussian-blur-changes] |
| 18 | +by sequential box filters. But the number of such improvements is very limited. |
| 19 | +It is also very tempting to use more processor unit resources |
| 20 | +(via parallelization) when they are available. But it is handier just |
| 21 | +to make things faster on the same resources. And that is where SIMD works better. |
| 22 | + |
| 23 | +SIMD stands for "single instruction, multiple data". This is a way to perform |
| 24 | +the same operation on a large amount of homogeneous data. |
| 25 | +Modern CPUs have different SIMD instruction sets, such as |
| 26 | +MMX, SSE-SSE4, AVX, AVX2, AVX512, NEON. |
| 27 | + |
| 28 | +Currently, Pillow-SIMD can be [compiled](#installation) with SSE4 (default) |
| 29 | +and AVX2 support. |
| 30 | + |
| 31 | + |
| 32 | +## Status |
| 33 | + |
| 34 | +[![Uploadcare][uploadcare.logo]][uploadcare.com] |
| 35 | + |
| 36 | +Pillow-SIMD can be used in production. Pillow-SIMD has been operating on |
| 37 | +[Uploadcare][uploadcare.com] servers for more than 1 year. |
| 38 | +Uploadcare is a SaaS for storing and processing images in the cloud |
| 39 | +and the main sponsor of the Pillow-SIMD project. |
| 40 | + |
| 41 | +Currently, the following operations are accelerated: |
| 42 | + |
| 43 | +- Resize (convolution-based resampling): SSE4, AVX2 |
| 44 | +- Gaussian and box blur: SSE4 |
| 45 | +- Alpha composition: SSE4, AVX2 |
| 46 | +- RGBA → RGBa (alpha premultiplication): SSE4, AVX2 |
| 47 | +- RGBa → RGBA (division by alpha): AVX2 |
| 48 | + |
| 49 | +See [CHANGES](CHANGES.SIMD.rst). |
| 50 | + |
| 51 | + |
| 52 | +## Benchmarks |
| 53 | + |
| 54 | +The numbers in the table are megapixels of the source RGB 2560x1600 image |
| 55 | +processed per second. For example, if a 2560x1600 image is resized |
| 56 | +in 0.5 seconds, the result is 8.2 Mpx/s. |
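The metric can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function name `megapixels_per_second` is made up for illustration and is not part of the benchmark script.

```python
def megapixels_per_second(width, height, seconds):
    """Throughput in megapixels of the *source* image per second."""
    return width * height / 1e6 / seconds


# The example from the text: a 2560x1600 source resized in 0.5 s.
print(round(megapixels_per_second(2560, 1600, 0.5), 1))  # 8.2
```

Note that the size of the source image is what counts, not the size of the result, so all rows of the table are comparable.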
| 57 | + |
| 58 | +- Skia 53 |
| 59 | +- ImageMagick 6.9.3-8 Q8 x86_64 |
| 60 | +- Pillow 3.3.0 |
| 61 | +- Pillow-SIMD 3.3.0.post1 |
| 62 | + |
| 63 | +Operation | Filter | IM | Pillow| SIMD SSE4| SIMD AVX2| Skia 53 |
| 64 | +------------------------|---------|------|-------|----------|----------|-------- |
| 65 | +**Resize to 16x16** | Bilinear| 41.37| 337.12| 571.67| 903.40| 809.49 |
| 66 | + | Bicubic | 20.58| 185.79| 305.72| 552.85| 453.10 |
| 67 | + | Lanczos | 14.17| 113.27| 189.19| 355.40| 292.57 |
| 68 | +**Resize to 320x180** | Bilinear| 29.46| 209.06| 366.33| 558.57| 592.76 |
| 69 | + | Bicubic | 15.75| 124.43| 224.91| 353.53| 327.68 |
| 70 | + | Lanczos | 10.80| 82.25| 153.10| 244.22| 196.92 |
| 71 | +**Resize to 1920x1200** | Bilinear| 17.80| 55.87| 131.27| 152.11| 192.30 |
| 72 | + | Bicubic | 9.99| 43.64| 90.20| 112.34| 112.84 |
| 73 | + | Lanczos | 6.95| 34.51| 72.55| 103.16| 104.76 |
| 74 | +**Resize to 7712x4352** | Bilinear| 2.54| 6.71| 16.06| 20.33| 20.58 |
| 75 | + | Bicubic | 1.60| 5.51| 12.65| 16.46| 16.52 |
| 76 | + | Lanczos | 1.09| 4.62| 9.84| 13.38| 12.05 |
| 77 | +**Blur** | 1px | 6.60| 16.94| 35.16| | |
| 78 | + | 10px | 2.28| 16.94| 35.47| | |
| 79 | + | 100px | 0.34| 16.93| 35.53| | |
| 80 | + |
| 81 | + |
| 82 | +### Some conclusions |
| 83 | + |
| 84 | +Pillow is always faster than ImageMagick, and Pillow-SIMD is |
| 85 | +2-2.5 times faster than Pillow. In general, Pillow-SIMD with AVX2 is always |
| 86 | +**8-20 times faster** than ImageMagick and almost matches the results of Skia, |
| 87 | +the high-speed graphics library used in Chromium. |
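If you want to recompute the ratios for any pair of columns yourself, a small script is enough. The sketch below simply copies the resize rows of the benchmark table above; the names `RESULTS`, `IM`, `PILLOW`, `SSE4` and `AVX2` are made up for this example.

```python
# (target size, filter): (ImageMagick, Pillow, SIMD SSE4, SIMD AVX2) in Mpx/s,
# copied from the resize rows of the benchmark table.
RESULTS = {
    ("16x16", "Bilinear"): (41.37, 337.12, 571.67, 903.40),
    ("16x16", "Bicubic"): (20.58, 185.79, 305.72, 552.85),
    ("16x16", "Lanczos"): (14.17, 113.27, 189.19, 355.40),
    ("320x180", "Bilinear"): (29.46, 209.06, 366.33, 558.57),
    ("320x180", "Bicubic"): (15.75, 124.43, 224.91, 353.53),
    ("320x180", "Lanczos"): (10.80, 82.25, 153.10, 244.22),
    ("1920x1200", "Bilinear"): (17.80, 55.87, 131.27, 152.11),
    ("1920x1200", "Bicubic"): (9.99, 43.64, 90.20, 112.34),
    ("1920x1200", "Lanczos"): (6.95, 34.51, 72.55, 103.16),
    ("7712x4352", "Bilinear"): (2.54, 6.71, 16.06, 20.33),
    ("7712x4352", "Bicubic"): (1.60, 5.51, 12.65, 16.46),
    ("7712x4352", "Lanczos"): (1.09, 4.62, 9.84, 13.38),
}

# Column indexes inside each tuple.
IM, PILLOW, SSE4, AVX2 = range(4)

for (size, filter_name), row in RESULTS.items():
    print(
        f"{size:>9} {filter_name:<8} "
        f"SSE4 vs Pillow: {row[SSE4] / row[PILLOW]:.2f}x  "
        f"AVX2 vs ImageMagick: {row[AVX2] / row[IM]:.1f}x"
    )
```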
| 88 | + |
| 89 | +### Methodology |
| 90 | + |
| 91 | +All tests were performed on Ubuntu 14.04 64-bit running on an |
| 92 | +Intel Core i5 4258U CPU with AVX2, using a single thread. |
| 93 | + |
| 94 | +ImageMagick performance was measured with the command-line tool `convert` with |
| 95 | +the `-verbose` and `-bench` arguments. I used the command line because |
| 96 | +I needed to test the latest version and this is the easiest way to do that. |
| 97 | + |
| 98 | +All operations produce exactly the same results. |
| 99 | +Correspondence between resizing filters (Pillow == ImageMagick): |
| 100 | + |
| 101 | +- PIL.Image.BILINEAR == Triangle |
| 102 | +- PIL.Image.BICUBIC == Catrom |
| 103 | +- PIL.Image.LANCZOS == Lanczos |
| 104 | + |
| 105 | +In ImageMagick, the radius of gaussian blur is called sigma and the second |
| 106 | +parameter is called radius. In fact, there should not be any additional parameter |
| 107 | +for *gaussian blur*, because if the radius is too small, the result is *not* |
| 108 | +a gaussian blur anymore. And if the radius is too big, it gives no |
| 109 | +advantage but makes the operation slower. For the test, I set the radius |
| 110 | +to sigma × 2.5. |
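As an illustration of that rule, here is a minimal sketch that builds the `{radius}x{sigma}` geometry string accepted by ImageMagick blur options such as `-gaussian-blur`. The helper name `imagemagick_blur_geometry` is hypothetical, and which exact option the benchmark script passes is not stated here, so treat this as an assumption rather than the actual test code.

```python
def imagemagick_blur_geometry(sigma, radius_factor=2.5):
    """Return a '{radius}x{sigma}' string with radius = sigma * radius_factor."""
    radius = sigma * radius_factor
    # The 'g' format drops trailing zeros: 25.0 -> '25', 2.5 -> '2.5'.
    return f"{radius:g}x{sigma:g}"


# The blur sizes from the benchmark table (the Pillow radius is the ImageMagick sigma).
for sigma in (1, 10, 100):
    print(imagemagick_blur_geometry(sigma))
# 2.5x1
# 25x10
# 250x100
```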
| 111 | + |
| 112 | +The following script was used for testing: |
| 113 | +https://gist.github.com/homm/f9b8d8a84a57a7e51f9c2a5828e40e63 |
| 114 | + |
| 115 | + |
| 116 | +## Why Pillow itself is so fast |
| 117 | + |
| 118 | +There are no cheats. High-quality resize and blur methods are used for all |
| 119 | +benchmarks. Results are almost pixel-perfect. The only difference is more efficient |
| 120 | +algorithms. Resampling in Pillow was rewritten in version 2.7 with |
| 121 | +minimal use of floating point numbers, precomputed coefficients and |
| 122 | +cache-aware transposition. |
| 123 | + |
| 124 | + |
| 125 | +## Why Pillow-SIMD is even faster |
| 126 | + |
| 127 | +Because of SIMD, of course. There are also some ideas on how to achieve even better |
| 128 | +performance. |
| 129 | + |
| 130 | +- **Efficient work with memory** Currently, each pixel is read from |
| 131 | + memory to the SSE register, while every SSE register can handle |
| 132 | + four pixels at once. |
| 133 | +- **Integer-based arithmetic** Experiments show that integer-based arithmetic |
| 134 | + does not affect the quality and increases the performance of non-SIMD code |
| 135 | + by up to 50%. |
| 136 | +- **Aligned pixels allocation** It is well known that SIMD load and store |
| 137 | + commands work better with aligned memory. |
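To give a feeling for the integer-based idea, here is a small fixed-point sketch in Python. It is not the Pillow-SIMD code: the number of fractional bits (`PRECISION_BITS = 22`) and all function names are assumptions chosen for illustration. Coefficients are scaled to integers once, and each output value is then computed with integer multiplications and a single shift.

```python
PRECISION_BITS = 22  # assumed number of fractional bits, illustrative only


def to_fixed(coeffs):
    """Scale float coefficients to integers with PRECISION_BITS fractional bits."""
    return [int(round(c * (1 << PRECISION_BITS))) for c in coeffs]


def convolve_float(pixels, coeffs):
    """Reference: weighted sum in floating point, rounded and clamped to 0..255."""
    acc = sum(p * c for p, c in zip(pixels, coeffs))
    return min(255, max(0, int(round(acc))))


def convolve_fixed(pixels, fixed_coeffs):
    """Same weighted sum using only integer arithmetic."""
    # Start from one half so that the final shift rounds to nearest.
    acc = 1 << (PRECISION_BITS - 1)
    for p, k in zip(pixels, fixed_coeffs):
        acc += p * k
    return min(255, max(0, acc >> PRECISION_BITS))


pixels = [12, 200, 255, 90]    # one channel of four neighbouring pixels
coeffs = [0.1, 0.4, 0.4, 0.1]  # made-up filter coefficients that sum to 1

print(convolve_float(pixels, coeffs))            # 192
print(convolve_fixed(pixels, to_fixed(coeffs)))  # 192
```

With 8-bit channels and this many fractional bits, the rounding error of the coefficients stays far below one intensity level, which is why the quality is not affected in this sketch.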
| 138 | + |
| 139 | + |
| 140 | +## Why not contribute SIMD to the original Pillow |
| 141 | + |
| 142 | +Well, it's not that simple. First of all, Pillow supports a large number |
| 143 | +of architectures, not only x86. But even for x86 platforms, Pillow is often |
| 144 | +distributed via precompiled binaries. To integrate SIMD into precompiled binaries, |
| 145 | +we need to do runtime checks of CPU capabilities. |
| 146 | +To compile the code with runtime checks, we need to pass the `-mavx2` option |
| 147 | +to the compiler. However, this automatically activates all `if (__AVX2__)` |
| 148 | +conditions and those for lower instruction sets. SIMD instructions under such conditions exist |
| 149 | +even in the standard C library, and they do not have any runtime checks. |
| 150 | +Currently, I don't know how to allow SIMD instructions in the code |
| 151 | +while *disallowing* such instructions without runtime checks. |
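For readers unfamiliar with the term, a "runtime check of CPU capabilities" means asking the CPU what it supports before choosing a code path. The sketch below shows the idea in Python by parsing `/proc/cpuinfo`-style text. It is only an illustration: it assumes Linux on x86, where the features are listed on a `flags` line, the function names are made up, and treating `sse4_1` as "SSE4" is a simplification. It does not solve the compile-time problem described above, and Pillow-SIMD does not do this.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def best_simd_level(flags):
    """Pick the most capable code path the CPU could run (hypothetical names)."""
    if "avx2" in flags:
        return "avx2"
    if "sse4_1" in flags:
        return "sse4"
    return "generic"


# A shortened sample; on a real Linux machine you could pass
# open("/proc/cpuinfo").read() instead.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 sse4_1 sse4_2 avx avx2\n"
print(best_simd_level(parse_cpu_flags(sample)))  # avx2
```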
| 152 | + |
| 153 | + |
| 154 | +## Installation |
| 155 | + |
| 156 | +In general, you just need to run `pip install pillow-simd` as usual, and if you |
| 157 | +are using an SSE4-capable CPU, everything should run smoothly. |
| 158 | +Do not forget to remove the original Pillow package first. |
| 159 | + |
| 160 | +If you want the AVX2-enabled version, you need to pass an additional flag to the C |
| 161 | +compiler. The easiest way to do that is to define the `CC` variable during compilation. |
| 162 | + |
| 163 | +```bash |
| 164 | +$ pip uninstall pillow |
| 165 | +$ CC="cc -mavx2" pip install -U --force-reinstall pillow-simd |
| 166 | +``` |
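To double-check which of the two packages ended up in your environment, you can query the installed distributions. This is a small sketch using `importlib.metadata` from the standard library (Python 3.8+); the helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


for name in ("pillow", "pillow-simd"):
    print(name, installed_version(name))
```

If both lines print a version, the original Pillow is most likely still installed and should be removed first.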
| 167 | + |
| 168 | + |
| 169 | +## Contributing to Pillow-SIMD |
| 170 | + |
| 171 | +Pillow-SIMD and Pillow are two separate projects. |
| 172 | +Please submit bugs and improvements not related to SIMD to the |
| 173 | +[original Pillow][original-issues]. All fixes made in Pillow |
| 174 | +will appear in the next Pillow-SIMD version automatically. |
| 175 | + |
| 176 | + |
| 177 | + [original-docs]: http://pillow.readthedocs.io/ |
| 178 | + [original-issues]: https://github.com/python-pillow/Pillow/issues/new |
| 179 | + [original-changelog]: https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst |
| 180 | + [original-contribute]: https://github.com/python-pillow/Pillow/blob/master/.github/CONTRIBUTING.md |
| 181 | + [gaussian-blur-changes]: http://pillow.readthedocs.io/en/3.2.x/releasenotes/2.7.0.html#gaussian-blur-and-unsharp-mask |
| 182 | + [uploadcare.com]: https://uploadcare.com/?utm_source=github&utm_medium=description&utm_campaign=pillow-simd |
| 183 | + [uploadcare.logo]: https://ucarecdn.com/dc4b8363-e89f-402f-8ea8-ce606664069c/-/preview/ |