toBuffer optimization #306

Closed
kangax opened this issue Jul 26, 2013 · 42 comments

@kangax (Collaborator) commented Jul 26, 2013

I ran into performance issues with toBuffer. Here's an example of drawing a (rather) large image onto a canvas, then calling toBuffer. toBuffer is what takes the longest here (drawImage, for example, is insignificant). I'm getting ~7-7.5 sec on the toBuffer call. This is on node v0.10.13, node-canvas v1.0.3, and Mac OS X (10.7.5).

Is there a way to speed things up? Anything at all? I also tried async toBuffer but it doesn't really help.

var fs = require('fs'),
    Canvas = require('canvas'),
    Image = Canvas.Image;

// 3.9MB, 4686x4668
fs.readFile(__dirname + '/big_img.jpg', function(err, bigImg) {
  if (err) throw err;

  var img = new Image();
  img.src = bigImg;

  var canvas = new Canvas(img.width, img.height);
  var ctx = canvas.getContext('2d');

  ctx.drawImage(img, 0, 0);

  console.time('toBuffer');
  canvas.toBuffer();
  console.timeEnd('toBuffer'); // 7196ms, 7435ms, 7237ms
});

Attaching the image used in the test.

[attached image: big_img.jpg]

@tj (Contributor) commented Jul 26, 2013

Yikes, that does seem pretty slow. I can't remember if we have .toBuffer('image/jpeg') (I don't think so), but you might want to try the JPEG stream instead.

@kangax (Collaborator, Author) commented Jul 26, 2013

You're right, the JPEG stream is almost 10 times faster — I'm getting ~900ms consistently, even at 100% quality. I'll see if I can use JPEG instead of PNG, but I'd still be curious to see whether we can optimize PNG somehow.

Off-topic: there's pngStream and createJPEGStream. Inconsistency! :)

@rvagg (Contributor) commented Jul 27, 2013

I'll have a look at this to see if I can understand it, just because it's interesting and because we have some new Buffer stuff added in the next NAN release.

@rvagg (Contributor) commented Jul 27, 2013

Oh, and I can see an additional memcpy that I left in with my NAN PR that may be contributing here. @kangax, can you confirm you're using the latest node-canvas?

@rvagg (Contributor) commented Jul 27, 2013

Ouch, not what I expected. See the flame graph.

This memcpy is dominating the stack; it's actually called ~11,000 times for @kangax's example image.

From what I can glean, this is a libpng issue where it's writing the image in many small chunks.

@rvagg (Contributor) commented Jul 27, 2013

By default, the data comes out of libpng in these chunks: 4 B, 8 B, 8192 B. There's a memcpy for each of these chunks, and it goes on for the whole image. I don't believe you can control this behaviour; it'd be nice if you could increase the buffer size or have more control over buffer allocation and usage, but I can't see a way to do that.
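
For reference, this is roughly what the stream path under discussion looks like: cairo_surface_write_to_png_stream() invokes a user-supplied callback once per chunk that libpng emits, and each call copies the chunk into a growing buffer. A simplified sketch, not node-canvas's actual code (names invented):

#include <cairo.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  unsigned char *data;
  size_t len, cap;
} closure_t;

/* Called once per chunk libpng emits (4 B, 8 B, then 8192 B blocks),
   so the memcpy below runs ~11,000 times for an image this size. */
static cairo_status_t
write_chunk(void *user, const unsigned char *chunk, unsigned int chunk_len)
{
  closure_t *c = user;
  if (c->len + chunk_len > c->cap) {
    /* grow to the next multiple of 1024 */
    size_t cap = ((c->len + chunk_len + 1023) / 1024) * 1024;
    unsigned char *p = realloc(c->data, cap);
    if (!p) return CAIRO_STATUS_NO_MEMORY;
    c->data = p;
    c->cap = cap;
  }
  memcpy(c->data + c->len, chunk, chunk_len); /* the hot memcpy */
  c->len += chunk_len;
  return CAIRO_STATUS_SUCCESS;
}

/* usage: cairo_surface_write_to_png_stream(surface, write_chunk, &closure); */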

@kangax (Collaborator, Author) commented Jul 27, 2013

@rvagg Interesting. Thanks for investigating. What about the compression level? Is there a way to change it? Would it affect performance at all, or is the bottleneck purely in data output?

@rvagg (Contributor) commented Jul 27, 2013

I don't believe you can control the compression level via Cairo without reimplementing its write_png() function (http://cgit.freedesktop.org/cairo/tree/src/cairo-png.c#n170). The png_struct *png it creates is the object you'd need to call png_set_compression_level() on (http://www.libpng.org/pub/png/libpng-1.2.5-manual.html#section-5.8), but you can't access it.

In fact, just to confirm that this is a libpng/Cairo problem, I did this in Canvas.cc:

    //status = cairo_surface_write_to_png_stream(canvas->surface(), toBuffer, &closure);
    status = cairo_surface_write_to_png(canvas->surface(), "/tmp/tmp.png");

which bypasses the memcpy-to-buffer path completely: the toBuffer callback isn't invoked at all; Cairo simply calls its write_png() and does an fwrite() for each chunk that libpng returns.

Timing the execution shows that it takes roughly the same amount of time as the original.

@kkoopa (Contributor) commented Jul 27, 2013

What about rewriting it to use a buffered dynamic-memory implementation instead of just doing a malloc each time? Each buffer would be backed by a dynamic array that doubles in size whenever it runs out of space. It would be possible to give a starting size, or it should default to... 10, like Java's ArrayList.
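
A minimal sketch of that doubling policy (hypothetical names; only the capacity computation differs from a fixed-increment realloc):

#include <stdlib.h>
#include <string.h>

/* Append chunk to a buffer that doubles its capacity when full:
   O(log n) reallocs for n bytes instead of O(n). */
static int
buffer_append(unsigned char **data, size_t *len, size_t *cap,
              const unsigned char *chunk, size_t chunk_len)
{
  if (*len + chunk_len > *cap) {
    size_t cap2 = *cap ? *cap : 1024; /* hypothetical starting size */
    while (cap2 < *len + chunk_len)
      cap2 *= 2;
    unsigned char *p = realloc(*data, cap2);
    if (!p) return -1;
    *data = p;
    *cap = cap2;
  }
  memcpy(*data + *len, chunk, chunk_len);
  *len += chunk_len;
  return 0;
}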

@rvagg (Contributor) commented Jul 28, 2013

Well, it sort of is at the moment: it starts off at 1024 bytes, then does a realloc each time it overflows, rounding up to the closest multiple of 1024. I had some success on a slower (SmartOS) VM by bumping this up to 8192, speeding it up by nearly a factor of two, but it had no impact on speed on my much faster Linux machine.

Unfortunately you can't control the buffer that libpng uses internally to write the data; it'd be nice to be able to intercept that and use your own buffering mechanism, but I don't think that's possible, even with a local reimplementation of Cairo's write_png().

@kkoopa (Contributor) commented Jul 28, 2013

If the normal page size is 4 kB, it would make slightly more sense to do things in increments of that rather than in 1 kB chunks. However, if the required size is large, it will be much faster to grow exponentially than linearly, as that leads to O(log n) mallocs rather than O(n) mallocs.

4 MB in 1 kB chunks:
Linear: 4 MB / 1 kB = 4096 mallocs
Exponential: log2(2^22) - log2(2^10) + 1 = 13 mallocs

4 MB in 4 kB chunks:
Linear: 4 MB / 4 kB = 1024 mallocs
Exponential: log2(2^22) - log2(2^12) + 1 = 11 mallocs

Anyway, LodePNG (http://lodev.org/lodepng/) is a highly customizable PNG library. Could libpng be replaced? It might be possible to write a libpng facade on top of it, if one does not exist already.

@kkoopa (Contributor) commented Jul 28, 2013

I did some checking of my own; the solution seems rather straightforward. Starting with a bit from cairo-png.c:

/**
 * SECTION:cairo-png
 * @Title: PNG Support
 * @Short_Description: Reading and writing PNG images
 * @See_Also: #cairo_surface_t
 *
 * The PNG functions allow reading PNG images into image surfaces, and writing
 * any surface to a PNG file.
 *
 * It is a toy API. It only offers very simple support for reading and
 * writing PNG files, which is sufficient for testing and
 * demonstration purposes. Applications which need more control over
 * the generated PNG file should access the pixel data directly, using
 * cairo_image_surface_get_data() or a backend-specific access
 * function, and process it with another library, e.g. gdk-pixbuf or
 * libpng.
 **/

The reason it performs this badly is that it does not use buffering with libpng; the output flush function is a no-op:

static void
png_simple_output_flush_fn (png_structp png_ptr)
{
}

So, the solution is to use cairo_image_surface_get_data() and feed it through libpng with buffering.

By the way, libpng does support custom memory management. If you want to use your own memory allocation routines, define PNG_USER_MEM_SUPPORTED and use png_create_read_struct_2() instead of png_create_read_struct():

png_structp png_ptr = png_create_read_struct_2
       (PNG_LIBPNG_VER_STRING, (png_voidp)user_error_ptr,
        user_error_fn, user_warning_fn, (png_voidp)
        user_mem_ptr, user_malloc_fn, user_free_fn);
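
On the writing side, the analogous hooks are png_create_write_struct() and png_set_write_fn(). A rough sketch of the approach described above (not the actual extracted code; error paths are trimmed, and the conversion from cairo's premultiplied ARGB32 to PNG's RGBA byte order is omitted):

#include <png.h>
#include <cairo.h>
#include <setjmp.h>

static int
write_surface_png(cairo_surface_t *surface, png_rw_ptr write_cb,
                  png_flush_ptr flush_cb, void *io_ptr,
                  int level, int filters)
{
  int width = cairo_image_surface_get_width(surface);
  int height = cairo_image_surface_get_height(surface);
  int stride = cairo_image_surface_get_stride(surface);
  unsigned char *pixels = cairo_image_surface_get_data(surface);

  png_structp png = png_create_write_struct(PNG_LIBPNG_VER_STRING,
                                            NULL, NULL, NULL);
  png_infop info = png_create_info_struct(png);
  if (setjmp(png_jmpbuf(png))) {
    png_destroy_write_struct(&png, &info);
    return -1;
  }

  /* Our own (buffered) output instead of cairo's unbuffered writes. */
  png_set_write_fn(png, io_ptr, write_cb, flush_cb);

  /* The knobs cairo's toy API hides: zlib level and row filters. */
  png_set_compression_level(png, level);
  png_set_filter(png, PNG_FILTER_TYPE_BASE, filters);

  png_set_IHDR(png, info, width, height, 8, PNG_COLOR_TYPE_RGB_ALPHA,
               PNG_INTERLACE_NONE, PNG_COMPRESSION_TYPE_DEFAULT,
               PNG_FILTER_TYPE_DEFAULT);
  png_write_info(png, info);

  for (int y = 0; y < height; y++)
    png_write_row(png, pixels + y * stride);

  png_write_end(png, info);
  png_destroy_write_struct(&png, &info);
  return 0;
}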

@rvagg (Contributor) commented Jul 29, 2013

Nice find! It sounds like a lot of work; are you offering?

@kkoopa (Contributor) commented Jul 29, 2013

Shouldn't be too much. I'll cook something up later today.

@kkoopa (Contributor) commented Jul 29, 2013

PNG writing code has been extracted: kkoopa/node-canvas@9981b0f
Everything is still very slow, as there is no custom memory management or flushing behavior set up.

@kkoopa (Contributor) commented Jul 29, 2013

The memory stuff and flushing and whatnot do not matter much. Neither does zlib's compression buffer size (8192). It is all about PNG compression being slow as hell, especially with filtering. I added a second argument to Canvas::toBuffer, which allows setting the zlib compression level (0-9), the default being 6.

$ node canvastest.js --level=6
toBuffer: 9401ms

$ node canvastest.js --level=3
toBuffer: 4416ms

$ node canvastest.js --level=0
toBuffer: 2175ms

kkoopa/node-canvas@e67cdee

I can clean it up and make a pull request if everybody's happy.

@kkoopa (Contributor) commented Jul 29, 2013

I added the ability to optionally set the row filters to use. For fastest results, use no filtering and no compression.

$ node canvastest.js 0 NO_FILTERS
toBuffer: 2156ms

The resulting picture is 84 MB.

@tj (Contributor) commented Jul 29, 2013

Are you using @kangax's image? I thought these numbers sounded pretty crazy, but check this out:

 time convert out.jpg out.png 

real    0m22.789s
user    0m22.669s
sys 0m0.099s

Haha... certainly inherent to PNG, I guess.

@kkoopa (Contributor) commented Jul 29, 2013

Yes, I am using that image, and yes -- I know for a fact that PNG compression is very heavy. I did some measurements a couple of months ago and switched to JPEG in some of my canvas projects.

Part of the reason it is slow is that the DEFLATE algorithm used by libpng through zlib cannot be effectively parallelized, unlike JPEG compression, which quite easily runs on a GPU (OpenGL, OpenCL) and consists largely of matrix operations.

@tj (Contributor) commented Jul 29, 2013

I knew it was worse, but ~1 s per megapixel is pretty harsh.

@kkoopa (Contributor) commented Jul 29, 2013

It is only using one thread, and likely thrashing the cache somewhat too. Also remember that there is still overhead compared to a normal bitmap format from computing a CRC-32 for each 8k chunk.

@kangax (Collaborator, Author) commented Jul 30, 2013

> I can clean it up and make a pull request if everybody's happy.

The performance gains look amazing. Thanks @kkoopa! I'm definitely happy with this if @visionmedia is OK with a pull request and such.

@kangax (Collaborator, Author) commented Aug 4, 2013

@kkoopa @visionmedia how is compression level passed to pngStream/toBuffer in 1.1.0?

@kkoopa (Contributor) commented Aug 4, 2013

toBuffer and streamPNGSync now accept two optional arguments tagged on at the end in order not to break backwards compatibility with the old API. This is not the most elegant way of structuring it, but a decent compromise.

Illustrating with toBuffer, its definition is now toBuffer([cb, compression_level=6, filter_mask=canvas.PNG_ALL_FILTERS]). compression_level is an integer in the interval [0, 9] and filter_mask is a combination of the newly exposed constants

canvas.PNG_NO_FILTERS
canvas.PNG_FILTER_NONE
canvas.PNG_FILTER_SUB
canvas.PNG_FILTER_UP
canvas.PNG_FILTER_AVG
canvas.PNG_FILTER_PAETH
canvas.PNG_ALL_FILTERS

which correspond 1:1 to those in libpng.

So, in order to make a synchronous toBuffer call, you do canvas.toBuffer(undefined, 6, canvas.PNG_ALL_FILTERS). These values for compression level and filter settings are the defaults and are used if unspecified.

@kangax (Collaborator, Author) commented Aug 4, 2013

Gotcha. FWIW, I tested it with the original image and got these fun numbers (time taken + size for each compression level):

toBuffer 0: 2034ms, 83.58MB
toBuffer 1: 3137ms, 33.88MB
toBuffer 2: 3325ms, 33.39MB
toBuffer 3: 4185ms, 32.49MB
toBuffer 4: 4861ms, 33.77MB
toBuffer 5: 6077ms, 32.46MB
toBuffer 6: 9464ms, 31.39MB
toBuffer 7: 12958ms, 31.04MB
toBuffer 8: 29828ms, 30.63MB
toBuffer 9: 41531ms, 30.53MB

@kkoopa (Contributor) commented Aug 4, 2013

Mmm, seems about right, but you left out level 0, resulting in 84-something MB. If you turn off filtering you should gain a couple hundred ms. The PNG docs say zlib levels 3-6 have been determined best for normal PNGs; however, in this case there is very little size increase between levels 3 and 1 but a 25% time reduction, so I would go with level 1. I guess this issue can be closed then.

@kangax (Collaborator, Author) commented Aug 4, 2013

Definitely, 1 seems the best as far as the size/time ratio goes. I edited the comment to include level 0. It's the fastest, but the size is way too big.

kangax closed this as completed Aug 4, 2013
@kangax (Collaborator, Author) commented Aug 6, 2013

Just noticed that the Travis build fails when trying to install 1.1.0 — https://travis-ci.org/kangax/fabric.js/jobs/9902326#L463

In file included from ../src/Canvas.cc:8:0:
../src/PNG.h: In function ‘cairo_status_t canvas_write_png(cairo_surface_t*, png_rw_ptr, void*)’:
../src/PNG.h:139:10: error: ‘CAIRO_FORMAT_RGB30’ was not declared in this scope

It works for me locally (on OS X 10.7.5).

@kkoopa (Contributor) commented Aug 6, 2013

See #312. It would be possible to simply add

#ifndef CAIRO_FORMAT_RGB30
  #define CAIRO_FORMAT_RGB30 5
#endif

but this might cover up other, more serious problems (or it might work fine; for what it's worth, I've tried it like that on Windows, as I couldn't find any newer prebuilt libcairo binaries, and it seems to work).

@kangax (Collaborator, Author) commented Aug 6, 2013

Looks like CAIRO_FORMAT_RGB30 was introduced in 1.12. I have 1.12.14 installed locally. Not sure about the one on Travis' Ubuntu.

@kkoopa (Contributor) commented Aug 6, 2013

It is reasonable to assume it is Ubuntu 12.04, which only has 1.10.something IIRC. Include a newer libpng in your project and make it link to that and use its includes instead.

@kkoopa (Contributor) commented Aug 6, 2013

#315 should rectify this

@springmeyer commented:

In terms of PNG compression speed, one way to mitigate the overhead of zlib is to quantize the image first, reducing the total # of colors and effectively reducing the work zlib needs to do. This is what Mapnik does, and these are the results with the above image:

64 color png / z1 2049ms 7.6mb
256 color png / z1 2214ms 11.92mb
256 color png / z6 3594ms 11.23mb
full color png / z1 2198ms 42.61mb
full color png / z6 4322ms 39.89mb

Note: z1 here means compression=1 passed to zlib and quantization is done with this code: https://github.com/mapnik/mapnik/blob/master/include/mapnik/hextree.hpp.

Testcase using node-mapnik bindings (npm install mapnik):

var mapnik = require('mapnik');
var bytes = require('bytes');
var fs = require('fs');

var image = new mapnik.Image.open('b4f4d900-f5f7-11e2-888e-2c2289c6edd0.jpg');

var tests = [
 { name:'64 color png / z1', format:'png8:c=64:z=1' },
 { name:'256 color png / z1', format:'png8:z=1' },
 { name:'256 color png / z6', format:'png8:z=6' },
 { name:'full color png / z1', format:'png32:z=1' },
 { name:'full color png / z6', format:'png32:z=6' },
]

tests.forEach(function(test) {
    var start = +new Date;
    var buffer = image.encodeSync(test.format);
    var elapsed = new Date-start;
    var size = bytes(buffer.length);
    fs.writeFileSync(test.format+'.png',buffer);
    console.log(test.name, elapsed+'ms', size);
});

@springmeyer commented:

Here is the 64 color png / z1 image output for reference:

[attached image: png8:c=64:z=1 output]

@TooTallNate (Contributor) commented:

We got a Burner on our hands 🔥 :D

@kkoopa (Contributor) commented Mar 8, 2014

Nice results, I must say, if the color loss is acceptable (easily noticeable at full resolution; red houses and such are no longer red). Storing photos as PNGs does not make too much sense in itself, but I wonder what the gains are when encoding an image with large swaths of not too many colors. Does normal full-color PNG compression still take a lot more time than the color-reduced version?

@springmeyer commented:

@kkoopa: Yes, I agree that PNG is not a good format for photos. But images with a limited set of colors (like tiles for maps) that still need alpha transparency (for compositing): this is where I prefer PNG. For example:

https://foursquare.com/explore?mode=url&near=San%20Francisco%2C%20CA

That is compressed to 64 colors and still looks good.

@springmeyer commented:

Looks like webp is the real winner (if your browser supports it, of course):

64 color png / z1 1788ms 7.6mb
256 color png / z1 2010ms 11.92mb
256 color png / z6 3393ms 11.23mb
full color png / z1 2002ms 42.61mb
full color png / z6 3889ms 39.89mb
jpeg80 598ms 4.07mb
jpeg90 610ms 5.34mb
jpeg100 748ms 10.55mb
webp 1158ms 2.9mb
webp 1070ms 2.28mb

var mapnik = require('mapnik');
var bytes = require('bytes');
var fs = require('fs');

var image = new mapnik.Image.open('b4f4d900-f5f7-11e2-888e-2c2289c6edd0.jpg');

var tests = [
 { name:'64 color png / z1', ext: '.png', format:'png8:c=64:z=1' },
 { name:'256 color png / z1', ext: '.png', format:'png8:z=1' },
 { name:'256 color png / z6', ext: '.png', format:'png8:z=6' },
 { name:'full color png / z1', ext: '.png', format:'png32:z=1' },
 { name:'full color png / z6', ext: '.png', format:'png32:z=6' },
 { name:'jpeg80', ext: '.jpg', format:'jpeg80' },
 { name:'jpeg90', ext: '.jpg', format:'jpeg90' },
 { name:'jpeg100', ext: '.jpg', format:'jpeg100' },
 { name:'webp', ext: '.webp', format:'webp:image_hint=2:quality=75:method=0' },
 { name:'webp', ext: '.webp', format:'webp:image_hint=2:quality=55:method=0' },
]

tests.forEach(function(test) {
    var start = +new Date;
    var buffer = image.encodeSync(test.format);
    var elapsed = new Date-start;
    var size = bytes(buffer.length);
    fs.writeFileSync('out/'+test.format+test.ext,buffer);
    console.log(test.name, elapsed+'ms', size);
});

Download the webp quality:55 from here and open it in Chrome.

@benzmuircroft commented:

#562: webp is 13% of the size of a PNG; that's 87% space saved! If webp support is not provided by the user's browser of choice, the developer can default to a PNG fallback.

WebP is supported in Chrome, so Chrome users would benefit from smaller images and a faster browsing experience.

The difference in size is too great to ignore; a faster web experience, when available, is gold! Yet so much effort goes into supporting legacy browsers.

I could ask the client if it is Chrome and, if not, use image/png in node.

@adamhooper (Contributor) commented:

Isn't the obvious solution to make this library not encode image data?

In my dream world, toBuffer() returns a Buffer of native-endian ARGB32 pixel data: no encoding whatsoever. That would give users their choice of encoding libraries, at the cost of a single memcpy().

Side-benefits: node-canvas would be easier to maintain and wouldn't depend on libpng or libjpeg.

Obviously, that's a breaking change. During transition, toBuffer('argb32') would be backward-compatible....

@zbjornson (Collaborator) commented:

@adamhooper getImageData returns a Uint8ClampedArray of RGBA values, which sounds like what you're asking for, except for the color depth?

@adamhooper (Contributor) commented:

@zbjornson Ooh, didn't see that method.

It's similar, but not the same: the purpose is completely different. getImageData() is built to be compatible with HTML5; toBuffer() is built to encode. There should be a method whose purpose is to access raw pixel data.

(Furthermore, I opine that the method to access raw pixel data should be called toBuffer(), and I opine that the current toBuffer() should be called toPNGBuffer(). This suggestion is surely more controversial/problematic.)

getImageData() does two things that are not appropriate in this method:

  1. It rotates the values from RGBA to ARGB
  2. It premultiplies alpha values. This is lossy, and because of the RGBA->ARGB conversion, the whole method is slower than memcpy(), even for fully-opaque images.

Obviously, getImageData() must rotate and premultiply. Equally obviously, rotating and premultiplying are inappropriate for other image-encoding libraries.
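
A rough sketch of the per-pixel conversion in question, assuming cairo's premultiplied, native-endian ARGB32 storage (not node-canvas's actual code), illustrating why it costs more than a plain memcpy():

#include <stdint.h>
#include <stddef.h>

/* Convert premultiplied ARGB32 words (cairo's storage) into the straight
   RGBA bytes getImageData() returns. Un-premultiplying rounds, which is
   the lossiness discussed above. */
static void
argb32_to_rgba(const uint32_t *src, uint8_t *dst, size_t npixels)
{
  for (size_t i = 0; i < npixels; i++) {
    uint32_t px = src[i];
    uint8_t a = px >> 24;
    uint8_t r = (px >> 16) & 0xff;
    uint8_t g = (px >> 8) & 0xff;
    uint8_t b = px & 0xff;
    if (a != 0 && a != 255) { /* un-premultiply */
      r = (uint8_t)(r * 255 / a);
      g = (uint8_t)(g * 255 / a);
      b = (uint8_t)(b * 255 / a);
    }
    dst[4 * i + 0] = r;
    dst[4 * i + 1] = g;
    dst[4 * i + 2] = b;
    dst[4 * i + 3] = a;
  }
}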

I think there's room in the Node ecosystem to put pixel-manipulation and image-encoding in separate libraries. Maybe I'd like to do custom pixel-manipulation (like giving pixel (0,0) alpha 254, so Twitter doesn't convert my PNG to JPEG). Maybe I'd like to convert to indexed RGB. Maybe I'd like to use LodePNG, node-png (with truly custom parameters), JPEG (fully customized), WebP (#562), etc.

(As it so happens, I want to do all those things :))

There's no substitute for memcpy() in a canvas library. toBuffer() is far too slow; getImageData() is lossy (and still a bit slow).
