@ltdrdata
Member

Add a SavePreviewLatent node.

  • Uses .png as the container for the latent.
  • EXIF = latent tensor; PNG info = same as SaveImage, so the workflow can be loaded in the frontend.
  • image_opt is optional: if it is None, logo.png is used. logo.png is a temporary image for testing and must be replaced with a proper image.
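A stdlib-only sketch of the container layout, for illustration only: the PR itself uses Pillow with piexif, and the helper names here (`png_chunk`, `write_png_with_text`) are hypothetical. It writes a small gray RGB PNG whose `prompt` and `workflow` entries travel as `tEXt` chunks, which is assumed to match what SaveImage's PNG info produces. The latent payload (EXIF in this PR) is left out.

```python
import json
import struct
import zlib


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def write_png_with_text(width: int, height: int, text: dict) -> bytes:
    """Build a solid-gray 8-bit RGB PNG carrying `text` as tEXt chunks (sketch)."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB), compression/filter/interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline starts with filter byte 0 (None), followed by RGB triples.
    raw = b"".join(b"\x00" + b"\x80\x80\x80" * width for _ in range(height))
    chunks = [png_chunk(b"IHDR", ihdr)]
    for key, value in text.items():
        # tEXt payload: Latin-1 keyword, NUL separator, Latin-1 text.
        chunks.append(png_chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1")))
    chunks.append(png_chunk(b"IDAT", zlib.compress(raw)))
    chunks.append(png_chunk(b"IEND", b""))
    return b"\x89PNG\r\n\x1a\n" + b"".join(chunks)


png_bytes = write_png_with_text(
    8, 8,
    {"prompt": json.dumps({"1": {"class_type": "KSampler"}}), "workflow": json.dumps({"nodes": []})},
)
```

Pillow's `PngInfo.add_text` emits the same kind of chunk, so metadata written this way should be readable by the same code paths that read SaveImage output.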

LoadLatent

  • Added support for .latent.png
[Screenshots: preview, preview2]

@WASasquatch
Contributor

WASasquatch commented May 18, 2023

This absolutely should not use PNG. It's a massively wasteful format that, even with optimization, will only save you anywhere from nothing to 10%.

One of the main points here was to have something tiny that can be shared, not just a straight thumbnail because we're using a terrible format.

@levaleureux

If you want to keep the data without loss, you can use the .bmp format, which is an uncompressed raw format.
https://en.wikipedia.org/wiki/BMP_file_format

@ltdrdata
Member Author

ltdrdata commented May 19, 2023

The reason I chose PNG is not that it is lossless. From a preview perspective, PNG is clearly a disadvantage because it gives up the file-size advantage. Rather, it is simply because PNG is the current default image format in ComfyUI, which makes it easier to share the existing code. For example, loading a workflow from the PNG requires no additional code.
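Because the workflow lives in an ordinary `tEXt` chunk, any PNG-aware loader can recover it without new code paths. A minimal stdlib reader as a sketch: the function names are hypothetical, it ignores compressed `zTXt`/`iTXt` chunks, and it does not verify CRCs. ComfyUI's frontend is assumed to read the same `workflow` entry in its own JavaScript.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict:
    """Collect uncompressed tEXt entries from PNG bytes (no CRC checks; sketch)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, text = len(PNG_SIGNATURE), {}
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        if chunk_type == b"tEXt":
            body = data[pos + 8:pos + 8 + length]
            key, _, value = body.partition(b"\x00")
            text[key.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
    return text


def read_workflow(data: bytes):
    """Return the JSON-decoded "workflow" entry, or None if it is absent."""
    raw = read_png_text(data).get("workflow")
    return json.loads(raw) if raw is not None else None
```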

For now, this is close to a PoC implementation. It lets us evaluate the usability of both the plain safetensors format and its use inside an image container.

@ltdrdata
Member Author

ltdrdata commented May 19, 2023

I'm planning to improve it by applying the method, introduced last night, for decoding latents without a VAE. Instead of using a logo, I want to generate a basic thumbnail with that approach.
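The VAE-free preview boils down to a per-pixel linear projection of the latent channels onto RGB. A dependency-free sketch, assuming a 4-channel SD1.x-style latent stored as nested lists `[C][H][W]`. The factor matrix is an illustrative assumption, not taken from this PR; substitute the coefficients that match your model's latent space.

```python
from typing import List, Tuple

# Illustrative 4x3 matrix mapping latent channels to (R, G, B); an assumption,
# replace with the coefficients for your model.
LATENT_RGB_FACTORS = [
    [0.298, 0.207, 0.208],
    [0.187, 0.286, 0.173],
    [-0.158, 0.189, 0.264],
    [-0.184, -0.271, -0.473],
]


def latent_to_rgb(latent: List[List[List[float]]]) -> List[List[Tuple[int, int, int]]]:
    """Project a [C][H][W] latent to an [H][W] grid of 8-bit RGB tuples."""
    channels, height, width = len(latent), len(latent[0]), len(latent[0][0])
    if channels != len(LATENT_RGB_FACTORS):
        raise ValueError(f"expected {len(LATENT_RGB_FACTORS)} channels, got {channels}")
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            rgb = []
            for k in range(3):
                v = sum(latent[c][y][x] * LATENT_RGB_FACTORS[c][k] for c in range(channels))
                # Map roughly [-1, 1] onto [0, 255], clamping anything outside.
                rgb.append(max(0, min(255, int((v + 1.0) * 127.5))))
            row.append(tuple(rgb))
        image.append(row)
    return image
```

The output has the latent's spatial size, which for SD models is 1/8 of the decoded image, so it is tiny and usually needs an upscale before display.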

@WASasquatch
Contributor

WASasquatch commented May 19, 2023

I added PNG support to my class earlier, along with a form of PNG compression. Reducing the number of colors heavily affects PNG file size. It could be applied to latent-to-RGB previews for further optimization. However, latent-to-RGB previews look pretty bad, kind of worse than heavy JPEG compression. That's probably why nobody has thought about them for thumbnail compression.
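Why fewer colors shrink a PNG: PNG compresses pixel data with DEFLATE, and quantized data repeats more, so it compresses better. A rough stdlib illustration that uses raw `zlib` on synthetic noisy bytes as a stand-in for a real PNG encoder (no row filters, no palette); `quantize` is a hypothetical helper.

```python
import random
import zlib


def quantize(pixels: bytes, levels: int) -> bytes:
    """Snap each 8-bit channel value to the center of one of `levels` equal bins."""
    step = 256 // levels
    return bytes(min(255, (p // step) * step + step // 2) for p in pixels)


rng = random.Random(0)
# Synthetic 64x64 RGB "image": a horizontal gradient plus per-channel noise.
pixels = bytes(
    max(0, min(255, x * 4 + rng.randint(-20, 20)))
    for y in range(64) for x in range(64) for _ in range(3)
)
full_size = len(zlib.compress(pixels, 9))
reduced_size = len(zlib.compress(quantize(pixels, 8), 9))
print(full_size, reduced_size)
```

The exact ratio depends on the content; real previews with large flat areas behave differently from this noise-heavy example.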

https://github.com/WASasquatch/ComfyLatentImage

@ltdrdata
Member Author

Using latent-to-RGB as a preview is only meant as a convenience for people who don't intend to connect a VAE to visualize and store images. At the very least, it is much more useful than a meaningless logo.

I'm also considering adding a marker to the image indicating that it contains a latent, rather than simply providing it as a thumbnail.

@WASasquatch
Contributor

WASasquatch commented May 19, 2023

You mean connecting an optional image to store as the preview? That shouldn't need a VAE, and it is probably better than a placeholder image. My idea was an overlay of some basic information, plus branding such as a link to the repo for exposure, since A1111 dominates everything.

Also I hope you know you are talking to WAS, who proposed this original idea in chat.

You should also consider all the uses. The latent-to-RGB image is tiny, which means many viewers will upscale it to fit their minimum width/height containers, degrading it further. It should be a small, compressed image, but not a really bad one. Civitai, for example, does this to display images correctly within its template, and so do the large thumbnails in Windows 11, which famously blur upscales too.

@ltdrdata
Member Author

If the creator intentionally connects a decoded image, it will be used as the preview. If not, the preview is generated by simply pixelating the latent with latent-to-RGB.

Making the image optional serves two purposes. First, it avoids VAE decoding unless a preview is genuinely needed to identify intermediate results. Second, it lets the creator provide a high-quality preview when sharing with many users.

Also, are you suggesting that the pixelated latent-to-RGB preview be enhanced with post-processing such as blurring and upscaling?

@WASasquatch
Contributor

WASasquatch commented May 19, 2023

I think a small upscale (a simple resize) is all that's needed, just so other viewers don't apply their horrendous "optimized" upscalers, which make upscaled thumbnails blurry and JPEG-like.

apply optimize for png
attach format text for preview
@ltdrdata
Member Author

I applied the latent_to_rgb result as the default preview and added text at the bottom identifying the format as "ComfyUI LATENT". I also enabled the optimize option for a slight size reduction. For latent_to_rgb, I limited the preview size to the range of 128 to 512 to prevent pointlessly high-resolution or excessively small previews. When image_opt is provided intentionally, the user is responsible for resizing it, which allows high-resolution output.
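One way to express the 128 to 512 limit, as a hypothetical sketch: the comment doesn't say exactly how the range is applied, so this assumes the longer side is clamped into `[lo, hi]` while keeping the aspect ratio. Since SD latents are 1/8 of the pixel resolution, a 512x512 image gives a 64x64 latent-to-RGB output, which this rule scales up to 128x128.

```python
def preview_size(width: int, height: int, lo: int = 128, hi: int = 512) -> tuple:
    """Scale (width, height) so the longer side lies in [lo, hi], keeping aspect ratio."""
    longest = max(width, height)
    target = min(max(longest, lo), hi)
    scale = target / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```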
[Screenshot: default-preview]

@ltdrdata
Member Author

[Image: ComfyUI_00004_ latent]

Changed the upscale method to nearest-exact to make the raw-latent look more pronounced.
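Nearest-neighbor upscaling keeps each latent cell as a hard-edged block, which is what produces the "raw" look. A pure-Python sketch of the nearest-exact index rule as we understand it (map each output pixel's center back to the input and floor), applied to an `[H][W]` grid such as latent-to-RGB output. The function name is hypothetical; in PyTorch the equivalent would be `interpolate(..., mode="nearest-exact")`.

```python
def upscale_nearest_exact(image, out_w: int, out_h: int):
    """Nearest-neighbor resize of an [H][W] grid, sampling each output pixel's center."""
    in_h, in_w = len(image), len(image[0])

    def source_index(i: int, out_n: int, in_n: int) -> int:
        # Center of output pixel i in input coordinates, floored and clamped.
        return min(in_n - 1, int((i + 0.5) * in_n / out_n))

    return [
        [image[source_index(y, out_h, in_h)][source_index(x, out_w, in_w)] for x in range(out_w)]
        for y in range(out_h)
    ]
```

Usage: `upscale_nearest_exact([[1, 2], [3, 4]], 4, 4)` turns each source value into a 2x2 block.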

@morphles

I wasn't aware a latent could be visualized like that without a VAE; frankly, the image looks quite amazing.

@ltdrdata
Member Author

Same here. It takes only a very simple piece of code provided by Comfy.

@morphles

This somehow convinces me even more that my hi-res/multi-sampling idea is good :) I thought latents would be something more abstract and not directly convertible to pixel values, which would have made that multi-scale combining a weird thing.

@WASasquatch
Contributor

Would you mind just making this a plugin that hijacks latent saving in ComfyUI? Unfortunately, I don't think Comfy is interested.

Contributor

@dennwc dennwc left a comment

@ltdrdata Great idea! I'd really like this one merged eventually. Do you still plan to update the PR?

```python
@staticmethod
def save_to_file(tensor_bytes, prompt, extra_pnginfo, image, image_path):
    compressed_data = BytesIO()
    with zipfile.ZipFile(compressed_data, mode='w') as archive:
```
Contributor

Maybe gzip or zstd would be better? It's unlikely that it will contain other files, right?
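One detail worth checking: `zipfile.ZipFile` defaults to `compression=zipfile.ZIP_STORED`, so the excerpt as written stores the tensor bytes without compressing them. A sketch comparing that with explicit DEFLATE and with a single-stream `gzip` container, on stand-in bytes (zstd is omitted because it needs a third-party package):

```python
import gzip
import zipfile
from io import BytesIO

# Stand-in for serialized tensor bytes; real float latents compress far less.
tensor_bytes = bytes(range(256)) * 64

# Zip as in the excerpt: the default compression is ZIP_STORED (none).
zip_buf = BytesIO()
with zipfile.ZipFile(zip_buf, mode="w") as archive:
    archive.writestr("latent", tensor_bytes)
zip_stored = zip_buf.getvalue()

# Zip with DEFLATE explicitly enabled.
zip_buf = BytesIO()
with zipfile.ZipFile(zip_buf, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("latent", tensor_bytes)
zip_deflated = zip_buf.getvalue()

# gzip: one DEFLATE stream with a small header/trailer and no file directory.
# Level 6 matches zipfile's default DEFLATE level, so only container overhead differs.
gz = gzip.compress(tensor_bytes, compresslevel=6)

assert gzip.decompress(gz) == tensor_bytes
print(len(zip_stored), len(zip_deflated), len(gz))
```

For a single blob, gzip avoids zip's per-file headers and central directory; zip only pays off if more files are added later.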

```python
with zipfile.ZipFile(compressed_data, mode='w') as archive:
    archive.writestr("latent", tensor_bytes)
image = image.copy()
exif_data = {"Exif": {piexif.ExifIFD.UserComment: compressed_data.getvalue()}}
```
Contributor

As an alternative to EXIF, it's also possible to write arbitrary data after the end of a PNG.

Can't find a good link at the moment, but the TL;DR is: a PNG decoder must stop after the IEND chunk, so any data after it will not be read.

Thus, you could take a PNG-encoded image and append a safetensors file to it. The decoder is also simple: PNG chunks encode their length, so it just skips all of them until IEND and then reads the rest as a safetensors file. I could write a PoC encoder/decoder, if you want.
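A minimal sketch of the append-after-IEND idea (helper names are hypothetical; this is not code from the PR or from the reviewer). The encoder checks that the PNG ends with IEND and appends the payload; the decoder walks chunks by their length fields and returns everything after IEND. Most viewers should ignore the trailing bytes, though tools that re-save images may drop them.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# IEND is always: zero length, type "IEND", fixed CRC 0xAE426082.
IEND_CHUNK = b"\x00\x00\x00\x00IEND\xaeB`\x82"


def append_payload(png: bytes, payload: bytes) -> bytes:
    """Append arbitrary bytes (e.g. a safetensors file) after a PNG's IEND chunk."""
    if not png.endswith(IEND_CHUNK):
        raise ValueError("PNG does not end with an IEND chunk")
    return png + payload


def split_payload(data: bytes) -> tuple:
    """Walk PNG chunks by their length fields; return (png_bytes, trailing_bytes)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 12 + length  # length field + type + data + CRC
        if chunk_type == b"IEND":
            return data[:pos], data[pos:]
    raise ValueError("no IEND chunk found")
```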

Member Author

Is there any advantage over using EXIF?

Contributor

It will likely have less encoding overhead.

Also, in theory, you'll still get the benefits of safetensors: the file can still be memory-mapped, since it's just appended at the end. You only need to adjust the offset for the tensor data. Latents are pretty small, though, so I doubt it will be used that way.
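To make the offset point concrete: a safetensors file starts with an 8-byte little-endian header length N, followed by N bytes of JSON whose `data_offsets` are relative to the start of the data section. If the file sits after a PNG, only a base offset changes. A stdlib-only sketch, where the helper name is hypothetical, `base` is assumed to be the byte position right after IEND, and the "PNG" prefix is a bare signature plus IEND rather than a real image:

```python
import json
import mmap
import os
import struct
import tempfile


def read_safetensors_at(buf, base: int) -> dict:
    """Parse a safetensors payload starting at byte `base` of `buf` (bytes or mmap).

    Header data_offsets are relative to the data section, so they are shifted
    by base + 8 + header_len. Returns {name: (dtype, shape, start, end)}.
    """
    (header_len,) = struct.unpack("<Q", buf[base:base + 8])
    header = json.loads(bytes(buf[base + 8:base + 8 + header_len]))
    data_start = base + 8 + header_len
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional string map, not a tensor
            continue
        begin, end = info["data_offsets"]
        tensors[name] = (info["dtype"], info["shape"], data_start + begin, data_start + end)
    return tensors


# Hand-built safetensors payload: one float32 tensor of shape [2].
tensor_data = struct.pack("<2f", 1.0, 2.0)
header_json = json.dumps(
    {"latent": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(tensor_data)]}}
).encode("utf-8")
payload = struct.pack("<Q", len(header_json)) + header_json + tensor_data

# Stand-in PNG: signature plus IEND only; a real file would carry image chunks.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00\x00\x00\x00IEND\xaeB`\x82"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.latent.png")
    with open(path, "wb") as f:
        f.write(fake_png + payload)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        dtype, shape, start, end = read_safetensors_at(mm, base=len(fake_png))["latent"]
        values = struct.unpack("<2f", mm[start:end])
```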


5 participants