Feature/save preview latent #672
base: master
|
Absolutely should not use PNG. It's a massively wasteful format that, even with optimization, will only save you anywhere from nothing to 10%. One of the main points here was something that is tiny and can be shared, not just a straight thumbnail stuck in a terrible format.
|
If you want to keep the data with no loss, you can use the .bmp format as an uncompressed raw format.
The reason I chose PNG is not because it is lossless. From a preview perspective, lossless is clearly a disadvantage because it gives up the file-size benefits. I chose it simply because PNG is the default image format in the current ComfyUI, which makes it easier to share the codebase. For example, no additional code is needed for tasks like loading a workflow from a PNG. For now, this can be seen as an implementation that is close to a PoC. It lets us evaluate the usability of both the safetensors format on its own and its use inside an image container.
|
I'm planning to improve it using the method for decoding without a VAE that was demonstrated last night. Instead of using a logo, I want to generate a basic thumbnail with this approach.
|
I added PNG support to my class earlier, and also a form of PNG compression. Loss of colors heavily impacts PNG file size. It could be applied to latent-to-RGB previews for further optimization. However, the RGB previews of latents are pretty bad looking. Kinda worse than heavy JPEG compression. Probably why it hasn't been thought about for thumbnailing compression. https://github.com/WASasquatch/ComfyLatentImage
|
The consideration of using latent to RGB as a preview is solely intended as a convenient method for individuals who have no intention of connecting VAE for image visualization and storage. It would be much more useful than a meaningless logo, at the very least. Furthermore, I am contemplating the idea of incorporating a marker indicating the presence of "latent" into the image, rather than simply providing it as a thumbnail. |
|
You mean connecting an optional image to store as the preview? Shouldn't need a VAE there. That is probably better than a placeholder image. My idea was an overlay of some basic information, as well as branding like a link to the repo for exposure, since a1111 dominates all. Also, I hope you know you are talking to WAS, who proposed this original idea in chat. You should also consider all the uses. The latent-to-RGB image is tiny, which means a lot of viewers will be upscaling it to fit within their minimum width/height containers, leading to further degraded viewing. It should be a small image, and compressed, but also not just really bad. Civitai does this, for example, to display images within their template correctly. Large image thumbnails in Windows 11 famously blur when upscaled, too.
If the creator intentionally connects a decoded image and provides it, then it will be used as the preview. If it's not provided, then the intention is to generate a preview by simply pixelating it using latent to RGB. The reason for making the provision of the image optional is twofold. Firstly, it allows avoiding VAE decoding unless a preview is genuinely needed for identification purposes during the intermediate process. Secondly, it enables the creator to provide high-quality previews if they desire to do so for the purpose of sharing with a large number of users. And is the suggestion to enhance the pixelated image generated by latent to RGB for the preview by applying post-processing techniques such as blur and upscale? |
I think a small upscale (simple resize) is all that's needed. Just so other viewers don't apply their horrendous "optimized" upscalers that just make their upscale from thumbnails blurry and jpegy. |
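A minimal sketch of the "simple resize" suggested here, assuming the preview is held as a plain list of pixel rows. The function name `upscale_nearest` is hypothetical and not part of the PR; it repeats each pixel `factor` times in both directions (nearest-neighbor), which keeps the blocky latent preview sharp instead of letting a viewer blur it.

```python
def upscale_nearest(pixels, factor):
    """Nearest-neighbor upscale of a 2D grid of pixels by an integer factor.

    pixels: list of rows, each row a list of pixel values (any type).
    factor: positive integer scale applied to both width and height.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in pixels:
        # Repeat every pixel horizontally.
        wide = [p for p in row for _ in range(factor)]
        # Repeat the widened row vertically (copy so rows stay independent).
        out.extend(list(wide) for _ in range(factor))
    return out
```

For a typical 64x64 latent preview, a factor of 4 would give a 256x256 thumbnail.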
apply optimize for png attach format text for preview
|
I was not aware a latent could be decoded like that without a VAE. Frankly, the image looks quite amazing.
Same here. It was possible with a very simple piece of code provided by Comfy.
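For readers unfamiliar with the trick, a rough sketch of decoding a latent to RGB without a VAE is shown here. It assumes a 4-channel latent and a fixed linear projection per pixel. The coefficients in `LATENT_RGB_FACTORS` are approximate, illustrative values for SD1.x-style latents and are an assumption, not the code from this PR or from Comfy; the function name `latent_to_rgb` is likewise hypothetical.

```python
# Assumed 4x3 projection matrix (latent channel -> R, G, B).
# Illustrative approximation only; not taken from this PR.
LATENT_RGB_FACTORS = [
    [0.298, 0.207, 0.208],     # latent channel 0
    [0.187, 0.286, 0.173],     # latent channel 1
    [-0.158, 0.189, 0.264],    # latent channel 2
    [-0.184, -0.271, -0.473],  # latent channel 3
]


def latent_to_rgb(latent):
    """Project a latent shaped (4, H, W), given as nested lists of floats,
    to an H x W grid of (r, g, b) tuples with values in 0..255."""
    channels = len(latent)
    height = len(latent[0])
    width = len(latent[0][0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = []
            for out in range(3):
                value = sum(
                    latent[c][y][x] * LATENT_RGB_FACTORS[c][out]
                    for c in range(channels)
                )
                # Map roughly [-1, 1] to [0, 255], then clamp.
                byte = int((value + 1.0) / 2.0 * 255.0)
                pixel.append(max(0, min(255, byte)))
            row.append(tuple(pixel))
        image.append(row)
    return image
```

Each output pixel covers one latent cell, so the result is 1/8 of the final image size per side for SD1.x-style models, which is why the preview looks pixelated.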
|
This somehow just convinces me even more that my hi-res/multi-sampling idea is good :) I thought latents would be something more abstract, and not directly convertible to pixel values, thus making that multi-scale combine some weird-ass thing.
|
Mind just making this a plugin that hijacks latent saving in ComfyUI? I don't think Comfy is interested unfortunately. |
dennwc
left a comment
@ltdrdata Great idea! I'd really like this one merged eventually. Do you still plan to update the PR?
    @staticmethod
    def save_to_file(tensor_bytes, prompt, extra_pnginfo, image, image_path):
        compressed_data = BytesIO()
        with zipfile.ZipFile(compressed_data, mode='w') as archive:
Maybe gzip or zstd would be better? It's unlikely that it will contain other files, right?
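To make the comparison concrete, here is a small standard-library sketch of both containers for a single blob. The helper names (`pack_zip`, `pack_gzip`, `unpack_gzip`) are hypothetical. Note that `zipfile.ZipFile` defaults to `ZIP_STORED` when no `compression` argument is passed, as in the quoted call, so the bytes would be stored uncompressed; the sketch passes `ZIP_DEFLATED` explicitly. zstd is left out because it is not assumed to be available in the standard library here.

```python
import gzip
import zipfile
from io import BytesIO


def pack_zip(tensor_bytes):
    """Wrap the bytes in a one-entry zip archive using DEFLATE."""
    buf = BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("latent", tensor_bytes)
    # The archive must be closed before reading, so the central directory is written.
    return buf.getvalue()


def pack_gzip(tensor_bytes):
    """Compress the bytes as a single gzip stream (no file table)."""
    return gzip.compress(tensor_bytes)


def unpack_gzip(blob):
    """Inverse of pack_gzip."""
    return gzip.decompress(blob)
```

For a single payload, the gzip stream avoids the zip local header, central directory, and end-of-archive record, so it should come out somewhat smaller for the same input.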
        with zipfile.ZipFile(compressed_data, mode='w') as archive:
            archive.writestr("latent", tensor_bytes)
        image = image.copy()
        exif_data = {"Exif": {piexif.ExifIFD.UserComment: compressed_data.getvalue()}}
As an alternative to EXIF, it's also possible to write arbitrary data after the end of a PNG.
Can't find a good link at the moment, but TL;DR is: PNG decoder must stop after seeing IEND chunk, any data after that will not be read.
Thus, you could take a PNG encoded image and append a safetensors file to it. Decoder is also simple - PNG chunks encode the length, so it will just skip all of them until IEND and then will read the rest as a safetensors file. I could write a PoC encoder/decoder, if you want.
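A rough sketch of the encoder/decoder described in this comment, using only the standard library. The function names are hypothetical and this is not the PoC offered above. The decoder walks the PNG chunks (4-byte big-endian length, 4-byte type, data, 4-byte CRC) until it passes `IEND`, then treats everything after it as the payload. `tiny_png` only exists to produce a valid 1x1 image for demonstration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def append_payload(png_bytes, payload):
    """Encoder: the payload simply follows the complete PNG stream."""
    return png_bytes + payload


def split_png_and_payload(data):
    """Decoder: skip chunks until IEND, return (png_bytes, trailing_payload)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4  # chunk header + chunk data + CRC
        if chunk_type == b"IEND":
            return data[:pos], data[pos:]
    raise ValueError("IEND chunk not found")


def _chunk(chunk_type, body):
    """Build one PNG chunk with its CRC."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


def tiny_png():
    """Minimal 1x1 8-bit grayscale PNG, for demonstration only."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
    return PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")
```

One caveat with this approach: tools that re-encode the image will generally not carry the trailing bytes over, so the payload only survives byte-for-byte copies of the file.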
Is there any advantage over using EXIF?
It will likely have less encoding overhead.
Also, in theory, you'll still get the benefits of safetensors - the file can still be memory-mapped, since it's just added at the end. You just need to adjust the offset for tensor data. Although latents are pretty small, so I doubt it will be used that way.
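The offset adjustment mentioned here can be sketched as follows, assuming the standard safetensors layout: an 8-byte little-endian header length, a JSON header, then the raw tensor data, with each tensor's `data_offsets` relative to the start of that data section. The function name `tensor_byte_range` and the `base_offset` parameter (where the appended safetensors blob begins inside the combined file) are hypothetical. The buffer can be `bytes` or an `mmap.mmap` object.

```python
import json
import struct


def tensor_byte_range(buf, base_offset, name):
    """Return the absolute (start, end) byte range of tensor `name` inside `buf`,
    where a safetensors blob begins at `base_offset` (e.g. right after IEND)."""
    # First 8 bytes of the blob: JSON header length, little-endian u64.
    (header_len,) = struct.unpack_from("<Q", buf, base_offset)
    header_start = base_offset + 8
    header = json.loads(bytes(buf[header_start:header_start + header_len]))
    begin, end = header[name]["data_offsets"]
    # Offsets in the header are relative to the start of the data section.
    data_start = header_start + header_len
    return data_start + begin, data_start + end
```

With a memory-mapped file, slicing `buf[start:end]` then reads only the tensor's bytes, without loading the image portion or the rest of the file.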


add SavePreviewLatent node.
LoadLatent