
Commit f584680

Your Name authored and AMEERAZAM08 committed
photodoodel added
1 parent c934720 commit f584680

File tree

5 files changed, +737 −0 lines changed


src/diffusers/pipelines/__init__.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -30,6 +30,7 @@
     "ledits_pp": [],
     "marigold": [],
     "pag": [],
+    "photodoodle": [],
     "stable_diffusion": [],
     "stable_diffusion_xl": [],
 }
@@ -53,6 +54,7 @@
     _import_structure["ddpm"] = ["DDPMPipeline"]
     _import_structure["dit"] = ["DiTPipeline"]
     _import_structure["latent_diffusion"].extend(["LDMSuperResolutionPipeline"])
+    _import_structure["photodoodle"].extend(["PhotoDoodlePipeline"])
     _import_structure["pipeline_utils"] = [
         "AudioPipelineOutput",
         "DiffusionPipeline",
@@ -286,6 +288,7 @@
     _import_structure["mochi"] = ["MochiPipeline"]
     _import_structure["musicldm"] = ["MusicLDMPipeline"]
     _import_structure["omnigen"] = ["OmniGenPipeline"]
+    _import_structure["photodoodle"].extend(["PhotoDoodlePipeline"])
     _import_structure["visualcloze"] = ["VisualClozePipeline", "VisualClozeGenerationPipeline"]
     _import_structure["paint_by_example"] = ["PaintByExamplePipeline"]
     _import_structure["pia"] = ["PIAPipeline"]
@@ -492,6 +495,7 @@
         from .deprecated import KarrasVePipeline, LDMPipeline, PNDMPipeline, RePaintPipeline, ScoreSdeVePipeline
         from .dit import DiTPipeline
         from .latent_diffusion import LDMSuperResolutionPipeline
+        from .photodoodle import PhotoDoodlePipeline
         from .pipeline_utils import (
             AudioPipelineOutput,
             DiffusionPipeline,
```
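With `photodoodle` registered in `_import_structure` and re-exported in the eager-import branch, the pipeline class resolves through the package's lazy loader. A minimal sketch of what this enables, assuming diffusers is installed from a source tree that contains this commit:

```python
# Minimal check that the registration above resolves; assumes a source
# install of diffusers that contains this commit.
from diffusers.pipelines import PhotoDoodlePipeline

print(PhotoDoodlePipeline.__module__)  # diffusers.pipelines.photodoodle.pipeline_photodoodle
```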
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@

# PhotoDoodle Pipeline

The PhotoDoodle pipeline generates images from a text prompt together with a conditioning image, using combined text and image conditioning to produce high-quality results.
## Model Architecture

The pipeline uses the following components (a short component-inspection sketch follows this list):

1. **Transformer**: A FluxTransformer2DModel for denoising image latents
2. **VAE**: An AutoencoderKL for encoding/decoding images
3. **Text Encoders**:
   - CLIP text encoder for initial text embedding
   - T5 encoder for additional text understanding
4. **Scheduler**: FlowMatchEulerDiscreteScheduler for the diffusion process
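As a quick orientation, the loaded pipeline exposes these parts as attributes. The sketch below assumes the Flux-style attribute names (`transformer`, `vae`, `text_encoder`, `text_encoder_2`, `scheduler`); they are not verified against this commit's pipeline code:

```python
# Sketch: inspect the components of a loaded PhotoDoodlePipeline.
# Attribute names assume the usual Flux-style layout.
import torch
from diffusers import PhotoDoodlePipeline

pipe = PhotoDoodlePipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)

for name in ("transformer", "vae", "text_encoder", "text_encoder_2", "scheduler"):
    component = getattr(pipe, name, None)
    print(f"{name}: {type(component).__name__ if component is not None else 'missing'}")
```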
## Usage

```python
import torch
from diffusers import PhotoDoodlePipeline
from diffusers.utils import load_image  # needed for load_image() below

pipeline = PhotoDoodlePipeline.from_pretrained("black-forest-labs/FLUX.1-dev")
pipeline = pipeline.to("cuda")

# Load and fuse the pre-trained base LoRA weights
pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="pretrain.safetensors")
pipeline.fuse_lora()
pipeline.unload_lora_weights()

# Load the effect-specific LoRA weights
pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="sksmagiceffects.safetensors")

# Generate an image from a text prompt and a condition image
prompt = "add a halo and wings for the cat by sksmagiceffects"
condition_image = load_image("path/to/condition.jpg")  # PIL Image
output = pipeline(
    prompt=prompt,
    condition_image=condition_image,
    num_inference_steps=28,
    guidance_scale=3.5,
)

# Save the generated image
output.images[0].save("generated_image.png")
```
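The two-step loading above (fuse the `pretrain.safetensors` base LoRA, then load an effect LoRA) means switching effects only requires swapping the second LoRA. A hedged sketch; the alternative weight name is illustrative and should be checked against the files actually published in `nicolaus-huang/PhotoDoodle`:

```python
# Sketch: switch to a different doodle effect without reloading the base model.
# "sksmonalisa.safetensors" is an illustrative file name; confirm it exists in
# the nicolaus-huang/PhotoDoodle repository before relying on it.
pipeline.unload_lora_weights()  # drop the current effect LoRA; the fused base weights remain
pipeline.load_lora_weights("nicolaus-huang/PhotoDoodle", weight_name="sksmonalisa.safetensors")
```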
## Parameters

- `prompt`: Text prompt for image generation
- `prompt_2`: Optional secondary prompt for the T5 encoder
- `condition_image`: Input image for conditioning
- `height`: Output image height (default: 512)
- `width`: Output image width (default: 512)
- `num_inference_steps`: Number of denoising steps (default: 28)
- `guidance_scale`: Classifier-free guidance scale (default: 3.5)
- `num_images_per_prompt`: Number of images to generate per prompt
- `generator`: Random number generator for reproducibility (see the sketch after this list)
- `output_type`: Output format ("pil", "latent", or "pt")
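To make runs repeatable and to generate several variants of one prompt in a single call, combine `generator`, `num_images_per_prompt`, and `output_type`. A minimal sketch, reusing `pipeline` and `condition_image` from the usage example above:

```python
# Sketch: reproducible batch generation; reuses `pipeline` and `condition_image`
# from the usage example above.
import torch

generator = torch.Generator(device="cuda").manual_seed(42)  # fixed seed for repeatable outputs

output = pipeline(
    prompt="add a halo and wings for the cat by sksmagiceffects",
    condition_image=condition_image,
    height=512,
    width=512,
    num_inference_steps=28,
    guidance_scale=3.5,
    num_images_per_prompt=2,  # two variants of the same prompt
    generator=generator,
    output_type="pil",  # return PIL images
)

for i, image in enumerate(output.images):
    image.save(f"doodle_{i}.png")
```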
## Features

- Dual text encoder architecture (CLIP + T5)
- Image conditioning support
- Position encoding for better spatial understanding
- Support for LoRA fine-tuning
- VAE slicing and tiling for memory efficiency (see the sketch after this list)
- Progress bar during generation
- Callback support for step-by-step monitoring
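The memory and monitoring features listed above follow the standard diffusers pipeline interface; the method and argument names in this sketch (`enable_vae_slicing`, `enable_vae_tiling`, `callback_on_step_end`) are assumed from that convention rather than confirmed against this pipeline's implementation:

```python
# Sketch: reduce VAE memory use and watch denoising progress.
# Method/argument names assume the standard diffusers interface.
pipeline.enable_vae_slicing()  # decode the batch one image at a time
pipeline.enable_vae_tiling()   # decode large images tile by tile

def log_step(pipe, step, timestep, callback_kwargs):
    # Called after each denoising step; must return the (possibly modified) kwargs.
    print(f"step {step} at timestep {timestep}")
    return callback_kwargs

output = pipeline(
    prompt="add a halo and wings for the cat by sksmagiceffects",
    condition_image=condition_image,
    num_inference_steps=28,
    callback_on_step_end=log_step,
)
```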
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@

```python
"""
PhotoDoodle pipeline for image generation.
"""

from typing import TYPE_CHECKING

from ...utils import (
    DIFFUSERS_SLOW_IMPORT,
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_torch_available,
    is_transformers_available,
)

_dummy_objects = {}
_import_structure = {
    "pipeline_photodoodle": ["PhotoDoodlePipeline"],
}

try:
    if not (is_torch_available() and is_transformers_available()):
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    from ...utils.dummy_torch_and_transformers_objects import *  # noqa F403
else:
    from .pipeline_photodoodle import PhotoDoodlePipeline

if TYPE_CHECKING:
    from .pipeline_photodoodle import PhotoDoodlePipeline

else:
    import sys

    sys.modules[__name__] = _LazyModule(
        __name__,
        globals()["__file__"],
        _import_structure,
        module_spec=__spec__,
    )
```
