Merged
94 commits
0e3c818
Vae added and matched flux checkpoint
Victor49152 Sep 4, 2024
8c9c56f
Flux model added.
Victor49152 Sep 18, 2024
9a304dc
Copying FlowMatchEulerScheduler over
Victor49152 Sep 19, 2024
73c714d
WIP: Start to test the pipeline forward pass
Victor49152 Sep 24, 2024
f4d7747
Vae added and matched flux checkpoint
Victor49152 Sep 4, 2024
6e4de91
Inference pipeline runs with offloading function
Victor49152 Sep 27, 2024
2cb67f2
Start to test image generation
Victor49152 Oct 1, 2024
c18cf60
Decoding with VAE part has been verified. Still need to check the den…
Victor49152 Oct 2, 2024
072ce16
The inference pipeline is verified.
Victor49152 Oct 3, 2024
b4d281f
Add arg parsers and refactoring
Victor49152 Oct 3, 2024
7d27534
Tested on multi batch sizes and prompts.
Victor49152 Oct 4, 2024
6d2da09
Add headers
Victor49152 Oct 4, 2024
db43ec7
Apply isort and black reformatting
Victor49152 Oct 4, 2024
597a646
Renaming
Victor49152 Oct 4, 2024
d2bfbc3
Merge remote-tracking branch 'origin/mingyuanm/diffusion' into mingyu…
Victor49152 Oct 4, 2024
7894f2c
Merge branch 'refs/heads/main' into mingyuanm/diffusion
Victor49152 Oct 14, 2024
6fb7433
Move scheduler to sampler folder
Victor49152 Oct 14, 2024
f4cf498
Merging folders.
Victor49152 Oct 14, 2024
756b8ee
Apply isort and black reformatting
Victor49152 Oct 14, 2024
73e2099
Tested after path changing.
Victor49152 Oct 14, 2024
aec7a13
Apply isort and black reformatting
Victor49152 Oct 14, 2024
15db8ad
Move MMDIT block to NeMo
Victor49152 Oct 14, 2024
6801903
Apply isort and black reformatting
Victor49152 Oct 14, 2024
7d34b30
Add joint attention and single attention to NeMo
Victor49152 Oct 15, 2024
2bf20e1
Apply isort and black reformatting
Victor49152 Oct 15, 2024
d78c682
Joint attention updated
Victor49152 Oct 16, 2024
fbd6987
Apply isort and black reformatting
Victor49152 Oct 16, 2024
aa9df2a
Remove redundant importing
Victor49152 Oct 16, 2024
ae18bb6
Refactor to inherit megatron module
Victor49152 Oct 17, 2024
94b1a3d
Adding mockdata
Victor49152 Oct 21, 2024
15761a4
DDP training works
Victor49152 Oct 25, 2024
83456df
Added flux controlnet training components while not tested yet
Victor49152 Oct 30, 2024
e0de704
Flux training with DDP tested on 1 GPU
Victor49152 Nov 1, 2024
3b62be0
Flux and controlnet now could train on precached mode.
Victor49152 Nov 5, 2024
b8044db
Custom FSDP path added to megatron parallel.
Victor49152 Nov 12, 2024
a57162b
Bug fix
Victor49152 Nov 13, 2024
fe0f705
A hacky way to wrap frozen flux into FSDP to reproduce illegal memory…
Victor49152 Nov 13, 2024
0237c05
Typo
Victor49152 Nov 13, 2024
cb8cd6e
Bypass the no grad issue when no single layers exists
Victor49152 Nov 13, 2024
d48a60e
Merge branch 'refs/heads/main' into mingyuanm/flux_controlnet
Victor49152 Nov 14, 2024
e2fb592
A hacky way to wrap frozen flux into FSDP to reproduce illegal memory…
Victor49152 Nov 13, 2024
31b849c
Merge remote-tracking branch 'origin/mingyuanm/fsdp_debugging' into m…
Victor49152 Nov 14, 2024
0226a88
Let the flux model's dtype autocast before FSDP wrapping
shjwudp Nov 14, 2024
4ed3a6d
fix RuntimeError: "Output 0 of SliceBackward0 is a view and is being …
shjwudp Nov 14, 2024
d1a28bb
Add a wrapper to flux controlnet so they are all wrapped into FSDP au…
Victor49152 Nov 15, 2024
47ca7e5
Get rid of concat op in flux single transformer
Victor49152 Nov 20, 2024
3ff2c1b
Get rid of concat op in flux single transformer
Victor49152 Nov 20, 2024
de62607
Merge remote-tracking branch 'origin/mingyuanm/single_transformer_tp_…
Victor49152 Nov 20, 2024
8786981
single block attention.linear_proj.bias must not require grads after …
Victor49152 Nov 20, 2024
47fbedc
use cpu initialization to avoid OOM
Victor49152 Nov 25, 2024
8b051d4
Set up flux training script with tp
Victor49152 Dec 5, 2024
733f5fe
SDXL fid image generation script updated.
Victor49152 Dec 9, 2024
950b362
Mcore self attention API changed
Victor49152 Dec 10, 2024
229ece4
Add a dummy task encoder for raw image inputs
Victor49152 Dec 10, 2024
f8f31df
Support loading crudedataset via energon dataloader
Victor49152 Dec 12, 2024
389c14e
Default save last to True
Victor49152 Dec 12, 2024
0ddca3e
Add controlnet inference pipeline
Victor49152 Dec 13, 2024
615dbea
Add controlnet inference script
Victor49152 Dec 13, 2024
b5ea320
Image resize mode update
Victor49152 Dec 13, 2024
bdb8155
Remove unnecessary bias to avoid sharding issue.
Victor49152 Dec 13, 2024
78eed47
Handle MCore custom fsdp checkpoint load (#11621)
shjwudp Dec 17, 2024
f94f142
Checkpoint naming
Victor49152 Dec 17, 2024
31a6bae
Image logger WIP
Victor49152 Dec 17, 2024
ba0d84e
Image logger works fine
Victor49152 Dec 17, 2024
d78a329
save hint and output to image logger.
Victor49152 Dec 19, 2024
e50cbfd
Update flux controlnet training step
Victor49152 Dec 20, 2024
9c26721
Add model connector and try to load from dist ckpt but failed.
Victor49152 Dec 31, 2024
1026c0a
Renaming and refactoring submodel configs for nemo run compatibility
Victor49152 Jan 6, 2025
2dfab9e
Nemo run script works for basic testing recipe
Victor49152 Jan 6, 2025
0d874f5
Added tp2 training factory
Victor49152 Jan 7, 2025
ec3154e
Added convergence recipe
Victor49152 Jan 8, 2025
0aab7dc
Added flux training scripts
Victor49152 Jan 8, 2025
5053c41
Inference script tested
Victor49152 Jan 8, 2025
0d73d69
Controlnet inference script tested
Victor49152 Jan 8, 2025
b016614
Moving scripts to correct folder and modify headers
Victor49152 Jan 8, 2025
848bd27
Apply isort and black reformatting
Victor49152 Jan 8, 2025
6129c82
Doc strings update
Victor49152 Jan 9, 2025
790abdd
Apply isort and black reformatting
Victor49152 Jan 9, 2025
90a20e8
pylint correction
Victor49152 Jan 9, 2025
36bcf5c
Apply isort and black reformatting
Victor49152 Jan 9, 2025
36644c8
Merge branch 'refs/heads/main' into mingyuanm/flux_controlnet
Victor49152 Jan 9, 2025
1ec2c16
Add import guard since custom fsdp is not merged to mcore yet
Victor49152 Jan 9, 2025
6c2cbeb
Add copy right headers and correct code check
Victor49152 Jan 9, 2025
20a52e3
Apply isort and black reformatting
Victor49152 Jan 9, 2025
3dcdfe6
Merge branch 'main' into mingyuanm/flux_controlnet
Victor49152 Jan 10, 2025
acb7f54
Code Scan
Victor49152 Jan 13, 2025
9049b6c
Merge branch 'main' into mingyuanm/flux_controlnet
Victor49152 Jan 13, 2025
230d00d
Merge branch 'main' into mingyuanm/flux_controlnet
Victor49152 Jan 15, 2025
f25405c
Minor fix
Victor49152 Jan 15, 2025
896ac09
Merge branch 'refs/heads/main' into mingyuanm/flux_controlnet
Victor49152 Jan 16, 2025
0a1592b
Merge branch 'main' into mingyuanm/flux_controlnet
Victor49152 Jan 17, 2025
3952667
Merge branch 'main' into mingyuanm/flux_controlnet
Victor49152 Jan 17, 2025
45fbb1e
Merge branch 'main' into mingyuanm/flux_controlnet
Victor49152 Jan 18, 2025
e7a151c
Update megatron fsdp guard for importing errors
Victor49152 Jan 21, 2025
@@ -2,25 +2,15 @@ name: stable-diffusion-train

fid:
classifier_free_guidance:
- 1.5
- 2
- 3
- 4
- 5
- 6
- 7
- 8
-nnodes_per_cfg: 1
+nnodes_per_cfg: 2
ntasks_per_node: 8
local_task_id: null
num_images_to_eval: 30000
-coco_captions_path: /coco2014/coco2014_val_sampled_30k/captions
-coco_images_path: /coco2014/coco2014_val/images_256
+coco_captions_path: /datasets/coco2014/coco2014_val_sampled_30k/captions
+coco_images_path: /datasets/coco2014/coco2014_val/images_256
save_path: output

model:
restore_from_path:
is_legacy: False

use_refiner: False
use_fp16: False # use fp16 model weights
@@ -88,8 +78,128 @@ sampling:
order: 4

trainer:
-devices: ${evaluation.fid.ntasks_per_node}
+devices: ${fid.ntasks_per_node}
num_nodes: 1
accelerator: gpu
precision: 32
logger: False # logger provided by exp_manager


model:
restore_from_path: null
is_legacy: False
scale_factor: 0.13025
disable_first_stage_autocast: True

fsdp: False
fsdp_set_buffer_dtype: null
fsdp_sharding_strategy: 'full'
use_cpu_initialization: True
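The `scale_factor: 0.13025` above is the standard SDXL VAE latent scaling constant. As a hedged sketch (the function names below are hypothetical, not the NeMo API), it is conventionally applied symmetrically around the first-stage autoencoder:

```python
# Hypothetical sketch: how an SDXL-style scale_factor is conventionally
# applied around the first-stage VAE so latents have roughly unit variance.
SCALE_FACTOR = 0.13025  # value from the config above

def encode_to_latent(vae_encode, image, scale_factor=SCALE_FACTOR):
    """Encode to latent space and apply the scaling."""
    return vae_encode(image) * scale_factor

def decode_from_latent(vae_decode, latent, scale_factor=SCALE_FACTOR):
    """Undo the scaling before decoding back to pixel space."""
    return vae_decode(latent / scale_factor)
```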

optim:
name: fused_adam
lr: 1e-4
weight_decay: 0.0
betas:
- 0.9
- 0.999
sched:
name: WarmupHoldPolicy
warmup_steps: 10
hold_steps: 10000000000000 # Incredibly large value to hold the lr as constant
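The schedule above ramps the learning rate linearly for 10 steps and then, because `hold_steps` is set absurdly large, holds it constant for the rest of training. A minimal sketch of that shape (assuming a simple linear warmup, not the exact NeMo `WarmupHoldPolicy` internals):

```python
# Sketch of a warmup-then-hold LR schedule matching the config above:
# linear warmup for `warmup_steps`, then a constant base LR (the hold
# phase effectively never ends given hold_steps: 10000000000000).
def warmup_hold_lr(step, base_lr=1e-4, warmup_steps=10):
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear ramp
    return base_lr  # held constant thereafter
```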

denoiser_config:
_target_: nemo.collections.multimodal.modules.stable_diffusion.diffusionmodules.denoiser.DiscreteDenoiser
num_idx: 1000

weighting_config:
_target_: nemo.collections.multimodal.modules.stable_diffusion.diffusionmodules.denoiser_weighting.EpsWeighting
scaling_config:
_target_: nemo.collections.multimodal.modules.stable_diffusion.diffusionmodules.denoiser_scaling.EpsScaling
discretization_config:
_target_: nemo.collections.multimodal.modules.stable_diffusion.diffusionmodules.discretizer.LegacyDDPMDiscretization
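`EpsScaling` corresponds to the classic epsilon-prediction parameterization. A minimal sketch of the scaling coefficients, assuming the conventions of the reference Stability `generative-models` denoiser (not copied from the NeMo class itself):

```python
import math

# Assumed eps-prediction scaling coefficients (reference sgm convention):
# denoised = c_skip * x + c_out * model(c_in * x, c_noise)
def eps_scaling(sigma):
    c_skip = 1.0
    c_out = -sigma
    c_in = 1.0 / math.sqrt(sigma ** 2 + 1.0)
    c_noise = sigma
    return c_skip, c_out, c_in, c_noise
```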

unet_config:
_target_: nemo.collections.multimodal.modules.stable_diffusion.diffusionmodules.openaimodel.UNetModel
from_pretrained:
from_NeMo: True
adm_in_channels: 2816
num_classes: sequential
use_checkpoint: False
in_channels: 4
out_channels: 4
model_channels: 320
attention_resolutions: [ 4, 2 ]
num_res_blocks: 2
channel_mult: [ 1, 2, 4 ]
num_head_channels: 64
use_spatial_transformer: True
use_linear_in_transformer: True
transformer_depth: [ 1, 2, 10 ] # note: the first is unused (due to attn_res starting at 2) 32, 16, 8 --> 64, 32, 16
context_dim: 2048
image_size: 64 # unused
# spatial_transformer_attn_type: softmax #note: only default softmax is supported now
legacy: False
use_flash_attention: False
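The comment on `transformer_depth` follows from how `attention_resolutions` is interpreted: level `i` of the UNet runs at downsample factor `2**i`, and only levels whose factor appears in `attention_resolutions` get spatial transformers. A sketch with a hypothetical helper (not a NeMo function):

```python
# Hypothetical helper: which UNet levels get spatial transformers.
# Level i runs at downsample factor 2**i; attention applies when that
# factor is listed in attention_resolutions, which is why the first
# transformer_depth entry (level 0, factor 1) is unused here.
def attention_levels(channel_mult, attention_resolutions):
    return [2 ** i in attention_resolutions for i in range(len(channel_mult))]
```

With `channel_mult: [1, 2, 4]` and `attention_resolutions: [4, 2]`, only the second and third levels carry attention, matching the "first is unused" note.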

first_stage_config:
# _target_: nemo.collections.multimodal.models.stable_diffusion.ldm.autoencoder.AutoencoderKLInferenceWrapper
_target_: nemo.collections.multimodal.models.text_to_image.stable_diffusion.ldm.autoencoder.AutoencoderKLInferenceWrapper
from_pretrained:
from_NeMo: False
embed_dim: 4
monitor: val/rec_loss
ddconfig:
attn_type: vanilla
double_z: true
z_channels: 4
resolution: 256
in_channels: 3
out_ch: 3
ch: 128
ch_mult: [ 1, 2, 4, 4 ]
num_res_blocks: 2
attn_resolutions: [ ]
dropout: 0.0
lossconfig:
target: torch.nn.Identity
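The `ddconfig` above implies the latent geometry: the encoder downsamples once per `ch_mult` entry after the first, so spatial size shrinks by `2 ** (len(ch_mult) - 1)` and the latent carries `z_channels` channels. A sketch of that arithmetic (hypothetical helper, not the autoencoder API):

```python
# Sketch: latent geometry implied by ddconfig (ch_mult: [1,2,4,4],
# z_channels: 4). Three downsampling stages give a total factor of 8.
def latent_shape(height, width, ch_mult=(1, 2, 4, 4), z_channels=4):
    f = 2 ** (len(ch_mult) - 1)  # total spatial downsampling factor
    return (z_channels, height // f, width // f)
```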

conditioner_config:
_target_: nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.GeneralConditioner
emb_models:
# crossattn cond
- is_trainable: False
input_key: txt
emb_model:
_target_: nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.FrozenCLIPEmbedder
layer: hidden
layer_idx: 11
# crossattn and vector cond
- is_trainable: False
input_key: txt
emb_model:
_target_: nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.FrozenOpenCLIPEmbedder2
arch: ViT-bigG-14
version: laion2b_s39b_b160k
freeze: True
layer: penultimate
always_return_pooled: True
legacy: False
# vector cond
- is_trainable: False
input_key: original_size_as_tuple
emb_model:
_target_: nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.ConcatTimestepEmbedderND
outdim: 256 # multiplied by two
# vector cond
- is_trainable: False
input_key: crop_coords_top_left
emb_model:
_target_: nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.ConcatTimestepEmbedderND
outdim: 256 # multiplied by two
# vector cond
- is_trainable: False
input_key: target_size_as_tuple
emb_model:
_target_: nemo.collections.multimodal.modules.stable_diffusion.encoders.modules.ConcatTimestepEmbedderND
outdim: 256 # multiplied by two
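The conditioner list explains the UNet's `adm_in_channels: 2816` above: each of the three `ConcatTimestepEmbedderND` vector conditioners emits `outdim * 2 = 512` features (the "multiplied by two" comments, one embedding per coordinate of the size/crop tuples), and these concatenate with the pooled text embedding. Assuming the pooled ViT-bigG-14 embedding width is 1280, the arithmetic checks out:

```python
# Sanity check of adm_in_channels (assumption: pooled ViT-bigG-14
# embedding width is 1280; each size/crop conditioner emits 256 * 2).
pooled_text_dim = 1280
vector_cond_dims = [256 * 2] * 3  # original_size, crop_coords, target_size
adm_in_channels = pooled_text_dim + sum(vector_cond_dims)  # 2816
```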
@@ -70,7 +70,7 @@ model:
scale_factor: 0.13025
disable_first_stage_autocast: True
is_legacy: False
-restore_from_path: ""
+restore_from_path: null

fsdp: False
fsdp_set_buffer_dtype: null
@@ -26,8 +26,9 @@
from nemo.core.config import hydra_runner


-@hydra_runner(config_path='conf/stable_diffusion/conf', config_name='sd_xl_fid_images')
+@hydra_runner(config_path='conf', config_name='sd_xl_fid_images')
def main(cfg):
+    # pylint: disable=C0116
# Read configuration parameters
nnodes_per_cfg = cfg.fid.nnodes_per_cfg
ntasks_per_node = cfg.fid.ntasks_per_node
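The script reads `nnodes_per_cfg` alongside the list of classifier-free-guidance values, which suggests nodes are partitioned into groups of `nnodes_per_cfg`, one group per CFG scale. A hedged sketch of that mapping (hypothetical function, not code from this PR):

```python
# Hypothetical sketch of how nodes could be mapped to CFG scales:
# each consecutive group of nnodes_per_cfg nodes evaluates one
# classifier_free_guidance value from the config.
def cfg_for_node(node_id, cfg_values, nnodes_per_cfg):
    return cfg_values[node_id // nnodes_per_cfg]
```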