Conversation

@simsim314

Multi-GPU support added, mainly by modifying transformer_hidream_image.py.

Some PyTorch usage in the file (e.g. the cat function) assumed the whole model sits on a single GPU; this patch solves the issue by moving the relevant intermediate results to the same GPU before applying torch.cat.
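
For illustration, a minimal sketch of the kind of device alignment this does before concatenation (the helper name and signature are hypothetical, not the actual code in transformer_hidream_image.py):

```python
import torch

def cat_same_device(a: torch.Tensor, b: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # When the model is sharded across GPUs, a and b may live on different
    # devices; torch.cat requires all inputs on the same device.
    if b.device != a.device:
        b = b.to(a.device)
    return torch.cat([a, b], dim=dim)
```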

The initial distribution of the model across the GPUs uses the accelerate library's "auto" device mapping.
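
A minimal sketch of that "auto" placement using accelerate's standard utilities (the toy model below is a stand-in for the actual HiDream transformer, which the patch loads instead):

```python
import torch.nn as nn
from accelerate import dispatch_model, infer_auto_device_map

# Toy stand-in for the real transformer; assumption for illustration only.
model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])

# Infer a per-module device map over the visible GPUs, then dispatch each
# module to its assigned device (falls back to CPU if no GPU is available).
device_map = infer_auto_device_map(model)
model = dispatch_model(model, device_map=device_map)
```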

```diff
      adaln_input: Optional[torch.FloatTensor] = None,
      rope: torch.FloatTensor = None,
- ) -> torch.FloatTensor:
+ ) -> Tuple[torch.FloatTensor, torch.FloatTensor]: # Use Tuple from typing
```

these comments look like they were scattered around by an LLM and aren't needed.

```diff
      adaln_input,
      rope,
- )
+ ): # Removed return type hint
```

why remove the return type hint?

Comment on lines 322 to 340
```diff
  if is_training:
-     x = einops.rearrange(x, 'B S (p1 p2 C) -> B C S (p1 p2)', p1=self.config.patch_size, p2=self.config.patch_size)
+     x = einops.rearrange(x, 'B S (p1 p2 C) -> B C S (p1 p2)', p1=patch_size, p2=patch_size)
+     H = self.config.max_resolution[0] // patch_size
+     W = self.config.max_resolution[1] // patch_size
+     if x.shape[2] == H * W:
+         x = einops.rearrange(x, 'B C (H W) (p1 p2) -> B C (H p1) (W p2)', H=H, W=W, p1=patch_size, p2=patch_size)
+     else:
+         logger.warning(f"Training unpatchify mismatch: S={x.shape[2]} vs H*W={H*W}")
  else:
      x_arr = []
      for i, img_size in enumerate(img_sizes):
          pH, pW = img_size
-         x_arr.append(
-             einops.rearrange(x[i, :pH*pW].reshape(1, pH, pW, -1), 'B H W (p1 p2 C) -> B C (H p1) (W p2)',
-                              p1=self.config.patch_size, p2=self.config.patch_size)
-         )
+         num_patches = pH * pW
+         current_x = x[i, :num_patches]
+         rearranged_x = einops.rearrange(current_x.unsqueeze(0), 'b (H W) (p1 p2 C) -> b C (H p1) (W p2)',
+                                         H=pH, W=pW, p1=patch_size, p2=patch_size)
+         x_arr.append(rearranged_x)
+     if not x_arr:
+         return torch.empty((0, self.out_channels, 0, 0), device=x.device, dtype=x.dtype)
```

what are these changes for?

Comment on lines 370 to 373
```diff
+     encoder_hidden_states: torch.Tensor = None, # Keep original annotation
      pooled_embeds: torch.Tensor = None,
-     img_sizes: Optional[List[Tuple[int, int]]] = None,
-     img_ids: Optional[torch.Tensor] = None,
+     img_sizes: Optional[List[Tuple[int, int]]] = None, # Keep original annotation
+     img_ids: Optional[torch.Tensor] = None, # Keep original annotation
```

LLM comment spam

```diff
      )

      # spatial forward
-     logger.warning("...")
```

where's the warning now?

@bghira

bghira commented Apr 13, 2025

it feels like a lot of unnecessary changes.

@simsim314
Author

> it feels like a lot of unnecessary changes.

OK, sorry, I thought those changes were superficial - they were done by an LLM prompted to change only the multi-GPU problems around torch.cat. I guess those LLMs are still pretty stupid :)

Anyway, I have now checked all the changes manually - 4 places ensure all tensors are on the same device before the cat operation (3 in the model, 1 in the pipeline); nothing else was touched.

Verified the full model works on 4x RTX 3090 with my RunPod Jupyter script. It clones my branch, so don't use it as-is, but otherwise it's a good example of how to run the model.

Do I need to open a new PR, or are the changes picked up automatically in this one?

@bghira

bghira commented Apr 13, 2025

it's not my project, i just follow its development and port fixes to my own project(s). this is probably fine to leave open, but i don't think the model authors are very responsive. what tends to happen with these repos for new models is that pull requests simply pile up and never get merged.

@simsim314
Author

Anyway, I just wanted to share with the community the option to run on several GPUs, and you were right, some changes didn't make sense...

My branch is now clean of strange LLM artifacts and supports multi-GPU. I guess I'm not the only one who will need it - this model is huge.
