Support for Multi-GPU #30
Conversation
Multi-GPU support added with Gemini-2.5; checked working.
      adaln_input: Optional[torch.FloatTensor] = None,
      rope: torch.FloatTensor = None,
-ate ) -> torch.FloatTensor:
+ ) -> Tuple[torch.FloatTensor, torch.FloatTensor]:  # Use Tuple from typing
these comments look like they were scattered around by an LLM and aren't needed.
      adaln_input,
      rope,
-  )
+  ):  # Removed return type hint
why remove the return type hint?
  if is_training:
-     x = einops.rearrange(x, 'B S (p1 p2 C) -> B C S (p1 p2)', p1=self.config.patch_size, p2=self.config.patch_size)
+     x = einops.rearrange(x, 'B S (p1 p2 C) -> B C S (p1 p2)', p1=patch_size, p2=patch_size)
+     H = self.config.max_resolution[0] // patch_size
+     W = self.config.max_resolution[1] // patch_size
+     if x.shape[2] == H * W:
+         x = einops.rearrange(x, 'B C (H W) (p1 p2) -> B C (H p1) (W p2)', H=H, W=W, p1=patch_size, p2=patch_size)
+     else:
+         logger.warning(f"Training unpatchify mismatch: S={x.shape[2]} vs H*W={H*W}")
  else:
      x_arr = []
      for i, img_size in enumerate(img_sizes):
          pH, pW = img_size
-         x_arr.append(
-             einops.rearrange(x[i, :pH*pW].reshape(1, pH, pW, -1), 'B H W (p1 p2 C) -> B C (H p1) (W p2)',
-                              p1=self.config.patch_size, p2=self.config.patch_size)
-         )
+         num_patches = pH * pW
+         current_x = x[i, :num_patches]
+         rearranged_x = einops.rearrange(current_x.unsqueeze(0), 'b (H W) (p1 p2 C) -> b C (H p1) (W p2)',
+                                         H=pH, W=pW, p1=patch_size, p2=patch_size)
+         x_arr.append(rearranged_x)
+     if not x_arr:
+         return torch.empty((0, self.out_channels, 0, 0), device=x.device, dtype=x.dtype)
what are these changes for?
+     encoder_hidden_states: torch.Tensor = None,  # Keep original annotation
      pooled_embeds: torch.Tensor = None,
-     img_sizes: Optional[List[Tuple[int, int]]] = None,
-     img_ids: Optional[torch.Tensor] = None,
+     img_sizes: Optional[List[Tuple[int, int]]] = None,  # Keep original annotation
+     img_ids: Optional[torch.Tensor] = None,  # Keep original annotation
LLM comment spam
      )

      # spatial forward
-     logger.warning("...")
where's the warning now?
it feels like a lot of unnecessary changes.
small spaces
unexpected indent fix
bug in name fixed; duplicate of row fixed
OK, sorry - I thought those changes were superficial; they were made by an LLM prompted to change only the multi-GPU problems. Anyway, I have now checked all the changes manually: 4 places make sure all tensors are on the same device before the cat operation - 3 in the model, 1 in the pipeline - and nothing else was touched. Checked working. Do I need to open a new PR, or are the changes added to this one automatically?
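For anyone reading along, here is a minimal sketch of the kind of device-alignment fix described above (the helper name is hypothetical and not taken from the patch, which reportedly applies the pattern inline at the four call sites):

```python
import torch

def cat_on_common_device(a: torch.Tensor, b: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # When the transformer is sharded across GPUs, two intermediate tensors
    # can end up on different devices, and torch.cat raises a device-mismatch
    # error. Moving one operand onto the other's device right before
    # concatenating avoids that.
    if b.device != a.device:
        b = b.to(a.device)
    return torch.cat([a, b], dim=dim)
```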
it's not my project, i just follow its development and port fixes to my own project(s). this is probably fine to leave open, but i don't think the model authors are very responsive. what tends to happen to these repos for new models is pull requests simply pile up and never get merged.
Anyway, I just wanted to share with the community the option to run on several GPUs, and you were right that some changes didn't make sense... My branch is now clean of strange LLM artifacts and supports multi-GPU - I guess I am not the only one who will need it - this model is huge.
Multi-GPU support added by modifying mainly transformer_hidream_image.py.
Some PyTorch usage in the file (e.g. the cat function) assumed the whole model sits on a single GPU; this patch solves the issue by copying the relevant intermediate results to the same GPU before applying torch.cat.
The initial distribution of the models across the different GPUs uses the accelerate library with an automatic device map.
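As a rough illustration of that loading step (a sketch only: the block class name passed to no_split_module_classes is an assumption, and the actual patch may simply pass an "auto" device map to from_pretrained instead):

```python
import torch
from accelerate import dispatch_model, infer_auto_device_map

def shard_across_gpus(transformer: torch.nn.Module) -> torch.nn.Module:
    # Compute a per-submodule device map that spreads the model over the
    # visible GPUs (filling them in order), then dispatch it; accelerate
    # inserts hooks that move each submodule's inputs to its device.
    device_map = infer_auto_device_map(
        transformer,
        no_split_module_classes=["HiDreamImageBlock"],  # assumed block class name
        dtype=torch.bfloat16,
    )
    return dispatch_model(transformer, device_map=device_map)
```

Even with those hooks, intermediate tensors produced on different devices inside the forward pass still need the explicit .to(...) alignment described above before torch.cat.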