
[bug] reduce_aux_losses_tracker_across_ranks all_reduce bug with num_layers==pp stage #2418

@lk137095576

Description


In `reduce_aux_losses_tracker_across_ranks`:

```python
def reduce_aux_losses_tracker_across_ranks(track_names: Optional[List[str]] = None):
    ...
    torch.distributed.all_reduce(
        values, group=parallel_state.get_pipeline_model_parallel_group()
    )
```

When the number of pipeline stages equals the number of transformer layers (one layer per stage, e.g. `num_layers=4`, `pp=4`), the `all_reduce` call fails at runtime. With `num_layers=4` and `pp=2` it runs correctly.
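
For context, a minimal standalone sketch of the failure class this points to (not the actual Megatron-LM call path): `all_reduce` is only well-formed when the tensor has the same shape on every rank in the group, so if each pipeline stage's tracker tensor were sized by its local layer count rather than the global one, the shapes could diverge across ranks. The per-rank sizes below, and the diagnosis itself, are assumptions for illustration only.

```python
# Hypothetical repro of the failure class (2 ranks, gloo backend).
# If each rank sizes its "tracker" tensor differently, the collective
# is ill-formed: it errors or produces undefined results, depending
# on the backend. Sizing every rank at the global layer count (the
# commented-out line) keeps the shapes equal and the reduce valid.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Assumption for illustration: each stage allocates by a *local*
    # layer count, so tensor shapes differ across ranks.
    values = torch.zeros(rank + 1)
    # values = torch.zeros(4)  # global num_layers: shapes match, reduce works

    dist.all_reduce(values)  # mismatched shapes -> ill-formed collective
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```

Under that assumption, allocating the tracker tensor at the global layer count on every pipeline rank would keep the collective well-formed; whether this matches the actual root cause here would need confirmation from the maintainers.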
