RotaryEmbedding applied to the incorrect channel dimension #841

@sagadre

Description

🐛 Bug

Input tensors to attention must be in the format [B, M, H, K], where B is the batch size, M the sequence length, H the number of heads, and K the embedding size per head, as documented here.

Hence positional embeddings (e.g., rotary embeddings) should be applied along the sequence dimension, dim=1. However, in the RotaryEmbedding class, seq_dimension=-2 is passed, which for these 4-dimensional inputs corresponds to dim=2, i.e. the head dimension H, as seen here.

def forward(
        self, q: torch.Tensor, k: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        self._cos_cached, self._sin_cached = self._update_cos_sin_tables(
            k, seq_dimension=-2  # should be seq_dimension=1, or the argument should be omitted since the default value is correct
        )

        return (
            apply_rotary_pos_emb(q, self._cos_cached, self._sin_cached),
            apply_rotary_pos_emb(k, self._cos_cached, self._sin_cached),
        )
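
For concreteness, here is a minimal sketch of the dimension mismatch (the shapes below are made up for illustration and are not taken from xformers):

import torch

B, M, H, K = 2, 128, 8, 64        # hypothetical: batch, sequence length, heads, head dim
k = torch.randn(B, M, H, K)

print(k.shape[-2])                # 8   -> number of heads H, what seq_dimension=-2 selects
print(k.shape[1])                 # 128 -> sequence length M, what the cos/sin tables should be sized by

With the current argument, the cached cos/sin tables are presumably sized by H instead of M, which matches the incorrect rotary behavior observed downstream.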

Additional context

Thanks to @jmercat, who found symptoms of this problem downstream of xformers!
