Why call self.attention.forward

I am learning llama, where self.attention.forward is explicitly called. But normally, we don’t write code that explicitly calls forward, but why would llama authors explicitly call it here? Is it just because of habit?

Explicitly called here: h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask)

```
class TransformerBlock(nn.Module):
    def __init__(self, layer_id: int, args: ModelArgs):
        super().__init__()
        self.n_heads = args.n_heads
        self.dim = args.dim
        self.head_dim = args.dim // args.n_heads
        self.attention = Attention(args)
        self.feed_forward = FeedForward(
            dim=args.dim,
            hidden_dim=4 * args.dim,
            multiple_of=args.multiple_of,
            ffn_dim_multiplier=args.ffn_dim_multiplier,
        )
        self.layer_id = layer_id
        self.attention_norm = RMSNorm(args.dim, eps=args.norm_eps)
        self.ffn_norm = RMSNorm(args.dim, eps=args.norm_eps)

    def forward(
        self,
        x: torch.Tensor,
        start_pos: int,
        freqs_cis: torch.Tensor,
        mask: Optional[torch.Tensor],
    ):
        h = x + self.attention.forward(
            self.attention_norm(x), start_pos, freqs_cis, mask
        )
        out = h + self.feed_forward.forward(self.ffn_norm(h))
        return out
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Why call self.attention.forward #1055

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Why call self.attention.forward #1055

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions