-
Notifications
You must be signed in to change notification settings - Fork 9.8k
Closed
Description
I am learning llama, where self.attention.forward is explicitly called. But normally, we don’t write code that explicitly calls forward, but why would llama authors explicitly call it here? Is it just because of habit?
Explicitly called here: h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask)
class TransformerBlock(nn.Module):
def __init__(self, layer_id: int, args: ModelArgs):
super().__init__()
self.n_heads = args.n_heads
self.dim = args.dim
self.head_dim = args.dim // args.n_heads
self.attention = Attention(args)
self.feed_forward = FeedForward(
dim=args.dim,
hidden_dim=4 * args.dim,
multiple_of=args.multiple_of,
ffn_dim_multiplier=args.ffn_dim_multiplier,
)
self.layer_id = layer_id
self.attention_norm = RMSNorm(args.dim, eps=args.norm_eps)
self.ffn_norm = RMSNorm(args.dim, eps=args.norm_eps)
def forward(
self,
x: torch.Tensor,
start_pos: int,
freqs_cis: torch.Tensor,
mask: Optional[torch.Tensor],
):
h = x + self.attention.forward(
self.attention_norm(x), start_pos, freqs_cis, mask
)
out = h + self.feed_forward.forward(self.ffn_norm(h))
return out
Metadata
Metadata
Assignees
Labels
No labels