```julia
@show gradient(X -> loss(X, train_y), train_X)  # line 45 in 'example/gat.jl'
```

This line takes more than 200 s on CPU, but the forward pass takes less than 1 s on the same CPU.
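To narrow this down, it may help to check whether the time is spent on the first call only (for example on compilation) or on every call. Below is a minimal sketch using only Base Julia. The helper `time_twice` and the workload `stand_in` are hypothetical names introduced for illustration and are not part of `example/gat.jl`; the actual gradient call would be passed in place of the stand-in.

```julia
# Hypothetical helper (not from the original example): time the same call
# twice so that first-call overhead can be told apart from steady-state runtime.
function time_twice(f)
    first_call = @elapsed f()    # includes any compilation triggered by f
    second_call = @elapsed f()   # f has already been compiled at this point
    return (first = first_call, second = second_call)
end

# Stand-in workload so the sketch runs on its own. In the setting above one
# would instead pass
#   () -> gradient(X -> loss(X, train_y), train_X)
# and, for comparison, () -> loss(train_X, train_y).
stand_in(x) = sum(abs2, x)
x = rand(Float32, 1000)

timings = time_twice(() -> stand_in(x))
println("first call:  ", timings.first, " s")
println("second call: ", timings.second, " s")
```

If the second timing of the gradient call is much smaller than the first, the 200 s is mostly one-off overhead; if both are similar, the backward pass itself is slow.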