When I trained your model on our dataset, the loss became NaN over time. Do you know why?