### Is there an existing issue for this?

- [X] I have searched the existing issues

### Current Behavior

I am following the official Transformers tutorial to speed up inference: https://huggingface.co/docs/transformers/llm_optims?static-kv=basic+usage%3A+generation_config

```python
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
```

### Expected Behavior

_No response_

### Steps To Reproduce

```python
with torch.no_grad():
    vector_outputs = model(
        **seq,
        output_hidden_states=True,
        return_dict=True
    )
```

The error is raised as soon as `model()` is called; execution fails immediately without ever reaching the next layer.

### Environment

```markdown
- OS:
- Python: 3.8.20
- Transformers: 4.30.2
- PyTorch: 2.0.1
- CUDA Support: True
```

### Anything else?

_No response_