Description
I was trying to compile the Hugging Face Llama 2 model using the following code:
import os
import torch
import torch_tensorrt
import torch.backends.cudnn as cudnn
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch._dynamo as dynamo
from optimum.onnxruntime import ORTModelForCausalLM
base_model = 'llama-2-7b'
comp_method = 'magnitude_unstructured'
comp_degree = 0.2
model_path = f'vita-group/{base_model}_{comp_method}'
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    revision=f's{comp_degree}',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto")
model.save_pretrained("model_ckpt/")
model.eval()
# setting
# torch._dynamo.config.suppress_errors = True
enabled_precisions = {torch.float, torch.int, torch.long}
debug = False
workspace_size = 20 << 30
min_block_size = 7
torch_executed_ops = {}
compilation_kwargs = {
    "enabled_precisions": enabled_precisions,
    "debug": debug,
    "workspace_size": workspace_size,
    "min_block_size": min_block_size,
    "torch_executed_ops": torch_executed_ops,
}
with torch.no_grad():
    optimized_model = torch.compile(
        model.generate,
        backend="torch_tensorrt",
        dynamic=True,
        options=compilation_kwargs,
    )
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.cuda()
#outputs = model.generate(input_ids, max_new_tokens=128)
outputs = optimized_model(input_ids, max_new_tokens=128)
And here is the complete log:
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:0 supported operations detected in subgraph containing 0 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:0 supported operations detected in subgraph containing 0 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:0 supported operations detected in subgraph containing 0 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:0 supported operations detected in subgraph containing 0 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:0 supported operations detected in subgraph containing 0 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:1 supported operations detected in subgraph containing 2 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
/usr/local/lib/python3.10/dist-packages/torch/overrides.py:111: UserWarning: 'has_cuda' is deprecated, please use 'torch.backends.cuda.is_built()'
torch.has_cuda,
/usr/local/lib/python3.10/dist-packages/torch/overrides.py:112: UserWarning: 'has_cudnn' is deprecated, please use 'torch.backends.cudnn.is_available()'
torch.has_cudnn,
/usr/local/lib/python3.10/dist-packages/torch/overrides.py:118: UserWarning: 'has_mps' is deprecated, please use 'torch.backends.mps.is_built()'
torch.has_mps,
/usr/local/lib/python3.10/dist-packages/torch/overrides.py:119: UserWarning: 'has_mkldnn' is deprecated, please use 'torch.backends.mkldnn.is_available()'
torch.has_mkldnn,
Traceback (most recent call last):
File "/workspace/workspace/scripts/vita/test.py", line 59, in <module>
outputs = optimized_model(input_ids, max_new_tokens=128)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 333, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1408, in generate
self._validate_model_class()
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1415, in <resume in generate>
new_generation_config = GenerationConfig.from_model_config(self.config)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1426, in <resume in generate>
generation_config = copy.deepcopy(generation_config)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1429, in <resume in generate>
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1429, in <resume in generate>
self._validate_model_kwargs(model_kwargs.copy())
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1432, in <resume in generate>
logits_processor = logits_processor if logits_processor is not None else LogitsProcessorList()
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1433, in <resume in generate>
stopping_criteria = stopping_criteria if stopping_criteria is not None else StoppingCriteriaList()
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1602, in <resume in generate>
return self.greedy_search(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2404, in greedy_search
eos_token_id_tensor = torch.tensor(eos_token_id).to(input_ids.device) if eos_token_id is not None else None
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2450, in <resume in greedy_search>
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 493, in catch_errors
return callback(frame, cache_size, hooks, frame_state)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 624, in _convert_frame
result = inner_convert(frame, cache_size, hooks, frame_state)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 132, in _fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 370, in _convert_frame_assert
return _compile(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 554, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 180, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 465, in compile_inner
out_code = transform_code_object(code, transform)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/convert_frame.py", line 432, in transform
tracer.run()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2071, in run
super().run()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 724, in run
and self.step()
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 688, in step
getattr(self, inst.opname)(inst)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/symbolic_convert.py", line 2159, in RETURN_VALUE
self.output.compile_subgraph(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 836, in compile_subgraph
self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
File "/usr/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 936, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 180, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 992, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/output_graph.py", line 988, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 1586, in __call__
return self.compiler_fn(model_, inputs_, **self.kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/dynamo/backend/backends.py", line 36, in torch_tensorrt_backend
compiled_mod: torch.nn.Module = DEFAULT_BACKEND(gm, sample_inputs, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/dynamo/backend/backends.py", line 55, in aot_torch_tensorrt_aten_backend
return aot_module_simplified(
File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 3795, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/utils.py", line 180, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 3333, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 2120, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 2300, in aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 1574, in aot_dispatch_base
compiled_fw = compiler(fw_module, flat_args)
File "/usr/local/lib/python3.10/dist-packages/torch/_functorch/aot_autograd.py", line 1492, in f
out_f = compiler(fx_g, inps)
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/dynamo/backend/backends.py", line 80, in _pretraced_backend
trt_compiled = compile_module(
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/dynamo/compile.py", line 220, in compile_module
trt_mod = convert_module(
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/dynamo/conversion/conversion.py", line 40, in convert_module
Input.from_tensors(inputs, disable_memory_format_check=True),
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 376, in from_tensors
return [
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 377, in <listcomp>
cls.from_tensor(t, disable_memory_format_check=disable_memory_format_check)
File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
return cls(shape=t.shape, dtype=t.dtype, format=frmt)
torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
AttributeError: 'SymInt' object has no attribute 'shape'
Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
If I add torch._dynamo.config.suppress_errors = True, it shows the following messages:
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT forward /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 772
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] return cls(shape=t.shape, dtype=t.dtype, format=frmt)
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT forward /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 614
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] return cls(shape=t.shape, dtype=t.dtype, format=frmt)
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:04,803] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT _prepare_decoder_attention_mask /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 591
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] return cls(shape=t.shape, dtype=t.dtype, format=frmt)
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:05,499] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT _make_causal_mask /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 43
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] return cls(shape=t.shape, dtype=t.dtype, format=frmt)
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:05,658] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT _expand_mask /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 61
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] return cls(shape=t.shape, dtype=t.dtype, format=frmt)
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:05,856] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.019075
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:09.090515
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 360710144 bytes of Memory
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT forward /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 396
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/dynamo/conversion/conversion.py", line 37, in <listcomp>
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] output_dtypes = [output.dtype for output in module_outputs]
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] AttributeError: 'int' object has no attribute 'dtype'
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT forward /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 292
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/dynamo/conversion/conversion.py", line 37, in <listcomp>
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] output_dtypes = [output.dtype for output in module_outputs]
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'dtype'
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:25,133] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:6 supported operations detected in subgraph containing 6 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT apply_rotary_pos_emb /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 180
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] return cls(shape=t.shape, dtype=t.dtype, format=frmt)
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:26,538] torch._dynamo.convert_frame: [WARNING]
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:5 supported operations detected in subgraph containing 6 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
WARNING:torch_tensorrt.dynamo.compile:0 supported operations detected in subgraph containing 0 computational nodes. Skipping this subgraph, since min_block_size was detected to be 7
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
INFO:torch_tensorrt.dynamo.utils:Using Default Torch-TRT Runtime (as requested by user)
INFO:torch_tensorrt.dynamo.utils:Compilation Settings: CompilationSettings(precision=torch.float32, debug=False, workspace_size=21474836480, min_block_size=7, torch_executed_ops={}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=True, enable_experimental_decompositions=False)
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT forward /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 202
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] Traceback (most recent call last):
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] File "/usr/local/lib/python3.10/dist-packages/torch_tensorrt/_Input.py", line 357, in from_tensor
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] return cls(shape=t.shape, dtype=t.dtype, format=frmt)
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING]
[2023-10-02 01:31:27,967] torch._dynamo.convert_frame: [WARNING]
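The repeated warnings above can be condensed to one entry per skipped frame. The following is a minimal sketch, assuming the log keeps the "WON'T CONVERT <function> <path> line <N>" layout shown above. The helper name summarize_skips and the two regular expressions are hypothetical, written only for this report, and the sample contains just two of the skipped frames.

```python
import re
from collections import Counter

# Matches e.g. "WON'T CONVERT forward /path/modeling_llama.py line 772"
SKIP_RE = re.compile(r"WON'T CONVERT (\S+) (\S+) line (\d+)")
# Matches the final exception line, e.g. "[WARNING] AttributeError: ..."
ERROR_RE = re.compile(r"\[WARNING\] (\w+Error: .+)$")


def summarize_skips(log_text):
    """Return (function, line_number, error) for each frame Dynamo skipped."""
    skipped = []
    current = None
    for line in log_text.splitlines():
        skip = SKIP_RE.search(line)
        if skip:
            # Remember the frame until its error line shows up.
            current = (skip.group(1), int(skip.group(3)))
            continue
        error = ERROR_RE.search(line)
        if error and current is not None:
            skipped.append((current[0], current[1], error.group(1).strip()))
            current = None
    return skipped


sample = """\
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT forward /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 772
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:30:11,458] torch._dynamo.convert_frame: [WARNING] AttributeError: 'SymInt' object has no attribute 'shape'
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] WON'T CONVERT forward /usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py line 396
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] due to:
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] torch._dynamo.exc.BackendCompilerFailed: backend='torch_tensorrt' raised:
[2023-10-02 01:31:23,793] torch._dynamo.convert_frame: [WARNING] AttributeError: 'int' object has no attribute 'dtype'
"""

skips = summarize_skips(sample)
for function, line_number, error in skips:
    print(f"{function} (modeling_llama.py line {line_number}): {error}")

# Count how often each distinct error message occurs.
error_counts = Counter(error for _, _, error in skips)
```

Running the same helper over the full log should list every skipped function in modeling_llama.py together with the error that caused the skip.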
I'd like to know which specific line of code causes this problem and what the error message means. The error AttributeError: 'SymInt' object has no attribute 'shape' appeared throughout the compilation process, and most forward functions were skipped as a result, which significantly reduces the performance gain from compilation. The error seems to be related to dynamic shapes, and SymInt possibly represents a symbolic variable, but I'm not sure of the specifics.
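My current reading of the traceback, which is an assumption rather than a confirmed diagnosis: the failing line in torch_tensorrt/_Input.py reads t.shape and t.dtype on every graph input, so it fails whenever an input is a symbolic integer instead of a tensor. The sketch below reproduces that pattern in plain Python. FakeTensor, FakeSymInt, from_tensor_like, and split_inputs are hypothetical stand-ins written for illustration; they are not PyTorch or Torch-TensorRT APIs.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class FakeTensor:
    """Stand-in for a tensor input: carries a shape and a dtype."""
    shape: Tuple[int, ...]
    dtype: str


class FakeSymInt:
    """Stand-in for a symbolic integer: it has a name, but no shape or dtype."""

    def __init__(self, name: str) -> None:
        self.name = name


def from_tensor_like(t: Any) -> dict:
    # Mirrors the pattern of the failing line in the traceback:
    # it assumes every input exposes .shape and .dtype.
    return {"shape": t.shape, "dtype": t.dtype}


def split_inputs(inputs: List[Any]) -> Tuple[List[FakeTensor], List[Any]]:
    # One possible guard: separate tensor-like inputs from everything else
    # before building input specifications.
    tensors = [x for x in inputs if isinstance(x, FakeTensor)]
    others = [x for x in inputs if not isinstance(x, FakeTensor)]
    return tensors, others


graph_inputs = [FakeTensor(shape=(1, 12), dtype="int64"), FakeSymInt("s0")]

try:
    specs = [from_tensor_like(t) for t in graph_inputs]
except AttributeError as exc:
    # Same kind of failure as in the log, raised by the non-tensor input.
    failure = str(exc)
    print(failure)

tensors, others = split_inputs(graph_inputs)
specs = [from_tensor_like(t) for t in tensors]
```

If this reading is right, the error would come from the backend's input handling when dynamic=True is set and not from any single line of the model code, but I have not verified that.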