
FP16 outputs error of TensorRT 8.6.1.2 when running Roberta #3101

Open

@DayDayupupupup

Description

Since the INormalization layer was added in TRT 8.6, I ran some tests of FP16 accuracy:

  1. First, I exported Hugging Face's bert-base-cased to ONNX (opset 17) and used Polygraphy to check FP16 accuracy (see the sketch after this list). Both outputs (last_hidden_state, pooler_output) passed: Difference is within tolerance (rel=1e-05, abs=0.01).
  2. Then I tried roberta-base, and the FP16 results still had errors: PASSED | Output: 'pooler_output' | Difference is within tolerance (rel=1e-05, abs=0.01), but FAILED | Output: 'last_hidden_state'.
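
For reference, here is a minimal sketch of the bert-base-cased baseline in step 1 (the exact export script was not part of this report, so the input names here are assumptions; the roberta-base script below follows the same pattern):

from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertModel.from_pretrained('bert-base-cased').eval()
encoded = tokenizer("Replace me by any text you'd like.", padding='max_length',
                    max_length=128, return_tensors='pt')
with torch.no_grad():
    # BertModel.forward takes (input_ids, attention_mask, token_type_ids) positionally
    torch.onnx.export(model,
                      (encoded['input_ids'], encoded['attention_mask'], encoded['token_type_ids']),
                      "bert_base_opset17.onnx",
                      opset_version=17,
                      input_names=['input_ids', 'input_mask', 'segment_ids'],
                      output_names=['last_hidden_state', 'pooler_output'])
# then: polygraphy run bert_base_opset17.onnx --trt --onnxrt --atol 0.01 --fp16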

Environment

TensorRT Version: 8.6.1.2
NVIDIA GPU: A30
NVIDIA Driver Version: 510.47.03
CUDA Version: 11.6
Operating System: Ubuntu 20.04.2 LTS
Tensorflow Version (if applicable): 1.15.5
Container version: nvcr.io/nvidia/tensorrt:23.05-py3

Steps To Reproduce

Test1: roberta-base

  1. Export roberta-base to ONNX:
from transformers import RobertaTokenizer, RobertaModel
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = RobertaModel.from_pretrained('roberta-base')
text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, padding='max_length', max_length=128, return_tensors='pt')
output = model(**encoded_input)

model.eval()
import torch
with torch.no_grad():
    torch.onnx.export(model,               
                      tuple(encoded_input.values()),            
                      "roberta_base_opset17.onnx",   
                      export_params=True,   
                      opset_version=17,      
                      do_constant_folding=True, 
                      input_names=['input_ids','input_mask'],  
                      output_names=['last_hidden_state', 'pooler_output'], 
                      dynamic_axes={'input_ids': {0: 'batch_size'},
                                    'input_mask': {0: 'batch_size'},
                                    'last_hidden_state': {0: 'batch_size'},
                                    'pooler_output': {0: 'batch_size'}})
  2. polygraphy run roberta_base_opset17.onnx --trt --onnxrt --atol 0.01 --pool-limit workspace:10G --fp16
[I]     Comparing Output: 'last_hidden_state' (dtype=float32, shape=(1, 128, 768)) with 'last_hidden_state' (dtype=float32, shape=(1, 128, 768))
[I]         Tolerance: [abs=0.01, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-06/30/23-08:09:48: last_hidden_state | Stats: mean=0.020138, std-dev=0.4103, var=0.16835, median=0.0063438, min=-2.6055 at (0, 0, 453), max=11.375 at (0, 9, 588), avg-magnitude=0.11272
[I]             ---- Histogram ----
                Bin Range      |  Num Elems | Visualization
                (-2.61, -1.21) |        213 |
                (-1.21, 0.191) |      92069 | ########################################
                (0.191, 1.59 ) |       5752 | ##
                (1.59 , 2.99 ) |         71 |
                (2.99 , 4.38 ) |          0 |
                (4.38 , 5.78 ) |         71 |
                (5.78 , 7.18 ) |          0 |
                (7.18 , 8.58 ) |         71 |
                (8.58 , 9.98 ) |          0 |
                (9.98 , 11.4 ) |         57 |
[I]         onnxrt-runner-N0-06/30/23-08:09:48: last_hidden_state | Stats: mean=0.020128, std-dev=0.40961, var=0.16778, median=0.0070637, min=-2.5995 at (0, 0, 453), max=11.349 at (0, 38, 588), avg-magnitude=0.11256
[I]             ---- Histogram ----
                Bin Range      |  Num Elems | Visualization
                (-2.61, -1.21) |        213 |
                (-1.21, 0.191) |      92069 | ########################################
                (0.191, 1.59 ) |       5752 | ##
                (1.59 , 2.99 ) |         71 |
                (2.99 , 4.38 ) |          0 |
                (4.38 , 5.78 ) |         71 |
                (5.78 , 7.18 ) |          0 |
                (7.18 , 8.58 ) |         71 |
                (8.58 , 9.98 ) |          0 |
                (9.98 , 11.4 ) |         57 |
[I]         Error Metrics: last_hidden_state
[I]             Minimum Required Tolerance: elemwise error | [abs=0.049532] OR [rel=2643.4] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.0011686, std-dev=0.0015184, var=2.3056e-06, median=0.00079408, min=1.4901e-08 at (0, 105, 34), max=0.049532 at (0, 69, 588), avg-magnitude=0.0011686
[I]                 ---- Histogram ----
                    Bin Range           |  Num Elems | Visualization
                    (1.49e-08, 0.00495) |      97111 | ########################################
                    (0.00495 , 0.00991) |        923 |
                    (0.00991 , 0.0149 ) |        142 |
                    (0.0149  , 0.0198 ) |         73 |
                    (0.0198  , 0.0248 ) |          2 |
                    (0.0248  , 0.0297 ) |         10 |
                    (0.0297  , 0.0347 ) |          7 |
                    (0.0347  , 0.0396 ) |         10 |
                    (0.0396  , 0.0446 ) |         17 |
                    (0.0446  , 0.0495 ) |          9 |
[I]             Relative Difference | Stats: mean=0.083636, std-dev=8.4975, var=72.207, median=0.01208, min=9.7178e-07 at (0, 20, 249), max=2643.4 at (0, 69, 485), avg-magnitude=0.083636
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (9.72e-07, 264     ) |      98303 | ########################################
                    (264     , 529     ) |          0 |
                    (529     , 793     ) |          0 |
                    (793     , 1.06e+03) |          0 |
                    (1.06e+03, 1.32e+03) |          0 |
                    (1.32e+03, 1.59e+03) |          0 |
                    (1.59e+03, 1.85e+03) |          0 |
                    (1.85e+03, 2.11e+03) |          0 |
                    (2.11e+03, 2.38e+03) |          0 |
                    (2.38e+03, 2.64e+03) |          1 |
[E]         FAILED | Output: 'last_hidden_state' | Difference exceeds tolerance (rel=1e-05, abs=0.01)
[I]     Comparing Output: 'pooler_output' (dtype=float32, shape=(1, 768)) with 'pooler_output' (dtype=float32, shape=(1, 768))
[I]         Tolerance: [abs=0.01, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-06/30/23-08:09:48: pooler_output | Stats: mean=0.0042347, std-dev=0.21781, var=0.047442, median=0.01236, min=-0.64404 at (0, 165), max=0.58496 at (0, 509), avg-magnitude=0.17412
[I]         onnxrt-runner-N0-06/30/23-08:09:48: pooler_output | Stats: mean=0.0041949, std-dev=0.2177, var=0.047392, median=0.01219, min=-0.64402 at (0, 165), max=0.58522 at (0, 509), avg-magnitude=0.17403
[I]         Error Metrics: pooler_output
[I]             Minimum Required Tolerance: elemwise error | [abs=0.0042159] OR [rel=4.1577] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.00095319, std-dev=0.00070167, var=4.9234e-07, median=0.00081642, min=1.4156e-06 at (0, 245), max=0.0042159 at (0, 591), avg-magnitude=0.00095319
[I]             Relative Difference | Stats: mean=0.033797, std-dev=0.22256, var=0.049531, median=0.0056062, min=6.7266e-06 at (0, 245), max=4.1577 at (0, 167), avg-magnitude=0.033797
[I]         PASSED | Output: 'pooler_output' | Difference is within tolerance (rel=1e-05, abs=0.01)
[E]     FAILED | Mismatched outputs: ['last_hidden_state']

When I use real data, the error is even greater:

import numpy as np
from polygraphy.json import save_json

# Option 1: Define a function that will yield feed_dicts (i.e. Dict[str, np.ndarray]).
# `encoded_input` is the tokenizer output from the export script above.
def load_data():
    for _ in range(1):
        yield {"input_ids": encoded_input['input_ids'].numpy(),
               "input_mask": encoded_input['attention_mask'].numpy()}  # Still totally real data

# Option 2: Create a JSON file containing the input data using the `save_json()` helper.
#   The input to `save_json()` should have type: List[Dict[str, np.ndarray]].
#   For convenience, we'll reuse our `load_data()` implementation to generate the list.
input_data = list(load_data())
save_json(input_data, "custom_inputs.json", description="custom input data")

Then run: polygraphy run roberta_base_opset17.onnx --trt --onnxrt --atol 0.01 --pool-limit workspace:10G --fp16 --load-inputs custom_inputs.json

[I]     Comparing Output: 'last_hidden_state' (dtype=float32, shape=(1, 128, 768)) with 'last_hidden_state' (dtype=float32, shape=(1, 128, 768))
[I]         Tolerance: [abs=0.01, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-06/30/23-08:20:22: last_hidden_state | Stats: mean=0.018884, std-dev=0.41145, var=0.16929, median=0.0093536, min=-8.2969 at (0, 9, 77), max=12.07 at (0, 10, 588), avg-magnitude=0.11438
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-8.3  , -6.26 ) |          5 |
                (-6.26 , -4.22 ) |          4 |
                (-4.22 , -2.18 ) |        117 |
                (-2.18 , -0.148) |       8235 | ###
                (-0.148, 1.89  ) |      89815 | ########################################
                (1.89  , 3.93  ) |          0 |
                (3.93  , 5.96  ) |          0 |
                (5.96  , 8     ) |          0 |
                (8     , 10    ) |          6 |
                (10    , 12.1  ) |        122 |
[I]         onnxrt-runner-N0-06/30/23-08:20:22: last_hidden_state | Stats: mean=0.018878, std-dev=0.41122, var=0.1691, median=0.0091678, min=-8.2829 at (0, 9, 77), max=12.076 at (0, 10, 588), avg-magnitude=0.11435
[I]             ---- Histogram ----
                Bin Range        |  Num Elems | Visualization
                (-8.3  , -6.26 ) |          5 |
                (-6.26 , -4.22 ) |          4 |
                (-4.22 , -2.18 ) |        117 |
                (-2.18 , -0.148) |       8235 | ###
                (-0.148, 1.89  ) |      89815 | ########################################
                (1.89  , 3.93  ) |          0 |
                (3.93  , 5.96  ) |          0 |
                (5.96  , 8     ) |          0 |
                (8     , 10    ) |          6 |
                (10    , 12.1  ) |        122 |
[I]         Error Metrics: last_hidden_state
[I]             Minimum Required Tolerance: elemwise error | [abs=0.046174] OR [rel=62.955] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.00074002, std-dev=0.00098861, var=9.7735e-07, median=0.0005722, min=1.1176e-08 at (0, 10, 666), max=0.046174 at (0, 7, 77), avg-magnitude=0.00074002
[I]                 ---- Histogram ----
                    Bin Range           |  Num Elems | Visualization
                    (1.12e-08, 0.00462) |      97874 | ########################################
                    (0.00462 , 0.00923) |        178 |
                    (0.00923 , 0.0139 ) |        124 |
                    (0.0139  , 0.0185 ) |        119 |
                    (0.0185  , 0.0231 ) |          4 |
                    (0.0231  , 0.0277 ) |          3 |
                    (0.0277  , 0.0323 ) |          0 |
                    (0.0323  , 0.0369 ) |          0 |
                    (0.0369  , 0.0416 ) |          1 |
                    (0.0416  , 0.0462 ) |          1 |
[I]             Relative Difference | Stats: mean=0.15335, std-dev=2.3405, var=5.4779, median=0.0082764, min=2.8881e-07 at (0, 10, 666), max=62.955 at (0, 12, 85), avg-magnitude=0.15335
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (2.89e-07, 6.3 ) |      97836 | ########################################
                    (6.3     , 12.6) |        234 |
                    (12.6    , 18.9) |          1 |
                    (18.9    , 25.2) |        117 |
                    (25.2    , 31.5) |          0 |
                    (31.5    , 37.8) |          0 |
                    (37.8    , 44.1) |          0 |
                    (44.1    , 50.4) |          0 |
                    (50.4    , 56.7) |          0 |
                    (56.7    , 63  ) |        116 |
[E]         FAILED | Output: 'last_hidden_state' | Difference exceeds tolerance (rel=1e-05, abs=0.01)
[I]     Comparing Output: 'pooler_output' (dtype=float32, shape=(1, 768)) with 'pooler_output' (dtype=float32, shape=(1, 768))
[I]         Tolerance: [abs=0.01, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-06/30/23-08:20:22: pooler_output | Stats: mean=0.0019211, std-dev=0.22539, var=0.050801, median=-0.0029383, min=-0.58057 at (0, 630), max=0.57764 at (0, 82), avg-magnitude=0.18478
[I]         onnxrt-runner-N0-06/30/23-08:20:22: pooler_output | Stats: mean=0.0019204, std-dev=0.22572, var=0.05095, median=-0.0030782, min=-0.58187 at (0, 630), max=0.57884 at (0, 680), avg-magnitude=0.18506
[I]         Error Metrics: pooler_output
[I]             Minimum Required Tolerance: elemwise error | [abs=0.0013217] OR [rel=0.65804] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.00032893, std-dev=0.00025306, var=6.404e-08, median=0.0002878, min=4.3353e-07 at (0, 567), max=0.0013217 at (0, 472), avg-magnitude=0.00032893
[I]             Relative Difference | Stats: mean=0.005479, std-dev=0.030208, var=0.00091252, median=0.0019026, min=3.8931e-06 at (0, 377), max=0.65804 at (0, 736), avg-magnitude=0.005479
[I]         PASSED | Output: 'pooler_output' | Difference is within tolerance (rel=1e-05, abs=0.01)
[E]     FAILED | Mismatched outputs: ['last_hidden_state']
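
As an aside (a debugging suggestion of mine, not part of the original logs), marking all tensors as outputs lets Polygraphy show where the FP16 engine first diverges from ONNX Runtime:

polygraphy run roberta_base_opset17.onnx --trt --onnxrt --fp16 --trt-outputs mark all --onnx-outputs mark all

(Note that marking extra outputs can change layer fusions, so this localizes the divergence rather than exactly reproducing the failing engine.)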

Test2: chinese-roberta-wwm-ext

Relevant files: download the TensorFlow checkpoint from the link below.
Model link: chinese-roberta-wwm-ext tensorflow ckpt

As mentioned in #2466, bert4keras is still used to process the model.

2.1 Create a SavedModel

# tf1.15.5(gpu) 
# bert4keras=0.11.4
import os
os.environ['TF_KERAS'] = '1'
import numpy as np
from bert4keras.models import build_transformer_model
from bert4keras.tokenizers import Tokenizer
from bert4keras.backend import keras, K
import tensorflow as tf

# load RoBERTa
model = build_transformer_model(
    config_path="bert_config.json",
    checkpoint_path='bert_model.ckpt',
    sequence_length=128,
    #model='roberta',
    #with_mlm=False,
    return_keras_model=False
)

bert_output = keras.layers.Dense(units=1)(model.output)
bert_output = keras.layers.Lambda(lambda x : K.squeeze(x, axis=2))(bert_output)
model = keras.models.Model(model.input, bert_output)

sess = K.get_session()
print([i.op.name for i in model.input])
print(model.output)
input0 = tf.get_default_graph().get_tensor_by_name("Input-Token:0")
input1 = tf.get_default_graph().get_tensor_by_name("Input-Segment:0")
output1 = tf.get_default_graph().get_tensor_by_name("lambda/Squeeze:0")

inputs = {"Input-Token": input0, "Input-Segment": input1}
outputs = {"lambda": output1}
# save the SavedModel
tf.saved_model.simple_save(sess,
                           'saved_model',
                           inputs=inputs,
                           outputs=outputs)
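
Optionally (my addition, not in the original steps), a NumPy baseline can be captured from the Keras model before conversion, so the ONNX/TensorRT outputs can later be checked against TF. The dummy inputs and float32 dtype are assumptions based on the default keras Input layers:

token_ids = np.zeros((1, 128), dtype='float32')    # dummy token ids, just for a baseline
segment_ids = np.zeros((1, 128), dtype='float32')
ref = model.predict([token_ids, segment_ids])
np.save("tf_reference_output.npy", ref)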

2.2 Create the ONNX model with tf2onnx (1.13.0)

python -m tf2onnx.convert --saved-model saved_model --output roberta_wwm_ext_opset17.onnx --opset 17

2.3 Fuse LayerNorm

Because tf2onnx splits LayerNorm into primitive ops, the pattern has to be fused back manually. (The FP16 result is wrong without this fusion.)

import onnx
import onnx_graphsurgeon as gs

model_path = "roberta_wwm_ext_opset17.onnx"
onnx_model = onnx.load(model_path)

graph = gs.import_onnx(onnx_model)

# collect the pieces of each split LayerNormalization
ln_inputs = []
betas = []
gammas = []
ln_outputs = []

for node in graph.nodes:
    
    # get epsilon 1e-12
    # if node.op == 'Add' and ('Norm/add' in node.name) and ('add_1' not in node.name):
    #     epsilon = node.inputs[1].values
    #     print(epsilon)
        
    # get B (beta), Scale (gamma), and the LN output
    if node.op == 'Add' and 'Norm/add_1' in node.name:
        B = node.inputs[1]
        # print(B.name)
        Scale = node.i().inputs[1]
        # print(Scale.name)
        ln_output = node.outputs

        gammas.append(Scale)
        betas.append(B)
        ln_outputs.append(ln_output)
        node.inputs.clear()
        
    # get ln_input
    if node.op == 'Sub' and 'Norm/sub' in node.name:
        for inp in node.inputs:
            if 'add' in inp.name:
                ln_input = inp
                # print(ln_input.name)
                ln_inputs.append(ln_input)
                node.outputs.clear()
            
assert len(ln_inputs) == len(betas) == len(gammas) == len(ln_outputs)
for i in range(len(ln_inputs)):
    fused_node = gs.Node(
        op="LayerNormalization",
        inputs=[
            ln_inputs[i],  # input
            gammas[i],     # gamma (Scale)
            betas[i],      # beta (B)
        ],
        outputs=ln_outputs[i],
        attrs={'axis': -1, 'epsilon': 1e-12})
    
    graph.nodes.append(fused_node)

for node in graph.nodes:
    if not node.inputs:
        node.outputs.clear()
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "roberta_wwm_ext_opset17_fuse_ln.onnx")
print('done')
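
To confirm the fusion produced a valid graph (a small sanity check of my own, not from the original steps):

import onnx

m = onnx.load("roberta_wwm_ext_opset17_fuse_ln.onnx")
onnx.checker.check_model(m)
print("LayerNormalization nodes:",
      sum(1 for n in m.graph.node if n.op_type == "LayerNormalization"))

For a BERT-base-sized encoder this should report 25 nodes (2 per layer x 12 layers, plus the embedding LayerNorm), though I'd treat that count as an expectation rather than a guarantee.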

2.4 polygraphy run roberta_wwm_ext_opset17_fuse_ln.onnx --trt --onnxrt --atol 0.01 --pool-limit workspace:10G --fp16

[I]     Comparing Output: 'lambda' (dtype=float32, shape=(1, 128)) with 'lambda' (dtype=float32, shape=(1, 128))
[I]         Tolerance: [abs=0.01, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-06/30/23-09:12:15: lambda | Stats: mean=0.55965, std-dev=0.15921, var=0.025347, median=0.55591, min=0.245 at (0, 80), max=1.4902 at (0, 0), avg-magnitude=0.55965
[I]             ---- Histogram ----
                Bin Range      |  Num Elems | Visualization
                (0.245, 0.37 ) |         11 | ##########
                (0.37 , 0.494) |         33 | ###############################
                (0.494, 0.619) |         42 | ########################################
                (0.619, 0.744) |         32 | ##############################
                (0.744, 0.869) |          9 | ########
                (0.869, 0.993) |          0 |
                (0.993, 1.12 ) |          0 |
                (1.12 , 1.24 ) |          0 |
                (1.24 , 1.37 ) |          0 |
                (1.37 , 1.49 ) |          1 |
[I]         onnxrt-runner-N0-06/30/23-09:12:15: lambda | Stats: mean=0.56168, std-dev=0.15942, var=0.025416, median=0.55772, min=0.24882 at (0, 80), max=1.4923 at (0, 0), avg-magnitude=0.56168
[I]             ---- Histogram ----
                Bin Range      |  Num Elems | Visualization
                (0.245, 0.37 ) |         11 | ##########
                (0.37 , 0.494) |         32 | ##############################
                (0.494, 0.619) |         42 | ########################################
                (0.619, 0.744) |         33 | ###############################
                (0.744, 0.869) |          8 | #######
                (0.869, 0.993) |          1 |
                (0.993, 1.12 ) |          0 |
                (1.12 , 1.24 ) |          0 |
                (1.24 , 1.37 ) |          0 |
                (1.37 , 1.49 ) |          1 |
[I]         Error Metrics: lambda
[I]             Minimum Required Tolerance: elemwise error | [abs=0.0103] OR [rel=0.018481] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.0029195, std-dev=0.0020367, var=4.1483e-06, median=0.0025767, min=9.1791e-06 at (0, 69), max=0.0103 at (0, 104), avg-magnitude=0.0029195
[I]                 ---- Histogram ----
                    Bin Range           |  Num Elems | Visualization
                    (9.18e-06, 0.00104) |         27 | ########################################
                    (0.00104 , 0.00207) |         23 | ##################################
                    (0.00207 , 0.0031 ) |         25 | #####################################
                    (0.0031  , 0.00413) |         19 | ############################
                    (0.00413 , 0.00515) |         15 | ######################
                    (0.00515 , 0.00618) |         11 | ################
                    (0.00618 , 0.00721) |          5 | #######
                    (0.00721 , 0.00824) |          2 | ##
                    (0.00824 , 0.00927) |          0 |
                    (0.00927 , 0.0103 ) |          1 | #
[I]             Relative Difference | Stats: mean=0.005554, std-dev=0.0040631, var=1.6508e-05, median=0.0051507, min=1.5757e-05 at (0, 69), max=0.018481 at (0, 94), avg-magnitude=0.005554
[I]                 ---- Histogram ----
                    Bin Range           |  Num Elems | Visualization
                    (1.58e-05, 0.00186) |         30 | ########################################
                    (0.00186 , 0.00371) |         20 | ##########################
                    (0.00371 , 0.00556) |         19 | #########################
                    (0.00556 , 0.0074 ) |         24 | ################################
                    (0.0074  , 0.00925) |         16 | #####################
                    (0.00925 , 0.0111 ) |          6 | ########
                    (0.0111  , 0.0129 ) |          5 | ######
                    (0.0129  , 0.0148 ) |          3 | ####
                    (0.0148  , 0.0166 ) |          4 | #####
                    (0.0166  , 0.0185 ) |          1 | #
[E]         FAILED | Output: 'lambda' | Difference exceeds tolerance (rel=1e-05, abs=0.01)
[E]     FAILED | Mismatched outputs: ['lambda']

Question

BERT-base is fine, so I'm not sure whether this error is caused by LayerNorm or by RoBERTa itself.
On TRT 8.5, if I set the LayerNorm plugin to FP32, the inference is correct.
However, on TRT 8.6, when I tried to set the INormalization layer to FP32, the entire model ended up running in FP32, because the engine visualization shows only a single Myelin layer.

What can be done to ensure RoBERTa's FP16 accuracy? (The precision pinning I attempted is sketched below.)
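
For reference, this is roughly the kind of precision pinning I tried on TRT 8.6 (a sketch; the optimization-profile shapes are assumptions for batch=1, seq_len=128):

import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("roberta_base_opset17.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

profile = builder.create_optimization_profile()
for name in ("input_ids", "input_mask"):
    profile.set_shape(name, (1, 128), (1, 128), (1, 128))
config.add_optimization_profile(profile)

# Pin every INormalization layer to FP32; the rest of the network may run in FP16.
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.type == trt.LayerType.NORMALIZATION:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine = builder.build_serialized_network(network, config)

Even with this, the visualized engine shows a single Myelin region, so the pinning effectively pushes the whole graph to FP32, as described above.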
