At https://github.com/mit-han-lab/omniserve/blob/main/omniserve/modeling/layers/quantized_linear/w4a8_linear.py#L176, it seems the int8 weights are not quantized to the protective range [-119, 119]? Also, how should `s1_scale` be calculated? Is it the same as standard int8 symmetric quantization, just with qmin=-119 and qmax=119?
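To make the second question concrete, here is a minimal sketch of what I have in mind: symmetric per-output-channel quantization where the largest magnitude in each row maps to 119 instead of 127. The function and variable names (`quantize_w_protective`, the `[out_features, in_features]` layout) are my own for illustration, not taken from the repo:

```python
import torch

def quantize_w_protective(w: torch.Tensor, qmax: int = 119):
    """Symmetric per-output-channel quantization of weights `w`
    (assumed shape [out_features, in_features]) into the protective
    range [-qmax, qmax] rather than the full int8 range [-127, 127].
    This is a sketch of my guess, not the repo's actual code."""
    # Per-channel scale: the largest magnitude in each row maps to qmax.
    s1_scale = w.abs().amax(dim=1, keepdim=True) / qmax
    # Quantize and clamp into the protective range.
    w_int8 = torch.clamp(torch.round(w / s1_scale), -qmax, qmax).to(torch.int8)
    return w_int8, s1_scale
```

Is this the intended way to compute `s1_scale`, or does the first-level quantization use a different rule?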