Considering only single-batch inference and disregarding multi-GPU scheduling, can the generate_wedlm method of the WeDLMForCausalLM class be considered a minimalist version of wedlm's llm_engine? If so, are there any differences in speed or accuracy between the two?