Hi @joecummings, following our discussion in kubeflow/trainer#2410 and on the Kubeflow WG Training Call, we concluded that torchtune is an excellent tool for fine-tuning LLMs and decided to adopt it as the low-level runtime for Kubeflow LLM Trainer. We have already started the implementation based on torchtune. Thanks for your engagement in the discussion.

However, we want to decouple the data preprocessing and tokenization step from the main fine-tuning phase, so as to:
- Reduce GPU usage time: we will wrap torchtune in a container and request GPU resources for it (GPUs are expensive and billed by usage time)
- Integrate the data preprocessing / tokenization step with our data initializer: run these steps ahead of fine-tuning and offload them to the CPU
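
To make the idea concrete, here is a rough sketch of the split we have in mind. It does not use any torchtune API: `toy_tokenize`, `preprocess_to_jsonl`, `load_pretokenized`, and the JSONL layout are all hypothetical stand-ins. The first function would run in a CPU-only data initializer, and the second would run inside the GPU container.

```python
import json


def toy_tokenize(text, vocab):
    """Hypothetical stand-in tokenizer: maps whitespace-split words to integer IDs."""
    return [vocab.setdefault(word, len(vocab)) for word in text.split()]


def preprocess_to_jsonl(samples, out_path):
    """CPU-only step: tokenize every sample once and persist the token IDs."""
    vocab = {}
    with open(out_path, "w", encoding="utf-8") as f:
        for sample in samples:
            record = {"tokens": toy_tokenize(sample, vocab)}
            f.write(json.dumps(record) + "\n")
    return vocab


def load_pretokenized(path):
    """GPU-side step: read the stored token IDs back without re-tokenizing."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["tokens"] for line in f]
```

In this sketch the GPU job only pays for reading the file, so the open question is whether torchtune can consume such pre-tokenized data directly.
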

We wonder whether torchtune has best practices for achieving these goals, and we would appreciate any suggestions you could offer. Thanks!

Also /cc @andreyvelich @tenzen-y @astefanutti @deepanker13 @saileshd1402 @seanlaii