Switch the order of the to_dtype function and source transform

cccclai · cccclai · commit c8df1ab19739 · 2024-05-28T15:01:26.000-07:00
We run quantization during the source transform, and some quantization infra doesn't support bf16 yet. Move to_dtype one stage earlier so that we can set the dtype to fp32 before running the quantization transform.

Differential Revision: [D57883363](https://our.internmc.facebook.com/intern/diff/D57883363/)
ghstack-source-id: 228003406
Pull Request resolved: #3757
diff --git a/examples/models/llama2/export_llama_lib.py b/examples/models/llama2/export_llama_lib.py
@@ -366,8 +366,8 @@ def _prepare_for_llama_export(modelname: str, args) -> LlamaEdgeManager:
         )
         .set_output_dir(output_dir_path)
         .set_metadata(args.metadata)
-        .source_transform(transforms)
         .to_dtype(dtype_override)
+        .source_transform(transforms)
     )
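
To illustrate why the call order matters, here is a minimal, self-contained sketch. It does not use the real `LlamaEdgeManager`: `ToyModel`, `ToyManager`, and `toy_quantize` are hypothetical stand-ins, and dtypes are plain strings. It only models the situation described in the commit message, where a quantization transform that rejects bf16 runs inside `source_transform`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyModel:
    # Hypothetical stand-in for a model whose checkpoint is stored in bf16.
    dtype: str = "bf16"
    applied: List[str] = field(default_factory=list)


def toy_quantize(model: ToyModel) -> ToyModel:
    # Hypothetical quantization transform that cannot handle bf16 yet.
    if model.dtype == "bf16":
        raise TypeError("toy_quantize does not support bf16")
    model.applied.append(f"quantize@{model.dtype}")
    return model


class ToyManager:
    # Hypothetical builder with the same chaining style as in the diff.
    def __init__(self, model: ToyModel) -> None:
        self.model = model

    def to_dtype(self, dtype_override: Optional[str]) -> "ToyManager":
        # Apply the override only when one was requested.
        if dtype_override is not None:
            self.model.dtype = dtype_override
        return self

    def source_transform(
        self, transforms: List[Callable[[ToyModel], ToyModel]]
    ) -> "ToyManager":
        # Run each transform in order on the current model.
        for transform in transforms:
            self.model = transform(self.model)
        return self


def run_old_order() -> str:
    # Old order: transforms see the bf16 model, so quantization fails.
    try:
        ToyManager(ToyModel()).source_transform([toy_quantize]).to_dtype("fp32")
    except TypeError as exc:
        return f"failed: {exc}"
    return "ok"


def run_new_order() -> List[str]:
    # New order: the dtype override is applied first, then the transforms run.
    manager = ToyManager(ToyModel()).to_dtype("fp32").source_transform([toy_quantize])
    return manager.model.applied
```

In this sketch, `run_old_order()` reports the failure raised by the toy transform, while `run_new_order()` completes because the transform only ever sees the fp32 model.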
 
 
