Switch the order of the to_dtype function and source transform (#3757)

cccclai · facebook-github-bot · commit 2badd7643e24 · 2024-05-29T14:02:13.000-07:00
Summary:
Pull Request resolved: #3757

We're running quantization during source transform and some quantization infra doesn't support bf16 yet. Move to_dtype one stage earlier so we can choose the dtype fp32 before running quantization transform.

ghstack-source-id: 228125529
Reviewed By: shoumikhin
Differential Revision: D57883363
fbshipit-source-id: d74f9b6de09762c5412b48feb16c60abbcc3f9f8

diff --git a/examples/models/llama2/export_llama_lib.py b/examples/models/llama2/export_llama_lib.py
@@ -374,8 +374,8 @@ def _prepare_for_llama_export(modelname: str, args) -> LlamaEdgeManager:
         )
         .set_output_dir(output_dir_path)
         .set_metadata(args.metadata)
-        .source_transform(transforms)
         .to_dtype(dtype_override)
+        .source_transform(transforms)
     )
 
 
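To illustrate why the call order matters, here is a minimal, self-contained sketch. It does not use the real `LlamaEdgeManager` or any real quantization code: `ToyModel`, `ToyManager`, and `toy_quantize` are hypothetical stand-ins, and dtypes are plain strings. The only assumption carried over from the summary is that a quantization-style source transform may reject bf16, so the dtype override has to be applied first.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyModel:
    # Hypothetical stand-in for a checkpoint that was loaded in bf16.
    dtype: str = "bf16"
    applied: List[str] = field(default_factory=list)


def toy_quantize(model: ToyModel) -> ToyModel:
    # Hypothetical transform that, like some quantization infra, rejects bf16.
    if model.dtype == "bf16":
        raise TypeError("toy quantizer does not support bf16")
    model.applied.append(f"quantize@{model.dtype}")
    return model


class ToyManager:
    # Hypothetical builder that mimics the chained-call style in the diff.
    def __init__(self, model: ToyModel) -> None:
        self.model = model

    def to_dtype(self, dtype_override: Optional[str]) -> "ToyManager":
        # Only change the dtype when an override is given.
        if dtype_override is not None:
            self.model.dtype = dtype_override
        return self

    def source_transform(
        self, transforms: List[Callable[[ToyModel], ToyModel]]
    ) -> "ToyManager":
        # Apply each transform in order to the current model.
        for transform in transforms:
            self.model = transform(self.model)
        return self


# New order (this commit): cast first, then run the transforms.
new_order = ToyManager(ToyModel()).to_dtype("fp32").source_transform([toy_quantize])

# Old order: the transform sees bf16 and fails before the cast happens.
old_order_failed = False
try:
    ToyManager(ToyModel()).source_transform([toy_quantize]).to_dtype("fp32")
except TypeError:
    old_order_failed = True
```

In this toy setup the new order quantizes an fp32 model, while the old order raises inside the transform because the cast has not run yet.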
