arg : add env variable for parallel (ggml-org#9513)

bertwagner · arthw · commit 09549b7bb3b9 · 2024-11-18T14:08:25.000+08:00
* add env variable for parallel

* Update README.md with env:  LLAMA_ARG_N_PARALLEL
diff --git a/common/arg.cpp b/common/arg.cpp
@@ -1312,7 +1312,7 @@ gpt_params_context gpt_params_parser_init(gpt_params & params, llama_example ex,
         [](gpt_params & params, int value) {
             params.n_parallel = value;
         }
-    ));
+    ).set_env("LLAMA_ARG_N_PARALLEL"));
     add_opt(llama_arg(
         {"-ns", "--sequences"}, "N",
         format("number of sequences to decode (default: %d)", params.n_sequences),
diff --git a/examples/server/README.md b/examples/server/README.md
@@ -87,7 +87,7 @@ The project is under active development, and we are [looking for feedback and co
 | `-ctk, --cache-type-k TYPE` | KV cache data type for K (default: f16) |
 | `-ctv, --cache-type-v TYPE` | KV cache data type for V (default: f16) |
 | `-dt, --defrag-thold N` | KV cache defragmentation threshold (default: -1.0, < 0 - disabled)<br/>(env: LLAMA_ARG_DEFRAG_THOLD) |
-| `-np, --parallel N` | number of parallel sequences to decode (default: 1) |
+| `-np, --parallel N` | number of parallel sequences to decode (default: 1)<br/>(env:  LLAMA_ARG_N_PARALLEL) |
 | `-cb, --cont-batching` | enable continuous batching (a.k.a dynamic batching) (default: enabled)<br/>(env: LLAMA_ARG_CONT_BATCHING) |
 | `-nocb, --no-cont-batching` | disable continuous batching<br/>(env: LLAMA_ARG_NO_CONT_BATCHING) |
 | `--mlock` | force system to keep model in RAM rather than swapping or compressing |
