`docs/features/structured_outputs.md`
Address: No.1 Century Avenue, Pudong New Area, Shanghai
Height: 468
```
### Offline Inference

Offline inference restricts the model's output format through pre-specified constraints. In `FastDeploy`, constraints are specified through the `GuidedDecodingParams` class in `SamplingParams`. `GuidedDecodingParams` supports the following constraint types, with usage analogous to online inference:

```python
json: Optional[Union[str, dict]] = None
regex: Optional[str] = None
choice: Optional[List[str]] = None
grammar: Optional[str] = None
json_object: Optional[bool] = None
structural_tag: Optional[str] = None
```
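
To make the constraint types concrete, the snippet below (standard library only, independent of `FastDeploy`) shows the kind of value each field accepts — a JSON schema dict for `json`, a pattern for `regex`, a list of strings for `choice` — and spot-checks a sample completion against them. The schema and sample values here are illustrative assumptions, not part of the FastDeploy API; the engine itself enforces constraints token by token during decoding.

```python
import json
import re

# Illustrative values for three of the constraint fields (assumption:
# these mirror the field types listed above; real enforcement happens
# inside the guided-decoding backend, not in user code).
json_constraint = {
    "type": "object",
    "properties": {
        "addr": {"type": "string"},
        "height": {"type": "integer"},
    },
    "required": ["addr", "height"],
}
regex_constraint = r"\d{3}"        # e.g. force a three-digit number
choice_constraint = ["yes", "no"]  # output must be one of these strings

# A completion that a json-constrained run could produce:
sample = '{"addr": "No.1 Century Avenue, Pudong New Area, Shanghai", "height": 468}'
parsed = json.loads(sample)

# Manual spot checks of what each constraint means:
assert set(json_constraint["required"]) <= set(parsed)
assert re.fullmatch(regex_constraint, str(parsed["height"])) is not None
assert "yes" in choice_constraint
print(parsed["height"])  # 468
```

Because the constraint is applied during decoding, the returned text is guaranteed to satisfy it — the checks above only illustrate what each field promises.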

The following example demonstrates how to use offline inference to generate structured JSON output:

```python
from fastdeploy import LLM, SamplingParams
from fastdeploy.engine.sampling_params import GuidedDecodingParams
```
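
The source example is truncated after its imports. Independent of `FastDeploy`, the post-processing step that typically follows a JSON-constrained generation — turning the returned text into a typed record — can be sketched with the standard library alone. The `Info` dataclass and the `raw` string below are hypothetical stand-ins, not FastDeploy APIs:

```python
import json
from dataclasses import dataclass

# Hypothetical record type matching the address/height schema used
# earlier on this page; a real run would take `raw` from the text of
# a generate(...) result produced under a json constraint.
@dataclass
class Info:
    addr: str
    height: int

raw = '{"addr": "No.1 Century Avenue, Pudong New Area, Shanghai", "height": 468}'
info = Info(**json.loads(raw))
print(info.addr, info.height)
```

When the `json` constraint is enforced by the engine, this parse cannot fail on well-formedness — the decoder only emits token sequences that conform to the schema.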