vLLM is also available via `LangChain <https://github.com/langchain-ai/langchain>`_.

To install LangChain, run:

.. code-block:: console

    $ pip install langchain -q

To run inference on a single GPU or on multiple GPUs, use the ``VLLM`` class from ``langchain``.

.. code-block:: python

    from langchain.llms import VLLM

    llm = VLLM(model="mosaicml/mpt-7b",
               trust_remote_code=True,  # mandatory for Hugging Face models
               max_new_tokens=128,
               top_k=10,
               top_p=0.95,
               temperature=0.8,
               # tensor_parallel_size=...  # for distributed inference
    )

    print(llm("What is the capital of France?"))

Please refer to this `Tutorial <https://github.com/langchain-ai/langchain/blob/master/docs/extras/integrations/llms/vllm.ipynb>`_ for more details.
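
The ``top_k``, ``top_p``, and ``temperature`` arguments passed to ``VLLM`` above are standard sampling controls: temperature rescales the logits before the softmax, top-k keeps only the ``k`` most likely tokens, and top-p (nucleus sampling) keeps the smallest set of tokens whose cumulative probability reaches ``p``. A minimal plain-Python sketch of these filters on a toy distribution (the helper names here are illustrative only, not part of vLLM or LangChain):

.. code-block:: python

    import math

    def apply_temperature(logits, temperature):
        """Soften (T > 1) or sharpen (T < 1) a distribution via softmax(logits / T)."""
        scaled = [l / temperature for l in logits]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        z = sum(exps)
        return [e / z for e in exps]

    def top_k_top_p_filter(probs, top_k, top_p):
        """Keep the top_k most likely tokens, then the smallest prefix of those
        whose cumulative probability reaches top_p; renormalize the survivors."""
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        kept, cum = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        total = sum(p for _, p in kept)
        return {tok: p / total for tok, p in kept}

    # Toy next-token distribution for illustration.
    probs = {"Paris": 0.6, "Lyon": 0.2, "Nice": 0.1, "Rome": 0.06, "Oslo": 0.04}
    filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.9)
    # top_k=3 keeps Paris/Lyon/Nice; their cumulative probability 0.9 meets
    # top_p=0.9, so all three survive and are renormalized to sum to 1.

In the real engine this filtering happens over the model's full vocabulary at every decoding step; the sketch only illustrates what the knobs select.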