added streaming guide

Justin Lee 2024-11-01 11:41:03 -07:00
parent bf16d7729f
commit ed70e140eb

@@ -8,7 +8,7 @@ This document provides instructions on how to use Llama Stack's `chat_completion`
2. [Building Effective Prompts](#building-effective-prompts)
3. [Conversation Loop](#conversation-loop)
4. [Conversation History](#conversation-history)
5. [Streaming Responses with Llama Stack](#streaming-responses-with-llama-stack)
## Quickstart
@@ -141,6 +141,52 @@ async def chat_loop():
asyncio.run(chat_loop())
```

## Streaming Responses with Llama Stack

Llama Stack offers a `stream` parameter in the `chat_completion` function, which allows partial responses to be returned progressively as they are generated. This can enhance user experience by providing immediate feedback without waiting for the entire response to be processed.

### Example: Streaming Responses

The following code demonstrates how to use the `stream` parameter to enable response streaming. When `stream=True`, the `chat_completion` function will yield tokens as they are generated. To display these tokens, this example leverages asynchronous streaming with `EventLogger`.

```python
import asyncio

from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.inference.event_logger import EventLogger
from llama_stack_client.types import UserMessage
from termcolor import cprint


async def run_main(stream: bool = True):
    # Connect to a locally running Llama Stack server.
    client = LlamaStackClient(
        base_url="http://localhost:5000",
    )

    message = UserMessage(
        content="hello world, write me a 2 sentence poem about the moon",
        role="user",
    )
    cprint(f"User>{message.content}", "green")

    # With stream=True, chat_completion returns partial responses as they are
    # generated instead of a single completed response.
    response = client.inference.chat_completion(
        messages=[message],
        model="Llama3.2-11B-Vision-Instruct",
        stream=stream,
    )

    if not stream:
        cprint(f"> Response: {response}", "cyan")
    else:
        # EventLogger prints each streamed token as it arrives.
        async for log in EventLogger().log(response):
            log.print()

    # List the models available on the server.
    models_response = client.models.list()
    print(models_response)


if __name__ == "__main__":
    asyncio.run(run_main())
```
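
If you prefer to handle the stream yourself rather than going through `EventLogger`, you can iterate over the returned chunks directly. The sketch below is illustrative rather than part of the official guide: it assumes the synchronous client returns an iterable of chunks when `stream=True` and that each chunk exposes its incremental text at `chunk.event.delta`; check your `llama_stack_client` version for the exact chunk shape.

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.types import UserMessage
from termcolor import cprint


def stream_raw_chunks():
    client = LlamaStackClient(base_url="http://localhost:5000")

    response = client.inference.chat_completion(
        messages=[
            UserMessage(
                content="hello world, write me a 2 sentence poem about the moon",
                role="user",
            )
        ],
        model="Llama3.2-11B-Vision-Instruct",
        stream=True,
    )

    # Assumption: each streamed chunk carries its newly generated text at
    # chunk.event.delta; adjust the attribute path to match your client version.
    for chunk in response:
        delta = getattr(chunk.event, "delta", "") or ""
        cprint(delta, "cyan", end="", flush=True)
    print()


if __name__ == "__main__":
    stream_raw_chunks()
```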
---