Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-07-30 07:39:38 +00:00)
added streaming guide

parent bf16d7729f
commit ed70e140eb

1 changed file with 47 additions and 1 deletion
@@ -8,7 +8,7 @@ This document provides instructions on how to use Llama Stack's `chat_completion`
2. [Building Effective Prompts](#building-effective-prompts)
3. [Conversation Loop](#conversation-loop)
4. [Conversation History](#conversation-history)
5. [Streaming Responses with Llama Stack](#streaming-responses-with-llama-stack)

## Quickstart
@@ -141,6 +141,52 @@ async def chat_loop():
asyncio.run(chat_loop())
```

## Streaming Responses with Llama Stack

Llama Stack offers a `stream` parameter in the `chat_completion` function, which allows partial responses to be returned progressively as they are generated. This can enhance the user experience by providing immediate feedback instead of waiting for the entire response to be generated.

### Example: Streaming Responses

The following code demonstrates how to use the `stream` parameter to enable response streaming. When `stream=True`, `chat_completion` yields tokens as they are generated. To display them as they arrive, the example uses asynchronous streaming with `EventLogger`.

```python
import asyncio

from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.inference.event_logger import EventLogger
from llama_stack_client.types import UserMessage
from termcolor import cprint


async def run_main(stream: bool = True):
    # Connect to the locally running Llama Stack server.
    client = LlamaStackClient(
        base_url="http://localhost:5000",
    )

    message = UserMessage(
        content="hello world, write me a 2 sentence poem about the moon", role="user"
    )
    cprint(f"User>{message.content}", "green")

    response = client.inference.chat_completion(
        messages=[message],
        model="Llama3.2-11B-Vision-Instruct",
        stream=stream,
    )

    if not stream:
        # Non-streaming: the complete response comes back as a single object.
        cprint(f"> Response: {response}", "cyan")
    else:
        # Streaming: EventLogger prints each partial response as it arrives.
        async for log in EventLogger().log(response):
            log.print()

    models_response = client.models.list()
    print(models_response)


if __name__ == "__main__":
    asyncio.run(run_main())
```
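
Because `run_main` takes `stream` as an argument, the same entry point can be used to compare both modes. As a small usage sketch reusing the definitions from the example above, disabling streaming makes `chat_completion` return the complete response in one piece, which the example then prints with a single `cprint` call:

```python
# Reuse run_main from the example above with streaming turned off:
# chat_completion returns the full response object at once.
asyncio.run(run_main(stream=False))
```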

---