diff --git a/docs/source/chat_completion_guide.md b/docs/source/chat_completion_guide.md
index 540222771..d6787fd1d 100644
--- a/docs/source/chat_completion_guide.md
+++ b/docs/source/chat_completion_guide.md
@@ -8,7 +8,8 @@ This document provides instructions on how to use Llama Stack's `chat_completion
 2. [Building Effective Prompts](#building-effective-prompts)
 3. [Conversation Loop](#conversation-loop)
 4. [Conversation History](#conversation-history)
-5.
+5. [Streaming Responses](#streaming-responses)
+
 
 ## Quickstart
 
@@ -141,7 +142,7 @@ async def chat_loop():
 asyncio.run(chat_loop())
 ```
 
-## Streaming Responses with Llama Stack
+## Streaming Responses
 
 Llama Stack offers a `stream` parameter in the `chat_completion` function, which allows partial responses to be returned progressively as they are generated. This can enhance user experience by providing immediate feedback without waiting for the entire response to be processed.
 
@@ -186,8 +187,6 @@ if __name__ == "__main__":
 ```
 
-
-
 
 ---
 
 With these fundamentals, you should be well on your way to leveraging Llama Stack’s text generation capabilities! For more advanced features, refer to the [Llama Stack Documentation](https://llama-stack-docs.com).
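
The streaming section touched by this patch is its main substance, so a quick reference sketch of the pattern it describes may help. This is a minimal sketch, assuming the `llama_stack_client` Python package used elsewhere in the guide; the base URL, model name, and chunk fields (`chunk.event.delta`) are illustrative and may differ between client versions.

```python
# Minimal sketch of the streaming pattern described in the guide above.
# Assumes the `llama_stack_client` package; exact parameter and chunk-field
# names (e.g. model vs. model_id, chunk.event.delta) vary across versions.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")  # assumed local server

response = client.inference.chat_completion(
    messages=[{"role": "user", "content": "Write a two-line poem about rivers."}],
    model="Llama3.2-3B-Instruct",  # illustrative model name
    stream=True,  # yield partial chunks as they are generated
)

# Print each partial delta as it arrives rather than waiting for the
# complete response -- the "immediate feedback" the guide describes.
for chunk in response:
    print(chunk.event.delta, end="", flush=True)
print()
```

Printing with `end=""` and `flush=True` renders the deltas as one continuous line, which is the progressive-output effect the `stream` parameter is meant to provide.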