forked from phoenix-oss/llama-stack-mirror
		
Add documentation for building applications, with some content on the agentic loop
parent a29013112f
commit 1274fa4c0d
5 changed files with 424 additions and 16 deletions
|  | @@ -19,16 +19,17 @@ export LLAMA_STACK_PORT=5001 | |||
| ollama run $OLLAMA_INFERENCE_MODEL --keepalive 60m | ||||
| ``` | ||||
| 
 | ||||
| By default, Ollama keeps the model loaded in memory for 5 minutes which can be too short. We set the `--keepalive` flag to 60 minutes to enspagents/agenure the model remains loaded for sometime. | ||||
| By default, Ollama keeps the model loaded in memory for 5 minutes which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for sometime. | ||||
| 
 | ||||
| 
 | ||||
| ### 2. Start the Llama Stack server | ||||
| 
 | ||||
| Llama Stack is based on a client-server architecture. It consists of a server which can be configured very flexibly so you can mix-and-match various providers for its individual API components -- beyond Inference, these include Memory, Agents, Telemetry, Evals and so forth. | ||||
| 
 | ||||
| To get started quickly, we provide various Docker images for the server component that work with different inference providers out of the box. For this guide, we will use `llamastack/distribution-ollama` as the Docker image. | ||||
| 
 | ||||
| ```bash | ||||
| docker run \ | ||||
|   -it \ | ||||
| docker run -it \ | ||||
|   -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ | ||||
|   -v ~/.llama:/root/.llama \ | ||||
|   llamastack/distribution-ollama \ | ||||
|  | @@ -42,8 +43,7 @@ Configuration for this is available at `distributions/ollama/run.yaml`. | |||
| 
 | ||||
| ### 3. Use the Llama Stack client SDK | ||||
| 
 | ||||
| You can interact with the Llama Stack server using the `llama-stack-client` CLI or via the Python SDK. | ||||
| 
 | ||||
| You can interact with the Llama Stack server using various client SDKs. We will use the Python SDK which you can install using: | ||||
| ```bash | ||||
| pip install llama-stack-client | ||||
| ``` | ||||
|  | @@ -123,7 +123,6 @@ async def run_main(): | |||
| 
 | ||||
|     agent = Agent(client, agent_config) | ||||
|     session_id = agent.create_session("test-session") | ||||
|     print(f"Created session_id={session_id} for Agent({agent.agent_id})") | ||||
|     user_prompts = [ | ||||
|         ( | ||||
|             "I am attaching documentation for Torchtune. Help me answer questions I will ask next.", | ||||
|  | @@ -154,3 +153,10 @@ if __name__ == "__main__": | |||
| - Learn how to [Build Llama Stacks](../distributions/index.md) | ||||
| - See [References](../references/index.md) for more details about the llama CLI and Python SDK | ||||
| - For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository. | ||||
| 
 | ||||
| 
 | ||||
| ## Thinking out aloud here in terms of what to write in the docs | ||||
| 
 | ||||
| - how to get a llama stack server running | ||||
| - what are all the different client sdks | ||||
| - what are the components of building agents | ||||
|  |  | |||