Add documentation for building applications, with some content for the agentic loop
parent a29013112f
commit 1274fa4c0d
5 changed files with 424 additions and 16 deletions
@@ -19,16 +19,17 @@ export LLAMA_STACK_PORT=5001
ollama run $OLLAMA_INFERENCE_MODEL --keepalive 60m
```
By default, Ollama keeps the model loaded in memory for only 5 minutes, which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for some time.
### 2. Start the Llama Stack server
Llama Stack is based on a client-server architecture. It consists of a server which can be configured very flexibly so you can mix-and-match various providers for its individual API components -- beyond Inference, these include Memory, Agents, Telemetry, Evals and so forth.
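Once the server is up and the Python SDK from step 3 is installed, you can inspect how a distribution composes these APIs. The snippet below is only a minimal sketch, assuming the server from this guide is listening on `localhost:5001`; the exact client methods may vary between `llama-stack-client` versions.

```python
# Minimal sketch: list which providers and models the running server exposes.
# Assumes the server is on localhost:5001 and llama-stack-client is installed.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

# Each API (inference, memory, agents, ...) is backed by a configurable provider.
for provider in client.providers.list():
    print(provider)

# Models registered with the inference provider (e.g. the Ollama model above).
for model in client.models.list():
    print(model.identifier)
```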
To get started quickly, we provide various Docker images for the server component that work with different inference providers out of the box. For this guide, we will use `llamastack/distribution-ollama` as the Docker image.
```bash
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
@@ -42,8 +43,7 @@ Configuration for this is available at `distributions/ollama/run.yaml`.
### 3. Use the Llama Stack client SDK
You can interact with the Llama Stack server using various client SDKs. We will use the Python SDK, which you can install using:
```bash
pip install llama-stack-client
```
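Once installed, a quick way to verify the setup is a simple chat completion against the running server. This is a minimal sketch: the port and model id below are assumptions, so substitute the values from your own setup, and note that method names can differ slightly between SDK versions.

```python
# Minimal sketch of a chat completion via the Python SDK.
# The base_url port and model id are assumptions; use your own values.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about distributed inference."},
    ],
)
print(response.completion_message.content)
```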
@@ -123,7 +123,6 @@ async def run_main():
    agent = Agent(client, agent_config)
    session_id = agent.create_session("test-session")
    print(f"Created session_id={session_id} for Agent({agent.agent_id})")
    user_prompts = [
        (
            "I am attaching documentation for Torchtune. Help me answer questions I will ask next.",
@@ -154,3 +153,10 @@ if __name__ == "__main__":
- Learn how to [Build Llama Stacks](../distributions/index.md)
- See [References](../references/index.md) for more details about the llama CLI and Python SDK
- For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository.
## Thinking out loud here in terms of what to write in the docs
- how to get a llama stack server running
- what are all the different client sdks
- what are the components of building agents (a rough sketch of the agentic loop follows below)
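A rough sketch of how those components fit together is shown below. The import paths, model id, and prompts are illustrative assumptions and may need adjusting to the installed `llama-stack-client` version; this is not the final documented example.

```python
# Rough sketch of the agentic loop: configure an agent, open a session, then
# feed it turns and stream back the events. Import paths, the model id, and
# the prompts are illustrative assumptions.
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

client = LlamaStackClient(base_url="http://localhost:5001")

# The agent config picks the model, system instructions and (optionally) tools.
agent_config = AgentConfig(
    model="meta-llama/Llama-3.2-3B-Instruct",
    instructions="You are a helpful assistant.",
    enable_session_persistence=False,
)

agent = Agent(client, agent_config)
session_id = agent.create_session("demo-session")

# Each prompt becomes one turn; the response streams events (model steps,
# tool calls, final message) which the EventLogger prints as they arrive.
for prompt in ["What is Llama Stack?", "How do agents call tools?"]:
    turn = agent.create_turn(
        messages=[{"role": "user", "content": prompt}],
        session_id=session_id,
    )
    for log in EventLogger().log(turn):
        log.print()
```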