Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-06-28 02:53:30 +00:00

Add documentation for building applications, with some content for the agentic loop

This commit is contained in:
parent a29013112f
commit 1274fa4c0d

5 changed files with 424 additions and 16 deletions
@@ -9,3 +9,4 @@ sphinx-tabs
sphinx-design
sphinxcontrib-openapi
sphinxcontrib-redoc
sphinxcontrib-mermaid

@@ -1,17 +1,413 @@
# Building AI Applications

Llama Stack provides all the building blocks needed to create sophisticated AI applications. This guide will walk you through how to use these components effectively.

## Basic Inference

The foundation of any AI application is the ability to interact with LLMs. Llama Stack provides a simple interface for both completion and chat-based inference:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

# List available models
models = client.models.list()

# Simple chat completion
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about coding"}
    ]
)
print(response.completion_message.content)
```
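
The same call can also stream tokens as they are generated. The snippet below is a minimal sketch rather than the definitive API: it assumes your client version accepts a `stream=True` flag and yields chunks whose `event.delta` carries the incremental text, so adjust to the response shape you actually get back.

```python
# Stream the haiku token by token (chunk shape is version-dependent;
# here we assume each chunk exposes the new text at chunk.event.delta)
for chunk in client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,
):
    print(chunk.event.delta, end="", flush=True)
```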

## Adding Memory & RAG

Memory enables your applications to reference and recall information from previous interactions or external documents. Llama Stack's memory system is built around the concept of Memory Banks:

1. **Vector Memory Banks**: For semantic search and retrieval
2. **Key-Value Memory Banks**: For structured data storage
3. **Keyword Memory Banks**: For basic text search
4. **Graph Memory Banks**: For relationship-based retrieval

Here's how to set up a vector memory bank for RAG:

```python
# Register a memory bank
bank_id = "my_documents"
response = client.memory_banks.register(
    memory_bank_id=bank_id,
    params={
        "memory_bank_type": "vector",
        "embedding_model": "all-MiniLM-L6-v2",
        "chunk_size_in_tokens": 512
    }
)

# Insert documents
documents = [
    {
        "document_id": "doc1",
        "content": "Your document text here",
        "mime_type": "text/plain"
    }
]
client.memory.insert(bank_id, documents)

# Query documents
results = client.memory.query(
    bank_id=bank_id,
    query="What do you know about...",
)
```
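
The retrieved chunks can then be stitched into a regular chat completion. The snippet below is a sketch of that pattern, not a prescribed API: it assumes the query response exposes the matched chunks and their text under `results.chunks[i].content`, so check the actual response object in your client version.

```python
# Assemble retrieved chunks into a context block (attribute names assumed)
context = "\n\n".join(chunk.content for chunk in results.chunks)

# Let the model answer grounded in the retrieved context
response = client.inference.chat_completion(
    model_id="Llama3.2-3B-Instruct",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": "What do you know about..."}
    ]
)
print(response.completion_message.content)
```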

## Implementing Safety Guardrails

Safety is a critical component of any AI application. Llama Stack provides a Shield system that can be applied at multiple touchpoints:

```python
# Register a safety shield
shield_id = "content_safety"
client.shields.register(
    shield_id=shield_id,
    provider_shield_id="llama-guard-basic"
)

# Run content through shield
response = client.safety.run_shield(
    shield_id=shield_id,
    messages=[{"role": "user", "content": "User message here"}]
)

if response.violation:
    print(f"Safety violation detected: {response.violation.user_message}")
```
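
If you call inference directly rather than through the agent framework (which applies shields for you), you can run the same shield on both sides of the model call. Below is a rough sketch using only the calls shown above; the exact message fields `run_shield` expects for assistant-role messages may differ across client versions.

```python
def guarded_chat(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    # 1. Screen the input before it reaches the model
    check = client.safety.run_shield(shield_id=shield_id, messages=messages)
    if check.violation:
        return check.violation.user_message

    # 2. Run inference only if the input passed
    response = client.inference.chat_completion(
        model_id="Llama3.2-3B-Instruct",
        messages=messages,
    )
    answer = response.completion_message.content

    # 3. Screen the output before returning it (the assistant message shape
    #    is an assumption; some client versions require extra fields)
    check = client.safety.run_shield(
        shield_id=shield_id,
        messages=[{"role": "assistant", "content": answer}],
    )
    return check.violation.user_message if check.violation else answer
```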

## Building Agents

Agents are the heart of complex AI applications. They combine inference, memory, safety, and tool usage into coherent workflows. At its core, an agent follows a sophisticated execution loop that enables multi-step reasoning, tool usage, and safety checks.

### The Agent Execution Loop

Each agent turn follows these key steps:

1. **Initial Safety Check**: The user's input is first screened through the configured safety shields.

2. **Context Retrieval**:
   - If RAG is enabled, the agent queries relevant documents from memory banks
   - New documents are first inserted into the memory bank
   - Retrieved context is added to the user's prompt

3. **Inference Loop**: The agent enters its main execution loop:
   - The LLM receives the augmented prompt (with context and/or previous tool outputs)
   - The LLM generates a response, potentially with tool calls
   - If tool calls are present:
     - Tool inputs are safety-checked
     - Tools are executed (e.g., web search, code execution)
     - Tool responses are fed back to the LLM for synthesis
   - The loop continues until:
     - The LLM provides a final response without tool calls
     - The maximum number of iterations is reached
     - The token limit is exceeded

4. **Final Safety Check**: The agent's final response is screened through the safety shields.

```{mermaid}
sequenceDiagram
    participant U as User
    participant E as Executor
    participant M as Memory Bank
    participant L as LLM
    participant T as Tools
    participant S as Safety Shield

    Note over U,S: Agent Turn Start
    U->>S: 1. Submit Prompt
    activate S
    S->>E: Input Safety Check
    deactivate S

    E->>M: 2.1 Query Context
    M-->>E: 2.2 Retrieved Documents

    loop Inference Loop
        E->>L: 3.1 Augment with Context
        L-->>E: 3.2 Response (with/without tool calls)

        alt Has Tool Calls
            E->>S: Check Tool Input
            S->>T: 4.1 Execute Tool
            T-->>E: 4.2 Tool Response
            E->>L: 5.1 Tool Response
            L-->>E: 5.2 Synthesized Response
        end

        opt Stop Conditions
            Note over E: Break if:
            Note over E: - No tool calls
            Note over E: - Max iterations reached
            Note over E: - Token limit exceeded
        end
    end

    E->>S: Output Safety Check
    S->>U: 6. Final Response
```

Each step in this process can be monitored and controlled through configuration. Here's an example that demonstrates monitoring the agent's execution:

```python
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

agent_config = AgentConfig(
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    # Enable both RAG and tool usage
    tools=[
        {
            "type": "memory",
            "memory_bank_configs": [{
                "type": "vector",
                "bank_id": "my_docs"
            }],
            "max_tokens_in_context": 4096
        },
        {
            "type": "code_interpreter",
            "enable_inline_code_execution": True
        }
    ],
    # Configure safety
    input_shields=["content_safety"],
    output_shields=["content_safety"],
    # Control the inference loop
    max_infer_iters=5,
    sampling_params={
        "temperature": 0.7,
        "max_tokens": 2048
    }
)

agent = Agent(client, agent_config)
session_id = agent.create_session("monitored_session")

# Stream the agent's execution steps
response = agent.create_turn(
    messages=[{"role": "user", "content": "Analyze this code and run it"}],
    attachments=[{
        "content": "https://raw.githubusercontent.com/example/code.py",
        "mime_type": "text/plain"
    }],
    session_id=session_id
)

# Monitor each step of execution
for log in EventLogger().log(response):
    if log.event.step_type == "memory_retrieval":
        print("Retrieved context:", log.event.retrieved_context)
    elif log.event.step_type == "inference":
        print("LLM output:", log.event.model_response)
    elif log.event.step_type == "tool_execution":
        print("Tool call:", log.event.tool_call)
        print("Tool response:", log.event.tool_response)
    elif log.event.step_type == "shield_call":
        if log.event.violation:
            print("Safety violation:", log.event.violation)
```

Llama Stack provides a high-level agent framework that ties these components together:

```python
from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.types.agent_create_params import AgentConfig

# Configure an agent
agent_config = AgentConfig(
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    tools=[
        {
            "type": "memory",
            "memory_bank_configs": [],
            "query_generator_config": {
                "type": "default",
                "sep": " "
            }
        }
    ],
    input_shields=["content_safety"],
    output_shields=["content_safety"],
    enable_session_persistence=True
)

# Create an agent
agent = Agent(client, agent_config)
session_id = agent.create_session("my_session")

# Run agent turns
response = agent.create_turn(
    messages=[{"role": "user", "content": "Your question here"}],
    session_id=session_id
)
```
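
Turns that reuse the same `session_id` belong to the same conversation, so follow-up questions can build on earlier answers. A small illustration using the same call as above:

```python
# A follow-up turn in the same session sees the previous question and answer
response = agent.create_turn(
    messages=[{"role": "user", "content": "Can you elaborate on that last point?"}],
    session_id=session_id
)
```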

### Adding Tools to Agents

Agents can be enhanced with various tools:

1. **Search**: Web search capabilities through providers like Brave
2. **Code Interpreter**: Execute code snippets
3. **RAG**: Memory and document retrieval
4. **Function Calling**: Custom function execution
5. **WolframAlpha**: Mathematical computations
6. **Photogen**: Image generation

Example of configuring an agent with tools:

```python
agent_config = AgentConfig(
    model="Llama3.2-3B-Instruct",
    tools=[
        {
            "type": "brave_search",
            "api_key": "YOUR_API_KEY",
            "engine": "brave"
        },
        {
            "type": "code_interpreter",
            "enable_inline_code_execution": True
        }
    ],
    tool_choice="auto",
    tool_prompt_format="json"
)
```
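
With this configuration, a prompt that needs fresh information should cause the agent to call the search tool inside its inference loop. The sketch below assumes the `Agent`, `AgentConfig`, and `EventLogger` imports from the earlier examples; `log.print()` is the event logger's convenience printer.

```python
agent = Agent(client, agent_config)
session_id = agent.create_session("search_session")

response = agent.create_turn(
    messages=[{"role": "user", "content": "Search for the latest Llama Stack release and summarize it."}],
    session_id=session_id
)

# Tool invocations appear as tool_execution steps in the event log
for log in EventLogger().log(response):
    log.print()
```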

## Building RAG-Enhanced Agents

One of the most powerful patterns is combining agents with RAG capabilities. Here's a complete example:

```python
from llama_stack_client.types import Attachment

# Create attachments from documents
attachments = [
    Attachment(
        content="https://raw.githubusercontent.com/example/doc.rst",
        mime_type="text/plain"
    )
]

# Configure agent with memory
agent_config = AgentConfig(
    model="Llama3.2-3B-Instruct",
    instructions="You are a helpful assistant",
    tools=[{
        "type": "memory",
        "memory_bank_configs": [],
        "query_generator_config": {"type": "default", "sep": " "},
        "max_tokens_in_context": 4096,
        "max_chunks": 10
    }],
    enable_session_persistence=True
)

agent = Agent(client, agent_config)
session_id = agent.create_session("rag_session")

# Initial document ingestion
response = agent.create_turn(
    messages=[{
        "role": "user",
        "content": "I am providing some documents for reference."
    }],
    attachments=attachments,
    session_id=session_id
)

# Query with RAG
response = agent.create_turn(
    messages=[{
        "role": "user",
        "content": "What are the key topics in the documents?"
    }],
    session_id=session_id
)
```

## Testing & Evaluation

Llama Stack provides built-in tools for evaluating your applications:

1. **Benchmarking**: Test against standard datasets
2. **Application Evaluation**: Score your application's outputs
3. **Custom Metrics**: Define your own evaluation criteria

Here's how to set up basic evaluation:

```python
# Create an evaluation task
response = client.eval_tasks.register(
    eval_task_id="my_eval",
    dataset_id="my_dataset",
    scoring_functions=["accuracy", "relevance"]
)

# Run evaluation
job = client.eval.run_eval(
    task_id="my_eval",
    task_config={
        "type": "app",
        "eval_candidate": {
            "type": "agent",
            "config": agent_config
        }
    }
)

# Get results
result = client.eval.job_result(
    task_id="my_eval",
    job_id=job.job_id
)
```

## Debugging & Monitoring

Llama Stack includes comprehensive telemetry for debugging and monitoring your applications:

1. **Tracing**: Track request flows across components
2. **Metrics**: Measure performance and usage
3. **Logging**: Debug issues and track behavior

The telemetry system supports multiple output formats:

- OpenTelemetry for visualization in tools like Jaeger
- SQLite for local storage and querying
- Console output for development

Example of querying traces:

```python
# Query traces for a session
traces = client.telemetry.query_traces(
    attribute_filters=[{
        "key": "session_id",
        "op": "eq",
        "value": session_id
    }]
)

# Get detailed span information
span_tree = client.telemetry.get_span_tree(
    span_id=traces[0].root_span_id
)
```

For details on how to use the telemetry system to debug your applications, export traces to a dataset, and run evaluations, see the [Telemetry](telemetry) section.

```{toctree}

@@ -28,6 +28,7 @@ extensions = [
    "sphinx_tabs.tabs",
    "sphinx_design",
    "sphinxcontrib.redoc",
    "sphinxcontrib.mermaid",
]
myst_enable_extensions = ["colon_fence"]

@@ -47,6 +48,7 @@ exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
myst_enable_extensions = [
    "amsmath",
    "attrs_inline",
    "attrs_block",
    "colon_fence",
    "deflist",
    "dollarmath",

@@ -65,6 +67,7 @@ myst_substitutions = {
    "docker_hub": "https://hub.docker.com/repository/docker/llamastack",
}

# Copy button settings
copybutton_prompt_text = "$ "  # for bash prompts
copybutton_prompt_is_regexp = True

@@ -81,6 +81,8 @@ A few things to note:
- The configuration dictionary is provider-specific. Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value.

## Resources
```

Finally, let's look at the `models` section:
```yaml
models:

@@ -19,16 +19,17 @@ export LLAMA_STACK_PORT=5001
ollama run $OLLAMA_INFERENCE_MODEL --keepalive 60m
```

By default, Ollama keeps the model loaded in memory for 5 minutes, which can be too short. We set the `--keepalive` flag to 60 minutes to ensure the model remains loaded for some time.

### 2. Start the Llama Stack server

Llama Stack is based on a client-server architecture. It consists of a server which can be configured very flexibly, so you can mix and match various providers for its individual API components -- beyond Inference, these include Memory, Agents, Telemetry, Evals and so forth.

To get started quickly, we provide various Docker images for the server component that work with different inference providers out of the box. For this guide, we will use `llamastack/distribution-ollama` as the Docker image.

```bash
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  llamastack/distribution-ollama \
@@ -42,8 +43,7 @@ Configuration for this is available at `distributions/ollama/run.yaml`.

### 3. Use the Llama Stack client SDK

You can interact with the Llama Stack server using various client SDKs. We will use the Python SDK, which you can install using:

```bash
pip install llama-stack-client
```
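
Once installed, a quick way to confirm that the client can reach the server is to list the models it serves. This is a small sketch assuming the server from step 2 is listening on the port exported above; attribute names on the returned model objects may vary slightly between client versions.

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5001")

# Should print the model(s) served by the Ollama distribution
for model in client.models.list():
    print(model.identifier)
```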

@@ -123,7 +123,6 @@ async def run_main():

    agent = Agent(client, agent_config)
    session_id = agent.create_session("test-session")
    print(f"Created session_id={session_id} for Agent({agent.agent_id})")
    user_prompts = [
        (
            "I am attaching documentation for Torchtune. Help me answer questions I will ask next.",
@ -154,3 +153,10 @@ if __name__ == "__main__":
|
||||||
- Learn how to [Build Llama Stacks](../distributions/index.md)
|
- Learn how to [Build Llama Stacks](../distributions/index.md)
|
||||||
- See [References](../references/index.md) for more details about the llama CLI and Python SDK
|
- See [References](../references/index.md) for more details about the llama CLI and Python SDK
|
||||||
- For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository.
|
- For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository.
|
||||||
|
|
||||||
|
|
||||||
|
## Thinking out aloud here in terms of what to write in the docs
|
||||||
|
|
||||||
|
- how to get a llama stack server running
|
||||||
|
- what are all the different client sdks
|
||||||
|
- what are the components of building agents
|
||||||
|
|