Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-03 09:53:45 +00:00)

Commit 5e9d28f0b4: Merge branch 'main' into add-mongodb-vector_io

1791 changed files with 125464 additions and 386541 deletions
@ -35,9 +35,6 @@ Here are the key topics that will help you build effective AI applications:

- **[Telemetry](./telemetry.mdx)** - Monitor and analyze your agents' performance and behavior
- **[Safety](./safety.mdx)** - Implement guardrails and safety measures to ensure responsible AI behavior

### 🎮 **Interactive Development**

- **[Playground](./playground.mdx)** - Interactive environment for testing and developing applications

## Application Patterns

### 🤖 **Conversational Agents**
@ -1,298 +0,0 @@
|
|||
---
|
||||
title: Llama Stack Playground
|
||||
description: Interactive interface to explore and experiment with Llama Stack capabilities
|
||||
sidebar_label: Playground
|
||||
sidebar_position: 10
|
||||
---
|
||||
|
||||
import Tabs from '@theme/Tabs';
|
||||
import TabItem from '@theme/TabItem';
|
||||
|
||||
# Llama Stack Playground
|
||||
|
||||
:::note[Experimental Feature]
|
||||
The Llama Stack Playground is currently experimental and subject to change. We welcome feedback and contributions to help improve it.
|
||||
:::
|
||||
|
||||
The Llama Stack Playground is a simple interface that aims to:
|
||||
- **Showcase capabilities and concepts** of Llama Stack in an interactive environment
|
||||
- **Demo end-to-end application code** to help users get started building their own applications
|
||||
- **Provide a UI** to help users inspect and understand Llama Stack API providers and resources
|
||||
|
||||
## Key Features
|
||||
|
||||
### Interactive Playground Pages
|
||||
|
||||
The playground provides interactive pages for users to explore Llama Stack API capabilities:
|
||||
|
||||
#### Chatbot Interface
|
||||
|
||||
<video
|
||||
controls
|
||||
autoPlay
|
||||
playsInline
|
||||
muted
|
||||
loop
|
||||
style={{width: '100%'}}
|
||||
>
|
||||
<source src="https://github.com/user-attachments/assets/8d2ef802-5812-4a28-96e1-316038c84cbf" type="video/mp4" />
|
||||
Your browser does not support the video tag.
|
||||
</video>
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="chat" label="Chat">
|
||||
|
||||
**Simple Chat Interface**
|
||||
- Chat directly with Llama models through an intuitive interface
|
||||
- Uses the `/chat/completions` streaming API under the hood
|
||||
- Real-time message streaming for responsive interactions
|
||||
- Perfect for testing model capabilities and prompt engineering
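
For reference, the same flow can be exercised from the command line with the `llama-stack-client` CLI before moving to the playground UI (a minimal sketch; assumes a server running at `http://localhost:8321` and uses whichever model your distribution has registered):

```bash
# See which models the server has registered
llama-stack-client --endpoint http://localhost:8321 models list

# Send a single chat message (same API the playground chat page uses)
llama-stack-client --endpoint http://localhost:8321 inference chat-completion \
  --message "hello, what model are you?"
```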
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="rag" label="RAG Chat">
|
||||
|
||||
**Document-Aware Conversations**
|
||||
- Upload documents to create memory banks
|
||||
- Chat with a RAG-enabled agent that can query your documents
|
||||
- Uses Llama Stack's `/agents` API to create and manage RAG sessions
|
||||
- Ideal for exploring knowledge-enhanced AI applications
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
#### Evaluation Interface
|
||||
|
||||
<video
|
||||
controls
|
||||
autoPlay
|
||||
playsInline
|
||||
muted
|
||||
loop
|
||||
style={{width: '100%'}}
|
||||
>
|
||||
<source src="https://github.com/user-attachments/assets/6cc1659f-eba4-49ca-a0a5-7c243557b4f5" type="video/mp4" />
|
||||
Your browser does not support the video tag.
|
||||
</video>
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="scoring" label="Scoring Evaluations">
|
||||
|
||||
**Custom Dataset Evaluation**
|
||||
- Upload your own evaluation datasets
|
||||
- Run evaluations using available scoring functions
|
||||
- Uses Llama Stack's `/scoring` API for flexible evaluation workflows
|
||||
- Great for testing application performance on custom metrics
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="benchmarks" label="Benchmark Evaluations">
|
||||
|
||||
<video
|
||||
controls
|
||||
autoPlay
|
||||
playsInline
|
||||
muted
|
||||
loop
|
||||
style={{width: '100%', marginBottom: '1rem'}}
|
||||
>
|
||||
<source src="https://github.com/user-attachments/assets/345845c7-2a2b-4095-960a-9ae40f6a93cf" type="video/mp4" />
|
||||
Your browser does not support the video tag.
|
||||
</video>
|
||||
|
||||
**Pre-registered Evaluation Tasks**
|
||||
- Evaluate models or agents on pre-defined tasks
|
||||
- Uses Llama Stack's `/eval` API for comprehensive evaluation
|
||||
- Combines datasets and scoring functions for standardized testing
|
||||
|
||||
**Setup Requirements:**
|
||||
Register evaluation datasets and benchmarks first:
|
||||
|
||||
```bash
|
||||
# Register evaluation dataset
|
||||
llama-stack-client datasets register \
|
||||
--dataset-id "mmlu" \
|
||||
--provider-id "huggingface" \
|
||||
--url "https://huggingface.co/datasets/llamastack/evals" \
|
||||
--metadata '{"path": "llamastack/evals", "name": "evals__mmlu__details", "split": "train"}' \
|
||||
--schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string"}, "chat_completion_input": {"type": "string"}}'
|
||||
|
||||
# Register benchmark task
|
||||
llama-stack-client benchmarks register \
|
||||
--eval-task-id meta-reference-mmlu \
|
||||
--provider-id meta-reference \
|
||||
--dataset-id mmlu \
|
||||
--scoring-functions basic::regex_parser_multiple_choice_answer
|
||||
```
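
After registering, you can confirm that both resources are visible to the server before launching an evaluation from the playground (assumes the default endpoint; add `--endpoint` if your server is not on `localhost:8321`):

```bash
# Verify the dataset and benchmark registrations
llama-stack-client datasets list
llama-stack-client benchmarks list
```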
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
#### Inspection Interface
|
||||
|
||||
<video
|
||||
controls
|
||||
autoPlay
|
||||
playsInline
|
||||
muted
|
||||
loop
|
||||
style={{width: '100%'}}
|
||||
>
|
||||
<source src="https://github.com/user-attachments/assets/01d52b2d-92af-4e3a-b623-a9b8ba22ba99" type="video/mp4" />
|
||||
Your browser does not support the video tag.
|
||||
</video>
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="providers" label="API Providers">
|
||||
|
||||
**Provider Management**
|
||||
- Inspect available Llama Stack API providers
|
||||
- View provider configurations and capabilities
|
||||
- Uses the `/providers` API for real-time provider information
|
||||
- Essential for understanding your deployment's capabilities
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="resources" label="API Resources">
|
||||
|
||||
**Resource Exploration**
|
||||
- Inspect Llama Stack API resources including:
|
||||
- **Models**: Available language models
|
||||
- **Datasets**: Registered evaluation datasets
|
||||
- **Memory Banks**: Vector databases and knowledge stores
|
||||
- **Benchmarks**: Evaluation tasks and scoring functions
|
||||
- **Shields**: Safety and content moderation tools
|
||||
- Uses `/<resources>/list` APIs for comprehensive resource visibility
|
||||
- For detailed information about resources, see [Core Concepts](/docs/concepts)
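
The same listings are available from the CLI if you prefer a terminal view; each command below maps to one of the resource list APIs mentioned above (a sketch, assuming a locally running server):

```bash
# Inspect providers and resources from the command line
llama-stack-client providers list
llama-stack-client models list
llama-stack-client datasets list
llama-stack-client shields list
```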
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
## Getting Started
|
||||
|
||||
### Quick Start Guide
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="setup" label="Setup">
|
||||
|
||||
**1. Start the Llama Stack API Server**
|
||||
|
||||
```bash
|
||||
llama stack list-deps together | xargs -L1 uv pip install
|
||||
llama stack run together
|
||||
```
|
||||
|
||||
**2. Start the Streamlit UI**
|
||||
|
||||
```bash
|
||||
# Launch the playground interface
|
||||
uv run --with ".[ui]" streamlit run llama_stack/core/ui/app.py
|
||||
```
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="usage" label="Usage Tips">
|
||||
|
||||
**Making the Most of the Playground:**
|
||||
|
||||
- **Start with Chat**: Test basic model interactions and prompt engineering
|
||||
- **Explore RAG**: Upload sample documents to see knowledge-enhanced responses
|
||||
- **Try Evaluations**: Use the scoring interface to understand evaluation metrics
|
||||
- **Inspect Resources**: Check what providers and resources are available
|
||||
- **Experiment with Settings**: Adjust parameters to see how they affect results
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
### Available Distributions
|
||||
|
||||
The playground works with any Llama Stack distribution. Popular options include:
|
||||
|
||||
<Tabs>
|
||||
<TabItem value="together" label="Together AI">
|
||||
|
||||
```bash
|
||||
llama stack list-deps together | xargs -L1 uv pip install
|
||||
llama stack run together
|
||||
```
|
||||
|
||||
**Features:**
|
||||
- Cloud-hosted models
|
||||
- Fast inference
|
||||
- Multiple model options
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="ollama" label="Ollama (Local)">
|
||||
|
||||
```bash
|
||||
llama stack list-deps ollama | xargs -L1 uv pip install
|
||||
llama stack run ollama
|
||||
```
|
||||
|
||||
**Features:**
|
||||
- Local model execution
|
||||
- Privacy-focused
|
||||
- No internet required
|
||||
|
||||
</TabItem>
|
||||
<TabItem value="meta-reference" label="Meta Reference">
|
||||
|
||||
```bash
|
||||
llama stack list-deps meta-reference | xargs -L1 uv pip install
|
||||
llama stack run meta-reference
|
||||
```
|
||||
|
||||
**Features:**
|
||||
- Reference implementation
|
||||
- All API features available
|
||||
- Best for development
|
||||
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
||||
## Use Cases & Examples
|
||||
|
||||
### Educational Use Cases
|
||||
- **Learning Llama Stack**: Hands-on exploration of API capabilities
|
||||
- **Prompt Engineering**: Interactive testing of different prompting strategies
|
||||
- **RAG Experimentation**: Understanding how document retrieval affects responses
|
||||
- **Evaluation Understanding**: See how different metrics evaluate model performance
|
||||
|
||||
### Development Use Cases
|
||||
- **Prototype Testing**: Quick validation of application concepts
|
||||
- **API Exploration**: Understanding available endpoints and parameters
|
||||
- **Integration Planning**: Seeing how different components work together
|
||||
- **Demo Creation**: Showcasing Llama Stack capabilities to stakeholders
|
||||
|
||||
### Research Use Cases
|
||||
- **Model Comparison**: Side-by-side testing of different models
|
||||
- **Evaluation Design**: Understanding how scoring functions work
|
||||
- **Safety Testing**: Exploring shield effectiveness with different inputs
|
||||
- **Performance Analysis**: Measuring model behavior across different scenarios
|
||||
|
||||
## Best Practices
|
||||
|
||||
### 🚀 **Getting Started**
|
||||
- Begin with simple chat interactions to understand basic functionality
|
||||
- Gradually explore more advanced features like RAG and evaluations
|
||||
- Use the inspection tools to understand your deployment's capabilities
|
||||
|
||||
### 🔧 **Development Workflow**
|
||||
- Use the playground to prototype before writing application code
|
||||
- Test different parameter settings interactively
|
||||
- Validate evaluation approaches before implementing them programmatically
|
||||
|
||||
### 📊 **Evaluation & Testing**
|
||||
- Start with simple scoring functions before trying complex evaluations
|
||||
- Use the playground to understand evaluation results before automation
|
||||
- Test safety features with various input types
|
||||
|
||||
### 🎯 **Production Preparation**
|
||||
- Use playground insights to inform your production API usage
|
||||
- Test edge cases and error conditions interactively
|
||||
- Validate resource configurations before deployment
|
||||
|
||||
## Related Resources
|
||||
|
||||
- **[Getting Started Guide](../getting_started/quickstart)** - Complete setup and introduction
|
||||
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack fundamentals
|
||||
- **[Agents](./agent)** - Building intelligent agents
|
||||
- **[RAG (Retrieval Augmented Generation)](./rag)** - Knowledge-enhanced applications
|
||||
- **[Evaluations](./evals)** - Comprehensive evaluation framework
|
||||
- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation
|
||||
|
|
@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';
|
|||
|
||||
# Kubernetes Deployment Guide
|
||||
|
||||
Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers both local development with Kind and production deployment on AWS EKS.
|
||||
Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers deployment on a local Kind cluster, using the Kubernetes operator to manage the Llama Stack server; the vLLM inference server is deployed manually.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
|
|
@ -110,115 +110,176 @@ spec:
|
|||
EOF
|
||||
```
|
||||
|
||||
### Step 3: Configure Llama Stack
|
||||
### Step 3: Install Kubernetes Operator
|
||||
|
||||
Update your run configuration:
|
||||
|
||||
```yaml
|
||||
providers:
|
||||
inference:
|
||||
- provider_id: vllm
|
||||
provider_type: remote::vllm
|
||||
config:
|
||||
url: http://vllm-server.default.svc.cluster.local:8000/v1
|
||||
max_tokens: 4096
|
||||
api_token: fake
|
||||
```
|
||||
|
||||
Build container image:
|
||||
Install the Llama Stack Kubernetes operator to manage Llama Stack deployments:
|
||||
|
||||
```bash
|
||||
tmp_dir=$(mktemp -d) && cat >$tmp_dir/Containerfile.llama-stack-run-k8s <<EOF
|
||||
FROM distribution-myenv:dev
|
||||
RUN apt-get update && apt-get install -y git
|
||||
RUN git clone https://github.com/meta-llama/llama-stack.git /app/llama-stack-source
|
||||
ADD ./vllm-llama-stack-run-k8s.yaml /app/config.yaml
|
||||
EOF
|
||||
podman build -f $tmp_dir/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s $tmp_dir
|
||||
# Install from the latest main branch
|
||||
kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/main/release/operator.yaml
|
||||
|
||||
# Or install a specific version (e.g., v0.4.0)
|
||||
# kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/v0.4.0/release/operator.yaml
|
||||
```
|
||||
|
||||
### Step 4: Deploy Llama Stack Server
|
||||
Verify the operator is running:
|
||||
|
||||
```bash
|
||||
kubectl get pods -n llama-stack-operator-system
|
||||
```
|
||||
|
||||
For more information about the operator, see the [llama-stack-k8s-operator repository](https://github.com/llamastack/llama-stack-k8s-operator).
|
||||
|
||||
### Step 4: Deploy Llama Stack Server using Operator
|
||||
|
||||
Create a `LlamaStackDistribution` custom resource to deploy the Llama Stack server. The operator will automatically create the necessary Deployment, Service, and other resources.
|
||||
You can optionally override the default `run.yaml` using `spec.server.userConfig` with a ConfigMap (see [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec)).
|
||||
|
||||
```yaml
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
apiVersion: llamastack.io/v1alpha1
|
||||
kind: LlamaStackDistribution
|
||||
metadata:
|
||||
name: llama-pvc
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
resources:
|
||||
requests:
|
||||
storage: 1Gi
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: llama-stack-server
|
||||
name: llamastack-vllm
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: llama-stack
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app.kubernetes.io/name: llama-stack
|
||||
spec:
|
||||
containers:
|
||||
- name: llama-stack
|
||||
image: localhost/llama-stack-run-k8s:latest
|
||||
imagePullPolicy: IfNotPresent
|
||||
command: ["llama", "stack", "run", "/app/config.yaml"]
|
||||
ports:
|
||||
- containerPort: 5000
|
||||
volumeMounts:
|
||||
- name: llama-storage
|
||||
mountPath: /root/.llama
|
||||
volumes:
|
||||
- name: llama-storage
|
||||
persistentVolumeClaim:
|
||||
claimName: llama-pvc
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: llama-stack-service
|
||||
spec:
|
||||
selector:
|
||||
app.kubernetes.io/name: llama-stack
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 5000
|
||||
targetPort: 5000
|
||||
type: ClusterIP
|
||||
server:
|
||||
distribution:
|
||||
name: starter
|
||||
containerSpec:
|
||||
port: 8321
|
||||
env:
|
||||
- name: VLLM_URL
|
||||
value: "http://vllm-server.default.svc.cluster.local:8000/v1"
|
||||
- name: VLLM_MAX_TOKENS
|
||||
value: "4096"
|
||||
- name: VLLM_API_TOKEN
|
||||
value: "fake"
|
||||
# Optional: override run.yaml from a ConfigMap using userConfig
|
||||
userConfig:
|
||||
configMap:
|
||||
name: llama-stack-config
|
||||
storage:
|
||||
size: "20Gi"
|
||||
mountPath: "/home/lls/.lls"
|
||||
EOF
|
||||
```
|
||||
|
||||
**Configuration Options:**
|
||||
|
||||
- `replicas`: Number of Llama Stack server instances to run
|
||||
- `server.distribution.name`: The distribution to use (e.g., `starter` for the starter distribution). See the [list of supported distributions](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/distributions.json) in the operator repository.
|
||||
- `server.distribution.image`: (Optional) Custom container image for non-supported distributions. Use this field when deploying a distribution that is not in the supported list. If specified, this takes precedence over `name`.
|
||||
- `server.containerSpec.port`: Port on which the Llama Stack server listens (default: 8321)
|
||||
- `server.containerSpec.env`: Environment variables used to configure providers (e.g., `VLLM_URL`, `VLLM_API_TOKEN` in the example above)
|
||||
- `server.userConfig`: (Optional) Override the default `run.yaml` using a ConfigMap, as sketched below. See [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec).
|
||||
- `server.storage.size`: Size of the persistent volume for model and data storage
|
||||
- `server.storage.mountPath`: Where to mount the storage in the container
|
||||
|
||||
**Note:** For a complete list of supported distributions, see [distributions.json](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/distributions.json) in the operator repository. To use a custom or non-supported distribution, set the `server.distribution.image` field with your container image instead of `server.distribution.name`.
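
As a sketch of the optional `userConfig` override referenced in the list above, the ConfigMap named in the custom resource (`llama-stack-config`) carries your customized `run.yaml` as a data key. The key name `run.yaml` used here is an assumption; confirm the expected key against the userConfig spec linked above.

```bash
# Hypothetical ConfigMap holding a custom run.yaml for the userConfig override
kubectl create configmap llama-stack-config --from-file=run.yaml=./my-run.yaml
```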
|
||||
|
||||
The operator automatically creates:
|
||||
- A Deployment for the Llama Stack server
|
||||
- A Service to access the server
|
||||
- A PersistentVolumeClaim for storage
|
||||
- All necessary RBAC resources
|
||||
|
||||
|
||||
Check the status of your deployment:
|
||||
|
||||
```bash
|
||||
kubectl get llamastackdistribution
|
||||
kubectl describe llamastackdistribution llamastack-vllm
|
||||
```
|
||||
|
||||
### Step 5: Test Deployment
|
||||
|
||||
Wait for the Llama Stack server pod to be ready:
|
||||
|
||||
```bash
|
||||
# Port forward and test
|
||||
kubectl port-forward service/llama-stack-service 5000:5000
|
||||
llama-stack-client --endpoint http://localhost:5000 inference chat-completion --message "hello, what model are you?"
|
||||
# Check the status of the LlamaStackDistribution
|
||||
kubectl get llamastackdistribution llamastack-vllm
|
||||
|
||||
# Check the pods created by the operator
|
||||
kubectl get pods -l app.kubernetes.io/name=llama-stack
|
||||
|
||||
# Wait for the pod to be ready
|
||||
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=llama-stack --timeout=300s
|
||||
```
|
||||
|
||||
Get the service name created by the operator (it typically follows the pattern `<llamastackdistribution-name>-service`):
|
||||
|
||||
```bash
|
||||
# List services to find the service name
|
||||
kubectl get services | grep llamastack
|
||||
|
||||
# Port forward and test (replace SERVICE_NAME with the actual service name)
|
||||
kubectl port-forward service/llamastack-vllm-service 8321:8321
|
||||
```
|
||||
|
||||
In another terminal, test the deployment:
|
||||
|
||||
```bash
|
||||
llama-stack-client --endpoint http://localhost:8321 inference chat-completion --message "hello, what model are you?"
|
||||
```
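
You can also list the models the server has registered to confirm the vLLM-backed model is available:

```bash
llama-stack-client --endpoint http://localhost:8321 models list
```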
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
**Check pod status:**
|
||||
### vLLM Server Issues
|
||||
|
||||
**Check vLLM pod status:**
|
||||
```bash
|
||||
kubectl get pods -l app.kubernetes.io/name=vllm
|
||||
kubectl logs -l app.kubernetes.io/name=vllm
|
||||
```
|
||||
|
||||
**Test service connectivity:**
|
||||
**Test vLLM service connectivity:**
|
||||
```bash
|
||||
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://vllm-server:8000/v1/models
|
||||
```
|
||||
|
||||
### Llama Stack Server Issues
|
||||
|
||||
**Check LlamaStackDistribution status:**
|
||||
```bash
|
||||
# Get detailed status
|
||||
kubectl describe llamastackdistribution llamastack-vllm
|
||||
|
||||
# Check for events
|
||||
kubectl get events --sort-by='.lastTimestamp' | grep llamastack-vllm
|
||||
```
|
||||
|
||||
**Check operator-managed pods:**
|
||||
```bash
|
||||
# List all pods managed by the operator
|
||||
kubectl get pods -l app.kubernetes.io/name=llama-stack
|
||||
|
||||
# Check pod logs (replace POD_NAME with actual pod name)
|
||||
kubectl logs -l app.kubernetes.io/name=llama-stack
|
||||
```
|
||||
|
||||
**Check operator status:**
|
||||
```bash
|
||||
# Verify the operator is running
|
||||
kubectl get pods -n llama-stack-operator-system
|
||||
|
||||
# Check operator logs if issues persist
|
||||
kubectl logs -n llama-stack-operator-system -l control-plane=controller-manager
|
||||
```
|
||||
|
||||
**Verify service connectivity:**
|
||||
```bash
|
||||
# Get the service endpoint
|
||||
kubectl get svc llamastack-vllm-service
|
||||
|
||||
# Test connectivity from within the cluster
|
||||
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://llamastack-vllm-service:8321/health
|
||||
```
|
||||
|
||||
## Related Resources
|
||||
|
||||
- **[Deployment Overview](/docs/deploying/)** - Overview of deployment options
|
||||
- **[Distributions](/docs/distributions)** - Understanding Llama Stack distributions
|
||||
- **[Configuration](/docs/distributions/configuration)** - Detailed configuration options
|
||||
- **[LlamaStack Operator](https://github.com/llamastack/llama-stack-k8s-operator)** - Overview of llama-stack kubernetes operator
|
||||
- **[LlamaStackDistribution](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md)** - API Spec of the llama-stack operator Custom Resource.
|
||||
|
|
|
|||
|
|
@ -11,7 +11,7 @@ If you are planning to use an external service for Inference (even Ollama or TGI
|
|||
This avoids the overhead of setting up a server.
|
||||
```bash
|
||||
# setup
|
||||
uv pip install llama-stack
|
||||
uv pip install llama-stack llama-stack-client
|
||||
llama stack list-deps starter | xargs -L1 uv pip install
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@ -19,3 +19,4 @@ This section provides an overview of the distributions available in Llama Stack.
|
|||
- **[Starting Llama Stack Server](./starting_llama_stack_server.mdx)** - How to run distributions
|
||||
- **[Importing as Library](./importing_as_library.mdx)** - Use distributions in your code
|
||||
- **[Configuration Reference](./configuration.mdx)** - Configuration file format details
|
||||
- **[Llama Stack UI](./llama_stack_ui.mdx)** - Web-based user interface for interacting with Llama Stack servers
|
||||
|
|
|
|||
|
|
@ -44,7 +44,7 @@ spec:
|
|||
|
||||
# Navigate to the UI directory
|
||||
echo "Navigating to UI directory..."
|
||||
cd /app/llama_stack/ui
|
||||
cd /app/llama_stack_ui
|
||||
|
||||
# Check if package.json exists
|
||||
if [ ! -f "package.json" ]; then
|
||||
|
|
|
|||
docs/docs/distributions/llama_stack_ui.mdx (new file, 109 lines)

@ -0,0 +1,109 @@
|
|||
---
|
||||
title: Llama Stack UI
|
||||
description: Web-based user interface for interacting with Llama Stack servers
|
||||
sidebar_label: Llama Stack UI
|
||||
sidebar_position: 8
|
||||
---
|
||||
|
||||
# Llama Stack UI
|
||||
|
||||
The Llama Stack UI is a web-based interface for interacting with Llama Stack servers. Built with Next.js and React, it provides a visual way to work with agents, manage resources, and view logs.
|
||||
|
||||
## Features
|
||||
|
||||
- **Logs & Monitoring**: View chat completions, agent responses, and vector store activity
|
||||
- **Vector Stores**: Create and manage vector databases for RAG (Retrieval-Augmented Generation) workflows
|
||||
- **Prompt Management**: Create and manage reusable prompts
|
||||
|
||||
## Prerequisites
|
||||
|
||||
You need a running Llama Stack server. The UI is a client that connects to the Llama Stack backend.
|
||||
|
||||
If you don't have a Llama Stack server running yet, see the [Starting Llama Stack Server](../getting_started/starting_llama_stack_server.mdx) guide.
|
||||
|
||||
## Running the UI
|
||||
|
||||
### Option 1: Using npx (Recommended for Quick Start)
|
||||
|
||||
The fastest way to get started is using `npx`:
|
||||
|
||||
```bash
|
||||
npx llama-stack-ui
|
||||
```
|
||||
|
||||
This will start the UI server on `http://localhost:8322` (default port).
|
||||
|
||||
### Option 2: Using Docker
|
||||
|
||||
Run the UI in a container:
|
||||
|
||||
```bash
|
||||
docker run -p 8322:8322 llamastack/ui
|
||||
```
|
||||
|
||||
Access the UI at `http://localhost:8322`.
|
||||
|
||||
## Environment Variables
|
||||
|
||||
The UI can be configured using the following environment variables:
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `LLAMA_STACK_BACKEND_URL` | URL of your Llama Stack server | `http://localhost:8321` |
|
||||
| `LLAMA_STACK_UI_PORT` | Port for the UI server | `8322` |
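
Before starting the UI, it can help to confirm that the backend URL is reachable (a quick sanity check; the `/v1/health` route shown here is an assumption about your server version, and the Kubernetes guide in these docs uses `/health`, so adjust the path if needed):

```bash
# Confirm the Llama Stack backend is up before launching the UI
curl http://localhost:8321/v1/health
```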
|
||||
|
||||
If the Llama Stack server is running with authentication enabled, you can configure the UI to use it by setting the following environment variables:
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `NEXTAUTH_URL` | NextAuth URL for authentication | `http://localhost:8322` |
|
||||
| `GITHUB_CLIENT_ID` | GitHub OAuth client ID (optional, for authentication) | - |
|
||||
| `GITHUB_CLIENT_SECRET` | GitHub OAuth client secret (optional, for authentication) | - |
|
||||
|
||||
### Setting Environment Variables
|
||||
|
||||
#### For npx:
|
||||
|
||||
```bash
|
||||
LLAMA_STACK_BACKEND_URL=http://localhost:8321 \
|
||||
LLAMA_STACK_UI_PORT=8080 \
|
||||
npx llama-stack-ui
|
||||
```
|
||||
|
||||
#### For Docker:
|
||||
|
||||
```bash
|
||||
docker run -p 8080:8080 \
|
||||
-e LLAMA_STACK_BACKEND_URL=http://localhost:8321 \
|
||||
-e LLAMA_STACK_UI_PORT=8080 \
|
||||
llamastack/ui
|
||||
```
|
||||
|
||||
## Using the UI
|
||||
|
||||
### Managing Resources
|
||||
|
||||
- **Vector Stores**: Create vector databases for RAG workflows, view stored documents and embeddings
|
||||
- **Prompts**: Create and manage reusable prompt templates
|
||||
- **Chat Completions**: View history of chat interactions
|
||||
- **Responses**: Browse detailed agent responses and tool calls
|
||||
|
||||
## Development
|
||||
|
||||
If you want to run the UI from source for development:
|
||||
|
||||
```bash
|
||||
# From the project root
|
||||
cd src/llama_stack_ui
|
||||
|
||||
# Install dependencies
|
||||
npm install
|
||||
|
||||
# Set environment variables
|
||||
export LLAMA_STACK_BACKEND_URL=http://localhost:8321
|
||||
|
||||
# Start the development server
|
||||
npm run dev
|
||||
```
|
||||
|
||||
The development server will start on `http://localhost:8322` with hot reloading enabled.
|
||||
docs/docs/distributions/remote_hosted_distro/oci.md (new file, 143 lines)

@ -0,0 +1,143 @@
|
|||
---
|
||||
orphan: true
|
||||
---
|
||||
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
|
||||
# OCI Distribution
|
||||
|
||||
The `llamastack/distribution-oci` distribution consists of the following provider configurations.
|
||||
|
||||
| API | Provider(s) |
|
||||
|-----|-------------|
|
||||
| agents | `inline::meta-reference` |
|
||||
| datasetio | `remote::huggingface`, `inline::localfs` |
|
||||
| eval | `inline::meta-reference` |
|
||||
| files | `inline::localfs` |
|
||||
| inference | `remote::oci` |
|
||||
| safety | `inline::llama-guard` |
|
||||
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
|
||||
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
|
||||
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
|
||||
|
||||
|
||||
### Environment Variables
|
||||
|
||||
The following environment variables can be configured:
|
||||
|
||||
- `OCI_AUTH_TYPE`: OCI authentication type (instance_principal or config_file) (default: `instance_principal`)
|
||||
- `OCI_REGION`: OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1) (default: ``)
|
||||
- `OCI_COMPARTMENT_OCID`: OCI compartment ID for the Generative AI service (default: ``)
|
||||
- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if OCI_AUTH_TYPE is config_file) (default: `~/.oci/config`)
|
||||
- `OCI_CLI_PROFILE`: OCI CLI profile name to use from config file (default: `DEFAULT`)
|
||||
|
||||
|
||||
## Prerequisites
|
||||
### Oracle Cloud Infrastructure Setup
|
||||
|
||||
Before using the OCI Generative AI distribution, ensure you have:
|
||||
|
||||
1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/)
|
||||
2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy
|
||||
3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models
|
||||
4. **Authentication**: Configure authentication using either:
|
||||
- **Instance Principal** (recommended for cloud-hosted deployments)
|
||||
- **API Key** (for on-premises or development environments)
|
||||
|
||||
### Authentication Methods
|
||||
|
||||
#### Instance Principal Authentication (Recommended)
|
||||
Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments.
|
||||
|
||||
Requirements:
|
||||
- Instance must be running in an Oracle Cloud Infrastructure compartment
|
||||
- Instance must have appropriate IAM policies to access Generative AI services
|
||||
|
||||
#### API Key Authentication
|
||||
For development or on-premises deployments, follow [this doc](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm) to learn how to create your API signing key for your config file.
|
||||
|
||||
### Required IAM Policies
|
||||
|
||||
Ensure your OCI user or instance has the following policy statements:
|
||||
|
||||
```
|
||||
Allow group <group_name> to use generative-ai-inference-endpoints in compartment <compartment_name>
|
||||
Allow group <group_name> to manage generative-ai-inference-endpoints in compartment <compartment_name>
|
||||
```
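
A quick way to confirm that your credentials and profile are being picked up before starting the stack is any read-only OCI CLI call (assumes the OCI CLI is installed and configured with the same profile):

```bash
# Sanity-check OCI authentication with a read-only call
oci iam region list --profile DEFAULT
```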
|
||||
|
||||
## Supported Services
|
||||
|
||||
### Inference: OCI Generative AI
|
||||
Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports:
|
||||
|
||||
- **Chat Completions**: Conversational AI with context awareness
|
||||
- **Text Generation**: Complete prompts and generate text content
|
||||
|
||||
#### Available Models
|
||||
OCI Generative AI provides access to models from Meta, Cohere, OpenAI, xAI (Grok), and other providers.
|
||||
|
||||
### Safety: Llama Guard
|
||||
For content safety and moderation, this distribution uses Meta's LlamaGuard model through the OCI Generative AI service to provide:
|
||||
- Content filtering and moderation
|
||||
- Policy compliance checking
|
||||
- Harmful content detection
|
||||
|
||||
### Vector Storage: Multiple Options
|
||||
The distribution supports several vector storage providers:
|
||||
- **FAISS**: Local in-memory vector search
|
||||
- **ChromaDB**: Distributed vector database
|
||||
- **PGVector**: PostgreSQL with vector extensions
|
||||
|
||||
### Additional Services
|
||||
- **Dataset I/O**: Local filesystem and Hugging Face integration
|
||||
- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities
|
||||
- **Evaluation**: Meta reference evaluation framework
|
||||
|
||||
## Running Llama Stack with OCI
|
||||
|
||||
You can run the OCI distribution via Docker or local virtual environment.
|
||||
|
||||
### Via venv
|
||||
|
||||
If you've set up your local development environment, you can run the distribution directly from your virtual environment.
|
||||
|
||||
```bash
|
||||
OCI_AUTH_TYPE=$OCI_AUTH_TYPE OCI_REGION=$OCI_REGION OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID llama stack run --port 8321 oci
|
||||
```
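
### Via Docker

A container invocation follows the same pattern (a sketch; the image tag is inferred from the `llamastack/distribution-oci` name above, and instance-principal auth only works when the container itself runs on an OCI compute instance):

```bash
docker run \
  -it \
  -p 8321:8321 \
  -e OCI_AUTH_TYPE=$OCI_AUTH_TYPE \
  -e OCI_REGION=$OCI_REGION \
  -e OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID \
  llamastack/distribution-oci \
  --port 8321
```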
|
||||
|
||||
### Configuration Examples
|
||||
|
||||
#### Using Instance Principal (Recommended for Production)
|
||||
```bash
|
||||
export OCI_AUTH_TYPE=instance_principal
|
||||
export OCI_REGION=us-chicago-1
|
||||
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
|
||||
```
|
||||
|
||||
#### Using API Key Authentication (Development)
|
||||
```bash
|
||||
export OCI_AUTH_TYPE=config_file
|
||||
export OCI_CONFIG_FILE_PATH=~/.oci/config
|
||||
export OCI_CLI_PROFILE=DEFAULT
|
||||
export OCI_REGION=us-chicago-1
|
||||
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..your-compartment-id
|
||||
```
|
||||
|
||||
## Regional Endpoints
|
||||
|
||||
OCI Generative AI is available in multiple regions. The service automatically routes to the appropriate regional endpoint based on your configuration. For a full list of regional model availability, visit:
|
||||
|
||||
https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Authentication Errors**: Verify your OCI credentials and IAM policies
|
||||
2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region
|
||||
3. **Permission Denied**: Check compartment permissions and Generative AI service access
|
||||
4. **Region Unavailable**: Verify the specified region supports Generative AI services
|
||||
|
||||
### Getting Help
|
||||
|
||||
For additional support:
|
||||
- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)
|
||||
- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues)
|
||||
|
|
@ -163,7 +163,41 @@ docker run \
|
|||
--port $LLAMA_STACK_PORT
|
||||
```
|
||||
|
||||
### Via venv
|
||||
The container will run the distribution with a SQLite store by default. This store is used for the following components:
|
||||
|
||||
- Metadata store: stores metadata about models, providers, etc.
- Inference store: stores responses from the inference provider
- Agents store: stores agent configurations (sessions, turns, etc.)
- Agents Responses store: stores responses from the agents
|
||||
|
||||
However, you can use PostgreSQL instead by running the `starter::run-with-postgres-store.yaml` configuration:
|
||||
|
||||
```bash
|
||||
docker run \
|
||||
-it \
|
||||
--pull always \
|
||||
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
|
||||
-e OPENAI_API_KEY=your_openai_key \
|
||||
-e FIREWORKS_API_KEY=your_fireworks_key \
|
||||
-e TOGETHER_API_KEY=your_together_key \
|
||||
-e POSTGRES_HOST=your_postgres_host \
|
||||
-e POSTGRES_PORT=your_postgres_port \
|
||||
-e POSTGRES_DB=your_postgres_db \
|
||||
-e POSTGRES_USER=your_postgres_user \
|
||||
-e POSTGRES_PASSWORD=your_postgres_password \
|
||||
llamastack/distribution-starter \
|
||||
starter::run-with-postgres-store.yaml
|
||||
```
|
||||
|
||||
Postgres environment variables:
|
||||
|
||||
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
|
||||
- `POSTGRES_PORT`: Postgres port (default: `5432`)
|
||||
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
|
||||
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
|
||||
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
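
If you just want to try the PostgreSQL-backed configuration locally, a throwaway Postgres container matching the defaults above is enough (a sketch; any Postgres instance reachable from the server works):

```bash
# Disposable local Postgres matching the default llamastack credentials
docker run -d --name llamastack-postgres \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -p 5432:5432 \
  postgres:16
```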
|
||||
|
||||
### Via Conda or venv
|
||||
|
||||
Ensure you have configured the starter distribution using the environment variables explained above.
|
||||
|
||||
|
|
@ -171,8 +205,11 @@ Ensure you have configured the starter distribution using the environment variab
|
|||
# Install dependencies for the starter distribution
|
||||
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install
|
||||
|
||||
# Run the server
|
||||
# Run the server (with SQLite - default)
|
||||
uv run --with llama-stack llama stack run starter
|
||||
|
||||
# Or run with PostgreSQL
|
||||
uv run --with llama-stack llama stack run starter::run-with-postgres-store.yaml
|
||||
```
|
||||
|
||||
## Example Usage
|
||||
|
|
|
|||
|
|
@ -144,7 +144,7 @@ source .venv/bin/activate
|
|||
```bash
|
||||
uv venv client --python 3.12
|
||||
source client/bin/activate
|
||||
pip install llama-stack-client
|
||||
uv pip install llama-stack-client
|
||||
```
|
||||
</TabItem>
|
||||
</Tabs>
|
||||
|
|
|
|||
|
|
@ -1,5 +1,5 @@
|
|||
---
|
||||
description: "AWS Bedrock inference provider for accessing various AI models through AWS's managed service."
|
||||
description: "AWS Bedrock inference provider using OpenAI compatible endpoint."
|
||||
sidebar_label: Remote - Bedrock
|
||||
title: remote::bedrock
|
||||
---
|
||||
|
|
@ -8,7 +8,7 @@ title: remote::bedrock
|
|||
|
||||
## Description
|
||||
|
||||
AWS Bedrock inference provider for accessing various AI models through AWS's managed service.
|
||||
AWS Bedrock inference provider using OpenAI compatible endpoint.
|
||||
|
||||
## Configuration
|
||||
|
||||
|
|
@ -16,19 +16,12 @@ AWS Bedrock inference provider for accessing various AI models through AWS's man
|
|||
|-------|------|----------|---------|-------------|
|
||||
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
|
||||
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
|
||||
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
|
||||
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |
|
||||
| `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2.Default use environment variable: AWS_DEFAULT_REGION |
|
||||
| `profile_name` | `str \| None` | No | | The profile name that contains credentials to use.Default use environment variable: AWS_PROFILE |
|
||||
| `total_max_attempts` | `int \| None` | No | | An integer representing the maximum number of attempts that will be made for a single request, including the initial attempt. Default use environment variable: AWS_MAX_ATTEMPTS |
|
||||
| `retry_mode` | `str \| None` | No | | A string representing the type of retries Boto3 will perform.Default use environment variable: AWS_RETRY_MODE |
|
||||
| `connect_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
|
||||
| `read_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to read from a connection.The default is 60 seconds. |
|
||||
| `session_ttl` | `int \| None` | No | 3600 | The time in seconds till a session expires. The default is 3600 seconds (1 hour). |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
|
||||
| `region_name` | `<class 'str'>` | No | us-east-2 | AWS Region for the Bedrock Runtime endpoint |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
{}
|
||||
api_key: ${env.AWS_BEDROCK_API_KEY:=}
|
||||
region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
|
||||
```
|
||||
|
|
|
|||
docs/docs/providers/inference/remote_oci.mdx (new file, 41 lines)

@ -0,0 +1,41 @@
|
|||
---
|
||||
description: |
|
||||
Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
|
||||
Provider documentation
|
||||
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
|
||||
sidebar_label: Remote - Oci
|
||||
title: remote::oci
|
||||
---
|
||||
|
||||
# remote::oci
|
||||
|
||||
## Description
|
||||
|
||||
|
||||
Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
|
||||
Provider documentation
|
||||
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
|
||||
|
||||
|
||||
## Configuration
|
||||
|
||||
| Field | Type | Required | Default | Description |
|
||||
|-------|------|----------|---------|-------------|
|
||||
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
|
||||
| `oci_auth_type` | `<class 'str'>` | No | instance_principal | OCI authentication type (must be one of: instance_principal, config_file) |
|
||||
| `oci_region` | `<class 'str'>` | No | us-ashburn-1 | OCI region (e.g., us-ashburn-1) |
|
||||
| `oci_compartment_id` | `<class 'str'>` | No | | OCI compartment ID for the Generative AI service |
|
||||
| `oci_config_file_path` | `<class 'str'>` | No | ~/.oci/config | OCI config file path (required if oci_auth_type is config_file) |
|
||||
| `oci_config_profile` | `<class 'str'>` | No | DEFAULT | OCI config profile (required if oci_auth_type is config_file) |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal}
|
||||
oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config}
|
||||
oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT}
|
||||
oci_region: ${env.OCI_REGION:=us-ashburn-1}
|
||||
oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=}
|
||||
```
|
||||
|
|
@ -16,7 +16,7 @@ Passthrough inference provider for connecting to any external inference service
|
|||
|-------|------|----------|---------|-------------|
|
||||
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
|
||||
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | API Key for the passthrouth endpoint |
|
||||
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
|
||||
| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
|
||||
|
||||
## Sample Configuration
|
||||
|
|
|
|||
|
|
@ -48,11 +48,9 @@ Both OpenAI and Llama Stack support a web-search built-in tool. The [OpenAI doc
|
|||
|
||||
> The type of the web search tool. One of `web_search` or `web_search_2025_08_26`.
|
||||
|
||||
In contrast, the [Llama Stack documentation](https://llamastack.github.io/docs/api/create-a-new-open-ai-response) says that the allowed values for `type` for web search are `MOD1`, `MOD2` and `MOD3`.
|
||||
Is that correct? If so, what are the meanings of each of them? It might make sense for the allowed values for OpenAI map to some values for Llama Stack so that code written to the OpenAI specification
|
||||
also work with Llama Stack.
|
||||
Llama Stack now supports both `web_search` and `web_search_2025_08_26` types, matching OpenAI's API. For backward compatibility, Llama Stack also supports `web_search_preview` and `web_search_preview_2025_03_11` types.
|
||||
|
||||
The OpenAI web search tool also has fields for `filters` and `user_location` which are not documented as options for Llama Stack. If feasible, it would be good to support these too.
|
||||
The OpenAI web search tool also has fields for `filters` and `user_location` which are not yet implemented in Llama Stack. If feasible, it would be good to support these too.
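
For illustration, a request using the newer type values might look like the following. This is a sketch only: the endpoint path, port, and model id are assumptions based on the default server address used elsewhere in these docs, so check the Responses API reference for the exact route.

```bash
# Hypothetical request exercising the web_search tool type via the OpenAI-compatible Responses API
curl -s http://localhost:8321/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
        "model": "YOUR_MODEL_ID",
        "input": "What is the current weather in Paris?",
        "tools": [{"type": "web_search"}]
      }'
```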
|
||||
|
||||
---
|
||||
|
||||
|
|
|
|||
|
|
@ -37,7 +37,7 @@
|
|||
"outputs": [],
|
||||
"source": [
|
||||
"# NBVAL_SKIP\n",
|
||||
"!pip install -U llama-stack\n",
|
||||
"!pip install -U llama-stack llama-stack-client\n",
|
||||
"llama stack list-deps fireworks | xargs -L1 uv pip install\n"
|
||||
]
|
||||
},
|
||||
|
|
|
|||
|
|
@ -44,7 +44,7 @@
|
|||
"outputs": [],
|
||||
"source": [
|
||||
"# NBVAL_SKIP\n",
|
||||
"!pip install -U llama-stack"
|
||||
"!pip install -U llama-stack llama-stack-client\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|
|
|||
|
|
@ -74,6 +74,7 @@
|
|||
"source": [
|
||||
"```bash\n",
|
||||
"uv sync --extra dev\n",
|
||||
"uv pip install -U llama-stack-client\n",
|
||||
"uv pip install -e .\n",
|
||||
"source .venv/bin/activate\n",
|
||||
"```"
|
||||
|
|
|
|||
|
|
@ -170,7 +170,7 @@ def _get_endpoint_functions(
|
|||
for webmethod in webmethods:
|
||||
print(f"Processing {colored(func_name, 'white')}...")
|
||||
operation_name = func_name
|
||||
|
||||
|
||||
if webmethod.method == "GET":
|
||||
prefix = "get"
|
||||
elif webmethod.method == "DELETE":
|
||||
|
|
@ -196,16 +196,10 @@ def _get_endpoint_functions(
|
|||
def _get_defining_class(member_fn: str, derived_cls: type) -> type:
|
||||
"Find the class in which a member function is first defined in a class inheritance hierarchy."
|
||||
|
||||
# This import must be dynamic here
|
||||
from llama_stack.apis.tools import RAGToolRuntime, ToolRuntime
|
||||
|
||||
# iterate in reverse member resolution order to find most specific class first
|
||||
for cls in reversed(inspect.getmro(derived_cls)):
|
||||
for name, _ in inspect.getmembers(cls, inspect.isfunction):
|
||||
if name == member_fn:
|
||||
# HACK ALERT
|
||||
if cls == RAGToolRuntime:
|
||||
return ToolRuntime
|
||||
return cls
|
||||
|
||||
raise ValidationError(
|
||||
|
|
|
|||
|
|
@ -57,6 +57,7 @@ const sidebars: SidebarsConfig = {
|
|||
'distributions/importing_as_library',
|
||||
'distributions/configuration',
|
||||
'distributions/starting_llama_stack_server',
|
||||
'distributions/llama_stack_ui',
|
||||
{
|
||||
type: 'category',
|
||||
label: 'Self-Hosted Distributions',
|
||||
|
|
|
|||
docs/static/deprecated-llama-stack-spec.yaml (vendored, 1094 lines): diff suppressed because it is too large

docs/static/experimental-llama-stack-spec.yaml (vendored, 214 lines)
|
|
@ -162,7 +162,7 @@ paths:
|
|||
schema:
|
||||
$ref: '#/components/schemas/RegisterDatasetRequest'
|
||||
required: true
|
||||
deprecated: false
|
||||
deprecated: true
|
||||
/v1beta/datasets/{dataset_id}:
|
||||
get:
|
||||
responses:
|
||||
|
|
@ -219,7 +219,7 @@ paths:
|
|||
required: true
|
||||
schema:
|
||||
type: string
|
||||
deprecated: false
|
||||
deprecated: true
|
||||
/v1alpha/eval/benchmarks:
|
||||
get:
|
||||
responses:
|
||||
|
|
@ -270,7 +270,7 @@ paths:
|
|||
schema:
|
||||
$ref: '#/components/schemas/RegisterBenchmarkRequest'
|
||||
required: true
|
||||
deprecated: false
|
||||
deprecated: true
|
||||
/v1alpha/eval/benchmarks/{benchmark_id}:
|
||||
get:
|
||||
responses:
|
||||
|
|
@ -327,7 +327,7 @@ paths:
|
|||
required: true
|
||||
schema:
|
||||
type: string
|
||||
deprecated: false
|
||||
deprecated: true
|
||||
/v1alpha/eval/benchmarks/{benchmark_id}/evaluations:
|
||||
post:
|
||||
responses:
|
||||
|
|
@ -936,68 +936,6 @@ components:
|
|||
- data
|
||||
title: ListDatasetsResponse
|
||||
description: Response from listing datasets.
|
||||
DataSource:
|
||||
oneOf:
|
||||
- $ref: '#/components/schemas/URIDataSource'
|
||||
- $ref: '#/components/schemas/RowsDataSource'
|
||||
discriminator:
|
||||
propertyName: type
|
||||
mapping:
|
||||
uri: '#/components/schemas/URIDataSource'
|
||||
rows: '#/components/schemas/RowsDataSource'
|
||||
RegisterDatasetRequest:
|
||||
type: object
|
||||
properties:
|
||||
purpose:
|
||||
type: string
|
||||
enum:
|
||||
- post-training/messages
|
||||
- eval/question-answer
|
||||
- eval/messages-answer
|
||||
description: >-
|
||||
The purpose of the dataset. One of: - "post-training/messages": The dataset
|
||||
contains a messages column with list of messages for post-training. {
|
||||
"messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant",
|
||||
"content": "Hello, world!"}, ] } - "eval/question-answer": The dataset
|
||||
contains a question column and an answer column for evaluation. { "question":
|
||||
"What is the capital of France?", "answer": "Paris" } - "eval/messages-answer":
|
||||
The dataset contains a messages column with list of messages and an answer
|
||||
column for evaluation. { "messages": [ {"role": "user", "content": "Hello,
|
||||
my name is John Doe."}, {"role": "assistant", "content": "Hello, John
|
||||
Doe. How can I help you today?"}, {"role": "user", "content": "What's
|
||||
my name?"}, ], "answer": "John Doe" }
|
||||
source:
|
||||
$ref: '#/components/schemas/DataSource'
|
||||
description: >-
|
||||
The data source of the dataset. Ensure that the data source schema is
|
||||
compatible with the purpose of the dataset. Examples: - { "type": "uri",
|
||||
"uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri":
|
||||
"lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}"
|
||||
} - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train"
|
||||
} - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content":
|
||||
"Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ]
|
||||
} ] }
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
The metadata for the dataset. - E.g. {"description": "My dataset"}.
|
||||
dataset_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the dataset. If not provided, an ID will be generated.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- purpose
|
||||
- source
|
||||
title: RegisterDatasetRequest
|
||||
Benchmark:
|
||||
type: object
|
||||
properties:
|
||||
|
|
@ -1065,47 +1003,6 @@ components:
|
|||
required:
|
||||
- data
|
||||
title: ListBenchmarksResponse
|
||||
RegisterBenchmarkRequest:
|
||||
type: object
|
||||
properties:
|
||||
benchmark_id:
|
||||
type: string
|
||||
description: The ID of the benchmark to register.
|
||||
dataset_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the dataset to use for the benchmark.
|
||||
scoring_functions:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description: >-
|
||||
The scoring functions to use for the benchmark.
|
||||
provider_benchmark_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the provider benchmark to use for the benchmark.
|
||||
provider_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the provider to use for the benchmark.
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: The metadata to use for the benchmark.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- benchmark_id
|
||||
- dataset_id
|
||||
- scoring_functions
|
||||
title: RegisterBenchmarkRequest
|
||||
AggregationFunctionType:
|
||||
type: string
|
||||
enum:
|
||||
|
|
@ -2254,6 +2151,109 @@ components:
|
|||
- hyperparam_search_config
|
||||
- logger_config
|
||||
title: SupervisedFineTuneRequest
|
||||
DataSource:
|
||||
oneOf:
|
||||
- $ref: '#/components/schemas/URIDataSource'
|
||||
- $ref: '#/components/schemas/RowsDataSource'
|
||||
discriminator:
|
||||
propertyName: type
|
||||
mapping:
|
||||
uri: '#/components/schemas/URIDataSource'
|
||||
rows: '#/components/schemas/RowsDataSource'
|
||||
RegisterDatasetRequest:
|
||||
type: object
|
||||
properties:
|
||||
purpose:
|
||||
type: string
|
||||
enum:
|
||||
- post-training/messages
|
||||
- eval/question-answer
|
||||
- eval/messages-answer
|
||||
description: >-
|
||||
The purpose of the dataset. One of: - "post-training/messages": The dataset
|
||||
contains a messages column with list of messages for post-training. {
|
||||
"messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant",
|
||||
"content": "Hello, world!"}, ] } - "eval/question-answer": The dataset
|
||||
contains a question column and an answer column for evaluation. { "question":
|
||||
"What is the capital of France?", "answer": "Paris" } - "eval/messages-answer":
|
||||
The dataset contains a messages column with list of messages and an answer
|
||||
column for evaluation. { "messages": [ {"role": "user", "content": "Hello,
|
||||
my name is John Doe."}, {"role": "assistant", "content": "Hello, John
|
||||
Doe. How can I help you today?"}, {"role": "user", "content": "What's
|
||||
my name?"}, ], "answer": "John Doe" }
|
||||
source:
|
||||
$ref: '#/components/schemas/DataSource'
|
||||
description: >-
|
||||
The data source of the dataset. Ensure that the data source schema is
|
||||
compatible with the purpose of the dataset. Examples: - { "type": "uri",
|
||||
"uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri":
|
||||
"lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}"
|
||||
} - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train"
|
||||
} - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content":
|
||||
"Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ]
|
||||
} ] }
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: >-
|
||||
The metadata for the dataset. - E.g. {"description": "My dataset"}.
|
||||
dataset_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the dataset. If not provided, an ID will be generated.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- purpose
|
||||
- source
|
||||
title: RegisterDatasetRequest
|
||||
RegisterBenchmarkRequest:
|
||||
type: object
|
||||
properties:
|
||||
benchmark_id:
|
||||
type: string
|
||||
description: The ID of the benchmark to register.
|
||||
dataset_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the dataset to use for the benchmark.
|
||||
scoring_functions:
|
||||
type: array
|
||||
items:
|
||||
type: string
|
||||
description: >-
|
||||
The scoring functions to use for the benchmark.
|
||||
provider_benchmark_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the provider benchmark to use for the benchmark.
|
||||
provider_id:
|
||||
type: string
|
||||
description: >-
|
||||
The ID of the provider to use for the benchmark.
|
||||
metadata:
|
||||
type: object
|
||||
additionalProperties:
|
||||
oneOf:
|
||||
- type: 'null'
|
||||
- type: boolean
|
||||
- type: number
|
||||
- type: string
|
||||
- type: array
|
||||
- type: object
|
||||
description: The metadata to use for the benchmark.
|
||||
additionalProperties: false
|
||||
required:
|
||||
- benchmark_id
|
||||
- dataset_id
|
||||
- scoring_functions
|
||||
title: RegisterBenchmarkRequest
|
||||
responses:
|
||||
BadRequest400:
|
||||
description: The request was invalid or malformed
|
||||
|
|
|
|||
docs/static/llama-stack-spec.html (vendored, 13724 lines): diff suppressed because it is too large

docs/static/llama-stack-spec.yaml (vendored, 929 lines): diff suppressed because it is too large

docs/static/stainless-llama-stack-spec.yaml (vendored, 1143 lines): diff suppressed because it is too large