Merge branch 'main' into add-mongodb-vector_io

This commit is contained in:
Young Han 2025-11-11 11:13:23 -08:00 committed by GitHub
commit 5e9d28f0b4
1791 changed files with 125464 additions and 386541 deletions


@ -35,9 +35,6 @@ Here are the key topics that will help you build effective AI applications:
- **[Telemetry](./telemetry.mdx)** - Monitor and analyze your agents' performance and behavior
- **[Safety](./safety.mdx)** - Implement guardrails and safety measures to ensure responsible AI behavior
### 🎮 **Interactive Development**
- **[Playground](./playground.mdx)** - Interactive environment for testing and developing applications
## Application Patterns
### 🤖 **Conversational Agents**


@ -1,298 +0,0 @@
---
title: Llama Stack Playground
description: Interactive interface to explore and experiment with Llama Stack capabilities
sidebar_label: Playground
sidebar_position: 10
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Llama Stack Playground
:::note[Experimental Feature]
The Llama Stack Playground is currently experimental and subject to change. We welcome feedback and contributions to help improve it.
:::
The Llama Stack Playground is a simple interface that aims to:
- **Showcase capabilities and concepts** of Llama Stack in an interactive environment
- **Demo end-to-end application code** to help users get started building their own applications
- **Provide a UI** to help users inspect and understand Llama Stack API providers and resources
## Key Features
### Interactive Playground Pages
The playground provides interactive pages for users to explore Llama Stack API capabilities:
#### Chatbot Interface
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%'}}
>
<source src="https://github.com/user-attachments/assets/8d2ef802-5812-4a28-96e1-316038c84cbf" type="video/mp4" />
Your browser does not support the video tag.
</video>
<Tabs>
<TabItem value="chat" label="Chat">
**Simple Chat Interface**
- Chat directly with Llama models through an intuitive interface
- Uses the `/chat/completions` streaming API under the hood
- Real-time message streaming for responsive interactions
- Perfect for testing model capabilities and prompt engineering
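The same `/chat/completions` API can be exercised from the command line, which is useful for comparing playground output with raw responses. A minimal sketch, assuming a Llama Stack server already running at `http://localhost:8321`:
```bash
# Send a single chat message to the running Llama Stack server
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion --message "hello, what model are you?"
```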
</TabItem>
<TabItem value="rag" label="RAG Chat">
**Document-Aware Conversations**
- Upload documents to create memory banks
- Chat with a RAG-enabled agent that can query your documents
- Uses Llama Stack's `/agents` API to create and manage RAG sessions
- Ideal for exploring knowledge-enhanced AI applications
</TabItem>
</Tabs>
#### Evaluation Interface
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%'}}
>
<source src="https://github.com/user-attachments/assets/6cc1659f-eba4-49ca-a0a5-7c243557b4f5" type="video/mp4" />
Your browser does not support the video tag.
</video>
<Tabs>
<TabItem value="scoring" label="Scoring Evaluations">
**Custom Dataset Evaluation**
- Upload your own evaluation datasets
- Run evaluations using available scoring functions
- Uses Llama Stack's `/scoring` API for flexible evaluation workflows
- Great for testing application performance on custom metrics
</TabItem>
<TabItem value="benchmarks" label="Benchmark Evaluations">
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%', marginBottom: '1rem'}}
>
<source src="https://github.com/user-attachments/assets/345845c7-2a2b-4095-960a-9ae40f6a93cf" type="video/mp4" />
Your browser does not support the video tag.
</video>
**Pre-registered Evaluation Tasks**
- Evaluate models or agents on pre-defined tasks
- Uses Llama Stack's `/eval` API for comprehensive evaluation
- Combines datasets and scoring functions for standardized testing
**Setup Requirements:**
Register evaluation datasets and benchmarks first:
```bash
# Register evaluation dataset
llama-stack-client datasets register \
--dataset-id "mmlu" \
--provider-id "huggingface" \
--url "https://huggingface.co/datasets/llamastack/evals" \
--metadata '{"path": "llamastack/evals", "name": "evals__mmlu__details", "split": "train"}' \
--schema '{"input_query": {"type": "string"}, "expected_answer": {"type": "string"}, "chat_completion_input": {"type": "string"}}'
# Register benchmark task
llama-stack-client benchmarks register \
--eval-task-id meta-reference-mmlu \
--provider-id meta-reference \
--dataset-id mmlu \
--scoring-functions basic::regex_parser_multiple_choice_answer
```
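Once registered, you can confirm the resources are visible to the server before running the benchmark from the playground. A quick check, assuming the `datasets list` and `benchmarks list` subcommands available in recent `llama-stack-client` releases:
```bash
# Verify the dataset and benchmark registrations
llama-stack-client datasets list
llama-stack-client benchmarks list
```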
</TabItem>
</Tabs>
#### Inspection Interface
<video
controls
autoPlay
playsInline
muted
loop
style={{width: '100%'}}
>
<source src="https://github.com/user-attachments/assets/01d52b2d-92af-4e3a-b623-a9b8ba22ba99" type="video/mp4" />
Your browser does not support the video tag.
</video>
<Tabs>
<TabItem value="providers" label="API Providers">
**Provider Management**
- Inspect available Llama Stack API providers
- View provider configurations and capabilities
- Uses the `/providers` API for real-time provider information
- Essential for understanding your deployment's capabilities
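The same information is available from the command line if you prefer to script it:
```bash
# List the providers configured on the connected Llama Stack server
llama-stack-client providers list
```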
</TabItem>
<TabItem value="resources" label="API Resources">
**Resource Exploration**
- Inspect Llama Stack API resources including:
- **Models**: Available language models
- **Datasets**: Registered evaluation datasets
- **Memory Banks**: Vector databases and knowledge stores
- **Benchmarks**: Evaluation tasks and scoring functions
- **Shields**: Safety and content moderation tools
- Uses `/<resources>/list` APIs for comprehensive resource visibility
- For detailed information about resources, see [Core Concepts](/docs/concepts)
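Each of these resources can also be listed from the command line; for example (a sketch, assuming the corresponding `list` subcommands of the `llama-stack-client` CLI):
```bash
# Inspect registered resources from the command line
llama-stack-client models list
llama-stack-client shields list
llama-stack-client datasets list
```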
</TabItem>
</Tabs>
## Getting Started
### Quick Start Guide
<Tabs>
<TabItem value="setup" label="Setup">
**1. Start the Llama Stack API Server**
```bash
llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
**2. Start the Streamlit UI**
```bash
# Launch the playground interface
uv run --with ".[ui]" streamlit run llama_stack.core/ui/app.py
```
</TabItem>
<TabItem value="usage" label="Usage Tips">
**Making the Most of the Playground:**
- **Start with Chat**: Test basic model interactions and prompt engineering
- **Explore RAG**: Upload sample documents to see knowledge-enhanced responses
- **Try Evaluations**: Use the scoring interface to understand evaluation metrics
- **Inspect Resources**: Check what providers and resources are available
- **Experiment with Settings**: Adjust parameters to see how they affect results
</TabItem>
</Tabs>
### Available Distributions
The playground works with any Llama Stack distribution. Popular options include:
<Tabs>
<TabItem value="together" label="Together AI">
```bash
llama stack list-deps together | xargs -L1 uv pip install
llama stack run together
```
**Features:**
- Cloud-hosted models
- Fast inference
- Multiple model options
</TabItem>
<TabItem value="ollama" label="Ollama (Local)">
```bash
llama stack list-deps ollama | xargs -L1 uv pip install
llama stack run ollama
```
**Features:**
- Local model execution
- Privacy-focused
- No internet required
</TabItem>
<TabItem value="meta-reference" label="Meta Reference">
```bash
llama stack list-deps meta-reference | xargs -L1 uv pip install
llama stack run meta-reference
```
**Features:**
- Reference implementation
- All API features available
- Best for development
</TabItem>
</Tabs>
## Use Cases & Examples
### Educational Use Cases
- **Learning Llama Stack**: Hands-on exploration of API capabilities
- **Prompt Engineering**: Interactive testing of different prompting strategies
- **RAG Experimentation**: Understanding how document retrieval affects responses
- **Evaluation Understanding**: See how different metrics evaluate model performance
### Development Use Cases
- **Prototype Testing**: Quick validation of application concepts
- **API Exploration**: Understanding available endpoints and parameters
- **Integration Planning**: Seeing how different components work together
- **Demo Creation**: Showcasing Llama Stack capabilities to stakeholders
### Research Use Cases
- **Model Comparison**: Side-by-side testing of different models
- **Evaluation Design**: Understanding how scoring functions work
- **Safety Testing**: Exploring shield effectiveness with different inputs
- **Performance Analysis**: Measuring model behavior across different scenarios
## Best Practices
### 🚀 **Getting Started**
- Begin with simple chat interactions to understand basic functionality
- Gradually explore more advanced features like RAG and evaluations
- Use the inspection tools to understand your deployment's capabilities
### 🔧 **Development Workflow**
- Use the playground to prototype before writing application code
- Test different parameter settings interactively
- Validate evaluation approaches before implementing them programmatically
### 📊 **Evaluation & Testing**
- Start with simple scoring functions before trying complex evaluations
- Use the playground to understand evaluation results before automation
- Test safety features with various input types
### 🎯 **Production Preparation**
- Use playground insights to inform your production API usage
- Test edge cases and error conditions interactively
- Validate resource configurations before deployment
## Related Resources
- **[Getting Started Guide](../getting_started/quickstart)** - Complete setup and introduction
- **[Core Concepts](/docs/concepts)** - Understanding Llama Stack fundamentals
- **[Agents](./agent)** - Building intelligent agents
- **[RAG (Retrieval Augmented Generation)](./rag)** - Knowledge-enhanced applications
- **[Evaluations](./evals)** - Comprehensive evaluation framework
- **[API Reference](/docs/api/llama-stack-specification)** - Complete API documentation


@ -10,7 +10,7 @@ import TabItem from '@theme/TabItem';
# Kubernetes Deployment Guide
Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers both local development with Kind and production deployment on AWS EKS.
Deploy Llama Stack and vLLM servers in a Kubernetes cluster instead of running them locally. This guide covers deployment using the Kubernetes operator to manage the Llama Stack server with Kind. The vLLM inference server is deployed manually.
## Prerequisites
@ -110,115 +110,176 @@ spec:
EOF
```
### Step 3: Configure Llama Stack
### Step 3: Install Kubernetes Operator
Update your run configuration:
```yaml
providers:
inference:
- provider_id: vllm
provider_type: remote::vllm
config:
url: http://vllm-server.default.svc.cluster.local:8000/v1
max_tokens: 4096
api_token: fake
```
Build container image:
Install the Llama Stack Kubernetes operator to manage Llama Stack deployments:
```bash
tmp_dir=$(mktemp -d) && cat >$tmp_dir/Containerfile.llama-stack-run-k8s <<EOF
FROM distribution-myenv:dev
RUN apt-get update && apt-get install -y git
RUN git clone https://github.com/meta-llama/llama-stack.git /app/llama-stack-source
ADD ./vllm-llama-stack-run-k8s.yaml /app/config.yaml
EOF
podman build -f $tmp_dir/Containerfile.llama-stack-run-k8s -t llama-stack-run-k8s $tmp_dir
# Install from the latest main branch
kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/main/release/operator.yaml
# Or install a specific version (e.g., v0.4.0)
# kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/v0.4.0/release/operator.yaml
```
### Step 4: Deploy Llama Stack Server
Verify the operator is running:
```bash
kubectl get pods -n llama-stack-operator-system
```
For more information about the operator, see the [llama-stack-k8s-operator repository](https://github.com/llamastack/llama-stack-k8s-operator).
### Step 4: Deploy Llama Stack Server using Operator
Create a `LlamaStackDistribution` custom resource to deploy the Llama Stack server. The operator will automatically create the necessary Deployment, Service, and other resources.
You can optionally override the default `run.yaml` using `spec.server.userConfig` with a ConfigMap (see [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec)).
```yaml
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
apiVersion: llamastack.io/v1alpha1
kind: LlamaStackDistribution
metadata:
name: llama-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: llama-stack-server
name: llamastack-vllm
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: llama-stack
template:
metadata:
labels:
app.kubernetes.io/name: llama-stack
spec:
containers:
- name: llama-stack
image: localhost/llama-stack-run-k8s:latest
imagePullPolicy: IfNotPresent
command: ["llama", "stack", "run", "/app/config.yaml"]
ports:
- containerPort: 5000
volumeMounts:
- name: llama-storage
mountPath: /root/.llama
volumes:
- name: llama-storage
persistentVolumeClaim:
claimName: llama-pvc
---
apiVersion: v1
kind: Service
metadata:
name: llama-stack-service
spec:
selector:
app.kubernetes.io/name: llama-stack
ports:
- protocol: TCP
port: 5000
targetPort: 5000
type: ClusterIP
server:
distribution:
name: starter
containerSpec:
port: 8321
env:
- name: VLLM_URL
value: "http://vllm-server.default.svc.cluster.local:8000/v1"
- name: VLLM_MAX_TOKENS
value: "4096"
- name: VLLM_API_TOKEN
value: "fake"
# Optional: override run.yaml from a ConfigMap using userConfig
userConfig:
configMap:
name: llama-stack-config
storage:
size: "20Gi"
mountPath: "/home/lls/.lls"
EOF
```
**Configuration Options:**
- `replicas`: Number of Llama Stack server instances to run
- `server.distribution.name`: The distribution to use (e.g., `starter` for the starter distribution). See the [list of supported distributions](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/distributions.json) in the operator repository.
- `server.distribution.image`: (Optional) Custom container image for non-supported distributions. Use this field when deploying a distribution that is not in the supported list. If specified, this takes precedence over `name`.
- `server.containerSpec.port`: Port on which the Llama Stack server listens (default: 8321)
- `server.containerSpec.env`: Environment variables to configure providers
- `server.userConfig`: (Optional) Override the default `run.yaml` using a ConfigMap. See [userConfig spec](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md#userconfigspec).
- `server.storage.size`: Size of the persistent volume for model and data storage
- `server.storage.mountPath`: Where to mount the storage in the container
**Note:** For a complete list of supported distributions, see [distributions.json](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/distributions.json) in the operator repository. To use a custom or non-supported distribution, set the `server.distribution.image` field with your container image instead of `server.distribution.name`.
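If you use the optional `userConfig` override, create the referenced ConfigMap before applying the custom resource. A minimal sketch, assuming a local `run.yaml` and the `llama-stack-config` name used in the example above (the exact key the operator expects is described in the userConfig spec linked earlier):
```bash
# Create the ConfigMap that spec.server.userConfig points to
kubectl create configmap llama-stack-config --from-file=run.yaml=./run.yaml
```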
The operator automatically creates:
- A Deployment for the Llama Stack server
- A Service to access the server
- A PersistentVolumeClaim for storage
- All necessary RBAC resources
Check the status of your deployment:
```bash
kubectl get llamastackdistribution
kubectl describe llamastackdistribution llamastack-vllm
```
### Step 5: Test Deployment
Wait for the Llama Stack server pod to be ready:
```bash
# Port forward and test
kubectl port-forward service/llama-stack-service 5000:5000
llama-stack-client --endpoint http://localhost:5000 inference chat-completion --message "hello, what model are you?"
# Check the status of the LlamaStackDistribution
kubectl get llamastackdistribution llamastack-vllm
# Check the pods created by the operator
kubectl get pods -l app.kubernetes.io/name=llama-stack
# Wait for the pod to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=llama-stack --timeout=300s
```
Get the service name created by the operator (it typically follows the pattern `<llamastackdistribution-name>-service`):
```bash
# List services to find the service name
kubectl get services | grep llamastack
# Port forward and test (replace SERVICE_NAME with the actual service name)
kubectl port-forward service/llamastack-vllm-service 8321:8321
```
In another terminal, test the deployment:
```bash
llama-stack-client --endpoint http://localhost:8321 inference chat-completion --message "hello, what model are you?"
```
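You can also check the health endpoint over the same port-forward (the same `/health` path used in the troubleshooting section below):
```bash
curl http://localhost:8321/health
```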
## Troubleshooting
**Check pod status:**
### vLLM Server Issues
**Check vLLM pod status:**
```bash
kubectl get pods -l app.kubernetes.io/name=vllm
kubectl logs -l app.kubernetes.io/name=vllm
```
**Test service connectivity:**
**Test vLLM service connectivity:**
```bash
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://vllm-server:8000/v1/models
```
### Llama Stack Server Issues
**Check LlamaStackDistribution status:**
```bash
# Get detailed status
kubectl describe llamastackdistribution llamastack-vllm
# Check for events
kubectl get events --sort-by='.lastTimestamp' | grep llamastack-vllm
```
**Check operator-managed pods:**
```bash
# List all pods managed by the operator
kubectl get pods -l app.kubernetes.io/name=llama-stack
# Check pod logs (replace POD_NAME with actual pod name)
kubectl logs -l app.kubernetes.io/name=llama-stack
```
**Check operator status:**
```bash
# Verify the operator is running
kubectl get pods -n llama-stack-operator-system
# Check operator logs if issues persist
kubectl logs -n llama-stack-operator-system -l control-plane=controller-manager
```
**Verify service connectivity:**
```bash
# Get the service endpoint
kubectl get svc llamastack-vllm-service
# Test connectivity from within the cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- curl http://llamastack-vllm-service:8321/health
```
## Related Resources
- **[Deployment Overview](/docs/deploying/)** - Overview of deployment options
- **[Distributions](/docs/distributions)** - Understanding Llama Stack distributions
- **[Configuration](/docs/distributions/configuration)** - Detailed configuration options
- **[LlamaStack Operator](https://github.com/llamastack/llama-stack-k8s-operator)** - Overview of the llama-stack Kubernetes operator
- **[LlamaStackDistribution](https://github.com/llamastack/llama-stack-k8s-operator/blob/main/docs/api-overview.md)** - API spec of the llama-stack operator custom resource


@ -11,7 +11,7 @@ If you are planning to use an external service for Inference (even Ollama or TGI
This avoids the overhead of setting up a server.
```bash
# setup
uv pip install llama-stack
uv pip install llama-stack llama-stack-client
llama stack list-deps starter | xargs -L1 uv pip install
```


@ -19,3 +19,4 @@ This section provides an overview of the distributions available in Llama Stack.
- **[Starting Llama Stack Server](./starting_llama_stack_server.mdx)** - How to run distributions
- **[Importing as Library](./importing_as_library.mdx)** - Use distributions in your code
- **[Configuration Reference](./configuration.mdx)** - Configuration file format details
- **[Llama Stack UI](./llama_stack_ui.mdx)** - Web-based user interface for interacting with Llama Stack servers


@ -44,7 +44,7 @@ spec:
# Navigate to the UI directory
echo "Navigating to UI directory..."
cd /app/llama_stack/ui
cd /app/llama_stack_ui
# Check if package.json exists
if [ ! -f "package.json" ]; then


@ -0,0 +1,109 @@
---
title: Llama Stack UI
description: Web-based user interface for interacting with Llama Stack servers
sidebar_label: Llama Stack UI
sidebar_position: 8
---
# Llama Stack UI
The Llama Stack UI is a web-based interface for interacting with Llama Stack servers. Built with Next.js and React, it provides a visual way to work with agents, manage resources, and view logs.
## Features
- **Logs & Monitoring**: View chat completions, agent responses, and vector store activity
- **Vector Stores**: Create and manage vector databases for RAG (Retrieval-Augmented Generation) workflows
- **Prompt Management**: Create and manage reusable prompts
## Prerequisites
You need a running Llama Stack server. The UI is a client that connects to the Llama Stack backend.
If you don't have a Llama Stack server running yet, see the [Starting Llama Stack Server](../getting_started/starting_llama_stack_server.mdx) guide.
## Running the UI
### Option 1: Using npx (Recommended for Quick Start)
The fastest way to get started is using `npx`:
```bash
npx llama-stack-ui
```
This will start the UI server on `http://localhost:8322` (default port).
### Option 2: Using Docker
Run the UI in a container:
```bash
docker run -p 8322:8322 llamastack/ui
```
Access the UI at `http://localhost:8322`.
## Environment Variables
The UI can be configured using the following environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `LLAMA_STACK_BACKEND_URL` | URL of your Llama Stack server | `http://localhost:8321` |
| `LLAMA_STACK_UI_PORT` | Port for the UI server | `8322` |
If the Llama Stack server is running with authentication enabled, you can configure the UI to use it by setting the following environment variables:
| Variable | Description | Default |
|----------|-------------|---------|
| `NEXTAUTH_URL` | NextAuth URL for authentication | `http://localhost:8322` |
| `GITHUB_CLIENT_ID` | GitHub OAuth client ID (optional, for authentication) | - |
| `GITHUB_CLIENT_SECRET` | GitHub OAuth client secret (optional, for authentication) | - |
### Setting Environment Variables
#### For npx:
```bash
LLAMA_STACK_BACKEND_URL=http://localhost:8321 \
LLAMA_STACK_UI_PORT=8080 \
npx llama-stack-ui
```
#### For Docker:
```bash
docker run -p 8080:8080 \
-e LLAMA_STACK_BACKEND_URL=http://localhost:8321 \
-e LLAMA_STACK_UI_PORT=8080 \
llamastack/ui
```
## Using the UI
### Managing Resources
- **Vector Stores**: Create vector databases for RAG workflows, view stored documents and embeddings
- **Prompts**: Create and manage reusable prompt templates
- **Chat Completions**: View history of chat interactions
- **Responses**: Browse detailed agent responses and tool calls
## Development
If you want to run the UI from source for development:
```bash
# From the project root
cd src/llama_stack_ui
# Install dependencies
npm install
# Set environment variables
export LLAMA_STACK_BACKEND_URL=http://localhost:8321
# Start the development server
npm run dev
```
The development server will start on `http://localhost:8322` with hot reloading enabled.


@ -0,0 +1,143 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# OCI Distribution
The `llamastack/distribution-oci` distribution consists of the following provider configurations.
| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::oci` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
### Environment Variables
The following environment variables can be configured:
- `OCI_AUTH_TYPE`: OCI authentication type (instance_principal or config_file) (default: `instance_principal`)
- `OCI_REGION`: OCI region (e.g., us-ashburn-1, us-chicago-1, us-phoenix-1, eu-frankfurt-1) (default: ``)
- `OCI_COMPARTMENT_OCID`: OCI compartment ID for the Generative AI service (default: ``)
- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if OCI_AUTH_TYPE is config_file) (default: `~/.oci/config`)
- `OCI_CLI_PROFILE`: OCI CLI profile name to use from config file (default: `DEFAULT`)
## Prerequisites
### Oracle Cloud Infrastructure Setup
Before using the OCI Generative AI distribution, ensure you have:
1. **Oracle Cloud Infrastructure Account**: Sign up at [Oracle Cloud Infrastructure](https://cloud.oracle.com/)
2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy
3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models
4. **Authentication**: Configure authentication using either:
- **Instance Principal** (recommended for cloud-hosted deployments)
- **API Key** (for on-premises or development environments)
### Authentication Methods
#### Instance Principal Authentication (Recommended)
Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments.
Requirements:
- Instance must be running in an Oracle Cloud Infrastructure compartment
- Instance must have appropriate IAM policies to access Generative AI services
#### API Key Authentication
For development or on-premises deployments, follow [this doc](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm) to learn how to create your API signing key for your config file.
### Required IAM Policies
Ensure your OCI user or instance has the following policy statements:
```
Allow group <group_name> to use generative-ai-inference-endpoints in compartment <compartment_name>
Allow group <group_name> to manage generative-ai-inference-endpoints in compartment <compartment_name>
```
## Supported Services
### Inference: OCI Generative AI
Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports:
- **Chat Completions**: Conversational AI with context awareness
- **Text Generation**: Complete prompts and generate text content
#### Available Models
OCI Generative AI provides access to models from Meta, Cohere, OpenAI, Grok, and more.
### Safety: Llama Guard
For content safety and moderation, this distribution uses Meta's LlamaGuard model through the OCI Generative AI service to provide:
- Content filtering and moderation
- Policy compliance checking
- Harmful content detection
### Vector Storage: Multiple Options
The distribution supports several vector storage providers:
- **FAISS**: Local in-memory vector search
- **ChromaDB**: Distributed vector database
- **PGVector**: PostgreSQL with vector extensions
### Additional Services
- **Dataset I/O**: Local filesystem and Hugging Face integration
- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities
- **Evaluation**: Meta reference evaluation framework
## Running Llama Stack with OCI
You can run the OCI distribution via Docker or local virtual environment.
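### Via Docker
A container invocation along the lines of the other distribution images should work. The following is a sketch only: it assumes the `llamastack/distribution-oci` image name, config-file authentication, and that the container reads the OCI config from `/root/.oci` (an assumption):
```bash
# Mount the host OCI config into the container (assumed path: /root/.oci)
docker run \
  -it \
  --pull always \
  -p 8321:8321 \
  -v ~/.oci:/root/.oci \
  -e OCI_AUTH_TYPE=config_file \
  -e OCI_REGION=$OCI_REGION \
  -e OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID \
  llamastack/distribution-oci \
  --port 8321
```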
### Via venv
If you've set up your local development environment, you can also run the distribution directly from your local virtual environment.
```bash
OCI_AUTH_TYPE=$OCI_AUTH_TYPE OCI_REGION=$OCI_REGION OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID llama stack run --port 8321 oci
```
### Configuration Examples
#### Using Instance Principal (Recommended for Production)
```bash
export OCI_AUTH_TYPE=instance_principal
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```
#### Using API Key Authentication (Development)
```bash
export OCI_AUTH_TYPE=config_file
export OCI_CONFIG_FILE_PATH=~/.oci/config
export OCI_CLI_PROFILE=DEFAULT
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..your-compartment-id
```
## Regional Endpoints
OCI Generative AI is available in multiple regions. The service automatically routes to the appropriate regional endpoint based on your configuration. For a full list of regional model availability, visit:
https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions
## Troubleshooting
### Common Issues
1. **Authentication Errors**: Verify your OCI credentials and IAM policies
2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region
3. **Permission Denied**: Check compartment permissions and Generative AI service access
4. **Region Unavailable**: Verify the specified region supports Generative AI services
### Getting Help
For additional support:
- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm)
- [Llama Stack Issues](https://github.com/meta-llama/llama-stack/issues)


@ -163,7 +163,41 @@ docker run \
--port $LLAMA_STACK_PORT
```
### Via venv
The container will run the distribution with a SQLite store by default. This store is used for the following components:
- Metadata store: stores metadata about the models, providers, etc.
- Inference store: collects responses from the inference provider
- Agents store: stores agent configurations (sessions, turns, etc.)
- Agents Responses store: stores responses from the agents
However, you can use PostgreSQL instead by running the `starter::run-with-postgres-store.yaml` configuration:
```bash
docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-e OPENAI_API_KEY=your_openai_key \
-e FIREWORKS_API_KEY=your_fireworks_key \
-e TOGETHER_API_KEY=your_together_key \
-e POSTGRES_HOST=your_postgres_host \
-e POSTGRES_PORT=your_postgres_port \
-e POSTGRES_DB=your_postgres_db \
-e POSTGRES_USER=your_postgres_user \
-e POSTGRES_PASSWORD=your_postgres_password \
llamastack/distribution-starter \
starter::run-with-postgres-store.yaml
```
Postgres environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)
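When running outside the container (see the venv instructions below), export the same settings before starting the server; the values here are placeholders, and the defaults above apply for anything left unset:
```bash
# Point the Postgres-backed configuration at your database
export POSTGRES_HOST=your_postgres_host
export POSTGRES_PORT=5432
export POSTGRES_DB=llamastack
export POSTGRES_USER=llamastack
export POSTGRES_PASSWORD=your_postgres_password
```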
### Via Conda or venv
Ensure you have configured the starter distribution using the environment variables explained above.
@ -171,8 +205,11 @@ Ensure you have configured the starter distribution using the environment variab
# Install dependencies for the starter distribution
uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install
# Run the server
# Run the server (with SQLite - default)
uv run --with llama-stack llama stack run starter
# Or run with PostgreSQL
uv run --with llama-stack llama stack run starter::run-with-postgres-store.yaml
```
## Example Usage


@ -144,7 +144,7 @@ source .venv/bin/activate
```bash
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client
uv pip install llama-stack-client
```
</TabItem>
</Tabs>


@ -1,5 +1,5 @@
---
description: "AWS Bedrock inference provider for accessing various AI models through AWS's managed service."
description: "AWS Bedrock inference provider using OpenAI compatible endpoint."
sidebar_label: Remote - Bedrock
title: remote::bedrock
---
@ -8,7 +8,7 @@ title: remote::bedrock
## Description
AWS Bedrock inference provider for accessing various AI models through AWS's managed service.
AWS Bedrock inference provider using OpenAI compatible endpoint.
## Configuration
@ -16,19 +16,12 @@ AWS Bedrock inference provider for accessing various AI models through AWS's man
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `aws_access_key_id` | `str \| None` | No | | The AWS access key to use. Default use environment variable: AWS_ACCESS_KEY_ID |
| `aws_secret_access_key` | `str \| None` | No | | The AWS secret access key to use. Default use environment variable: AWS_SECRET_ACCESS_KEY |
| `aws_session_token` | `str \| None` | No | | The AWS session token to use. Default use environment variable: AWS_SESSION_TOKEN |
| `region_name` | `str \| None` | No | | The default AWS Region to use, for example, us-west-1 or us-west-2.Default use environment variable: AWS_DEFAULT_REGION |
| `profile_name` | `str \| None` | No | | The profile name that contains credentials to use.Default use environment variable: AWS_PROFILE |
| `total_max_attempts` | `int \| None` | No | | An integer representing the maximum number of attempts that will be made for a single request, including the initial attempt. Default use environment variable: AWS_MAX_ATTEMPTS |
| `retry_mode` | `str \| None` | No | | A string representing the type of retries Boto3 will perform.Default use environment variable: AWS_RETRY_MODE |
| `connect_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to make a connection. The default is 60 seconds. |
| `read_timeout` | `float \| None` | No | 60.0 | The time in seconds till a timeout exception is thrown when attempting to read from a connection.The default is 60 seconds. |
| `session_ttl` | `int \| None` | No | 3600 | The time in seconds till a session expires. The default is 3600 seconds (1 hour). |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `region_name` | `<class 'str'>` | No | us-east-2 | AWS Region for the Bedrock Runtime endpoint |
## Sample Configuration
```yaml
{}
api_key: ${env.AWS_BEDROCK_API_KEY:=}
region_name: ${env.AWS_DEFAULT_REGION:=us-east-2}
```
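For example, export the variables referenced above before starting the stack (the values are placeholders):
```bash
export AWS_BEDROCK_API_KEY=your-bedrock-api-key
export AWS_DEFAULT_REGION=us-east-2
```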


@ -0,0 +1,41 @@
---
description: |
Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
Provider documentation
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
sidebar_label: Remote - Oci
title: remote::oci
---
# remote::oci
## Description
Oracle Cloud Infrastructure (OCI) Generative AI inference provider for accessing OCI's Generative AI Platform-as-a-Service models.
Provider documentation
https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm
## Configuration
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `oci_auth_type` | `<class 'str'>` | No | instance_principal | OCI authentication type (must be one of: instance_principal, config_file) |
| `oci_region` | `<class 'str'>` | No | us-ashburn-1 | OCI region (e.g., us-ashburn-1) |
| `oci_compartment_id` | `<class 'str'>` | No | | OCI compartment ID for the Generative AI service |
| `oci_config_file_path` | `<class 'str'>` | No | ~/.oci/config | OCI config file path (required if oci_auth_type is config_file) |
| `oci_config_profile` | `<class 'str'>` | No | DEFAULT | OCI config profile (required if oci_auth_type is config_file) |
## Sample Configuration
```yaml
oci_auth_type: ${env.OCI_AUTH_TYPE:=instance_principal}
oci_config_file_path: ${env.OCI_CONFIG_FILE_PATH:=~/.oci/config}
oci_config_profile: ${env.OCI_CLI_PROFILE:=DEFAULT}
oci_region: ${env.OCI_REGION:=us-ashburn-1}
oci_compartment_id: ${env.OCI_COMPARTMENT_OCID:=}
```


@ -16,7 +16,7 @@ Passthrough inference provider for connecting to any external inference service
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically from the provider |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | API Key for the passthrouth endpoint |
| `api_key` | `pydantic.types.SecretStr \| None` | No | | Authentication credential for the provider |
| `url` | `<class 'str'>` | No | | The URL for the passthrough endpoint |
## Sample Configuration


@ -48,11 +48,9 @@ Both OpenAI and Llama Stack support a web-search built-in tool. The [OpenAI doc
> The type of the web search tool. One of `web_search` or `web_search_2025_08_26`.
In contrast, the [Llama Stack documentation](https://llamastack.github.io/docs/api/create-a-new-open-ai-response) says that the allowed values for `type` for web search are `MOD1`, `MOD2` and `MOD3`.
Is that correct? If so, what are the meanings of each of them? It might make sense for the allowed values for OpenAI map to some values for Llama Stack so that code written to the OpenAI specification
also work with Llama Stack.
Llama Stack now supports both `web_search` and `web_search_2025_08_26` types, matching OpenAI's API. For backward compatibility, Llama Stack also supports `web_search_preview` and `web_search_preview_2025_03_11` types.
The OpenAI web search tool also has fields for `filters` and `user_location` which are not documented as options for Llama Stack. If feasible, it would be good to support these too.
The OpenAI web search tool also has fields for `filters` and `user_location` which are not yet implemented in Llama Stack. If feasible, it would be good to support these too.
---


@ -37,7 +37,7 @@
"outputs": [],
"source": [
"# NBVAL_SKIP\n",
"!pip install -U llama-stack\n",
"!pip install -U llama-stack llama-stack-client\n",
"llama stack list-deps fireworks | xargs -L1 uv pip install\n"
]
},


@ -44,7 +44,7 @@
"outputs": [],
"source": [
"# NBVAL_SKIP\n",
"!pip install -U llama-stack"
"!pip install -U llama-stack llama-stack-client\n"
]
},
{


@ -74,6 +74,7 @@
"source": [
"```bash\n",
"uv sync --extra dev\n",
"uv pip install -U llama-stack-client\n",
"uv pip install -e .\n",
"source .venv/bin/activate\n",
"```"


@ -170,7 +170,7 @@ def _get_endpoint_functions(
for webmethod in webmethods:
print(f"Processing {colored(func_name, 'white')}...")
operation_name = func_name
if webmethod.method == "GET":
prefix = "get"
elif webmethod.method == "DELETE":
@ -196,16 +196,10 @@ def _get_endpoint_functions(
def _get_defining_class(member_fn: str, derived_cls: type) -> type:
"Find the class in which a member function is first defined in a class inheritance hierarchy."
# This import must be dynamic here
from llama_stack.apis.tools import RAGToolRuntime, ToolRuntime
# iterate in reverse member resolution order to find most specific class first
for cls in reversed(inspect.getmro(derived_cls)):
for name, _ in inspect.getmembers(cls, inspect.isfunction):
if name == member_fn:
# HACK ALERT
if cls == RAGToolRuntime:
return ToolRuntime
return cls
raise ValidationError(


@ -57,6 +57,7 @@ const sidebars: SidebarsConfig = {
'distributions/importing_as_library',
'distributions/configuration',
'distributions/starting_llama_stack_server',
'distributions/llama_stack_ui',
{
type: 'category',
label: 'Self-Hosted Distributions',

File diff suppressed because it is too large


@ -162,7 +162,7 @@ paths:
schema:
$ref: '#/components/schemas/RegisterDatasetRequest'
required: true
deprecated: false
deprecated: true
/v1beta/datasets/{dataset_id}:
get:
responses:
@ -219,7 +219,7 @@ paths:
required: true
schema:
type: string
deprecated: false
deprecated: true
/v1alpha/eval/benchmarks:
get:
responses:
@ -270,7 +270,7 @@ paths:
schema:
$ref: '#/components/schemas/RegisterBenchmarkRequest'
required: true
deprecated: false
deprecated: true
/v1alpha/eval/benchmarks/{benchmark_id}:
get:
responses:
@ -327,7 +327,7 @@ paths:
required: true
schema:
type: string
deprecated: false
deprecated: true
/v1alpha/eval/benchmarks/{benchmark_id}/evaluations:
post:
responses:
@ -936,68 +936,6 @@ components:
- data
title: ListDatasetsResponse
description: Response from listing datasets.
DataSource:
oneOf:
- $ref: '#/components/schemas/URIDataSource'
- $ref: '#/components/schemas/RowsDataSource'
discriminator:
propertyName: type
mapping:
uri: '#/components/schemas/URIDataSource'
rows: '#/components/schemas/RowsDataSource'
RegisterDatasetRequest:
type: object
properties:
purpose:
type: string
enum:
- post-training/messages
- eval/question-answer
- eval/messages-answer
description: >-
The purpose of the dataset. One of: - "post-training/messages": The dataset
contains a messages column with list of messages for post-training. {
"messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant",
"content": "Hello, world!"}, ] } - "eval/question-answer": The dataset
contains a question column and an answer column for evaluation. { "question":
"What is the capital of France?", "answer": "Paris" } - "eval/messages-answer":
The dataset contains a messages column with list of messages and an answer
column for evaluation. { "messages": [ {"role": "user", "content": "Hello,
my name is John Doe."}, {"role": "assistant", "content": "Hello, John
Doe. How can I help you today?"}, {"role": "user", "content": "What's
my name?"}, ], "answer": "John Doe" }
source:
$ref: '#/components/schemas/DataSource'
description: >-
The data source of the dataset. Ensure that the data source schema is
compatible with the purpose of the dataset. Examples: - { "type": "uri",
"uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri":
"lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}"
} - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train"
} - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content":
"Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ]
} ] }
metadata:
type: object
additionalProperties:
oneOf:
- type: 'null'
- type: boolean
- type: number
- type: string
- type: array
- type: object
description: >-
The metadata for the dataset. - E.g. {"description": "My dataset"}.
dataset_id:
type: string
description: >-
The ID of the dataset. If not provided, an ID will be generated.
additionalProperties: false
required:
- purpose
- source
title: RegisterDatasetRequest
Benchmark:
type: object
properties:
@ -1065,47 +1003,6 @@ components:
required:
- data
title: ListBenchmarksResponse
RegisterBenchmarkRequest:
type: object
properties:
benchmark_id:
type: string
description: The ID of the benchmark to register.
dataset_id:
type: string
description: >-
The ID of the dataset to use for the benchmark.
scoring_functions:
type: array
items:
type: string
description: >-
The scoring functions to use for the benchmark.
provider_benchmark_id:
type: string
description: >-
The ID of the provider benchmark to use for the benchmark.
provider_id:
type: string
description: >-
The ID of the provider to use for the benchmark.
metadata:
type: object
additionalProperties:
oneOf:
- type: 'null'
- type: boolean
- type: number
- type: string
- type: array
- type: object
description: The metadata to use for the benchmark.
additionalProperties: false
required:
- benchmark_id
- dataset_id
- scoring_functions
title: RegisterBenchmarkRequest
AggregationFunctionType:
type: string
enum:
@ -2254,6 +2151,109 @@ components:
- hyperparam_search_config
- logger_config
title: SupervisedFineTuneRequest
DataSource:
oneOf:
- $ref: '#/components/schemas/URIDataSource'
- $ref: '#/components/schemas/RowsDataSource'
discriminator:
propertyName: type
mapping:
uri: '#/components/schemas/URIDataSource'
rows: '#/components/schemas/RowsDataSource'
RegisterDatasetRequest:
type: object
properties:
purpose:
type: string
enum:
- post-training/messages
- eval/question-answer
- eval/messages-answer
description: >-
The purpose of the dataset. One of: - "post-training/messages": The dataset
contains a messages column with list of messages for post-training. {
"messages": [ {"role": "user", "content": "Hello, world!"}, {"role": "assistant",
"content": "Hello, world!"}, ] } - "eval/question-answer": The dataset
contains a question column and an answer column for evaluation. { "question":
"What is the capital of France?", "answer": "Paris" } - "eval/messages-answer":
The dataset contains a messages column with list of messages and an answer
column for evaluation. { "messages": [ {"role": "user", "content": "Hello,
my name is John Doe."}, {"role": "assistant", "content": "Hello, John
Doe. How can I help you today?"}, {"role": "user", "content": "What's
my name?"}, ], "answer": "John Doe" }
source:
$ref: '#/components/schemas/DataSource'
description: >-
The data source of the dataset. Ensure that the data source schema is
compatible with the purpose of the dataset. Examples: - { "type": "uri",
"uri": "https://mywebsite.com/mydata.jsonl" } - { "type": "uri", "uri":
"lsfs://mydata.jsonl" } - { "type": "uri", "uri": "data:csv;base64,{base64_content}"
} - { "type": "uri", "uri": "huggingface://llamastack/simpleqa?split=train"
} - { "type": "rows", "rows": [ { "messages": [ {"role": "user", "content":
"Hello, world!"}, {"role": "assistant", "content": "Hello, world!"}, ]
} ] }
metadata:
type: object
additionalProperties:
oneOf:
- type: 'null'
- type: boolean
- type: number
- type: string
- type: array
- type: object
description: >-
The metadata for the dataset. - E.g. {"description": "My dataset"}.
dataset_id:
type: string
description: >-
The ID of the dataset. If not provided, an ID will be generated.
additionalProperties: false
required:
- purpose
- source
title: RegisterDatasetRequest
RegisterBenchmarkRequest:
type: object
properties:
benchmark_id:
type: string
description: The ID of the benchmark to register.
dataset_id:
type: string
description: >-
The ID of the dataset to use for the benchmark.
scoring_functions:
type: array
items:
type: string
description: >-
The scoring functions to use for the benchmark.
provider_benchmark_id:
type: string
description: >-
The ID of the provider benchmark to use for the benchmark.
provider_id:
type: string
description: >-
The ID of the provider to use for the benchmark.
metadata:
type: object
additionalProperties:
oneOf:
- type: 'null'
- type: boolean
- type: number
- type: string
- type: array
- type: object
description: The metadata to use for the benchmark.
additionalProperties: false
required:
- benchmark_id
- dataset_id
- scoring_functions
title: RegisterBenchmarkRequest
responses:
BadRequest400:
description: The request was invalid or malformed

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large