llama-stack-mirror/docs/docs/distributions/remote_hosted_distro/oci.md
feat: add oci genai service as chat inference provider (#3876)
# What does this PR do?
Adds OCI GenAI PaaS models for OpenAI chat completion endpoints.

## Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following
steps:

1. Ensure you have IAM policies in place to use the service (check the docs
included in this PR)
2. For local development, [set up the OCI
CLI](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm)
and configure it with your region, tenancy, and auth as described
[here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm)
3. Once configured, go through the llama-stack setup and run llama-stack
(which uses config-file-based auth):
```bash
OCI_AUTH_TYPE=config_file \
OCI_CLI_PROFILE=CHICAGO \
OCI_REGION=us-chicago-1 \
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \
llama stack run oci
```
4. Once the server is running, hit the `models` endpoint to list models:
```bash
curl http://localhost:8321/v1/models | jq
...
    {
      "identifier": "meta.llama-4-scout-17b-16e-instruct",
      "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
      "provider_id": "oci",
      "type": "model",
      "metadata": {
        "display_name": "meta.llama-4-scout-17b-16e-instruct",
        "capabilities": [
          "CHAT"
        ],
        "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
      },
      "model_type": "llm"
    },
    ...
```
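
If you just want the model names, a quick `jq` filter works. This assumes the endpoint returns an OpenAI-style list with the models under a top-level `data` array, as the truncated output above suggests:
```bash
# List only the model identifiers (assumes an OpenAI-style {"data": [...]} response)
curl -s http://localhost:8321/v1/models | jq -r '.data[].identifier'
```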
5. Use the `display_name` field as the model name in a
`/chat/completions` request:
```bash
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta.llama-4-scout-17b-16e-instruct",
    "stream": true,
    "temperature": 0.9,
    "messages": [
      {
        "role": "system",
        "content": "You are a funny comedian. You can be crass."
      },
      {
        "role": "user",
        "content": "Tell me a funny joke about programming."
      }
    ]
  }'

# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta.llama-4-scout-17b-16e-instruct",
    "stream": false,
    "temperature": 0.9,
    "messages": [
      {
        "role": "system",
        "content": "You are a funny comedian. You can be crass."
      },
      {
        "role": "user",
        "content": "Tell me a funny joke about programming."
      }
    ]
  }'
```
6. Try out other models from the `/models` endpoint.
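
For a quick end-to-end smoke test, the two calls can be chained. A sketch, again assuming the OpenAI-style `data` array and the standard `choices[0].message.content` response shape:
```bash
# Grab the first available model, then send it a one-off prompt
MODEL=$(curl -s http://localhost:8321/v1/models | jq -r '.data[0].identifier')
curl -s -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello.\"}]}" \
  | jq -r '.choices[0].message.content'
```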

---
orphan: true
---

# OCI Distribution

The `llamastack/distribution-oci` distribution consists of the following provider configurations.

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::oci` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |

## Environment Variables

The following environment variables can be configured:

- `OCI_AUTH_TYPE`: OCI authentication type (`instance_principal` or `config_file`) (default: `instance_principal`)
- `OCI_REGION`: OCI region (e.g., `us-ashburn-1`, `us-chicago-1`, `us-phoenix-1`, `eu-frankfurt-1`) (default: empty)
- `OCI_COMPARTMENT_OCID`: OCI compartment ID for the Generative AI service (default: empty)
- `OCI_CONFIG_FILE_PATH`: OCI config file path (required if `OCI_AUTH_TYPE` is `config_file`) (default: `~/.oci/config`)
- `OCI_CLI_PROFILE`: OCI CLI profile name to use from the config file (default: `DEFAULT`)

## Prerequisites

### Oracle Cloud Infrastructure Setup

Before using the OCI Generative AI distribution, ensure you have:

1. **Oracle Cloud Infrastructure Account**: Sign up at Oracle Cloud Infrastructure
2. **Generative AI Service Access**: Enable the Generative AI service in your OCI tenancy
3. **Compartment**: Create or identify a compartment where you'll deploy Generative AI models
4. **Authentication**: Configure authentication using either:
   - Instance Principal (recommended for cloud-hosted deployments)
   - API Key (for on-premises or development environments)

## Authentication Methods

### Instance Principal Authentication (Recommended)

Instance Principal authentication allows OCI resources to authenticate using the identity of the compute instance they're running on. This is the most secure method for production deployments.

Requirements:

- Instance must be running in an Oracle Cloud Infrastructure compartment
- Instance must have appropriate IAM policies to access Generative AI services (see the example policy below)
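
Concretely, instance principals are authorized through a dynamic group rather than a user group. A minimal sketch, where the dynamic-group name `genai-instances`, the compartment OCID, and the compartment name are all placeholders for your own values:

```
# Dynamic-group matching rule: include all instances in the compartment
ANY {instance.compartment.id = 'ocid1.compartment.oc1..<your-compartment-id>'}

# Policy granting that dynamic group access to Generative AI inference
Allow dynamic-group genai-instances to use generative-ai-inference-endpoints in compartment <compartment_name>
```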

### API Key Authentication

For development or on-premises deployments, follow [this doc](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm) to learn how to create an API signing key for your config file.

### Required IAM Policies

Ensure your OCI user or instance has the following policy statements:

```
Allow group <group_name> to use generative-ai-inference-endpoints in compartment <compartment_name>
Allow group <group_name> to manage generative-ai-inference-endpoints in compartment <compartment_name>
```

## Supported Services

### Inference: OCI Generative AI

Oracle Cloud Infrastructure Generative AI provides access to high-performance AI models through OCI's Platform-as-a-Service offering. The service supports:

- **Chat Completions**: Conversational AI with context awareness
- **Text Generation**: Complete prompts and generate text content

#### Available Models

Common OCI Generative AI offerings include models from Meta, Cohere, OpenAI, and xAI (Grok), among others.

### Safety: Llama Guard

For content safety and moderation, this distribution uses Meta's Llama Guard model through the OCI Generative AI service to provide:

- Content filtering and moderation
- Policy compliance checking
- Harmful content detection
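
As a rough sketch of exercising the safety provider against a running server: the `/v1/safety/run-shield` route and the `llama-guard` shield identifier below are assumptions based on Llama Stack's safety API, so check the API reference for your server version:

```bash
# Hypothetical safety check; shield_id must match a shield registered on your server
curl -X POST http://localhost:8321/v1/safety/run-shield \
  -H "Content-Type: application/json" \
  -d '{
    "shield_id": "llama-guard",
    "messages": [{"role": "user", "content": "How do I pick a lock?"}]
  }'
```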

### Vector Storage: Multiple Options

The distribution supports several vector storage providers:

- **FAISS**: Local in-memory vector search
- **ChromaDB**: Distributed vector database
- **PGVector**: PostgreSQL with vector extensions

### Additional Services

- **Dataset I/O**: Local filesystem and Hugging Face integration
- **Tool Runtime**: Web search (Brave, Tavily) and RAG capabilities
- **Evaluation**: Meta reference evaluation framework

## Running Llama Stack with OCI

You can run the OCI distribution via Docker or a local virtual environment.

### Via venv

If you've set up your local development environment, you can run the server directly from your virtual environment:

```bash
OCI_AUTH_TYPE=$OCI_AUTH_TYPE OCI_REGION=$OCI_REGION OCI_COMPARTMENT_OCID=$OCI_COMPARTMENT_OCID llama stack run --port 8321 oci
```
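
### Via Docker

For the Docker path, a minimal sketch: the image name comes from the distribution name above, the volume mount is only needed for `config_file` auth, and the flags may need adjusting for your setup:

```bash
docker run -it -p 8321:8321 \
  -v ~/.oci:/root/.oci \
  -e OCI_AUTH_TYPE=config_file \
  -e OCI_CLI_PROFILE=DEFAULT \
  -e OCI_REGION=us-chicago-1 \
  -e OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id> \
  llamastack/distribution-oci --port 8321
```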

## Configuration Examples

### Using Instance Principal Authentication (Production)

```bash
export OCI_AUTH_TYPE=instance_principal
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```

### Using API Key Authentication (Development)

```bash
export OCI_AUTH_TYPE=config_file
export OCI_CONFIG_FILE_PATH=~/.oci/config
export OCI_CLI_PROFILE=DEFAULT
export OCI_REGION=us-chicago-1
export OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..<your-compartment-id>
```
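
With either set of variables exported and the server started, a quick sanity check (the same models call from the test plan above) confirms the provider can reach OCI:

```bash
# Should list the OCI GenAI models visible in your compartment
curl -s http://localhost:8321/v1/models | jq
```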

## Regional Endpoints

OCI Generative AI is available in multiple regions. The service automatically routes to the appropriate regional endpoint based on your configuration. For a full list of regional model availability, visit:

https://docs.oracle.com/en-us/iaas/Content/generative-ai/overview.htm#regions

## Troubleshooting

### Common Issues

1. **Authentication Errors**: Verify your OCI credentials and IAM policies
2. **Model Not Found**: Ensure the model OCID is correct and the model is available in your region
3. **Permission Denied**: Check compartment permissions and Generative AI service access
4. **Region Unavailable**: Verify the specified region supports Generative AI services
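
To separate OCI-side failures from Llama Stack itself, it can help to call OCI directly with the CLI first, using the profile from your config file; if this fails, fix your OCI credentials before debugging the distribution:

```bash
# Verifies that the CLI profile can authenticate against OCI at all
oci iam region list --profile DEFAULT
```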

### Getting Help

For additional support: