Composable building blocks to build Llama Apps
Find a file
Ben Browning dd1a366347
fix: logprobs support in remote-vllm provider (#1074)
# What does this PR do?

The remote-vllm provider was not passing logprobs options from
CompletionRequest or ChatCompletionRequests through to the OpenAI client
parameters. I manually verified this, as well as observed this provider
failing `TestInference::test_completion_logprobs`. This was filed as
issue #1073.

This fixes that by passing the `logprobs.top_k` value through to the
parameters we pass into the OpenAI client.

Additionally, this fixes a bug in `test_text_inference.py` where it
mistakenly assumed chunk.delta were of type `ContentDelta` for
completion requests. The deltas are of type `ContentDelta` for chat
completion requests, but for basic completion requests the deltas are of
type string. This test was likely failing for other providers that did
properly support logprobs because of this latter issue in the test,
which was hit while fixing the above issue with the remote-vllm
provider.

(Closes #1073)

## Test Plan

First, you need a vllm running. I ran one locally like this:
```
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8001 --enable-auto-tool-choice --tool-call-parser llama3_json
```

Next, run test_text_inference.py against this vllm using the remote vllm
provider like this:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```

Before my change, the test failed with this error:
```
llama_stack/providers/tests/inference/test_text_inference.py:155: in test_completion_logprobs
    assert 1 <= len(response.logprobs) <= 5
E   TypeError: object of type 'NoneType' has no len()
```

After my change, the test passes.

[//]: # (## Documentation)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-13 11:00:00 -05:00
.github docs: remove changelog mention from PR template (#1049) 2025-02-11 13:24:53 -05:00
distributions fix: Gaps in doc codegen (#1035) 2025-02-10 13:24:15 -08:00
docs feat: add support for running in a venv (#1018) 2025-02-12 11:13:04 -05:00
llama_stack fix: logprobs support in remote-vllm provider (#1074) 2025-02-13 11:00:00 -05:00
rfcs Update RFC-0001-llama-stack.md (#134) 2024-09-27 09:14:36 -07:00
tests/client-sdk feat: Adding sqlite-vec as a vectordb (#1040) 2025-02-12 10:50:03 -08:00
.gitignore github: ignore non-hidden python virtual environments (#939) 2025-02-03 11:53:05 -08:00
.gitmodules impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00
.pre-commit-config.yaml build: update uv lock to sync package versions (#1026) 2025-02-10 11:42:30 -05:00
.readthedocs.yaml first version of readthedocs (#278) 2024-10-22 10:15:58 +05:30
.ruff.toml Fix precommit check after moving to ruff (#927) 2025-02-02 06:46:45 -08:00
CODE_OF_CONDUCT.md Initial commit 2024-07-23 08:32:33 -07:00
CONTRIBUTING.md docs: Mention convential commits format in CONTRIBUTING.md (#1075) 2025-02-13 10:57:30 -05:00
LICENSE Update LICENSE (#47) 2024-08-29 07:39:50 -07:00
MANIFEST.in Move to use pyproject.toml so it is uv compatible 2025-01-31 21:28:08 -08:00
pyproject.toml Bump version to 0.1.2 2025-02-07 21:52:50 +00:00
README.md docs: Updating wording and nits in the README.md (#992) 2025-02-11 09:53:26 -05:00
requirements.txt chore: Updated requirements.txt (#1017) 2025-02-08 11:50:35 -08:00
SECURITY.md Create SECURITY.md 2024-10-08 13:30:40 -04:00
uv.lock build: update uv lock to sync package versions (#1026) 2025-02-10 11:42:30 -05:00

Llama Stack

PyPI version PyPI - Downloads License Discord

Quick Start | Documentation | Colab Notebook

Llama Stack standardizes the core building blocks that simplify AI application development. It codifies best practices across the Llama ecosystem. More specifically, it provides

  • Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Plugin architecture to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
  • Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
  • Multiple developer interfaces like CLI and SDKs for Python, Typescript, iOS, and Android.
  • Standalone applications as examples for how to build production-grade AI applications with Llama Stack.
Llama Stack

Llama Stack Benefits

  • Flexible Options: Developers can choose their preferred infrastructure without changing APIs and enjoy flexible deployment choices.
  • Consistent Experience: With its unified APIs, Llama Stack makes it easier to build, test, and deploy AI applications with consistent application behavior.
  • Robust Ecosystem: Llama Stack is already integrated with distribution partners (cloud providers, hardware vendors, and AI-focused companies) that offer tailored infrastructure, software, and services for deploying Llama models.

By reducing friction and complexity, Llama Stack empowers developers to focus on what they do best: building transformative generative AI applications.

API Providers

Here is a list of the various API providers and available distributions that can help developers get started easily with Llama Stack.

API Provider Builder Environments Agents Inference Memory Safety Telemetry
Meta Reference Single Node
SambaNova Hosted
Cerebras Hosted
Fireworks Hosted
AWS Bedrock Hosted
Together Hosted
Groq Hosted
Ollama Single Node
TGI Hosted and Single Node
NVIDIA NIM Hosted and Single Node
Chroma Single Node
PG Vector Single Node
PyTorch ExecuTorch On-device iOS
vLLM Hosted and Single Node

Distributions

A Llama Stack Distribution (or "distro") is a pre-configured bundle of provider implementations for each API component. Distributions make it easy to get started with a specific deployment scenario - you can begin with a local development setup (eg. ollama) and seamlessly transition to production (eg. Fireworks) without changing your application code. Here are some of the distributions we support:

Distribution Llama Stack Docker Start This Distribution
Meta Reference llamastack/distribution-meta-reference-gpu Guide
Meta Reference Quantized llamastack/distribution-meta-reference-quantized-gpu Guide
SambaNova llamastack/distribution-sambanova Guide
Cerebras llamastack/distribution-cerebras Guide
Ollama llamastack/distribution-ollama Guide
TGI llamastack/distribution-tgi Guide
Together llamastack/distribution-together Guide
Fireworks llamastack/distribution-fireworks Guide
vLLM llamastack/distribution-remote-vllm Guide

Installation

You have two ways to install this repository:

  • Install as a package: You can install the repository directly from PyPI by running the following command:

    pip install llama-stack
    
  • Install from source: If you prefer to install from the source code, make sure you have conda installed. Then, run the following commands:

     mkdir -p ~/local
     cd ~/local
     git clone git@github.com:meta-llama/llama-stack.git
    
     conda create -n stack python=3.10
     conda activate stack
    
     cd llama-stack
     pip install -e .
    

Documentation

Please checkout our Documentation page for more details.

Llama Stack Client SDKs

Language Client SDK Package
Python llama-stack-client-python PyPI version
Swift llama-stack-client-swift Swift Package Index
Typescript llama-stack-client-typescript NPM version
Kotlin llama-stack-client-kotlin Maven version

Check out our client SDKs for connecting to a Llama Stack server in your preferred language, you can choose from python, typescript, swift, and kotlin programming languages to quickly build your applications.

You can find more example scripts with client SDKs to talk with the Llama Stack server in our llama-stack-apps repo.