Commit graph

334 commits

Author SHA1 Message Date
Nathan Weinberg
854c2ad264
fix: misleading help text for 'llama stack build' and 'llama stack run' (#1910)
# What does this PR do?
current text for 'llama stack build' and 'llama stack run' says that if
no argument is passed to '--image-name' that the active Conda
environment will be used

in reality, the active enviroment is used whether it is from conda,
virtualenv, etc.

## Test Plan
N/A

## Documentation
N/A

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-04-12 01:19:11 -07:00
Aidan Reilly
51492bd9b6
docs: Update docs and fix warning in start-stack.sh (#1937)
Small docs update and an update for `start-stack.sh` with missing color
and if statment logic.

# What does this PR do?
1. Makes a small change to start-stack.sh to resolve this error:
```cmd
/home/aireilly/.local/lib/python3.13/site-packages/llama_stack/distribution/start_stack.sh: line 76: [: missing ]'
```
2. Adds a missing $GREEN colour to start-stack.sh
3. Updated `docs/source/getting_started/detailed_tutorial.md` with some
small changes and corrections.

## Test Plan
Procedures described in
`docs/source/getting_started/detailed_tutorial.md` were verified on
Linux Fedora 41.
2025-04-11 16:26:17 -07:00
raghotham
ed58a94b30
docs: fixes to quick start (#1943)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Co-authored-by: Francisco Arceo <farceo@redhat.com>
2025-04-11 13:41:23 -07:00
Francisco Arceo
24d70cedca
docs: Updated docs to show minimal RAG example and some other minor changes (#1935)
# What does this PR do?
Incorporating some feedback into the docs.

- **`docs/source/getting_started/index.md`:**
    - Demo actually does RAG now
    - Simplified the installation command for dependencies.
    - Updated demo script examples to align with the latest API changes.
- Replaced manual document manipulation with `RAGDocument` for clarity
and maintainability.
- Introduced new logic for model and embedding selection using the Llama
Stack Client SDK.
- Enhanced examples to showcase proper agent initialization and logging.
- **`docs/source/getting_started/detailed_tutorial.md`:**
- Updated the section for listing models to include proper code
formatting with `bash`.
    - Removed and reorganized the "Run the Demos" section for clarity.
- Adjusted tab-item structures and added new instructions for demo
scripts.
- **`docs/_static/css/my_theme.css`:**
- Updated heading styles to include `h2`, `h3`, and `h4` for consistent
font weight.
- Added a new style for `pre` tags to wrap text and break long words,
this is particularly useful for rendering long output from generation.

    
## Test Plan
Tested locally. Screenshot for reference:

<img width="1250" alt="Screenshot 2025-04-10 at 10 12 12 PM"
src="https://github.com/user-attachments/assets/ce1c8986-e072-4c6f-a697-ed0d8fb75b34"
/>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-11 11:50:36 -07:00
Mark Campbell
6aa459b00c
docs: fix errors in kubernetes deployment guide (#1914)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Fixes a couple of errors in PVC/Secret setup and adds context for
expected Hugging Face token
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
2025-04-11 13:04:13 +02:00
Francisco Arceo
49955a06b1
docs: Update quickstart page to structure things a little more for the novices (#1873)
# What does this PR do?
Another doc enhancement for
https://github.com/meta-llama/llama-stack/issues/1818

Summary of changes:
- `docs/source/distributions/configuration.md`
   - Updated dropdown title to include a more user-friendly description.

- `docs/_static/css/my_theme.css`
   - Added styling for `<h3>` elements to set a normal font weight.

- `docs/source/distributions/starting_llama_stack_server.md`
- Changed section headers from bold text to proper markdown headers
(e.g., `##`).
- Improved descriptions for starting Llama Stack server using different
methods (library, container, conda, Kubernetes).
- Enhanced clarity and structure by converting instructions into
markdown headers and improved formatting.

- `docs/source/getting_started/index.md`
   - Major restructuring of the "Quick Start" guide:
- Added new introductory section for Llama Stack and its capabilities.
- Reorganized steps into clearer subsections with proper markdown
headers.
- Replaced dropdowns with tabbed content for OS-specific instructions.
- Added detailed steps for setting up and running the Llama Stack server
and client.
- Introduced new sections for running basic inference and building
agents.
- Enhanced readability and visual structure with emojis, admonitions,
and examples.

- `docs/source/providers/index.md`
   - Updated the list of LLM inference providers to include "Ollama."
   - Expanded the list of vector databases to include "SQLite-Vec."

Let me know if you need further details!

## Test Plan
Renders locally, included screenshot.

# Documentation

For https://github.com/meta-llama/llama-stack/issues/1818

<img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM"
src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b"
/>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-10 14:09:00 -07:00
Sébastien Han
1f2df59ece
docs: fix model name (#1926)
# What does this PR do?

Use llama3.2:3b for consistency.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-10 09:37:48 -07:00
Yuan Tang
1be66d754e
docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923)
# What does this PR do?

vLLM website just added a [new index page for installing for different
hardware
accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html).
This PR adds a link to that page with additional edits to make sure
readers are aware that the use of GPUs on this page are for
demonstration purposes only.

This closes https://github.com/meta-llama/llama-stack/issues/1813.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-10 10:04:17 +02:00
Yuan Tang
712c6758c6
docs: Avoid bash script syntax highlighting for dark mode (#1918)
See
https://github.com/meta-llama/llama-stack/pull/1913#issuecomment-2790153778

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-09 15:43:43 -07:00
Sébastien Han
770b38f8b5
chore: simplify running the demo UI (#1907)
# What does this PR do?

* Manage UI deps in pyproject
* Use a new "ui" dep group to pull the deps with "uv"
* Simplify the run command
* Bump versions in requirements.txt

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-09 11:22:29 -07:00
Francisco Arceo
b93318e40b
chore: Detect browser setting for dark/light mode and set default to light mode (#1913)
# What does this PR do?

1. Adding some lightweight JS to detect the default browser setting for
dark/light mode
3. Setting default screen setting to light mode as to not change default
behavior.

From the docs: https://github.com/MrDogeBro/sphinx_rtd_dark_mode

>This lets you choose which theme the user sees when they load the docs
for the first time ever. After the first time however, this setting has
no effect as the users preference is stored in local storage within
their browser. This option accepts a boolean for the value. If this
option is true (the default option), users will start in dark mode when
first visiting the site. If this option is false, users will start in
light mode when they first visit the site.

# Closes #1915 

## Test Plan
Tested locally on my Mac on Safari and Chrome.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-09 12:40:56 -04:00
Matthew Farrellee
a2cf299906
fix: update getting started guide to use ollama pull (#1855)
# What does this PR do?

download the getting started w/ ollama model instead of downloading and
running it.

directly running it was necessary before
https://github.com/meta-llama/llama-stack/pull/1854

## Test Plan

run the code on the page
2025-04-09 10:35:19 +02:00
Sébastien Han
389767010b
feat: ability to execute external providers (#1672)
# What does this PR do?

Providers that live outside of the llama-stack codebase are now
supported.
A new property `external_providers_dir` has been added to the main
config and can be configured as follow:

```
external_providers_dir: /etc/llama-stack/providers.d/
```

Where the expected structure is:

```
providers.d/
  inference/
    custom_ollama.yaml
    vllm.yaml
  vector_io/
    qdrant.yaml
```

Where `custom_ollama.yaml` is:

```
adapter:
  adapter_type: custom_ollama
  pip_packages: ["ollama", "aiohttp"]
  config_class: llama_stack_ollama_provider.config.OllamaImplConfig
  module: llama_stack_ollama_provider
api_dependencies: []
optional_api_dependencies: []
```

Obviously the package must be installed on the system, here is the
`llama_stack_ollama_provider` example:

```
$ uv pip show llama-stack-ollama-provider
Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv
Name: llama-stack-ollama-provider
Version: 0.1.0
Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages
Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider
Requires:
Required-by:
```

Closes: https://github.com/meta-llama/llama-stack/issues/658

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-09 10:30:41 +02:00
AlexHe99
983f6feeb8
docs: Update remote-vllm.md with AMD GPU vLLM server supported. (#1858)
Add the content to use AMD GPU as the vLLM server. Split the original
part to two sub chapters,
1. AMD vLLM server
2. NVIDIA vLLM server (orignal)

# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: Alex He <alehe@amd.com>
2025-04-08 21:35:32 -07:00
ehhuang
7b4eb0967e
test: verification on provider's OAI endpoints (#1893)
# What does this PR do?


## Test Plan
export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference
--vision-model $MODEL --text-model $MODEL
2025-04-07 23:06:28 -07:00
Matthew Farrellee
c52ccc4bbd
docs: update importing_as_library.md (#1863)
LlamaStackAsLibraryClient.initialize is not async, cannot be await'd
2025-04-07 12:31:04 +02:00
ehhuang
378f0de439
docs: llama4 getting started nb (#1878)
# What does this PR do?


## Test Plan
2025-04-06 18:51:34 -07:00
raghotham
fd7ab37c14
docs: fixing sphinx imports (#1884)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
2025-04-05 14:21:45 -07:00
Ashwin Bharambe
b8f1561956
feat: introduce llama4 support (#1877)
As title says. Details in README, elsewhere.
2025-04-05 11:53:35 -07:00
Francisco Arceo
23a99a4b22
docs: Minor updates to docs to make them a little friendlier to new users (#1871)
# What does this PR do?
This PR modifies some of the docs to help them map to (1) the mental
model of software engineers building AI models starting with RAG and
then moving to Agents and (2) aligning the navbar somewhat closer to the
diagram on the home page.

## Test Plan
N/A Tested locally.

# Documentation
Take a look at the screen shot for below and after.
## Before 
![Screenshot 2025-04-03 at 10 39
32 PM](https://github.com/user-attachments/assets/c4dc9998-3e46-43b0-8425-892c94ec3a6a)

## After
![Screenshot 2025-04-03 at 10 38
37 PM](https://github.com/user-attachments/assets/05670fcd-e56b-42dd-8af2-07b81f941d40)

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-04 08:10:35 -04:00
Francisco Arceo
19f504e9e2
docs: Updating docs to source from CONTRIBUTING.md (#1850)
# What does this PR do?
Another for https://github.com/meta-llama/llama-stack/issues/1815

This links the `CONTRIBUTING.md` file directly so that we don't have to
maintain two different files.

Also I updated the title for RAG under Building AI Applications.

## Changes 
Look of what the Contributing page looks like, proof it sources directly
from the markdown file.

![Screenshot 2025-04-01 at 12 43
51 AM](https://github.com/user-attachments/assets/f7021d29-eec3-44ad-a5b3-55c4480ea9ac)

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-01 14:50:04 +02:00
Francisco Arceo
d495922949
docs: Updated documentation and Sphinx configuration (#1845)
# What does this PR do?

The goal of this PR is to make the pages easier to navigate by surfacing
the child pages on the navbar, updating some of the copy, moving some of
the files around.

Some changes:
1. Clarifying Titles
2. Restructuring "Distributions" more formally in its own page to be
consistent with Providers and adding some clarity to the child pages to
surface them and make them easier to navigate
3. Updated sphinx config to not collapse navigation by default
4. Updated copyright year to be calculated dynamically 
5. Moved `docs/source/distributions/index.md` ->
`docs/source/distributions/starting_llama_stack_server.md`

Another for https://github.com/meta-llama/llama-stack/issues/1815

## Test Plan
Tested locally and pages build (screen shots for example).

## Documentation
###  Before:
![Screenshot 2025-03-31 at 1 09
21 PM](https://github.com/user-attachments/assets/98e34f76-f0d9-4055-8e2c-441b1e7d8f6a)

### After:
![Screenshot 2025-03-31 at 1 08
52 PM](https://github.com/user-attachments/assets/dfb6b8ad-3a1d-46b6-8f54-0c553664093f)

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-31 13:08:05 -07:00
Francisco Arceo
9b478f3756
docs: Adding darkmode to documentation (#1843)
# What does this PR do?
docs: Adding darkmode to documentation


## Test Plan
Tested locally. 

Here's the look:
![Screenshot 2025-03-31 at 9 43
05 AM](https://github.com/user-attachments/assets/5989dbc8-ba03-4710-ad8d-6d4b9ac79786)


## Issues

Related to https://github.com/meta-llama/llama-stack/issues/1815 

Closes https://github.com/meta-llama/llama-stack/issues/1844

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-31 08:31:53 -07:00
Anamika
d8a8a734b5
fix: update sink name for traces and metrics in LlamaStack 0.1.8 (#1836)
# What does this PR do?
This PR updates the sink name configuration for traces and metrics in
LlamaStack to align with the latest changes introduced in version 0.1.8.
Previously, when using the `otel` sink along with other sinks (like
`console` and `sqlite`), the system threw a **ValueError**, with the
message:

```shell
Value error, 'otel' is not a valid TelemetrySink [type=value_error, input_value='console,otel,sqlite', input_type=str]
For further information visit https://errors.pydantic.dev/2.10/v/value_error
``` 

## Test Plan
- **Test 1:**  
Ran the LlamaStack server with a configuration containing
`console,otel,sqlite` as sinks.
   - **Expected result:** No errors related to invalid sink names.
   - **Result:** The system ran without throwing a `ValueError`.

- **Test 2:**  
Verified that the `otel_trace`, `otel_metric` sink now works in
combination with other sinks (`console`, `sqlite`).
- **Expected result:** Telemetry data is correctly sent to all specified
sinks without errors.
- **Result:** All telemetry data was successfully sent to the specified
sinks.
2025-03-29 10:09:08 -07:00
Francisco Arceo
37b6da37ba
docs: Document sqlite-vec faiss comparison (#1821)
# What does this PR do?
This PR documents and benchmarks the performance tradeoffs between
sqlite-vec and FAISS inline VectorDB providers.

# Closes https://github.com/meta-llama/llama-stack/issues/1165

## Test Plan

The test was run using this script:

<details>
<summary>CLICK TO SHOW SCRIPT 👋  </summary>

```python

import cProfile
import os
import uuid
import time
import random
import string
import matplotlib.pyplot as plt
import pandas as pd
from termcolor import cprint
from llama_stack_client.types import Document
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from memory_profiler import profile
from line_profiler import LineProfiler

os.environ["INFERENCE_MODEL"] = "llama3.2:3b-instruct-fp16"
os.environ["LLAMA_STACK_CONFIG"] = "ollama"

def generate_random_chars(count=400):
    return ''.join(random.choices(string.ascii_letters, k=count))

def generate_documents(num_docs: int, num_chars: int):
    documents = [
        Document(
            document_id=f"doc-{i}",
            content=f"Document content for document {i} - {generate_random_chars(count=num_chars)}",
            mime_type="text/plain",
            metadata={},
        )
        for i in range(num_docs)
    ]
    return documents


@profile
def benchmark_write(client, vector_db_id, documents, batch_size=100):
    write_times = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        start_time = time.time()
        client.tool_runtime.rag_tool.insert(
            documents=batch,
            vector_db_id=vector_db_id,
            chunk_size_in_tokens=512,
        )
        end_time = time.time()
        write_times.append(end_time - start_time)

    return write_times

@profile
def benchmark_read(client, provider_id, vector_db_id, user_prompts):
    response_times = []
    for prompt in user_prompts:
        start_time = time.time()
        response = client.vector_io.query(
            vector_db_id=vector_db_id,
            query=prompt,
        )
        end_time = time.time()
        response_times.append(end_time - start_time)
    return response_times

def profile_functions():
    profiler = LineProfiler()
    profiler.add_function(benchmark_write)
    profiler.add_function(benchmark_read)
    return profiler


def plot_results(output, batch_size):
    # Create a DataFrame for easy manipulation
    df_sqlite = pd.DataFrame(output['sqlite-vec'])
    df_faiss = pd.DataFrame(output['faiss'])

    df_sqlite['write_times'] *= 1000
    df_faiss['write_times'] *= 1000

    avg_write_sqlite = df_sqlite['write_times'].mean()
    avg_write_faiss = df_faiss['write_times'].mean()
    avg_read_sqlite = df_sqlite['read_times'].mean()
    avg_read_faiss = df_faiss['read_times'].mean()

    plt.figure(figsize=(12, 6))
    plt.hist(df_sqlite['write_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Write Times')
    plt.hist(df_faiss['write_times'], bins=10, alpha=0.5, color='red', label='faiss Write Times')
    plt.axvline(avg_write_sqlite, color='blue', linestyle='--',
                label=f'Average Write Time (sqlite-vec): {avg_write_sqlite:.3f} ms')
    plt.axvline(avg_write_faiss, color='red', linestyle='--',
                label=f'Average Write Time (faiss): {avg_write_faiss:.3f} ms')
    plt.title(f'Histogram of Write Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]} with batch size = {batch_size}')
    plt.xlabel('Time (milliseconds)')
    plt.ylabel('Density')
    plt.legend()
    plt.savefig('write_time_comparison.png')
    plt.close()

    plt.figure(figsize=(12, 6))
    plt.hist(df_sqlite['read_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Read Times')
    plt.hist(df_faiss['read_times'], bins=10, alpha=0.5, color='red', label='faiss Read Times')
    plt.axvline(avg_read_sqlite, color='blue', linestyle='--',
                label=f'Average Read Time (sqlite-vec): {avg_read_sqlite:.3f} ms')
    plt.axvline(avg_read_faiss, color='red', linestyle='--',
                label=f'Average Read Time (faiss): {avg_read_faiss:.3f} ms')
    plt.title(f'Histogram of Read Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]}')
    plt.xlabel('Time (milliseconds)')
    plt.ylabel('Density')
    plt.legend()
    plt.savefig('read_time_comparison.png')
    plt.close()

    plt.figure(figsize=(12, 6))
    plt.hist(df_sqlite['read_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Read Times')
    plt.hist(df_faiss['read_times'], bins=10, alpha=0.5, color='red', label='faiss Read Times')
    plt.axvline(avg_read_sqlite, color='blue', linestyle='--',
                label=f'Average Read Time (sqlite-vec): {avg_read_sqlite:.3f} ms')
    plt.axvline(avg_read_faiss, color='red', linestyle='--',
                label=f'Average Read Time (faiss): {avg_read_faiss:.3f} ms')
    plt.title(f'Histogram of Read Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]}')
    plt.xlabel('Time (milliseconds)')
    plt.ylabel('Density')
    plt.legend()
    plt.savefig('read_time_comparison.png')
    plt.close()

    plt.figure(figsize=(12, 6))
    plt.plot(df_sqlite.index, df_sqlite['write_times'],
             marker='o', markersize=4, linestyle='-', color='blue',
             label='sqlite-vec Write Times')
    plt.plot(df_faiss.index, df_faiss['write_times'],
             marker='x', markersize=4, linestyle='-', color='red',
             label='faiss Write Times')

    plt.title(f'Write Times by Operation Sequence\n(batch size = {batch_size})')
    plt.xlabel('Write Operation Sequence')
    plt.ylabel('Time (milliseconds)')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.savefig('write_time_sequence.png')
    plt.close()
    # Print out the summary table
    print("\nPerformance Summary for sqlite-vec:")
    print(df_sqlite)

    # Print out the summary table
    print("\nPerformance Summary for faiss:")
    print(df_faiss)


def main():
    # Initialize the client
    client = LlamaStackAsLibraryClient("ollama")
    vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
    _ = client.initialize()

    # Generate a large dataset
    num_chars = 50
    num_docs = 100
    num_writes = 100
    write_batch_size = 100
    num_reads = 100

    documents = generate_documents(num_docs * write_batch_size, num_chars)
    user_prompts = [
        f"Tell me about document {i}" for i in range(1, num_reads + 1)
    ]

    providers = ["sqlite-vec", "faiss"]
    output = {
        provider_id: {"write_times": None, "read_times": None} for provider_id in providers
    }

    # Benchmark writes and reads for SQLite and Faiss
    for provider_id in providers:
        cprint(f"Benchmarking provider: {provider_id}", "yellow")
        client.vector_dbs.register(
            provider_id=provider_id,
            vector_db_id=vector_db_id,
            embedding_model="all-MiniLM-L6-v2",
            embedding_dimension=384,
        )
        write_times = benchmark_write(client, vector_db_id, documents, write_batch_size)

        average_write_time_ms = sum(write_times) / len(write_times) * 1000.
        cprint(f"Average write time for {provider_id} is {average_write_time_ms:.2f} milliseconds for {num_writes} runs", "blue")

        cprint(f"Benchmarking reads for provider: {provider_id}", "yellow")
        read_times = benchmark_read(client, provider_id, vector_db_id, user_prompts)

        average_read_time_ms = sum(read_times) / len(read_times) * 1000.
        cprint(f"Average read time for {provider_id} is {average_read_time_ms:.2f} milliseconds for {num_reads} runs", "blue")

        client.vector_dbs.unregister(vector_db_id=vector_db_id)
        output[provider_id]['write_times'] = write_times
        output[provider_id]['read_times'] = read_times
    # Generate plots and summary
    plot_results(output, write_batch_size)


if __name__ == "__main__":
    cProfile.run('main()', 'profile_output.prof')
```
</details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-03-28 17:41:33 +01:00
Ihar Hrachyshka
18bac27d4e
fix: Use CONDA_DEFAULT_ENV presence as a flag to use conda mode (#1555)
# What does this PR do?

This is the second attempt to switch to system packages by default. Now
with a hack to detect conda environment - in which case conda image-type
is used.

Note: Conda will only be used when --image-name is unset *and*
CONDA_DEFAULT_ENV is set. This means that users without conda will
correctly fall back to using system packages when no --image-* arguments
are passed at all.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Uses virtualenv:

```
$ llama stack build --template ollama --image-type venv
$ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local
[...]
```

Uses system packages (virtualenv already initialized):

```
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
INFO     2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages.
[...]
```

Attempt to run from environment packages without necessary packages
installed:
```
$ python -m venv barebones
$ . ./barebones/bin/activate
$ pip install -e . # to install llama command
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
ModuleNotFoundError: No module named 'fastapi'
```

^ failed as expected because the environment doesn't have necessary
packages installed.

Now install some packages in the new environment:

```
$ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

Now see if setting CONDA_DEFAULT_ENV will change what happens by
default:

```
$ export CONDA_DEFAULT_ENV=base
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using conda environment: base
Conda environment base does not exist.
[...]
```

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-27 17:13:22 -04:00
Xi Yan
b5c27f77ad
chore: clean up distro doc (#1804)
# What does this PR do?
- hide distro doc (docker needs to be thoroughly tested). 

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- docs

[//]: # (## Documentation)
2025-03-27 12:12:14 -07:00
Dmitry Rogozhkin
935e706b15
docs: fix remote-vllm instructions (#1805)
# What does this PR do?

* Fix location of `run.yaml` relative to the cloned llama stack
repository
* Drop `-it` from `docker run` commands as its not needed running
services

## Test Plan

* Verified running the llama stack following updated instruction

CC: @ashwinb

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-03-27 10:19:51 -04:00
Rashmi Pawar
1a73f8305b
feat: Add nemo customizer (#1448)
# What does this PR do?

This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack
post-training module. The integration enables users to fine-tune models
using NVIDIA's cloud-based customization service through a consistent
Llama Stack interface.


[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
Yet to be done

Things pending under this PR:

- [x] Integration of fine-tuned model(new checkpoint) for inference with
nvidia llm distribution
- [x] distribution integration of API
- [x] Add test cases for customizer(In Progress)
- [x] Documentation

```

LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py 

============================================================================================================================================================================ test session starts =============================================================================================================================================================================
platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}}
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items                                                                                                                                                                                                                                                                                                                                                            

tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED                                                                                                                                                                                                                                                 [ 50%]
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED                                                                                                                                                                                                                                                                  [100%]

======================================================================================================================================================================== 2 passed, 1 warning in 0.10s ========================================================================================================================================================================
```
cc: @mattf @dglogo @sumitb

---------

Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>
2025-03-25 11:01:10 -07:00
Yuan Tang
9ff82036f7
docs: Simplify vLLM deployment in K8s deployment guide (#1655)
# What does this PR do?

* Removes the use of `huggingface-cli` 
* Simplifies HF cache mount path
* Simplifies vLLM server startup command
* Separates PVC/secret creation from deployment/service
* Fixes a typo: "pod" should be "deployment"

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-24 09:08:50 -07:00
Mark Campbell
711cfa00fc
docs: fix typos in evaluation concepts (#1745)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
Typo fix for `output_dir` flag and misspelling of aggregate 
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
N/A
[//]: # (## Documentation)
2025-03-21 12:00:53 -07:00
Hardik Shah
127bac6869
fix: Default to port 8321 everywhere (#1734)
As titled, moved all instances of 5001 to 8321
2025-03-20 15:50:41 -07:00
Hardik Shah
581e8ae562
fix: docker run with --pull always to fetch the latest image (#1733)
As titled
2025-03-20 15:35:48 -07:00
Yuan Tang
f5a5c5d459
docs: Add instruction on enabling tool calling for remote vLLM (#1719)
# What does this PR do?

This PR adds a link to tool calling instructions in vLLM. Users have
asked about this many times, e.g.
https://github.com/meta-llama/llama-stack/issues/1648#issuecomment-2740642077

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 15:18:17 -07:00
ehhuang
ea6a4a14ce
feat(api): simplify client imports (#1687)
# What does this PR do?
closes #1554 

## Test Plan
test_agents.py
2025-03-20 10:15:49 -07:00
ehhuang
b6b103a20d
docs: update for mcp tools (#1705)
# What does this PR do?


## Test Plan
read
2025-03-19 15:45:53 -07:00
Yuan Tang
7c0448456e
docs: Remove mentions of focus on Llama models (#1690)
# What does this PR do?

This is a follow-up of
https://github.com/meta-llama/llama-stack/issues/965 to avoid mentioning
exclusive support on Llama models.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-19 00:17:22 -04:00
Daniele Martinoli
cca9bd6cc3
feat: Qdrant inline provider (#1273)
# What does this PR do?
Removed local execution option from the remote Qdrant provider and
introduced an explicit inline provider for the embedded execution.
Updated the ollama template to include this option: this part can be
reverted in case we don't want to have two default `vector_io`
providers.

(Closes #1082)

## Test Plan
Build and run an ollama distro:
```bash
llama stack build --template ollama --image-type conda
llama stack run --image-type conda ollama
```

Run one of the sample ingestionapplicatinos like
[rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py),
but replace this line:
```py
    selected_vector_provider = vector_providers[0]
```
with the following, to use the `qdrant` provider:
```py
    selected_vector_provider = vector_providers[1]
```

After running the test code, verify the timestamp of the Qdrant store:
```bash
% ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_*
total 784
-rw-r--r--@ 1 dmartino  staff  401408 Feb 26 10:07 storage.sqlite
```

[//]: # (## Documentation)

---------

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
Co-authored-by: Francisco Arceo <farceo@redhat.com>
2025-03-18 14:04:21 -07:00
Jamie Land
f4dc290705
feat: Created Playground Containerfile and Image Workflow (#1256)
# What does this PR do?
Adds a container file that can be used to build the playground UI.

This file will be built by this PR in the stack-ops repo:
https://github.com/meta-llama/llama-stack-ops/pull/9

Docker command in the docs will need to change once I know the address
of the official repository.

## Test Plan

Tested image on my local Openshift Instance using this helm chart:
https://github.com/Jaland/llama-stack-helm/tree/main/llama-stack

[//]: # (## Documentation)

---------

Co-authored-by: Jamie Land <hokie10@gmail.com>
2025-03-18 09:26:49 -07:00
Nathan Weinberg
1261bc93bf
docs: fixed broken tip in distro build docs (#1673)
# What does this PR do?
fixed broken tip in distro build docs

## Test Plan
Local docs build

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-03-17 17:22:26 -07:00
Xi Yan
5287b437ae
feat(api): (1/n) datasets api clean up (#1573)
## PR Stack
- https://github.com/meta-llama/llama-stack/pull/1573
- https://github.com/meta-llama/llama-stack/pull/1625
- https://github.com/meta-llama/llama-stack/pull/1656
- https://github.com/meta-llama/llama-stack/pull/1657
- https://github.com/meta-llama/llama-stack/pull/1658
- https://github.com/meta-llama/llama-stack/pull/1659
- https://github.com/meta-llama/llama-stack/pull/1660

**Client SDK**
- https://github.com/meta-llama/llama-stack-client-python/pull/203

**CI**
- 1391130488
<img width="1042" alt="image"
src="https://github.com/user-attachments/assets/69636067-376d-436b-9204-896e2dd490ca"
/>
-- the test_rag_agent_with_attachments is flaky and not related to this
PR

## Doc
<img width="789" alt="image"
src="https://github.com/user-attachments/assets/b88390f3-73d6-4483-b09a-a192064e32d9"
/>


## Client Usage
```python
client.datasets.register(
    source={
        "type": "uri",
        "uri": "lsfs://mydata.jsonl",
    },
    schema="jsonl_messages",
    # optional 
    dataset_id="my_first_train_data"
)

# quick prototype debugging
client.datasets.register(
    data_reference={
        "type": "rows",
        "rows": [
                "messages": [...],
        ],
    },
    schema="jsonl_messages",
)
```

## Test Plan
- CI:
1387805545

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py
```

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py
```

```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
2025-03-17 16:55:45 -07:00
Ihar Hrachyshka
77ca09467f
chore: consolidate scripts under ./scripts directory (#1646) 2025-03-17 17:56:30 -04:00
cdgamarose-nv
252a487085
feat: added nvidia as safety provider (#1248)
# What does this PR do?
Adds nvidia as a safety provider by interfacing with the nemo guardrails
microservice.
This enables checking user’s input or the LLM’s output against input and
output guardrails by using the `/v1/guardrails/checks` endpoint of the[
guardrails
API.](https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/guides/checks-guide.html)

## Test Plan
Deploy nemo guardrails service following the documentation:
https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/getting-started/deploy-docker.html

### Standalone:
```bash
(venv) local-cdgamarose@a1u1g-rome-0153:~/llama-stack$ pytest -v -s llama_stack/providers/tests/safety/test_safety.py --providers inference=nvidia,safety=nvidia --safety-shield meta/llama-3.1-8b-instruct

=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /localhome/local-cdgamarose/llama-stack/venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.12', 'Platform': 'Linux-5.15.0-122-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'html': '4.1.1'}}
rootdir: /localhome/local-cdgamarose/llama-stack
configfile: pyproject.toml
plugins: metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, html-4.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items

llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_shield_list[--inference=nvidia:safety=nvidia] Initializing NVIDIASafetyAdapter(http://0.0.0.0:7331)...
PASSED
llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_run_shield[--inference=nvidia:safety=nvidia] PASSED

============================================================================== 2 passed, 2 warnings in 4.78s ==============================================================================

```
### Distribution:
```
llama stack run llama_stack/templates/nvidia/run-with-safety.yaml
curl -v -X 'POST' "http://localhost:8321/v1/safety/run-shield" -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"shield_id": "meta/llama-3.1-8b-instruct", "messages":[{"role": "user", "content": "you are stupid"}]}'
{"violation":{"violation_level":"error","user_message":"Sorry I cannot do this.","metadata":{"self check input":{"status":"blocked"}}}}
```

[//]: # (## Documentation)

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-17 14:39:23 -07:00
Kelly Brown
ac51564ad5
docs: Fixing outputs in client cli and formatting suggestions (#1668)
**Description:** Updates the client example output as well as add a
suggested formatting for some of the required and optional cli flags.
If the re-formatting is unnecessary, I can remove it from this PR and
just have this fix the example output
2025-03-17 14:31:09 -07:00
Kelly Brown
60ae7455f6
docs: Fix trailing whitespace error (#1669)
Description: Fixes the trailing whitespace error thats coming up on main
2025-03-17 08:53:30 -07:00
Chirag Modi
b56b06037c
Web updates to point to latest releases for Mobile SDK (#1650)
# What does this PR do?
Web updates to point to latest releases for Mobile SDK

- point to `latest-release` branch for mobile sdk repos to minimize the
number of change points on the site.
- updates to some instructions
2025-03-14 17:06:07 -07:00
Nathan Weinberg
d2dda4af64
docs: add additional guidance around using virtualenv (#1642)
# What does this PR do?
current docs are very tailored to `conda`

also adds guidance around running code examples within virtual
environment for both `conda` and `virtualenv`

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-03-14 16:00:55 -07:00
Yuan Tang
b906bad238
docs: Add OpenAI, Anthropic, Gemini to inference API providers table (#1622)
# What does this PR do?

Forgot to update this page as well as part of
https://github.com/meta-llama/llama-stack/pull/1617.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-13 15:28:52 -07:00
Xi Yan
9617468d13
fix: passthrough provider template + fix (#1612)
# What does this PR do?

- Fix issue w/ passthrough provider


[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
llama stack run

[//]: # (## Documentation)
2025-03-13 09:44:26 -07:00
Dinesh Yeduguru
85501ed875
fix: remove Llama-3.2-1B-Instruct for fireworks (#1558)
# What does this PR do?
remove Llama-3.2-1B-Instruct for fireworks as its no longer appears to
be hosted on website.


## Test Plan

python distro_codegen.py
2025-03-11 11:19:29 -07:00