llama-stack

forked from phoenix-oss/llama-stack-mirror

Author	SHA1	Message	Date
Nathan Weinberg	854c2ad264	fix: misleading help text for 'llama stack build' and 'llama stack run' (#1910 ) # What does this PR do? The current help text for 'llama stack build' and 'llama stack run' says that if no argument is passed to '--image-name', the active Conda environment will be used. In reality, the active environment is used whether it comes from conda, virtualenv, etc. ## Test Plan N/A ## Documentation N/A Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-04-12 01:19:11 -07:00
Aidan Reilly	51492bd9b6	docs: Update docs and fix warning in start-stack.sh (#1937 ) Small docs update and an update for `start-stack.sh` with missing color and if statement logic. # What does this PR do? 1. Makes a small change to start-stack.sh to resolve this error: ```cmd /home/aireilly/.local/lib/python3.13/site-packages/llama_stack/distribution/start_stack.sh: line 76: [: missing ]' ``` 2. Adds a missing $GREEN colour to start-stack.sh 3. Updates `docs/source/getting_started/detailed_tutorial.md` with some small changes and corrections. ## Test Plan Procedures described in `docs/source/getting_started/detailed_tutorial.md` were verified on Linux Fedora 41.	2025-04-11 16:26:17 -07:00
raghotham	ed58a94b30	docs: fixes to quick start (#1943 ) # What does this PR do? ## Test Plan --------- Co-authored-by: Francisco Arceo <farceo@redhat.com>	2025-04-11 13:41:23 -07:00
Francisco Arceo	24d70cedca	docs: Updated docs to show minimal RAG example and some other minor changes (#1935 ) # What does this PR do? Incorporating some feedback into the docs. - `docs/source/getting_started/index.md`: - Demo actually does RAG now - Simplified the installation command for dependencies. - Updated demo script examples to align with the latest API changes. - Replaced manual document manipulation with `RAGDocument` for clarity and maintainability. - Introduced new logic for model and embedding selection using the Llama Stack Client SDK. - Enhanced examples to showcase proper agent initialization and logging. - `docs/source/getting_started/detailed_tutorial.md`: - Updated the section for listing models to include proper code formatting with `bash`. - Removed and reorganized the "Run the Demos" section for clarity. - Adjusted tab-item structures and added new instructions for demo scripts. - `docs/_static/css/my_theme.css`: - Updated heading styles to include `h2`, `h3`, and `h4` for consistent font weight. - Added a new style for `pre` tags to wrap text and break long words, this is particularly useful for rendering long output from generation. ## Test Plan Tested locally. Screenshot for reference: <img width="1250" alt="Screenshot 2025-04-10 at 10 12 12 PM" src="https://github.com/user-attachments/assets/ce1c8986-e072-4c6f-a697-ed0d8fb75b34" /> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-11 11:50:36 -07:00
Mark Campbell	6aa459b00c	docs: fix errors in kubernetes deployment guide (#1914 ) # What does this PR do? Fixes a couple of errors in PVC/Secret setup and adds context for expected Hugging Face token ## Test Plan	2025-04-11 13:04:13 +02:00
Francisco Arceo	49955a06b1	docs: Update quickstart page to structure things a little more for the novices (#1873 ) # What does this PR do? Another doc enhancement for https://github.com/meta-llama/llama-stack/issues/1818 Summary of changes: - `docs/source/distributions/configuration.md` - Updated dropdown title to include a more user-friendly description. - `docs/_static/css/my_theme.css` - Added styling for `<h3>` elements to set a normal font weight. - `docs/source/distributions/starting_llama_stack_server.md` - Changed section headers from bold text to proper markdown headers (e.g., `##`). - Improved descriptions for starting Llama Stack server using different methods (library, container, conda, Kubernetes). - Enhanced clarity and structure by converting instructions into markdown headers and improved formatting. - `docs/source/getting_started/index.md` - Major restructuring of the "Quick Start" guide: - Added new introductory section for Llama Stack and its capabilities. - Reorganized steps into clearer subsections with proper markdown headers. - Replaced dropdowns with tabbed content for OS-specific instructions. - Added detailed steps for setting up and running the Llama Stack server and client. - Introduced new sections for running basic inference and building agents. - Enhanced readability and visual structure with emojis, admonitions, and examples. - `docs/source/providers/index.md` - Updated the list of LLM inference providers to include "Ollama." - Expanded the list of vector databases to include "SQLite-Vec." Let me know if you need further details! ## Test Plan Renders locally, included screenshot. # Documentation For https://github.com/meta-llama/llama-stack/issues/1818 <img width="1332" alt="Screenshot 2025-04-09 at 11 07 12 AM" src="https://github.com/user-attachments/assets/c106efb9-076c-4059-a4e0-a30fa738585b" /> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-10 14:09:00 -07:00
Sébastien Han	1f2df59ece	docs: fix model name (#1926 ) # What does this PR do? Use llama3.2:3b for consistency. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-10 09:37:48 -07:00
Yuan Tang	1be66d754e	docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923 ) # What does this PR do? vLLM website just added a [new index page for installing for different hardware accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html). This PR adds a link to that page with additional edits to make sure readers are aware that the use of GPUs on this page are for demonstration purposes only. This closes https://github.com/meta-llama/llama-stack/issues/1813. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-04-10 10:04:17 +02:00
Yuan Tang	712c6758c6	docs: Avoid bash script syntax highlighting for dark mode (#1918 ) See https://github.com/meta-llama/llama-stack/pull/1913#issuecomment-2790153778 Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-04-09 15:43:43 -07:00
Sébastien Han	770b38f8b5	chore: simplify running the demo UI (#1907 ) # What does this PR do? * Manage UI deps in pyproject * Use a new "ui" dep group to pull the deps with "uv" * Simplify the run command * Bump versions in requirements.txt Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-09 11:22:29 -07:00
Francisco Arceo	b93318e40b	chore: Detect browser setting for dark/light mode and set default to light mode (#1913 ) # What does this PR do? 1. Adding some lightweight JS to detect the default browser setting for dark/light mode 2. Setting the default screen setting to light mode so as not to change default behavior. From the docs: https://github.com/MrDogeBro/sphinx_rtd_dark_mode >This lets you choose which theme the user sees when they load the docs for the first time ever. After the first time however, this setting has no effect as the users preference is stored in local storage within their browser. This option accepts a boolean for the value. If this option is true (the default option), users will start in dark mode when first visiting the site. If this option is false, users will start in light mode when they first visit the site. # Closes #1915 ## Test Plan Tested locally on my Mac on Safari and Chrome. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-09 12:40:56 -04:00
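The first-visit behaviour quoted in the commit above corresponds to one boolean option in the Sphinx `conf.py`. A minimal sketch, assuming the `sphinx_rtd_dark_mode` extension is enabled there; the project's actual `conf.py` is not shown in this log, so the `extensions` list below is illustrative only:

```python
# conf.py (fragment). Illustrative: the project's real extension list is not shown in this log.
extensions = ["sphinx_rtd_dark_mode"]

# False: first-time visitors start in light mode.
# After the first visit, the preference stored in the browser takes over.
default_dark_mode = False
```

Leaving the option unset would start first-time visitors in dark mode, which is why the commit sets it explicitly.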
Matthew Farrellee	a2cf299906	fix: update getting started guide to use `ollama pull` (#1855 ) # What does this PR do? download the getting started w/ ollama model instead of downloading and running it. directly running it was necessary before https://github.com/meta-llama/llama-stack/pull/1854 ## Test Plan run the code on the page	2025-04-09 10:35:19 +02:00
Sébastien Han	389767010b	feat: ability to execute external providers (#1672 ) # What does this PR do? Providers that live outside of the llama-stack codebase are now supported. A new property `external_providers_dir` has been added to the main config and can be configured as follows: ``` external_providers_dir: /etc/llama-stack/providers.d/ ``` Where the expected structure is: ``` providers.d/ inference/ custom_ollama.yaml vllm.yaml vector_io/ qdrant.yaml ``` Where `custom_ollama.yaml` is: ``` adapter: adapter_type: custom_ollama pip_packages: ["ollama", "aiohttp"] config_class: llama_stack_ollama_provider.config.OllamaImplConfig module: llama_stack_ollama_provider api_dependencies: [] optional_api_dependencies: [] ``` The package must be installed on the system. Here is the `llama_stack_ollama_provider` example: ``` $ uv pip show llama-stack-ollama-provider Using Python 3.10.16 environment at: /Users/leseb/Documents/AI/llama-stack/.venv Name: llama-stack-ollama-provider Version: 0.1.0 Location: /Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.10/site-packages Editable project location: /private/var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.ZBHU5Ezxg4/ollama/llama-stack-ollama-provider Requires: Required-by: ``` Closes: https://github.com/meta-llama/llama-stack/issues/658 Signed-off-by: Sébastien Han <seb@redhat.com>	2025-04-09 10:30:41 +02:00
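The directory layout described in the commit above (one sub-directory per API, one YAML file per provider) can be walked with a few lines of standard-library Python. This is only a sketch of how such a tree might be enumerated; `list_external_providers` is a hypothetical helper, not llama-stack's loader, and it does not parse the YAML contents:

```python
import tempfile
from pathlib import Path


def list_external_providers(root: Path) -> dict[str, list[str]]:
    """Map each API directory (e.g. 'inference') to the provider spec names in it.

    Hypothetical helper for illustration; llama-stack's own loading logic may differ.
    """
    found: dict[str, list[str]] = {}
    for api_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        specs = sorted(f.stem for f in api_dir.glob("*.yaml"))
        if specs:
            found[api_dir.name] = specs
    return found


# Recreate the example layout from the commit message in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "providers.d"
    for rel in ("inference/custom_ollama.yaml", "inference/vllm.yaml", "vector_io/qdrant.yaml"):
        spec = root / rel
        spec.parent.mkdir(parents=True, exist_ok=True)
        spec.touch()
    providers = list_external_providers(root)
    print(providers)
```

Keying the result by API name mirrors the on-disk grouping, so a caller can see at a glance which APIs have external providers registered.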
AlexHe99	983f6feeb8	docs: Update remote-vllm.md with AMD GPU vLLM server supported. (#1858 ) Add the content to use AMD GPU as the vLLM server. Split the original part into two sub-chapters: 1. AMD vLLM server 2. NVIDIA vLLM server (original) # What does this PR do? ## Test Plan --------- Signed-off-by: Alex He <alehe@amd.com>	2025-04-08 21:35:32 -07:00
ehhuang	7b4eb0967e	test: verification on provider's OAI endpoints (#1893 ) # What does this PR do? ## Test Plan export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic; LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference --vision-model $MODEL --text-model $MODEL	2025-04-07 23:06:28 -07:00
Matthew Farrellee	c52ccc4bbd	docs: update importing_as_library.md (#1863 ) LlamaStackAsLibraryClient.initialize is not async, cannot be await'd	2025-04-07 12:31:04 +02:00
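The point of the commit above, that a synchronous method cannot be awaited, can be shown without installing llama-stack. The sketch below uses `FakeLibraryClient`, a hypothetical stand-in and not the real `LlamaStackAsLibraryClient`; the `True` return value is an assumption made purely for illustration:

```python
import asyncio


class FakeLibraryClient:
    """Hypothetical stand-in; NOT the real LlamaStackAsLibraryClient."""

    def initialize(self) -> bool:
        # A plain synchronous method: it returns a value, not an awaitable.
        return True


async def wrong_usage(client: FakeLibraryClient) -> None:
    # Awaiting a non-awaitable return value raises TypeError at runtime.
    await client.initialize()


client = FakeLibraryClient()
ok = client.initialize()  # correct: call it directly, without await

try:
    asyncio.run(wrong_usage(client))
    await_failed = False
except TypeError:
    await_failed = True

print(ok, await_failed)
```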
ehhuang	378f0de439	docs: llama4 getting started nb (#1878 ) # What does this PR do? ## Test Plan	2025-04-06 18:51:34 -07:00
raghotham	fd7ab37c14	docs: fixing sphinx imports (#1884 ) # What does this PR do? ## Test Plan	2025-04-05 14:21:45 -07:00
Ashwin Bharambe	b8f1561956	feat: introduce llama4 support (#1877 ) As title says. Details in README, elsewhere.	2025-04-05 11:53:35 -07:00
Francisco Arceo	23a99a4b22	docs: Minor updates to docs to make them a little friendlier to new users (#1871 ) # What does this PR do? This PR modifies some of the docs to help them map to (1) the mental model of software engineers building AI models starting with RAG and then moving to Agents and (2) aligning the navbar somewhat closer to the diagram on the home page. ## Test Plan N/A Tested locally. # Documentation Take a look at the screen shot for below and after. ## Before ![Screenshot 2025-04-03 at 10 39 32 PM](https://github.com/user-attachments/assets/c4dc9998-3e46-43b0-8425-892c94ec3a6a) ## After ![Screenshot 2025-04-03 at 10 38 37 PM](https://github.com/user-attachments/assets/05670fcd-e56b-42dd-8af2-07b81f941d40) --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-04 08:10:35 -04:00
Francisco Arceo	19f504e9e2	docs: Updating docs to source from CONTRIBUTING.md (#1850 ) # What does this PR do? Another for https://github.com/meta-llama/llama-stack/issues/1815 This links the `CONTRIBUTING.md` file directly so that we don't have to maintain two different files. Also I updated the title for RAG under Building AI Applications. ## Changes Look of what the Contributing page looks like, proof it sources directly from the markdown file. ![Screenshot 2025-04-01 at 12 43 51 AM](https://github.com/user-attachments/assets/f7021d29-eec3-44ad-a5b3-55c4480ea9ac) --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-04-01 14:50:04 +02:00
Francisco Arceo	d495922949	docs: Updated documentation and Sphinx configuration (#1845 ) # What does this PR do? The goal of this PR is to make the pages easier to navigate by surfacing the child pages on the navbar, updating some of the copy, moving some of the files around. Some changes: 1. Clarifying Titles 2. Restructuring "Distributions" more formally in its own page to be consistent with Providers and adding some clarity to the child pages to surface them and make them easier to navigate 3. Updated sphinx config to not collapse navigation by default 4. Updated copyright year to be calculated dynamically 5. Moved `docs/source/distributions/index.md` -> `docs/source/distributions/starting_llama_stack_server.md` Another for https://github.com/meta-llama/llama-stack/issues/1815 ## Test Plan Tested locally and pages build (screen shots for example). ## Documentation ### Before: ![Screenshot 2025-03-31 at 1 09 21 PM](https://github.com/user-attachments/assets/98e34f76-f0d9-4055-8e2c-441b1e7d8f6a) ### After: ![Screenshot 2025-03-31 at 1 08 52 PM](https://github.com/user-attachments/assets/dfb6b8ad-3a1d-46b6-8f54-0c553664093f) Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-31 13:08:05 -07:00
Francisco Arceo	9b478f3756	docs: Adding darkmode to documentation (#1843 ) # What does this PR do? docs: Adding darkmode to documentation ## Test Plan Tested locally. Here's the look: ![Screenshot 2025-03-31 at 9 43 05 AM](https://github.com/user-attachments/assets/5989dbc8-ba03-4710-ad8d-6d4b9ac79786) ## Issues Related to https://github.com/meta-llama/llama-stack/issues/1815 Closes https://github.com/meta-llama/llama-stack/issues/1844 Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-31 08:31:53 -07:00
Anamika	d8a8a734b5	fix: update sink name for traces and metrics in LlamaStack 0.1.8 (#1836 ) # What does this PR do? This PR updates the sink name configuration for traces and metrics in LlamaStack to align with the latest changes introduced in version 0.1.8. Previously, when using the `otel` sink along with other sinks (like `console` and `sqlite`), the system threw a ValueError, with the message: ```shell Value error, 'otel' is not a valid TelemetrySink [type=value_error, input_value='console,otel,sqlite', input_type=str] For further information visit https://errors.pydantic.dev/2.10/v/value_error ``` ## Test Plan - Test 1: Ran the LlamaStack server with a configuration containing `console,otel,sqlite` as sinks. - Expected result: No errors related to invalid sink names. - Result: The system ran without throwing a `ValueError`. - Test 2: Verified that the `otel_trace`, `otel_metric` sink now works in combination with other sinks (`console`, `sqlite`). - Expected result: Telemetry data is correctly sent to all specified sinks without errors. - Result: All telemetry data was successfully sent to the specified sinks.	2025-03-29 10:09:08 -07:00
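The error quoted in the commit above arises when a comma-separated sink string contains a name that is not a member of the sink enum. A minimal sketch of that kind of validation, assuming only the four sink names mentioned in the commit message; `parse_sinks` and this `TelemetrySink` definition are illustrative and are not copied from llama-stack:

```python
from enum import Enum


class TelemetrySink(str, Enum):
    # Names taken from the commit message; the real enum may define more members.
    OTEL_TRACE = "otel_trace"
    OTEL_METRIC = "otel_metric"
    SQLITE = "sqlite"
    CONSOLE = "console"


def parse_sinks(raw: str) -> list[TelemetrySink]:
    """Split a comma-separated string and validate each entry against the enum."""
    sinks = []
    for name in raw.split(","):
        name = name.strip()
        try:
            sinks.append(TelemetrySink(name))
        except ValueError:
            raise ValueError(f"'{name}' is not a valid TelemetrySink") from None
    return sinks


valid = parse_sinks("console,otel_trace,otel_metric,sqlite")
print([s.value for s in valid])

try:
    parse_sinks("console,otel,sqlite")  # the old name is rejected
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)
```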
Francisco Arceo	37b6da37ba	docs: Document sqlite-vec faiss comparison (#1821 ) # What does this PR do? This PR documents and benchmarks the performance tradeoffs between sqlite-vec and FAISS inline VectorDB providers. # Closes https://github.com/meta-llama/llama-stack/issues/1165 ## Test Plan The test was run using this script: <details> <summary>CLICK TO SHOW SCRIPT 👋 </summary>
```python
import cProfile
import os
import uuid
import time
import random
import string

import matplotlib.pyplot as plt
import pandas as pd
from termcolor import cprint
from llama_stack_client.types import Document
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient
from memory_profiler import profile
from line_profiler import LineProfiler

os.environ["INFERENCE_MODEL"] = "llama3.2:3b-instruct-fp16"
os.environ["LLAMA_STACK_CONFIG"] = "ollama"


def generate_random_chars(count=400):
    return ''.join(random.choices(string.ascii_letters, k=count))


def generate_documents(num_docs: int, num_chars: int):
    documents = [
        Document(
            document_id=f"doc-{i}",
            content=f"Document content for document {i} - {generate_random_chars(count=num_chars)}",
            mime_type="text/plain",
            metadata={},
        )
        for i in range(num_docs)
    ]
    return documents


@profile
def benchmark_write(client, vector_db_id, documents, batch_size=100):
    write_times = []
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        start_time = time.time()
        client.tool_runtime.rag_tool.insert(
            documents=batch,
            vector_db_id=vector_db_id,
            chunk_size_in_tokens=512,
        )
        end_time = time.time()
        write_times.append(end_time - start_time)
    return write_times


@profile
def benchmark_read(client, provider_id, vector_db_id, user_prompts):
    response_times = []
    for prompt in user_prompts:
        start_time = time.time()
        response = client.vector_io.query(
            vector_db_id=vector_db_id,
            query=prompt,
        )
        end_time = time.time()
        response_times.append(end_time - start_time)
    return response_times


def profile_functions():
    profiler = LineProfiler()
    profiler.add_function(benchmark_write)
    profiler.add_function(benchmark_read)
    return profiler


def plot_results(output, batch_size):
    # Create a DataFrame for easy manipulation
    df_sqlite = pd.DataFrame(output['sqlite-vec'])
    df_faiss = pd.DataFrame(output['faiss'])

    # Convert write times from seconds to milliseconds
    df_sqlite['write_times'] *= 1000
    df_faiss['write_times'] *= 1000

    avg_write_sqlite = df_sqlite['write_times'].mean()
    avg_write_faiss = df_faiss['write_times'].mean()
    avg_read_sqlite = df_sqlite['read_times'].mean()
    avg_read_faiss = df_faiss['read_times'].mean()

    # Histogram of write times
    plt.figure(figsize=(12, 6))
    plt.hist(df_sqlite['write_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Write Times')
    plt.hist(df_faiss['write_times'], bins=10, alpha=0.5, color='red', label='faiss Write Times')
    plt.axvline(avg_write_sqlite, color='blue', linestyle='--', label=f'Average Write Time (sqlite-vec): {avg_write_sqlite:.3f} ms')
    plt.axvline(avg_write_faiss, color='red', linestyle='--', label=f'Average Write Time (faiss): {avg_write_faiss:.3f} ms')
    plt.title(f'Histogram of Write Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]} with batch size = {batch_size}')
    plt.xlabel('Time (milliseconds)')
    plt.ylabel('Density')
    plt.legend()
    plt.savefig('write_time_comparison.png')
    plt.close()

    # Histogram of read times
    plt.figure(figsize=(12, 6))
    plt.hist(df_sqlite['read_times'], bins=10, alpha=0.5, color='blue', label='sqlite-vec Read Times')
    plt.hist(df_faiss['read_times'], bins=10, alpha=0.5, color='red', label='faiss Read Times')
    plt.axvline(avg_read_sqlite, color='blue', linestyle='--', label=f'Average Read Time (sqlite-vec): {avg_read_sqlite:.3f} ms')
    plt.axvline(avg_read_faiss, color='red', linestyle='--', label=f'Average Read Time (faiss): {avg_read_faiss:.3f} ms')
    plt.title(f'Histogram of Read Times for sqlite-vec and faiss\nn = {df_faiss.shape[0]}')
    plt.xlabel('Time (milliseconds)')
    plt.ylabel('Density')
    plt.legend()
    plt.savefig('read_time_comparison.png')
    plt.close()

    # Write times in the order the batches were written
    plt.figure(figsize=(12, 6))
    plt.plot(df_sqlite.index, df_sqlite['write_times'], marker='o', markersize=4, linestyle='-', color='blue', label='sqlite-vec Write Times')
    plt.plot(df_faiss.index, df_faiss['write_times'], marker='x', markersize=4, linestyle='-', color='red', label='faiss Write Times')
    plt.title(f'Write Times by Operation Sequence\n(batch size = {batch_size})')
    plt.xlabel('Write Operation Sequence')
    plt.ylabel('Time (milliseconds)')
    plt.legend()
    plt.grid(True, linestyle='--', alpha=0.7)
    plt.tight_layout()
    plt.savefig('write_time_sequence.png')
    plt.close()

    # Print out the summary table
    print("\nPerformance Summary for sqlite-vec:")
    print(df_sqlite)

    # Print out the summary table
    print("\nPerformance Summary for faiss:")
    print(df_faiss)


def main():
    # Initialize the client
    client = LlamaStackAsLibraryClient("ollama")
    vector_db_id = f"test-vector-db-{uuid.uuid4().hex}"
    _ = client.initialize()

    # Generate a large dataset
    num_chars = 50
    num_docs = 100
    num_writes = 100
    write_batch_size = 100
    num_reads = 100

    documents = generate_documents(num_docs * write_batch_size, num_chars)
    user_prompts = [
        f"Tell me about document {i}" for i in range(1, num_reads + 1)
    ]

    providers = ["sqlite-vec", "faiss"]
    output = {
        provider_id: {"write_times": None, "read_times": None}
        for provider_id in providers
    }

    # Benchmark writes and reads for SQLite and Faiss
    for provider_id in providers:
        cprint(f"Benchmarking provider: {provider_id}", "yellow")
        client.vector_dbs.register(
            provider_id=provider_id,
            vector_db_id=vector_db_id,
            embedding_model="all-MiniLM-L6-v2",
            embedding_dimension=384,
        )
        write_times = benchmark_write(client, vector_db_id, documents, write_batch_size)
        average_write_time_ms = sum(write_times) / len(write_times) * 1000.
        cprint(f"Average write time for {provider_id} is {average_write_time_ms:.2f} milliseconds for {num_writes} runs", "blue")

        cprint(f"Benchmarking reads for provider: {provider_id}", "yellow")
        read_times = benchmark_read(client, provider_id, vector_db_id, user_prompts)
        average_read_time_ms = sum(read_times) / len(read_times) * 1000.
        cprint(f"Average read time for {provider_id} is {average_read_time_ms:.2f} milliseconds for {num_reads} runs", "blue")

        client.vector_dbs.unregister(vector_db_id=vector_db_id)
        output[provider_id]['write_times'] = write_times
        output[provider_id]['read_times'] = read_times

    # Generate plots and summary
    plot_results(output, write_batch_size)


if __name__ == "__main__":
    cProfile.run('main()', 'profile_output.prof')
```
</details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-03-28 17:41:33 +01:00
Ihar Hrachyshka	18bac27d4e	fix: Use CONDA_DEFAULT_ENV presence as a flag to use conda mode (#1555 ) # What does this PR do? This is the second attempt to switch to system packages by default. Now with a hack to detect conda environment - in which case conda image-type is used. Note: Conda will only be used when --image-name is unset and CONDA_DEFAULT_ENV is set. This means that users without conda will correctly fall back to using system packages when no --image-* arguments are passed at all. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Uses virtualenv: ``` $ llama stack build --template ollama --image-type venv $ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml [...] Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local [...] ``` Uses system packages (virtualenv already initialized): ``` $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] INFO 2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages. [...] ``` Attempt to run from environment packages without necessary packages installed: ``` $ python -m venv barebones $ . ./barebones/bin/activate $ pip install -e . # to install llama command $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] ModuleNotFoundError: No module named 'fastapi' ``` ^ failed as expected because the environment doesn't have necessary packages installed. Now install some packages in the new environment: ``` $ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] 
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ``` Now see if setting CONDA_DEFAULT_ENV will change what happens by default: ``` $ export CONDA_DEFAULT_ENV=base $ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml [...] Using conda environment: base Conda environment base does not exist. [...] ``` --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-27 17:13:22 -04:00
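The selection rule described in the commit above can be summarised as a small decision function. This is a sketch under stated assumptions, not the CLI's actual code: `resolve_environment` is a hypothetical name, and the choice that an explicit `--image-type` or `--image-name` always takes precedence over `CONDA_DEFAULT_ENV` is an assumption where the commit message is silent:

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_environment(
    image_type: Optional[str],
    image_name: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Tuple[str, Optional[str]]:
    """Return (mode, name) for running a stack. Hypothetical helper for illustration."""
    # Assumption: explicit CLI arguments win over any environment detection.
    if image_type is not None or image_name is not None:
        return (image_type or "unspecified", image_name)
    # No --image-* arguments: CONDA_DEFAULT_ENV acts as the flag for conda mode.
    conda_env = env.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return ("conda", conda_env)
    # Otherwise fall back to the packages of the current environment.
    return ("system", None)


print(resolve_environment(None, None, {"CONDA_DEFAULT_ENV": "base"}))
print(resolve_environment(None, None, {}))
print(resolve_environment("venv", None, {"CONDA_DEFAULT_ENV": "base"}))
```

Passing the environment as a mapping keeps the function easy to test without modifying the real process environment.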
Xi Yan	b5c27f77ad	chore: clean up distro doc (#1804 ) # What does this PR do? - hide distro doc (docker needs to be thoroughly tested). ## Test Plan - docs	2025-03-27 12:12:14 -07:00
Dmitry Rogozhkin	935e706b15	docs: fix remote-vllm instructions (#1805 ) # What does this PR do? * Fix location of `run.yaml` relative to the cloned llama stack repository * Drop `-it` from `docker run` commands as it's not needed for running services ## Test Plan * Verified running the llama stack following the updated instructions CC: @ashwinb Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-03-27 10:19:51 -04:00
Rashmi Pawar	1a73f8305b	feat: Add nemo customizer (#1448 ) # What does this PR do? This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack post-training module. The integration enables users to fine-tune models using NVIDIA's cloud-based customization service through a consistent Llama Stack interface. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] Yet to be done Things pending under this PR: - [x] Integration of fine-tuned model(new checkpoint) for inference with nvidia llm distribution - [x] distribution integration of API - [x] Add test cases for customizer(In Progress) - [x] Documentation ``` LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py ============================================================================================================================================================================ test session starts ============================================================================================================================================================================= platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python cachedir: .pytest_cache metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}} rootdir: /home/ubuntu/llama-stack configfile: pyproject.toml plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3 asyncio: mode=strict, asyncio_default_fixture_loop_scope=None collected 2 items 
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED [ 50%] tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED [100%] ======================================================================================================================================================================== 2 passed, 1 warning in 0.10s ======================================================================================================================================================================== ``` cc: @mattf @dglogo @sumitb --------- Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>	2025-03-25 11:01:10 -07:00
Yuan Tang	9ff82036f7	docs: Simplify vLLM deployment in K8s deployment guide (#1655 ) # What does this PR do? * Removes the use of `huggingface-cli` * Simplifies HF cache mount path * Simplifies vLLM server startup command * Separates PVC/secret creation from deployment/service * Fixes a typo: "pod" should be "deployment" Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-24 09:08:50 -07:00
Mark Campbell	711cfa00fc	docs: fix typos in evaluation concepts (#1745 ) # What does this PR do? Typo fix for `output_dir` flag and misspelling of aggregate ## Test Plan N/A	2025-03-21 12:00:53 -07:00
Hardik Shah	127bac6869	fix: Default to port 8321 everywhere (#1734 ) As titled, moved all instances of 5001 to 8321	2025-03-20 15:50:41 -07:00
Hardik Shah	581e8ae562	fix: docker run with `--pull always` to fetch the latest image (#1733 ) As titled	2025-03-20 15:35:48 -07:00
Yuan Tang	f5a5c5d459	docs: Add instruction on enabling tool calling for remote vLLM (#1719 ) # What does this PR do? This PR adds a link to tool calling instructions in vLLM. Users have asked about this many times, e.g. https://github.com/meta-llama/llama-stack/issues/1648#issuecomment-2740642077 --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-20 15:18:17 -07:00
ehhuang	ea6a4a14ce	feat(api): simplify client imports (#1687 ) # What does this PR do? closes #1554 ## Test Plan test_agents.py	2025-03-20 10:15:49 -07:00
ehhuang	b6b103a20d	docs: update for mcp tools (#1705 ) # What does this PR do? ## Test Plan read	2025-03-19 15:45:53 -07:00
Yuan Tang	7c0448456e	docs: Remove mentions of focus on Llama models (#1690 ) # What does this PR do? This is a follow-up of https://github.com/meta-llama/llama-stack/issues/965 to avoid mentioning exclusive support on Llama models. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-19 00:17:22 -04:00
Daniele Martinoli	cca9bd6cc3	feat: Qdrant inline provider (#1273 ) # What does this PR do? Removed local execution option from the remote Qdrant provider and introduced an explicit inline provider for the embedded execution. Updated the ollama template to include this option: this part can be reverted in case we don't want to have two default `vector_io` providers. (Closes #1082) ## Test Plan Build and run an ollama distro: ```bash llama stack build --template ollama --image-type conda llama stack run --image-type conda ollama ``` Run one of the sample ingestion applications like [rag_with_vector_db.py](https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/rag_with_vector_db.py), but replace this line: ```py selected_vector_provider = vector_providers[0] ``` with the following, to use the `qdrant` provider: ```py selected_vector_provider = vector_providers[1] ``` After running the test code, verify the timestamp of the Qdrant store: ```bash % ls -ltr ~/.llama/distributions/ollama/qdrant.db/collection/test_vector_db_* total 784 -rw-r--r--@ 1 dmartino staff 401408 Feb 26 10:07 storage.sqlite ``` [//]: # (## Documentation) --------- Signed-off-by: Daniele Martinoli <dmartino@redhat.com> Co-authored-by: Francisco Arceo <farceo@redhat.com>	2025-03-18 14:04:21 -07:00
Jamie Land	f4dc290705	feat: Created Playground Containerfile and Image Workflow (#1256 ) # What does this PR do? Adds a container file that can be used to build the playground UI. This file will be built by this PR in the stack-ops repo: https://github.com/meta-llama/llama-stack-ops/pull/9 Docker command in the docs will need to change once I know the address of the official repository. ## Test Plan Tested image on my local Openshift Instance using this helm chart: https://github.com/Jaland/llama-stack-helm/tree/main/llama-stack [//]: # (## Documentation) --------- Co-authored-by: Jamie Land <hokie10@gmail.com>	2025-03-18 09:26:49 -07:00
Nathan Weinberg	1261bc93bf	docs: fixed broken tip in distro build docs (#1673 ) # What does this PR do? fixed broken tip in distro build docs ## Test Plan Local docs build Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-03-17 17:22:26 -07:00
Xi Yan	5287b437ae	feat(api): (1/n) datasets api clean up (#1573 ) ## PR Stack - https://github.com/meta-llama/llama-stack/pull/1573 - https://github.com/meta-llama/llama-stack/pull/1625 - https://github.com/meta-llama/llama-stack/pull/1656 - https://github.com/meta-llama/llama-stack/pull/1657 - https://github.com/meta-llama/llama-stack/pull/1658 - https://github.com/meta-llama/llama-stack/pull/1659 - https://github.com/meta-llama/llama-stack/pull/1660 Client SDK - https://github.com/meta-llama/llama-stack-client-python/pull/203 CI - `1391130488` <img width="1042" alt="image" src="https://github.com/user-attachments/assets/69636067-376d-436b-9204-896e2dd490ca" /> -- the test_rag_agent_with_attachments is flaky and not related to this PR ## Doc <img width="789" alt="image" src="https://github.com/user-attachments/assets/b88390f3-73d6-4483-b09a-a192064e32d9" /> ## Client Usage ```python client.datasets.register( source={ "type": "uri", "uri": "lsfs://mydata.jsonl", }, schema="jsonl_messages", # optional dataset_id="my_first_train_data" ) # quick prototype debugging client.datasets.register( data_reference={ "type": "rows", "rows": [ {"messages": [...]}, ], }, schema="jsonl_messages", ) ``` ## Test Plan - CI: `1387805545` ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py ``` ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py ``` ``` pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ```	2025-03-17 16:55:45 -07:00
Ihar Hrachyshka	77ca09467f	chore: consolidate scripts under ./scripts directory (#1646 )	2025-03-17 17:56:30 -04:00
cdgamarose-nv	252a487085	feat: added nvidia as safety provider (#1248 ) # What does this PR do? Adds nvidia as a safety provider by interfacing with the nemo guardrails microservice. This enables checking user’s input or the LLM’s output against input and output guardrails by using the `/v1/guardrails/checks` endpoint of the[ guardrails API.](https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/guides/checks-guide.html) ## Test Plan Deploy nemo guardrails service following the documentation: https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/getting-started/deploy-docker.html ### Standalone: ```bash (venv) local-cdgamarose@a1u1g-rome-0153:~/llama-stack$ pytest -v -s llama_stack/providers/tests/safety/test_safety.py --providers inference=nvidia,safety=nvidia --safety-shield meta/llama-3.1-8b-instruct =================================================================================== test session starts =================================================================================== platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /localhome/local-cdgamarose/llama-stack/venv/bin/python3 cachedir: .pytest_cache metadata: {'Python': '3.10.12', 'Platform': 'Linux-5.15.0-122-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'html': '4.1.1'}} rootdir: /localhome/local-cdgamarose/llama-stack configfile: pyproject.toml plugins: metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, html-4.1.1 asyncio: mode=strict, asyncio_default_fixture_loop_scope=None collected 2 items llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_shield_list[--inference=nvidia:safety=nvidia] Initializing NVIDIASafetyAdapter(http://0.0.0.0:7331)... 
PASSED llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_run_shield[--inference=nvidia:safety=nvidia] PASSED ============================================================================== 2 passed, 2 warnings in 4.78s ============================================================================== ``` ### Distribution: ``` llama stack run llama_stack/templates/nvidia/run-with-safety.yaml curl -v -X 'POST' "http://localhost:8321/v1/safety/run-shield" -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"shield_id": "meta/llama-3.1-8b-instruct", "messages":[{"role": "user", "content": "you are stupid"}]}' {"violation":{"violation_level":"error","user_message":"Sorry I cannot do this.","metadata":{"self check input":{"status":"blocked"}}}} ``` [//]: # (## Documentation) --------- Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-03-17 14:39:23 -07:00
Kelly Brown	ac51564ad5	docs: Fixing outputs in client cli and formatting suggestions (#1668 ) Description: Updates the client example output and adds suggested formatting for some of the required and optional CLI flags. If the reformatting is unnecessary, I can remove it from this PR and have it fix only the example output.	2025-03-17 14:31:09 -07:00
Kelly Brown	60ae7455f6	docs: Fix trailing whitespace error (#1669 ) Description: Fixes the trailing whitespace error that's coming up on main	2025-03-17 08:53:30 -07:00
Chirag Modi	b56b06037c	Web updates to point to latest releases for Mobile SDK (#1650 ) # What does this PR do? Web updates to point to latest releases for Mobile SDK - point to `latest-release` branch for mobile sdk repos to minimize the number of change points on the site. - updates to some instructions	2025-03-14 17:06:07 -07:00
Nathan Weinberg	d2dda4af64	docs: add additional guidance around using `virtualenv` (#1642 ) # What does this PR do? The current docs are heavily tailored to `conda`. This PR also adds guidance around running code examples within a virtual environment for both `conda` and `virtualenv`. Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-03-14 16:00:55 -07:00
Yuan Tang	b906bad238	docs: Add OpenAI, Anthropic, Gemini to inference API providers table (#1622 ) # What does this PR do? Forgot to update this page as well as part of https://github.com/meta-llama/llama-stack/pull/1617. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-03-13 15:28:52 -07:00
Xi Yan	9617468d13	fix: passthrough provider template + fix (#1612 ) # What does this PR do? - Fix issue w/ passthrough provider ## Test Plan llama stack run	2025-03-13 09:44:26 -07:00
Dinesh Yeduguru	85501ed875	fix: remove Llama-3.2-1B-Instruct for fireworks (#1558 ) # What does this PR do? Remove Llama-3.2-1B-Instruct for fireworks, as it no longer appears to be hosted on the website. ## Test Plan python distro_codegen.py	2025-03-11 11:19:29 -07:00


334 commits