diff --git a/docs/source/distributions/remote_hosted_distro/nvidia.md b/docs/source/distributions/remote_hosted_distro/nvidia.md
deleted file mode 100644
index 58731392d..000000000
--- a/docs/source/distributions/remote_hosted_distro/nvidia.md
+++ /dev/null
@@ -1,88 +0,0 @@
-
-# NVIDIA Distribution
-
-The `llamastack/distribution-nvidia` distribution consists of the following provider configurations.
-
-| API | Provider(s) |
-|-----|-------------|
-| agents | `inline::meta-reference` |
-| datasetio | `inline::localfs` |
-| eval | `inline::meta-reference` |
-| inference | `remote::nvidia` |
-| post_training | `remote::nvidia` |
-| safety | `remote::nvidia` |
-| scoring | `inline::basic` |
-| telemetry | `inline::meta-reference` |
-| tool_runtime | `inline::rag-runtime` |
-| vector_io | `inline::faiss` |
-
-
-### Environment Variables
-
-The following environment variables can be configured:
-
-- `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
-- `NVIDIA_USER_ID`: NVIDIA User ID (default: `llama-stack-user`)
-- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`)
-- `NVIDIA_ACCESS_POLICIES`: NVIDIA Access Policies (default: `{}`)
-- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`)
-- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`)
-- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`)
-- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
-- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
-- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)
-
-### Models
-
-The following models are available by default:
-
-- `meta/llama3-8b-instruct (aliases: meta-llama/Llama-3-8B-Instruct)`
-- `meta/llama3-70b-instruct (aliases: meta-llama/Llama-3-70B-Instruct)`
-- `meta/llama-3.1-8b-instruct (aliases: meta-llama/Llama-3.1-8B-Instruct)`
-- `meta/llama-3.1-70b-instruct (aliases: meta-llama/Llama-3.1-70B-Instruct)`
-- `meta/llama-3.1-405b-instruct (aliases: meta-llama/Llama-3.1-405B-Instruct-FP8)`
-- `meta/llama-3.2-1b-instruct (aliases: meta-llama/Llama-3.2-1B-Instruct)`
-- `meta/llama-3.2-3b-instruct (aliases: meta-llama/Llama-3.2-3B-Instruct)`
-- `meta/llama-3.2-11b-vision-instruct (aliases: meta-llama/Llama-3.2-11B-Vision-Instruct)`
-- `meta/llama-3.2-90b-vision-instruct (aliases: meta-llama/Llama-3.2-90B-Vision-Instruct)`
-- `nvidia/llama-3.2-nv-embedqa-1b-v2 `
-- `nvidia/nv-embedqa-e5-v5 `
-- `nvidia/nv-embedqa-mistral-7b-v2 `
-- `snowflake/arctic-embed-l `
-
-
-### Prerequisite: API Keys
-
-Make sure you have access to a NVIDIA API Key. You can get one by visiting [https://build.nvidia.com/](https://build.nvidia.com/).
-
-
-## Running Llama Stack with NVIDIA
-
-You can do this via Conda (build code) or Docker which has a pre-built image.
-
-### Via Docker
-
-This method allows you to get started quickly without having to build the distribution code.
-
-```bash
-LLAMA_STACK_PORT=8321
-docker run \
-  -it \
-  --pull always \
-  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -v ./run.yaml:/root/my-run.yaml \
-  llamastack/distribution-nvidia \
-  --yaml-config /root/my-run.yaml \
-  --port $LLAMA_STACK_PORT \
-  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
-```
-
-### Via Conda
-
-```bash
-llama stack build --template nvidia --image-type conda
-llama stack run ./run.yaml \
-  --port 8321 \
-  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
-  --env INFERENCE_MODEL=$INFERENCE_MODEL
-```
diff --git a/docs/source/distributions/self_hosted_distro/nvidia.md b/docs/source/distributions/self_hosted_distro/nvidia.md
index 0c0801f89..58731392d 100644
--- a/docs/source/distributions/self_hosted_distro/nvidia.md
+++ b/docs/source/distributions/self_hosted_distro/nvidia.md
@@ -1,3 +1,4 @@
+
 # NVIDIA Distribution
 
 The `llamastack/distribution-nvidia` distribution consists of the following provider configurations.
@@ -5,24 +6,49 @@ The `llamastack/distribution-nvidia` distribution consists of the following prov
 | API | Provider(s) |
 |-----|-------------|
 | agents | `inline::meta-reference` |
+| datasetio | `inline::localfs` |
+| eval | `inline::meta-reference` |
 | inference | `remote::nvidia` |
-| memory | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
-| safety | `inline::llama-guard` |
+| post_training | `remote::nvidia` |
+| safety | `remote::nvidia` |
+| scoring | `inline::basic` |
 | telemetry | `inline::meta-reference` |
+| tool_runtime | `inline::rag-runtime` |
+| vector_io | `inline::faiss` |
 
 
 ### Environment Variables
 
 The following environment variables can be configured:
 
-- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)
 - `NVIDIA_API_KEY`: NVIDIA API Key (default: ``)
+- `NVIDIA_USER_ID`: NVIDIA User ID (default: `llama-stack-user`)
+- `NVIDIA_DATASET_NAMESPACE`: NVIDIA Dataset Namespace (default: `default`)
+- `NVIDIA_ACCESS_POLICIES`: NVIDIA Access Policies (default: `{}`)
+- `NVIDIA_PROJECT_ID`: NVIDIA Project ID (default: `test-project`)
+- `NVIDIA_CUSTOMIZER_URL`: NVIDIA Customizer URL (default: `https://customizer.api.nvidia.com`)
+- `NVIDIA_OUTPUT_MODEL_DIR`: NVIDIA Output Model Directory (default: `test-example-model@v1`)
+- `GUARDRAILS_SERVICE_URL`: URL for the NeMo Guardrails Service (default: `http://0.0.0.0:7331`)
+- `INFERENCE_MODEL`: Inference model (default: `Llama3.1-8B-Instruct`)
+- `SAFETY_MODEL`: Name of the model to use for safety (default: `meta/llama-3.1-8b-instruct`)
 
 ### Models
 
 The following models are available by default:
 
-- `${env.INFERENCE_MODEL} (None)`
+- `meta/llama3-8b-instruct (aliases: meta-llama/Llama-3-8B-Instruct)`
+- `meta/llama3-70b-instruct (aliases: meta-llama/Llama-3-70B-Instruct)`
+- `meta/llama-3.1-8b-instruct (aliases: meta-llama/Llama-3.1-8B-Instruct)`
+- `meta/llama-3.1-70b-instruct (aliases: meta-llama/Llama-3.1-70B-Instruct)`
+- `meta/llama-3.1-405b-instruct (aliases: meta-llama/Llama-3.1-405B-Instruct-FP8)`
+- `meta/llama-3.2-1b-instruct (aliases: meta-llama/Llama-3.2-1B-Instruct)`
+- `meta/llama-3.2-3b-instruct (aliases: meta-llama/Llama-3.2-3B-Instruct)`
+- `meta/llama-3.2-11b-vision-instruct (aliases: meta-llama/Llama-3.2-11B-Vision-Instruct)`
+- `meta/llama-3.2-90b-vision-instruct (aliases: meta-llama/Llama-3.2-90B-Vision-Instruct)`
+- `nvidia/llama-3.2-nv-embedqa-1b-v2 `
+- `nvidia/nv-embedqa-e5-v5 `
+- `nvidia/nv-embedqa-mistral-7b-v2 `
+- `snowflake/arctic-embed-l `
 
 
 ### Prerequisite: API Keys
@@ -58,4 +84,5 @@ llama stack build --template nvidia --image-type conda
 llama stack run ./run.yaml \
   --port 8321 \
-  --env NVIDIA_API_KEY=$NVIDIA_API_KEY
+  --env NVIDIA_API_KEY=$NVIDIA_API_KEY \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL
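Once the NVIDIA distribution is up (via either the Docker or Conda path documented above), a quick way to confirm that the `remote::nvidia` inference provider is wired up is to point a client at the server. A minimal sketch, assuming `llama-stack-client` is installed, the server is listening on `http://localhost:8321`, and `meta/llama-3.1-8b-instruct` is among the registered models listed in the table above:

```python
from llama_stack_client import LlamaStackClient

# Assumes the NVIDIA distribution started above is listening on port 8321;
# adjust base_url if you mapped a different port.
client = LlamaStackClient(base_url="http://localhost:8321")

# The registered models should correspond to the "Models" table above.
for model in client.models.list():
    print(model.identifier)

# One round-trip through the remote::nvidia inference provider.
response = client.inference.chat_completion(
    model_id="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```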
 ```
diff --git a/docs/source/getting_started/detailed_tutorial.md b/docs/source/getting_started/detailed_tutorial.md
index 911b35437..610c0cad5 100644
--- a/docs/source/getting_started/detailed_tutorial.md
+++ b/docs/source/getting_started/detailed_tutorial.md
@@ -536,6 +536,6 @@ uv run python rag_agent.py
 
 ::::
 
-## You're Ready to Build Your Own Apps!
+**You're Ready to Build Your Own Apps!**
 
 Congrats! 🥳 Now you're ready to [build your own Llama Stack applications](../building_applications/index)! 🚀
diff --git a/docs/source/getting_started/index.md b/docs/source/getting_started/index.md
index ce7dbe973..e084f68b7 100644
--- a/docs/source/getting_started/index.md
+++ b/docs/source/getting_started/index.md
@@ -8,20 +8,20 @@ environments. You can build and test using a local server first and deploy to a
 
 In this guide, we'll walk through how to build a RAG application locally using Llama Stack with [Ollama](https://ollama.com/)
 as the inference [provider](../providers/index.md#inference) for a Llama Model.
 
-## Step 1. Install and Setup
-Install [uv](https://docs.astral.sh/uv/), setup your virtual environment, and run inference on a Llama model with
-[Ollama](https://ollama.com/download).
+#### Step 1: Install and setup
+1. Install [uv](https://docs.astral.sh/uv/)
+2. Run inference on a Llama model with [Ollama](https://ollama.com/download)
 ```bash
-uv pip install llama-stack
-source .venv/bin/activate
 ollama run llama3.2:3b --keepalive 60m
 ```
-## Step 2: Run the Llama Stack Server
+#### Step 2: Run the Llama Stack server
+We will use `uv` to run the Llama Stack server.
 ```bash
-INFERENCE_MODEL=llama3.2:3b llama stack build --template ollama --image-type venv --run
+INFERENCE_MODEL=llama3.2:3b uv run --with llama-stack llama stack build --template ollama --image-type venv --run
 ```
-## Step 3: Run the Demo
-Now open up a new terminal using the same virtual environment and you can run this demo as a script using `uv run demo_script.py` or in an interactive shell.
+#### Step 3: Run the demo
+Now open up a new terminal and copy the following script into a file named `demo_script.py`.
+
 ```python
 from llama_stack_client import Agent, AgentEventLogger, RAGDocument, LlamaStackClient
 
@@ -43,9 +43,11 @@ _ = client.vector_dbs.register(
     embedding_dimension=embedding_dimension,
     provider_id="faiss",
 )
+source = "https://www.paulgraham.com/greatwork.html"
+print("rag_tool> Ingesting document:", source)
 document = RAGDocument(
     document_id="document_1",
-    content="https://www.paulgraham.com/greatwork.html",
+    content=source,
     mime_type="text/html",
     metadata={},
 )
@@ -66,19 +68,44 @@ agent = Agent(
     ],
 )
 
+prompt = "How do you do great work?"
+print("prompt>", prompt)
+
 response = agent.create_turn(
-    messages=[{"role": "user", "content": "How do you do great work?"}],
+    messages=[{"role": "user", "content": prompt}],
     session_id=agent.create_session("rag_session"),
+    stream=True,
 )
 
 for log in AgentEventLogger().log(response):
     log.print()
 ```
+We will use `uv` to run the script:
+```bash
+uv run --with llama-stack-client demo_script.py
+```
 And you should see output like below.
-```bash
-inference> [knowledge_search(query="What does it mean to do great work")]
-tool_execution> Tool:knowledge_search Args:{'query': 'What does it mean to do great work'}
-tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='Result 2:\nDocument_id:docum\nContent: [1]\nI don\'t think you could give a precise definition of what\ncounts as great work. Doing great work means doing something important\nso well\n', type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: . And if so\nyou're already further along than you might realize, because the\nset of people willing to want to is small. The factors in doing great work are factors in the literal,\nmathematical sense, and\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: \nincreases your morale and helps you do even better work. But this\ncycle also operates in the other direction: if you're not doing\ngood work, that can demoralize you and make it even harder to. Since\nit matters\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: to try to do\ngreat work. But that's what's going on subconsciously; they shy\naway from the question. So I'm going to pull a sneaky trick on you. Do you want to do great\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]
+```
+rag_tool> Ingesting document: https://www.paulgraham.com/greatwork.html
+
+prompt> How do you do great work?
+
+inference> [knowledge_search(query="What is the key to doing great work")]
+
+tool_execution> Tool:knowledge_search Args:{'query': 'What is the key to doing great work'}
+
+tool_execution> Tool:knowledge_search Response:[TextContentItem(text='knowledge_search tool found 5 chunks:\nBEGIN of knowledge_search tool results.\n', type='text'), TextContentItem(text="Result 1:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 2:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 3:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 4:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text="Result 5:\nDocument_id:docum\nContent: work. Doing great work means doing something important\nso well that you expand people's ideas of what's possible. But\nthere's no threshold for importance. It's a matter of degree, and\noften hard to judge at the time anyway.\n", type='text'), TextContentItem(text='END of knowledge_search tool results.\n', type='text')]
+
+inference> Based on the search results, it seems that doing great work means doing something important so well that you expand people's ideas of what's possible. However, there is no clear threshold for importance, and it can be difficult to judge at the time.
+
+To further clarify, I would suggest that doing great work involves:
+
+* Completing tasks with high quality and attention to detail
+* Expanding on existing knowledge or ideas
+* Making a positive impact on others through your work
+* Striving for excellence and continuous improvement
+
+Ultimately, great work is about making a meaningful contribution and leaving a lasting impression.
 ```
 
 Congratulations! You've successfully built your first RAG application using Llama Stack! 🎉🥳
@@ -92,10 +119,3 @@ Now you're ready to dive deeper into Llama Stack!
 - Discover how to [Build Llama Stacks](../distributions/index.md).
 - Refer to our [References](../references/index.md) for details on the Llama CLI and Python SDK.
 - Check out the [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository for example applications and tutorials.
-
-```{toctree}
-:maxdepth: 0
-:hidden:
-
-detailed_tutorial
-```
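Before running the full RAG demo above, it can help to confirm that the server from Step 2 is reachable and serving the Ollama-backed model. A minimal sketch, assuming the stack is listening on `http://localhost:8321` (the default port) and has at least one LLM registered; exact model identifiers may differ in your setup:

```python
from llama_stack_client import LlamaStackClient

# Connect to the Llama Stack server started in Step 2 (default port 8321).
client = LlamaStackClient(base_url="http://localhost:8321")

# List everything the stack registered: the Ollama LLM plus any embedding models.
models = client.models.list()
for m in models:
    print(m.model_type, m.identifier)

# Pick the first LLM (skipping embedding models) and ask a one-off question,
# bypassing the agent and RAG machinery entirely.
llm = next(m for m in models if m.model_type == "llm")
response = client.inference.chat_completion(
    model_id=llm.identifier,
    messages=[{"role": "user", "content": "Reply with one short sentence if you can hear me."}],
)
print(response.completion_message.content)
```

If this round-trip works, any failure in the RAG demo is likely in the vector store or agent configuration rather than basic connectivity.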
diff --git a/docs/source/index.md b/docs/source/index.md
index 99b0e1a3e..0c2d5a015 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -99,8 +99,9 @@ A number of "adapters" are available for some popular Inference and Vector Store
 :maxdepth: 3
 
 self
-introduction/index
 getting_started/index
+getting_started/detailed_tutorial
+introduction/index
 concepts/index
 providers/index
 distributions/index