Commit graph

593 commits

Author SHA1 Message Date
Jash Gulabrai
4999c8f9cc fix: missing key 2025-05-06 11:21:38 -04:00
Jash Gulabrai
b1d941e1f0 Merge branch 'main' into nvidia-e2e-notebook 2025-05-06 11:12:34 -04:00
Divya
3022f7b642
feat: Adding TLS support for Remote::Milvus vector_io (#2011)
# What does this PR do?
For issue [#2010](https://github.com/meta-llama/llama-stack/issues/2010).
Currently, if we try to connect the Llama stack server to a remote
Milvus instance that has TLS enabled, the connection fails because TLS
support is not implemented in the Llama stack codebase. As a result,
users are unable to use secured Milvus deployments out of the box.

With this change, users can connect to a remote Milvus instance
(`remote::milvus`) that has TLS enabled.
If TLS is enabled:
```
vector_io:
  - provider_id: milvus
    provider_type: remote::milvus
    config:
      uri: "http://<host>:<port>"
      token: "<user>:<password>"
      secure: True
      server_pem_path: "path/to/server.pem"
```
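As a rough illustration of how the TLS keys in the config above might be forwarded to a Milvus client, here is a minimal Python sketch. The helper `milvus_connection_kwargs` is hypothetical and is not the provider's actual code; it only assumes that `secure` and `server_pem_path` are passed through when TLS is requested.

```python
from typing import Any


def milvus_connection_kwargs(config: dict[str, Any]) -> dict[str, Any]:
    """Build client keyword arguments from a vector_io provider config.

    Hypothetical helper for illustration; key names mirror the YAML above.
    """
    kwargs: dict[str, Any] = {"uri": config["uri"]}
    if config.get("token"):
        kwargs["token"] = config["token"]
    if config.get("secure"):
        # TLS requested: forward the flag and, if given, the server certificate.
        kwargs["secure"] = True
        if config.get("server_pem_path"):
            kwargs["server_pem_path"] = config["server_pem_path"]
    return kwargs


config = {
    "uri": "http://<host>:<port>",
    "token": "<user>:<password>",
    "secure": True,
    "server_pem_path": "path/to/server.pem",
}
print(milvus_connection_kwargs(config))
```

Without `secure`, the sketch returns only the `uri` (and `token`, if set), so a non-TLS deployment is unaffected.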

## Test Plan
I tested this by connecting to a TLS-enabled Milvus instance, and the
Llama Stack server started successfully.
2025-05-06 14:15:34 +02:00
Christina Xu
65cc971877
docs: Add TrustyAI LM-Eval to list of known external providers (#2020)
# What does this PR do?
Adds documentation for the remote [TrustyAI LM-Eval Eval
Provider](https://github.com/trustyai-explainability/llama-stack-provider-lmeval).
LM-Eval is a service for large language model evaluation based on the
open source project
[lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
and is integrated into the [TrustyAI Kubernetes
Operator](https://trustyai-explainability.github.io/trustyai-site/main/trustyai-operator.html).
2025-05-06 14:11:55 +02:00
Sébastien Han
a5d151e912
docs: fix typo mivus.md -> milvus.md (#2102)
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-05 09:48:38 -07:00
Ihar Hrachyshka
16e163da0e
docs: List external kubeflow pipelines provider prototype (#2100)
# What does this PR do?

Lists another external provider example (kfp).

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-05 10:24:52 +02:00
Christian Zaccaria
9f27578929
fix: improve Mermaid diagram visibility in dark mode (#2092)
# What does this PR do?
Closes #2078 

Previously, the Agent Execution Loop diagram was barely visible in dark
mode:


![image](https://github.com/user-attachments/assets/78567334-c57f-4cd0-ba93-290b20ed3aba)

I experimented with styling individual classes, but ultimately found
that adding an off-white background provides the best visibility in both
dark and light modes:


![image](https://github.com/user-attachments/assets/419d153a-d870-410b-b635-02b95da67a3d)


## Test Plan

The documentation can be built locally by following the docs:
https://llama-stack.readthedocs.io/en/latest/contributing/index.html#building-the-documentation

2025-05-02 13:09:45 -07:00
Ashwin Bharambe
272d3359ee
fix: remove code interpreter implementation (#2087)
# What does this PR do?

The builtin implementation of code interpreter is not robust and has a
really weak sandboxing shell (the `bubblewrap` container). Given the
availability of better MCP code interpreter servers coming up, we should
use them instead of baking an implementation into the Stack and
expanding the vulnerability surface to the rest of the Stack.

This PR only does the removal. We will add examples with how to
integrate with MCPs in subsequent ones.

## Test Plan

Existing tests.
2025-05-01 14:35:08 -07:00
Ihar Hrachyshka
9e6561a1ec
chore: enable pyupgrade fixes (#1806)
# What does this PR do?

The goal of this PR is code base modernization.

Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)

Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worthy of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-01 14:23:50 -07:00
Derek Higgins
64829947d0
feat: Add temperature support to responses API (#2065)
# What does this PR do?
Add support for the `temperature` parameter to the Responses API.


## Test Plan
Manually tested the simple case.
Added unit tests for the simple case and tool calls.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-05-01 11:47:58 -07:00
Sébastien Han
dc94433072
feat(pre-commit): enhance pre-commit hooks with additional checks (#2014)
# What does this PR do?

Add several new pre-commit hooks to improve code quality and security:

- no-commit-to-branch: prevent direct commits to protected branches like
`main`
- check-yaml: validate YAML files
- detect-private-key: prevent accidental commit of private keys
- requirements-txt-fixer: maintain consistent requirements.txt format
and sorting
- mixed-line-ending: enforce LF line endings to avoid mixed line endings
- check-executables-have-shebangs: ensure executable scripts have
shebangs
- check-json: validate JSON files
- check-shebang-scripts-are-executable: verify shebang scripts are
executable
- check-symlinks: validate symlinks and report broken ones
- check-toml: validate TOML files mainly for pyproject.toml

The respective fixes have been included.
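For readers unfamiliar with these hooks, the sketch below shows roughly what a check like `mixed-line-ending` looks for. It is an illustrative approximation written for this note, not the hook's actual implementation, and the function names are hypothetical.

```python
def line_ending_kinds(data: bytes) -> set[str]:
    """Return which line-ending styles ("crlf", "lf", "cr") occur in data."""
    kinds: set[str] = set()
    crlf = data.count(b"\r\n")
    if crlf:
        kinds.add("crlf")
    # Any "\n" or "\r" not accounted for by a CRLF pair is a bare LF or CR.
    if data.count(b"\n") > crlf:
        kinds.add("lf")
    if data.count(b"\r") > crlf:
        kinds.add("cr")
    return kinds


def has_mixed_line_endings(data: bytes) -> bool:
    """True when more than one line-ending style appears in the same file."""
    return len(line_ending_kinds(data)) > 1


print(has_mixed_line_endings(b"first\r\nsecond\n"))
```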

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-30 11:35:49 -07:00
Nathan Weinberg
d897313e0b
feat: add additional logging to llama stack build (#1689)
# What does this PR do?
Partial revert of fa68ded07c

This commit ensures users know where their new templates are generated
and how to run the newly built distro locally.

discussion on Discord:
1351652390

## Test Plan
Did a local run - let me know if we want any unit testing covering this

![Screenshot from 2025-03-18
22-38-18](https://github.com/user-attachments/assets/6d5dac52-edad-4a84-992f-a3c23cda10c8)

## Documentation
Updated "Zero to Hero" guide with new output

---------

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-04-30 11:06:24 -07:00
Jash Gulabrai
012dd6891f Merge branch 'main' into nvidia-e2e-notebook 2025-04-30 12:05:11 -04:00
Jash Gulabrai
f8f59c8335 fix: Update datasets metadata field from provider to provider_id 2025-04-30 10:52:12 -04:00
Ashwin Bharambe
4d0bfbf984
feat: add api.llama provider, llama-guard-4 model (#2058)
This PR adds a llama-stack inference provider for `api.llama.com`, as
well as adds entries for Llama-Guard-4 and updated Prompt-Guard models.
2025-04-29 10:07:41 -07:00
Jash Gulabrai
96afc98b88 Add reference to notebook in docs 2025-04-29 13:06:43 -04:00
Jash Gulabrai
2f60f3c347 fix: Consistently prefix customized models with the namespace 2025-04-29 12:57:49 -04:00
Ben Browning
8dfce2f596
feat: OpenAI Responses API (#1989)
# What does this PR do?

This provides an initial [OpenAI Responses
API](https://platform.openai.com/docs/api-reference/responses)
implementation. The API is not yet complete, and this is more a
proof-of-concept to show how we can store responses in our key-value
stores and use them to support the Responses API concepts like
`previous_response_id`.
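The idea of storing responses in a key-value store and chaining them through `previous_response_id` can be sketched as follows. This is a simplified in-memory illustration; `StoredResponse` and `ResponseStore` are hypothetical names and do not reflect the PR's actual classes or storage backend.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredResponse:
    id: str
    input: str
    output: str
    previous_response_id: Optional[str] = None


class ResponseStore:
    """In-memory stand-in for a key-value store keyed by response ID."""

    def __init__(self) -> None:
        self._kv: dict[str, StoredResponse] = {}

    def save(self, input: str, output: str,
             previous_response_id: Optional[str] = None) -> StoredResponse:
        if previous_response_id is not None and previous_response_id not in self._kv:
            raise KeyError(f"unknown previous_response_id: {previous_response_id}")
        resp = StoredResponse(f"resp-{uuid.uuid4()}", input, output, previous_response_id)
        self._kv[resp.id] = resp
        return resp

    def history(self, response_id: str) -> list[StoredResponse]:
        """Walk the previous_response_id links back to the first turn."""
        chain: list[StoredResponse] = []
        current: Optional[str] = response_id
        while current is not None:
            resp = self._kv[current]
            chain.append(resp)
            current = resp.previous_response_id
        chain.reverse()  # oldest turn first
        return chain


store = ResponseStore()
first = store.save("Hi", "Hello!")
second = store.save("And again?", "Sure.", previous_response_id=first.id)
print([r.input for r in store.history(second.id)])
```

Rebuilding the conversation from the stored chain is what would let a follow-up request supply only the new input plus the ID of the prior response.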

## Test Plan

I've added a new
`tests/integration/openai_responses/test_openai_responses.py` as part of
a test-driven development for this new API. I'm only testing this
locally with the remote-vllm provider for now, but it should work with
any of our inference providers since the only API it requires out of the
inference provider is the `openai_chat_completion` endpoint.

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack build --template remote-vllm --image-type venv --run
```

```
LLAMA_STACK_CONFIG="http://localhost:8321" \
python -m pytest -v \
  tests/integration/openai_responses/test_openai_responses.py \
  --text-model "meta-llama/Llama-3.2-3B-Instruct"
 ```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-04-28 14:06:00 -07:00
Sébastien Han
79851d93aa
feat: Add Kubernetes authentication (#1778)
# What does this PR do?

This commit adds a new authentication system to the Llama Stack server
with support for Kubernetes and custom authentication providers. Key
changes include:

- Implemented KubernetesAuthProvider for validating Kubernetes service
account tokens
- Implemented CustomAuthProvider for validating tokens against external
endpoints - this is the same code that was already present.
- Added test for Kubernetes
- Updated server configuration to support authentication settings
- Added documentation for authentication configuration and usage

The authentication system supports:
- Bearer token validation
- Kubernetes service account token validation
- Custom authentication endpoints
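As a small illustration of the first step in bearer token validation, the sketch below extracts the token from an `Authorization` header before it would be handed to a provider for verification. The function `extract_bearer_token` is hypothetical and is not the server's actual code.

```python
from typing import Optional


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header value.

    Returns None when the header is missing, uses another scheme, or is empty.
    """
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


print(extract_bearer_token("Bearer abc123"))
print(extract_bearer_token("Basic abc123"))
```

In the design described above, the extracted token would then be checked by the configured provider, for example against the Kubernetes API server or a custom endpoint.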

## Test Plan

Setup a Kube cluster using Kind or Minikube.

Run a server with:

```
server:
  port: 8321
  auth:
    provider_type: kubernetes
    config:
      api_server_url: http://url
      ca_cert_path: path/to/cert (optional)
```

Run:

```
curl -s -L -H "Authorization: Bearer $(kubectl create token my-user)" http://127.0.0.1:8321/v1/providers
```

Or replace "my-user" with your service account.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-28 22:24:58 +02:00
Jash Gulabrai
29f57d528d Remove unused env vars; change the other tmp folder name; fix examples 2025-04-28 13:08:36 -04:00
Rashmi Pawar
e6bbf8d20b
feat: Add NVIDIA NeMo datastore (#1852)
# What does this PR do?
Implementation of the NeMo Datastore register and unregister APIs.

Open issues:
- `provider_id` gets set to `localfs` in `client.datasets.register()`
because that is what `DatasetsRoutingTable` in `routing_tables.py`
specifies. See: #1860

For now, I pass `"provider_id":"nvidia"` in the metadata and
parse it in `DatasetsRoutingTable`.
(Not the best approach, but just a quick workaround to make it work for
now.)
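The workaround can be pictured with the following sketch, which picks the provider from the dataset metadata and falls back to `localfs` otherwise. The function `resolve_provider_id` is a hypothetical illustration and is not the code in `DatasetsRoutingTable`.

```python
from typing import Any, Optional

# Default named in the description above.
DEFAULT_PROVIDER_ID = "localfs"


def resolve_provider_id(metadata: Optional[dict[str, Any]]) -> str:
    """Prefer a provider_id passed in metadata, else use the default."""
    if metadata and metadata.get("provider_id"):
        return str(metadata["provider_id"])
    return DEFAULT_PROVIDER_ID


print(resolve_provider_id({"provider_id": "nvidia"}))
print(resolve_provider_id(None))
```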

## Test Plan
- Unit test cases: `pytest
tests/unit/providers/nvidia/test_datastore.py`
```bash
========================================================== test session starts ===========================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.26.0, nbval-0.11.0, metadata-3.1.1, html-4.1.1, cov-6.1.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                        

tests/unit/providers/nvidia/test_datastore.py ..                                                                                   [100%]

============================================================ warnings summary ============================================================

====================================================== 2 passed, 1 warning in 0.84s ======================================================
```

cc: @dglogo, @mattf, @yanxi0830
2025-04-28 09:41:59 -07:00
Jash Gulabrai
e64961697a Rename tmp dir to sample_data; remove print statements 2025-04-28 12:04:36 -04:00
Jash Gulabrai
73275f07b7 Merge branch 'main' into nvidia-e2e-notebook 2025-04-28 12:00:11 -04:00
Derek Higgins
0e4307de0f
docs: Fix missing --gpu all flag in Docker run commands (#2026)
Adding the --gpu all flag to Docker run commands
for meta-reference-gpu distributions ensures models are loaded onto the GPU
instead of the CPU.

Remove docs for meta-reference-quantized-gpu.
The distribution was removed in #1887,
but these files were left behind.


Fixes: #1798

# What does this PR do?
Fixes doc to add --gpu all command to docker run

Closes #1798

## Test Plan
verified in docker documentation but untested

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-04-25 12:17:31 -07:00
Sajikumar JS
1bb1d9b2ba
feat: Add watsonx inference adapter (#1895)
# What does this PR do?
Adds IBM watsonx.ai as an inference provider:
[#1741](https://github.com/meta-llama/llama-stack/issues/1741)


---------

Co-authored-by: Sajikumar JS <sajikumar.js@ibm.com>
2025-04-25 11:29:21 -07:00
Kevin Postlethwait
d9e00fca66
fix: specify nbformat version in nb (#2023)
# What does this PR do?
Adding the nbformat version fixes this issue. I am not sure exactly why
this needs to be done, but this version was rewritten to the bottom of a
notebook file when I changed its name while trying to get to the bottom of
this. When I opened it on GitHub, the issue was no longer present.

Closes #1837

## Test Plan
N/A
2025-04-25 10:10:37 +02:00
Rashmi Pawar
ace82836c1
feat: NVIDIA allow non-llama model registration (#1859)
# What does this PR do?
Adds custom model registration functionality to NVIDIAInferenceAdapter,
which lets inference run on:
- post-training models
- non-Llama models in the API Catalogue (behind
https://integrate.api.nvidia.com and endpoints compatible with
AsyncOpenAI)

## Example Usage:
```python
from llama_stack.apis.models import ModelType
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# Placeholder: set this to the ID of the model you want to register.
model_name = "<model-id>"

client = LlamaStackAsLibraryClient("nvidia")
_ = client.initialize()

# Register the custom model with the NVIDIA provider.
client.models.register(
    model_id=model_name,
    model_type=ModelType.llm,
    provider_id="nvidia",
)

# Run inference against the newly registered model.
response = client.inference.chat_completion(
    model_id=model_name,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a limerick about the wonders of GPU computing."},
    ],
)
```

## Test Plan
```bash
pytest tests/unit/providers/nvidia/test_supervised_fine_tuning.py 
========================================================== test session starts ===========================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0
collected 6 items                                                                                                                        

tests/unit/providers/nvidia/test_supervised_fine_tuning.py ......                                                                  [100%]

============================================================ warnings summary ============================================================
../miniconda/envs/nvidia-1/lib/python3.10/site-packages/pydantic/fields.py:1076
  /home/ubuntu/miniconda/envs/nvidia-1/lib/python3.10/site-packages/pydantic/fields.py:1076: PydanticDeprecatedSince20: Using extra keyword arguments on `Field` is deprecated and will be removed. Use `json_schema_extra` instead. (Extra keys: 'contentEncoding'). Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
    warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================== 6 passed, 1 warning in 1.51s ======================================================
```

[//]: # (## Documentation)
Updated Readme.md

cc: @dglogo, @sumitb, @mattf
2025-04-24 17:13:33 -07:00
Jash Gulabrai
cc77f79f55
feat: Add NVIDIA Eval integration (#1890)
# What does this PR do?
This PR adds support for NVIDIA's NeMo Evaluator API to the Llama Stack
eval module. The integration enables users to evaluate models via the
Llama Stack interface.

## Test Plan
1. Added unit tests and successfully ran from root of project:
`./scripts/unit-tests.sh tests/unit/providers/nvidia/test_eval.py`
```
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_cancel PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_result PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_job_status PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_register_benchmark PASSED
tests/unit/providers/nvidia/test_eval.py::TestNVIDIAEvalImpl::test_run_eval PASSED
```
2. Verified I could build the Llama Stack image: `LLAMA_STACK_DIR=$(pwd)
llama stack build --template nvidia --image-type venv`

Documentation added to
`llama_stack/providers/remote/eval/nvidia/README.md`

---------

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
2025-04-24 17:12:42 -07:00
Jash Gulabrai
e24959ea9e Fix variable name 2025-04-24 10:41:38 -04:00
Charlie Doern
a673697858
chore: rename ramalama provider (#2008)
# What does this PR do?

The RamaLama team has decided to rename their external provider to
`ramalama-stack` (more catchy!). Update the docs accordingly.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-04-24 09:34:15 +02:00
Nathan Weinberg
6a44e7ba20
docs: add API to external providers table (#2006)
Also does a minor reorg of the columns

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-04-23 15:58:10 +02:00
Kevin Postlethwait
e0fa67c81c
docs: add examples for how to define RAG docs (#1981)
# What does this PR do?
Add examples for how to define RAGDocuments. Not sure if this is the
best place for these docs. @raghotham Please advise

## Test Plan
None, documentation


Signed-off-by: Kevin <kpostlet@redhat.com>
2025-04-23 15:39:18 +02:00
Nathan Weinberg
d6e88e0bc6
docs: add RamaLama to list of known external providers (#2004)
The RamaLama project now has an external provider offering for Llama
Stack: https://github.com/containers/llama-stack-provider-ramalama

See also: https://github.com/meta-llama/llama-stack/pull/1676

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-04-23 09:44:18 +02:00
Jash Gulabrai
0d06c654d0
feat: Update NVIDIA to GA docs; remove notebook reference until ready (#1999)
# What does this PR do?
- Update NVIDIA documentation links to GA docs
- Remove reference to notebooks until merged

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
2025-04-18 19:13:18 -04:00
Jash Gulabrai
8fd656dcac Add changes 2025-04-18 16:28:04 -04:00
Jash Gulabrai
4131e8146f Clean up instructions and implementation; reorganize notebooks 2025-04-18 16:27:19 -04:00
Sébastien Han
94f83382eb
feat: allow building distro with external providers (#1967)
# What does this PR do?

We can now build a distribution that includes external providers.
Closes: https://github.com/meta-llama/llama-stack/issues/1948

## Test Plan

Build a distro with an external provider following the doc instructions.

[//]: # (## Documentation)

Added.

Rendered:


![Screenshot 2025-04-18 at 11 26
39](https://github.com/user-attachments/assets/afcf3d50-8d30-48c3-8d24-06a4b3662881)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-18 17:18:28 +02:00
Yuan Tang
c4570bcb48
docs: Add tips for debugging remote vLLM provider (#1992)
# What does this PR do?

This is helpful when debugging issues with vLLM + Llama Stack after this
PR https://github.com/vllm-project/vllm/pull/15593

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-18 14:47:47 +02:00
Yuan Tang
4c6b7005fa
fix: Fix docs lint issues (#1993)
# What does this PR do?

This was not caught as part of the CI build:
dd62a2388c.
[This PR](https://github.com/meta-llama/llama-stack/pull/1354) was too
old and didn't include the additional CI builds yet.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-18 02:33:13 -04:00
AN YU (安宇)
dd62a2388c
docs: add notes to websearch tool and two extra example scripts (#1354)
# What does this PR do?

- Adds a note about unexpected Brave Search output appearing even when
Tavily Search is called. This behavior is expected for now and is a work
in progress https://github.com/meta-llama/llama-stack/issues/1229. The
note aims to clear any confusion for new users.
- Adds two example scripts demonstrating how to build an agent using:
    1. WebSearch tool
    2. WolframAlpha tool
These examples provide new users with an instant understanding of how to
integrate these tools.


## Test Plan
Tested these example scripts using the following steps:
Step 1: `ollama run llama3.2:3b-instruct-fp16 --keepalive 60m`
Step 2:
```
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
export LLAMA_STACK_PORT=8321
```
Step 3: `llama stack run --image-type conda
~/llama-stack/llama_stack/templates/ollama/run.yaml`
Step 4: Run the example script with your API keys.

expected output:

![image](https://github.com/user-attachments/assets/308ddb17-a087-4cf2-8622-b085174ea0ab)

![image](https://github.com/user-attachments/assets/639f239f-8966-433d-943c-ee6b304c0d71)


2025-04-17 20:20:52 -04:00
Sébastien Han
cb874287a4
fix: resync api spec (#1987) 2025-04-17 11:36:04 -04:00
Jash Gulabrai
0d9d333a4e Ensure sampling_params param is included in run_eval calls 2025-04-17 10:23:21 -04:00
Alexey Rybak
326cbba579
feat(agents): add agent naming functionality (#1922)
# What does this PR do?
Allow users to name an agent and use the name in telemetry instead of
relying on randomly generated agent_ids. This improves the developer
experience by making it easier to find specific agents in telemetry
logs.

Closes #1832

## Test Plan

- Added tests to verify the agent name is properly stored and retrieved
- Ran `uv run -- pytest -v
tests/integration/telemetry/test_telemetry.py::test_agent_name_filtering`
from the root of the project and made sure the tests pass
- Ran `uv run -- pytest -v
tests/integration/telemetry/test_telemetry.py::test_agent_query_spans`
to verify existing code without agent names still works correctly

## Use Example
```
agent = Agent(
    llama_stack_client, 
    model=text_model_id, 
    name="CustomerSupportAgent",  # New parameter
    instructions="You are a helpful customer support assistant"
)
session_id = agent.create_session(f"test-session-{uuid4()}")
```

## Implementation Notes
- Agent names are optional string parameters with no additional
validation
- Names are not required to be unique - multiple agents can have the
same name
- The agent_id remains the unique identifier for an agent
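To make the naming rules above concrete, the sketch below filters telemetry-like records by agent name while still telling agents apart by `agent_id`. The `Span` record and `spans_for_agent_name` helper are hypothetical and do not mirror the real telemetry schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    agent_id: str               # unique identifier for the agent
    agent_name: Optional[str]   # optional and not necessarily unique
    message: str


def spans_for_agent_name(spans: list[Span], name: str) -> list[Span]:
    """Return every span whose agent carries the given name."""
    return [s for s in spans if s.agent_name == name]


spans = [
    Span("agent-1", "CustomerSupportAgent", "turn started"),
    Span("agent-2", "CustomerSupportAgent", "turn started"),
    Span("agent-3", None, "turn started"),
]
matches = spans_for_agent_name(spans, "CustomerSupportAgent")
print(sorted({s.agent_id for s in matches}))
```

Because names are not unique, a name filter can return spans from several agents; the `agent_id` is still needed to single one out.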

---------

Co-authored-by: raghotham <raghotham@gmail.com>
2025-04-17 07:02:47 -07:00
Ben Browning
5b8e75b392
fix: OpenAI spec cleanup for assistant requests (#1963)
# What does this PR do?

Some of our multi-turn verification tests were failing because I had
accidentally marked `content` as a required field in the assistant
messages of OpenAI chat completion requests, but it is actually optional.
It is required for messages from other roles, but assistant messages are
explicitly allowed to omit it.

Similarly, the assistant message tool_calls field should default to None
instead of an empty list.

These two changes get the openai-llama-stack verification test back to
100% passing, just like it passes 100% when not behind Llama Stack. They
also increase the pass rate of some of the other providers in the
verification test, but don't get them to 100%.
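The two field changes described above can be sketched with a plain dataclass. This is an illustrative stand-in; the class name `AssistantMessage` and the `to_payload` helper are hypothetical, and the real models in the code base may be defined differently.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AssistantMessage:
    # content is optional for assistant messages (e.g. a pure tool-call turn).
    content: Optional[str] = None
    # tool_calls defaults to None rather than an empty list.
    tool_calls: Optional[list[dict[str, Any]]] = None
    role: str = "assistant"

    def to_payload(self) -> dict[str, Any]:
        """Serialize the message, omitting fields that were never set."""
        payload: dict[str, Any] = {"role": self.role}
        if self.content is not None:
            payload["content"] = self.content
        if self.tool_calls is not None:
            payload["tool_calls"] = self.tool_calls
        return payload


print(AssistantMessage().to_payload())
print(AssistantMessage(content="Done.").to_payload())
```

With `None` as the default, an unset `tool_calls` field can be left out of the request entirely instead of being sent as an empty list.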

## Test Plan

I started a Llama Stack server set up to run all the verification tests
(requires the `OPENAI_API_KEY` environment variable):

```
llama stack run --image-type venv tests/verifications/openai-api-verification-run.yaml
```

Then, I manually ran the verification tests to see which were failing,
fixed them, and ran them again after these changes to ensure they were all
passing.

```
python -m pytest -s -v tests/verifications/openai_api/test_chat_completion.py --provider=openai-llama-stack
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-17 06:56:10 -07:00
Matthew Farrellee
4205376653
chore: add meta/llama-3.3-70b-instruct as supported nvidia inference provider model (#1985)
see https://build.nvidia.com/meta/llama-3_3-70b-instruct
2025-04-17 06:50:40 -07:00
Jash Gulabrai
2ae1d7f4e6
docs: Add NVIDIA platform distro docs (#1971)
# What does this PR do?
Add NVIDIA platform docs that serve as a starting point for Llama Stack
users and explain all supported microservices.

---------

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
2025-04-17 05:54:30 -07:00
Francisco Arceo
00b232c282
chore: Fix to persist the theme preference across page navigation. (#1974)
# What does this PR do?
This PR persists the theme preference across page navigation.

Currently, if the default theme is detected, it is used. 

But if a user flips **_the default theme_** and goes to a new page, the
theme will switch back to the default.

This resolves that issue.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-04-16 13:58:25 -07:00
Jash Gulabrai
6927cdf5ce feat: NVIDIA beginner e2e notebook 2025-04-15 23:26:38 -04:00
Chirag Modi
fb8ff77ff2
docs: 0.2.2 doc updates (#1961)
Add updates to android site readme for 0.2.2
2025-04-15 13:26:17 -07:00
Dmitry Rogozhkin
71ed47ea76
docs: add example for intel gpu in vllm remote (#1952)
# What does this PR do?

This PR adds instructions to set up a vLLM remote endpoint for the
vllm-remote Llama Stack distribution.

## Test Plan

* Verified with manual tests of the configured vllm-remote distribution
against a vLLM endpoint running on a system with an Intel GPU
* Also verified with CI pytests (see the command line below). Tests pass in
the same capacity as on the Nvidia A10 setup (some tests do fail, which
seem to be known issues with the vllm-remote Llama Stack distribution)

```
pytest -s -v tests/integration/inference/test_text_inference.py \
   --stack-config=http://localhost:5001 \
   --text-model=meta-llama/Llama-3.2-3B-Instruct
```

CC: @ashwinb

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-04-15 07:56:23 -07:00