# What does this PR do?
Add win platform run command for stack
- [x] Addresses issue (#issue)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
https://github.com/meta-llama/llama-stack/pull/889
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR implements windows platform support for build_container.sh
execution from terminal. Additionally, it resolves "no support for
Terminos and PTY for Window PC" issues.
- [x] Addresses issue (#issue)
Releates issues: https://github.com/meta-llama/llama-stack/issues/826,
https://github.com/meta-llama/llama-stack/issues/726
## Test Plan
Changes were tested manually by executing standard scripts from LLama
guide:
- llama stack build --template ollama --image-type container
- llama stack build --list-templates
- llama stack build
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Add response format for agents structured output.
- [ ] Using structured output for agents (interior_design app as an
example) (#issue)
https://github.com/meta-llama/llama-stack-apps/issues/122
## Test Plan
E2E test plan with llama-stack-apps interior_design
Please describe:
Test ran:
- provide instructions so it can be reproduced.
Start your distro:
llama stack run llama_stack/templates/fireworks/run.yaml --env
FIREWORKS_API_KEY=<API_KEY>
Run api test:
```PYTHONPATH=. python examples/interior_design_assistant/api.py localhost 5000 examples/interior_design_assistant/resources/documents/ examples/interior_design_assistant/resources/images/fireplaces```
## Sources
Results:
https://github.com/meta-llama/llama-stack-client-python/pull/72
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
fixed report generation:
1) do not initialize a new client in report.py - instead get it from
pytest fixture
2) Add "provider" for "safety" and "agents" section
3) add logprobs functionality in "inference" section
## Test Plan
See the regenerated report
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Fixing the bullets
# What does this PR do?
The bullets were not there as intended so I helped fix them.
- [x] Addresses issue (#issue)
## Test Plan
Please describe:
Ran the test, and the bullets are there now to be consistent with the
page.
## Sources
N/A
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
- Fix loading SambaNovaImpl issue
- Add LlamaGuard model support for inference
## Test Plan
Run the following unit test scripts and results
### Embedding
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)
llama_stack/providers/tests/inference/test_embeddings.py::TestEmbeddings::test_batch_embeddings[-sambanova] SKIPPED (This test is only applicable for embedding models)
=================================================================================================================== 2 skipped, 1 warning in 0.32s ===================================================================================================================
```
### Vision
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_vision_inference.py --inference-model meta-llama/Llama-3.2-11B-Vision-Instruct --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image0-expected_strings0] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_non_streaming[-sambanova-image1-expected_strings1] PASSED
llama_stack/providers/tests/inference/test_vision_inference.py::TestVisionModelInference::test_vision_chat_completion_streaming[-sambanova] PASSED
=================================================================================================================== 3 passed, 1 warning in 2.68s ====================================================================================================================
```
### Text
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED
=================================================================================================================== 1 passed, 1 warning in 0.46s ====================================================================================================================
```
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={SAMBANOVA_API_KEY}
```
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED
=================================================================================================================== 1 passed, 1 warning in 0.48s ====================================================================================================================
```
## Before submitting
- [] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [Y] Updated relevant documentation.
- [Y] Wrote necessary unit or integration tests.
# What does this PR do?
When you re-initialize the library client in a notebook, we were seeing
this error:
```
Getting traces for session_id=5c8d1969-0957-49d2-b852-32cbb8ef8caf
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
[<ipython-input-11-d74bb6cdd3ab>](https://localhost:8080/#) in <cell line: 0>()
7 agent_logs = []
8
----> 9 for span in client.telemetry.query_spans(
10 attribute_filters=[
11 {"key": "session_id", "op": "eq", "value": session_id},
10 frames
[/usr/local/lib/python3.11/dist-packages/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py](https://localhost:8080/#) in query_traces(self, attribute_filters, limit, offset, order_by)
246 ) -> QueryTracesResponse:
247 return QueryTracesResponse(
--> 248 data=await self.trace_store.query_traces(
249 attribute_filters=attribute_filters,
250 limit=limit,
AttributeError: 'TelemetryAdapter' object has no attribute 'trace_store'
```
This is happening because the we were skipping some required steps for
the object state as part of the global _TRACE_PROVIDER check. This PR
moves the initialization of the object state out of the TRACE_PROVIDER
init.
# What does this PR do?
In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.
- [ ] Addresses issue (#issue)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.
- [ ] Addresses issue (#issue)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Earlier, we would have some unknown magic to identify the path for
remote endpoints when testing client_sdk/tests.
Removed that and now you have to explicitly pass a path
The previous curl command was wrong and did not actually check for
version correctly (status code was always 200 regardless of what you
retrieved.)
Also added tagging latest. cc @wukaixingxp
# What does this PR do?
Fix documentation to reflect new API
## Test Plan
Before:
User> What are the top 5 topics that were explained? Only list succinct
bullet points.
inference> I'm ready to help, but we haven't discussed any topics yet!
This is the start of our conversation. What would you like to talk
about? I can summarize our discussion at the end if you'd like.
Run with the change, observe relevant response
<img width="1029" alt="image"
src="https://github.com/user-attachments/assets/a7dece3c-e8b4-4a60-9092-ba544c87dffd"
/>
## Sources
Please link relevant resources if necessary.
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Co-authored-by: Eric Huang (AI Platform) <erichuang@fb.com>
# What does this PR do?
Previously the tests hard coded the tool prompt format to be json which
will cause it to fail when using 3.2/3.3 family of models. This change
make the default to be none for the agent config and just remove the
specification in the tests.
## Test Plan
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/agents/test_agents.py
# What does this PR do?
This PR adds SambaNova as one of the Provider
- Add SambaNova as a provider
## Test Plan
Test the functional command
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key>
```
Test the distribution template:
```
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
llamastack/distribution-sambanova \
--port $LLAMA_STACK_PORT \
--env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
# Conda
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml \
--port $LLAMA_STACK_PORT \
--env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```
## Source
[SambaNova API Documentation](https://cloud.sambanova.ai/apis)
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [Y] Updated relevant documentation.
- [Y ] Wrote necessary unit or integration tests.
---------
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
No need for the cookbook now. Removing the folder
- [ ] Addresses issue (#issue)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
As title
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Adds raw completions API to vLLM
## Test Plan
<details>
<summary>Setup</summary>
```bash
# Run vllm server
conda create -n vllm python=3.12 -y
conda activate vllm
pip install vllm
# Run llamastack
conda create --name llamastack-vllm python=3.10
conda activate llamastack-vllm
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct && \
pip install -e . && \
pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7 && \
llama stack build --template remote-vllm --image-type conda && \
llama stack run ./distributions/remote-vllm/run.yaml \
--port 5000 \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env VLLM_URL=http://localhost:8000/v1 | tee -a llama-stack.log
```
</details>
<details>
<summary>Integration</summary>
```bash
# Run
conda activate llamastack-vllm
export VLLM_URL=http://localhost:8000/v1
pip install pytest pytest_html pytest_asyncio aiosqlite
pytest llama_stack/providers/tests/inference/test_text_inference.py -v -k vllm
# Results
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[-vllm_remote] PASSED [ 11%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[-vllm_remote] PASSED [ 22%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-vllm_remote] SKIPPED [ 33%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[-vllm_remote] SKIPPED [ 44%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-vllm_remote] PASSED [ 55%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[-vllm_remote] PASSED [ 66%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-vllm_remote] PASSED [ 77%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-vllm_remote] PASSED [ 88%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[-vllm_remote] PASSED [100%]
====================================== 7 passed, 2 skipped, 99 deselected, 1 warning in 9.80s ======================================
```
</details>
<details>
<summary>Manual</summary>
```bash
# Install
pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7
```
Apply this diff
```diff
diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py
index 8dbb193..95173e2 100644
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -250,7 +250,7 @@ class ClientVersionMiddleware:
server_version_parts = tuple(
map(int, self.server_version.split(".")[:2])
)
- if client_version_parts != server_version_parts:
+ if False and client_version_parts != server_version_parts:
async def send_version_error(send):
await send(
diff --git a/llama_stack/templates/remote-vllm/run.yaml b/llama_stack/templates/remote-vllm/run.yaml
index 4eac4da..32eb50e 100644
--- a/llama_stack/templates/remote-vllm/run.yaml
+++ b/llama_stack/templates/remote-vllm/run.yaml
@@ -94,7 +94,8 @@ metadata_store:
type: sqlite
db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/registry.db
models:
-- metadata: {}
+- metadata:
+ llama_model: meta-llama/Llama-3.2-3B-Instruct
model_id: ${env.INFERENCE_MODEL}
provider_id: vllm-inference
model_type: llm
```
Test 1:
```python
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(
base_url="http://localhost:5000",
)
response = client.inference.completion(
model_id="meta-llama/Llama-3.2-3B-Instruct",
content="Hello, world client!",
)
print(response)
```
Test 2
```
from llama_stack_client import LlamaStackClient
client = LlamaStackClient(
base_url="http://localhost:5000",
)
response = client.inference.completion(
model_id="meta-llama/Llama-3.2-3B-Instruct",
content="Hello, world client!",
stream=True,
)
for chunk in response:
print(chunk.delta, end="", flush=True)
```
```
I'm excited to introduce you to our latest project, a comprehensive guide to the best coffee shops in [City]. As a coffee connoisseur, you're in luck because we've scoured the city to bring you the top picks for the perfect cup of joe.
In this guide, we'll take you on a journey through the city's most iconic coffee shops, highlighting their unique features, must-try drinks, and insider tips from the baristas themselves. From cozy cafes to trendy cafes, we've got you covered.
**Top 5 Coffee Shops in [City]**
1. **The Daily Grind**: This beloved institution has been serving up expertly crafted pour-overs and lattes for over 10 years. Their expert baristas are always happy to guide you through their menu, which features a rotating selection of single-origin beans from around the world...
```
</details>
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Update README and other documentation
## Before submitting
- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.