Commit graph

957 commits

Ashwin Bharambe
d111bad2f2
Update GH action so it correctly queries for test.pypi, etc. (#875)
The previous curl command was wrong and did not actually check the
version correctly (the status code was always 200 regardless of what
you retrieved).
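
For illustration, a minimal sketch of the kind of check the fixed action needs, assuming the TestPyPI JSON API and the `requests` library (the actual workflow uses curl):

```python
import requests

def version_exists(package: str, version: str) -> bool:
    # A plain GET of the project page returns 200 even for versions that
    # don't exist; querying the JSON API for the exact release returns
    # 404 when the version is missing, so the status code is meaningful.
    url = f"https://test.pypi.org/pypi/{package}/{version}/json"
    return requests.get(url).status_code == 200

print(version_exists("llama-stack", "0.1.0"))
```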

Also added tagging of `latest`. cc @wukaixingxp
2025-01-24 11:56:29 -08:00
Hardik Shah
2cebb24d3a
Update doc templates for running safety on self-hosted templates (#874) 2025-01-24 11:28:20 -08:00
Ashwin Bharambe
eaba6a550a Point to 0.1.0 release notes in docs 2025-01-24 10:00:16 -08:00
Ashwin Bharambe
05d73dd4fd Bump version to 0.1.0 2025-01-24 09:50:07 -08:00
Ashwin Bharambe
19521cb22e More doc updates 2025-01-24 09:22:15 -08:00
Ashwin Bharambe
2118f37350 Doc updates 2025-01-23 21:31:18 -08:00
Ashwin Bharambe
9351a4b2d7 Update documentation 2025-01-23 17:10:57 -08:00
ehhuang
2fefe8dacd
Update 'first RAG agent' in gettingstarted doc (#867)
# What does this PR do?

Fix documentation to reflect the new API.


## Test Plan
Before:

User> What are the top 5 topics that were explained? Only list succinct
bullet points.
inference> I'm ready to help, but we haven't discussed any topics yet!
This is the start of our conversation. What would you like to talk
about? I can summarize our discussion at the end if you'd like.


After: run with the change and observe the relevant response:

<img width="1029" alt="image"
src="https://github.com/user-attachments/assets/a7dece3c-e8b4-4a60-9092-ba544c87dffd"
/>



## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Co-authored-by: Eric Huang (AI Platform) <erichuang@fb.com>
2025-01-23 17:02:04 -08:00
Dinesh Yeduguru
cb11336886
remove logger handler only in notebook (#868)
remove logger handler only in notebook
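
For illustration only, a hedged sketch of a notebook-only guard like the one described; the detection idiom and cleanup below are assumptions, not the commit's actual code:

```python
import logging

def in_notebook() -> bool:
    # Common IPython detection idiom; returns False outside Jupyter.
    try:
        from IPython import get_ipython
        ipy = get_ipython()
        return ipy is not None and "IPKernelApp" in ipy.config
    except ImportError:
        return False

if in_notebook():
    # Remove root handlers only inside a notebook, leaving CLI logging alone.
    root = logging.getLogger()
    for handler in root.handlers[:]:
        root.removeHandler(handler)
```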
2025-01-23 16:58:17 -08:00
Dinesh Yeduguru
ebffa15f40
update python sdk reference (#866)
# What does this PR do?

Syncs changes from
https://github.com/stainless-sdks/llama-stack-python/blob/main/api.md
2025-01-23 16:04:06 -08:00
Dinesh Yeduguru
c570a708bf
update the client reference (#864)
# What does this PR do?

Syncs changes from
https://github.com/meta-llama/llama-stack-client-python/pull/96
2025-01-23 15:32:16 -08:00
Dinesh Yeduguru
a78f1fc70d
make default tool prompt format none in agent config (#863)
# What does this PR do?

Previously the tests hard-coded the tool prompt format to json, which
causes failures with the 3.2/3.3 family of models. This change makes
the default None in the agent config and removes the explicit setting
from the tests.
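
As a sketch of the resulting configuration (a plain dict for illustration; the exact AgentConfig field names are assumptions here):

```python
# Agent config that leaves the tool prompt format unset.
agent_config = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "instructions": "You are a helpful assistant.",
    # "tool_prompt_format" intentionally omitted: the default is now
    # None, so the server picks a format suited to the model family
    # instead of forcing "json", which breaks 3.2/3.3 models.
}
```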


## Test Plan
`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py`
2025-01-23 14:44:59 -08:00
Hardik Shah
94ffaf468c
More updates to ReadTheDocs (#861)
Improve Contributing section
2025-01-23 12:50:38 -08:00
Dinesh Yeduguru
7df40da5fa
sync readme.md to index.md (#860)
# What does this PR do?

README has some new content that is being synced to index.md
2025-01-23 12:43:09 -08:00
Hardik Shah
a6a4270eef
Updates to ReadTheDocs (#859)
Move the evals section to the AI Agents section,
drop it from the top level, plus other minor fixes.
2025-01-23 12:42:15 -08:00
Ashwin Bharambe
d78027f3b5 Move runpod provider to the correct directory
Also clean up the test code to avoid skipping tests. Let failures be
known and public.
2025-01-23 12:25:12 -08:00
snova-edwardm
22dc684da6
Sambanova inference provider (#555)
# What does this PR do?

This PR adds SambaNova as an inference provider.

## Test Plan
Run the functional tests:
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key>
```

Test the distribution template:
```
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-sambanova \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY

# Conda
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```

## Source
[SambaNova API Documentation](https://cloud.sambanova.ai/apis)

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-01-23 12:20:28 -08:00
Marut Pandya
e2b5456e48
Add Runpod Provider + Distribution (#362)
Add Runpod as an inference provider for OpenAI-compatible managed
endpoints.

Testing
- Configured llama stack from scratch, set `remote::runpod` as an
inference provider.
- Added Runpod endpoint URL and API key.
- Started the llama-stack server: `llama stack run my-local-stack --port 3000`
```
curl http://localhost:3000/inference/chat_completion \
-H "Content-Type: application/json" \
-d '{
	"model": "Llama3.1-8B-Instruct",
	"messages": [
		{"role": "system", "content": "You are a helpful assistant."},
		{"role": "user", "content": "Write me a 2 sentence poem about the moon"}
	],
	"sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
}'
```

---------

Signed-off-by: pandyamarut <pandyamarut@gmail.com>
2025-01-23 12:19:02 -08:00
Dinesh Yeduguru
86466b71a9
update docs for adding new API providers (#855)
# What does this PR do?

update docs for adding new API providers
![Screenshot 2025-01-23 at 11 21 42 AM](https://github.com/user-attachments/assets/0d4621d4-ef7e-43cd-9c4a-3e8e0b49242f)
2025-01-23 12:05:57 -08:00
Dinesh Yeduguru
d0be9288a3
Llama_Stack_Building_AI_Applications.ipynb -> getting_started.ipynb (#854)
Llama_Stack_Building_AI_Applications.ipynb -> getting_started.ipynb
2025-01-23 12:04:06 -08:00
Hardik Shah
a10cdc7cdb
Update README.md 2025-01-23 12:00:01 -08:00
Hardik Shah
74e933cbfd
More Updates to Read the Docs (#856) 2025-01-23 11:39:33 -08:00
Dinesh Yeduguru
8a686270e9
remove getting started notebook (#853)
# What does this PR do?

This notebook is no longer updated and we should be using
https://github.com/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb
2025-01-23 10:09:09 -08:00
Hardik Shah
25a70ca4dc
Fixed distro documentation (#852)
More docs
2025-01-23 08:19:51 -08:00
raghotham
e44a1a68f1
Delete docs/to_situate directory (#851)
# What does this PR do?

No need for the cookbook now. Removing the folder

- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-23 07:15:47 -08:00
Sixian Yi
bfbd773b54 remove test report 2025-01-23 01:06:39 -08:00
Sixian Yi
82a28f3a24
update doc for client-sdk testing (#849)
As title


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-23 00:17:16 -08:00
Ashwin Bharambe
3d14a3d46f Kill colons 2025-01-22 22:59:30 -08:00
Aidan Do
910717c1fd
Add vLLM raw completions API (#823)
# What does this PR do?

Adds raw completions API to vLLM

## Test Plan

<details>
<summary>Setup</summary>

```bash
# Run vllm server
conda create -n vllm python=3.12 -y
conda activate vllm
pip install vllm

# Run llamastack
conda create --name llamastack-vllm python=3.10
conda activate llamastack-vllm

export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct && \
pip install -e . && \
pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7 && \
llama stack build --template remote-vllm --image-type conda && \
llama stack run ./distributions/remote-vllm/run.yaml \
  --port 5000 \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env VLLM_URL=http://localhost:8000/v1 | tee -a llama-stack.log
```
</details>

<details>
<summary>Integration</summary>

```bash
# Run
conda activate llamastack-vllm
export VLLM_URL=http://localhost:8000/v1
pip install pytest pytest_html pytest_asyncio aiosqlite
pytest llama_stack/providers/tests/inference/test_text_inference.py -v -k vllm

# Results
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_model_list[-vllm_remote] PASSED            [ 11%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion[-vllm_remote] PASSED            [ 22%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_logprobs[-vllm_remote] SKIPPED  [ 33%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output[-vllm_remote] SKIPPED [ 44%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-vllm_remote] PASSED [ 55%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_structured_output[-vllm_remote] PASSED     [ 66%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-vllm_remote] PASSED [ 77%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling[-vllm_remote] PASSED [ 88%]
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_with_tool_calling_streaming[-vllm_remote] PASSED [100%]

====================================== 7 passed, 2 skipped, 99 deselected, 1 warning in 9.80s ======================================
```
</details>

<details>
<summary>Manual</summary>

```bash
# Install
pip install --no-cache --index-url https://pypi.org/simple/ --extra-index-url https://test.pypi.org/simple/ llama-stack==0.1.0rc7
```

Apply this diff
```diff
diff --git a/llama_stack/distribution/server/server.py b/llama_stack/distribution/server/server.py
index 8dbb193..95173e2 100644
--- a/llama_stack/distribution/server/server.py
+++ b/llama_stack/distribution/server/server.py
@@ -250,7 +250,7 @@ class ClientVersionMiddleware:
                     server_version_parts = tuple(
                         map(int, self.server_version.split(".")[:2])
                     )
-                    if client_version_parts != server_version_parts:
+                    if False and client_version_parts != server_version_parts:
 
                         async def send_version_error(send):
                             await send(
diff --git a/llama_stack/templates/remote-vllm/run.yaml b/llama_stack/templates/remote-vllm/run.yaml
index 4eac4da..32eb50e 100644
--- a/llama_stack/templates/remote-vllm/run.yaml
+++ b/llama_stack/templates/remote-vllm/run.yaml
@@ -94,7 +94,8 @@ metadata_store:
   type: sqlite
   db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/remote-vllm}/registry.db
 models:
-- metadata: {}
+- metadata: 
+    llama_model: meta-llama/Llama-3.2-3B-Instruct
   model_id: ${env.INFERENCE_MODEL}
   provider_id: vllm-inference
   model_type: llm
```

Test 1:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:5000",
)

response = client.inference.completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    content="Hello, world client!",
)

print(response)
```

Test 2:

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(
    base_url="http://localhost:5000",
)

response = client.inference.completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",
    content="Hello, world client!",
    stream=True,
)

for chunk in response:
    print(chunk.delta, end="", flush=True)
```

```
 I'm excited to introduce you to our latest project, a comprehensive guide to the best coffee shops in [City]. As a coffee connoisseur, you're in luck because we've scoured the city to bring you the top picks for the perfect cup of joe.

In this guide, we'll take you on a journey through the city's most iconic coffee shops, highlighting their unique features, must-try drinks, and insider tips from the baristas themselves. From cozy cafes to trendy cafes, we've got you covered.

**Top 5 Coffee Shops in [City]**

1. **The Daily Grind**: This beloved institution has been serving up expertly crafted pour-overs and lattes for over 10 years. Their expert baristas are always happy to guide you through their menu, which features a rotating selection of single-origin beans from around the world...
```

</details>

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-22 22:58:27 -08:00
Ashwin Bharambe
4d7c8c797f Kill colons 2025-01-22 22:54:13 -08:00
Dinesh Yeduguru
28012c51bb
update docs for tools and telemetry (#846)
# What does this PR do?

Added a new Tools doc describing how to use tools and updated the main
building agents doc to point to the tools doc.
Also updated telemetry doc.


https://llama-stack.readthedocs.io/en/tools-doc/building_applications/tools.html
2025-01-22 22:50:29 -08:00
Ashwin Bharambe
35c71d5bbe
Update OpenAPI generator to output discriminator (#848)
oneOf should have discriminators so Stainless can generate better code

## Test Plan

Going to generate the SDK now and check.
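
For context, a hedged sketch of a discriminated oneOf in an OpenAPI schema, written here as a Python dict; the variant names are placeholders:

```python
content_item_schema = {
    "oneOf": [
        {"$ref": "#/components/schemas/ImageContentItem"},
        {"$ref": "#/components/schemas/TextContentItem"},
    ],
    # The discriminator tells generators like Stainless which variant a
    # payload is, keyed on its "type" field, so they can emit typed code
    # instead of untyped unions.
    "discriminator": {
        "propertyName": "type",
        "mapping": {
            "image": "#/components/schemas/ImageContentItem",
            "text": "#/components/schemas/TextContentItem",
        },
    },
}
```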
2025-01-22 22:15:23 -08:00
Hardik Shah
65f07c3d63
Update Documentation (#838)
# What does this PR do?

Update README and other documentation


## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-22 20:38:52 -08:00
Ashwin Bharambe
6c205e1d5a Fix tool tests 2025-01-22 20:31:18 -08:00
Ashwin Bharambe
0bff6e1658 Move tool_runtime.memory -> tool_runtime.rag 2025-01-22 20:25:02 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Sixian Yi
597869a2aa
add distro report (#847)
# What does this PR do?

Generate distro reports to cover inference, agents, and vector_io. 


## Test Plan

Report generated through `/opt/miniconda3/envs/stack/bin/pytest -s -v tests/client-sdk/ --report`


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-22 19:20:49 -08:00
Ashwin Bharambe
23f1980f9c Fix meta-reference GPU implementation for inference 2025-01-22 18:31:59 -08:00
Ashwin Bharambe
f4b0f2af8b If initialization fails for library client, error the test 2025-01-22 18:12:15 -08:00
Ashwin Bharambe
72a1b27d01 nitpick 2025-01-22 18:09:46 -08:00
Ashwin Bharambe
a8345f5f76 Fix llama stack build docker creation to have correct entrypoint 2025-01-22 16:53:54 -08:00
Ashwin Bharambe
08dcb9e31e Accept "query_config" params for the RAG tool 2025-01-22 16:42:36 -08:00
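For illustration, a hedged guess at the shape of such a call; the key names below are assumptions, not the commit's actual schema:

```python
# Hypothetical query_config passed to the RAG tool after this change.
query_config = {
    "max_chunks": 5,                # assumption: cap on retrieved chunks
    "max_tokens_in_context": 1024,  # assumption: context token budget
}
```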
Sixian Yi
f4f47970e5
[client sdk test] add options for inference_model, safety_shield, embedding_model (#843)
# What does this PR do?
Default inference_model for testing: "meta-llama/Llama-3.1-8B-Instruct"
Default vision inference_model for testing:
"meta-llama/Llama-3.2-11B-Vision-Instruct"


## Test Plan
`/opt/miniconda3/envs/stack/bin/pytest -s -v --inference-model=meta-llama/Llama-3.2-3B-Instruct tests/client-sdk/agents`

`/opt/miniconda3/envs/stack/bin/pytest -s -v --embedding-model=all-MiniLM-L6-v2 tests/client-sdk/vector_io`

`/opt/miniconda3/envs/stack/bin/pytest -s -v --safety-shield=meta-llama/Llama-Guard-3-1B tests/client-sdk/safety`

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-22 15:35:19 -08:00
Ashwin Bharambe
4dd4f09fc5 Rename a test and add some comments 2025-01-22 15:28:45 -08:00
Ashwin Bharambe
494e969f8d add a bunch of NBVAL SKIPs to unblock ugh 2025-01-22 15:28:45 -08:00
Hardik Shah
deab4f57dd
Improved report generation for providers (#844)
# What does this PR do?

Automates the model list check by querying the distro.
Added support for both remote-hosted distros and templates.

## Test Plan
Run on a remote hosted distro via
`LLAMA_STACK_BASE_URL="https://llamastack-preview.fireworks.ai" pytest -s -v tests/client-sdk --report`
Run on a template via
`LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk --report`
2025-01-22 15:27:09 -08:00
Botao Chen
8738c3e5a7
fix experimental-post-training template (#842)
## What does this PR do?
Completes https://github.com/meta-llama/llama-stack/pull/835


## Test Plan
`llama stack build --template experimental-post-training --image-type conda`
`llama stack run llama_stack/templates/experimental-post-training/run.yaml`
2025-01-22 15:04:05 -08:00
Ashwin Bharambe
82d942b501 Foo 2025-01-22 13:58:17 -08:00
Ashwin Bharambe
55d01339c2 Update notebook 2025-01-22 13:31:11 -08:00
Ashwin Bharambe
07b87365ab
[inference api] modify content types so they follow a more standard structure (#841)
Some small updates to the inference types to make them more standard

Specifically:
- image data is now located in an "image" subkey
- similarly, tool call data is located in a "tool_call" subkey

The pattern followed is `dict(type="foo", foo=<...>)`
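
Concretely, content items now look like this; the inner field contents are illustrative, the outer `dict(type="foo", foo=<...>)` shape is the point:

```python
image_item = {
    "type": "image",
    # image payload moved under an "image" subkey
    "image": {"url": {"uri": "https://example.com/cat.png"}},
}

tool_call_item = {
    "type": "tool_call",
    # tool call payload moved under a "tool_call" subkey
    "tool_call": {"tool_name": "get_weather", "arguments": {"city": "SF"}},
}
```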
2025-01-22 12:16:18 -08:00