Commit graph

331 commits

Author SHA1 Message Date
Jeff Tang
162cfb280e added a note that image understanding works with LS 0.1.0 and 0.1.2 2025-02-09 09:27:15 -08:00
Jeff Tang
44f1a4fd5c fixed the agent image understanding example error for LS 0.1.2 2025-02-09 09:24:15 -08:00
raghotham
7766e68e92
docs: update index.md for 0.1.2 (#1013)
2025-02-07 15:36:20 -08:00
Jeff Tang
a229de6d1e
Getting started notebook update (#936)
# What does this PR do?

Added examples (Section 4) of using the Llama Stack 0.1 Together distro
and Llama 3.2 to answer questions about an image with the LS Chat and
Agent APIs.
2025-02-07 15:36:15 -08:00
Ashwin Bharambe
62e5461da7 No spaces in ipynb tests 2025-02-07 11:56:22 -08:00
Ashwin Bharambe
a8820597ee Minor clean up of notebook 2025-02-07 11:36:29 -08:00
ehhuang
af15426ad7
doc: getting started notebook (#996)
# What does this PR do?

Fix link

2025-02-06 17:30:21 -08:00
Hardik Shah
28a0fe57cc
fix: Update rag examples to use fresh faiss index every time (#998)
# What does this PR do?
Several examples reuse the same faiss index, which means running them
multiple times fills the index with duplicates; this eventually degrades
RAG performance because multiple copies of the same irrelevant chunks
can be picked up several times.

The fix is to create a fresh index each time.
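
A minimal sketch of the pattern (assuming the llama-stack-client
`vector_dbs` API; names and parameter values here are illustrative):

```python
import uuid

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a uniquely named vector DB per run so repeated executions
# never append duplicate chunks to a previously populated index.
vector_db_id = f"rag-demo-{uuid.uuid4().hex}"
client.vector_dbs.register(
    vector_db_id=vector_db_id,
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
)
```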

Resolves issue in this discussion -
https://github.com/meta-llama/llama-stack/discussions/995

## Test Plan
Re-ran the getting started guide multiple times and verified the output
stays the same.

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 16:12:29 -08:00
Maxime Lecanu
e964ec95e9
docs: Correct typos in Zero to Hero guide (#997)
# What does this PR do?

Corrects some typographical errors found in the
`docs/zero_to_hero_guide/README.md` file.

## Test Plan

N/A

Co-authored-by: Maxime Lecanu <mlecanu@fb.com>
2025-02-06 17:29:52 -05:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
- Added new template `dell` and its documentation 
- Update docs 
- [minor] uv fix I came across 
- codegen for all templates 

Tested with 

```bash
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321

# build the stack template 
llama stack build --template=dell 

# start the TGI inference server 
podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES ghcr.io/huggingface/text-generation-inference --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0

# start chroma-db for vector-io ( aka RAG )
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)

# build docker 
llama stack build --template=dell --image-type=container

# run llama stack server ( via docker )
# NOTE: mount the llama-stack / llama-model directories if testing local changes
podman run -it \
--network host \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v /home/hjshah/git/llama-stack:/app/llama-stack-source \
-v /home/hjshah/git/llama-models:/app/llama-models-source \
localhost/distribution-dell:dev \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL

# test the server 
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py

```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00
Yuan Tang
09ed0e9c9f
Add Kubernetes deployment guide (#899)
This PR moves some content from [the recent blog
post](https://blog.vllm.ai/2025/01/27/intro-to-llama-stack-with-vllm.html)
to here as a more official guide for users who'd like to deploy Llama
Stack on Kubernetes.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-06 10:28:02 -08:00
ehhuang
3922999118
sys_prompt support in Agent (#938)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.
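
A hedged sketch of overriding the default system prompt (field names
follow this PR's description of the new tool config object and may
differ from the final API):

```python
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.agent import Agent

client = LlamaStackClient(base_url="http://localhost:8321")

# "replace" asks the stack to use `instructions` as the entire system
# prompt instead of appending it to the tool-calling default.
agent_config = {
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "instructions": "You are a concise assistant. Answer directly.",
    "tool_config": {"system_message_behavior": "replace"},
    "enable_session_persistence": False,
}
agent = Agent(client, agent_config)
```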

## Test Plan


LLAMA_STACK_CONFIG=together pytest --inference-model=meta-llama/Llama-3.3-70B-Instruct -s -v tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior


2025-02-05 21:11:32 -08:00
Nathan Weinberg
e777d965a1
docs: add addn server guidance for Linux users in Quick Start (#972)
# What does this PR do?

- [x] Addresses issue #971


## Test Plan
Ran docs build locally

## Sources
See discussion linked in the issue


Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Co-authored-by: Mert Parker <mertpaker@gmail.com>
2025-02-05 20:57:51 -08:00
Ihar Hrachyshka
f4343f7dc0
docs: clarify host.docker.internal works for recent podman (#977)
The host.docker.internal alias was implemented in podman 4.7.0 (commit
b672ddc792).

# What does this PR do?

Follow-up to the previous podman-specific doc update.


Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-05 16:02:05 -08:00
Aakanksha Duggal
8fa642835b
Fix README.md notebook links (#976)

Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
2025-02-05 14:33:46 -08:00
Ryan Cook
2d9c8b549e
docs: missing T in import (#974)
# What does this PR do?

Missing T in import

## Test Plan

N/A doc update

2025-02-05 17:06:39 -05:00
Kamesh Akella
d9c0b4e3ba
[docs] update the zero_to_hero_guide llama stack version to 0.1.0 (#960)
# What does this PR do?

The Zero to Hero guide currently references the older llama-stack 0.0.61
release. Pointing the documentation at the most recent stable release
helps users avoid issues present in older llama-stack versions.

## Test Plan

I ran the workflow locally with the proposed version change and was able
to proceed without any issues.

2025-02-05 11:49:26 -08:00
Ihar Hrachyshka
5c8e35a9e2
docs, tests: replace datasets.rst with memory_optimizations.rst (#968)
datasets.rst was removed from the torchtune repo.

# What does this PR do?

Replace a document that now returns a 404 with one that exists. (Removed
it from the list since memory_optimizations.rst was already pulled in.)



Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-05 11:25:56 -05:00
Ihar Hrachyshka
529708215c
[docs] Make RAG example self-contained (#962)
Before the patch, the example could not be executed verbatim without
copy-pasting the client function from the inference example. I think
it's better to have examples self-contained, especially in a getting
started guide.


# What does this PR do?

See above.

## Test Plan

Confirmed the example can now be executed verbatim.


Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-04 16:22:50 -08:00
Ashwin Bharambe
474c4bdd7a
Make a couple properties optional (#963) 2025-02-04 16:20:24 -08:00
Ihar Hrachyshka
0cbb3e401c
docs: miscellaneous small fixes (#961)
- **[docs] Fix misc typos and formatting issues in intro docs**
- **[docs]: Export variables (e.g. INFERENCE_MODEL) in getting_started**
- **[docs] Show that `llama-stack-client configure` will ask for api
key**

# What does this PR do?

Miscellaneous fixes in the documentation; not worth reporting an issue.

## Test Plan

No code changes. Addressed issues spotted when walking through the
guide.
Confirmed locally.


---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-04 15:31:30 -08:00
Bill Murdock
b0dec797a0
Add Podman instructions to Quick Start (#957)
Podman is a popular alternative to Docker, so it would be nice to make
it clear that it can also be used to deploy the container for the
server. The instructions are a little different because you have to
create the directory yourself (unlike Docker, which creates it for
you).

# What does this PR do?

Adds Podman instructions to the Quick Start.

## Test Plan

Documentation only.


## Sources

I tried it out and it worked.

2025-02-04 14:37:02 -08:00
Ashwin Bharambe
d67401c644 Several documentation fixes and fix link to API reference 2025-02-04 14:00:43 -08:00
Charlie Doern
26aef50bc5
if client.initialize fails, the example should exit (#954)
# What does this PR do?

The example script can now exit gracefully because the boolean returned
from initialize is checked properly.
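
A minimal sketch of the fixed pattern (assuming the library-client
entrypoint; the template name is illustrative):

```python
import sys

from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("ollama")

# initialize() returns False on failure; exit instead of issuing
# requests against a half-configured client.
if not client.initialize():
    print("llama stack initialization failed", file=sys.stderr)
    sys.exit(1)
```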

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-04 13:54:21 -08:00
Ashwin Bharambe
b17277b06a Fix the OpenAPI HTML 2025-02-04 10:38:49 -08:00
ehhuang
c9ab72fa82
Support sys_prompt behavior in inference (#937)
# What does this PR do?

The current default system prompt for llama3.2 tends to overindex on
tool calling and doesn't work well when the prompt does not require tool
calling.

This PR adds an option to override the default system prompt, and
organizes tool-related configs into a new config object.



## Test Plan

python -m unittest
llama_stack.providers.tests.inference.test_prompt_adapter


---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937).
* #938
* __->__ #937
2025-02-03 23:35:16 -08:00
Xi Yan
62cd3c391e notebook: point to GitHub as source of truth 2025-02-03 15:08:25 -08:00
Ashwin Bharambe
753a1aa7bc Update colab link to point back to the github source 2025-02-03 15:00:21 -08:00
Ashwin Bharambe
aefd5bb619 Test notebook update 2025-02-03 14:59:06 -08:00
Nathan Weinberg
7a72082cdd
fix: formatting for ollama note in Quick Start doc (#945)
# What does this PR do?
Fixes formatting for Ollama note found here:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#start-ollama


## Test Plan
Ran local docs build as described
[here](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md#building-the-documentation)

## Sources
N/A


Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-02-03 14:13:57 -08:00
Ashwin Bharambe
f98efe68c9
Misc fixes (#944)
- Make sure torch + torchvision go together as deps, otherwise bad stuff
happens
- Add a pre-commit for requirements.txt
2025-02-03 14:08:47 -08:00
Nathan Weinberg
0f14378135
fix: broken "core concepts" link in docs website (#940)
# What does this PR do?
The `core concepts` link on [this
page](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html)
is currently broken - this PR fixes that link

## Test Plan
Ran local docs build as described
[here](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md#building-the-documentation)

## Sources
N/A


Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-02-03 13:46:34 -08:00
Nathan Weinberg
1e36721686
fix: broken link in Quick Start doc (#943)
# What does this PR do?
Ollama download link is broken on this page:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html

## Test Plan
N/A

## Sources
https://ollama.com/docs/installation ==> 404
https://ollama.com/download


Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-02-03 13:45:35 -08:00
Ashwin Bharambe
ccf0cbb903 Update release pointer 2025-02-02 12:11:57 -08:00
Ashwin Bharambe
7fdbd5b642 Add NBVAL skips to the getting started notebook 2025-02-02 07:53:07 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fix and ignore some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Hardik Shah
a7b929f17e
Sec fixes as raised by bandit (#917)
minor fixes to hashlib and jinja
2025-01-31 13:44:26 -08:00
Xi Yan
15dcc4ea5e
openapi gen return type fix for streaming/non-streaming (#910)
# What does this PR do?

We need to change

```yaml
/v1/inference/chat-completion:
    post:
      responses:
        '200':
          description: >-
            If stream=False, returns a ChatCompletionResponse with the full completion.
            If stream=True, returns an SSE event stream of ChatCompletionResponseStreamChunk
          content:
            text/event-stream:
              schema:
                oneOf:
                  - $ref: '#/components/schemas/ChatCompletionResponse'
                  - $ref: '#/components/schemas/ChatCompletionResponseStreamChunk'
```

into

```yaml
/v1/inference/chat-completion:
    post:
      responses:
        '200':
          description: >-
            If stream=False, returns a ChatCompletionResponse with the full completion.
            If stream=True, returns an SSE event stream of ChatCompletionResponseStreamChunk
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/ChatCompletionResponseStreamChunk'
            application/json:
              schema:
                $ref: '#/components/schemas/ChatCompletionResponse'
```

## Test Plan

**Python**
- tested in SDK sync:
https://github.com/meta-llama/llama-stack-client-python/pull/108

**Node**
- tested w/
https://gist.github.com/yanxi0830/b782f4b91e21dcccdfef8898ce55157e (SDK
update follow-up)


2025-01-30 18:03:02 -08:00
Xi Yan
94051cfe9e
fix ImageContentItem to take base64 string as image.data (#909)
# What does this PR do?

- Discussion in
https://github.com/meta-llama/llama-stack/pull/906#discussion_r1936260819

- image.data should accept a base64 string as input instead of binary
bytes; change prompt_adapter to account for that.
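
A minimal sketch of the new calling convention (message shape and model
id are illustrative, assuming the 0.1.x content-item format):

```python
import base64
from pathlib import Path

from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# image.data now takes a base64-encoded string, not raw bytes.
image_b64 = base64.b64encode(Path("example.jpg").read_bytes()).decode("utf-8")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-11B-Vision-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image", "image": {"data": image_b64}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
```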

## Test Plan

```
pytest -v tests/client-sdk/inference/test_inference.py
```

with test in https://github.com/meta-llama/llama-stack/pull/906

2025-01-30 15:58:23 -08:00
snova-edwardm
7fe2592795
SambaNova supports Llama 3.3 (#905)
# What does this PR do?

- Fix typo
- Support Llama 3.3 70B

## Test Plan

Run the following scripts and obtain the test results

Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY}
```

Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED

=========================================== 1 passed, 1 warning in 1.26s ============================================
```

Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY}
```

Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED

=========================================== 1 passed, 1 warning in 0.52s ============================================
```

2025-01-30 09:24:46 -08:00
Yuan Tang
d5b7de3897
Fix link to selection guide and change "docker" to "container" (#898)
The current link doesn't work. Also changed docs to be consistent with
https://github.com/meta-llama/llama-stack/pull/802.
2025-01-29 11:59:40 -08:00
Ashwin Bharambe
0d96070af9
Update OpenAPI generator to add param and field documentation (#896)
We desperately need to document our APIs. This is the basic requirement
of having a Spec :)

This PR updates the OpenAPI generator so documentation for request
parameters and object fields can be properly added to the OpenAPI specs.
From there, this should get picked up by Stainless, etc.
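
For illustration, a sketch of the kind of docstring such a generator
could consume (the exact convention the generator reads is an
assumption; names are illustrative):

```python
from typing import Protocol


class Inference(Protocol):
    async def chat_completion(self, model_id: str, stream: bool = False):
        """Generate a chat completion for the given model.

        :param model_id: Identifier of a registered model to run inference with.
        :param stream: If True, return an SSE stream of response chunks.
        """
        ...
```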

## Test Plan:

Updated client-sdk (See
https://github.com/meta-llama/llama-stack-client-python/pull/104) and
then ran:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py
```
2025-01-29 10:04:30 -08:00
Ashwin Bharambe
9f709387e2 Kill X-LlamaStack-{Client-Version, Provider-Data} from OpenAPI spec
ClientVersion: We don't need each SDK method to support this parameter
because you wouldn't be passing a different client version each time you
make an API call.

ProviderData: although in this case, you _could_ be passing different
API keys depending on which SDK call you make, it makes for a confusing
experience. It is best to initialize the LlamaStackClient with all the
keys which are then passed in each request.
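
A minimal sketch of the recommended pattern (assuming the
`provider_data` constructor argument of the Python client; the key name
is illustrative):

```python
from llama_stack_client import LlamaStackClient

# Pass provider API keys once at construction time; the client then
# sends them with every request instead of per-call overrides.
client = LlamaStackClient(
    base_url="http://localhost:8321",
    provider_data={"fireworks_api_key": "YOUR_FIREWORKS_API_KEY"},
)
```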
2025-01-28 13:30:23 -08:00
Ashwin Bharambe
ec3ebb5bcf
Use ruamel.yaml to format the OpenAPI spec (#892)
Stainless ends up reformatting the YAML when we paste it in the Studio.
We cannot have that happen if we are going to ever partially automate
stainless config updates.

Try ruamel.yaml, specifically `block_seq_indent` to avoid that.
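
A minimal sketch of the idea (indent values are illustrative;
`YAML.indent()` is the documented way to pin block-sequence
indentation):

```python
import sys

from ruamel.yaml import YAML

spec = {
    "paths": {
        "/v1/inference/chat-completion": {"post": {"tags": ["Inference"]}},
    }
}

yaml = YAML()
# Pin indentation so a round trip through other YAML tooling
# (e.g. pasting into Stainless Studio) does not reformat the file.
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.dump(spec, sys.stdout)
```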
2025-01-28 11:27:40 -08:00
Ashwin Bharambe
d123e9d3d7 Update docs for RAG and improve CONTRIBUTING.md 2025-01-28 06:09:48 -08:00
Justin Lee
e4865c3510
adding readme to docs folder for easier discoverability of notebooks … (#857)
as titled

Screenshot: https://github.com/user-attachments/assets/7579d1d2-06cd-48e4-9659-79ab1ec6a4c2
2025-01-28 04:58:46 -08:00
Chris Khanoyan
5b0d778871
Update index.md (#888)
Fixing the bullets

# What does this PR do?

The bullets did not render as intended, so this fixes them.


## Test Plan

Ran the docs build; the bullets now render consistently with the rest
of the page.

## Sources

N/A

2025-01-28 04:55:41 -08:00
Ashwin Bharambe
e5936a8df8
Update discriminator to have the correct mapping (#881)
See
https://swagger.io/docs/specification/v3_0/data-models/inheritance-and-polymorphism/#discriminator

When specifying discriminators, a mapping must be specified unless the
value of the discriminator is the subtype name itself (which in our case
it is not).

The changes in the YAML are self-explanatory.
2025-01-27 09:18:13 -08:00
Bakunga Bronson
7de46e40f9
Fixed multiple typos (#878)
2025-01-24 14:45:43 -08:00
Bakunga Bronson
33113139e8
Fixed typo (#877)
2025-01-24 13:16:00 -08:00