Commit graph

350 commits

Author SHA1 Message Date
Nathan Weinberg
0f14378135
fix: broken "core concepts" link in docs website (#940)
# What does this PR do?
The `core concepts` link on [this
page](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html)
is currently broken - this PR fixes that link

## Test Plan
Ran local docs build as described
[here](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md#building-the-documentation)

## Sources
N/A

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-02-03 13:46:34 -08:00
Nathan Weinberg
1e36721686
fix: broken link in Quick Start doc (#943)
# What does this PR do?
Ollama download link is broken on this page:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html

## Test Plan
N/A

## Sources
https://ollama.com/docs/installation ==> 404
https://ollama.com/download

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-02-03 13:45:35 -08:00
Ashwin Bharambe
ccf0cbb903 Update release pointer 2025-02-02 12:11:57 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Hardik Shah
a7b929f17e
Sec fixes as raised by bandit (#917)
minor fixes to hashlib and jinja
2025-01-31 13:44:26 -08:00
snova-edwardm
7fe2592795
SambaNova supports Llama 3.3 (#905)
# What does this PR do?

- Fix typo
- Support Llama 3.3 70B

## Test Plan

Run the following scripts and obtain the test results

Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY}
```

Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED

=========================================== 1 passed, 1 warning in 1.26s ============================================
```

Script
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY}
```

Result
```
llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED

=========================================== 1 passed, 1 warning in 0.52s ============================================
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [N] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [N] Wrote necessary unit or integration tests.
2025-01-30 09:24:46 -08:00
Yuan Tang
d5b7de3897
Fix link to selection guide and change "docker" to "container" (#898)
The current link doesn't work. Also changed docs to be consistent with
https://github.com/meta-llama/llama-stack/pull/802.
2025-01-29 11:59:40 -08:00
Ashwin Bharambe
d123e9d3d7 Update docs for RAG and improve CONTRIBUTING.md 2025-01-28 06:09:48 -08:00
Chris Khanoyan
5b0d778871
Update index.md (#888)
Fixing the bullets

# What does this PR do?

The bullets were not there as intended so I helped fix them. 

- [x] Addresses issue (#issue)

## Test Plan

Please describe:

Ran the test, and the bullets are there now to be consistent with the
page.

## Sources

N/A

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-28 04:55:41 -08:00
Bakunga Bronson
7de46e40f9
Fixed multiple typos (#878)
# What does this PR do?

In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.

- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-24 14:45:43 -08:00
Bakunga Bronson
33113139e8
Fixed typo (#877)
# What does this PR do?

In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.

- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-24 13:16:00 -08:00
Hardik Shah
2cebb24d3a
Update doc templates for running safety on self-hosted templates (#874) 2025-01-24 11:28:20 -08:00
Ashwin Bharambe
eaba6a550a Point to 0.1.0 release notes in docs 2025-01-24 10:00:16 -08:00
Ashwin Bharambe
19521cb22e More doc updates 2025-01-24 09:22:15 -08:00
Ashwin Bharambe
2118f37350 Doc updates 2025-01-23 21:31:18 -08:00
Ashwin Bharambe
9351a4b2d7 Update documentation 2025-01-23 17:10:57 -08:00
ehhuang
2fefe8dacd
Update 'first RAG agent' in gettingstarted doc (#867)
# What does this PR do?

Fix documentation to reflect new API


## Test Plan
Before:

User> What are the top 5 topics that were explained? Only list succinct
bullet points.
inference> I'm ready to help, but we haven't discussed any topics yet!
This is the start of our conversation. What would you like to talk
about? I can summarize our discussion at the end if you'd like.


Run with the change, observe relevant response

<img width="1029" alt="image"
src="https://github.com/user-attachments/assets/a7dece3c-e8b4-4a60-9092-ba544c87dffd"
/>



## Sources

Please link relevant resources if necessary.


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Co-authored-by: Eric Huang (AI Platform) <erichuang@fb.com>
2025-01-23 17:02:04 -08:00
Dinesh Yeduguru
ebffa15f40
update python sdk reference (#866)
# What does this PR do?

syncs changes from
https://github.com/stainless-sdks/llama-stack-python/blob/main/api.md
2025-01-23 16:04:06 -08:00
Dinesh Yeduguru
c570a708bf
update the client reference (#864)
# What does this PR do?

Syncs changes from
https://github.com/meta-llama/llama-stack-client-python/pull/96
2025-01-23 15:32:16 -08:00
Hardik Shah
94ffaf468c
More updates to ReadTheDocs (#861)
Improve Contributing section
2025-01-23 12:50:38 -08:00
Dinesh Yeduguru
7df40da5fa
sync readme.md to index.md (#860)
# What does this PR do?

README has some new content that is being synced to index.md
2025-01-23 12:43:09 -08:00
Hardik Shah
a6a4270eef
Updates to ReadTheDocs (#859)
Move evals section to AI Agents section 
drop from top level and other minor fixes
2025-01-23 12:42:15 -08:00
snova-edwardm
22dc684da6
Sambanova inference provider (#555)
# What does this PR do?

This PR adds SambaNova as one of the Provider

- Add SambaNova as a provider

## Test Plan
Test the functional command
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key>
```

Test the distribution template:
```
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-sambanova \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY

# Conda
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```

## Source
[SambaNova API Documentation](https://cloud.sambanova.ai/apis)

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [Y ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-01-23 12:20:28 -08:00
Dinesh Yeduguru
86466b71a9
update docs for adding new API providers (#855)
# What does this PR do?

update docs for adding new API providers
![Screenshot 2025-01-23 at 11 21
42 AM](https://github.com/user-attachments/assets/0d4621d4-ef7e-43cd-9c4a-3e8e0b49242f)
2025-01-23 12:05:57 -08:00
Dinesh Yeduguru
d0be9288a3
Llama_Stack_Building_AI_Applications.ipynb -> getting_started.ipynb (#854)
Llama_Stack_Building_AI_Applications.ipynb -> getting_started.ipynb
2025-01-23 12:04:06 -08:00
Hardik Shah
74e933cbfd
More Updates to Read the Docs (#856) 2025-01-23 11:39:33 -08:00
Hardik Shah
25a70ca4dc
Fixed distro documentation (#852)
More docs
2025-01-23 08:19:51 -08:00
Sixian Yi
82a28f3a24
update doc for client-sdk testing (#849)
As title


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-23 00:17:16 -08:00
Dinesh Yeduguru
28012c51bb
update docs for tools and telemetry (#846)
# What does this PR do?

Added a new Tools doc describing how to use tools and updated the main
building agents doc to point to the tools doc.
Also updated telemetry doc.


https://llama-stack.readthedocs.io/en/tools-doc/building_applications/tools.html
2025-01-22 22:50:29 -08:00
Hardik Shah
65f07c3d63
Update Documentation (#838)
# What does this PR do?

Update README and other documentation


## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-22 20:38:52 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Dinesh Yeduguru
3d4c53dfec
add mcp runtime as default to all providers (#816)
# What does this PR do?

This is needed to have the notebook work with MCP
2025-01-17 16:40:58 -08:00
Yuan Tang
6da3053c0e
More generic image type for OCI-compliant container technologies (#802)
It's a more generic term and applicable to alternatives of Docker, such
as Podman or other OCI-compliant technologies.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 16:37:42 -08:00
Xi Yan
9d005154d7
fix vllm template (#813)
# What does this PR do?

- Fix vLLM template to resolve
https://github.com/meta-llama/llama-stack/issues/805
- Fix agents test with shields

## Test Plan

```
vllm serve meta-llama/Llama-3.1-8B-Instruct
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" llama stack run ./llama_stack/templates/remote-vllm/run.yaml
```

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v ./tests/client-sdk/
```

<img width="1245" alt="image"
src="https://github.com/user-attachments/assets/9af27684-5a9c-4187-b338-cbfc5211bd99"
/>


- custom tool flaky due to model outputs
- /completions API not implemented

**Vision Model**
- 11B-Vision-Instruct
<img width="1240" alt="image"
src="https://github.com/user-attachments/assets/1d3b3b17-fa09-43a7-b56c-3f77263825c5"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-17 15:34:29 -08:00
Paul McCarthy
e1decaec9d
Fixing small typo in quick start guide (#807)
# What does this PR do?

Fixing small typo in the quick start guide

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
2025-01-17 11:15:55 -08:00
Xi Yan
d1f3b032c9
cerebras template update for memory (#792)
# What does this PR do?

- we no longer have meta-reference as memory provider, update cerebras
template


## Test Plan

```
python llama_stack/scripts/distro_codegen.py
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 16:07:53 -08:00
Ashwin Bharambe
03ac84a829 Update default port from 5000 -> 8321 2025-01-16 15:26:48 -08:00
Xi Yan
a6b9f2cec7
fix cerebras template (#790)
# What does this PR do?

- fix cerebras template

## Test Plan

```
llama stack build --template cerebras --image-type conda
llama stack run cerebras
LLAMA_STACK_BASE_URL="http://localhost:5000" pytest -v tests/client-sdk/ --html=report.html --self-contained-html
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 13:53:06 -08:00
Xi Yan
b76bef169c
fix nvidia inference provider (#781)
# What does this PR do?

- fixes to nvidia inference provider to account for strategy update
- update nvidia templates

## Test Plan

```
llama stack run ./llama_stack/templates/nvidia/run.yaml --port 5000

LLAMA_STACK_BASE_URL="http://localhost:5000" pytest -v tests/client-sdk/inference/test_inference.py --html=report.html --self-contained-html
```
<img width="1288" alt="image"
src="https://github.com/user-attachments/assets/d20f9aea-525e-47de-a5be-586e022e0d55"
/>

**NOTE**
- vision inference broken
- tool calling broken
- /completion broken

cc @mattf @cdgamarose-nv  for improving NVIDIA inference adapter

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-15 18:49:36 -08:00
cdgamarose-nv
b3202bcf77
add nvidia distribution (#565)
# What does this PR do?

adds nvidia template for creating a distribution using inference adapter
for NVIDIA NIMs.

## Test Plan

Please describe:
Build llama stack distribution for nvidia using the template, docker and
conda.
```bash
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client configure --endpoint http://localhost:5000
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5000
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ identifier                       ┃ provider_id ┃ provider_resource_id       ┃ metadata ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Llama3.1-8B-Instruct             │ nvidia      │ meta/llama-3.1-8b-instruct │ {}       │
│ meta-llama/Llama-3.2-3B-Instruct │ nvidia      │ meta/llama-3.2-3b-instruct │ {}       │
└──────────────────────────────────┴─────────────┴────────────────────────────┴──────────┘
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client inference chat-completion --message "hello, write me a 2 sentence poem"
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content='Here is a 2 sentence poem:\n\nThe sun sets slow and paints the sky, \nA gentle hue of pink that makes me sigh.',
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None
)
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-01-15 14:04:43 -08:00
Hardik Shah
a51c8b4efc
Convert SamplingParams.strategy to a union (#767)
# What does this PR do?

Cleans up how we provide sampling params. Earlier, strategy was an enum
and all params (top_p, temperature, top_k) across all strategies were
grouped. We now have a strategy union object with each strategy (greedy,
top_p, top_k) having its corresponding params.
Earlier, 
```
class SamplingParams: 
    strategy: enum ()
    top_p, temperature, top_k and other params
```
However, the `strategy` field was not being used in any providers making
it confusing to know the exact sampling behavior purely based on the
params since you could pass temperature, top_p, top_k and how the
provider would interpret those would not be clear.

Hence we introduced -- a union where the strategy and relevant params
are all clubbed together to avoid this confusion.

Have updated all providers, tests, notebooks, readme and otehr places
where sampling params was being used to use the new format.
   

## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`
// inference on ollama, fireworks and together 
`with-proxy pytest -v -s -k "ollama"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py `
// agents on fireworks 
`pytest -v -s -k 'fireworks and create_agent'
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/agents/test_agents.py
--safety-shield="meta-llama/Llama-Guard-3-8B"`

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-15 05:38:51 -08:00
Yuan Tang
300e6e2702
Fix issue when generating distros (#755)
Addressed comment
https://github.com/meta-llama/llama-stack/pull/723#issuecomment-2581902075.

cc @yanxi0830 

I am not 100% sure if the diff is correct though but this is the result
of running `python llama_stack/scripts/distro_codegen.py`.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-15 05:34:08 -08:00
Henry Tu
f320eede2b
Update Cerebras docs to include header (#704)
# What does this PR do?

I noticed that the documentation for other providers have this header,
so I have added it to the Cerebras docs too.
```
---
orphan: true
---

# TGI Distribution

```{toctree}
:maxdepth: 2
:hidden:

self
    ```
```

This also fixes a typo in README.md where the link to the Cerebras docs included an extra `getting_started` section.

I did notice however that https://hub.docker.com/r/llamastack/distribution-cerebras still does not exist. How do I get the Cerebras Docker image uploaded?

cc: @ashwinb @raghotham 

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-13 20:18:34 -08:00
Yufei (Benny) Chen
1cc137cf9c
[Fireworks] Update model name for Fireworks (#753)
# What does this PR do?

Fix https://github.com/meta-llama/llama-stack/issues/697


## Test Plan
Run the 405b model. the full `accounts/fireworks/models/<model_id>` is
the full model name for Fireworks, the 'fireworks/<model_id>' is just a
short hand and sometimes have routing issues

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-13 15:53:57 -08:00
Botao Chen
78727aad26
Improve model download doc (#748)
## context
The documentation around model download from meta source part
https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html#downloading-from-meta
confused me and another colleague because we met
[issue](https://github.com/meta-llama/llama-stack/issues/746) during
downloading.

After some debugging, I found that we need to quote META_URL in the
command. To avoid other users have the same confusion, I updated the doc
tor make it more clear

## test 

before 
![Screenshot 2025-01-12 at 11 48
37 PM](https://github.com/user-attachments/assets/960a8793-4d32-44b0-a099-6214be7921b6)

after
![Screenshot 2025-01-12 at 11 40
02 PM](https://github.com/user-attachments/assets/8dfe5e36-bdba-47ef-a251-ec337d12e2be)
2025-01-13 00:39:12 -08:00
raghotham
ff182ff6de
rename LLAMASTACK_PORT to LLAMA_STACK_PORT for consistency with other env vars (#744)
# What does this PR do?

Rename environment var for consistency

## Test Plan

No regressions

## Sources

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-10 11:09:49 -08:00
Yuan Tang
203d36e2db
Fixed typo in default VLLM_URL in remote-vllm.md (#723)
Fixed a small typo.
2025-01-09 22:34:34 -08:00
Dinesh Yeduguru
a5c57cd381
agents to use tools api (#673)
# What does this PR do?

PR #639 introduced the notion of Tools API and ability to invoke tools
through API just as any resource. This PR changes the Agents to start
using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig
2) Agent gets the corresponding tool definitions for the specified tools
and pass along to the model
3) Attachements are now named as Documents and their behavior is mostly
unchanged from user perspective
4) You can specify args that can be injected to a tool call through
Agent config. This is especially useful in case of memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent
inject these as well into the tool call.
6) All tests have been migrated to use new tools API and fixtures
including client SDK tests
7) Telemetry just works with tools API because of our trace protocol
decorator


## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

pytest -s -v -k together  llama_stack/providers/tests/tools/test_tools.py \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994

Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing
2025-01-08 19:01:00 -08:00
Xi Yan
a5e6f10e33
fix links for distro (#733)
# What does this PR do?

- fix links for distro docs


## Test Plan

<img width="653" alt="image"
src="https://github.com/user-attachments/assets/a546a11e-2071-4d72-8232-8f30552b7341"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-08 14:47:09 -08:00