Commit graph

275 commits

Author SHA1 Message Date
Dinesh Yeduguru
ebffa15f40
update python sdk reference (#866)
# What does this PR do?

syncs changes from
https://github.com/stainless-sdks/llama-stack-python/blob/main/api.md
2025-01-23 16:04:06 -08:00
Dinesh Yeduguru
c570a708bf
update the client reference (#864)
# What does this PR do?

Syncs changes from
https://github.com/meta-llama/llama-stack-client-python/pull/96
2025-01-23 15:32:16 -08:00
Hardik Shah
94ffaf468c
More updates to ReadTheDocs (#861)
Improve Contributing section
2025-01-23 12:50:38 -08:00
Dinesh Yeduguru
7df40da5fa
sync readme.md to index.md (#860)
# What does this PR do?

README has some new content that is being synced to index.md
2025-01-23 12:43:09 -08:00
Hardik Shah
a6a4270eef
Updates to ReadTheDocs (#859)
Move evals section to AI Agents section 
drop from top level and other minor fixes
2025-01-23 12:42:15 -08:00
snova-edwardm
22dc684da6
Sambanova inference provider (#555)
# What does this PR do?

This PR adds SambaNova as one of the Provider

- Add SambaNova as a provider

## Test Plan
Test the functional command
```
pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_embeddings.py llama_stack/providers/tests/inference/test_prompt_adapter.py llama_stack/providers/tests/inference/test_text_inference.py llama_stack/providers/tests/inference/test_vision_inference.py --env SAMBANOVA_API_KEY=<sambanova-api-key>
```

Test the distribution template:
```
# Docker
LLAMA_STACK_PORT=5001
docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  llamastack/distribution-sambanova \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY

# Conda
llama stack build --template sambanova --image-type conda
llama stack run ./run.yaml \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```

## Source
[SambaNova API Documentation](https://cloud.sambanova.ai/apis)

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [Y] Ran pre-commit to handle lint / formatting issues.
- [Y] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [Y] Updated relevant documentation.
- [Y ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-01-23 12:20:28 -08:00
Dinesh Yeduguru
86466b71a9
update docs for adding new API providers (#855)
# What does this PR do?

update docs for adding new API providers
![Screenshot 2025-01-23 at 11 21
42 AM](https://github.com/user-attachments/assets/0d4621d4-ef7e-43cd-9c4a-3e8e0b49242f)
2025-01-23 12:05:57 -08:00
Dinesh Yeduguru
d0be9288a3
Llama_Stack_Building_AI_Applications.ipynb -> getting_started.ipynb (#854)
Llama_Stack_Building_AI_Applications.ipynb -> getting_started.ipynb
2025-01-23 12:04:06 -08:00
Hardik Shah
74e933cbfd
More Updates to Read the Docs (#856) 2025-01-23 11:39:33 -08:00
Dinesh Yeduguru
8a686270e9
remove getting started notebook (#853)
# What does this PR do?

This notebook is no longer updated and we should be using
https://github.com/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb
2025-01-23 10:09:09 -08:00
Hardik Shah
25a70ca4dc
Fixed distro documentation (#852)
More docs
2025-01-23 08:19:51 -08:00
raghotham
e44a1a68f1
Delete docs/to_situate directory (#851)
# What does this PR do?

No need for the cookbook now. Removing the folder

- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-23 07:15:47 -08:00
Sixian Yi
82a28f3a24
update doc for client-sdk testing (#849)
As title


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-23 00:17:16 -08:00
Dinesh Yeduguru
28012c51bb
update docs for tools and telemetry (#846)
# What does this PR do?

Added a new Tools doc describing how to use tools and updated the main
building agents doc to point to the tools doc.
Also updated telemetry doc.


https://llama-stack.readthedocs.io/en/tools-doc/building_applications/tools.html
2025-01-22 22:50:29 -08:00
Ashwin Bharambe
35c71d5bbe
Update OpenAPI generator to output discriminator (#848)
oneOf should have discriminators so Stainless can generate better code

## Test Plan

Going to generate the SDK now and check.
2025-01-22 22:15:23 -08:00
Hardik Shah
65f07c3d63
Update Documentation (#838)
# What does this PR do?

Update README and other documentation


## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-22 20:38:52 -08:00
Ashwin Bharambe
f3d8864c36 Rename builtin::memory -> builtin::rag 2025-01-22 20:22:51 -08:00
Ashwin Bharambe
494e969f8d add a bunch of NBVAL SKIPs to unblock ugh 2025-01-22 15:28:45 -08:00
Ashwin Bharambe
82d942b501 Foo 2025-01-22 13:58:17 -08:00
Ashwin Bharambe
55d01339c2 Update notebook 2025-01-22 13:31:11 -08:00
Ashwin Bharambe
07b87365ab
[inference api] modify content types so they follow a more standard structure (#841)
Some small updates to the inference types to make them more standard

Specifically:
- image data is now located in a "image" subkey
- similarly tool call data is located in a "tool_call" subkey

The pattern followed is `dict(type="foo", foo=<...>)`
2025-01-22 12:16:18 -08:00
Ashwin Bharambe
a63a43c646
[memory refactor][6/n] Update naming and routes (#839)
Making a few small naming changes as per feedback:

- RAGToolRuntime methods are called `insert` and `query` to keep them
more general
- The tool names are changed to non-namespaced forms
`insert_into_memory` and `query_from_memory`
- The REST endpoints are more REST-ful
2025-01-22 10:39:13 -08:00
Ashwin Bharambe
c9e5578151
[memory refactor][5/n] Migrate all vector_io providers (#835)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This PR finishes off all the stragglers and migrates everything to the
new naming.
2025-01-22 10:17:59 -08:00
Ashwin Bharambe
1a7490470a
[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Third part:
- we need to make `tool_runtime.rag_tool.query_context()` and
`tool_runtime.rag_tool.insert_documents()` methods work smoothly with
complete type safety. To that end, we introduce a sub-resource path
`tool-runtime/rag-tool/` and make changes to the resolver to make things
work.
- the PR updates the agents implementation to directly call these typed
APIs for memory accesses rather than going through the complex, untyped
"invoke_tool" API. the code looks much nicer and simpler (expectedly.)
- there are a number of hacks in the server resolver implementation
still, we will live with some and fix some

Note that we must make sure the client SDKs are able to handle this
subresource complexity also. Stainless has support for subresources, so
this should be possible but beware.

## Test Plan

Our RAG test is sad (doesn't actually test for actual RAG output) but I
verified that the implementation works. I will work on fixing the RAG
test afterwards.

```bash
pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B
```
2025-01-22 10:04:16 -08:00
Dinesh Yeduguru
7a4b382ae9
add section for mcp tool usage in notebook (#831)
# What does this PR do?

Adds a section to the notebook on how to use tools hosted in MCP server.


![Screenshot 2025-01-21 at 11 05
39 AM](https://github.com/user-attachments/assets/23e900f1-e2a7-4a46-be9b-13642753dca1)
Notebook:
https://colab.research.google.com/drive/1hBKX01NlG6p2BUrBU0ynwIlWjXQRxc3k?usp=sharing

Rendered notebook on this branch:
https://github.com/meta-llama/llama-stack/blob/mcp-notebook/docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb
2025-01-21 13:10:42 -08:00
Dinesh Yeduguru
3d4c53dfec
add mcp runtime as default to all providers (#816)
# What does this PR do?

This is needed to have the notebook work with MCP
2025-01-17 16:40:58 -08:00
Yuan Tang
6da3053c0e
More generic image type for OCI-compliant container technologies (#802)
It's a more generic term and applicable to alternatives of Docker, such
as Podman or other OCI-compliant technologies.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 16:37:42 -08:00
Xi Yan
9d005154d7
fix vllm template (#813)
# What does this PR do?

- Fix vLLM template to resolve
https://github.com/meta-llama/llama-stack/issues/805
- Fix agents test with shields

## Test Plan

```
vllm serve meta-llama/Llama-3.1-8B-Instruct
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.1-8B-Instruct" llama stack run ./llama_stack/templates/remote-vllm/run.yaml
```

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v ./tests/client-sdk/
```

<img width="1245" alt="image"
src="https://github.com/user-attachments/assets/9af27684-5a9c-4187-b338-cbfc5211bd99"
/>


- custom tool flaky due to model outputs
- /completions API not implemented

**Vision Model**
- 11B-Vision-Instruct
<img width="1240" alt="image"
src="https://github.com/user-attachments/assets/1d3b3b17-fa09-43a7-b56c-3f77263825c5"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-17 15:34:29 -08:00
Paul McCarthy
e1decaec9d
Fixing small typo in quick start guide (#807)
# What does this PR do?

Fixing small typo in the quick start guide

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
2025-01-17 11:15:55 -08:00
Dinesh Yeduguru
53b5f6b24a
add json_schema_type to ParamType deps (#808)
# What does this PR do?

Add missing json_schema_type annotation to ParamType deps
2025-01-17 11:02:25 -08:00
Xi Yan
c2a072911d
fix eval notebook & add test to workflow (#803) 2025-01-16 23:11:21 -08:00
Xi Yan
d1f3b032c9
cerebras template update for memory (#792)
# What does this PR do?

- we no longer have meta-reference as memory provider, update cerebras
template


## Test Plan

```
python llama_stack/scripts/distro_codegen.py
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 16:07:53 -08:00
Ashwin Bharambe
03ac84a829 Update default port from 5000 -> 8321 2025-01-16 15:26:48 -08:00
Hardik Shah
f1faa9c924 pop fix 2025-01-16 14:09:59 -08:00
Dinesh Yeduguru
fcd1a57429 update notebook 2025-01-16 14:00:48 -08:00
Xi Yan
a6b9f2cec7
fix cerebras template (#790)
# What does this PR do?

- fix cerebras template

## Test Plan

```
llama stack build --template cerebras --image-type conda
llama stack run cerebras
LLAMA_STACK_BASE_URL="http://localhost:5000" pytest -v tests/client-sdk/ --html=report.html --self-contained-html
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 13:53:06 -08:00
Dinesh Yeduguru
12c994b5b2
REST API fixes (#789)
# What does this PR do?

Client SDK fixes

## Test Plan


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/safety/test_safety.py


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/memory/test_memory.py
2025-01-16 13:47:08 -08:00
Dinesh Yeduguru
59eeaf7f81
Idiomatic REST API: Telemetry (#786)
# What does this PR do?

Changes Telemetry API to follow more idiomatic REST


- [ ] Addresses issue (#issue)


## Test Plan

TBD, once i get an approval for rest endpoints
2025-01-16 12:08:46 -08:00
Hardik Shah
74e4d520ac un-skip telemetry cells in notebook 2025-01-16 11:54:25 -08:00
Hardik Shah
17fd2d2fd0
Make notebook testable (#780)
# What does this PR do?

This PR updates the notebook to run as a pytest by using a package
called `nbval`.

- [ ] Addresses issue (#issue)


## Test Plan
```
pytest -v -s --nbval-lax  docs/notebooks/Llama_Stack_Building_AI_Applications.ipynb

=================================== test session starts ====================================
platform linux -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /home/hjshah/.conda/envs/nbeval/bin/python
cachedir: .pytest_cache
rootdir: /home/hjshah/git/llama-stack
configfile: pyproject.toml
plugins: nbval-0.11.0, anyio-4.8.0
collected 20 items                                                                         

docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 0 SKIPPED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 1 SKIPPED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 2 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 3 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 4 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 5 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 6 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 7 SKIPPED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 8 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 9 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 10 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 11 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 12 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 13 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 14 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 15 SKIPPED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 16 PASSED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 17 SKIPPED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 18 SKIPPED
docs/notebooks/Llama_Stack_Building_AI_Applications::ipynb::Cell 19 PASSED

========================= 14 passed, 6 skipped in 89.69s (0:01:29) =========================
```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-15 19:28:17 -08:00
Xi Yan
b76bef169c
fix nvidia inference provider (#781)
# What does this PR do?

- fixes to nvidia inference provider to account for strategy update
- update nvidia templates

## Test Plan

```
llama stack run ./llama_stack/templates/nvidia/run.yaml --port 5000

LLAMA_STACK_BASE_URL="http://localhost:5000" pytest -v tests/client-sdk/inference/test_inference.py --html=report.html --self-contained-html
```
<img width="1288" alt="image"
src="https://github.com/user-attachments/assets/d20f9aea-525e-47de-a5be-586e022e0d55"
/>

**NOTE**
- vision inference broken
- tool calling broken
- /completion broken

cc @mattf @cdgamarose-nv  for improving NVIDIA inference adapter

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-15 18:49:36 -08:00
cdgamarose-nv
b3202bcf77
add nvidia distribution (#565)
# What does this PR do?

adds nvidia template for creating a distribution using inference adapter
for NVIDIA NIMs.

## Test Plan

Please describe:
Build llama stack distribution for nvidia using the template, docker and
conda.
```bash
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client configure --endpoint http://localhost:5000
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5000
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ identifier                       ┃ provider_id ┃ provider_resource_id       ┃ metadata ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Llama3.1-8B-Instruct             │ nvidia      │ meta/llama-3.1-8b-instruct │ {}       │
│ meta-llama/Llama-3.2-3B-Instruct │ nvidia      │ meta/llama-3.2-3b-instruct │ {}       │
└──────────────────────────────────┴─────────────┴────────────────────────────┴──────────┘
(.venv) local-cdgamarose@a4u8g-0006:~/llama-stack$ llama-stack-client inference chat-completion --message "hello, write me a 2 sentence poem"
ChatCompletionResponse(
    completion_message=CompletionMessage(
        content='Here is a 2 sentence poem:\n\nThe sun sets slow and paints the sky, \nA gentle hue of pink that makes me sigh.',
        role='assistant',
        stop_reason='end_of_turn',
        tool_calls=[]
    ),
    logprobs=None
)
```

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-01-15 14:04:43 -08:00
Dinesh Yeduguru
7fb2c1c48d
More idiomatic REST API (#765)
# What does this PR do?

This PR changes our API to follow more idiomatic REST API approaches of
having paths being resources and methods indicating the action being
performed.

Changes made to generator:
1) removed the prefix check of "get" as its not required and is actually
needed for other method types too
2) removed _ check on path since variables can have "_"



## Test Plan

LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v
tests/client-sdk/agents/test_agents.py
2025-01-15 13:20:09 -08:00
Ashwin Bharambe
b78e6675ea llama-stack version alpha -> v1 2025-01-15 05:58:09 -08:00
Hardik Shah
a51c8b4efc
Convert SamplingParams.strategy to a union (#767)
# What does this PR do?

Cleans up how we provide sampling params. Earlier, strategy was an enum
and all params (top_p, temperature, top_k) across all strategies were
grouped. We now have a strategy union object with each strategy (greedy,
top_p, top_k) having its corresponding params.
Earlier, 
```
class SamplingParams: 
    strategy: enum ()
    top_p, temperature, top_k and other params
```
However, the `strategy` field was not being used in any providers making
it confusing to know the exact sampling behavior purely based on the
params since you could pass temperature, top_p, top_k and how the
provider would interpret those would not be clear.

Hence we introduced -- a union where the strategy and relevant params
are all clubbed together to avoid this confusion.

Have updated all providers, tests, notebooks, readme and otehr places
where sampling params was being used to use the new format.
   

## Test Plan
`pytest llama_stack/providers/tests/inference/groq/test_groq_utils.py`
// inference on ollama, fireworks and together 
`with-proxy pytest -v -s -k "ollama"
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/inference/test_text_inference.py `
// agents on fireworks 
`pytest -v -s -k 'fireworks and create_agent'
--inference-model="meta-llama/Llama-3.1-8B-Instruct"
llama_stack/providers/tests/agents/test_agents.py
--safety-shield="meta-llama/Llama-Guard-3-8B"`

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [X] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [X] Updated relevant documentation.
- [X] Wrote necessary unit or integration tests.

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-01-15 05:38:51 -08:00
Yuan Tang
300e6e2702
Fix issue when generating distros (#755)
Addressed comment
https://github.com/meta-llama/llama-stack/pull/723#issuecomment-2581902075.

cc @yanxi0830 

I am not 100% sure if the diff is correct though but this is the result
of running `python llama_stack/scripts/distro_codegen.py`.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-15 05:34:08 -08:00
Ashwin Bharambe
d9d34433fc Update spec 2025-01-13 23:16:53 -08:00
Henry Tu
f320eede2b
Update Cerebras docs to include header (#704)
# What does this PR do?

I noticed that the documentation for other providers have this header,
so I have added it to the Cerebras docs too.
```
---
orphan: true
---

# TGI Distribution

```{toctree}
:maxdepth: 2
:hidden:

self
    ```
```

This also fixes a typo in README.md where the link to the Cerebras docs included an extra `getting_started` section.

I did notice however that https://hub.docker.com/r/llamastack/distribution-cerebras still does not exist. How do I get the Cerebras Docker image uploaded?

cc: @ashwinb @raghotham 

## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-13 20:18:34 -08:00
Yufei (Benny) Chen
1cc137cf9c
[Fireworks] Update model name for Fireworks (#753)
# What does this PR do?

Fix https://github.com/meta-llama/llama-stack/issues/697


## Test Plan
Run the 405b model. the full `accounts/fireworks/models/<model_id>` is
the full model name for Fireworks, the 'fireworks/<model_id>' is just a
short hand and sometimes have routing issues

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-13 15:53:57 -08:00
Dinesh Yeduguru
6964510dc1
update notebook to use new tool defs (#745)
# What does this PR do?

Update notebook for new tool defs
2025-01-13 15:07:15 -08:00