llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Xi Yan	2a8e199e10	fix notebook	2025-02-13 16:52:46 -08:00
Xi Yan	8b655e3cd2	fix!: update eval-tasks -> benchmarks (#1032) # What does this PR do? - Update `/eval-tasks` to `/benchmarks` - ⚠️ Remove differentiation between `app` vs. `benchmark` eval task config. Now we only have `BenchmarkConfig`. The overloaded `benchmark` is confusing and does not add any value. Backward compatibility is being kept as the "type" is not being used anywhere. ## Test Plan - This change is backward compatible - Run notebook test with ``` pytest -v -s --nbval-lax ./docs/getting_started.ipynb pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Signed-off-by: Ben Browning <bbrownin@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> Co-authored-by: Ben Browning <ben324@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com> Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com> Co-authored-by: reidliu <reid201711@gmail.com> Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-13 16:40:58 -08:00
Xi Yan	2fa9e3c941	fix: make backslash work in GET /models/{model_id:path} (#1068 )	2025-02-13 08:46:43 -08:00
Charlie Doern	025f615868	feat: add support for running in a venv (#1018) # What does this PR do? Add `--image-type` to `llama stack run`, which takes conda, container, or venv. Also add start_venv.sh, which starts the stack using a venv. Resolves #1007 ## Test Plan running locally: `llama stack build --template ollama --image-type venv` `llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml` ... ``` llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml Using run configuration: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml + python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321 Using config file: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml Run configuration: apis: - agents - datasetio ... ``` Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-02-12 11:13:04 -05:00
Dinesh Yeduguru	d8a20e034b	feat: make telemetry attributes be dict[str,PrimitiveType] (#1055) # What does this PR do? Make attributes in telemetry be only primitive types and avoid arbitrary nesting. ## Test Plan ``` LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py -k "test_builtin_tool_web_search" # Verified that attributes still show up correctly in jaeger ```	2025-02-11 15:10:17 -08:00
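One way to picture the "primitive types only, no arbitrary nesting" constraint from the commit above is a small flattening helper. This is only a sketch: `flatten_attributes`, the dotted key scheme, and the exact members of `PrimitiveType` are assumptions made for illustration, not the project's actual implementation.

```python
from typing import Any, Dict, Mapping, Union

# Assumed definition for illustration; the real PrimitiveType may differ.
PrimitiveType = Union[str, int, float, bool, None]


def flatten_attributes(attrs: Mapping[str, Any], prefix: str = "") -> Dict[str, PrimitiveType]:
    """Flatten nested mappings into a single-level dict of primitive values."""
    flat: Dict[str, PrimitiveType] = {}
    for key, value in attrs.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Nested mapping: recurse and join keys with a dot.
            flat.update(flatten_attributes(value, name))
        elif value is None or isinstance(value, (str, int, float, bool)):
            flat[name] = value
        else:
            # Anything else (lists, objects) is stored as its string form.
            flat[name] = str(value)
    return flat
```

Calling `flatten_attributes({"tool": {"name": "web_search"}})` would give a single key `tool.name`, which keeps the attribute set flat.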
Dinesh Yeduguru	ab7f802698	feat: add MetricResponseMixin to chat completion response types (#1050 ) # What does this PR do? Defines a MetricResponseMixin which can be inherited by any response class. Adds it to chat completion response types. This is a short term solution to allow inference API to return metrics The ideal way to do this is to have a way for all response types to include metrics and all metric events logged to the telemetry API to be included with the response To do this, we will need to augment all response types with a metrics field. We have hit a blocker from stainless SDK that prevents us from doing this. The blocker is that if we were to augment the response types that have a data field in them like so class ListModelsResponse(BaseModel): metrics: Optional[List[MetricEvent]] = None data: List[Models] ... The client SDK will need to access the data by using a .data field, which is not ergonomic. Stainless SDK does support unwrapping the response type, but it requires that the response type to only have a single field. We will need a way in the client SDK to signal that the metrics are needed and if they are needed, the client SDK has to return the full response type without unwrapping it. ## Test Plan sh run_openapi_generator.sh ./ sh stainless_sync.sh dineshyv/dev add-metrics-to-resp-v4 LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/fireworks/fireworks-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py	2025-02-11 14:58:12 -08:00
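The mixin idea described in the commit above can be sketched with plain dataclasses. The project itself uses `BaseModel` classes, so this is a simplified stand-in; the `MetricEvent` fields and the `content` field are hypothetical and chosen only to make the example self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricEvent:
    # Hypothetical fields, for illustration only.
    metric: str
    value: float
    unit: str = ""


@dataclass
class MetricResponseMixin:
    # Any response class that inherits the mixin gains an optional metrics list.
    metrics: Optional[List[MetricEvent]] = None


@dataclass
class ChatCompletionResponse(MetricResponseMixin):
    # Simplified stand-in for the real response type.
    content: str = ""
```

A response built as `ChatCompletionResponse(content="hi")` leaves `metrics` as `None`, so callers that do not need metrics are unaffected.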
Ellis Tarn	36d35406a7	fix: a bad newline in ollama docs (#1036) # What does this PR do? Catches a bug in the previous codegen which was removing newlines. ## Test Plan ``` python llama_stack/scripts/distro_codegen.py ```	2025-02-10 14:27:17 -08:00
Ellis Tarn	afca9d92f9	fix: Readthedocs cannot parse comments, resulting in docs bugs (#1033 )	2025-02-10 16:35:16 -05:00
Ellis Tarn	ab9516c789	fix: Gaps in doc codegen (#1035) # What does this PR do? Catches docs up to source with: ``` python llama_stack/scripts/distro_codegen.py ``` ## Test Plan Manually checked ``` sphinx-autobuild docs/source build/html ```	2025-02-10 13:24:15 -08:00
Michael Clifford	076213165c	docs: update rag.md example code to prevent errors (#1009 )	2025-02-10 09:25:30 -05:00
raghotham	7766e68e92	docs: update index.md for 0.1.2 (#1013)	2025-02-07 15:36:20 -08:00
Jeff Tang	a229de6d1e	Getting started notebook update (#936 ) # What does this PR do? Added examples (Section 4) of using Llama Stack 0.1 distro on together and Llama 3.2 to answer questions about an image with LS Chat and Agent APIs.	2025-02-07 15:36:15 -08:00
Ashwin Bharambe	62e5461da7	No spaces in ipynb tests	2025-02-07 11:56:22 -08:00
Ashwin Bharambe	a8820597ee	Minor clean up of notebook	2025-02-07 11:36:29 -08:00
ehhuang	af15426ad7	doc: getting started notebook (#996) # What does this PR do? Fix link	2025-02-06 17:30:21 -08:00
Hardik Shah	28a0fe57cc	fix: Update rag examples to use fresh faiss index every time (#998 ) # What does this PR do? In several examples we use the same faiss index , which means running it multiple times fills up the index with duplicates which eventually degrades the model performance on RAG as multiple copies of the same irrelevant chunks might be picked up several times. Fix is to ensure we create a new index each time. Resolves issue in this discussion - https://github.com/meta-llama/llama-stack/discussions/995 ## Test Plan Re-ran the getting started guide multiple times to see the same output Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-02-06 16:12:29 -08:00
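The fix described above (a new index per run so repeated runs do not accumulate duplicate chunks) can be approximated by generating a unique identifier each time. This is a minimal sketch; `fresh_vector_db_id` and the prefix are hypothetical names, and the resulting id would be passed to whatever registration call the example already uses.

```python
import uuid


def fresh_vector_db_id(prefix: str = "rag_example") -> str:
    """Return an identifier that is different on every call."""
    # uuid4 is random, so rerunning the example never reuses an earlier index.
    return f"{prefix}_{uuid.uuid4().hex}"
```

Because each run writes to a new index, retrieval results stay the same across reruns instead of degrading as copies pile up.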
Maxime Lecanu	e964ec95e9	docs: Correct typos in Zero to Hero guide (#997) # What does this PR do? Corrects some typographical errors found in the `docs/zero_to_hero_guide/README.md` file. ## Test Plan N/A Co-authored-by: Maxime Lecanu <mlecanu@fb.com>	2025-02-06 17:29:52 -05:00
Hardik Shah	a84e7669f0	feat: Add a new template for `dell` (#978) - Added new template `dell` and its documentation - Update docs - [minor] uv fix i came across - codegen for all templates Tested with ```bash export INFERENCE_PORT=8181 export DEH_URL=http://0.0.0.0:$INFERENCE_PORT export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct export CHROMADB_HOST=localhost export CHROMADB_PORT=6601 export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT export CUDA_VISIBLE_DEVICES=0 export LLAMA_STACK_PORT=8321 # build the stack template llama stack build --template=dell # start the TGI inference server podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES ghcr.io/huggingface/text-generation-inference --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0 # start chroma-db for vector-io ( aka RAG ) podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname) # build docker llama stack build --template=dell --image-type=container # run llama stack server ( via docker ) podman run -it \ --network host \ -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \ -v ~/.llama:/root/.llama \ # NOTE: mount the llama-stack / llama-model directories if testing local changes -v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \ localhost/distribution-dell:dev \ --port $LLAMA_STACK_PORT \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env DEH_URL=$DEH_URL \ --env CHROMA_URL=$CHROMA_URL # test the server cd <PATH_TO_LLAMA_STACK_REPO> LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py ``` --------- Co-authored-by: Hardik Shah <hjshah@fb.com>	2025-02-06 14:14:39 -08:00
Yuan Tang	09ed0e9c9f	Add Kubernetes deployment guide (#899 ) This PR moves some content from [the recent blog post](https://blog.vllm.ai/2025/01/27/intro-to-llama-stack-with-vllm.html) to here as a more official guide for users who'd like to deploy Llama Stack on Kubernetes. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-06 10:28:02 -08:00
ehhuang	3922999118	sys_prompt support in Agent (#938) # What does this PR do? The current default system prompt for llama3.2 tends to overindex on tool calling and doesn't work well when the prompt does not require tool calling. This PR adds an option to override the default system prompt, and organizes tool-related configs into a new config object. - [ ] Addresses issue (#issue) ## Test Plan LLAMA_STACK_CONFIG=together pytest --inference-model=meta-llama/Llama-3.3-70B-Instruct -s -v tests/client-sdk/agents/test_agents.py::test_override_system_message_behavior ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-05 21:11:32 -08:00
Nathan Weinberg	e777d965a1	docs: add addn server guidance for Linux users in Quick Start (#972 ) # What does this PR do? - [x] Addresses issue #971 ## Test Plan Ran docs build locally ## Sources See discussion linked in the issue ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Nathan Weinberg <nweinber@redhat.com> Co-authored-by: Mert Parker <mertpaker@gmail.com>	2025-02-05 20:57:51 -08:00
Ihar Hrachyshka	f4343f7dc0	docs: clarify host.docker.internal works for recent podman (#977 ) The host.docker.internal alias was implemented in podman 4.7.0: `b672ddc792` Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> # What does this PR do? Follow-up to previous podman specific doc update. ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-05 16:02:05 -08:00
Aakanksha Duggal	8fa642835b	Fix README.md notebook links (#976) ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>	2025-02-05 14:33:46 -08:00
Ryan Cook	2d9c8b549e	docs: missing T in import (#974 ) # What does this PR do? Missing T in import ## Test Plan N/A doc update ## Sources Please link relevant resources if necessary. ## Before submitting - [X ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-05 17:06:39 -05:00
Kamesh Akella	d9c0b4e3ba	[docs] update the zero_to_hero_guide llama stack version to 0.1.0 (#960) # What does this PR do? The Zero to Hero guide currently references an older 0.0.61 llama-stack version. Using the most recent stable release in the documentation would help users avoid issues from older llama-stack versions. ## Test Plan I have run the workflow locally using the proposed version change and was able to proceed without any issue. ## Before submitting - [X] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-05 11:49:26 -08:00
Ihar Hrachyshka	5c8e35a9e2	docs, tests: replace datasets.rst with memory_optimizations.rst (#968 ) datasets.rst was removed from torchtune repo. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> # What does this PR do? Replace a missing 404 document with another one that exists. (Removed it from the list when memory_optimizations.rst was already pulled.) ## Test Plan Please describe: - tests you ran to verify your changes with result summaries. - provide instructions so it can be reproduced. ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-05 11:25:56 -05:00
Ihar Hrachyshka	529708215c	[docs] Make RAG example self-contained (#962 ) Before the patch, the example could not be executed verbatim without copy-pasting client function from the inference example. I think it's better to have examples self-contained, especially in a getting started guide. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com> # What does this PR do? See above. ## Test Plan Confirmed example can now be executed verbatim. ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-04 16:22:50 -08:00
Ashwin Bharambe	474c4bdd7a	Make a couple properties optional (#963 )	2025-02-04 16:20:24 -08:00
Ihar Hrachyshka	0cbb3e401c	docs: miscellaneous small fixes (#961 ) - [docs] Fix misc typos and formatting issues in intro docs - [docs]: Export variables (e.g. INFERENCE_MODEL) in getting_started - [docs] Show that `llama-stack-client configure` will ask for api key # What does this PR do? Miscellaneous fixes in the documentation; not worth reporting an issue. ## Test Plan No code changes. Addressed issues spotted when walking through the guide. Confirmed locally. ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. --------- Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-02-04 15:31:30 -08:00
Bill Murdock	b0dec797a0	Add Podman instructions to Quick Start (#957 ) Podman is a popular alternative to Docker, so it would be nice to make it clear that it can also be used to deploy the container for the server. The instructions are a little different because you have to create the directory (unlike with Docker which makes the directory for you). # What does this PR do? - [ ] Add Podman instructions to Quick Start ## Test Plan Documentation only. ## Sources I tried it out and it worked. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-02-04 14:37:02 -08:00
Ashwin Bharambe	d67401c644	Several documentation fixes and fix link to API reference	2025-02-04 14:00:43 -08:00
Charlie Doern	26aef50bc5	if client.initialize fails, the example should exit (#954 ) # What does this PR do? the example script can gracefully exit if the boolean returned from initialize is used properly Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-02-04 13:54:21 -08:00
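The pattern this commit describes, checking the boolean returned by `initialize` and stopping early on failure, might look roughly like the sketch below. `StubClient` is a hypothetical stand-in for the real client; only the "initialize returns a boolean" behavior is taken from the commit message.

```python
import sys


class StubClient:
    """Hypothetical stand-in for a client whose initialize() returns a bool."""

    def __init__(self, healthy: bool) -> None:
        self._healthy = healthy

    def initialize(self) -> bool:
        return self._healthy


def main(client) -> int:
    """Return a process exit code: 1 if initialization failed, 0 otherwise."""
    if not client.initialize():
        print("client initialization failed, exiting", file=sys.stderr)
        return 1
    # ... the rest of the example would run here ...
    return 0
```

A script would typically end with `sys.exit(main(client))` so that the failure is visible to the calling shell.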
Ashwin Bharambe	b17277b06a	Fix the OpenAPI HTML	2025-02-04 10:38:49 -08:00
ehhuang	c9ab72fa82	Support sys_prompt behavior in inference (#937 ) # What does this PR do? The current default system prompt for llama3.2 tends to overindex on tool calling and doesn't work well when the prompt does not require tool calling. This PR adds an option to override the default system prompt, and organizes tool-related configs into a new config object. - [ ] Addresses issue (#issue) ## Test Plan python -m unittest llama_stack.providers.tests.inference.test_prompt_adapter ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/937). * #938 * __->__ #937	2025-02-03 23:35:16 -08:00
Xi Yan	62cd3c391e	notebook point to github as source of truth	2025-02-03 15:08:25 -08:00
Ashwin Bharambe	753a1aa7bc	Update colab link to be pointing back to github source	2025-02-03 15:00:21 -08:00
Ashwin Bharambe	aefd5bb619	Test notebook update	2025-02-03 14:59:06 -08:00
Nathan Weinberg	7a72082cdd	fix: formatting for ollama note in Quick Start doc (#945 ) # What does this PR do? Fixes formatting for Ollama note found here: https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#start-ollama - [ ] Addresses issue (#issue) ## Test Plan Ran local docs build as described [here](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md#building-the-documentation) ## Sources N/A ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests. Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-02-03 14:13:57 -08:00
Ashwin Bharambe	f98efe68c9	Misc fixes (#944 ) - Make sure torch + torchvision go together as deps, otherwise bad stuff happens - Add a pre-commit for requirements.txt	2025-02-03 14:08:47 -08:00
Nathan Weinberg	0f14378135	fix: broken "core concepts" link in docs website (#940 ) # What does this PR do? The `core concepts` link on [this page](https://llama-stack.readthedocs.io/en/latest/contributing/new_api_provider.html) is currently broken - this PR fixes that link ## Test Plan Ran local docs build as described [here](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md#building-the-documentation) ## Sources N/A ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-02-03 13:46:34 -08:00
Nathan Weinberg	1e36721686	fix: broken link in Quick Start doc (#943 ) # What does this PR do? Ollama download link is broken on this page: https://llama-stack.readthedocs.io/en/latest/getting_started/index.html ## Test Plan N/A ## Sources https://ollama.com/docs/installation ==> 404 https://ollama.com/download ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-02-03 13:45:35 -08:00
Ashwin Bharambe	ccf0cbb903	Update release pointer	2025-02-02 12:11:57 -08:00
Ashwin Bharambe	7fdbd5b642	Add NBVAL skips to the getting started notebook	2025-02-02 07:53:07 -08:00
Yuan Tang	34ab7a3b6c	Fix precommit check after moving to ruff (#927 ) Lint check in main branch is failing. This fixes the lint check after we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We need to move to a `ruff.toml` file as well as fixing and ignoring some additional checks. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 06:46:45 -08:00
Hardik Shah	a7b929f17e	Sec fixes as raised by bandit (#917 ) minor fixes to hashlib and jinja	2025-01-31 13:44:26 -08:00
Xi Yan	15dcc4ea5e	openapi gen return type fix for streaming/non-streaming (#910) # What does this PR do? We need to change ```yaml /v1/inference/chat-completion: post: responses: '200': description: >- If stream=False, returns a ChatCompletionResponse with the full completion. If stream=True, returns an SSE event stream of ChatCompletionResponseStreamChunk content: text/event-stream: schema: oneOf: - $ref: '#/components/schemas/ChatCompletionResponse' - $ref: '#/components/schemas/ChatCompletionResponseStreamChunk' ``` into ```yaml /v1/inference/chat-completion: post: responses: '200': description: >- If stream=False, returns a ChatCompletionResponse with the full completion. If stream=True, returns an SSE event stream of ChatCompletionResponseStreamChunk content: text/event-stream: schema: $ref: '#/components/schemas/ChatCompletionResponseStreamChunk' application/json: schema: $ref: '#/components/schemas/ChatCompletionResponse' ``` ## Test Plan Python - tested in SDK sync: https://github.com/meta-llama/llama-stack-client-python/pull/108 Node - tested w/ https://gist.github.com/yanxi0830/b782f4b91e21dcccdfef8898ce55157e (SDK update follow up) ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-30 18:03:02 -08:00
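With the two schemas split by media type as in the commit above, a client can dispatch on the response's `Content-Type`. The sketch below is illustrative only: `parse_chat_completion` is a hypothetical helper, and it assumes each SSE event carries its JSON payload on a single `data:` line.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Union


def _iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one decoded JSON object per 'data:' line of an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


def parse_chat_completion(
    content_type: str, body_lines: Iterable[str]
) -> Union[Dict[str, Any], Iterator[Dict[str, Any]]]:
    """Return a full response dict or an iterator of stream chunks."""
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads("".join(body_lines))
    if media_type == "text/event-stream":
        return _iter_sse_chunks(body_lines)
    raise ValueError(f"unexpected content type: {content_type}")
```

Keeping one schema per media type means the caller knows which shape to expect before reading the body, rather than probing a `oneOf` union.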
Xi Yan	94051cfe9e	fix ImageContentItem to take base64 string as image.data (#909 ) # What does this PR do? - Discussion in https://github.com/meta-llama/llama-stack/pull/906#discussion_r1936260819 - image.data should accept base64 string as input instead of binary bytes, change prompt_adapter to account for that. ## Test Plan ``` pytest -v tests/client-sdk/inference/test_inference.py ``` with test in https://github.com/meta-llama/llama-stack/pull/906 ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2025-01-30 15:58:23 -08:00
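Since `image.data` is described above as taking a base64 string rather than raw bytes, a caller would encode the image first. A minimal sketch using only the standard library follows; `encode_image_data` is a hypothetical helper name.

```python
import base64


def encode_image_data(raw: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    # b64encode returns bytes; decode to str so the value is JSON-serializable.
    return base64.b64encode(raw).decode("ascii")
```

The string form can be placed directly in a JSON request body, which raw bytes cannot.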
snova-edwardm	7fe2592795	SambaNova supports Llama 3.3 (#905 ) # What does this PR do? - Fix typo - Support Llama 3.3 70B ## Test Plan Run the following scripts and obtain the test results Script ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming --env SAMBANOVA_API_KEY={API_KEY} ``` Result ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_streaming[-sambanova] PASSED =========================================== 1 passed, 1 warning in 1.26s ============================================ ``` Script ``` pytest -s -v --providers inference=sambanova llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming --env SAMBANOVA_API_KEY={API_KEY} ``` Result ``` llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_chat_completion_non_streaming[-sambanova] PASSED =========================================== 1 passed, 1 warning in 0.52s ============================================ ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [N] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [Y] Ran pre-commit to handle lint / formatting issues. - [Y] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [Y] Updated relevant documentation. - [N] Wrote necessary unit or integration tests.	2025-01-30 09:24:46 -08:00
Yuan Tang	d5b7de3897	Fix link to selection guide and change "docker" to "container" (#898 ) The current link doesn't work. Also changed docs to be consistent with https://github.com/meta-llama/llama-stack/pull/802.	2025-01-29 11:59:40 -08:00
Ashwin Bharambe	0d96070af9	Update OpenAPI generator to add param and field documentation (#896 ) We desperately need to document our APIs. This is the basic requirement of having a Spec :) This PR updates the OpenAPI generator so documentation for request parameters and object fields can be properly added to the OpenAPI specs. From there, this should get picked by Stainless, etc. ## Test Plan: Updated client-sdk (See https://github.com/meta-llama/llama-stack-client-python/pull/104) and then ran: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py ```	2025-01-29 10:04:30 -08:00


339 commits