llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-03 18:00:36 +00:00

Author	SHA1	Message	Date
Xi Yan	d3508c4c76	feat(1/n): scoring function registration for llm-as-judge (#1405 ) # What does this PR do? - add ability to register a llm-as-judge scoring function with custom judge prompts / params. - Closes https://github.com/meta-llama/llama-stack/issues/1395 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Via CLI ``` llama-stack-client scoring_functions register \ --scoring-fn-id "llm-as-judge::my-prompt" \ --description "my custom judge" \ --return-type '{"type": "string"}' \ --provider-id "llm-as-judge" \ --provider-scoring-fn-id "my-prompt" \ --params '{"type": "llm_as_judge", "judge_model": "meta-llama/Llama-3.2-3B-Instruct", "prompt_template": "always output 1.0"}' ``` <img width="1373" alt="image" src="https://github.com/user-attachments/assets/7c6fc0ae-64fe-4581-8927-a9d8d746bd72" /> - Unit test will be addressed with https://github.com/meta-llama/llama-stack/issues/1396 [//]: # (## Documentation)	2025-03-05 10:00:34 -08:00
Xi Yan	3d9331840e	docs: api documentation for agents/eval/scoring/datasets (#1400 ) # What does this PR do? - add some docs to OpenAPI for agents/eval/scoring/datasetio [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - read [//]: # (## Documentation)	2025-03-05 09:40:24 -08:00
Ellis Tarn	24a27baf7c	chore: Make README code blocks more easily copy pastable (#1420 ) # What does this PR do? When going through READMEs, I found that I had to keep editing the code blocks since they were prefixed with `$ `. A common pattern is to triple click (highlight all) a block and then copy paste. This minor change will make this easier for folks to follow the READMEs. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan N/A [//]: # (## Documentation)	2025-03-05 09:11:01 -08:00
Daniele Martinoli	fb998683e0	fix: Agent uses the first configured vector_db_id when documents are provided (#1276 ) # What does this PR do? The agent API allows to query multiple DBs using the `vector_db_ids` argument of the `rag` tool: ```py toolgroups=[ { "name": "builtin::rag", "args": {"vector_db_ids": [vector_db_id]}, } ], ``` This means that multiple DBs can be used to compose an aggregated context by executing the query on each of them. When documents are passed to the next agent turn, there is no explicit way to configure the vector DB where the embeddings will be ingested. In such cases, we can assume that: - if any `vector_db_ids` is given, we use the first one (it probably makes sense to assume that it's the only one in the list, otherwise we should loop on all the given DBs to have a consistent ingestion) - if no `vector_db_ids` is given, we can use the current logic to generate a default DB using the default provider. If multiple providers are defined, the API will fail as expected: the user has to provide details on where to ingest the documents. (Closes #1270) ## Test Plan The issue description details how to replicate the problem. [//]: # (## Documentation) --------- Signed-off-by: Daniele Martinoli <dmartino@redhat.com>	2025-03-04 21:44:13 -08:00
Xi Yan	78962be996	chore: refactor create_and_execute_turn and resume_turn (#1399 ) # What does this PR do? - Closes https://github.com/meta-llama/llama-stack/issues/1212 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/integration/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` <img width="1203" alt="image" src="https://github.com/user-attachments/assets/35b60017-b3f2-4e98-87f2-2868730261bd" /> ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/agents/test_agents.py::test_rag_and_code_agent --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` [//]: # (## Documentation)	2025-03-04 16:07:30 -08:00
Ashwin Bharambe	abfbaf3c1b	refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401 ) All of the tests from `llama_stack/providers/tests/` are now moved to `tests/integration`. I converted the `tools`, `scoring` and `datasetio` tests to use API. However, `eval` and `post_training` proved to be a bit challenging to leaving those. I think `post_training` should be relatively straightforward also. As part of this, I noticed that `wolfram_alpha` tool wasn't added to some of our commonly used distros so I added it. I am going to remove a lot of code duplication from distros next so while this looks like a one-off right now, it will go away and be there uniformly for all distros.	2025-03-04 14:53:47 -08:00
Ashwin Bharambe	dd0db8038b	refactor(test): unify vector_io tests and make them configurable (#1398 ) ## Test Plan `LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=sqlite-vec pytest -s -v test_vector_io.py --embedding-model all-miniLM-L6-V2 --inference-model='' --vision-inference-model=''` ``` test_vector_io.py::test_vector_db_retrieve[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_vector_db_register[txt=:vis=:emb=all-miniLM-L6-V2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case0] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case1] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case2] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case3] PASSED test_vector_io.py::test_insert_chunks[txt=:vis=:emb=all-miniLM-L6-V2-test_case4] PASSED ``` Same thing with: - LLAMA_STACK_CONFIG=inference=sentence-transformers,vector_io=faiss - LLAMA_STACK_CONFIG=fireworks (Note that ergonomics will soon be improved re: cmd-line options and env variables)	2025-03-04 13:37:45 -08:00
ehhuang	fd8c991393	fix: rag as attachment bug (#1392 ) Summary: Test Plan: added new test LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/api/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B	2025-03-04 13:08:16 -08:00
Xi Yan	e9a37bad63	chore: rename task_config to benchmark_config (#1397 ) # What does this PR do? - This was missed from previous deprecation: https://github.com/meta-llama/llama-stack/pull/1186 - Part of https://github.com/meta-llama/llama-stack/issues/1396 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` pytest -v -s --nbval-lax ./llama-stack/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` [//]: # (## Documentation)	2025-03-04 12:44:04 -08:00
Xi Yan	158b6dc404	chore: deprecate allow_turn_resume (#1377 ) # What does this PR do? - Deprecate allow_turn_resume flag as this is used for staying backward compat. - Closes https://github.com/meta-llama/llama-stack/issues/1363 [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` LLAMA_STACK_CONFIG=fireworks pytest -v tests/api/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct" --record-responses ``` <img width="1054" alt="image" src="https://github.com/user-attachments/assets/d31de2d4-0953-41e1-a71a-7e1579fa351a" /> [//]: # (## Documentation)	2025-03-04 12:22:11 -08:00
Ashwin Bharambe	cad5eed4b5	refactor(tests): delete inference, safety and agents tests from providers/tests/ (#1393 ) Continues the refactor of tests. Tests from `providers/tests` should be considered deprecated. For this PR, I deleted most of the tests in - inference - safety - agents since much more comprehensive tests exist in `tests/integration/{inference,safety,agents}` already. I moved `test_persistence.py` from agents, but disabled all the tests since that test needs to be properly migrated. ## Test Plan ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v agents --vision-inference-model='' /Users/ashwin/homebrew/Caskroom/miniconda/base/envs/toolchain/lib/python3.10/site-packages/pytest_asyncio/plugin.py:208: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset. The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session" warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET)) ======================================================================================================= test session starts ======================================================================================================== platform darwin -- Python 3.10.16, pytest-8.3.3, pluggy-1.5.0 -- /Users/ashwin/homebrew/Caskroom/miniconda/base/envs/toolchain/bin/python cachedir: .pytest_cache metadata: {'Python': '3.10.16', 'Platform': 'macOS-15.3.1-arm64-arm-64bit', 'Packages': {'pytest': '8.3.3', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.24.0', 'html': '4.1.1', 'metadata': '3.1.1', 'anyio': '4.8.0', 'nbval': '0.11.0'}} rootdir: /Users/ashwin/local/llama-stack configfile: pyproject.toml plugins: asyncio-0.24.0, html-4.1.1, metadata-3.1.1, anyio-4.8.0, nbval-0.11.0 asyncio: mode=strict, default_loop_scope=None collected 15 items agents/test_agents.py::test_agent_simple[txt=8B] PASSED agents/test_agents.py::test_tool_config[txt=8B] PASSED agents/test_agents.py::test_builtin_tool_web_search[txt=8B] PASSED agents/test_agents.py::test_builtin_tool_code_execution[txt=8B] PASSED agents/test_agents.py::test_code_interpreter_for_attachments[txt=8B] PASSED agents/test_agents.py::test_custom_tool[txt=8B] PASSED agents/test_agents.py::test_custom_tool_infinite_loop[txt=8B] PASSED agents/test_agents.py::test_tool_choice[txt=8B] PASSED agents/test_agents.py::test_rag_agent[txt=8B-builtin::rag/knowledge_search] PASSED agents/test_agents.py::test_rag_agent[txt=8B-builtin::rag] PASSED agents/test_agents.py::test_rag_agent_with_attachments[txt=8B] PASSED agents/test_agents.py::test_rag_and_code_agent[txt=8B] PASSED agents/test_agents.py::test_create_turn_response[txt=8B] PASSED agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This test needs to be migrated to api / client-sdk world) agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This test needs to be migrated to api / client-sdk world) ```	2025-03-04 10:41:57 -08:00
Reid	cb085d56c6	docs: fix typo (#1390 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-04 09:02:55 -08:00
Alexey Rybak	d57cffb495	fix(pgvector): replace hyphens with underscores in table names (#1385 ) # What does this PR do? Fix SQL syntax errors caused by hyphens in Vector DB IDs by sanitizing table # (Closes #1332 ) ## Test Plan Test confirms table names with hyphens are properly converted to underscores	2025-03-04 07:06:35 -08:00
ehhuang	07a992ef90	feat: deterministic tools ordering (#1380 ) Summary: 1. The `tools` parameter we construct to pass the inference API is non-deterministic. As a result, our recordable mocks is flaky as the ordering change sometimes. This PR makes it so that `tools` ordering is deterministic and aligned with the order user specified. 2. In recordable mock key generation, client tool's parameter type was 'str' and now is 'string' for some reason. I didn't dig into exactly why, but just regenerated the fixtures. Test Plan: Regenerate mocks: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B --record-responses ``` Rerun tests without --record-responses: ``` LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/agents/test_agents.py --safety-shield meta-llama/Llama-Guard-3-8B ```	2025-03-03 20:38:07 -08:00
Ashwin Bharambe	86fc514abb	refactor: move more tests, delete some providers tests (#1382 ) Move unittests to tests/unittests. Gradually nuking tests from providers/tests/ and unifying them into tests/api (which are e2e tests using SDK types) ## Test Plan `pytest -s -v tests/unittests/`	2025-03-03 20:28:34 -08:00
Ashwin Bharambe	55668d3c5b	refactor: move a few tests to top-level tests/ directory	2025-03-03 17:33:39 -08:00
Ashwin Bharambe	5736c7d682	refactor: move tests/client-sdk to tests/api (#1376 ) This PR moves the client-sdk tests to the api directory to better reflect their purpose and improve code organization.	2025-03-03 17:28:12 -08:00
Reid	5c9d12a206	chore: improve --port help text (#1346 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] It would be better to tell user env var usage in help text. ``` before: $ llama stack run --help --port PORT Port to run the server on. Defaults to 8321 after $ llama stack run --help --port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. Defaults to 8321 ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-03 16:49:03 -08:00
Ashwin Bharambe	0a76ece249	feat: add more logs to agent_instance.py	2025-03-03 16:15:47 -08:00
ehhuang	ee5e9b935a	feat: better using get_default_tool_prompt_format (#1360 ) Summary: https://github.com/meta-llama/llama-stack/pull/1214 introduced `get_default_tool_prompt_format` but tried to use it on the raw identifier. Here we move calling this func later in the stack and rely on the inference provider to resolve the raw identifier into llama model, then call get_default_tool_prompt_format. Test Plan: ``` LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming --inference-model=llama3.2:3b-instruct-fp16 --vision-inference-model="" ``` Before: <img width="1288" alt="image" src="https://github.com/user-attachments/assets/918c7839-1f45-4540-864e-4b842cc367df" /> After: <img width="1522" alt="image" src="https://github.com/user-attachments/assets/447d78af-b3b9-4837-8cb7-6ac549005efe" />	2025-03-03 14:50:06 -08:00
Ashwin Bharambe	816fdf289a	refactor: move generation.py to llama3	2025-03-03 13:50:19 -08:00
Ashwin Bharambe	02066591b8	refactor: move generation.py to llama3	2025-03-03 13:46:50 -08:00
Ashwin Bharambe	725423c95c	refactor: move llama3 impl to meta_reference provider (#1364 ) Just moving bits to a better place ## Test Plan ```bash torchrun $CONDA_PREFIX/bin/pytest -s -v test_text_inference.py ```	2025-03-03 13:22:57 -08:00
Sébastien Han	f86154dff5	refactor: restructure resolver logic and improve type safety (#1323 ) # What does this PR do? - Modularized `resolve_impls` by extracting helper functions for validation, sorting, and instantiation. - Improved readability by introducing `validate_and_prepare_providers`, `sort_providers_by_dependency`, and `instantiate_providers`. - Enhanced type safety with explicit type hints (`Tuple`, `Dict`, `Set`, etc.). - Fixed potential issues with provider module imports and added error handling. - Updated `pyproject.toml` to enforce type checking on `resolver.py` using `mypy`. Signed-off-by: Sébastien Han <seb@redhat.com> - [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Run the server. [//]: # (## Documentation) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-03-03 10:45:12 -08:00
Daniele Martinoli	cae6c00d8a	fix: Fixed use of chunk.id (#1356 ) # What does this PR do? Closes #1355 ## Test Plan Start server and execute e`xamples/agents/rag_with_vector_db.py` from `llama-stack-apps`.	2025-03-03 10:42:59 -08:00
Xi Yan	7d111c7510	feat: unify max_infer_iters in client/server agent loop (#1309 ) # What does this PR do? We currently use `max_infer_iters` in 2 different ways 1/ Server: track number of times 2/ Client side: track number of times we send `resume_turn` request This PR gets rid of the need of (2) and makes server track total number of times we perform inference within a Turn NOTE The PR will assume StopReason is set to - end_of_message: turn is not finished, we could be waiting for client tool call responses - end_of_turn: if the entire turn is finished and there's no more things to be done. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct" ``` [//]: # (## Documentation)	2025-03-03 10:08:36 -08:00
Ashwin Bharambe	754feba61f	feat: add a configurable category-based logger (#1352 ) A self-respecting server needs good observability which starts with configurable logging. Llama Stack had little until now. This PR adds a `logcat` facility towards that. Callsites look like: ```python logcat.debug("inference", f"params to ollama: {params}") ``` - the first parameter is a category. there is a static list of categories in `llama_stack/logcat.py` - each category can be associated with a log-level which can be configured via the `LLAMA_STACK_LOGGING` env var. - a value `LLAMA_STACK_LOGGING=inference=debug;server=info"` does the obvious thing. there is a special key called `all` which is an alias for all categories ## Test Plan Ran with `LLAMA_STACK_LOGGING="all=debug" llama stack run fireworks` and saw the following: ![image](https://github.com/user-attachments/assets/d24b95ab-3941-426c-9ea0-a4c62542e6f0) Hit it with a client-sdk test case and saw this: ![image](https://github.com/user-attachments/assets/3fee8c6c-986e-4125-a09c-f5dc019682e2)	2025-03-02 18:51:14 -08:00
Reid	58586f4f8c	fix: update cmd check logic (#1347 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Sorry for the https://github.com/meta-llama/llama-stack/pull/1340 logic, it will cause issue if in `non-container` env. ``` Using conda <<<<<<<------ environment: stack + is_command_available docker + command -v docker + printf '\033[0;31mError: docker command not found. Is docker installed and in your PATH?\033[0m' Error: docker command not found. Is docker installed and in your PATH?+ exit 1 ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-02 18:26:59 -08:00
Ashwin Bharambe	46b0a404e8	chore: remove straggler references to llama-models (#1345 ) Straggler references cleanup	2025-03-01 14:26:03 -08:00
Ashwin Bharambe	8bbd52bb9f	chore: remove dependency on llama_models completely (#1344 )	2025-03-01 12:48:08 -08:00
Reid	7131d5ddeb	chore: remove start_venv.sh (#1341 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] `start_venv.sh` lifecycle should be: `025f615868` >> `34e3faa4e8` >> `4684fd3f8d` Finally replaced by `start_stack.sh` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-01 11:22:06 -08:00
Ashwin Bharambe	6609d4ada4	feat: allow conditionally enabling providers in run.yaml (#1321 ) # What does this PR do? We want to bundle a bunch of (typically remote) providers in a distro template and be able to configure them "on the fly" via environment variables. So far, we have been able to do this with simple env var replacements. However, sometimes you want to only conditionally enable providers (because the relevant remote services may not be alive, or relevant.) This was not possible until now. To aid this, we add a simple (bash-like) env var replacement enhancement: `${env.FOO+bar}` evaluates to `bar` if the variable is SET and evaluates to empty string if it is not. On top of that, we update our main resolver to ignore any provider whose ID is null. This allows using the distro like this: ```bash llama stack run dev --env CHROMADB_URL=http://localhost:6001 --env ENABLE_CHROMADB=1 ``` when only Chroma is UP. This disables the other `pgvector` provider in the run configuration. ## Test Plan Hard code `chromadb` as the vector io provider inside `test_vector_io.py` and run: ```bash LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v tests/client-sdk/vector_io/ --embedding-model all-MiniLM-L6-v2 ```	2025-03-01 11:19:14 -08:00
ehhuang	81c6ef5c1c	fix: don't update tool_config inplace (#1338 ) Summary: messes tests up Test Plan: run agent tests	2025-03-01 10:40:00 -08:00
Reid	327b17e5f0	chore: add container cmd check in start_stack.sh (#1340 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-01 10:39:32 -08:00
ehhuang	7cff9f504f	fix: raise error when request param failed to convert (#1339 ) # Summary: This led to extremely hard to debug messages. Before: llama_stack/distribution/library_client.py:275: in request response = await self._call_non_streaming( llama_stack/distribution/library_client.py:322: in _call_non_streaming result = await matched_func(*body) llama_stack/providers/utils/telemetry/trace_protocol.py:102: in async_wrapper result = await method(self, args, **kwargs) llama_stack/providers/inline/agents/meta_reference/agents.py:80: in create_agent value=agent_config.model_dump_json(), E AttributeError: 'dict' object has no attribute 'model_dump_json' After: E ValueError: Failed to convert parameter {'model': 'meta-llama/Llama-3.1-8B-Instruct', 'instructions': 'You are a helpful assistant', 'sampling_params': {'strategy': {'type': 'top_p', 'temperature': 0.0001, 'top_p': 0.9}}, 'toolgroups': [{'name': 'builtin::rag'}], 'input_shields': ['meta-llama/Llama-Guard-3-8B'], 'output_shields': ['meta-llama/Llama-Guard-3-8B'], 'enable_session_persistence': False} into <class 'llama_stack.apis.agents.agents.AgentConfig'>: 2 validation errors for AgentConfig E toolgroups.0.str E Input should be a valid string [type=string_type, input_value={'name': 'builtin::rag'}, input_type=dict] E For further information visit https://errors.pydantic.dev/2.10/v/string_type E toolgroups.0.AgentToolGroupWithArgs.args E Field required [type=missing, input_value={'name': 'builtin::rag'}, input_type=dict] E For further information visit https://errors.pydantic.dev/2.10/v/missing # Test Plan: LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B	2025-03-01 10:39:05 -08:00
Reid	dc069025f5	chore: fix typo (#1343 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] `21ec67356c/distributions` It should missed the `s`. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-03-01 10:36:04 -08:00
ehhuang	21ec67356c	fix: RAG with documents (#1337 ) Summary: This was broken by https://github.com/meta-llama/llama-stack/pull/1015/files#r1975394190 Test Plan: added e2e test	2025-02-28 16:51:00 -08:00
ehhuang	2faee24873	chore: better raise (#1335 ) Summary: addresses https://github.com/meta-llama/llama-stack/pull/1282#discussion_r1972546802 Test Plan:	2025-02-28 16:41:20 -08:00
Ashwin Bharambe	7ad7e3b970	fix: only install llama-stack package, deps are now correctly incorporated	2025-02-28 16:12:11 -08:00
Xi Yan	15f69e75ff	fix: replace eval with json decoding for format_adapter (#1328 ) # What does this PR do? - using `eval` is a security risk [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan - see https://github.com/meta-llama/llama-stack/pull/1327 cc @SLR722 we will need to update the corresponding dataset via ```python def update_to_json_str(): dataset = datasets.load_dataset(...) processed_dataset = dataset[split].map( lambda x: { "column": json.dumps(eval(x["column"])) } ) processed_dataset.push_to_hub(...) ``` [//]: # (## Documentation)	2025-02-28 11:25:23 -08:00
Ashwin Bharambe	5547ef953c	feat: enhance OpenAPI spec to include Error types (#1320 ) # What does this PR do? An API spec must talk about Error handling. This was a pretty glaring omission so far. This PR begins to address it by adding a set of standard error responses we can attach to all our API calls. At a future point, we can add specific error types where necessary (although we should not hurry to do that; it is best done very late.) ## Test Plan Checked that Stainless SDK generation succeeds.	2025-02-28 11:16:12 -08:00
Xi Yan	6520baebed	fix: replace eval with json decoding (#1327 ) # What does this PR do? - Using `eval` on server is a security risk - Replace `eval` with `json.loads` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan ``` pytest -v -s --nbval-lax ./llama-stack/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb ``` <img width="747" alt="image" src="https://github.com/user-attachments/assets/7aff3d95-0b12-4394-b9d0-aeff791eee38" /> [//]: # (## Documentation)	2025-02-28 11:10:45 -08:00
Reid	66cd128ab5	docs: update the downloaded list doc (#1266 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] Since released the `--downloaded` option, so update the related documents. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-28 10:10:12 -08:00
Reid	14c442f177	chore: update cmd check (#1293 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-28 10:08:05 -08:00
Reid	ea4f13cc20	chore: add container cmd check (#1306 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-28 10:07:24 -08:00
Sébastien Han	c91548fe07	build(container): misc improvements (#1291 ) # What does this PR do? See individual commit messages. [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Apply this diff: ``` diff --git a/llama_stack/templates/ollama/build.yaml b/llama_stack/templates/ollama/build.yaml index da33b8d5..4a702f6f 100644 --- a/llama_stack/templates/ollama/build.yaml +++ b/llama_stack/templates/ollama/build.yaml @@ -28,5 +28,5 @@ distribution_spec: - remote::tavily-search - inline::code-interpreter - inline::rag-runtime - - remote::model-context-protocol + container_image: "registry.access.redhat.com/ubi9" image_type: conda ``` Then run: ``` CONTAINER_BINARY=podman llama stack build --template ollama --image-type container --image-name registry.access.redhat.com/ubi9 Containerfile created successfully in /var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.I7E5V6zbVI/Containerfile FROM registry.access.redhat.com/ubi9 WORKDIR /app RUN dnf -y update && dnf install -y iputils net-tools wget vim-minimal python3.11 python3.11-pip python3.11-wheel python3.11-setuptools && ln -s /bin/pip3.11 /bin/pip && ln -s /bin/python3.11 /bin/python && dnf clean all ENV UV_SYSTEM_PYTHON=1 RUN pip install uv RUN uv pip install --no-cache ollama nltk opentelemetry-sdk aiosqlite matplotlib datasets sqlite-vec scipy chromadb-client psycopg2-binary numpy scikit-learn openai redis pandas tqdm blobfile sentencepiece aiohttp requests pillow pymongo transformers autoevals opentelemetry-exporter-otlp-proto-http pypdf chardet aiosqlite fastapi fire httpx uvicorn RUN uv pip install --no-cache llama-stack RUN pip uninstall -y uv ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "ollama"] # Allows running as non-root user RUN mkdir -p /.llama /.cache RUN chmod -R g+rw /app /.llama /.cache PWD: /Users/leseb/Documents/AI/llama-stack Containerfile: /var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.I7E5V6zbVI/Containerfile + podman build --platform linux/arm64 -t distribution-ollama:0.1.4 -f /var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.I7E5V6zbVI/Containerfile . --progress=plain STEP 1/11: FROM registry.access.redhat.com/ubi9 STEP 2/11: WORKDIR /app --> Using cache d73dafd4caddd75bc29242a9031258fea759dc571c5bb53a64b5e6d86b3b1335 --> d73dafd4cadd STEP 3/11: RUN dnf -y update && dnf install -y iputils net-tools wget vim-minimal python3.11 python3.11-pip python3.11-wheel python3.11-setuptools && ln -s /bin/pip3.11 /bin/pip && ln -s /bin/python3.11 /bin/python && dnf clean all --> Using cache b74ad682db149771612a3ea1e4796e0760ab8a4e07c26ad672b46a86d38178c2 --> b74ad682db14 STEP 4/11: ENV UV_SYSTEM_PYTHON=1 --> Using cache 0812a05e6576506aa2fe646cbf239d0cb504cac30a50cb5cf4dc88e49039466d --> 0812a05e6576 STEP 5/11: RUN pip install uv --> Using cache a0ce1705f87e52f70f6eb34e66f67b68ebc7c1a073f4d2a664b189cfa89a4e88 --> a0ce1705f87e STEP 6/11: RUN uv pip install --no-cache ollama nltk opentelemetry-sdk aiosqlite matplotlib datasets sqlite-vec scipy chromadb-client psycopg2-binary numpy scikit-learn openai redis pandas tqdm blobfile sentencepiece aiohttp requests pillow pymongo transformers autoevals opentelemetry-exporter-otlp-proto-http pypdf chardet aiosqlite fastapi fire httpx uvicorn Using Python 3.11.9 environment at: /usr Resolved 107 packages in 1.78s Downloading kiwisolver (1.4MiB) Downloading aiohttp (1.6MiB) Downloading grpcio (5.4MiB) Downloading nltk (1.4MiB) Downloading transformers (9.5MiB) Downloading pydantic-core (1.7MiB) Downloading lxml (4.6MiB) Downloading psycopg2-binary (2.7MiB) Downloading scipy (33.8MiB) Downloading scikit-learn (12.0MiB) Downloading tokenizers (2.8MiB) Downloading fonttools (4.6MiB) Downloading pymongo (1.3MiB) Downloading rapidfuzz (1.4MiB) Downloading sentencepiece (1.2MiB) Downloading pyarrow (38.7MiB) Downloading matplotlib (8.1MiB) Downloading pycryptodomex (2.1MiB) Downloading pillow (4.2MiB) Downloading pandas (14.9MiB) Downloading numpy (13.6MiB) Building fire==0.7.0 Downloaded sentencepiece Downloaded kiwisolver Downloaded pymongo Downloaded rapidfuzz Downloaded nltk Downloaded aiohttp Built fire==0.7.0 Downloaded pydantic-core Downloaded pycryptodomex Downloaded psycopg2-binary Downloaded tokenizers Downloaded pillow Downloaded lxml Downloaded fonttools Downloaded grpcio Downloaded matplotlib Downloaded transformers Downloaded scikit-learn Downloaded numpy Downloaded pandas Downloaded scipy Downloaded pyarrow Prepared 107 packages in 3.03s Installed 107 packages in 62ms + aiohappyeyeballs==2.4.6 + aiohttp==3.11.13 + aiosignal==1.3.2 + aiosqlite==0.21.0 + annotated-types==0.7.0 + anyio==4.8.0 + attrs==25.1.0 + autoevals==0.0.120 + backoff==2.2.1 + blobfile==3.0.0 + braintrust-core==0.0.58 + certifi==2025.1.31 + chardet==5.2.0 + charset-normalizer==3.4.1 + chevron==0.14.0 + chromadb-client==0.6.3 + click==8.1.8 + contourpy==1.3.1 + cycler==0.12.1 + datasets==3.3.2 + deprecated==1.2.18 + dill==0.3.8 + distro==1.9.0 + dnspython==2.7.0 + fastapi==0.115.8 + filelock==3.17.0 + fire==0.7.0 + fonttools==4.56.0 + frozenlist==1.5.0 + fsspec==2024.12.0 + googleapis-common-protos==1.68.0 + grpcio==1.70.0 + h11==0.14.0 + httpcore==1.0.7 + httpx==0.28.1 + huggingface-hub==0.29.1 + idna==3.10 + importlib-metadata==8.5.0 + jiter==0.8.2 + joblib==1.4.2 + jsonschema==4.23.0 + jsonschema-specifications==2024.10.1 + kiwisolver==1.4.8 + levenshtein==0.26.1 + lxml==5.3.1 + matplotlib==3.10.0 + monotonic==1.6 + multidict==6.1.0 + multiprocess==0.70.16 + nltk==3.9.1 + numpy==1.26.4 + ollama==0.4.7 + openai==1.64.0 + opentelemetry-api==1.30.0 + opentelemetry-exporter-otlp-proto-common==1.30.0 + opentelemetry-exporter-otlp-proto-grpc==1.30.0 + opentelemetry-exporter-otlp-proto-http==1.30.0 + opentelemetry-proto==1.30.0 + opentelemetry-sdk==1.30.0 + opentelemetry-semantic-conventions==0.51b0 + orjson==3.10.15 + overrides==7.7.0 + packaging==24.2 + pandas==2.2.3 + pillow==11.1.0 + posthog==3.16.0 + propcache==0.3.0 + protobuf==5.29.3 + psycopg2-binary==2.9.10 + pyarrow==19.0.1 + pycryptodomex==3.21.0 + pydantic==2.10.6 + pydantic-core==2.27.2 + pymongo==4.11.1 + pyparsing==3.2.1 + pypdf==5.3.0 + python-dateutil==2.9.0.post0 + pytz==2025.1 + pyyaml==6.0.2 + rapidfuzz==3.12.1 + redis==5.2.1 + referencing==0.36.2 + regex==2024.11.6 + requests==2.32.3 + rpds-py==0.23.1 + safetensors==0.5.3 + scikit-learn==1.6.1 + scipy==1.15.2 + sentencepiece==0.2.0 + six==1.17.0 + sniffio==1.3.1 + sqlite-vec==0.1.6 + starlette==0.45.3 + tenacity==9.0.0 + termcolor==2.5.0 + threadpoolctl==3.5.0 + tokenizers==0.21.0 + tqdm==4.67.1 + transformers==4.49.0 + typing-extensions==4.12.2 + tzdata==2025.1 + urllib3==2.3.0 + uvicorn==0.34.0 + wrapt==1.17.2 + xxhash==3.5.0 + yarl==1.18.3 + zipp==3.21.0 --> 5b5b823605a1 STEP 7/11: RUN uv pip install --no-cache llama-stack Using Python 3.11.9 environment at: /usr Resolved 55 packages in 1.08s Downloading setuptools (1.2MiB) Downloading pygments (1.2MiB) Downloading llama-models (1.5MiB) Downloading tiktoken (1.1MiB) Downloaded tiktoken Downloaded llama-models Downloaded pygments Downloaded setuptools Prepared 15 packages in 402ms Installed 15 packages in 15ms + jinja2==3.1.5 + llama-models==0.1.4 + llama-stack==0.1.4 + llama-stack-client==0.1.4 + markdown-it-py==3.0.0 + markupsafe==3.0.2 + mdurl==0.1.2 + prompt-toolkit==3.0.50 + pyaml==25.1.0 + pygments==2.19.1 + python-dotenv==1.0.1 + rich==13.9.4 + setuptools==75.8.2 + tiktoken==0.9.0 + wcwidth==0.2.13 --> 38a037443807 STEP 8/11: RUN pip uninstall -y uv Found existing installation: uv 0.6.3 Uninstalling uv-0.6.3: Successfully uninstalled uv-0.6.3 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv --> 54f749dc5ece STEP 9/11: ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "ollama"] --> 481c138b1982 STEP 10/11: RUN mkdir -p /.llama /.cache --> 0fc174f014a8 STEP 11/11: RUN chmod -R g+rw /app /.llama /.cache COMMIT distribution-ollama:0.1.4 --> d41b4ab4b136 Successfully tagged localhost/distribution-ollama:0.1.4 d41b4ab4b1363bfbaf6239e6f313bcb37873ef4b5f2fd816a4ee55acf2ac54d3 + set +x Success! Build Successful! ``` UBI9 container successfully builds. Run the container: ``` podman run d41b4ab4b1363bfbaf6239e6f313bcb37873ef4b5f2fd816a4ee55acf2ac54d3 --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:213: Resolved 30 providers INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: inner-inference => ollama INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: models => __routing_table__ INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: inference => __autorouted__ INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: inner-vector_io => sqlite-vec INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: inner-safety => llama-guard INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: shields => __routing_table__ INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: safety => __autorouted__ INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: vector_dbs => __routing_table__ INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: vector_io => __autorouted__ INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: inner-tool_runtime => brave-search INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215: inner-tool_runtime => tavily-search ``` [//]: # (## Documentation) --------- Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-28 10:01:52 -08:00
Yuan Tang	18ab1985da	fix: Make remote::vllm compatible with vLLM <= v0.6.3 (#1325 ) # What does this PR do? This is to be consistent with OpenAI API and support vLLM <= v0.6.3 References: * https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice * https://github.com/vllm-project/vllm/pull/10000 This fixes the error when running older versions of vLLM: ``` 00:50:19.834 [START] /v1/inference/chat-completion INFO 2025-02-28 00:50:20,203 httpx:1025: HTTP Request: POST https://api-xeai-granite-3-1-8b-instruct.apps.int.stc.ai.preprod.us-east-1.aws.paas.redhat.com/v1/chat/completions "HTTP/1.1 400 Bad Request" Traceback (most recent call last): File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 235, in endpoint return await maybe_await(value) File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 201, in maybe_await return await value File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 89, in async_wrapper result = await method(self, args, kwargs) File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 193, in chat_completion return await provider.chat_completion(params) File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 89, in async_wrapper result = await method(self, args, kwargs) File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 286, in chat_completion return await self._nonstream_chat_completion(request, self.client) File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 292, in _nonstream_chat_completion r = client.chat.completions.create(params) File "/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py", line 279, in wrapper return func(args, *kwargs) File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions/completions.py", line 879, in create return self._post( File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1290, in post return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)) File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in request return self._request( File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1071, in _request raise self._make_status_error_from_response(err.response) from None openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, When using `tool_choice`, `tools` must be set.', 'input': {'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'What model are you?'}]}], 'model': 'granite-3-1-8b-instruct', 'max_tokens': 4096, 'stream': False, 'temperature': 0.0, 'tools': None, 'tool_choice': 'auto'}, 'ctx': {'error': ValueError('When using `tool_choice`, `tools` must be set.')}}]", 'type': 'BadRequestError', 'param': None, 'code': 400} INFO: 2600:1700:9d20:ac0::49:59736 - "POST /v1/inference/chat-completion HTTP/1.1" 500 Internal Server Error 00:50:20.266 [END] /v1/inference/chat-completion [StatusCode.OK] (431.99ms) ``` ## Test Plan All existing tests pass. --------- Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-28 12:48:49 -05:00
Sébastien Han	6fa257b475	chore(lint): update Ruff ignores for project conventions and maintainability (#1184 ) - Added new ignores from flake8-bugbear (`B007`, `B008`) - Ignored `C901` (high function complexity) for now, pending review - Maintained PyTorch conventions (`N812`, `N817`) - Allowed `E731` (lambda assignments) for flexibility - Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`) - Documented rationale for each ignored rule This keeps our linting aligned with project needs while tracking potential fixes. Signed-off-by: Sébastien Han <seb@redhat.com> Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-28 09:36:49 -08:00
Reid	3b57d8ee88	feat: add prompt-format list (#1222 ) # What does this PR do? [Provide a short summary of what this PR does and why. Link to relevant issues if applicable.] `19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)` Based on the comment: `Only Llama 3.1 and 3.2 are supported`, even 3.1, 3.2 are not all models can show it with `prompt-format`, so cannot refer to `llama model list`, only refer to list when enter a invalid model, so it would be nice to help to check the valid models: ``` llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8 usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l] llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from -- Llama3.1-8B Llama3.1-70B Llama3.1-405B Llama3.1-8B-Instruct Llama3.1-70B-Instruct Llama3.1-405B-Instruct Llama3.2-1B Llama3.2-3B Llama3.2-1B-Instruct Llama3.2-3B-Instruct Llama3.2-11B-Vision Llama3.2-90B-Vision Llama3.2-11B-Vision-Instruct Llama3.2-90B-Vision-Instruct before: $ llama model prompt-format --help usage: llama model prompt-format [-h] [-m MODEL_NAME] Show llama model message formats options: -h, --help show this help message and exit -m MODEL_NAME, --model-name MODEL_NAME Model Family (llama3_1, llama3_X, etc.) Example: llama model prompt-format <options> after: $ llama model prompt-format --help usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l] Show llama model message formats options: -h, --help show this help message and exit -m MODEL_NAME, --model-name MODEL_NAME Model Family (llama3_1, llama3_X, etc.) -l, --list List the valid supported models Example: llama model prompt-format <options> $ llama model prompt-format -l ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Model ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ Llama3.1-8B │ ├──────────────────────────────┤ │ Llama3.1-70B │ ├──────────────────────────────┤ │ Llama3.1-405B │ ├──────────────────────────────┤ │ Llama3.1-8B-Instruct │ ├──────────────────────────────┤ │ Llama3.1-70B-Instruct │ ├──────────────────────────────┤ │ Llama3.1-405B-Instruct │ ├──────────────────────────────┤ │ Llama3.2-1B │ ├──────────────────────────────┤ │ Llama3.2-3B │ ├──────────────────────────────┤ │ Llama3.2-1B-Instruct │ ├──────────────────────────────┤ │ Llama3.2-3B-Instruct │ ├──────────────────────────────┤ │ Llama3.2-11B-Vision │ ├──────────────────────────────┤ │ Llama3.2-90B-Vision │ ├──────────────────────────────┤ │ Llama3.2-11B-Vision-Instruct │ ├──────────────────────────────┤ │ Llama3.2-90B-Vision-Instruct │ └──────────────────────────────┘ ``` [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) --------- Signed-off-by: reidliu <reid201711@gmail.com> Co-authored-by: reidliu <reid201711@gmail.com>	2025-02-28 09:27:22 -08:00
Dinesh Yeduguru	7f9b767277	fix: check conda env name using basepath in exec.py (#1301 ) # What does this PR do? check conda env name using basepath in exec.py The current logic for finding conda prefix does a `endswith` check with just the conda env name, but this will cause us to match incorrect if there is a different conda env which ends with same suffix. In my case, i had stack and llama-stack as the two conda envs. ## Test Plan llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml	2025-02-27 23:07:23 -08:00

... 3 4 5 6 7 ...

1038 commits