# What does this PR do?
This is the second attempt to switch to system packages by default. Now
with a hack to detect conda environment - in which case conda image-type
is used.
Note: Conda will only be used when --image-name is unset *and*
CONDA_DEFAULT_ENV is set. This means that users without conda will
correctly fall back to using system packages when no --image-* arguments
are passed at all.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Uses virtualenv:
```
$ llama stack build --template ollama --image-type venv
$ llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using virtual environment: /home/ec2-user/src/llama-stack/schedule/.local
[...]
```
Uses system packages (virtualenv already initialized):
```
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
INFO 2025-03-27 20:46:22,882 llama_stack.cli.stack.run:142 server: No image type or image name provided. Assuming environment packages.
[...]
```
Attempt to run from environment packages without necessary packages
installed:
```
$ python -m venv barebones
$ . ./barebones/bin/activate
$ pip install -e . # to install llama command
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
ModuleNotFoundError: No module named 'fastapi'
```
^ failed as expected because the environment doesn't have necessary
packages installed.
Now install some packages in the new environment:
```
$ pip install fastapi opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp aiosqlite ollama openai datasets faiss-cpu mcp autoevals
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
Now see if setting CONDA_DEFAULT_ENV will change what happens by
default:
```
$ export CONDA_DEFAULT_ENV=base
$ llama stack run ~/.llama/distributions/ollama/ollama-run.yaml
[...]
Using conda environment: base
Conda environment base does not exist.
[...]
```
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
* Fix location of `run.yaml` relative to the cloned llama stack
repository
* Drop `-it` from `docker run` commands as its not needed running
services
## Test Plan
* Verified running the llama stack following updated instruction
CC: @ashwinb
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
# What does this PR do?
Set a formatter for log file handler that does not pollute log messages
with color tags.
## Test Plan
Successfully tested with `LLAMA_STACK_LOG_FILE=server.log llama stack
run ...`
Fixes multiple issues
1. llama stack build of dependencies was breaking with incompatible
numpy / pandas when importing datasets
Moved the notebook to start a local server instead of using library as a
client. This way the setup is cleaner since its all contained and by
using `uv run --with` we can test both the server setup process too in
CI and release time.
2. The change to [1] surfaced some other issues
- running `llama stack run` was defaulting to conda env name
- provider data was not being managed properly
- Some notebook cells (telemetry for evals) were not updated with latest
changes
Fixed all the issues and update the notebook.
### Test
1. Manually run it all in local env
2. `pytest -v -s --nbval-lax docs/getting_started.ipynb`
# What does this PR do?
This is to stay consistent with other APIs.
This change registers files in API, even though there are still no
providers. Removing tests that require a provider existing for a merged
API to enable it in API layer.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack
post-training module. The integration enables users to fine-tune models
using NVIDIA's cloud-based customization service through a consistent
Llama Stack interface.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
Yet to be done
Things pending under this PR:
- [x] Integration of fine-tuned model(new checkpoint) for inference with
nvidia llm distribution
- [x] distribution integration of API
- [x] Add test cases for customizer(In Progress)
- [x] Documentation
```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py
============================================================================================================================================================================ test session starts =============================================================================================================================================================================
platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}}
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED [ 50%]
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED [100%]
======================================================================================================================================================================== 2 passed, 1 warning in 0.10s ========================================================================================================================================================================
```
cc: @mattf @dglogo @sumitb
---------
Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>
# What does this PR do?
This PR updates the sqlite-vec database calls to be non-blocking. Note
that each operation creates a new connection, which incurs some
performance overhead but is reasonable given [SQLite's threading and
connections constraints](https://www.sqlite.org/threadsafe.html).
Summary of changes:
- Refactored `SQLiteVecIndex` class to store database path instead of
connection object
- Added `_create_sqlite_connection()` helper function to create
connections on demand
- Ensured proper connection closure in all database operations
- Fixed test fixtures to use a file-based SQLite database for
thread-safety
- Updated the `SQLiteVecVectorIOAdapter` class to handle per-operation
connections
This PR helps chip away at
https://github.com/meta-llama/llama-stack/issues/1489
## Test Plan
sqlite-vec unit tests passed locally as well as a test script using the
client as a library.
## Misc
FYI @varshaprasad96 @kevincogan
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
**What**
Instead of adhoc creating a vectordb and chunking when documents ae sent
as an attachment to agent turn, we directly pass raw text from document
into messages to model for user context, and let model perform
summarization directly.
This removes the magic behaviour, and yields better performance than
existing approach.
**Improved Performance**
- RAG lifecycle notebook
- Model: 0.3 factuality score
- (+ websearch) Agent: 0.44 factuality score
- (+ vector db) Agent: 0.3 factuality score
- (+ raw context) Agent: 0.6 factuality score
Closes https://github.com/meta-llama/llama-stack/issues/1478
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
- [NEW] added section in RAG lifecycle notebook shows better performance
<img width="840" alt="image"
src="https://github.com/user-attachments/assets/a0c4e816-809a-41c0-9124-89825983e3f5"
/>
[//]: # (## Documentation)
# What does this PR do?
Gets rid of errors like the below, which is on all webmethod decorated
functions
llama_stack/apis/agents/agents.py:398: error: Value of type variable "T"
of function cannot be "Callable[[Agents, AgentConfig], Coroutine[Any,
Any, AgentCreateResponse]]" [type-var]
## Test Plan
Run mypy and observes mypy errors gone
# What does this PR do?
1) Uses otel compatible id generation for stack
2) Stack starts returning trace id info in the header of response
3) We inject the same trace id that we have into otel in order to force
it to use our trace ids.
## Test Plan
```
curl -i --request POST \
--url http://localhost:8321/v1/inference/chat-completion \
--header 'content-type: application/json' \
--data '{
"model_id": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [
{
"role": "user",
"content": {
"type": "text",
"text": "where do humans live"
}
}
],
"stream": false
}'
HTTP/1.1 200 OK
date: Fri, 21 Mar 2025 21:51:19 GMT
server: uvicorn
content-length: 1712
content-type: application/json
x-trace-id: 595101ede31ece116ebe35b26d67e8cf
{"metrics":[{"metric":"prompt_tokens","value":10,"unit":null},{"metric":"completion_tokens","value":320,"unit":null},{"metric":"total_tokens","value":330,"unit":null}],"completion_message":{"role":"assistant","content":"Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including tropical islands, island nations, and islands in the Arctic and Antarctic regions.\n6. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n7. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nIn terms of specific environments, humans live in a wide range of ecosystems, including:\n\n* Deserts\n* Forests\n* Grasslands\n* Mountains\n* Oceans\n* Rivers\n* Tundras\n* Wetlands\n\nOverall, humans are incredibly adaptable and can be found living in almost every corner of the globe.","stop_reason":"end_of_turn","tool_calls":[]},"logprobs":null}
```
Same trace id in Jaeger and sqlite:


# What does this PR do?
## Test Plan
LLAMA_STACK_CONFIG=dev pytest -s -v
tests/integration/agents/test_agents.py::test_custom_tool
--safety-shield meta-llama/Llama-Guard-3-8B --text-model
accounts/fireworks/models/llama-v3p1-8b-instruct
and verify trace in jaeger UI
https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#
# What does this PR do?
- We cannot directly return a literal type
> Note: this is not final jobs API change
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
<img width="837" alt="image"
src="https://github.com/user-attachments/assets/18a17561-35f9-443d-987d-54afdd6ff40c"
/>
[//]: # (## Documentation)
Required to startup a distribution with prompt guard
Closes: #1723
## Test Plan
distribution starts with patch applied
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
This is to avoid errors like the following when running inference
integration tests:
```
ERROR tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=8B-inference:completion:stop_sequence] - llama_stack.distribution.stack.EnvVarError: Environment variable 'VLLM_URL' not set or empty at providers.inference[0].config.url
```
It's also good to have a default, which is consistent with vLLM API
server.
## Test Plan
Integration tests can run without the error above.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
## What does this PR do?
fix the template to make it compatible with the latest dataset and eval
api change
## test
run `llama stack run
llama_stack/templates/experimental-post-training/run.yaml` and spin up
the llama stack server successfully
# What does this PR do?
Since we now start recording and exporting metrics, we no longer can use
single OTEL endpoint to export both traces and metrics. This PR adds two
sinks: OTEL_TRACE and OTEL_METRIC to be able to selectively enable the
exporters.
## Test Plan
Start server with OTEL_TRACE as sink and verify traces show up in jaeger

We need to be able to handle stale registry entries gracefully. More
needs to be done when we are deleting important attributes from
resources which could have been persisted. But at the very least, the
server cannot die.
## Test Plan
Added unit tests
# What does this PR do?
Tracing middleware should not start tracing for lifespan events.
Lifespan event happens at server startup and shutdown and if we start
tracing for them, we will have an active trace for the lifetime of the
server, which messes up with regular tracing since we always expect the
traces to be never nested.
We started hitting this issue since
https://github.com/meta-llama/llama-stack/pull/1495.
## Test Plan
* llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
* Verify in sqlite store that the trace now has non null span id

# What does this PR do?
This should fix https://github.com/meta-llama/llama-stack/issues/1716
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Clean up mypy violations for inline::{telemetry,tool_runtime,vector_io}.
This also makes API accept a tool call result without any content (like
RAG tool already may produce).
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Since Python 3.10, unions can be expressed as `type1 | type2`. Sadly,
while this is functionally equivalent to `Union[type1, type2]`, the type
of the expression is different (`types.UnionType`, not `typing.Union`).
We should handle both in schemas.
## Test Plan
Switch a schema type from Union to `|` and confirm the generator doesn't
crash with:
```
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 91, in <module>
fire.Fire(main)
File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/.cache/uv/archive-v0/FBgkcwcN-PaJ0NAur__7J/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/generate.py", line 55, in main
spec = Specification(
^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/utility.py", line 30, in __init__
self.document = generator.generate()
^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 782, in generate
operation = self._build_operation(op)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 648, in _build_operation
"application/json": builder.build_media_type(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 221, in build_media_type
schema = self.schema_builder.classdef_to_ref(item_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 135, in classdef_to_ref
type_schema = self.classdef_to_schema(typ)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/docs/openapi_generator/pyopenapi/generator.py", line 116, in classdef_to_schema
type_schema, type_definitions = self.schema_generator.classdef_to_schema(typ)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 607, in classdef_to_schema
types_defined[sub_name] = self._type_to_schema_with_lookup(sub_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 564, in _type_to_schema_with_lookup
type_schema = self.type_to_schema(data_type, force_expand=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 320, in type_to_schema
return self._type_to_schema(data_type, force_expand, json_schema_extra) | common_info
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 487, in _type_to_schema
property_docstrings = get_class_property_docstrings(typ, self.options.property_description_fun)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ihrachys/src/llama-stack/llama_stack/strong_typing/schema.py", line 94, in get_class_property_docstrings
for base in inspect.getmro(data_type):
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/w2wykgpkzidnnr6cpw8wf94ghb0p8big-python3-3.11.11/lib/python3.11/inspect.py", line 731, in getmro
return cls.__mro__
^^^^^^^^^^^
AttributeError: 'types.UnionType' object has no attribute '__mro__'. Did you mean: '__or__'?
```
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
closes#1584
This should be a rather innocuous change.
## Test Plan
Verify that there's no more tool call parsing error for example in issue
<img width="1216" alt="image"
src="https://github.com/user-attachments/assets/a5a6f4e8-2093-4ca2-bc06-794b707a0429"
/>
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/agents/test_agents.py --safety-shield
meta-llama/Llama-Guard-3-8B --text-model
meta-llama/Llama-3.1-8B-Instruct
This PR introduces a way to implement Attribute Based Access Control
(ABAC) for the Llama Stack server.
The rough design is:
- https://github.com/meta-llama/llama-stack/pull/1626 added a way for
the Llama Stack server to query an authenticator
- We build upon that and expect "access attributes" as part of the
response. These attributes indicate the scopes available for the
request.
- We use these attributes to perform access control for registered
resources as well as for constructing the default access control
policies for newly created resources.
- By default, if you support authentication but don't return access
attributes, we will add a unique namespace pointing to the API_KEY. That
way, all resources by default will be scoped to API_KEYs.
An important aspect of this design is that Llama Stack stays out of the
business of credential management or the CRUD for attributes. How you
manage your namespaces or projects is entirely up to you. The design
only implements access control checks for the metadata / book-keeping
information that the Stack tracks.
### Limitations
- Currently, read vs. write vs. admin permissions aren't made explicit,
but this can be easily extended by adding appropriate attributes to the
`AccessAttributes` data structure.
- This design does not apply to agent instances since they are not
considered resources the Stack knows about. Agent instances are
completely within the scope of the Agents API provider.
### Test Plan
Added unit tests, existing integration tests
# What does this PR do?
Don't set type variables from register_schema().
`mypy` is not happy about it since type variables are calculated at
runtime and hence the typing hints are not available during static
analysis.
Good news is there is no good reason to set the variables from the
return type.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
with the new /v1/providers API, /v1/inspect/providers is duplicative,
deprecate it by removing the route, and add a test for the full
/v1/providers API
resolves#1623
## Test Plan
`uv run pytest -v tests/integration/providers --stack-config=ollama
--text-model="meta-llama/Llama-3.2-3B-Instruct"
--embedding-model=all-MiniLM-L6-v2`
<img width="1512" alt="Screenshot 2025-03-18 at 9 18 38 AM"
src="https://github.com/user-attachments/assets/2db30f25-3ff6-4374-b39d-0047f093fe36"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
currently if you have a run yaml without temeletry the following error
is hit:
TypeError: TelemetryAdapter.__init__() missing 1 required positional
argument: 'deps'
this is because the TelemetryAdapter requires a deps arg to be passed.
Pass {} to avoid errors.
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
In this PR, we added a new eval open benchmark IfEval based on paper
https://arxiv.org/abs/2311.07911 to measure the model capability of
instruction following.
## Test Plan
spin up a llama stack server with open-benchmark template
run `llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on client side and
get the eval aggregate results
# What does this PR do?
This PR updates `build_container.sh` to prevent an "unknown flag" error
when using the `BUILD_PLATFORM` environment variable during `llama stack
build`.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
Closes#1699
## Test Plan
Running the following code with out these changes results in an "unknown
flag" error.
```
CONTAINER_BINARY=podman BUILD_PLATFORM=linux/amd64 llama stack build --template ollama --image-type container
```
With these changes, the same command should build the image correctly.
Signed-off-by: Michael Clifford <mcliffor@redhat.com>
# What does this PR do?
DocVQA asks model to look a a picture, then answer a question given in
text, with a text answer by text information in the picture. these
questions often require understanding of relative positions of texts
within the picture.
original dataset is defined in the "Task1" of
https://www.docvqa.org/datasets
## Test Plan
setup llama server with
```
llama stack run ./llama_stack/templates/open-benchmark/run.yaml
```
then send traffic:
```
llama-stack-client eval run-benchmark "meta-reference-docvqa" --model-id meta-llama/Llama-3.3-70B-Instruct --output-dir /tmp/gpqa --num-examples 200
```
# What does this PR do?
FAILED
tests/integration/tools/test_tools.py::test_toolsgroups_unregister[None]
- AttributeError: 'coroutine' object has no attribute 'data'
## Test Plan
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/tools/test_tools.py
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1704).
* #1705
* __->__ #1704
## What does this PR do?
open-benchmark templated is broken after the datasets api refactor due
to 2 reasons
- provider_id and provider_resource_id are no longer needed
- the type in run.yaml will be resolved as dict
this PR is to fix the above 2 issues
## Test
spin up a llama stack server successfully with llama stack run
`llama_stack/templates/open-benchmark/run.yaml`
These block on io reads which in turn block the
server. Move them to their own thread.
Closes: #1697
# What does this PR do?
To avoid blocking the main eventloop, updates datasetio/localfs to load
data in a seperate thread
Signed-off-by: Derek Higgins <derekh@redhat.com>
### What does this PR do?
Currently, `ToolCall.arguments` is a `Dict[str, RecursiveType]`.
However, on the client SDK side -- the `RecursiveType` gets deserialized
into a number ( both int and float get collapsed ) and hence when params
are `int` they get converted to float which might break client side
tools that might be doing type checking.
Closes: https://github.com/meta-llama/llama-stack/issues/1683
### Test Plan
Stainless changes --
https://github.com/meta-llama/llama-stack-client-python/pull/204
```
pytest -s -v --stack-config=fireworks tests/integration/agents/test_agents.py --text-model meta-llama/Llama-3.1-8B-Instruct
```
This PR adds support (or is a proposal for) for supporting API KEY
authentication on the Llama Stack server end. `llama-stack-client`
already supports accepting an api_key parameter and passes it down
through every request as an `Authentication: ` header.
Currently, Llama Stack does not propose APIs for handling authentication
or authorization for resources of any kind. Given that, and the fact
that any deployment will typically have _some_ authentication system
present, we simply adopt a delegation mechanism: delegate to an HTTPS
endpoint performing key management / authentication.
It is configured via:
```yaml
server:
auth:
endpoint: <...>
```
in the run.yaml configuration.
## How It Works
When authentication is enabled:
1. Every API request must include an `Authorization: Bearer <token>`
header
2. The server will send a _POST_ validation request to the configured
endpoint with the following payload:
```json
{
"api_key": "<token>",
"request": {
"path": "/api/path",
"headers": { "header1": "value1", ... },
"params": { "param1": "value1", ... }
}
}
```
3. If the authentication endpoint returns a 200 status code, the request
is allowed to proceed
4. If the authentication endpoint returns any other status code, a 401
Unauthorized response is returned
## Test Plan
Unit tests