Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.
This is the first part:
- delete other kinds of memory banks (keyvalue, keyword, graph) for now;
we will introduce a keyvalue store API as part of this design but not
use it in the RAG tool yet.
- renaming of the APIs
# Context
For test automation, the end goal is to run a single pytest command from
root test directory (llama_stack/providers/tests/.) such that we execute
push-blocking tests
The work plan:
1) trigger pytest from llama_stack/providers/tests/.
2) use config file to determine what tests and parametrization we want
to run
# What does this PR do?
1) consolidates the "inference-models" / "embedding-model" /
"judge-model" ... options in root conftest.py. Without this change, we
will hit into error when trying to run `pytest
/Users/sxyi/llama-stack/llama_stack/providers/tests/.` because of
duplicated `addoptions` definitions across child conftest files.
2) Add a `config` option to specify test config in YAML. (see
[`ci_test_config.yaml`](https://gist.github.com/sixianyi0721/5b37fbce4069139445c2f06f6e42f87e)
for example config file)
For provider_fixtures, we allow users to use either a default fixture
combination or define their own {api:provider} combinations.
```
memory:
....
fixtures:
provider_fixtures:
- default_fixture_param_id: ollama // use default fixture combination with param_id="ollama" in [providers/tests/memory/conftest.py](https://fburl.com/mtjzwsmk)
- inference: sentence_transformers
memory: faiss
- default_fixture_param_id: chroma
```
3) generate tests according to the config. Logic lives in two places:
a) in `{api}/conftest.py::pytest_generate_tests`, we read from config to
do parametrization.
b) after test collection, in `pytest_collection_modifyitems`, we filter
the tests to include only functions listed in config.
## Test Plan
1) `pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.
--collect-only --config=ci_test_config.yaml`
Using `--collect-only` tag to print the pytests listed in the config
file (`ci_test_config.yaml`).
output:
[gist](https://gist.github.com/sixianyi0721/05145e60d4d085c17cfb304beeb1e60e)
2) sanity check on `--inference-model` option
```
pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR changes our API to follow more idiomatic REST API approaches of
having paths being resources and methods indicating the action being
performed.
Changes made to generator:
1) removed the prefix check of "get" as its not required and is actually
needed for other method types too
2) removed _ check on path since variables can have "_"
## Test Plan
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v
tests/client-sdk/agents/test_agents.py
# What does this PR do?
We are setting a default value of json for tool prompt format, which
conflicts with llama 3.2/3.3 models since they use python list. This PR
changes the defaults to None and in the code, we infer default based on
the model.
Addresses: #695
Tests:
❯ LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v
tests/client-sdk/inference/test_inference.py -k
"test_text_chat_completion"
pytest llama_stack/providers/tests/inference/test_prompt_adapter.py
# What does this PR do?
PR #639 introduced the notion of Tools API and ability to invoke tools
through API just as any resource. This PR changes the Agents to start
using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig
2) Agent gets the corresponding tool definitions for the specified tools
and pass along to the model
3) Attachements are now named as Documents and their behavior is mostly
unchanged from user perspective
4) You can specify args that can be injected to a tool call through
Agent config. This is especially useful in case of memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent
inject these as well into the tool call.
6) All tests have been migrated to use new tools API and fixtures
including client SDK tests
7) Telemetry just works with tools API because of our trace protocol
decorator
## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
pytest -s -v -k together llama_stack/providers/tests/tools/test_tools.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994
Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing
## What does this PR do?
This is a long-pending change and particularly important to get done
now.
Specifically:
- we cannot "localize" (aka download) any URLs from media attachments
anywhere near our modeling code. it must be done within llama-stack.
- `PIL.Image` is infesting all our APIs via `ImageMedia ->
InterleavedTextMedia` and that cannot be right at all. Anything in the
API surface must be "naturally serializable". We need a standard `{
type: "image", image_url: "<...>" }` which is more extensible
- `UserMessage`, `SystemMessage`, etc. are moved completely to
llama-stack from the llama-models repository.
See https://github.com/meta-llama/llama-models/pull/244 for the
corresponding PR in llama-models.
## Test Plan
```bash
cd llama_stack/providers/tests
pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py
pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py
pytest -s -v -k chroma memory/test_memory.py \
--env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar
pytest -s -v -k fireworks agents/test_agents.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
```
Updated the client sdk (see PR ...), installed the SDK in the same
environment and then ran the SDK tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py
LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py
# this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly
INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py
```
# What does this PR do?
Adds the sentence transformer provider and the `all-MiniLM-L6-v2`
embedding model to the default models to register in the run.yaml for
all providers.
## Test Plan
llama stack build --template together --image-type conda
llama stack run
~/.llama/distributions/llamastack-together/together-run.yaml
This PR does the following:
1) adds the ability to generate embeddings in all supported inference
providers.
2) Moves all the memory providers to use the inference API and improved
the memory tests to setup the inference stack correctly and use the
embedding models
This is a merge from #589 and #598
# What does this PR do?
This PR adds a new model type field to support embedding models to be
registered. Summary of changes:
1) Each registered model by default is an llm model.
2) User can specify an embedding model type, while registering.If
specified, the model bypass the llama model checks since embedding
models can by of any type and based on llama.
3) User needs to include the required embedding dimension in metadata.
This will be used by embedding generation to generate the requried size
of embeddings.
## Test Plan
This PR will go together will need to be merged with two follow up PRs
that will include test plans.
# What does this PR do?
1) Implement `unregister_dataset(dataset_id)` API in both llama stack
routing table and providers: It removes {dataset_id -> Dataset} mapping
from routing table and removes the dataset_id references in provider as
well (ex. for huggingface, we use a KV store to store the dataset id =>
dataset. we delete it during unregistering as well)
2) expose the datasets/unregister_dataset api endpoint
## Test Plan
**Unit test:**
`
pytest llama_stack/providers/tests/datasetio/test_datasetio.py -m
"huggingface" -v -s --tb=short --disable-warnings
`
**Test on endpoint:**
tested llama stack using an ollama distribution template:
1) start an ollama server
2) Start a llama stack server with the default ollama distribution
config + dataset/datasetsio APIs + datasetio provider
```
---- .../ollama-run.yaml
...
apis:
- agents
- inference
- memory
- safety
- telemetry
- datasetio
- datasets
providers:
datasetio:
- provider_id: localfs
provider_type: inline::localfs
config: {}
...
```
saw that the new API showed up in startup script
```
Serving API datasets
GET /alpha/datasets/get
GET /alpha/datasets/list
POST /alpha/datasets/register
POST /alpha/datasets/unregister
```
3) query `/alpha/datasets/unregister` through curl (since we have not implemented unregister api in llama stack client)
```
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets register
--dataset-id sixian --url
https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/chat.rst
--schema {}
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata ┃ type ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ sixian │ localfs │ {} │ dataset │
└────────────┴─────────────┴──────────┴─────────┘
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets register
--dataset-id sixian2 --url
https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/chat.rst
--schema {}
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata ┃ type ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ sixian │ localfs │ {} │ dataset │
│ sixian2 │ localfs │ {} │ dataset │
└────────────┴─────────────┴──────────┴─────────┘
(base) sxyi@sxyi-mbp llama-stack % curl
http://localhost:5001/alpha/datasets/unregister \
-H "Content-Type: application/json" \
-d '{"dataset_id": "sixian"}'
null%
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata ┃ type ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ sixian2 │ localfs │ {} │ dataset │
└────────────┴─────────────┴──────────┴─────────┘
(base) sxyi@sxyi-mbp llama-stack % curl
http://localhost:5001/alpha/datasets/unregister \
-H "Content-Type: application/json" \
-d '{"dataset_id": "sixian2"}'
null%
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
```
## Sources
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Remove a check which skips provider registration if a resource is
already in stack registry. Since we do not reconcile state with
provider, register should always call into provider's register endpoint.
## Test Plan
```
# stack run
╰─❯ llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml
#register memory bank
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0
Memory Bank Configuration:
{
│ 'memory_bank_type': 'vector',
│ 'chunk_size_in_tokens': 512,
│ 'embedding_model': 'all-MiniLM-L6-v2',
│ 'overlap_size_in_tokens': 64
}
#register again
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0
Memory Bank Configuration:
{
│ 'memory_bank_type': 'vector',
│ 'chunk_size_in_tokens': 512,
│ 'embedding_model': 'all-MiniLM-L6-v2',
│ 'overlap_size_in_tokens': 64
}
```
The semantics of an Update on resources is very tricky to reason about
especially for memory banks and models. The best way to go forward here
is for the user to unregister and register a new resource. We don't have
a compelling reason to support update APIs.
Tests:
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m
"chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m
"pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env
PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0
$CONDA_PREFIX/bin/pytest -v -s -m "ollama"
llama_stack/providers/tests/inference/test_model_registration.py
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
We are calling the initialize function on the registery in the common
routing table impl, which is incorrect as the common routing table is
the base class inherited by each resource's routing table. this change
moves remove that and add the initialize to the creation, where it inits
once server run.
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
This PR makes the following changes:
1) Fixes the get_all and initialize impl to actually read the values
returned from the range call to kvstore and not keys.
2) The start_key and end_key are fixed to correct perform the range
query after the key format changes
3) Made the cache registry thread safe since there are multiple
initializes called for each routing table.
Tests:
* Start stack
* Register dataset
* Kill stack
* Bring stack up
* dataset list
```
llama-stack-client datasets list
+--------------+---------------+---------------------------------------------------------------------------------+---------+
| identifier | provider_id | metadata | type |
+==============+===============+=================================================================================+=========+
| alpaca | huggingface-0 | {} | dataset |
+--------------+---------------+---------------------------------------------------------------------------------+---------+
| mmlu | huggingface-0 | {'path': 'llama-stack/evals', 'name': 'evals__mmlu__details', 'split': 'train'} | dataset |
+--------------+---------------+---------------------------------------------------------------------------------+---------+
```
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
# What does this PR do?
- API updates: change schema to dataset_schema for register_dataset for
resolving pydantic naming conflict
- Note: this OpenAPI update will be synced with
llama-stack-client-python SDK.
cc @dineshyv
## Test Plan
```
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
- `schema` should not a field w/ pydantic warnings
- change `schema` to `dataset_schema`
<img width="855" alt="image"
src="https://github.com/user-attachments/assets/47cb6bb9-4be0-46a5-8701-24d24e2eaabd">
## Test Plan
```
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR kills the notion of "pure passthrough" remote providers. You
cannot specify a single provider you must specify a whole distribution
(stack) as remote.
This PR also significantly fixes / upgrades testing infrastructure so
you can now test against a remotely hosted stack server by just doing
```bash
pytest -s -v -m remote test_agents.py \
--inference-model=Llama3.1-8B-Instruct --safety-shield=Llama-Guard-3-1B \
--env REMOTE_STACK_URL=http://localhost:5001
```
Also fixed `test_agents_persistence.py` (which was broken) and killed
some deprecated testing functions.
## Test Plan
All the tests.
This PR changes the way model id gets translated to the final model name
that gets passed through the provider.
Major changes include:
1) Providers are responsible for registering an object and as part of
the registration returning the object with the correct provider specific
name of the model provider_resource_id
2) To help with the common look ups different names a new ModelLookup
class is created.
Tested all inference providers including together, fireworks, vllm,
ollama, meta reference and bedrock
# What does this PR do?
This PR kills the notion of "ShieldType". The impetus for this is the
realization:
> Why is keyword llama-guard appearing so many times everywhere,
sometimes with hyphens, sometimes with underscores?
Now that we have a notion of "provider specific resource identifiers"
and "user specific aliases" for those and the fact that this works with
models ("Llama3.1-8B-Instruct" <> "fireworks/llama-3pv1-..."), we can
follow the same rules for Shields.
So each Safety provider can make up a notion of identifiers it has
registered. This already happens with Bedrock correctly. We just
generalize it for Llama Guard, Prompt Guard, etc.
For Llama Guard, we further simplify by just adopting the underlying
model name itself as the identifier! No confusion necessary.
While doing this, I noticed a bug in our DistributionRegistry where we
weren't scoping identifiers by type. Fixed.
## Feature/Issue validation/testing/test plan
Ran (inference, safety, memory, agents) tests with ollama and fireworks
providers.
# What does this PR do?
This is a follow-up to #425. That PR allows for specifying models in the
registry, but each entry needs to look like:
```yaml
- identifier: ...
provider_id: ...
provider_resource_identifier: ...
```
This is headache-inducing.
The current PR makes this situation better by adopting the shape of our
APIs. Namely, we need the user to only specify `model-id`. The rest
should be optional and figured out by the Stack. You can always override
it.
Here's what example `ollama` "full stack" registry looks like (we still
need to kill or simplify shield_type crap):
```yaml
models:
- model_id: Llama3.2-3B-Instruct
- model_id: Llama-Guard-3-1B
shields:
- shield_id: llama_guard
shield_type: llama_guard
```
## Test Plan
See test plan for #425. Re-ran it.
* migrate evals to resource
* remove listing of providers's evals
* change the order of params in register
* fix after rebase
* linter fix
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
* migrate dataset to resource
* remove auto discovery
* remove listing of providers's datasets
* fix after rebase
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
* init
* working bedrock tests
* bedrock test for inference fixes
* use env vars for bedrock guardrail vars
* add register in meta reference
* use correct shield impl in meta ref
* dont add together fixture
* right naming
* minor updates
* improved registration flow
* address feedback
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
* Significantly simpler and malleable test setup
* convert memory tests
* refactor fixtures and add support for composable fixtures
* Fix memory to use the newer fixture organization
* Get agents tests working
* Safety tests work
* yet another refactor to make this more general
now it accepts --inference-model, --safety-model options also
* get multiple providers working for meta-reference (for inference + safety)
* Add README.md
---------
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
* persist registered objects with distribution
* linter fixes
* comment
* use annotate and field discriminator
* workign tests
* donot use global state
* precommit failures fixed
* add back Any
* fix imports
* remove unnecessary changes in ollama
* precommit failures fixed
* make kvstore configurable for dist and rename registry
* add comment about registry list return
* fix linter errors
* use registry to hydrate
* remove debug print
* linter fixes
* remove kvstore.db
* rename distribution_registry_store
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
This PR makes several core changes to the developer experience surrounding Llama Stack.
Background: PR #92 introduced the notion of "routing" to the Llama Stack. It introduces three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider. So you can get model A to be inferenced locally while model B, C can be inference remotely (e.g.)
However, this had a few drawbacks:
you could not address the provider instances -- i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later.
the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model.
the terms "routing_table" and "routing_key" were exposed directly to the user. in my view, this is way too much overhead for a new user (which almost everyone is.) people come to the stack wanting to do ML and encounter a completely unexpected term.
What this PR does: This PR structures the run config with only a single prominent key:
- providers
Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models.
providers:
inference:
- provider_id: foo
provider_type: remote::tgi
config: { ... }
- provider_id: bar
provider_type: remote::tgi
config: { ... }
Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are being actually served can be requested by the user for inference. Otherwise, the Stack server will throw an error.
When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.)
The above examples shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs.
Registry: This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods
register_model
list_models
The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported.)
There are many other cleanups included some of which are detailed in a follow-up comment.
Test Plan:
First, start a TGI container with `meta-llama/Llama-Guard-3-8B` model
serving on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its
description for how.
Then run llama-stack with the following run config:
```
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
inference:
providers:
- remote::tgi
safety:
providers:
- meta-reference
telemetry:
provider_id: meta-reference
config: {}
routing_table:
inference:
- provider_id: remote::tgi
config:
url: http://localhost:5099
api_token: null
hf_endpoint_name: null
routing_key: Llama-Guard-3-8B
safety:
- provider_id: meta-reference
config:
llama_guard_shield:
model: Llama-Guard-3-8B
excluded_categories: []
disable_input_check: false
disable_output_check: false
prompt_guard_shield: null
routing_key: llama_guard
```
Now simply run `python -m llama_stack.apis.safety.client localhost
<port>` and check that the llama_guard shield calls run correctly. (The
injection_shield calls fail as expected since we have not set up a
router for them.)
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.
Most important bits:
* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.
* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
- routing model A to ollama and model B to a remote provider like Together
- routing shield A to local impl while shield B to a remote provider like Bedrock
- routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis
* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.