Commit graph

24 commits

Author SHA1 Message Date
m-misiura
87d209d6ef Squashed commit of the following:
commit a95d2b15b83057e194cf69e57a03deeeeeadd7c2
Author: m-misiura <mmisiura@redhat.com>
Date:   Mon Mar 24 14:33:50 2025 +0000

    🚧 working on the config file so that it is inheriting from pydantic base models

commit 0546379f817e37bca030247b48c72ce84899a766
Author: m-misiura <mmisiura@redhat.com>
Date:   Mon Mar 24 09:14:31 2025 +0000

    🚧 dealing with ruff checks

commit 8abe39ee4cb4b8fb77c7252342c4809fa6ddc432
Author: m-misiura <mmisiura@redhat.com>
Date:   Mon Mar 24 09:03:18 2025 +0000

    🚧 dealing with mypy errors in `base.py`

commit 045f833e79c9a25af3d46af6c8896da91a0e6e62
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 17:31:25 2025 +0000

    🚧 fixing mypy errors in content.py

commit a9c1ee4e92ad1b5db89039317555cd983edbde65
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 17:09:02 2025 +0000

    🚧 fixing mypy errors in chat.py

commit 69e8ddc2f8a4e13cecbab30272fd7d685d7864ec
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 16:57:28 2025 +0000

    🚧 fixing mypy errors

commit 56739d69a145c55335ac2859ecbe5b43d556e3b1
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 14:01:03 2025 +0000

    🚧 fixing mypy errors in `__init__.py`

commit 4d2e3b55c4102ed75d997c8189847bbc5524cb2c
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 12:58:06 2025 +0000

    🚧 ensuring routing_tables.py does not fail the CI

commit c0cc7b4b09ef50d5ec95fdb0a916c7ed228bf366
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 12:09:24 2025 +0000

    🐛 fixing linter problems

commit 115a50211b604feb4106275204fe7f863da865f6
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 11:47:04 2025 +0000

    🐛 fixing ruff errors

commit 29b5bfaabc77a35ea036b57f75fded711228dbbf
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 11:33:31 2025 +0000

    🎨 automatic ruff fixes

commit 7c5a334c7d4649c2fc297993f89791c1e5643e5b
Author: m-misiura <mmisiura@redhat.com>
Date:   Fri Mar 21 11:15:02 2025 +0000

    Squashed commit of the following:

    commit e671aae5bcd4ea57d601ee73c9e3adf5e223e830
    Merge: b0dd9a4f 9114bef4
    Author: Mac Misiura <82826099+m-misiura@users.noreply.github.com>
    Date:   Fri Mar 21 09:45:08 2025 +0000

        Merge branch 'meta-llama:main' into feat_fms_remote_safety_provider

    commit b0dd9a4f746b0c8c54d1189d381a7ff8e51c812c
    Author: m-misiura <mmisiura@redhat.com>
    Date:   Fri Mar 21 09:27:21 2025 +0000

        📝 updated `provider_id`

    commit 4c8906c1a4e960968b93251d09d5e5735db15026
    Author: m-misiura <mmisiura@redhat.com>
    Date:   Thu Mar 20 16:54:46 2025 +0000

        📝 renaming from `fms` to `trustyai_fms`

    commit 4c0b62abc51b02143b5c818f2d30e1a1fee9e4f3
    Merge: bb842d69 54035825
    Author: m-misiura <mmisiura@redhat.com>
    Date:   Thu Mar 20 16:35:52 2025 +0000

        Merge branch 'main' into feat_fms_remote_safety_provider

    commit bb842d69548df256927465792e0cd107a267d2a0
    Author: m-misiura <mmisiura@redhat.com>
    Date:   Wed Mar 19 15:03:17 2025 +0000

         added a better way of handling params from the configs

    commit 58b6beabf0994849ac50317ed00b748596e8961d
    Merge: a22cf36c 7c044845
    Author: m-misiura <mmisiura@redhat.com>
    Date:   Wed Mar 19 09:19:57 2025 +0000

        Merge main into feat_fms_remote_safety_provider, resolve conflicts by keeping main version

    commit a22cf36c8757f74ed656c1310a4be6b288bf923a
    Author: m-misiura <mmisiura@redhat.com>
    Date:   Wed Mar 5 16:17:46 2025 +0000

        🎉 added a new remote safety provider compatible with FMS Orchestrator API and Detectors API

        Signed-off-by: m-misiura <mmisiura@redhat.com>
2025-03-24 14:46:03 +00:00
Derek Higgins
00917ef5b2
fix: Add 'accelerate' dependency to 'prompt-guard' (#1724)
Required to start up a distribution with prompt guard

Closes: #1723

## Test Plan
distribution starts with patch applied

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-03-21 07:37:20 -07:00
cdgamarose-nv
252a487085
feat: added nvidia as safety provider (#1248)
# What does this PR do?
Adds nvidia as a safety provider by interfacing with the nemo guardrails
microservice.
This enables checking the user’s input or the LLM’s output against input and
output guardrails by using the `/v1/guardrails/checks` endpoint of the
[guardrails API](https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/guides/checks-guide.html).
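
A rough sketch of the flow, for orientation only: a shield check amounts to POSTing the conversation to the guardrails microservice and inspecting its verdict. The payload shape, field names, and URL below are assumptions for illustration, not the adapter's actual code or the NeMo schema.

```python
# Illustrative sketch only; payload shape and field names are assumptions,
# not the real NeMo guardrails schema or the adapter implementation.
import requests

GUARDRAILS_URL = "http://0.0.0.0:7331"  # assumed local guardrails microservice

def check_messages(model: str, messages: list[dict]) -> dict:
    """POST a conversation to the guardrails checks endpoint and return its verdict."""
    resp = requests.post(
        f"{GUARDRAILS_URL}/v1/guardrails/checks",
        json={"model": model, "messages": messages},  # assumed payload shape
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: ask whether a user message trips the input guardrails.
print(check_messages(
    "meta/llama-3.1-8b-instruct",
    [{"role": "user", "content": "you are stupid"}],
))
```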

## Test Plan
Deploy nemo guardrails service following the documentation:
https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/getting-started/deploy-docker.html

### Standalone:
```bash
(venv) local-cdgamarose@a1u1g-rome-0153:~/llama-stack$ pytest -v -s llama_stack/providers/tests/safety/test_safety.py --providers inference=nvidia,safety=nvidia --safety-shield meta/llama-3.1-8b-instruct

=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /localhome/local-cdgamarose/llama-stack/venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.12', 'Platform': 'Linux-5.15.0-122-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'html': '4.1.1'}}
rootdir: /localhome/local-cdgamarose/llama-stack
configfile: pyproject.toml
plugins: metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, html-4.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items

llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_shield_list[--inference=nvidia:safety=nvidia] Initializing NVIDIASafetyAdapter(http://0.0.0.0:7331)...
PASSED
llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_run_shield[--inference=nvidia:safety=nvidia] PASSED

============================================================================== 2 passed, 2 warnings in 4.78s ==============================================================================

```
### Distribution:
```
llama stack run llama_stack/templates/nvidia/run-with-safety.yaml
curl -v -X 'POST' "http://localhost:8321/v1/safety/run-shield" -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"shield_id": "meta/llama-3.1-8b-instruct", "messages":[{"role": "user", "content": "you are stupid"}]}'
{"violation":{"violation_level":"error","user_message":"Sorry I cannot do this.","metadata":{"self check input":{"status":"blocked"}}}}
```

[//]: # (## Documentation)

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-17 14:39:23 -07:00
Ashwin Bharambe
d072b5fa0c
test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
Xi Yan
3c72c034e6
[remove import *] clean up import *'s (#689)
# What does this PR do?

- as title, cleaning up `import *`'s
- upgrade tests to make them more robust to bad model outputs
- remove import *'s in llama_stack/apis/* (skip __init__ modules)
<img width="465" alt="image"
src="https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2"
/>

- run `sh run_openapi_generator.sh`, no types get affected

## Test Plan

### Providers Tests

**agents**
```
pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8
```

**inference**
```bash
# meta-reference
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

# together
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py 
```

**safety**
```
pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B
```

**memory**
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384
```

**scoring**
```
pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
```


**datasetio**
```
pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py
pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py
```


**eval**
```
pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py
```

### Client-SDK Tests
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```

### llama-stack-apps
```
PORT=5000
LOCALHOST=localhost

python -m examples.agents.hello $LOCALHOST $PORT
python -m examples.agents.inflation $LOCALHOST $PORT
python -m examples.agents.podcast_transcript $LOCALHOST $PORT
python -m examples.agents.rag_as_attachments $LOCALHOST $PORT
python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT
python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT
python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT

# Vision model
python -m examples.interior_design_assistant.app
python -m examples.agent_store.app $LOCALHOST $PORT
```

### CLI
```
which llama
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

llama stack build --template ollama --image-type conda
```

### Distributions Tests
**ollama**
```
llama stack build --template ollama --image-type conda
ollama run llama3.2:1b-instruct-fp16
llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
```

**fireworks**
```
llama stack build --template fireworks --image-type conda
llama stack run ./llama_stack/templates/fireworks/run.yaml
```

**together**
```
llama stack build --template together --image-type conda
llama stack run ./llama_stack/templates/together/run.yaml
```

**tgi**
```
llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-27 15:45:44 -08:00
Dalton Flanagan
b007b062f3
Fix llama stack build in 0.0.54 (#505)
# What does this PR do?

Safety provider `inline::meta-reference` is now deprecated. However, we 

* aren't checking / printing the deprecation message in `llama stack
build`
* make the deprecated (unusable) provider the default

So I (1) added checking and (2) made `inline::llama-guard` the default
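
In essence, the added check walks the selected provider specs and refuses to generate a run config when one carries a `deprecation_error`. A simplified sketch (the real code path is `llama_stack/cli/stack/build.py`, as the "After" traceback below shows):

```python
# Simplified sketch of the added deprecation check; the real implementation
# lives in llama_stack/cli/stack/build.py (see the "After" traceback below).
class InvalidProviderError(Exception):
    pass

def validate_providers(provider_specs):
    """Fail fast if any selected provider spec carries a deprecation_error."""
    for p in provider_specs:
        if getattr(p, "deprecation_error", None):
            raise InvalidProviderError(p.deprecation_error)
```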

## Test Plan

Before

```
Traceback (most recent call last):
  File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command
    self._run_stack_build_command_from_build_config(build_config)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 305, in _run_stack_build_command_from_build_config
    self._generate_run_config(build_config, build_dir)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 226, in _generate_run_config
    config_type = instantiate_class_type(
  File "/home/dalton/all/llama-stack/llama_stack/distribution/utils/dynamic.py", line 12, in instantiate_class_type
    module = importlib.import_module(module_name)
  File "/home/dalton/.conda/envs/nov22/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'llama_stack.providers.inline.safety.meta_reference'
```

After

```
Traceback (most recent call last):
  File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command
    self._run_stack_build_command_from_build_config(build_config)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 309, in _run_stack_build_command_from_build_config
    self._generate_run_config(build_config, build_dir)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 228, in _generate_run_config
    raise InvalidProviderError(p.deprecation_error)
llama_stack.distribution.resolver.InvalidProviderError: 
Provider `inline::meta-reference` for API `safety` does not work with the latest Llama Stack.
- if you are using Llama Guard v3, please use the `inline::llama-guard` provider instead.
- if you are using Prompt Guard, please use the `inline::prompt-guard` provider instead.
- if you are using Code Scanner, please use the `inline::code-scanner` provider instead.
```

<img width="469" alt="Screenshot 2024-11-22 at 4 10 24 PM"
src="https://github.com/user-attachments/assets/8c2e09fe-379a-4504-b246-7925f80a6ed6">

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-22 16:23:44 -05:00
Ashwin Bharambe
3d7561e55c
Rename all inline providers with an inline:: prefix (#423) 2024-11-11 22:19:16 -08:00
Ashwin Bharambe
c1f7ba3aed
Split safety into (llama-guard, prompt-guard, code-scanner) (#400)
Splits the meta-reference safety implementation into three distinct providers:

- inline::llama-guard
- inline::prompt-guard
- inline::code-scanner

Note that this PR is a backward incompatible change to the llama stack server. I have added deprecation_error field to ProviderSpec -- the server reads it and immediately barfs. This is used to direct the user with a specific message on what action to perform. An automagical "config upgrade" is a bit too much work to implement right now :/

(Note that we will be gradually prefixing all inline providers with inline:: -- I am only doing this for this set of new providers because otherwise existing configuration files will break even more badly.)
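
Conceptually, the new field looks something like the sketch below; `ProviderSpec` and `deprecation_error` are named in this change, while the surrounding structure is illustrative rather than the actual definition.

```python
# Conceptual sketch: ProviderSpec and deprecation_error are named in this change;
# everything else here is illustrative, not the actual definition.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProviderSpec:
    provider_type: str
    deprecation_error: Optional[str] = None  # when set, the server refuses to start

deprecated = ProviderSpec(
    provider_type="inline::meta-reference",
    deprecation_error=(
        "Provider `inline::meta-reference` for API `safety` does not work with the "
        "latest Llama Stack; use `inline::llama-guard`, `inline::prompt-guard`, or "
        "`inline::code-scanner` instead."
    ),
)
```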
2024-11-11 09:29:18 -08:00
Ashwin Bharambe
694c142b89
Add provider deprecation support; change directory structure (#397)
* Add provider deprecation support; change directory structure

* fix a couple dangling imports

* move the meta_reference safety dir also
2024-11-07 13:04:53 -08:00
Ashwin Bharambe
064d2a5287
Remove the safety adapter for Together; we can just use "meta-reference" (#387) 2024-11-06 17:36:57 -08:00
Ashwin Bharambe
994732e2e0
impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00
Ashwin Bharambe
6bb57e72a7
Remove "routing_table" and "routing_key" concepts for the user (#201)
This PR makes several core changes to the developer experience surrounding Llama Stack.

Background: PR #92 introduced the notion of "routing" to the Llama Stack. It introduces three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider, so you can, for example, have model A inferenced locally while models B and C are inferenced remotely.

However, this had a few drawbacks:

- you could not address the provider instances -- i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later.
- the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model.
- the terms "routing_table" and "routing_key" were exposed directly to the user. In my view, this is way too much overhead for a new user (which almost everyone is.) People come to the stack wanting to do ML and encounter a completely unexpected term.
What this PR does: This PR structures the run config with only a single prominent key:

- providers
Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models.

```yaml
providers:
  inference:
  - provider_id: foo
    provider_type: remote::tgi
    config: { ... }
  - provider_id: bar
    provider_type: remote::tgi
    config: { ... }
```
Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are being actually served can be requested by the user for inference. Otherwise, the Stack server will throw an error.

When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.)

The above example shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs.

Registry: This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods

- `register_model`
- `list_models`
The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported.)
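
A minimal sketch of that protocol shape; the exact signatures are assumptions for illustration and may differ from the real code.

```python
# Illustrative sketch of the ModelsProtocolPrivate idea; real signatures may differ.
from typing import List, Protocol


class Model:
    """Stand-in for a model registry entry."""
    def __init__(self, identifier: str, provider_id: str):
        self.identifier = identifier
        self.provider_id = provider_id


class ModelsProtocolPrivate(Protocol):
    async def register_model(self, model: Model) -> None:
        """Accept a model the stack wants served; raise if it cannot be supported."""
        ...

    async def list_models(self) -> List[Model]:
        """Report the models this provider is actually serving."""
        ...
```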

There are many other cleanups included some of which are detailed in a follow-up comment.
2024-10-10 10:24:13 -07:00
Ashwin Bharambe
fe4aabd690 provider_id => provider_type, adapter_id => adapter_type 2024-10-02 14:05:59 -07:00
Ashwin Bharambe
0a3999a9a4
Use inference APIs for executing Llama Guard (#121)
We should use Inference APIs to execute Llama Guard instead of directly needing to use HuggingFace modeling related code. The actual inference consideration is handled by Inference.
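
As a hedged sketch of the idea (class, method, and response field names below are assumptions, not the actual implementation): the shield delegates generation to the Inference API and only interprets the returned verdict.

```python
# Illustrative sketch: run Llama Guard through the Inference API instead of
# loading HuggingFace weights directly. Interfaces and field names are assumptions.
class LlamaGuardShield:
    def __init__(self, inference_api, model: str = "Llama-Guard-3-8B"):
        self.inference_api = inference_api
        self.model = model

    async def run(self, messages):
        # Whichever provider serves the model handles the actual generation.
        response = await self.inference_api.chat_completion(
            model=self.model,
            messages=messages,
        )
        verdict = response.completion_message.content.strip().lower()
        # Llama Guard replies "safe" or "unsafe" followed by violated categories.
        return {"is_violation": verdict.startswith("unsafe"), "raw": verdict}
```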
2024-09-28 15:40:06 -07:00
Xi Yan
82f420c4f0
fix safety using inference (#99) 2024-09-25 11:30:27 -07:00
Ashwin Bharambe
4fcda00872 Re-apply revert 2024-09-25 11:00:43 -07:00
Ashwin Bharambe
56aed59eb4
Support for Llama3.2 models and Swift SDK (#98) 2024-09-25 10:29:58 -07:00
Ashwin Bharambe
f45705cd10 Some lightweight cleanup and renaming for bedrock safety adapter 2024-09-24 19:29:56 -07:00
Ashwin Bharambe
a2465f3f9c Revert parts of 0d2eb3bd25 2024-09-24 19:20:51 -07:00
rsgrewal-aws
059e50b389
[aws-bedrock] Support for Bedrock Safety adapter (#96) 2024-09-24 19:16:55 -07:00
Yogish Baliga
b85d675c6f Adding safety adapter for Together 2024-09-24 18:35:48 -07:00
Ashwin Bharambe
0d2eb3bd25 Use inference APIs for running llama guard
Test Plan:

First, start a TGI container with `meta-llama/Llama-Guard-3-8B` model
serving on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its
description for how.

Then run llama-stack with the following run config:

```
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
  inference:
    providers:
    - remote::tgi
  safety:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::tgi
    config:
      url: http://localhost:5099
      api_token: null
      hf_endpoint_name: null
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield: null
    routing_key: llama_guard
```

Now simply run `python -m llama_stack.apis.safety.client localhost
<port>` and check that the llama_guard shield calls run correctly. (The
injection_shield calls fail as expected since we have not set up a
router for them.)
2024-09-24 17:02:57 -07:00
Ashwin Bharambe
ec4fc800cc
[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.
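
A hedged sketch of what that could look like from a client; the header name, endpoint path, and key below are hypothetical, built around the `x_llamastack_provider_data` parameter mentioned above.

```python
# Illustrative only: assumes provider data travels as a JSON-encoded HTTP header;
# the header name, endpoint path, and key below are hypothetical.
import json
import requests

provider_data = {"together_api_key": "<your key>"}  # hypothetical provider-specific field

resp = requests.post(
    "http://localhost:5000/inference/chat_completion",  # hypothetical endpoint path
    headers={"X-LlamaStack-ProviderData": json.dumps(provider_data)},  # assumed header name
    json={"model": "Llama3.1-8B-Instruct",
          "messages": [{"role": "user", "content": "hi"}]},
)
print(resp.status_code)
```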
2024-09-23 14:22:22 -07:00
Ashwin Bharambe
9487ad8294
API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Renamed from llama_toolchain/safety/providers.py