Commit graph

67 commits

Author SHA1 Message Date
Ihar Hrachyshka
9e6561a1ec
chore: enable pyupgrade fixes (#1806)
# What does this PR do?

The goal of this PR is code base modernization.

Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)

Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-01 14:23:50 -07:00
Rashmi Pawar
e6bbf8d20b
feat: Add NVIDIA NeMo datastore (#1852)
# What does this PR do?
Implemetation of NeMO Datastore register, unregister API.

Open Issues: 
- provider_id gets set to `localfs` in client.datasets.register() as it
is specified in routing_tables.py: DatasetsRoutingTable
see: #1860

Currently I have passed `"provider_id":"nvidia"` in metadata and have
parsed that in `DatasetsRoutingTable`
(Not the best approach, but just a quick workaround to make it work for
now.)

## Test Plan
- Unit test cases: `pytest
tests/unit/providers/nvidia/test_datastore.py`
```bash
========================================================== test session starts ===========================================================
platform linux -- Python 3.10.0, pytest-8.3.5, pluggy-1.5.0
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.26.0, nbval-0.11.0, metadata-3.1.1, html-4.1.1, cov-6.1.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                        

tests/unit/providers/nvidia/test_datastore.py ..                                                                                   [100%]

============================================================ warnings summary ============================================================

====================================================== 2 passed, 1 warning in 0.84s ======================================================
```

cc: @dglogo, @mattf, @yanxi0830
2025-04-28 09:41:59 -07:00
Ben Browning
2b2db5fbda
feat: OpenAI-Compatible models, completions, chat/completions (#1894)
# What does this PR do?

This stubs in some OpenAI server-side compatibility with three new
endpoints:

/v1/openai/v1/models
/v1/openai/v1/completions
/v1/openai/v1/chat/completions

This gives common inference apps using OpenAI clients the ability to
talk to Llama Stack using an endpoint like
http://localhost:8321/v1/openai/v1 .

The two "v1" instances in there isn't awesome, but the thinking is that
Llama Stack's API is v1 and then our OpenAI compatibility layer is
compatible with OpenAI V1. And, some OpenAI clients implicitly assume
the URL ends with "v1", so this gives maximum compatibility.

The openai models endpoint is implemented in the routing layer, and just
returns all the models Llama Stack knows about.

The following providers should be working with the new OpenAI
completions and chat/completions API:
* remote::anthropic (untested)
* remote::cerebras-openai-compat (untested)
* remote::fireworks (tested)
* remote::fireworks-openai-compat (untested)
* remote::gemini (untested)
* remote::groq-openai-compat (untested)
* remote::nvidia (tested)
* remote::ollama (tested)
* remote::openai (untested)
* remote::passthrough (untested)
* remote::sambanova-openai-compat (untested)
* remote::together (tested)
* remote::together-openai-compat (untested)
* remote::vllm (tested)

The goal to support this for every inference provider - proxying
directly to the provider's OpenAI endpoint for OpenAI-compatible
providers. For providers that don't have an OpenAI-compatible API, we'll
add a mixin to translate incoming OpenAI requests to Llama Stack
inference requests and translate the Llama Stack inference responses to
OpenAI responses.

This is related to #1817 but is a bit larger in scope than just chat
completions, as I have real use-cases that need the older completions
API as well.

## Test Plan

### vLLM

```
VLLM_URL="http://localhost:8000/v1" INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" llama stack build --template remote-vllm --image-type venv --run

LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "meta-llama/Llama-3.2-3B-Instruct"
```

### ollama
```
INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" llama stack build --template ollama --image-type venv --run

LLAMA_STACK_CONFIG=http://localhost:8321 INFERENCE_MODEL="llama3.2:3b-instruct-q8_0" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-q8_0"
```



## Documentation

Run a Llama Stack distribution that uses one of the providers mentioned
in the list above. Then, use your favorite OpenAI client to send
completion or chat completion requests with the base_url set to
http://localhost:8321/v1/openai/v1 . Replace "localhost:8321" with the
host and port of your Llama Stack server, if different.

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-11 13:14:17 -07:00
Paolo Dettori
22814299b0
fix: solve unregister_toolgroup error (#1608)
# What does this PR do?
Fixes issue #1537 that causes "500 Internal Server Error" when
unregistering a toolgroup

# (Closes #1537 )

## Test Plan

```console
$ pytest -s -v tests/integration/tool_runtime/test_registration.py --stack-config=ollama --env INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
INFO     2025-03-14 21:15:03,999 tests.integration.conftest:41 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS          
/opt/homebrew/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
===================================================== test session starts =====================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /opt/homebrew/opt/python@3.10/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/paolo/Projects/aiplatform/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.25.3, anyio-4.8.0
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 1 item                                                                                                              

tests/integration/tool_runtime/test_registration.py::test_register_and_unregister_toolgroup[None-None-None-None-None] INFO     2025-03-14 21:15:04,478 llama_stack.providers.remote.inference.ollama.ollama:75 inference: checking            
         connectivity to Ollama at `http://localhost:11434`...                                                          
INFO     2025-03-14 21:15:05,350 llama_stack.providers.remote.inference.ollama.ollama:294 inference: Pulling embedding  
         model `all-minilm:latest` if necessary...                                                                      
INFO:     Started server process [78391]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     127.0.0.1:57424 - "GET /sse HTTP/1.1" 200 OK
INFO:     127.0.0.1:57434 - "GET /sse HTTP/1.1" 200 OK
INFO     2025-03-14 21:15:16,129 mcp.client.sse:51 uncategorized: Connecting to SSE endpoint: http://localhost:8000/sse 
INFO:     127.0.0.1:57445 - "GET /sse HTTP/1.1" 200 OK
INFO     2025-03-14 21:15:16,146 mcp.client.sse:71 uncategorized: Received endpoint URL:                                
         http://localhost:8000/messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b                                    
INFO     2025-03-14 21:15:16,147 mcp.client.sse:140 uncategorized: Starting post writer with endpoint URL:              
         http://localhost:8000/messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b                                    
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO:     127.0.0.1:57447 - "POST /messages/?session_id=c5b6fc01f8dc4b5e80e38eb1c1b22a9b HTTP/1.1" 202 Accepted
INFO     2025-03-14 21:15:16,155 mcp.server.lowlevel.server:535 uncategorized: Processing request of type               
         ListToolsRequest                                                                                               
PASSED

=============================================== 1 passed, 4 warnings in 12.17s ================================================
```

---------

Signed-off-by: Paolo Dettori <dettori@us.ibm.com>
2025-04-09 10:56:07 +02:00
Ihar Hrachyshka
0a895c70d1
fix(api): don't return list for runtime tools (#1686)
# What does this PR do?

Don't return list for runtime tools. Instead return Response object for
pagination and consistency with other APIs.

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-04-01 09:53:11 +02:00
Ashwin Bharambe
03b5c61bfc
feat: make sure agent sessions are under access control (#1737)
This builds on top of #1703.

Agent sessions are now properly access controlled.

## Test Plan

Added unit tests
2025-03-21 07:31:16 -07:00
Ashwin Bharambe
01a25d9744
feat(server): add attribute based access control for resources (#1703)
This PR introduces a way to implement Attribute Based Access Control
(ABAC) for the Llama Stack server.

The rough design is:
- https://github.com/meta-llama/llama-stack/pull/1626 added a way for
the Llama Stack server to query an authenticator
- We build upon that and expect "access attributes" as part of the
response. These attributes indicate the scopes available for the
request.
- We use these attributes to perform access control for registered
resources as well as for constructing the default access control
policies for newly created resources.
- By default, if you support authentication but don't return access
attributes, we will add a unique namespace pointing to the API_KEY. That
way, all resources by default will be scoped to API_KEYs.

An important aspect of this design is that Llama Stack stays out of the
business of credential management or the CRUD for attributes. How you
manage your namespaces or projects is entirely up to you. The design
only implements access control checks for the metadata / book-keeping
information that the Stack tracks.

### Limitations

- Currently, read vs. write vs. admin permissions aren't made explicit,
but this can be easily extended by adding appropriate attributes to the
`AccessAttributes` data structure.
- This design does not apply to agent instances since they are not
considered resources the Stack knows about. Agent instances are
completely within the scope of the Agents API provider.

### Test Plan

Added unit tests, existing integration tests
2025-03-19 21:28:52 -07:00
ehhuang
1902e5754c
fix: toolgroups unregister (#1704)
# What does this PR do?
FAILED
tests/integration/tools/test_tools.py::test_toolsgroups_unregister[None]
- AttributeError: 'coroutine' object has no attribute 'data'

## Test Plan
LLAMA_STACK_CONFIG=fireworks pytest -s -v
tests/integration/tools/test_tools.py
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1704).
* #1705
* __->__ #1704
2025-03-19 13:43:51 -07:00
Botao Chen
ab777ef5cd
fix: fix open-benchmark template (#1695)
## What does this PR do?
open-benchmark templated is broken after the datasets api refactor due
to 2 reasons
- provider_id and provider_resource_id are no longer needed 
- the type in run.yaml will be resolved as dict

this PR is to fix the above 2 issues 

## Test 
spin up a llama stack server successfully with llama stack run
`llama_stack/templates/open-benchmark/run.yaml`
2025-03-19 11:27:11 -07:00
Sébastien Han
c029fbcd13
fix: return 4xx for non-existent resources in GET requests (#1635)
# What does this PR do?

- Removed Optional return types for GET methods
- Raised ValueError when requested resource is not found
- Ensures proper 4xx response for missing resources
- Updated the API generator to check for wrong signatures

```
$ uv run --with ".[dev]" ./docs/openapi_generator/run_openapi_generator.sh
Validating API method return types...

API Method Return Type Validation Errors:

Method ScoringFunctions.get_scoring_function returns Optional type
```

Closes: https://github.com/meta-llama/llama-stack/issues/1630

## Test Plan

Run the server then:

```
curl http://127.0.0.1:8321/v1/models/foo     
{"detail":"Invalid value: Model 'foo' not found"}%  
```

Server log:

```
INFO:     127.0.0.1:52307 - "GET /v1/models/foo HTTP/1.1" 400 Bad Request
09:51:42.654 [END] /v1/models/foo [StatusCode.OK] (134.65ms)
 09:51:42.651 [ERROR] Error executing endpoint route='/v1/models/{model_id:path}' method='get'
Traceback (most recent call last):
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 193, in endpoint
    return await maybe_await(value)
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/server/server.py", line 156, in maybe_await
    return await value
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/routers/routing_tables.py", line 217, in get_model
    raise ValueError(f"Model '{model_id}' not found")
ValueError: Model 'foo' not found
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-18 14:06:53 -07:00
Xi Yan
5287b437ae
feat(api): (1/n) datasets api clean up (#1573)
## PR Stack
- https://github.com/meta-llama/llama-stack/pull/1573
- https://github.com/meta-llama/llama-stack/pull/1625
- https://github.com/meta-llama/llama-stack/pull/1656
- https://github.com/meta-llama/llama-stack/pull/1657
- https://github.com/meta-llama/llama-stack/pull/1658
- https://github.com/meta-llama/llama-stack/pull/1659
- https://github.com/meta-llama/llama-stack/pull/1660

**Client SDK**
- https://github.com/meta-llama/llama-stack-client-python/pull/203

**CI**
- 1391130488
<img width="1042" alt="image"
src="https://github.com/user-attachments/assets/69636067-376d-436b-9204-896e2dd490ca"
/>
-- the test_rag_agent_with_attachments is flaky and not related to this
PR

## Doc
<img width="789" alt="image"
src="https://github.com/user-attachments/assets/b88390f3-73d6-4483-b09a-a192064e32d9"
/>


## Client Usage
```python
client.datasets.register(
    source={
        "type": "uri",
        "uri": "lsfs://mydata.jsonl",
    },
    schema="jsonl_messages",
    # optional 
    dataset_id="my_first_train_data"
)

# quick prototype debugging
client.datasets.register(
    data_reference={
        "type": "rows",
        "rows": [
                "messages": [...],
        ],
    },
    schema="jsonl_messages",
)
```

## Test Plan
- CI:
1387805545

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py
```

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py
```

```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
2025-03-17 16:55:45 -07:00
Daniele Martinoli
fb998683e0
fix: Agent uses the first configured vector_db_id when documents are provided (#1276)
# What does this PR do?
The agent API allows to query multiple DBs using the `vector_db_ids`
argument of the `rag` tool:
```py
        toolgroups=[
            {
                "name": "builtin::rag",
                "args": {"vector_db_ids": [vector_db_id]},
            }
        ],
```
This means that multiple DBs can be used to compose an aggregated
context by executing the query on each of them.

When documents are passed to the next agent turn, there is no explicit
way to configure the vector DB where the embeddings will be ingested. In
such cases, we can assume that:
- if any `vector_db_ids` is given, we use the first one (it probably
makes sense to assume that it's the only one in the list, otherwise we
should loop on all the given DBs to have a consistent ingestion)
- if no `vector_db_ids` is given, we can use the current logic to
generate a default DB using the default provider. If multiple providers
are defined, the API will fail as expected: the user has to provide
details on where to ingest the documents.

(Closes #1270)

## Test Plan
The issue description details how to replicate the problem.

[//]: # (## Documentation)

---------

Signed-off-by: Daniele Martinoli <dmartino@redhat.com>
2025-03-04 21:44:13 -08:00
Ashwin Bharambe
abfbaf3c1b
refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)
All of the tests from `llama_stack/providers/tests/` are now moved to
`tests/integration`.

I converted the `tools`, `scoring` and `datasetio` tests to use API.
However, `eval` and `post_training` proved to be a bit challenging to
leaving those. I think `post_training` should be relatively
straightforward also.

As part of this, I noticed that `wolfram_alpha` tool wasn't added to
some of our commonly used distros so I added it. I am going to remove a
lot of code duplication from distros next so while this looks like a
one-off right now, it will go away and be there uniformly for all
distros.
2025-03-04 14:53:47 -08:00
Ashwin Bharambe
4c8a0fa8dc fix: ensure ollama embedding model is registered properly in the template 2025-02-27 22:49:06 -08:00
Xi Yan
ea1faae50e
chore!: deprecate eval/tasks (#1186)
# What does this PR do?
- Fully deprecate eval/tasks

[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1088 

NOTE: this will be a breaking change. We have introduced the new API in
0.1.3 .

Notebook has been updated to use the new endpoints.

## Test Plan
```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb 
```
<img width="611" alt="image"
src="https://github.com/user-attachments/assets/79f6efe1-81ba-494e-bf36-1fc0c2b9bc6f"
/>



cc @SLR722  for awareness

[//]: # (## Documentation)
2025-02-20 14:06:21 -08:00
Xi Yan
8b655e3cd2
fix!: update eval-tasks -> benchmarks (#1032)
# What does this PR do?

- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove differentiation between `app` v.s. `benchmark` eval task
config. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and do not add any value. Backward compatibility is being
kept as the "type" is not being used anywhere.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- This change is backward compatible 
- Run notebook test with

```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```

<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>



[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 16:40:58 -08:00
Sébastien Han
418645696a
fix: improve signal handling and update dependencies (#1044)
# What does this PR do?
This commit enhances the signal handling mechanism in the server by
improving the `handle_signal` (previously handle_sigint) function. It
now properly retrieves the signal name, ensuring clearer logging when a
termination signal is received. Additionally, it cancels all running
tasks and waits for their completion before stopping the event loop,
allowing for a more graceful shutdown. Support for handling
SIGTERM has also been added alongside SIGINT.

Before the changes, handle_sigint used asyncio.run(run_shutdown()).
However, asyncio.run() is meant to start a new event loop, and calling
it inside an existing one (like when running Uvicorn) raises an error.
The fix replaces asyncio.run(run_shutdown()) with an async function
scheduled on the existing loop using loop.create_task(shutdown()). This
ensures that the shutdown coroutine runs within the current event loop
instead of trying to create a new one.

Furthermore, this commit updates the project dependencies. `fastapi` and
`uvicorn` have been added to the development dependencies in
`pyproject.toml` and `uv.lock`, ensuring that the necessary packages are
available for development and execution.

Closes: https://github.com/meta-llama/llama-stack/issues/1043
Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run a server and send SIGINT:

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
  db_path: /Users/leseb/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-3B-Instruct
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - llm
  provider_id: ollama
  provider_model_id: null
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio:
  - config: {}
    provider_id: huggingface
    provider_type: remote::huggingface
  - config: {}
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config:
      service_name: llama-stack
      sinks: console,sqlite
      sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: code-interpreter
    provider_type: inline::code-interpreter
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config:
      kvstore:
        db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
scoring_fns: []
server:
  port: 8321
  tls_certfile: null
  tls_keyfile: null
shields: []
tool_groups:
- args: null
  mcp_endpoint: null
  provider_id: tavily-search
  toolgroup_id: builtin::websearch
- args: null
  mcp_endpoint: null
  provider_id: rag-runtime
  toolgroup_id: builtin::rag
- args: null
  mcp_endpoint: null
  provider_id: code-interpreter
  toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'

INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-inference => ollama
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-inference => sentence-transformers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  models => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inference => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-vector_io => faiss
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-safety => llama-guard
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  shields => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  safety => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  vector_dbs => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  vector_io => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => brave-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => tavily-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => code-interpreter
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-tool_runtime => rag-runtime
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  tool_groups => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  tool_runtime => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  agents => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-datasetio => huggingface
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-datasetio => localfs
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  datasets => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  datasetio => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  telemetry => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => basic
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => llm-as-judge
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-scoring => braintrust
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  scoring_functions => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  scoring => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inner-eval => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  eval_tasks => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  eval => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215:  inspect => __builtin__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216: 
INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`...
INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss.
INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss.
INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes.
Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available.
INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2...
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106: 
Serving API eval
 POST /v1/eval/tasks/{task_id}/evaluations
 DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}
 POST /v1/eval/tasks/{task_id}/jobs
Serving API agents
 POST /v1/agents
 POST /v1/agents/{agent_id}/session
 POST /v1/agents/{agent_id}/session/{session_id}/turn
 DELETE /v1/agents/{agent_id}
 DELETE /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API scoring_functions
 GET /v1/scoring-functions/{scoring_fn_id}
 GET /v1/scoring-functions
 POST /v1/scoring-functions
Serving API safety
 POST /v1/safety/run-shield
Serving API inspect
 GET /v1/health
 GET /v1/inspect/providers
 GET /v1/inspect/routes
 GET /v1/version
Serving API tool_runtime
 POST /v1/tool-runtime/invoke
 GET /v1/tool-runtime/list-tools
 POST /v1/tool-runtime/rag-tool/insert
 POST /v1/tool-runtime/rag-tool/query
Serving API datasetio
 POST /v1/datasetio/rows
 GET /v1/datasetio/rows
Serving API shields
 GET /v1/shields/{identifier}
 GET /v1/shields
 POST /v1/shields
Serving API eval_tasks
 GET /v1/eval-tasks/{eval_task_id}
 GET /v1/eval-tasks
 POST /v1/eval-tasks
Serving API models
 GET /v1/models/{model_id}
 GET /v1/models
 POST /v1/models
 DELETE /v1/models/{model_id}
Serving API datasets
 GET /v1/datasets/{dataset_id}
 GET /v1/datasets
 POST /v1/datasets
 DELETE /v1/datasets/{dataset_id}
Serving API vector_io
 POST /v1/vector-io/insert
 POST /v1/vector-io/query
Serving API inference
 POST /v1/inference/chat-completion
 POST /v1/inference/completion
 POST /v1/inference/embeddings
Serving API tool_groups
 GET /v1/tools/{tool_name}
 GET /v1/toolgroups/{toolgroup_id}
 GET /v1/toolgroups
 GET /v1/tools
 POST /v1/toolgroups
 DELETE /v1/toolgroups/{toolgroup_id}
Serving API vector_dbs
 GET /v1/vector-dbs/{vector_db_id}
 GET /v1/vector-dbs
 POST /v1/vector-dbs
 DELETE /v1/vector-dbs/{vector_db_id}
Serving API scoring
 POST /v1/scoring/score
 POST /v1/scoring/score-batch
Serving API telemetry
 GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
 GET /v1/telemetry/spans/{span_id}/tree
 GET /v1/telemetry/traces/{trace_id}
 POST /v1/telemetry/events
 GET /v1/telemetry/spans
 GET /v1/telemetry/traces
 POST /v1/telemetry/spans/export

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [65372]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
^CINFO:     Shutting down
INFO:     Finished server process [65372]
Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl
```

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 08:07:59 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Ashwin Bharambe
3ae8585b65
[memory refactor][1/n] Rename Memory -> VectorIO, MemoryBanks -> VectorDBs (#828)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

This is the first part:

- delete other kinds of memory banks (keyvalue, keyword, graph) for now;
we will introduce a keyvalue store API as part of this design but not
use it in the RAG tool yet.
- renaming of the APIs
2025-01-22 09:59:30 -08:00
Sixian Yi
c79b087552
[test automation] support run tests on config file (#730)
# Context
For test automation, the end goal is to run a single pytest command from
root test directory (llama_stack/providers/tests/.) such that we execute
push-blocking tests

The work plan: 
1) trigger pytest from llama_stack/providers/tests/.
2) use config file to determine what tests and parametrization we want
to run

# What does this PR do?
1) consolidates the "inference-models" / "embedding-model" /
"judge-model" ... options in root conftest.py. Without this change, we
will hit into error when trying to run `pytest
/Users/sxyi/llama-stack/llama_stack/providers/tests/.` because of
duplicated `addoptions` definitions across child conftest files.

2) Add a `config` option to specify test config in YAML. (see
[`ci_test_config.yaml`](https://gist.github.com/sixianyi0721/5b37fbce4069139445c2f06f6e42f87e)
for example config file)

For provider_fixtures, we allow users to use either a default fixture
combination or define their own {api:provider} combinations.

```

memory:
....
  fixtures:
    provider_fixtures:
    - default_fixture_param_id: ollama // use default fixture combination with param_id="ollama" in [providers/tests/memory/conftest.py](https://fburl.com/mtjzwsmk)
    - inference: sentence_transformers
      memory: faiss
    - default_fixture_param_id: chroma

```
3) generate tests according to the config. Logic lives in two places: 
a) in `{api}/conftest.py::pytest_generate_tests`, we read from config to
do parametrization.
b) after test collection, in `pytest_collection_modifyitems`, we filter
the tests to include only functions listed in config.

## Test Plan
1) `pytest /Users/sxyi/llama-stack/llama_stack/providers/tests/.
--collect-only --config=ci_test_config.yaml`

Using `--collect-only` tag to print the pytests listed in the config
file (`ci_test_config.yaml`).

output:
[gist](https://gist.github.com/sixianyi0721/05145e60d4d085c17cfb304beeb1e60e)


2) sanity check on `--inference-model` option

```
pytest -v -s -k "ollama" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
```


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-16 12:05:49 -08:00
Dinesh Yeduguru
7fb2c1c48d
More idiomatic REST API (#765)
# What does this PR do?

This PR changes our API to follow more idiomatic REST API approaches of
having paths being resources and methods indicating the action being
performed.

Changes made to generator:
1) removed the prefix check of "get" as its not required and is actually
needed for other method types too
2) removed _ check on path since variables can have "_"



## Test Plan

LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v
tests/client-sdk/agents/test_agents.py
2025-01-15 13:20:09 -08:00
Dinesh Yeduguru
8af6951106
remove conflicting default for tool prompt format in chat completion (#742)
# What does this PR do?
We are setting a default value of json for tool prompt format, which
conflicts with llama 3.2/3.3 models since they use python list. This PR
changes the defaults to None and in the code, we infer default based on
the model.

Addresses: #695 

Tests:
❯ LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v
tests/client-sdk/inference/test_inference.py -k
"test_text_chat_completion"

 pytest llama_stack/providers/tests/inference/test_prompt_adapter.py
2025-01-10 10:41:53 -08:00
Dinesh Yeduguru
a5c57cd381
agents to use tools api (#673)
# What does this PR do?

PR #639 introduced the notion of Tools API and ability to invoke tools
through API just as any resource. This PR changes the Agents to start
using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig
2) Agent gets the corresponding tool definitions for the specified tools
and pass along to the model
3) Attachements are now named as Documents and their behavior is mostly
unchanged from user perspective
4) You can specify args that can be injected to a tool call through
Agent config. This is especially useful in case of memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent
inject these as well into the tool call.
6) All tests have been migrated to use new tools API and fixtures
including client SDK tests
7) Telemetry just works with tools API because of our trace protocol
decorator


## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

pytest -s -v -k together  llama_stack/providers/tests/tools/test_tools.py \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994

Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing
2025-01-08 19:01:00 -08:00
Xi Yan
3c72c034e6
[remove import *] clean up import *'s (#689)
# What does this PR do?

- as title, cleaning up `import *`'s
- upgrade tests to make them more robust to bad model outputs
- remove import *'s in llama_stack/apis/* (skip __init__ modules)
<img width="465" alt="image"
src="https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2"
/>

- run `sh run_openapi_generator.sh`, no types gets affected

## Test Plan

### Providers Tests

**agents**
```
pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8
```

**inference**
```bash
# meta-reference
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

# together
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py 
```

**safety**
```
pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B
```

**memory**
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384
```

**scoring**
```
pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
```


**datasetio**
```
pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py
pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py
```


**eval**
```
pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py
```

### Client-SDK Tests
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```

### llama-stack-apps
```
PORT=5000
LOCALHOST=localhost

python -m examples.agents.hello $LOCALHOST $PORT
python -m examples.agents.inflation $LOCALHOST $PORT
python -m examples.agents.podcast_transcript $LOCALHOST $PORT
python -m examples.agents.rag_as_attachments $LOCALHOST $PORT
python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT
python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT
python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT

# Vision model
python -m examples.interior_design_assistant.app
python -m examples.agent_store.app $LOCALHOST $PORT
```

### CLI
```
which llama
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

llama stack build --template ollama --image-type conda
```

### Distributions Tests
**ollama**
```
llama stack build --template ollama --image-type conda
ollama run llama3.2:1b-instruct-fp16
llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
```

**fireworks**
```
llama stack build --template fireworks --image-type conda
llama stack run ./llama_stack/templates/fireworks/run.yaml
```

**together**
```
llama stack build --template together --image-type conda
llama stack run ./llama_stack/templates/together/run.yaml
```

**tgi**
```
llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-27 15:45:44 -08:00
Dinesh Yeduguru
c8be0bf1c9
Tools API with brave and MCP providers (#639)
This PR adds a new Tools api and adds two tool runtime providers: brave
and MCP.

Test plan:
```
curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{ "tool_group_id": "simple_tool",
  "tool_group": {
    "type": "model_context_protocol",
    "endpoint": {"uri": "http://localhost:56000/sse"}
  },
  "provider_id": "model-context-protocol"
}'

 curl -X POST 'http://localhost:5000/alpha/toolgroups/register' \
-H 'Content-Type: application/json' \
-d '{
  "tool_group_id": "search", "provider_id": "brave-search",
  "tool_group": {
    "type": "user_defined",
    "tools": [
      {
        "name": "brave_search",
        "description": "A web search tool",
        "parameters": [
          {
            "name": "query",
            "parameter_type": "string",
            "description": "The query to search"
          }
        ],
        "metadata": {},
        "tool_prompt_format": "json"
      }
    ]
  }
}'

 curl -X GET http://localhost:5000/alpha/tools/list | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   662  100   662    0     0   333k      0 --:--:-- --:--:-- --:--:--  646k
[
  {
    "identifier": "brave_search",
    "provider_resource_id": "brave_search",
    "provider_id": "brave-search",
    "type": "tool",
    "tool_group": "search",
    "description": "A web search tool",
    "parameters": [
      {
        "name": "query",
        "parameter_type": "string",
        "description": "The query to search"
      }
    ],
    "metadata": {},
    "tool_prompt_format": "json"
  },
  {
    "identifier": "fetch",
    "provider_resource_id": "fetch",
    "provider_id": "model-context-protocol",
    "type": "tool",
    "tool_group": "simple_tool",
    "description": "Fetches a website and returns its content",
    "parameters": [
      {
        "name": "url",
        "parameter_type": "string",
        "description": "URL to fetch"
      }
    ],
    "metadata": {
      "endpoint": "http://localhost:56000/sse"
    },
    "tool_prompt_format": "json"
  }
]

curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' \
-d '{
    "tool_name": "fetch",
    "args": {
        "url": "http://google.com/"
    }
}'

 curl -X POST 'http://localhost:5000/alpha/tool-runtime/invoke' \
-H 'Content-Type: application/json' -H 'X-LlamaStack-ProviderData: {"api_key": "<KEY>"}' \
-d '{
    "tool_name": "brave_search",
    "args": {
        "query": "who is meta ceo"
    }
}'
```
2024-12-19 21:25:17 -08:00
Ashwin Bharambe
8de8eb03c8
Update the "InterleavedTextMedia" type (#635)
## What does this PR do?

This is a long-pending change and particularly important to get done
now.

Specifically:
- we cannot "localize" (aka download) any URLs from media attachments
anywhere near our modeling code. it must be done within llama-stack.
- `PIL.Image` is infesting all our APIs via `ImageMedia ->
InterleavedTextMedia` and that cannot be right at all. Anything in the
API surface must be "naturally serializable". We need a standard `{
type: "image", image_url: "<...>" }` which is more extensible
- `UserMessage`, `SystemMessage`, etc. are moved completely to
llama-stack from the llama-models repository.

See https://github.com/meta-llama/llama-models/pull/244 for the
corresponding PR in llama-models.

## Test Plan

```bash
cd llama_stack/providers/tests

pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py
pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py
pytest -s -v -k chroma memory/test_memory.py \
  --env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar

pytest -s -v -k fireworks agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct
```

Updated the client sdk (see PR ...), installed the SDK in the same
environment and then ran the SDK tests:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py
LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py

# this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly
INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py
```
2024-12-17 11:18:31 -08:00
Dinesh Yeduguru
516e1a3e59
add embedding model by default to distribution templates (#617)
# What does this PR do?
Adds the sentence transformer provider and the `all-MiniLM-L6-v2`
embedding model to the default models to register in the run.yaml for
all providers.

## Test Plan
llama stack build --template together --image-type conda
llama stack run
~/.llama/distributions/llamastack-together/together-run.yaml
2024-12-13 12:48:00 -08:00
Dinesh Yeduguru
96e158eaac
Make embedding generation go through inference (#606)
This PR does the following:
1) adds the ability to generate embeddings in all supported inference
providers.
2) Moves all the memory providers to use the inference API and improved
the memory tests to setup the inference stack correctly and use the
embedding models

This is a merge from #589 and #598
2024-12-12 11:47:50 -08:00
Dinesh Yeduguru
47b2dc8ae3
Revert "add model type to APIs" (#605)
Reverts meta-llama/llama-stack#588
2024-12-11 10:17:54 -08:00
Dinesh Yeduguru
8e33db6015
add model type to APIs (#588)
# What does this PR do?

This PR adds a new model type field to support embedding models to be
registered. Summary of changes:
1) Each registered model by default is an llm model. 
2) User can specify an embedding model type, while registering.If
specified, the model bypass the llama model checks since embedding
models can by of any type and based on llama.
3) User needs to include the required embedding dimension in metadata.
This will be used by embedding generation to generate the requried size
of embeddings.


## Test Plan

This PR will go together will need to be merged with two follow up PRs
that will include test plans.
2024-12-11 10:16:53 -08:00
Sixian Yi
caf1dac114
unregister API for dataset (#507)
# What does this PR do?

1) Implement `unregister_dataset(dataset_id)` API in both llama stack
routing table and providers: It removes {dataset_id -> Dataset} mapping
from routing table and removes the dataset_id references in provider as
well (ex. for huggingface, we use a KV store to store the dataset id =>
dataset. we delete it during unregistering as well)

2) expose the datasets/unregister_dataset api endpoint 

## Test Plan

**Unit test:** 

`
pytest llama_stack/providers/tests/datasetio/test_datasetio.py -m
"huggingface" -v -s --tb=short --disable-warnings
`

**Test on endpoint:**
tested llama stack using an ollama distribution template:
1) start an ollama server 
2) Start a llama stack server with the default ollama distribution
config + dataset/datasetsio APIs + datasetio provider
```
---- .../ollama-run.yaml
...
apis:
- agents
- inference
- memory
- safety
- telemetry
- datasetio
- datasets
providers:
  datasetio:
  - provider_id: localfs
    provider_type: inline::localfs
    config: {}
...
```
   saw that the new API showed up in startup script
   
  ```
Serving API datasets
 GET /alpha/datasets/get
 GET /alpha/datasets/list
 POST /alpha/datasets/register
 POST /alpha/datasets/unregister
```

3) query `/alpha/datasets/unregister` through curl (since we have not implemented unregister api in llama stack client)

```
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets register
--dataset-id sixian --url
https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/chat.rst
--schema {}
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata ┃ type    ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ sixian     │ localfs     │ {}       │ dataset │
└────────────┴─────────────┴──────────┴─────────┘
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets register
--dataset-id sixian2 --url
https://raw.githubusercontent.com/pytorch/torchtune/main/docs/source/tutorials/chat.rst
--schema {}
(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata ┃ type    ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ sixian     │ localfs     │ {}       │ dataset │
│ sixian2    │ localfs     │ {}       │ dataset │
└────────────┴─────────────┴──────────┴─────────┘
(base) sxyi@sxyi-mbp llama-stack % curl
http://localhost:5001/alpha/datasets/unregister \
-H "Content-Type: application/json" \
-d '{"dataset_id": "sixian"}'
null%

(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ metadata ┃ type    ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
│ sixian2    │ localfs     │ {}       │ dataset │
└────────────┴─────────────┴──────────┴─────────┘
(base) sxyi@sxyi-mbp llama-stack % curl
http://localhost:5001/alpha/datasets/unregister \
-H "Content-Type: application/json" \
-d '{"dataset_id": "sixian2"}'
null%

(base) sxyi@sxyi-mbp llama-stack % llama-stack-client datasets list
```

## Sources


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-03 21:18:30 -08:00
Dinesh Yeduguru
1d8d0593af
register with provider even if present in stack (#491)
# What does this PR do?

Remove a check which skips provider registration if a resource is
already in stack registry. Since we do not reconcile state with
provider, register should always call into provider's register endpoint.


## Test Plan
```
# stack run
╰─❯ llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml

#register memory bank
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0

Memory Bank Configuration:
{
│   'memory_bank_type': 'vector',
│   'chunk_size_in_tokens': 512,
│   'embedding_model': 'all-MiniLM-L6-v2',
│   'overlap_size_in_tokens': 64
}

#register again
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0

Memory Bank Configuration:
{
│   'memory_bank_type': 'vector',
│   'chunk_size_in_tokens': 512,
│   'embedding_model': 'all-MiniLM-L6-v2',
│   'overlap_size_in_tokens': 64
}
```
2024-11-20 11:05:50 -08:00
Dinesh Yeduguru
0850ad656a
unregister for memory banks and remove update API (#458)
The semantics of an Update on resources is very tricky to reason about
especially for memory banks and models. The best way to go forward here
is for the user to unregister and register a new resource. We don't have
a compelling reason to support update APIs.


Tests:
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m
"chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000

pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m
"pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env
PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0

$CONDA_PREFIX/bin/pytest -v -s -m "ollama"
llama_stack/providers/tests/inference/test_model_registration.py

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-14 17:12:11 -08:00
Dinesh Yeduguru
46f0b6606a
init registry once (#450)
We are calling the initialize function on the registery in the common
routing table impl, which is incorrect as the common routing table is
the base class inherited by each resource's routing table. this change
moves remove that and add the initialize to the creation, where it inits
once server run.

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-13 22:20:57 -08:00
Dinesh Yeduguru
efe791bab7
Support model resource updates and deletes (#452)
# What does this PR do?
* Changes the registry to store only one RoutableObject per identifier.
Before it was a list, which is not really required.
* Adds impl for updates and deletes
* Updates routing table to handle updates correctly



## Test Plan
```
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
❯ llama-stack-client models register dineshyv-model --provider-model-id=fireworks/llama-v3p1-70b-instruct
Successfully registered model dineshyv-model
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| dineshyv-model         | fireworks-0   | fireworks/llama-v3p1-70b-instruct  | {}         |
+------------------------+---------------+------------------------------------+------------+
❯ llama-stack-client models update dineshyv-model --provider-model-id=fireworks/llama-v3p1-405b-instruct
Successfully updated model dineshyv-model
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| dineshyv-model         | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
llama-stack-client models delete dineshyv-model
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+

```

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-13 21:55:41 -08:00
Dinesh Yeduguru
e90ea1ab1e
make distribution registry thread safe and other fixes (#449)
This PR makes the following changes:
1) Fixes the get_all and initialize impl to actually read the values
returned from the range call to kvstore and not keys.
2) The start_key and end_key are fixed to correct perform the range
query after the key format changes
3) Made the cache registry thread safe since there are multiple
initializes called for each routing table.

Tests:
* Start stack
* Register dataset
* Kill stack
* Bring stack up
* dataset list
```
 llama-stack-client datasets list
+--------------+---------------+---------------------------------------------------------------------------------+---------+
| identifier   | provider_id   | metadata                                                                        | type    |
+==============+===============+=================================================================================+=========+
| alpaca       | huggingface-0 | {}                                                                              | dataset |
+--------------+---------------+---------------------------------------------------------------------------------+---------+
| mmlu         | huggingface-0 | {'path': 'llama-stack/evals', 'name': 'evals__mmlu__details', 'split': 'train'} | dataset |
+--------------+---------------+---------------------------------------------------------------------------------+---------+
```

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-13 15:12:34 -08:00
Xi Yan
94a6f57812
change schema -> dataset_schema for register_dataset api (#443)
# What does this PR do?

- API updates: change schema to dataset_schema for register_dataset for
resolving pydantic naming conflict
- Note: this OpenAPI update will be synced with
llama-stack-client-python SDK.

cc @dineshyv 

## Test Plan

```
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-13 11:17:46 -05:00
Xi Yan
d5b1202c83
change schema -> dataset_schema (#442)
# What does this PR do?

- `schema` should not a field w/ pydantic warnings
- change `schema` to `dataset_schema`

<img width="855" alt="image"
src="https://github.com/user-attachments/assets/47cb6bb9-4be0-46a5-8701-24d24e2eaabd">


## Test Plan

```
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-13 10:58:12 -05:00
Ashwin Bharambe
12947ac19e
Kill "remote" providers and fix testing with a remote stack properly (#435)
# What does this PR do?

This PR kills the notion of "pure passthrough" remote providers. You
cannot specify a single provider you must specify a whole distribution
(stack) as remote.

This PR also significantly fixes / upgrades testing infrastructure so
you can now test against a remotely hosted stack server by just doing

```bash
pytest -s -v -m remote  test_agents.py \
  --inference-model=Llama3.1-8B-Instruct --safety-shield=Llama-Guard-3-1B \
  --env REMOTE_STACK_URL=http://localhost:5001
```

Also fixed `test_agents_persistence.py` (which was broken) and killed
some deprecated testing functions.

## Test Plan

All the tests.
2024-11-12 21:51:29 -08:00
Dinesh Yeduguru
fdff24e77a
Inference to use provider resource id to register and validate (#428)
This PR changes the way model id gets translated to the final model name
that gets passed through the provider.
Major changes include:
1) Providers are responsible for registering an object and as part of
the registration returning the object with the correct provider specific
name of the model provider_resource_id
2) To help with the common look ups different names a new ModelLookup
class is created.



Tested all inference providers including together, fireworks, vllm,
ollama, meta reference and bedrock
2024-11-12 20:02:00 -08:00
Ashwin Bharambe
983d6ce2df
Remove the "ShieldType" concept (#430)
# What does this PR do?

This PR kills the notion of "ShieldType". The impetus for this is the
realization:

> Why is keyword llama-guard appearing so many times everywhere,
sometimes with hyphens, sometimes with underscores?

Now that we have a notion of "provider specific resource identifiers"
and "user specific aliases" for those and the fact that this works with
models ("Llama3.1-8B-Instruct" <> "fireworks/llama-3pv1-..."), we can
follow the same rules for Shields.

So each Safety provider can make up a notion of identifiers it has
registered. This already happens with Bedrock correctly. We just
generalize it for Llama Guard, Prompt Guard, etc.

For Llama Guard, we further simplify by just adopting the underlying
model name itself as the identifier! No confusion necessary.

While doing this, I noticed a bug in our DistributionRegistry where we
weren't scoping identifiers by type. Fixed.

## Feature/Issue validation/testing/test plan

Ran (inference, safety, memory, agents) tests with ollama and fireworks
providers.
2024-11-12 12:37:24 -08:00
Ashwin Bharambe
09269e2a44
Enable sane naming of registered objects with defaults (#429)
# What does this PR do? 

This is a follow-up to #425. That PR allows for specifying models in the
registry, but each entry needs to look like:

```yaml
- identifier: ...
  provider_id: ...
  provider_resource_identifier: ...
```

This is headache-inducing.

The current PR makes this situation better by adopting the shape of our
APIs. Namely, we need the user to only specify `model-id`. The rest
should be optional and figured out by the Stack. You can always override
it.

Here's what example `ollama` "full stack" registry looks like (we still
need to kill or simplify shield_type crap):
```yaml
models:
- model_id: Llama3.2-3B-Instruct
- model_id: Llama-Guard-3-1B
shields:
- shield_id: llama_guard
  shield_type: llama_guard
```

## Test Plan

See test plan for #425. Re-ran it.
2024-11-12 11:18:05 -08:00
Dinesh Yeduguru
0a3b3d5fb6
migrate scoring fns to resource (#422)
* fix after rebase

* remove print

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-11 17:28:48 -08:00
Dinesh Yeduguru
3802edfc50
migrate evals to resource (#421)
* migrate evals to resource

* remove listing of providers's evals

* change the order of params in register

* fix after rebase

* linter fix

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-11 17:24:03 -08:00
Dinesh Yeduguru
b95cb5308f
migrate dataset to resource (#420)
* migrate dataset to resource

* remove auto discovery

* remove listing of providers's datasets

* fix after rebase

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-11 17:14:41 -08:00
Dinesh Yeduguru
38cce97597
migrate memory banks to Resource and new registration (#411)
* migrate memory banks to Resource and new registration

* address feedback

* address feedback

* fix tests

* pgvector fix

* pgvector fix v2

* remove auto discovery

* change register signature to make params required

* update client

* client fix

* use annotated union to parse

* remove base MemoryBank inheritence

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-11 17:10:44 -08:00
Dinesh Yeduguru
ec644d3418
migrate model to Resource and new registration signature (#410)
* resource oriented object design for models

* add back llama_model field

* working tests

* register singature fix

* address feedback

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 16:12:57 -08:00
Dinesh Yeduguru
d800a16acd
Resource oriented design for shields (#399)
* init

* working bedrock tests

* bedrock test for inference fixes

* use env vars for bedrock guardrail vars

* add register in meta reference

* use correct shield impl in meta ref

* dont add together fixture

* right naming

* minor updates

* improved registration flow

* address feedback

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-08 12:16:11 -08:00
Xi Yan
6192bf43a4
[Evals API][10/n] API updates for EvalTaskDef + new test migration (#379)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* delete old tests

* fixture return impl
2024-11-07 21:24:12 -08:00
Dinesh Yeduguru
093c9f1987
add bedrock distribution code (#358)
* add bedrock distribution code

* fix linter error

* add bedrock shields support

* linter fixes

* working bedrock safety

* change to return only one violation

* remove env var reading

* refereshable boto credentials

* remove env vars

* address raghu's feedback

* fix session_ttl passing

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-06 14:39:11 -08:00