Commit graph

7 commits

Author SHA1 Message Date
Dinesh Yeduguru
12c994b5b2
REST API fixes (#789)
# What does this PR do?

Client SDK fixes

## Test Plan


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/safety/test_safety.py


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/memory/test_memory.py
2025-01-16 13:47:08 -08:00
Dinesh Yeduguru
678ab29129
Idiomatic REST API: Inspect (#779)
# What does this PR do?

Since provider list returns a map grouping providers by API, we should
not be using data. This PR fixes the types to just be the plain dict,
basically reverting back to previous behavior



## Test Plan

llama-stack on  fix-provider-list [$] 🅒 stack❯
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml"
pytest -v tests/client-sdk/safety/test_safety.py
2025-01-16 10:39:42 -08:00
Dinesh Yeduguru
7fb2c1c48d
More idiomatic REST API (#765)
# What does this PR do?

This PR changes our API to follow more idiomatic REST API approaches of
having paths being resources and methods indicating the action being
performed.

Changes made to generator:
1) removed the prefix check of "get" as its not required and is actually
needed for other method types too
2) removed _ check on path since variables can have "_"



## Test Plan

LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v
tests/client-sdk/agents/test_agents.py
2025-01-15 13:20:09 -08:00
Xi Yan
596afc6497
add --version to llama stack CLI & /version endpoint (#732)
# What does this PR do?

- add --version to llama stack CLI 
- add /version endpoint
- run OpenAPI generator for the new endpoint

## Test Plan

**CLI**
<img width="184" alt="image"
src="https://github.com/user-attachments/assets/3acb1d22-453e-4b79-baf6-e98e88d0671c"
/>



**endpoint**
<img width="430" alt="image"
src="https://github.com/user-attachments/assets/79cdd670-493b-40cf-8f9e-28a4ac0988ac"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-08 16:30:06 -08:00
Xi Yan
78e2bfbe7a
[tests] add client-sdk pytests & delete client.py (#638)
# What does this PR do?

**Why**
- Clean up examples which we will not maintain; reduce the surface area
to the minimal showcases

**What**
- Delete `client.py` in /apis/*
- Move all scripts to unit tests
  - SDK sync in the future will just require running pytests

**Side notes**
- `bwrap` not available on Mac so code_interpreter will not work

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```
<img width="725" alt="image"
src="https://github.com/user-attachments/assets/36bfe537-628d-43c3-8479-dcfcfe2e4035"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-12-16 12:04:56 -08:00
Ashwin Bharambe
6bb57e72a7
Remove "routing_table" and "routing_key" concepts for the user (#201)
This PR makes several core changes to the developer experience surrounding Llama Stack.

Background: PR #92 introduced the notion of "routing" to the Llama Stack. It introduces three object types: (1) models, (2) shields and (3) memory banks. Each of these objects can be associated with a distinct provider. So you can get model A to be inferenced locally while model B, C can be inference remotely (e.g.)

However, this had a few drawbacks:

you could not address the provider instances -- i.e., if you configured "meta-reference" with a given model, you could not assign an identifier to this instance which you could re-use later.
the above meant that you could not register a "routing_key" (e.g. model) dynamically and say "please use this existing provider I have already configured" for a new model.
the terms "routing_table" and "routing_key" were exposed directly to the user. in my view, this is way too much overhead for a new user (which almost everyone is.) people come to the stack wanting to do ML and encounter a completely unexpected term.
What this PR does: This PR structures the run config with only a single prominent key:

- providers
Providers are instances of configured provider types. Here's an example which shows two instances of the remote::tgi provider which are serving two different models.

providers:
  inference:
  - provider_id: foo
    provider_type: remote::tgi
    config: { ... }
  - provider_id: bar
    provider_type: remote::tgi
    config: { ... }
Secondly, the PR adds dynamic registration of { models | shields | memory_banks } to the API surface. The distribution still acts like a "routing table" (as previously) except that it asks the backing providers for a listing of these objects. For example it asks a TGI or Ollama inference adapter what models it is serving. Only the models that are being actually served can be requested by the user for inference. Otherwise, the Stack server will throw an error.

When dynamically registering these objects, you can use the provider IDs shown above. Info about providers can be obtained using the Api.inspect set of endpoints (/providers, /routes, etc.)

The above examples shows the correspondence between inference providers and models registry items. Things work similarly for the safety <=> shields and memory <=> memory_banks pairs.

Registry: This PR also makes it so that Providers need to implement additional methods for registering and listing objects. For example, each Inference provider is now expected to implement the ModelsProtocolPrivate protocol (naming is not great!) which consists of two methods

register_model
list_models
The goal is to inform the provider that a certain model needs to be supported so the provider can make any relevant backend changes if needed (or throw an error if the model cannot be supported.)

There are many other cleanups included some of which are detailed in a follow-up comment.
2024-10-10 10:24:13 -07:00
Ashwin Bharambe
8d049000e3 Add an introspection "Api.inspect" API 2024-10-02 15:41:14 -07:00