Commit graph

16 commits

Author SHA1 Message Date
Ashwin Bharambe
2e5bfcd42a
Update Telemetry API so OpenAPI generation can work (#640)
We cannot use recursive types because not only does our OpenAPI
generator not like them, even if it did, it is not easy for all client
languages to automatically construct proper APIs (especially considering
garbage collection) around them. For now, we can return a `Dict[str,
SpanWithStatus]` instead of `SpanWithChildren` and rely on the client to
reconstruct the tree.

Also fixed a super subtle issue with the OpenAPI generation process
(monkey-patching of json_schema_type wasn't working because of import
reordering.)
2024-12-16 13:00:14 -08:00
Xi Yan
d97cfaa9d9
[docs] add openapi spec to docs (#508)
# What does this PR do?
- modify openapi generator to add coming soon tag for unimplemented api
- sphinx-redocs extension for openapi spec to readthedocs page

## Test Plan



https://github.com/user-attachments/assets/b4c7eebc-2361-4198-a987-dbfbcff914cf






## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-22 17:54:32 -08:00
Ashwin Bharambe
8ed79ad0f3 Fix the pyopenapi generator avoid potential circular imports 2024-11-18 23:37:52 -08:00
Ashwin Bharambe
0dc7f5fa89
Add version to REST API url (#478)
# What does this PR do? 

Adds a `/alpha/` prefix to all the REST API urls.

Also makes them all use hyphens instead of underscores as is more
standard practice.

(This is based on feedback from our partners.)

## Test Plan 

The Stack itself does not need updating. However, client SDKs and
documentation will need to be updated.
2024-11-18 22:44:14 -08:00
Ashwin Bharambe
bba6edd06b Fix OpenAPI generation to have text/event-stream for streamable methods 2024-11-14 12:51:38 -08:00
Ashwin Bharambe
d9d271a684
Allow specifying resources in StackRunConfig (#425)
# What does this PR do? 

This PR brings back the facility to not force registration of resources
onto the user. This is not just annoying but actually not feasible
sometimes. For example, you may have a Stack which boots up with private
providers for inference for models A and B. There is no way for the user
to actually know which model is being served by these providers now (to
be able to register it.)

How will this avoid the users needing to do registration? In a follow-up
diff, I will make sure I update the sample run.yaml files so they list
the models served by the distributions explicitly. So when users do
`llama stack build --template <...>` and run it, their distributions
come up with the right set of models they expect.

For self-hosted distributions, it also allows us to have a place to
explicit list the models that need to be served to make the "complete"
stack (including safety, e.g.)

## Test Plan

Started ollama locally with two lightweight models: Llama3.2-3B-Instruct
and Llama-Guard-3-1B.

Updated all the tests including agents. Here's the tests I ran so far:

```bash
pytest -s -v -m "fireworks and llama_3b" test_text_inference.py::TestInference \
  --env FIREWORKS_API_KEY=...

pytest -s -v -m "ollama and llama_3b" test_text_inference.py::TestInference 

pytest -s -v -m ollama test_safety.py

pytest -s -v -m faiss test_memory.py

pytest -s -v -m ollama  test_agents.py \
  --inference-model=Llama3.2-3B-Instruct --safety-model=Llama-Guard-3-1B
```

Found a few bugs here and there pre-existing that these test runs fixed.
2024-11-12 10:58:49 -08:00
Ashwin Bharambe
47e7c2dc15 Fix openapi generator and regenerator OpenAPI types 2024-11-11 18:44:38 -08:00
Xi Yan
2b7d70ba86
[Evals API][11/n] huggingface dataset provider + mmlu scoring fn (#392)
* wip

* scoring fn api

* eval api

* eval task

* evaluate api update

* pre commit

* unwrap context -> config

* config field doc

* typo

* naming fix

* separate benchmark / app eval

* api name

* rename

* wip tests

* wip

* datasetio test

* delete unused

* fixture

* scoring resolve

* fix scoring register

* scoring test pass

* score batch

* scoring fix

* fix eval

* test eval works

* huggingface provider

* datasetdef files

* mmlu scoring fn

* test wip

* remove type ignore

* api refactor

* add default task_eval_id for routing

* add eval_id for jobs

* remove type ignore

* huggingface provider

* wip huggingface register

* only keep 1 run_eval

* fix optional

* register task required

* register task required

* delete old tests

* fix

* mmlu loose

* refactor

* msg

* fix tests

* move benchmark task def to file

* msg

* gen openapi

* openapi gen

* move dataset to hf llamastack repo

* remove todo

* refactor

* add register model to unit test

* rename

* register to client

* delete preregistered dataset/eval task

* comments

* huggingface -> remote adapter

* openapi gen
2024-11-11 14:49:50 -05:00
Ashwin Bharambe
37b330b4ef
add dynamic clients for all APIs (#348)
* add dynamic clients for all APIs

* fix openapi generator

* inference + memory + agents tests now pass with "remote" providers

* Add docstring which fixes openapi generator :/
2024-10-31 14:46:25 -07:00
Xi Yan
abdf7cddf3
[Evals API][4/n] evals with generation meta-reference impl (#303)
* wip

* dataset validation

* test_scoring

* cleanup

* clean up test

* comments

* error checking

* dataset client

* test client:

* datasetio client

* clean up

* basic scoring function works

* scorer wip

* equality scorer

* score batch impl

* score batch

* update scoring test

* refactor

* validate scorer input

* address comments

* evals with generation

* add all rows scores to ScoringResult

* minor typing

* bugfix

* scoring function def rename

* rebase name

* refactor

* address comments

* Update iOS inference instructions for new quantization

* Small updates to quantization config

* Fix score threshold in faiss

* Bump version to 0.0.45

* Handle both ipv6 and ipv4 interfaces together

* update manifest for build templates

* Update getting_started.md

* chatcompletion & completion input type validation

* inclusion->subsetof

* error checking

* scoring_function -> scoring_fn rename, scorer -> scoring_fn rename

* address comments

* [Evals API][5/n] fixes to generate openapi spec (#323)

* generate openapi

* typing comment, dataset -> dataset_id

* remove custom type

* sample eval run.yaml

---------

Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-10-25 13:12:39 -07:00
Dalton Flanagan
8c3010553f
Fix agents path in generate.py 2024-10-10 11:41:03 -04:00
Ashwin Bharambe
8d049000e3 Add an introspection "Api.inspect" API 2024-10-02 15:41:14 -07:00
Ashwin Bharambe
ec4fc800cc
[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92)
This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack.

Most important bits:

* Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis.

* We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are:
  - routing model A to ollama and model B to a remote provider like Together
  - routing shield A to local impl while shield B to a remote provider like Bedrock
  - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis

* Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.
2024-09-23 14:22:22 -07:00
Ashwin Bharambe
942cb87a3c remove apis/stack.py 2024-09-20 09:37:08 -07:00
Xi Yan
f3f5873e9e regenerate openapi spec 2024-09-18 19:28:05 -07:00
Xi Yan
2c1ad10710 move openapi from rfcs->docs 2024-09-18 16:09:17 -07:00