Commit graph

150 commits

Author SHA1 Message Date
Xi Yan
d97cfaa9d9
[docs] add openapi spec to docs (#508)
# What does this PR do?
- modify openapi generator to add coming soon tag for unimplemented api
- sphinx-redocs extension for openapi spec to readthedocs page

## Test Plan



https://github.com/user-attachments/assets/b4c7eebc-2361-4198-a987-dbfbcff914cf






## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-22 17:54:32 -08:00
Ashwin Bharambe
1b2b32f959 Minor updates to docs 2024-11-22 17:44:05 -08:00
Ashwin Bharambe
6229562760 Organize references 2024-11-22 16:46:45 -08:00
Ashwin Bharambe
6fbf526d5c Move gitignore from docs/ to the main gitignore 2024-11-22 15:55:34 -08:00
Ashwin Bharambe
526a8dcfe0 Minor edit to zero_to_hero_guide 2024-11-22 15:52:56 -08:00
Ashwin Bharambe
5acb15d2bf Make quickstart.md -> README.md so it shows up as default 2024-11-22 15:50:25 -08:00
Justin Lee
9928405e2c
Docs improvement v3 (#433)
# What does this PR do?

- updated the notebooks to reflect past changes up to llama-stack 0.0.53
- updated readme to  provide accurate and up-to-date info
- improve the current zero to hero by integrating an example using
together api


## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Co-authored-by: Sanyam Bhutani <sanyambhutani@meta.com>
2024-11-22 15:43:31 -08:00
Ashwin Bharambe
97dc5b68e5 model -> model_id for TGI 2024-11-22 15:40:08 -08:00
Ashwin Bharambe
c2c53d0272 More doc cleanup 2024-11-22 14:37:22 -08:00
Ashwin Bharambe
900b0556e7 Much more documentation work, things are getting a bit consumable right now 2024-11-22 14:06:18 -08:00
Ashwin Bharambe
98e213e96c More docs work 2024-11-22 14:06:18 -08:00
Ashwin Bharambe
eb2063bc3d Updates to the main doc page 2024-11-22 14:06:18 -08:00
Ashwin Bharambe
a0a00f1345 Update telemetry to have TEXT be the default log format 2024-11-21 15:18:45 -08:00
Ashwin Bharambe
55c55b9f51 Update Quick Start significantly 2024-11-21 13:20:55 -08:00
Ashwin Bharambe
cf079a22a0 Plurals 2024-11-20 23:24:59 -08:00
Ashwin Bharambe
cd6ccb664c Integrate distro docs into the restructured docs 2024-11-20 23:20:05 -08:00
Ashwin Bharambe
2411a44833 Update more distribution docs to be simpler and partially codegen'ed 2024-11-20 22:03:44 -08:00
Dinesh Yeduguru
b3f9e8b2f2
Restructure docs (#494)
Rendered docs at: https://llama-stack.readthedocs.io/en/doc-simplify/
2024-11-20 15:54:47 -08:00
Justin Lee
ae49a4cb97
Reorganizing Zero to Hero Folder structure (#447)
Putting Zero to Hero Guide to root for increased visibility
2024-11-20 10:27:29 -08:00
Xi Yan
b0fdf7552a docs 2024-11-19 16:41:45 -08:00
Xi Yan
c49acc5226 docs 2024-11-19 16:39:40 -08:00
Xi Yan
f78200b189 docs 2024-11-19 16:37:30 -08:00
Xi Yan
2da93c8835 fix 3.2-1b fireworks 2024-11-19 14:20:07 -08:00
Xi Yan
189df6358a codegen docs 2024-11-19 14:16:00 -08:00
Henry Tai
39e99b39fe
update quick start to have the working instruction (#467)
# What does this PR do?

Fix the instruction in quickstart readme so the new developers/users can
run it without issues.

## Test Plan
None

## Sources

Please link relevant resources if necessary.


## Before submitting

- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Co-authored-by: Henry Tai <henrytai@fb.com>
2024-11-19 10:32:19 -08:00
Xi Yan
1b0f5fff5a fix curl endpoint 2024-11-19 10:26:05 -08:00
Ashwin Bharambe
5e4ac1b7c1 Make sure server code uses version prefixed routes 2024-11-19 09:15:05 -08:00
Ashwin Bharambe
e8d3eee095 Fix docs yet again 2024-11-18 23:51:35 -08:00
Ashwin Bharambe
8ed79ad0f3 Fix the pyopenapi generator avoid potential circular imports 2024-11-18 23:37:52 -08:00
Ashwin Bharambe
d463d68e1e Update docs 2024-11-18 23:21:25 -08:00
Ashwin Bharambe
0dc7f5fa89
Add version to REST API url (#478)
# What does this PR do? 

Adds a `/alpha/` prefix to all the REST API urls.

Also makes them all use hyphens instead of underscores as is more
standard practice.

(This is based on feedback from our partners.)

## Test Plan 

The Stack itself does not need updating. However, client SDKs and
documentation will need to be updated.
2024-11-18 22:44:14 -08:00
Ashwin Bharambe
7693786322 Use HF names for registering fireworks and together models 2024-11-18 22:34:47 -08:00
Riandy
2108a779f2
Update kotlin client docs (#476)
# What does this PR do?

In short, provide a summary of what this PR does and why. Usually, the
relevant context should be present in a linked issue.

Add Kotlin package link into readme docs
2024-11-19 08:43:20 +05:30
Ashwin Bharambe
939056e265 More documentation fixes 2024-11-18 17:06:13 -08:00
Ashwin Bharambe
e40404625b Update to docs 2024-11-18 16:52:48 -08:00
Ashwin Bharambe
afa4f0b19f Update remote vllm docs 2024-11-18 16:34:33 -08:00
Ashwin Bharambe
47c37fd831 Fixes 2024-11-18 16:03:53 -08:00
Ashwin Bharambe
3aedde2ab4 Add a pre-commit for distro_codegen but it does not work yet 2024-11-18 15:21:13 -08:00
Ashwin Bharambe
2a31163178
Auto-generate distro yamls + docs (#468)
# What does this PR do?

Automatically generates
- build.yaml
- run.yaml
- run-with-safety.yaml
- parts of markdown docs

for the distributions.

## Test Plan

At this point, this only updates the YAMLs and the docs. Some testing
(especially with ollama and vllm) has been performed but needs to be
much more tested.
2024-11-18 14:57:06 -08:00
Dinesh Yeduguru
0850ad656a
unregister for memory banks and remove update API (#458)
The semantics of an Update on resources is very tricky to reason about
especially for memory banks and models. The best way to go forward here
is for the user to unregister and register a new resource. We don't have
a compelling reason to support update APIs.


Tests:
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m
"chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000

pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m
"pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env
PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0

$CONDA_PREFIX/bin/pytest -v -s -m "ollama"
llama_stack/providers/tests/inference/test_model_registration.py

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-14 17:12:11 -08:00
Ashwin Bharambe
bba6edd06b Fix OpenAPI generation to have text/event-stream for streamable methods 2024-11-14 12:51:38 -08:00
Dinesh Yeduguru
efe791bab7
Support model resource updates and deletes (#452)
# What does this PR do?
* Changes the registry to store only one RoutableObject per identifier.
Before it was a list, which is not really required.
* Adds impl for updates and deletes
* Updates routing table to handle updates correctly



## Test Plan
```
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
❯ llama-stack-client models register dineshyv-model --provider-model-id=fireworks/llama-v3p1-70b-instruct
Successfully registered model dineshyv-model
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| dineshyv-model         | fireworks-0   | fireworks/llama-v3p1-70b-instruct  | {}         |
+------------------------+---------------+------------------------------------+------------+
❯ llama-stack-client models update dineshyv-model --provider-model-id=fireworks/llama-v3p1-405b-instruct
Successfully updated model dineshyv-model
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| dineshyv-model         | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
llama-stack-client models delete dineshyv-model
❯ llama-stack-client models list
+------------------------+---------------+------------------------------------+------------+
| identifier             | provider_id   | provider_resource_id               | metadata   |
+========================+===============+====================================+============+
| Llama3.1-405B-Instruct | fireworks-0   | fireworks/llama-v3p1-405b-instruct | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.1-8B-Instruct   | fireworks-0   | fireworks/llama-v3p1-8b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+
| Llama3.2-3B-Instruct   | fireworks-0   | fireworks/llama-v3p2-1b-instruct   | {}         |
+------------------------+---------------+------------------------------------+------------+

```

---------

Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
2024-11-13 21:55:41 -08:00
Jeff Tang
15dee2b8b8
Added link to the Colab notebook of the Llama Stack lesson on the Llama 3.2 course on DLAI (#445)
# What does this PR do?

It shows a complete zero-setup Colab using the Llama Stack server
implemented and powered by together.ai: using Llama Stack Client API to
run inference, agent and 3.2 models. Good for a quick start guide.



- [ ] Addresses issue (#issue)


## Test Plan

Please describe:
 - tests you ran to verify your changes with result summaries.
 - provide instructions so it can be reproduced.


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-13 13:59:41 -08:00
Xi Yan
94a6f57812
change schema -> dataset_schema for register_dataset api (#443)
# What does this PR do?

- API updates: change schema to dataset_schema for register_dataset for
resolving pydantic naming conflict
- Note: this OpenAPI update will be synced with
llama-stack-client-python SDK.

cc @dineshyv 

## Test Plan

```
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-13 11:17:46 -05:00
Xi Yan
59a65e34d3
Update new_api_provider.md 2024-11-13 00:02:13 -05:00
Dinesh Yeduguru
fdff24e77a
Inference to use provider resource id to register and validate (#428)
This PR changes the way model id gets translated to the final model name
that gets passed through the provider.
Major changes include:
1) Providers are responsible for registering an object and as part of
the registration returning the object with the correct provider specific
name of the model provider_resource_id
2) To help with the common look ups different names a new ModelLookup
class is created.



Tested all inference providers including together, fireworks, vllm,
ollama, meta reference and bedrock
2024-11-12 20:02:00 -08:00
Ashwin Bharambe
983d6ce2df
Remove the "ShieldType" concept (#430)
# What does this PR do?

This PR kills the notion of "ShieldType". The impetus for this is the
realization:

> Why is keyword llama-guard appearing so many times everywhere,
sometimes with hyphens, sometimes with underscores?

Now that we have a notion of "provider specific resource identifiers"
and "user specific aliases" for those and the fact that this works with
models ("Llama3.1-8B-Instruct" <> "fireworks/llama-3pv1-..."), we can
follow the same rules for Shields.

So each Safety provider can make up a notion of identifiers it has
registered. This already happens with Bedrock correctly. We just
generalize it for Llama Guard, Prompt Guard, etc.

For Llama Guard, we further simplify by just adopting the underlying
model name itself as the identifier! No confusion necessary.

While doing this, I noticed a bug in our DistributionRegistry where we
weren't scoping identifiers by type. Fixed.

## Feature/Issue validation/testing/test plan

Ran (inference, safety, memory, agents) tests with ollama and fireworks
providers.
2024-11-12 12:37:24 -08:00
Ashwin Bharambe
09269e2a44
Enable sane naming of registered objects with defaults (#429)
# What does this PR do? 

This is a follow-up to #425. That PR allows for specifying models in the
registry, but each entry needs to look like:

```yaml
- identifier: ...
  provider_id: ...
  provider_resource_identifier: ...
```

This is headache-inducing.

The current PR makes this situation better by adopting the shape of our
APIs. Namely, we need the user to only specify `model-id`. The rest
should be optional and figured out by the Stack. You can always override
it.

Here's what example `ollama` "full stack" registry looks like (we still
need to kill or simplify shield_type crap):
```yaml
models:
- model_id: Llama3.2-3B-Instruct
- model_id: Llama-Guard-3-1B
shields:
- shield_id: llama_guard
  shield_type: llama_guard
```

## Test Plan

See test plan for #425. Re-ran it.
2024-11-12 11:18:05 -08:00
Ashwin Bharambe
d9d271a684
Allow specifying resources in StackRunConfig (#425)
# What does this PR do? 

This PR brings back the facility to not force registration of resources
onto the user. This is not just annoying but actually not feasible
sometimes. For example, you may have a Stack which boots up with private
providers for inference for models A and B. There is no way for the user
to actually know which model is being served by these providers now (to
be able to register it.)

How will this avoid the users needing to do registration? In a follow-up
diff, I will make sure I update the sample run.yaml files so they list
the models served by the distributions explicitly. So when users do
`llama stack build --template <...>` and run it, their distributions
come up with the right set of models they expect.

For self-hosted distributions, it also allows us to have a place to
explicit list the models that need to be served to make the "complete"
stack (including safety, e.g.)

## Test Plan

Started ollama locally with two lightweight models: Llama3.2-3B-Instruct
and Llama-Guard-3-1B.

Updated all the tests including agents. Here's the tests I ran so far:

```bash
pytest -s -v -m "fireworks and llama_3b" test_text_inference.py::TestInference \
  --env FIREWORKS_API_KEY=...

pytest -s -v -m "ollama and llama_3b" test_text_inference.py::TestInference 

pytest -s -v -m ollama test_safety.py

pytest -s -v -m faiss test_memory.py

pytest -s -v -m ollama  test_agents.py \
  --inference-model=Llama3.2-3B-Instruct --safety-model=Llama-Guard-3-1B
```

Found a few bugs here and there pre-existing that these test runs fixed.
2024-11-12 10:58:49 -08:00
Ashwin Bharambe
3d7561e55c
Rename all inline providers with an inline:: prefix (#423) 2024-11-11 22:19:16 -08:00