Commit graph

10 commits

Author SHA1 Message Date
Ashwin Bharambe
4d0bfbf984
feat: add api.llama provider, llama-guard-4 model (#2058)
This PR adds a llama-stack inference provider for `api.llama.com`, as
well as adds entries for Llama-Guard-4 and updated Prompt-Guard models.
2025-04-29 10:07:41 -07:00
Reid
187524d4ae
feat: add substring search for model list (#1099)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

`llama model list` or `llama model list --show-all` will list more or
all for the models, so add the `search` option to simplify the output.
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [-s SEARCH]

Show available llama models

options:
  -h, --help            show this help message and exit
  --show-all            Show all models (not just defaults)
  -s SEARCH, --search SEARCH
                        Search for the input string as a substring in the model descriptor(ID)

$ llama model list -s 70b
+-----------------------+-----------------------------------+----------------+
| Model Descriptor(ID)  | Hugging Face Repo                 | Context Length |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B          | meta-llama/Llama-3.1-70B          | 128K           |
+-----------------------+-----------------------------------+----------------+
| Llama3.1-70B-Instruct | meta-llama/Llama-3.1-70B-Instruct | 128K           |
+-----------------------+-----------------------------------+----------------+
| Llama3.3-70B-Instruct | meta-llama/Llama-3.3-70B-Instruct | 128K           |
+-----------------------+-----------------------------------+----------------+

$ llama model list -s 3.1-8b
+----------------------+----------------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo                | Context Length |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B          | meta-llama/Llama-3.1-8B          | 128K           |
+----------------------+----------------------------------+----------------+
| Llama3.1-8B-Instruct | meta-llama/Llama-3.1-8B-Instruct | 128K           |
+----------------------+----------------------------------+----------------+

$ llama model list --show-all -s pro
+----------------------+-----------------------------+----------------+
| Model Descriptor(ID) | Hugging Face Repo           | Context Length |
+----------------------+-----------------------------+----------------+
| Prompt-Guard-86M     | meta-llama/Prompt-Guard-86M | 2K             |
+----------------------+-----------------------------+----------------+

$ llama model list -s k
Not found for search.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 16:38:10 -08:00
Reid
9898589f12
fix: convert back to model descriptor for model in list --downloaded (#1201)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Currently , `model` in `--downloaded` just use the directory(already
replace `:`), so covert back to descriptor keep the same with ` llama
model list`, and remove command also use `descriptor`.
```
before:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model                               | Size     | Modified Time       |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct-int4-qlora-eo8 | 1.53 GB  | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+

after:
$ llama model list --downloaded
+-------------------------------------+----------+---------------------+
| Model                               | Size     | Modified Time       |
+-------------------------------------+----------+---------------------+
| Llama3.2-1B-Instruct:int4-qlora-eo8 | 1.53 GB  | 2025-02-20 16:32:49 |
+-------------------------------------+----------+---------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-21 08:10:34 -08:00
Reid
af377e844d
feat: add a option to list the downloaded models (#1127)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]

Show available llama models

options:
  -h, --help            show this help message and exit
  --show-all            Show all models (not just defaults)
  --downloaded          List the downloaded models

$ llama model list --downloaded
+-------------+----------+---------------------+
| Model       | Size     | Modified Time       |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB  | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-19 22:17:39 -08:00
Reid
4e76d312fa
fix: modify the model id title for model list (#1095)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Re-check and based on the doc, the download model id, actually is model
descriptor(also without `meta-llama/`).


https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id  Llama-Guard-3-1B:int4 --hf-token xxx  # model descriptor
Fetching 8 files:   0%|                                                                                                                   | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]

$ llama download --source huggingface --model-id  Llama-Guard-3-1B-INT4 --hf-token xxxx  # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---


$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found

$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):

$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
                      [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:26:41 -08:00
Ashwin Bharambe
314ee09ae3
chore: move all Llama Stack types from llama-models to llama-stack (#1098)
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.

This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279

## Test Plan

Ensure all `llama` CLI `model` sub-commands work:

```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```

Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```

Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs

Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.

```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
2025-02-14 09:10:59 -08:00
Reid
47fccf0d03
style: update model id in model list title (#1072)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Since the subcommands used `MODEL_ID`, it would be better to use it in
`model list` and make it easy to find it.
```
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID <<

$ llama model describe --help
usage: llama model describe [-h] -m MODEL_ID  <<

$ llama download --help
--model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models


before:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor                        | Hugging Face Repo                                   | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+

after:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor                        | Model ID                                            | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
| Llama3.1-8B                             | meta-llama/Llama-3.1-8B                             | 128K           |
+-----------------------------------------+-----------------------------------------------------+----------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-13 08:33:11 -08:00
Ashwin Bharambe
cc5029a716 Add special case for prompt guard 2024-10-02 08:43:12 -07:00
Mark Sze
3c99f08267
minor typo and HuggingFace -> Hugging Face (#113) 2024-09-26 09:48:23 -07:00
Ashwin Bharambe
9487ad8294
API Updates (#73)
* API Keys passed from Client instead of distro configuration

* delete distribution registry

* Rename the "package" word away

* Introduce a "Router" layer for providers

Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:

- The inference API should be a routing layer over inference providers,
  routed using the "model" key
- The memory banks API is another instance where various memory bank
  types will be provided by independent providers (e.g., a vector store
  is served by Chroma while a keyvalue memory can be served by Redis or
  PGVector)

This commit introduces a generalized routing layer for this purpose.

* update `apis_to_serve`

* llama_toolchain -> llama_stack

* Codemod from llama_toolchain -> llama_stack

- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- from llama_stack.apis.<api> import foo should work now
- update imports to do llama_stack.apis.<api>
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent

* Moved some stuff out of common/; re-generated OpenAPI spec

* llama-toolchain -> llama-stack (hyphens)

* add control plane API

* add redis adapter + sqlite provider

* move core -> distribution

* Some more toolchain -> stack changes

* small naming shenanigans

* Removing custom tool and agent utilities and moving them client side

* Move control plane to distribution server for now

* Remove control plane from API list

* no codeshield dependency randomly plzzzzz

* Add "fire" as a dependency

* add back event loggers

* stack configure fixes

* use brave instead of bing in the example client

* add init file so it gets packaged

* add init files so it gets packaged

* Update MANIFEST

* bug fix

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Renamed from llama_toolchain/cli/model/list.py (Browse further)