llama (client-side) CLI Reference
The llama-stack-client CLI allows you to query information about the distribution.
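If you do not already have the CLI, it ships with the llama-stack-client Python package and can typically be installed with pip:
$ pip install llama-stack-client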
Basic Commands
llama-stack-client
$ llama-stack-client -h
usage: llama-stack-client [-h] {models,memory_banks,shields} ...
Welcome to the LlamaStackClient CLI
options:
-h, --help show this help message and exit
subcommands:
{models,memory_banks,shields}
llama-stack-client configure
$ llama-stack-client configure
> Enter the host name of the Llama Stack distribution server: localhost
> Enter the port number of the Llama Stack distribution server: 5000
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:5000
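Depending on the client version, the endpoint can usually also be supplied directly instead of interactively (the --endpoint flag here is an assumption; check llama-stack-client configure -h for the exact options):
$ llama-stack-client configure --endpoint http://localhost:5000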
llama-stack-client providers list
$ llama-stack-client providers list
+-----------+----------------+-----------------+
| API | Provider ID | Provider Type |
+===========+================+=================+
| scoring | meta0 | meta-reference |
+-----------+----------------+-----------------+
| datasetio | meta0 | meta-reference |
+-----------+----------------+-----------------+
| inference | tgi0 | remote::tgi |
+-----------+----------------+-----------------+
| memory | meta-reference | meta-reference |
+-----------+----------------+-----------------+
| agents | meta-reference | meta-reference |
+-----------+----------------+-----------------+
| telemetry | meta-reference | meta-reference |
+-----------+----------------+-----------------+
| safety | meta-reference | meta-reference |
+-----------+----------------+-----------------+
Model Management
llama-stack-client models list
$ llama-stack-client models list
+----------------------+----------------------+---------------+----------------------------------------------------------+
| identifier | llama_model | provider_id | metadata |
+======================+======================+===============+==========================================================+
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | tgi0 | {'huggingface_repo': 'meta-llama/Llama-3.1-8B-Instruct'} |
+----------------------+----------------------+---------------+----------------------------------------------------------+
llama-stack-client models get
$ llama-stack-client models get Llama3.1-8B-Instruct
+----------------------+----------------------+----------------------------------------------------------+---------------+
| identifier | llama_model | metadata | provider_id |
+======================+======================+==========================================================+===============+
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | {'huggingface_repo': 'meta-llama/Llama-3.1-8B-Instruct'} | tgi0 |
+----------------------+----------------------+----------------------------------------------------------+---------------+
$ llama-stack-client models get Random-Model
Model Random-Model is not found at distribution endpoint host:port. Please ensure endpoint is serving specified model.
llama-stack-client models register
$ llama-stack-client models register <model_id> [--provider-id <provider_id>] [--provider-model-id <provider_model_id>] [--metadata <metadata>]
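For example, registering a model served by the tgi0 provider might look like the following (the model ID and metadata values are illustrative):
$ llama-stack-client models register Llama3.1-70B-Instruct --provider-id tgi0 --metadata '{"huggingface_repo": "meta-llama/Llama-3.1-70B-Instruct"}'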
llama-stack-client models update
$ llama-stack-client models update <model_id> [--provider-id <provider_id>] [--provider-model-id <provider_model_id>] [--metadata <metadata>]
llama-stack-client models delete
$ llama-stack-client models delete <model_id>
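As a sketch, updating and then removing the model registered above might look like this (values illustrative):
$ llama-stack-client models update Llama3.1-70B-Instruct --metadata '{"huggingface_repo": "meta-llama/Llama-3.1-70B-Instruct"}'
$ llama-stack-client models delete Llama3.1-70B-Instruct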
Memory Bank Management
llama-stack-client memory_banks list
$ llama-stack-client memory_banks list
+--------------+----------------+--------+-------------------+------------------------+--------------------------+
| identifier | provider_id | type | embedding_model | chunk_size_in_tokens | overlap_size_in_tokens |
+==============+================+========+===================+========================+==========================+
| test_bank | meta-reference | vector | all-MiniLM-L6-v2 | 512 | 64 |
+--------------+----------------+--------+-------------------+------------------------+--------------------------+
llama-stack-client memory_banks register
$ llama-stack-client memory_banks register <memory-bank-id> --type <type> [--provider-id <provider-id>] [--provider-memory-bank-id <provider-memory-bank-id>] [--chunk-size <chunk-size>] [--embedding-model <embedding-model>] [--overlap-size <overlap-size>]
Options:
--type: Required. Type of memory bank. Choices: "vector", "keyvalue", "keyword", "graph"
--provider-id: Optional. Provider ID for the memory bank
--provider-memory-bank-id: Optional. Provider's memory bank ID
--chunk-size: Optional. Chunk size in tokens (for vector type). Default: 512
--embedding-model: Optional. Embedding model (for vector type). Default: "all-MiniLM-L6-v2"
--overlap-size: Optional. Overlap size in tokens (for vector type). Default: 64
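For instance, registering a vector memory bank with the defaults spelled out explicitly might look like this (the bank ID is illustrative):
$ llama-stack-client memory_banks register my_docs_bank --type vector --provider-id meta-reference --embedding-model all-MiniLM-L6-v2 --chunk-size 512 --overlap-size 64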
llama-stack-client memory_banks unregister
$ llama-stack-client memory_banks unregister <memory-bank-id>
Shield Management
llama-stack-client shields list
$ llama-stack-client shields list
+--------------+----------+----------------+-------------+
| identifier | params | provider_id | type |
+==============+==========+================+=============+
| llama_guard | {} | meta-reference | llama_guard |
+--------------+----------+----------------+-------------+
llama-stack-client shields register
$ llama-stack-client shields register --shield-id <shield-id> [--provider-id <provider-id>] [--provider-shield-id <provider-shield-id>] [--params <params>]
Options:
--shield-id: Required. ID of the shield
--provider-id: Optional. Provider ID for the shield
--provider-shield-id: Optional. Provider's shield ID
--params: Optional. JSON configuration parameters for the shield
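For example, registering the llama_guard shield shown in the listing above might look like this (the parameters are illustrative):
$ llama-stack-client shields register --shield-id llama_guard --provider-id meta-reference --params '{}'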
Eval Task Management
llama-stack-client eval_tasks list
$ llama-stack-client eval_tasks list
llama-stack-client eval_tasks register
$ llama-stack-client eval_tasks register --eval-task-id <eval-task-id> --dataset-id <dataset-id> --scoring-functions <function1> [<function2> ...] [--provider-id <provider-id>] [--provider-eval-task-id <provider-eval-task-id>] [--metadata <metadata>]
Options:
--eval-task-id: Required. ID of the eval task
--dataset-id: Required. ID of the dataset to evaluate
--scoring-functions: Required. One or more scoring functions to use for evaluation
--provider-id: Optional. Provider ID for the eval task
--provider-eval-task-id: Optional. Provider's eval task ID
--metadata: Optional. Metadata for the eval task in JSON format
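A sketch of registering an eval task (the task ID, dataset ID, and scoring function name are illustrative):
$ llama-stack-client eval_tasks register --eval-task-id my-mmlu-task --dataset-id mmlu --scoring-functions basic::regex_parser_multiple_choice_answer --metadata '{"description": "MMLU-style benchmark"}'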
Eval execution
llama-stack-client eval run-benchmark
$ llama-stack-client eval run-benchmark <eval-task-id1> [<eval-task-id2> ...] --eval-task-config <config-file> --output-dir <output-dir> [--num-examples <num>] [--visualize]
Options:
--eval-task-config: Required. Path to the eval task config file in JSON format
--output-dir: Required. Path to the directory where evaluation results will be saved
--num-examples: Optional. Number of examples to evaluate (useful for debugging)
--visualize: Optional flag. If set, visualizes evaluation results after completion
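Putting it together, a benchmark run over ten examples with visualization enabled might look like this (the task ID and paths are illustrative):
$ llama-stack-client eval run-benchmark my-mmlu-task --eval-task-config ./eval_task_config.json --output-dir ./eval_results --num-examples 10 --visualize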
Example eval_task_config.json:
{
"type": "benchmark",
"eval_candidate": {
"type": "model",
"model": "Llama3.1-405B-Instruct",
"sampling_params": {
"strategy": {
"type": "greedy"
},
"max_tokens": 0,
"repetition_penalty": 1.0
}
}
}
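The strategy field is a tagged union, so a non-greedy sampling behavior is expressed by swapping in a different strategy object rather than adding loose parameters. A sketch of a top-p candidate follows the greedy example above (the exact parameter set for each strategy may vary by version):
{
  "type": "benchmark",
  "eval_candidate": {
    "type": "model",
    "model": "Llama3.1-405B-Instruct",
    "sampling_params": {
      "strategy": {
        "type": "top_p",
        "temperature": 0.7,
        "top_p": 0.9
      },
      "max_tokens": 0,
      "repetition_penalty": 1.0
    }
  }
}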
llama-stack-client eval run-scoring
$ llama-stack-client eval run-scoring <eval-task-id> --eval-task-config <config-file> --output-dir <output-dir> [--num-examples <num>] [--visualize]
Options:
--eval-task-config: Required. Path to the eval task config file in JSON format
--output-dir: Required. Path to the directory where scoring results will be saved
--num-examples: Optional. Number of examples to evaluate (useful for debugging)
--visualize: Optional flag. If set, visualizes scoring results after completion
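Analogously, a scoring run might look like the following (the task ID and paths are illustrative):
$ llama-stack-client eval run-scoring my-scoring-task --eval-task-config ./eval_task_config.json --output-dir ./scoring_results --num-examples 5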