# What does this PR do?

The purpose of this PR is to replace Llama Stack's default embedding model with nomic-embed-text-v1.5. These are the key reasons why the Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:

1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes many datasets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications.
2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models, but also that there are many, many models of relatively modest size with much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way.

More discussion can be found [here](https://github.com/llamastack/llama-stack/issues/2418).

Closes #2418

## Test Plan

1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI workflow

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
llama (client-side) CLI Reference
The llama-stack-client
CLI allows you to query information about the distribution.
Basic Commands
llama-stack-client
llama-stack-client
Usage: llama-stack-client [OPTIONS] COMMAND [ARGS]...
Welcome to the llama-stack-client CLI - a command-line interface for
interacting with Llama Stack
Options:
--version Show the version and exit.
--endpoint TEXT Llama Stack distribution endpoint
--api-key TEXT Llama Stack distribution API key
--config TEXT Path to config file
--help Show this message and exit.
Commands:
configure Configure Llama Stack Client CLI.
datasets Manage datasets.
eval Run evaluation tasks.
eval_tasks Manage evaluation tasks.
inference Inference (chat).
inspect Inspect server configuration.
models Manage GenAI models.
post_training Post-training.
providers Manage API providers.
scoring_functions Manage scoring functions.
shields Manage safety shield services.
toolgroups Manage available tool groups.
vector_dbs Manage vector databases.
llama-stack-client configure
Configure Llama Stack Client CLI.
llama-stack-client configure
> Enter the host name of the Llama Stack distribution server: localhost
> Enter the port number of the Llama Stack distribution server: 8321
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
Optional arguments:
--endpoint
: Llama Stack distribution endpoint
--api-key
: Llama Stack distribution API key
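As a non-interactive alternative to the prompts above, the endpoint (and, if needed, an API key) can be passed directly; the endpoint below matches the example above, while the API key value is a placeholder.
llama-stack-client configure --endpoint http://localhost:8321 --api-key my-api-key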
llama-stack-client inspect version
Inspect server configuration.
llama-stack-client inspect version
VersionInfo(version='0.2.14')
llama-stack-client providers list
Show available providers on distribution endpoint
llama-stack-client providers list
+-----------+----------------+-----------------+
| API | Provider ID | Provider Type |
+===========+================+=================+
| scoring | meta0 | meta-reference |
+-----------+----------------+-----------------+
| datasetio | meta0 | meta-reference |
+-----------+----------------+-----------------+
| inference | tgi0 | remote::tgi |
+-----------+----------------+-----------------+
| memory | meta-reference | meta-reference |
+-----------+----------------+-----------------+
| agents | meta-reference | meta-reference |
+-----------+----------------+-----------------+
| telemetry | meta-reference | meta-reference |
+-----------+----------------+-----------------+
| safety | meta-reference | meta-reference |
+-----------+----------------+-----------------+
llama-stack-client providers inspect
Show specific provider configuration on distribution endpoint
llama-stack-client providers inspect <provider_id>
Inference
Inference (chat).
llama-stack-client inference chat-completion
Show available inference chat completion endpoints on distribution endpoint
llama-stack-client inference chat-completion --message <message> [--stream] [--session] [--model-id]
OpenAIChatCompletion(
id='chatcmpl-aacd11f3-8899-4ec5-ac5b-e655132f6891',
choices=[
OpenAIChatCompletionChoice(
finish_reason='stop',
index=0,
message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
role='assistant',
content='The captain of the whaleship Pequod in Nathaniel Hawthorne\'s novel "Moby-Dick" is Captain
Ahab. He\'s a vengeful and obsessive old sailor who\'s determined to hunt down and kill the white sperm whale
Moby-Dick, whom he\'s lost his leg to in a previous encounter.',
name=None,
tool_calls=None,
refusal=None,
annotations=None,
audio=None,
function_call=None
),
logprobs=None
)
],
created=1752578797,
model='llama3.2:3b-instruct-fp16',
object='chat.completion',
service_tier=None,
system_fingerprint='fp_ollama',
usage={
'completion_tokens': 67,
'prompt_tokens': 33,
'total_tokens': 100,
'completion_tokens_details': None,
'prompt_tokens_details': None
}
)
Required arguments:
Note: At least one of these parameters is required for chat completion
--message
: Message
--session
: Start a Chat Session
Optional arguments:
--stream
: Stream
--model-id
: Model ID
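For illustration, an invocation that could produce output like the example above; the message text is arbitrary, and the model ID is assumed to be one of the identifiers returned by llama-stack-client models list.
llama-stack-client inference chat-completion --message "who is the captain of the Pequod?" --model-id llama3.2:3b-instruct-fp16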
Model Management
Manage GenAI models.
llama-stack-client models list
Show available llama models at distribution endpoint
llama-stack-client models list
Available Models
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ model_type ┃ identifier ┃ provider_resource_id ┃ metadata ┃ provider_id ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ llm │ meta-llama/Llama-3.2-3B-Instruct │ llama3.2:3b-instruct-fp16 │ │ ollama │
└──────────────┴──────────────────────────────────────┴──────────────────────────────┴───────────┴─────────────┘
Total models: 1
llama-stack-client models get
Show details of a specific model at the distribution endpoint
llama-stack-client models get Llama3.1-8B-Instruct
+----------------------+----------------------+----------------------------------------------------------+---------------+
| identifier | llama_model | metadata | provider_id |
+======================+======================+==========================================================+===============+
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | {'huggingface_repo': 'meta-llama/Llama-3.1-8B-Instruct'} | tgi0 |
+----------------------+----------------------+----------------------------------------------------------+---------------+
llama-stack-client models get Random-Model
Model RandomModel is not found at distribution endpoint host:port. Please ensure endpoint is serving specified model.
llama-stack-client models register
Register a new model at distribution endpoint
llama-stack-client models register <model_id> [--provider-id <provider_id>] [--provider-model-id <provider_model_id>] [--metadata <metadata>] [--model-type <model_type>]
Required arguments:
MODEL_ID
: Model ID
--provider-id
: Provider ID for the model
Optional arguments:
--provider-model-id
: Provider's model ID
--metadata
: JSON metadata for the model
--model-type
: Model type: llm, embedding
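A hypothetical registration of an embedding model served through an existing provider might look like this; the model ID, provider model ID, and metadata values are placeholders for illustration, not defaults.
llama-stack-client models register nomic-embed-text-v1.5 --provider-id ollama --provider-model-id nomic-embed-text:v1.5 --model-type embedding --metadata '{"embedding_dimension": 768}'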
llama-stack-client models unregister
Unregister a model from distribution endpoint
llama-stack-client models unregister <model_id>
Vector DB Management
Manage vector databases.
llama-stack-client vector_dbs list
Show available vector dbs on distribution endpoint
llama-stack-client vector_dbs list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ provider_resource_id ┃ vector_db_type ┃ params ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ my_demo_vector_db │ faiss │ my_demo_vector_db │ │ embedding_dimension: 768 │
│ │ │ │ │ embedding_model: nomic-embed-text-v1.5 │
│ │ │ │ │ type: vector_db │
│ │ │ │ │ │
└──────────────────────────┴─────────────┴──────────────────────────┴────────────────┴───────────────────────────────────┘
llama-stack-client vector_dbs register
Create a new vector db
llama-stack-client vector_dbs register <vector-db-id> [--provider-id <provider-id>] [--provider-vector-db-id <provider-vector-db-id>] [--embedding-model <embedding-model>] [--embedding-dimension <embedding-dimension>]
Required arguments:
VECTOR_DB_ID
: Vector DB ID
Optional arguments:
--provider-id
: Provider ID for the vector db
--provider-vector-db-id
: Provider's vector db ID
--embedding-model
: Embedding model to use. Default: nomic-embed-text-v1.5
--embedding-dimension
: Dimension of embeddings. Default: 768
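For example, the my_demo_vector_db entry shown in the listing above could be created like this (the provider ID faiss is taken from that listing, and the embedding settings simply spell out the defaults):
llama-stack-client vector_dbs register my_demo_vector_db --provider-id faiss --embedding-model nomic-embed-text-v1.5 --embedding-dimension 768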
llama-stack-client vector_dbs unregister
Delete a vector db
llama-stack-client vector_dbs unregister <vector-db-id>
Required arguments:
VECTOR_DB_ID
: Vector DB ID
Shield Management
Manage safety shield services.
llama-stack-client shields list
Show available safety shields on distribution endpoint
llama-stack-client shields list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier ┃ provider_alias ┃ params ┃ provider_id ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ollama │ ollama/llama-guard3:1b │ │ llama-guard │
└──────────────────────────────────┴───────────────────────────────────────────────────────────────────────┴───────────────────────┴────────────────────────────────────┘
llama-stack-client shields register
Register a new safety shield
llama-stack-client shields register --shield-id <shield-id> [--provider-id <provider-id>] [--provider-shield-id <provider-shield-id>] [--params <params>]
Required arguments:
--shield-id
: ID of the shield
Optional arguments:
--provider-id
: Provider ID for the shield
--provider-shield-id
: Provider's shield ID
--params
: JSON configuration parameters for the shield
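As a sketch, the shield shown in the listing above could be registered with a command like this; the IDs are taken from that listing and assume an Ollama-served Llama Guard model.
llama-stack-client shields register --shield-id ollama --provider-id llama-guard --provider-shield-id ollama/llama-guard3:1b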
Eval execution
Run evaluation tasks.
llama-stack-client eval run-benchmark
Run an evaluation benchmark task
llama-stack-client eval run-benchmark <eval-task-id1> [<eval-task-id2> ...] --eval-task-config <config-file> --output-dir <output-dir> --model-id <model-id> [--num-examples <num>] [--visualize] [--repeat-penalty <repeat-penalty>] [--top-p <top-p>] [--max-tokens <max-tokens>] [--temperature <temperature>]
Required arguments:
--eval-task-config
: Path to the eval task config file in JSON format
--output-dir
: Path to the directory where evaluation results will be saved
--model-id
: model id to run the benchmark eval on
Optional arguments:
--num-examples
: Number of examples to evaluate (useful for debugging)
--visualize
: If set, visualizes evaluation results after completion
--repeat-penalty
: repeat-penalty in the sampling params to run generation
--top-p
: top-p in the sampling params to run generation
--max-tokens
: max-tokens in the sampling params to run generation
--temperature
: temperature in the sampling params to run generation
Example benchmark_config.json:
{
    "type": "benchmark",
    "eval_candidate": {
        "type": "model",
        "model": "Llama3.1-405B-Instruct",
        "sampling_params": {
            "strategy": "greedy"
        }
    }
}
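Assuming the config above is saved as benchmark_config.json and a benchmark task named meta-reference-mmlu is registered on the server (a hypothetical ID used only for illustration), a run could look like:
llama-stack-client eval run-benchmark meta-reference-mmlu --eval-task-config ./benchmark_config.json --output-dir ./eval_results --model-id Llama3.1-405B-Instruct --num-examples 10 --visualize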
llama-stack-client eval run-scoring
Run scoring from application datasets
llama-stack-client eval run-scoring <eval-task-id> --output-dir <output-dir> [--num-examples <num>] [--visualize]
Required arguments:
--output-dir
: Path to the directory where scoring results will be saved
Optional arguments:
--num-examples
: Number of examples to evaluate (useful for debugging)
--visualize
: If set, visualizes scoring results after completion
--scoring-params-config
: Path to the scoring params config file in JSON format
--dataset-id
: Pre-registered dataset_id to score (from llama-stack-client datasets list)
--dataset-path
: Path to the dataset file to score
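A minimal sketch, assuming an eval task my-app-scoring and a dataset my-dataset already registered via llama-stack-client datasets register (both IDs are placeholders):
llama-stack-client eval run-scoring my-app-scoring --output-dir ./scoring_results --dataset-id my-dataset --num-examples 5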
Eval Tasks
Manage evaluation tasks.
llama-stack-client eval_tasks list
Show available eval tasks on distribution endpoint
llama-stack-client eval_tasks list
llama-stack-client eval_tasks register
Register a new eval task
llama-stack-client eval_tasks register --eval-task-id <eval-task-id> --dataset-id <dataset-id> --scoring-functions <scoring-functions> [--provider-id <provider-id>] [--provider-eval-task-id <provider-eval-task-id>] [--metadata <metadata>]
Required arguments:
--eval-task-id
: ID of the eval task
--dataset-id
: ID of the dataset to evaluate
--scoring-functions
: Scoring functions to use for evaluation
Optional arguments:
--provider-id
: Provider ID for the eval task
--provider-eval-task-id
: Provider's eval task ID
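For illustration, registering an eval task against a hypothetical dataset, using the basic::equality scoring function that appears in the scoring functions listing further below:
llama-stack-client eval_tasks register --eval-task-id my-eval-task --dataset-id my-dataset --scoring-functions basic::equality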
Tool Group Management
Manage available tool groups.
llama-stack-client toolgroups list
Show available llama toolgroups at distribution endpoint
llama-stack-client toolgroups list
+---------------------------+------------------+------+---------------+
| identifier | provider_id | args | mcp_endpoint |
+===========================+==================+======+===============+
| builtin::rag | rag-runtime | None | None |
+---------------------------+------------------+------+---------------+
| builtin::websearch | tavily-search | None | None |
+---------------------------+------------------+------+---------------+
llama-stack-client toolgroups get
Get available llama toolgroups by id
llama-stack-client toolgroups get <toolgroup_id>
Shows detailed information about a specific toolgroup. If the toolgroup is not found, displays an error message.
Required arguments:
TOOLGROUP_ID
: ID of the tool group
llama-stack-client toolgroups register
Register a new toolgroup at distribution endpoint
llama-stack-client toolgroups register <toolgroup_id> [--provider-id <provider-id>] [--provider-toolgroup-id <provider-toolgroup-id>] [--mcp-config <mcp-config>] [--args <args>]
Required arguments:
TOOLGROUP_ID
: ID of the tool group
Optional arguments:
--provider-id
: Provider ID for the toolgroup
--provider-toolgroup-id
: Provider's toolgroup ID
--mcp-config
: JSON configuration for the MCP endpoint
--args
: JSON arguments for the toolgroup
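As a sketch of the syntax, the builtin::rag entry from the toolgroups table above could be registered against its provider like this (IDs taken from that table; --mcp-config and --args are omitted since neither applies to this toolgroup):
llama-stack-client toolgroups register builtin::rag --provider-id rag-runtime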
llama-stack-client toolgroups unregister
Unregister a toolgroup from distribution endpoint
llama-stack-client toolgroups unregister <toolgroup_id>
Required arguments:
TOOLGROUP_ID
: ID of the tool group
Datasets Management
Manage datasets.
llama-stack-client datasets list
Show available datasets on distribution endpoint
llama-stack-client datasets list
llama-stack-client datasets register
Register a new dataset at distribution endpoint
llama-stack-client datasets register --dataset_id <dataset_id> --purpose <purpose> [--url <url>] [--dataset-path <dataset-path>] [--dataset-id <dataset-id>] [--metadata <metadata>]
Required arguments:
--dataset_id
: Id of the dataset
--purpose
: Purpose of the dataset
Optional arguments:
--metadata
: Metadata of the dataset
--url
: URL of the dataset
--dataset-path
: Local file path to the dataset. If specified, upload dataset via URL
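A minimal sketch, assuming a JSONL file hosted at a reachable URL; the dataset ID, URL, and purpose value are all placeholders (check which purpose values your distribution accepts):
llama-stack-client datasets register --dataset_id my-eval-data --purpose eval/messages-answer --url https://example.com/my-eval-data.jsonl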
llama-stack-client datasets unregister
Remove a dataset
llama-stack-client datasets unregister <dataset-id>
Required arguments:
DATASET_ID
: Id of the dataset
Scoring Functions Management
Manage scoring functions.
llama-stack-client scoring_functions list
Show available scoring functions on distribution endpoint
llama-stack-client scoring_functions list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ description ┃ type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ basic::docvqa │ basic │ DocVQA Visual Question & Answer scoring function │ scoring_function │
│ basic::equality │ basic │ Returns 1.0 if the input is equal to the target, 0.0 │ scoring_function │
│ │ │ otherwise. │ │
└────────────────────────────────────────────┴──────────────┴───────────────────────────────────────────────────────────────┴──────────────────┘
llama-stack-client scoring_functions register
Register a new scoring function
llama-stack-client scoring_functions register --scoring-fn-id <scoring-fn-id> --description <description> --return-type <return-type> [--provider-id <provider-id>] [--provider-scoring-fn-id <provider-scoring-fn-id>] [--params <params>]
Required arguments:
--scoring-fn-id
: Id of the scoring function
--description
: Description of the scoring function
--return-type
: Return type of the scoring function
Optional arguments:
--provider-id
: Provider ID for the scoring function
--provider-scoring-fn-id
: Provider's scoring function ID
--params
: Parameters for the scoring function in JSON format
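A hypothetical registration might look like the following; the function ID and description are placeholders, and the return type is assumed to be one your distribution accepts (e.g. string):
llama-stack-client scoring_functions register --scoring-fn-id my-custom-scorer --description "Checks that the answer mentions the expected keyword" --return-type string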
Post Training Management
Post-training.
llama-stack-client post_training list
Show the list of available post training jobs
llama-stack-client post_training list
["job-1", "job-2", "job-3"]
llama-stack-client post_training artifacts
Get the training artifacts of a specific post training job
llama-stack-client post_training artifacts --job-uuid <job-uuid>
JobArtifactsResponse(checkpoints=[], job_uuid='job-1')
Required arguments:
--job-uuid
: Job UUID
llama-stack-client post_training supervised_fine_tune
Kick off a supervised fine tune job
llama-stack-client post_training supervised_fine_tune --job-uuid <job-uuid> --model <model> --algorithm-config <algorithm-config> --training-config <training-config> [--checkpoint-dir <checkpoint-dir>]
Required arguments:
--job-uuid
: Job UUID
--model
: Model ID
--algorithm-config
: Algorithm Config
--training-config
: Training Config
Optional arguments:
--checkpoint-dir
: Checkpoint Config
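Since --algorithm-config and --training-config take JSON, one way to pass them is to keep the JSON in files and inline the contents from the shell; the file names, job UUID, and model below are placeholders, and the JSON schemas themselves are distribution-specific:
llama-stack-client post_training supervised_fine_tune --job-uuid job-1234 --model meta-llama/Llama-3.2-3B-Instruct --algorithm-config "$(cat algorithm_config.json)" --training-config "$(cat training_config.json)"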
llama-stack-client post_training status
Show the status of a specific post training job
llama-stack-client post_training status --job-uuid <job-uuid>
JobStatusResponse(
checkpoints=[],
job_uuid='job-1',
status='completed',
completed_at="",
resources_allocated="",
scheduled_at="",
started_at=""
)
Required arguments:
--job-uuid
: Job UUID
llama-stack-client post_training cancel
Cancel the training job
llama-stack-client post_training cancel --job-uuid <job-uuid>
# This functionality is not yet implemented for llama-stack-client
╭────────────────────────────────────────────────────────────╮
│ Failed to post_training cancel_training_job │
│ │
│ Error Type: InternalServerError │
│ Details: Error code: 501 - {'detail': 'Not implemented: '} │
╰────────────────────────────────────────────────────────────╯
Required arguments:
--job-uuid
: Job UUID