# llama (client-side) CLI Reference

The `llama-stack-client` CLI allows you to query information about the distribution.

## Basic Commands

### `llama-stack-client`

```bash
llama-stack-client
Usage: llama-stack-client [OPTIONS] COMMAND [ARGS]...

  Welcome to the llama-stack-client CLI - a command-line interface for
  interacting with Llama Stack

Options:
  --version        Show the version and exit.
  --endpoint TEXT  Llama Stack distribution endpoint
  --api-key TEXT   Llama Stack distribution API key
  --config TEXT    Path to config file
  --help           Show this message and exit.

Commands:
  configure          Configure Llama Stack Client CLI.
  datasets           Manage datasets.
  eval               Run evaluation tasks.
  eval_tasks         Manage evaluation tasks.
  inference          Inference (chat).
  inspect            Inspect server configuration.
  models             Manage GenAI models.
  post_training      Post-training.
  providers          Manage API providers.
  scoring_functions  Manage scoring functions.
  shields            Manage safety shield services.
  toolgroups         Manage available tool groups.
  vector_dbs         Manage vector databases.
```

### `llama-stack-client configure`

Configure Llama Stack Client CLI.

```bash
llama-stack-client configure
> Enter the host name of the Llama Stack distribution server: localhost
> Enter the port number of the Llama Stack distribution server: 8321
Done! You can now use the Llama Stack Client CLI with endpoint http://localhost:8321
```

Optional arguments:

- `--endpoint`: Llama Stack distribution endpoint
- `--api-key`: Llama Stack distribution API key
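
Alternatively, you can skip the interactive prompts by passing the endpoint (and, if your distribution requires one, an API key) directly; the values below are placeholders:

```bash
llama-stack-client configure --endpoint http://localhost:8321 --api-key my-api-key
```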

### `llama-stack-client inspect version`

Inspect server configuration.

```bash
llama-stack-client inspect version
VersionInfo(version='0.2.14')
```

### `llama-stack-client providers list`

Show available providers on distribution endpoint

```bash
llama-stack-client providers list
+-----------+----------------+-----------------+
| API       | Provider ID    | Provider Type   |
+===========+================+=================+
| scoring   | meta0          | meta-reference  |
+-----------+----------------+-----------------+
| datasetio | meta0          | meta-reference  |
+-----------+----------------+-----------------+
| inference | tgi0           | remote::tgi     |
+-----------+----------------+-----------------+
| memory    | meta-reference | meta-reference  |
+-----------+----------------+-----------------+
| agents    | meta-reference | meta-reference  |
+-----------+----------------+-----------------+
| telemetry | meta-reference | meta-reference  |
+-----------+----------------+-----------------+
| safety    | meta-reference | meta-reference  |
+-----------+----------------+-----------------+
```

### `llama-stack-client providers inspect`

Show specific provider configuration on distribution endpoint

```bash
llama-stack-client providers inspect <provider_id>
```
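
For example, to inspect the `tgi0` inference provider from the listing above:

```bash
llama-stack-client providers inspect tgi0
```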

## Inference

Inference (chat).

### `llama-stack-client inference chat-completion`

Run a chat completion against the distribution endpoint

```bash
llama-stack-client inference chat-completion --message <message> [--stream] [--session] [--model-id <model-id>]
OpenAIChatCompletion(
    id='chatcmpl-aacd11f3-8899-4ec5-ac5b-e655132f6891',
    choices=[
        OpenAIChatCompletionChoice(
            finish_reason='stop',
            index=0,
            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
                role='assistant',
                content='The captain of the whaleship Pequod in Nathaniel Hawthorne\'s novel "Moby-Dick" is Captain
Ahab. He\'s a vengeful and obsessive old sailor who\'s determined to hunt down and kill the white sperm whale
Moby-Dick, whom he\'s lost his leg to in a previous encounter.',
                name=None,
                tool_calls=None,
                refusal=None,
                annotations=None,
                audio=None,
                function_call=None
            ),
            logprobs=None
        )
    ],
    created=1752578797,
    model='llama3.2:3b-instruct-fp16',
    object='chat.completion',
    service_tier=None,
    system_fingerprint='fp_ollama',
    usage={
        'completion_tokens': 67,
        'prompt_tokens': 33,
        'total_tokens': 100,
        'completion_tokens_details': None,
        'prompt_tokens_details': None
    }
)
```

Required arguments (at least one of the following is required for chat completion):

- `--message`: Message to send for a one-shot chat completion
- `--session`: Start an interactive chat session

Optional arguments:

- `--stream`: Stream the response as it is generated
- `--model-id`: Model ID to use for the completion
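
For example, a one-shot completion against the Ollama-served model shown in the output above:

```bash
llama-stack-client inference chat-completion \
  --message "Who is the captain of the Pequod?" \
  --model-id llama3.2:3b-instruct-fp16
```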

## Model Management

Manage GenAI models.

### `llama-stack-client models list`

Show available llama models at distribution endpoint

```bash
llama-stack-client models list
Available Models

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ model_type   ┃ identifier                           ┃ provider_resource_id         ┃ metadata  ┃ provider_id ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ llm          │ meta-llama/Llama-3.2-3B-Instruct     │ llama3.2:3b-instruct-fp16    │           │ ollama      │
└──────────────┴──────────────────────────────────────┴──────────────────────────────┴───────────┴─────────────┘

Total models: 1
```

### `llama-stack-client models get`

Show details of a specific model at the distribution endpoint

```bash
llama-stack-client models get Llama3.1-8B-Instruct
+----------------------+----------------------+----------------------------------------------------------+---------------+
| identifier           | llama_model          | metadata                                                 | provider_id   |
+======================+======================+==========================================================+===============+
| Llama3.1-8B-Instruct | Llama3.1-8B-Instruct | {'huggingface_repo': 'meta-llama/Llama-3.1-8B-Instruct'} | tgi0          |
+----------------------+----------------------+----------------------------------------------------------+---------------+
```

```bash
llama-stack-client models get Random-Model

Model Random-Model is not found at distribution endpoint host:port. Please ensure endpoint is serving specified model.
```

### `llama-stack-client models register`

Register a new model at distribution endpoint

```bash
llama-stack-client models register <model_id> [--provider-id <provider_id>] [--provider-model-id <provider_model_id>] [--metadata <metadata>] [--model-type <model_type>]
```

Required arguments:

- `MODEL_ID`: Model ID
- `--provider-id`: Provider ID for the model

Optional arguments:

- `--provider-model-id`: Provider's model ID
- `--metadata`: JSON metadata for the model
- `--model-type`: Model type: `llm`, `embedding`
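
For example, to register the Ollama-served model from the `models list` output above (the IDs are illustrative and must match what your provider actually serves):

```bash
llama-stack-client models register meta-llama/Llama-3.2-3B-Instruct \
  --provider-id ollama \
  --provider-model-id llama3.2:3b-instruct-fp16 \
  --model-type llm
```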

### `llama-stack-client models unregister`

Unregister a model from distribution endpoint

```bash
llama-stack-client models unregister <model_id>
```

## Vector DB Management

Manage vector databases.

### `llama-stack-client vector_dbs list`

Show available vector dbs on distribution endpoint

```bash
llama-stack-client vector_dbs list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier               ┃ provider_id ┃ provider_resource_id     ┃ vector_db_type ┃ params                            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ my_demo_vector_db        │ faiss       │ my_demo_vector_db        │                │ embedding_dimension: 384          │
│                          │             │                          │                │ embedding_model: all-MiniLM-L6-v2 │
│                          │             │                          │                │ type: vector_db                   │
│                          │             │                          │                │                                   │
└──────────────────────────┴─────────────┴──────────────────────────┴────────────────┴───────────────────────────────────┘
```

### `llama-stack-client vector_dbs register`

Create a new vector db

```bash
llama-stack-client vector_dbs register <vector-db-id> [--provider-id <provider-id>] [--provider-vector-db-id <provider-vector-db-id>] [--embedding-model <embedding-model>] [--embedding-dimension <embedding-dimension>]
```

Required arguments:

- `VECTOR_DB_ID`: Vector DB ID

Optional arguments:

- `--provider-id`: Provider ID for the vector db
- `--provider-vector-db-id`: Provider's vector db ID
- `--embedding-model`: Embedding model to use. Default: `all-MiniLM-L6-v2`
- `--embedding-dimension`: Dimension of embeddings. Default: 384
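
For example, to create the `my_demo_vector_db` shown in the listing above, using the faiss provider and the default embedding settings:

```bash
llama-stack-client vector_dbs register my_demo_vector_db \
  --provider-id faiss \
  --embedding-model all-MiniLM-L6-v2 \
  --embedding-dimension 384
```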

### `llama-stack-client vector_dbs unregister`

Delete a vector db

```bash
llama-stack-client vector_dbs unregister <vector-db-id>
```

Required arguments:

- `VECTOR_DB_ID`: Vector DB ID

## Shield Management

Manage safety shield services.

### `llama-stack-client shields list`

Show available safety shields on distribution endpoint

```bash
llama-stack-client shields list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ identifier                       ┃ provider_alias                                                        ┃ params                ┃ provider_id                        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ollama                           │ ollama/llama-guard3:1b                                                │                       │ llama-guard                        │
└──────────────────────────────────┴───────────────────────────────────────────────────────────────────────┴───────────────────────┴────────────────────────────────────┘
```

### `llama-stack-client shields register`

Register a new safety shield

```bash
llama-stack-client shields register --shield-id <shield-id> [--provider-id <provider-id>] [--provider-shield-id <provider-shield-id>] [--params <params>]
```

Required arguments:

- `--shield-id`: ID of the shield

Optional arguments:

- `--provider-id`: Provider ID for the shield
- `--provider-shield-id`: Provider's shield ID
- `--params`: JSON configuration parameters for the shield
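
For example, to register the Llama Guard shield from the listing above (the IDs mirror that output; adjust them to your distribution):

```bash
llama-stack-client shields register \
  --shield-id ollama \
  --provider-id llama-guard \
  --provider-shield-id ollama/llama-guard3:1b
```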

## Eval execution

Run evaluation tasks.

### `llama-stack-client eval run-benchmark`

Run an evaluation benchmark task

```bash
llama-stack-client eval run-benchmark <eval-task-id1> [<eval-task-id2> ...] --eval-task-config <config-file> --output-dir <output-dir> --model-id <model-id> [--num-examples <num>] [--visualize] [--repeat-penalty <repeat-penalty>] [--top-p <top-p>] [--max-tokens <max-tokens>] [--temperature <temperature>]
```

Required arguments:

- `--eval-task-config`: Path to the eval task config file in JSON format
- `--output-dir`: Path to the directory where evaluation results will be saved
- `--model-id`: Model ID to run the benchmark eval on

Optional arguments:

- `--num-examples`: Number of examples to evaluate (useful for debugging)
- `--visualize`: If set, visualizes evaluation results after completion
- `--repeat-penalty`: Repeat penalty in the sampling params used for generation
- `--top-p`: Top-p in the sampling params used for generation
- `--max-tokens`: Max tokens in the sampling params used for generation
- `--temperature`: Temperature in the sampling params used for generation

Example benchmark_config.json:

```json
{
    "type": "benchmark",
    "eval_candidate": {
        "type": "model",
        "model": "Llama3.1-405B-Instruct",
        "sampling_params": {
            "strategy": "greedy"
        }
    }
}
```
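
A run could then look like the following (the benchmark ID, paths, and model are placeholders):

```bash
llama-stack-client eval run-benchmark my-benchmark-task \
  --eval-task-config ./benchmark_config.json \
  --output-dir ./eval_results \
  --model-id Llama3.1-405B-Instruct \
  --num-examples 10
```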

### `llama-stack-client eval run-scoring`

Run scoring from application datasets

```bash
llama-stack-client eval run-scoring <eval-task-id> --output-dir <output-dir> [--num-examples <num>] [--visualize]
```

Required arguments:

- `--output-dir`: Path to the directory where scoring results will be saved

Optional arguments:

- `--num-examples`: Number of examples to evaluate (useful for debugging)
- `--visualize`: If set, visualizes scoring results after completion
- `--scoring-params-config`: Path to the scoring params config file in JSON format
- `--dataset-id`: Pre-registered dataset_id to score (from `llama-stack-client datasets list`)
- `--dataset-path`: Path to the dataset file to score
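
For example, scoring a local dataset file (the task ID and file path are placeholders):

```bash
llama-stack-client eval run-scoring my-scoring-task \
  --output-dir ./scoring_results \
  --dataset-path ./my_dataset.csv \
  --num-examples 10
```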

## Eval Tasks

Manage evaluation tasks.

### `llama-stack-client eval_tasks list`

Show available eval tasks on distribution endpoint

```bash
llama-stack-client eval_tasks list
```

### `llama-stack-client eval_tasks register`

Register a new eval task

```bash
llama-stack-client eval_tasks register --eval-task-id <eval-task-id> --dataset-id <dataset-id> --scoring-functions <scoring-functions> [--provider-id <provider-id>] [--provider-eval-task-id <provider-eval-task-id>] [--metadata <metadata>]
```

Required arguments:

- `--eval-task-id`: ID of the eval task
- `--dataset-id`: ID of the dataset to evaluate
- `--scoring-functions`: Scoring functions to use for evaluation

Optional arguments:

- `--provider-id`: Provider ID for the eval task
- `--provider-eval-task-id`: Provider's eval task ID
- `--metadata`: Metadata for the eval task in JSON format
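
For example (the task and dataset IDs are placeholders; `basic::equality` is one of the scoring functions shown under `scoring_functions list` below):

```bash
llama-stack-client eval_tasks register \
  --eval-task-id my-eval-task \
  --dataset-id my-dataset \
  --scoring-functions basic::equality
```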

## Tool Group Management

Manage available tool groups.

### `llama-stack-client toolgroups list`

Show available llama toolgroups at distribution endpoint

```bash
llama-stack-client toolgroups list
+---------------------------+------------------+------+---------------+
| identifier                | provider_id      | args | mcp_endpoint  |
+===========================+==================+======+===============+
| builtin::rag              | rag-runtime      | None | None          |
+---------------------------+------------------+------+---------------+
| builtin::websearch        | tavily-search    | None | None          |
+---------------------------+------------------+------+---------------+
```

### `llama-stack-client toolgroups get`

Get available llama toolgroups by id

```bash
llama-stack-client toolgroups get <toolgroup_id>
```

Shows detailed information about a specific toolgroup. If the toolgroup is not found, displays an error message.

Required arguments:

- `TOOLGROUP_ID`: ID of the tool group

### `llama-stack-client toolgroups register`

Register a new toolgroup at distribution endpoint

```bash
llama-stack-client toolgroups register <toolgroup_id> [--provider-id <provider-id>] [--provider-toolgroup-id <provider-toolgroup-id>] [--mcp-config <mcp-config>] [--args <args>]
```

Required arguments:

- `TOOLGROUP_ID`: ID of the tool group

Optional arguments:

- `--provider-id`: Provider ID for the toolgroup
- `--provider-toolgroup-id`: Provider's toolgroup ID
- `--mcp-config`: JSON configuration for the MCP endpoint
- `--args`: JSON arguments for the toolgroup
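
For example, registering a hypothetical MCP-backed toolgroup (the ID, endpoint URI, and JSON shape below are illustrative; consult your distribution for the exact `--mcp-config` format):

```bash
llama-stack-client toolgroups register mcp::my-tools \
  --mcp-config '{"uri": "http://localhost:8000/sse"}'
```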

### `llama-stack-client toolgroups unregister`

Unregister a toolgroup from distribution endpoint

```bash
llama-stack-client toolgroups unregister <toolgroup_id>
```

Required arguments:

- `TOOLGROUP_ID`: ID of the tool group

## Datasets Management

Manage datasets.

### `llama-stack-client datasets list`

Show available datasets on distribution endpoint

```bash
llama-stack-client datasets list
```

### `llama-stack-client datasets register`

Register a new dataset at the distribution endpoint

```bash
llama-stack-client datasets register --dataset-id <dataset-id> --purpose <purpose> [--url <url>] [--dataset-path <dataset-path>] [--metadata <metadata>]
```

Required arguments:

- `--dataset-id`: ID of the dataset
- `--purpose`: Purpose of the dataset

Optional arguments:

- `--metadata`: Metadata of the dataset
- `--url`: URL of the dataset
- `--dataset-path`: Local file path to the dataset. If specified, the dataset is uploaded from the local file instead of being fetched from the URL
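
For example, registering a dataset from a URL (the ID, purpose value, and URL are placeholders; supported purposes depend on your distribution):

```bash
llama-stack-client datasets register \
  --dataset-id my-dataset \
  --purpose eval/messages-answer \
  --url https://example.com/my_dataset.jsonl
```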

### `llama-stack-client datasets unregister`

Remove a dataset

```bash
llama-stack-client datasets unregister <dataset-id>
```

Required arguments:

- `DATASET_ID`: ID of the dataset

## Scoring Functions Management

Manage scoring functions.

### `llama-stack-client scoring_functions list`

Show available scoring functions on distribution endpoint

```bash
llama-stack-client scoring_functions list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┓
┃ identifier                                 ┃ provider_id  ┃ description                                                   ┃ type             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━┩
│ basic::bfcl                                │ basic        │ BFCL complex scoring                                          │ scoring_function │
│ basic::docvqa                              │ basic        │ DocVQA Visual Question & Answer scoring function              │ scoring_function │
│ basic::equality                            │ basic        │ Returns 1.0 if the input is equal to the target, 0.0          │ scoring_function │
│                                            │              │ otherwise.                                                    │                  │
└────────────────────────────────────────────┴──────────────┴───────────────────────────────────────────────────────────────┴──────────────────┘
```

### `llama-stack-client scoring_functions register`

Register a new scoring function

```bash
llama-stack-client scoring_functions register --scoring-fn-id <scoring-fn-id> --description <description> --return-type <return-type> [--provider-id <provider-id>] [--provider-scoring-fn-id <provider-scoring-fn-id>] [--params <params>]
```

Required arguments:

- `--scoring-fn-id`: ID of the scoring function
- `--description`: Description of the scoring function
- `--return-type`: Return type of the scoring function

Optional arguments:

- `--provider-id`: Provider ID for the scoring function
- `--provider-scoring-fn-id`: Provider's scoring function ID
- `--params`: Parameters for the scoring function in JSON format
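
For example (the ID and description are placeholders, and the return-type JSON is an assumed shape; check your distribution for the accepted format):

```bash
llama-stack-client scoring_functions register \
  --scoring-fn-id my-scoring-fn \
  --description "Scores answers against my custom rubric" \
  --return-type '{"type": "string"}'
```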

## Post Training Management

Post-training.

### `llama-stack-client post_training list`

Show the list of available post training jobs

```bash
llama-stack-client post_training list
["job-1", "job-2", "job-3"]
```

### `llama-stack-client post_training artifacts`

Get the training artifacts of a specific post training job

```bash
llama-stack-client post_training artifacts --job-uuid <job-uuid>
JobArtifactsResponse(checkpoints=[], job_uuid='job-1')
```

Required arguments:

- `--job-uuid`: Job UUID

### `llama-stack-client post_training supervised_fine_tune`

Kick off a supervised fine-tune job

```bash
llama-stack-client post_training supervised_fine_tune --job-uuid <job-uuid> --model <model> --algorithm-config <algorithm-config> --training-config <training-config> [--checkpoint-dir <checkpoint-dir>]
```

Required arguments:

- `--job-uuid`: Job UUID
- `--model`: Model ID
- `--algorithm-config`: Algorithm config
- `--training-config`: Training config

Optional arguments:

- `--checkpoint-dir`: Checkpoint directory
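
A sketch of an invocation (the job UUID, model, and both JSON bodies are illustrative; the accepted config fields depend on the post-training provider):

```bash
llama-stack-client post_training supervised_fine_tune \
  --job-uuid job-1234 \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --algorithm-config '{"type": "LoRA", "rank": 8, "alpha": 16}' \
  --training-config '{"n_epochs": 1, "batch_size": 4}'
```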

### `llama-stack-client post_training status`

Show the status of a specific post training job

```bash
llama-stack-client post_training status --job-uuid <job-uuid>
JobStatusResponse(
    checkpoints=[],
    job_uuid='job-1',
    status='completed',
    completed_at="",
    resources_allocated="",
    scheduled_at="",
    started_at=""
)
```

Required arguments:

- `--job-uuid`: Job UUID

### `llama-stack-client post_training cancel`

Cancel the training job

```bash
llama-stack-client post_training cancel --job-uuid <job-uuid>
# This functionality is not yet implemented for llama-stack-client
╭────────────────────────────────────────────────────────────╮
│ Failed to post_training cancel_training_job                │
│                                                            │
│ Error Type: InternalServerError                            │
│ Details: Error code: 501 - {'detail': 'Not implemented: '} │
╰────────────────────────────────────────────────────────────╯
```

Required arguments:

- `--job-uuid`: Job UUID