diff --git a/MANIFEST.in b/MANIFEST.in index e7c63fffd..52ab42950 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,5 +1,4 @@ include requirements.txt include llama_stack/distribution/*.sh include llama_stack/cli/scripts/*.sh -include llama_stack/distribution/example_configs/conda/*.yaml -include llama_stack/distribution/example_configs/docker/*.yaml +include llama_stack/distribution/templates/*.yaml diff --git a/docs/getting_started.md b/docs/getting_started.md index 0ab3461dd..997ee0a22 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -28,6 +28,7 @@ Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/ ``` **`llama stack configure`** +- Run `llama stack configure ` with the name you have previously defined in `build` step. ``` llama stack configure my-local-llama-stack @@ -61,6 +62,7 @@ You can now run `llama stack run my-local-llama-stack --port PORT` or `llama sta ``` **`llama stack run`** +- Run `llama stack run ` with the name you have previously defined. ``` llama stack run my-local-llama-stack @@ -110,74 +112,94 @@ In the following steps, imagine we'll be working with a `Meta-Llama3.1-8B-Instru - `providers`: specifies the underlying implementation for serving each API endpoint - `image_type`: `conda` | `docker` to specify whether to build the distribution in the form of Docker image or Conda environment. -#### Build a local distribution with conda -The following command and specifications allows you to get started with building. -``` -llama stack build -``` -- You will be required to pass in a file path to the build.config file (e.g. `./llama_stack/distribution/example_configs/conda/local-conda-example-build.yaml`). We provide some example build config files for configuring different types of distributions in the `./llama_stack/distribution/example_configs/` folder. 
-The file will be of the contents -``` -$ cat ./llama_stack/distribution/example_configs/conda/local-conda-example-build.yaml +At the end of the build command, we will generate `-build.yaml` file storing the build configurations. -name: 8b-instruct +After this step is complete, a file named `-build.yaml` will be generated and saved at the output file path specified at the end of the command. + +#### Building from scratch +- For a new user, we could start off with running `llama stack build`, which will allow you to interactively enter a wizard where you will be prompted to enter build configurations. +``` +llama stack build +``` + +Running the command above will allow you to fill in the configuration to build your Llama Stack distribution; you will see the following outputs. + +``` +> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-llama-stack +> Enter the image type you want your distribution to be built with (docker or conda): conda + + Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs. +> Enter the API provider for the inference API: (default=meta-reference): meta-reference +> Enter the API provider for the safety API: (default=meta-reference): meta-reference +> Enter the API provider for the agents API: (default=meta-reference): meta-reference +> Enter the API provider for the memory API: (default=meta-reference): meta-reference +> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference + + > (Optional) Enter a short description for your Llama Stack distribution: + +Build spec configuration saved at ~/.conda/envs/llamastack-my-local-llama-stack/my-local-llama-stack-build.yaml +``` + +#### Building from templates +- To build from alternative API providers, we provide distribution templates for users to get started building a distribution backed by different providers. 
+ +The following command will allow you to see the available templates and their corresponding providers. +``` +llama stack build --list-templates +``` + +![alt text](resources/list-templates.png) + +You may then pick a template to build your distribution with providers fitted to your liking. + +``` +llama stack build --template local-tgi --name my-tgi-stack +``` + +``` +$ llama stack build --template local-tgi --name my-tgi-stack +... +... +Build spec configuration saved at /home/xiyan/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml +You may now run `llama stack configure my-tgi-stack` or `llama stack configure /home/xiyan/.conda/envs/llamastack-my-tgi-stack/my-tgi-stack-build.yaml` +``` + +#### Building from config file +- In addition to templates, you may customize the build to your liking by editing a config file and building from it with the following command. + +- The config file will have contents like the ones in `llama_stack/distribution/templates/`. + +``` +$ cat llama_stack/distribution/templates/local-ollama-build.yaml + +name: local-ollama distribution_spec: - distribution_type: local - description: Use code from `llama_stack` itself to serve all llama stack APIs - docker_image: null + description: Like local, but use ollama for running LLM inference providers: - inference: meta-reference - memory: meta-reference-faiss + inference: remote::ollama + memory: meta-reference safety: meta-reference - agentic_system: meta-reference - telemetry: console + agents: meta-reference + telemetry: meta-reference image_type: conda ``` -You may run the `llama stack build` command to generate your distribution with `--name` to override the name for your distribution. ``` -$ llama stack build ~/.llama/distributions/conda/8b-instruct-build.yaml --name 8b-instruct -... -... 
-Build spec configuration saved at ~/.llama/distributions/conda/8b-instruct-build.yaml +llama stack build --config llama_stack/distribution/templates/local-ollama-build.yaml ``` -After this step is complete, a file named `8b-instruct-build.yaml` will be generated and saved at `~/.llama/distributions/conda/8b-instruct-build.yaml`. - - -#### How to build distribution with different API providers using configs -To specify a different API provider, we can change the `distribution_spec` in our `-build.yaml` config. For example, the following build spec allows you to build a distribution using TGI as the inference API provider. - -``` -$ cat ./llama_stack/distribution/example_configs/conda/local-tgi-conda-example-build.yaml - -name: local-tgi-conda-example -distribution_spec: - description: Use TGI (local or with Hugging Face Inference Endpoints for running LLM inference. When using HF Inference Endpoints, you must provide the name of the endpoint). - docker_image: null - providers: - inference: remote::tgi - memory: meta-reference-faiss - safety: meta-reference - agentic_system: meta-reference - telemetry: console -image_type: conda -``` - -The following command allows you to build a distribution with TGI as the inference API provider, with the name `tgi`. -``` -llama stack build ./llama_stack/distribution/example_configs/conda/local-tgi-conda-example-build.yaml --name tgi -``` - -We provide some example build configs to help you get started with building with different API providers. - #### How to build distribution with Docker image -To build a docker image, simply change the `image_type` to `docker` in our `-build.yaml` file, and run `llama stack build -build.yaml`. + +To build a docker image, you may start off from a template and use the `--image-type docker` flag to specify `docker` as the build image type. 
``` -$ cat ./llama_stack/distribution/example_configs/docker/local-docker-example-build.yaml +llama stack build --template local --image-type docker --name docker-0 +``` +Alternatively, you may use a config file and set `image_type` to `docker` in our `-build.yaml` file, and run `llama stack build -build.yaml`. The `-build.yaml` will be of contents like: + +``` name: local-docker-example distribution_spec: description: Use code from `llama_stack` itself to serve all llama stack APIs @@ -191,9 +213,9 @@ distribution_spec: image_type: docker ``` -The following command allows you to build a Docker image with the name `docker-local` +The following command allows you to build a Docker image with the name `` ``` -llama stack build ./llama_stack/distribution/example_configs/docker/local-docker-example-build.yaml --name docker-local +llama stack build --config -build.yaml Dockerfile created successfully in /tmp/tmp.I0ifS2c46A/DockerfileFROM python:3.10-slim WORKDIR /app @@ -203,10 +225,11 @@ You can run it with: podman run -p 8000:8000 llamastack-docker-local Build spec configuration saved at /home/xiyan/.llama/distributions/docker/docker-local-build.yaml ``` + ## Step 2. Configure After our distribution is built (either in form of docker or conda environment), we will run the following command to ``` -llama stack configure [ | ] +llama stack configure [ | | ] ``` - For `conda` environments: would be the generated build spec saved from Step 1. - For `docker` images downloaded from Dockerhub, you could also use as the argument. @@ -298,7 +321,7 @@ INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit) ``` > [!NOTE] -> Configuration is in `~/.llama/builds/local/conda/8b-instruct.yaml`. Feel free to increase `max_seq_len`. +> Configuration is in `~/.llama/builds/local/conda/8b-instruct-run.yaml`. Feel free to increase `max_seq_len`. > [!IMPORTANT] > The "local" distribution inference server currently only supports CUDA. It will not work on Apple Silicon machines. 
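Reviewer note on the docs change above: the build spec now uses `agents` instead of `agentic_system`, and `image_type` is documented as `conda` or `docker`. A minimal sketch of how a reader might sanity-check a hand-edited build config against the fields shown in these docs follows. The helper `check_build_spec` and the constants are hypothetical and are not part of `llama_stack`; the spec is written as a plain dict so no YAML parser is assumed.

```python
# Hypothetical helper, NOT part of llama_stack: it only checks the fields
# that the documentation in this diff shows in a `-build.yaml` file.
EXPECTED_APIS = {"inference", "memory", "safety", "agents", "telemetry"}
ALLOWED_IMAGE_TYPES = {"conda", "docker"}


def check_build_spec(spec):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not spec.get("name"):
        problems.append("missing name")
    if spec.get("image_type") not in ALLOWED_IMAGE_TYPES:
        problems.append("image_type must be conda or docker")
    providers = spec.get("distribution_spec", {}).get("providers", {})
    missing = sorted(EXPECTED_APIS - set(providers))
    if missing:
        problems.append("missing providers: " + ", ".join(missing))
    return problems


# The local-ollama template from the diff, transcribed as a dict.
local_ollama = {
    "name": "local-ollama",
    "distribution_spec": {
        "description": "Like local, but use ollama for running LLM inference",
        "providers": {
            "inference": "remote::ollama",
            "memory": "meta-reference",
            "safety": "meta-reference",
            "agents": "meta-reference",
            "telemetry": "meta-reference",
        },
    },
    "image_type": "conda",
}

# A spec that still uses the pre-change key name `agentic_system`.
old_style = {
    "name": "8b-instruct",
    "distribution_spec": {
        "providers": {
            "inference": "meta-reference",
            "memory": "meta-reference-faiss",
            "safety": "meta-reference",
            "agentic_system": "meta-reference",
            "telemetry": "console",
        },
    },
    "image_type": "conda",
}

print(check_build_spec(local_ollama))
print(check_build_spec(old_style))
```

The second call flags the old-style config because it lacks an `agents` provider, which is the kind of breakage a user migrating from the removed `example_configs` files could hit.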
diff --git a/docs/llama-stack-spec.html b/docs/llama-stack-spec.html new file mode 100644 index 000000000..a7ab57343 --- /dev/null +++ b/docs/llama-stack-spec.html @@ -0,0 +1,5850 @@ + OpenAPI specification
+ + + diff --git a/docs/llama-stack-spec.yaml b/docs/llama-stack-spec.yaml new file mode 100644 index 000000000..33d7d9a3a --- /dev/null +++ b/docs/llama-stack-spec.yaml @@ -0,0 +1,3695 @@ +components: + responses: {} + schemas: + AgentConfig: + additionalProperties: false + properties: + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + instructions: + type: string + model: + type: string + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + sampling_params: + $ref: '#/components/schemas/SamplingParams' + tool_choice: + $ref: '#/components/schemas/ToolChoice' + tool_prompt_format: + $ref: '#/components/schemas/ToolPromptFormat' + tools: + items: + oneOf: + - $ref: '#/components/schemas/SearchToolDefinition' + - $ref: '#/components/schemas/WolframAlphaToolDefinition' + - $ref: '#/components/schemas/PhotogenToolDefinition' + - $ref: '#/components/schemas/CodeInterpreterToolDefinition' + - $ref: '#/components/schemas/FunctionCallToolDefinition' + - additionalProperties: false + properties: + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + max_chunks: + type: integer + max_tokens_in_context: + type: integer + memory_bank_configs: + items: + oneOf: + - additionalProperties: false + properties: + bank_id: + type: string + type: + const: vector + type: string + required: + - bank_id + - type + type: object + - additionalProperties: false + properties: + bank_id: + type: string + keys: + items: + type: string + type: array + type: + const: keyvalue + type: string + required: + - bank_id + - type + - keys + type: object + - additionalProperties: false + properties: + bank_id: + type: string + type: + const: keyword + type: string + required: + - bank_id + - type + type: object + - additionalProperties: false + properties: + bank_id: + type: string + entities: + items: + type: string + type: array + type: + const: graph + type: string + required: + - bank_id + 
- type + - entities + type: object + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + query_generator_config: + oneOf: + - additionalProperties: false + properties: + sep: + type: string + type: + const: default + type: string + required: + - type + - sep + type: object + - additionalProperties: false + properties: + model: + type: string + template: + type: string + type: + const: llm + type: string + required: + - type + - model + - template + type: object + - additionalProperties: false + properties: + type: + const: custom + type: string + required: + - type + type: object + type: + const: memory + type: string + required: + - type + - memory_bank_configs + - query_generator_config + - max_tokens_in_context + - max_chunks + type: object + type: array + required: + - model + - instructions + type: object + AgentCreateResponse: + additionalProperties: false + properties: + agent_id: + type: string + required: + - agent_id + type: object + AgentSessionCreateResponse: + additionalProperties: false + properties: + session_id: + type: string + required: + - session_id + type: object + AgentStepResponse: + additionalProperties: false + properties: + step: + oneOf: + - $ref: '#/components/schemas/InferenceStep' + - $ref: '#/components/schemas/ToolExecutionStep' + - $ref: '#/components/schemas/ShieldCallStep' + - $ref: '#/components/schemas/MemoryRetrievalStep' + required: + - step + type: object + AgentTurnResponseEvent: + additionalProperties: false + properties: + payload: + oneOf: + - $ref: '#/components/schemas/AgentTurnResponseStepStartPayload' + - $ref: '#/components/schemas/AgentTurnResponseStepProgressPayload' + - $ref: '#/components/schemas/AgentTurnResponseStepCompletePayload' + - $ref: '#/components/schemas/AgentTurnResponseTurnStartPayload' + - $ref: '#/components/schemas/AgentTurnResponseTurnCompletePayload' + required: + - payload + title: Streamed agent execution response. 
+ type: object + AgentTurnResponseStepCompletePayload: + additionalProperties: false + properties: + event_type: + const: step_complete + type: string + step_details: + oneOf: + - $ref: '#/components/schemas/InferenceStep' + - $ref: '#/components/schemas/ToolExecutionStep' + - $ref: '#/components/schemas/ShieldCallStep' + - $ref: '#/components/schemas/MemoryRetrievalStep' + step_type: + enum: + - inference + - tool_execution + - shield_call + - memory_retrieval + type: string + required: + - event_type + - step_type + - step_details + type: object + AgentTurnResponseStepProgressPayload: + additionalProperties: false + properties: + event_type: + const: step_progress + type: string + model_response_text_delta: + type: string + step_id: + type: string + step_type: + enum: + - inference + - tool_execution + - shield_call + - memory_retrieval + type: string + tool_call_delta: + $ref: '#/components/schemas/ToolCallDelta' + tool_response_text_delta: + type: string + required: + - event_type + - step_type + - step_id + type: object + AgentTurnResponseStepStartPayload: + additionalProperties: false + properties: + event_type: + const: step_start + type: string + metadata: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + step_id: + type: string + step_type: + enum: + - inference + - tool_execution + - shield_call + - memory_retrieval + type: string + required: + - event_type + - step_type + - step_id + type: object + AgentTurnResponseStreamChunk: + additionalProperties: false + properties: + event: + $ref: '#/components/schemas/AgentTurnResponseEvent' + required: + - event + type: object + AgentTurnResponseTurnCompletePayload: + additionalProperties: false + properties: + event_type: + const: turn_complete + type: string + turn: + $ref: '#/components/schemas/Turn' + required: + - event_type + - turn + type: object + AgentTurnResponseTurnStartPayload: + 
additionalProperties: false + properties: + event_type: + const: turn_start + type: string + turn_id: + type: string + required: + - event_type + - turn_id + type: object + Attachment: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + - $ref: '#/components/schemas/URL' + mime_type: + type: string + required: + - content + - mime_type + type: object + BatchChatCompletionRequest: + additionalProperties: false + properties: + logprobs: + additionalProperties: false + properties: + top_k: + type: integer + type: object + messages_batch: + items: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + type: array + type: array + model: + type: string + sampling_params: + $ref: '#/components/schemas/SamplingParams' + tool_choice: + $ref: '#/components/schemas/ToolChoice' + tool_prompt_format: + $ref: '#/components/schemas/ToolPromptFormat' + tools: + items: + $ref: '#/components/schemas/ToolDefinition' + type: array + required: + - model + - messages_batch + type: object + BatchChatCompletionResponse: + additionalProperties: false + properties: + completion_message_batch: + items: + $ref: '#/components/schemas/CompletionMessage' + type: array + required: + - completion_message_batch + type: object + BatchCompletionRequest: + additionalProperties: false + properties: + content_batch: + items: + oneOf: + - type: string + - items: + type: string + type: array + type: array + logprobs: + additionalProperties: false + properties: + top_k: + type: integer + type: object + model: + type: string + sampling_params: + $ref: '#/components/schemas/SamplingParams' + required: + - model + - content_batch + type: object + BatchCompletionResponse: + additionalProperties: false + properties: + completion_message_batch: + items: + $ref: 
'#/components/schemas/CompletionMessage' + type: array + required: + - completion_message_batch + type: object + BuiltinShield: + enum: + - llama_guard + - code_scanner_guard + - third_party_shield + - injection_shield + - jailbreak_shield + type: string + BuiltinTool: + enum: + - brave_search + - wolfram_alpha + - photogen + - code_interpreter + type: string + CancelEvaluationJobRequest: + additionalProperties: false + properties: + job_uuid: + type: string + required: + - job_uuid + type: object + CancelTrainingJobRequest: + additionalProperties: false + properties: + job_uuid: + type: string + required: + - job_uuid + type: object + ChatCompletionRequest: + additionalProperties: false + properties: + logprobs: + additionalProperties: false + properties: + top_k: + type: integer + type: object + messages: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + type: array + model: + type: string + sampling_params: + $ref: '#/components/schemas/SamplingParams' + stream: + type: boolean + tool_choice: + $ref: '#/components/schemas/ToolChoice' + tool_prompt_format: + $ref: '#/components/schemas/ToolPromptFormat' + tools: + items: + $ref: '#/components/schemas/ToolDefinition' + type: array + required: + - model + - messages + type: object + ChatCompletionResponse: + additionalProperties: false + properties: + completion_message: + $ref: '#/components/schemas/CompletionMessage' + logprobs: + items: + $ref: '#/components/schemas/TokenLogProbs' + type: array + required: + - completion_message + title: Chat completion response. 
+ type: object + ChatCompletionResponseEvent: + additionalProperties: false + properties: + delta: + oneOf: + - type: string + - $ref: '#/components/schemas/ToolCallDelta' + event_type: + $ref: '#/components/schemas/ChatCompletionResponseEventType' + logprobs: + items: + $ref: '#/components/schemas/TokenLogProbs' + type: array + stop_reason: + $ref: '#/components/schemas/StopReason' + required: + - event_type + - delta + title: Chat completion response event. + type: object + ChatCompletionResponseEventType: + enum: + - start + - complete + - progress + type: string + ChatCompletionResponseStreamChunk: + additionalProperties: false + properties: + event: + $ref: '#/components/schemas/ChatCompletionResponseEvent' + required: + - event + title: SSE-stream of these events. + type: object + Checkpoint: + description: Checkpoint created during training runs + CodeInterpreterToolDefinition: + additionalProperties: false + properties: + enable_inline_code_execution: + type: boolean + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: code_interpreter + type: string + required: + - type + - enable_inline_code_execution + type: object + CompletionMessage: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + role: + const: assistant + type: string + stop_reason: + $ref: '#/components/schemas/StopReason' + tool_calls: + items: + $ref: '#/components/schemas/ToolCall' + type: array + required: + - role + - content + - stop_reason + - tool_calls + type: object + CompletionRequest: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + logprobs: + additionalProperties: false + properties: + top_k: + type: integer + type: object + 
model: + type: string + sampling_params: + $ref: '#/components/schemas/SamplingParams' + stream: + type: boolean + required: + - model + - content + type: object + CompletionResponse: + additionalProperties: false + properties: + completion_message: + $ref: '#/components/schemas/CompletionMessage' + logprobs: + items: + $ref: '#/components/schemas/TokenLogProbs' + type: array + required: + - completion_message + title: Completion response. + type: object + CompletionResponseStreamChunk: + additionalProperties: false + properties: + delta: + type: string + logprobs: + items: + $ref: '#/components/schemas/TokenLogProbs' + type: array + stop_reason: + $ref: '#/components/schemas/StopReason' + required: + - delta + title: streamed completion response. + type: object + CreateAgentRequest: + additionalProperties: false + properties: + agent_config: + $ref: '#/components/schemas/AgentConfig' + required: + - agent_config + type: object + CreateAgentSessionRequest: + additionalProperties: false + properties: + agent_id: + type: string + session_name: + type: string + required: + - agent_id + - session_name + type: object + CreateAgentTurnRequest: + additionalProperties: false + properties: + agent_id: + type: string + attachments: + items: + $ref: '#/components/schemas/Attachment' + type: array + messages: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + type: array + session_id: + type: string + stream: + type: boolean + required: + - agent_id + - session_id + - messages + type: object + CreateDatasetRequest: + additionalProperties: false + properties: + dataset: + $ref: '#/components/schemas/TrainEvalDataset' + uuid: + type: string + required: + - uuid + - dataset + type: object + CreateMemoryBankRequest: + additionalProperties: false + properties: + config: + oneOf: + - additionalProperties: false + properties: + chunk_size_in_tokens: + type: integer + embedding_model: + type: string + 
overlap_size_in_tokens: + type: integer + type: + const: vector + type: string + required: + - type + - embedding_model + - chunk_size_in_tokens + type: object + - additionalProperties: false + properties: + type: + const: keyvalue + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: keyword + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: graph + type: string + required: + - type + type: object + name: + type: string + url: + $ref: '#/components/schemas/URL' + required: + - name + - config + type: object + DPOAlignmentConfig: + additionalProperties: false + properties: + epsilon: + type: number + gamma: + type: number + reward_clip: + type: number + reward_scale: + type: number + required: + - reward_scale + - reward_clip + - epsilon + - gamma + type: object + DeleteAgentsRequest: + additionalProperties: false + properties: + agent_id: + type: string + required: + - agent_id + type: object + DeleteAgentsSessionRequest: + additionalProperties: false + properties: + agent_id: + type: string + session_id: + type: string + required: + - agent_id + - session_id + type: object + DeleteDatasetRequest: + additionalProperties: false + properties: + dataset_uuid: + type: string + required: + - dataset_uuid + type: object + DeleteDocumentsRequest: + additionalProperties: false + properties: + bank_id: + type: string + document_ids: + items: + type: string + type: array + required: + - bank_id + - document_ids + type: object + DialogGenerations: + additionalProperties: false + properties: + dialog: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + type: array + sampled_generations: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' 
+ - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + type: array + required: + - dialog + - sampled_generations + type: object + DoraFinetuningConfig: + additionalProperties: false + properties: + alpha: + type: integer + apply_lora_to_mlp: + type: boolean + apply_lora_to_output: + type: boolean + lora_attn_modules: + items: + type: string + type: array + rank: + type: integer + required: + - lora_attn_modules + - apply_lora_to_mlp + - apply_lora_to_output + - rank + - alpha + type: object + DropMemoryBankRequest: + additionalProperties: false + properties: + bank_id: + type: string + required: + - bank_id + type: object + EmbeddingsRequest: + additionalProperties: false + properties: + contents: + items: + oneOf: + - type: string + - items: + type: string + type: array + type: array + model: + type: string + required: + - model + - contents + type: object + EmbeddingsResponse: + additionalProperties: false + properties: + embeddings: + items: + items: + type: number + type: array + type: array + required: + - embeddings + type: object + EvaluateQuestionAnsweringRequest: + additionalProperties: false + properties: + metrics: + items: + enum: + - em + - f1 + type: string + type: array + required: + - metrics + type: object + EvaluateSummarizationRequest: + additionalProperties: false + properties: + metrics: + items: + enum: + - rouge + - bleu + type: string + type: array + required: + - metrics + type: object + EvaluateTextGenerationRequest: + additionalProperties: false + properties: + metrics: + items: + enum: + - perplexity + - rouge + - bleu + type: string + type: array + required: + - metrics + type: object + EvaluationJob: + additionalProperties: false + properties: + job_uuid: + type: string + required: + - job_uuid + type: object + EvaluationJobArtifactsResponse: + additionalProperties: false + properties: + job_uuid: + type: string + required: + - job_uuid + title: Artifacts of a evaluation job. 
+ type: object + EvaluationJobLogStream: + additionalProperties: false + properties: + job_uuid: + type: string + required: + - job_uuid + type: object + EvaluationJobStatusResponse: + additionalProperties: false + properties: + job_uuid: + type: string + required: + - job_uuid + type: object + FinetuningAlgorithm: + enum: + - full + - lora + - qlora + - dora + type: string + FunctionCallToolDefinition: + additionalProperties: false + properties: + description: + type: string + function_name: + type: string + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + parameters: + additionalProperties: + $ref: '#/components/schemas/ToolParamDefinition' + type: object + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: function_call + type: string + required: + - type + - function_name + - description + - parameters + type: object + GetAgentsSessionRequest: + additionalProperties: false + properties: + turn_ids: + items: + type: string + type: array + type: object + GetDocumentsRequest: + additionalProperties: false + properties: + document_ids: + items: + type: string + type: array + required: + - document_ids + type: object + InferenceStep: + additionalProperties: false + properties: + completed_at: + format: date-time + type: string + model_response: + $ref: '#/components/schemas/CompletionMessage' + started_at: + format: date-time + type: string + step_id: + type: string + step_type: + const: inference + type: string + turn_id: + type: string + required: + - turn_id + - step_id + - step_type + - model_response + type: object + InsertDocumentsRequest: + additionalProperties: false + properties: + bank_id: + type: string + documents: + items: + $ref: '#/components/schemas/MemoryBankDocument' + type: array + ttl_seconds: + type: integer + required: + - bank_id + - documents + type: object + LogEventRequest: + 
additionalProperties: false + properties: + event: + oneOf: + - $ref: '#/components/schemas/UnstructuredLogEvent' + - $ref: '#/components/schemas/MetricEvent' + - $ref: '#/components/schemas/StructuredLogEvent' + required: + - event + type: object + LogSeverity: + enum: + - verbose + - debug + - info + - warn + - error + - critical + type: string + LoraFinetuningConfig: + additionalProperties: false + properties: + alpha: + type: integer + apply_lora_to_mlp: + type: boolean + apply_lora_to_output: + type: boolean + lora_attn_modules: + items: + type: string + type: array + rank: + type: integer + required: + - lora_attn_modules + - apply_lora_to_mlp + - apply_lora_to_output + - rank + - alpha + type: object + MemoryBank: + additionalProperties: false + properties: + bank_id: + type: string + config: + oneOf: + - additionalProperties: false + properties: + chunk_size_in_tokens: + type: integer + embedding_model: + type: string + overlap_size_in_tokens: + type: integer + type: + const: vector + type: string + required: + - type + - embedding_model + - chunk_size_in_tokens + type: object + - additionalProperties: false + properties: + type: + const: keyvalue + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: keyword + type: string + required: + - type + type: object + - additionalProperties: false + properties: + type: + const: graph + type: string + required: + - type + type: object + name: + type: string + url: + $ref: '#/components/schemas/URL' + required: + - bank_id + - name + - config + type: object + MemoryBankDocument: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + - $ref: '#/components/schemas/URL' + document_id: + type: string + metadata: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + mime_type: + type: string + 
required: + - document_id + - content + - metadata + type: object + MemoryRetrievalStep: + additionalProperties: false + properties: + completed_at: + format: date-time + type: string + inserted_context: + oneOf: + - type: string + - items: + type: string + type: array + memory_bank_ids: + items: + type: string + type: array + started_at: + format: date-time + type: string + step_id: + type: string + step_type: + const: memory_retrieval + type: string + turn_id: + type: string + required: + - turn_id + - step_id + - step_type + - memory_bank_ids + - inserted_context + type: object + MetricEvent: + additionalProperties: false + properties: + attributes: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + metric: + type: string + span_id: + type: string + timestamp: + format: date-time + type: string + trace_id: + type: string + type: + const: metric + type: string + unit: + type: string + value: + oneOf: + - type: integer + - type: number + required: + - trace_id + - span_id + - timestamp + - type + - metric + - value + - unit + type: object + OnViolationAction: + enum: + - 0 + - 1 + - 2 + type: integer + OptimizerConfig: + additionalProperties: false + properties: + lr: + type: number + lr_min: + type: number + optimizer_type: + enum: + - adam + - adamw + - sgd + type: string + weight_decay: + type: number + required: + - optimizer_type + - lr + - lr_min + - weight_decay + type: object + PhotogenToolDefinition: + additionalProperties: false + properties: + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: photogen + type: string + required: + - type + type: object + PostTrainingJob: + additionalProperties: false + properties: + job_uuid: + type: string + 
required: + - job_uuid + type: object + PostTrainingJobArtifactsResponse: + additionalProperties: false + properties: + checkpoints: + items: + $ref: '#/components/schemas/Checkpoint' + type: array + job_uuid: + type: string + required: + - job_uuid + - checkpoints + title: Artifacts of a finetuning job. + type: object + PostTrainingJobLogStream: + additionalProperties: false + properties: + job_uuid: + type: string + log_lines: + items: + type: string + type: array + required: + - job_uuid + - log_lines + title: Stream of logs from a finetuning job. + type: object + PostTrainingJobStatus: + enum: + - running + - completed + - failed + - scheduled + type: string + PostTrainingJobStatusResponse: + additionalProperties: false + properties: + checkpoints: + items: + $ref: '#/components/schemas/Checkpoint' + type: array + completed_at: + format: date-time + type: string + job_uuid: + type: string + resources_allocated: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + scheduled_at: + format: date-time + type: string + started_at: + format: date-time + type: string + status: + $ref: '#/components/schemas/PostTrainingJobStatus' + required: + - job_uuid + - status + - checkpoints + title: Status of a finetuning job. 
+ type: object + PreferenceOptimizeRequest: + additionalProperties: false + properties: + algorithm: + $ref: '#/components/schemas/RLHFAlgorithm' + algorithm_config: + $ref: '#/components/schemas/DPOAlignmentConfig' + dataset: + $ref: '#/components/schemas/TrainEvalDataset' + finetuned_model: + $ref: '#/components/schemas/URL' + hyperparam_search_config: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + job_uuid: + type: string + logger_config: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + optimizer_config: + $ref: '#/components/schemas/OptimizerConfig' + training_config: + $ref: '#/components/schemas/TrainingConfig' + validation_dataset: + $ref: '#/components/schemas/TrainEvalDataset' + required: + - job_uuid + - finetuned_model + - dataset + - validation_dataset + - algorithm + - algorithm_config + - optimizer_config + - training_config + - hyperparam_search_config + - logger_config + type: object + QLoraFinetuningConfig: + additionalProperties: false + properties: + alpha: + type: integer + apply_lora_to_mlp: + type: boolean + apply_lora_to_output: + type: boolean + lora_attn_modules: + items: + type: string + type: array + rank: + type: integer + required: + - lora_attn_modules + - apply_lora_to_mlp + - apply_lora_to_output + - rank + - alpha + type: object + QueryDocumentsRequest: + additionalProperties: false + properties: + bank_id: + type: string + params: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + query: + oneOf: + - type: string + - items: + type: string + type: array + required: + - bank_id + - query + type: object + QueryDocumentsResponse: + additionalProperties: false + properties: + chunks: + items: + additionalProperties: false + 
properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + document_id: + type: string + token_count: + type: integer + required: + - content + - token_count + - document_id + type: object + type: array + scores: + items: + type: number + type: array + required: + - chunks + - scores + type: object + RLHFAlgorithm: + enum: + - dpo + type: string + RestAPIExecutionConfig: + additionalProperties: false + properties: + body: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + headers: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + method: + $ref: '#/components/schemas/RestAPIMethod' + params: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + url: + $ref: '#/components/schemas/URL' + required: + - url + - method + type: object + RestAPIMethod: + enum: + - GET + - POST + - PUT + - DELETE + type: string + RewardScoreRequest: + additionalProperties: false + properties: + dialog_generations: + items: + $ref: '#/components/schemas/DialogGenerations' + type: array + model: + type: string + required: + - dialog_generations + - model + type: object + RewardScoringResponse: + additionalProperties: false + properties: + scored_generations: + items: + $ref: '#/components/schemas/ScoredDialogGenerations' + type: array + required: + - scored_generations + title: Response from the reward scoring. Batch of (prompt, response, score) + tuples that pass the threshold. 
+ type: object + RunShieldResponse: + additionalProperties: false + properties: + responses: + items: + $ref: '#/components/schemas/ShieldResponse' + type: array + required: + - responses + type: object + RunShieldsRequest: + additionalProperties: false + properties: + messages: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + type: array + shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + required: + - messages + - shields + type: object + SamplingParams: + additionalProperties: false + properties: + max_tokens: + type: integer + repetition_penalty: + type: number + strategy: + $ref: '#/components/schemas/SamplingStrategy' + temperature: + type: number + top_k: + type: integer + top_p: + type: number + required: + - strategy + type: object + SamplingStrategy: + enum: + - greedy + - top_p + - top_k + type: string + ScoredDialogGenerations: + additionalProperties: false + properties: + dialog: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + type: array + scored_generations: + items: + $ref: '#/components/schemas/ScoredMessage' + type: array + required: + - dialog + - scored_generations + type: object + ScoredMessage: + additionalProperties: false + properties: + message: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + score: + type: number + required: + - message + - score + type: object + SearchToolDefinition: + additionalProperties: false + properties: + api_key: + type: string + engine: + enum: + - bing + - brave + type: string + input_shields: + 
items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: brave_search + type: string + required: + - type + - api_key + - engine + type: object + Session: + additionalProperties: false + properties: + memory_bank: + $ref: '#/components/schemas/MemoryBank' + session_id: + type: string + session_name: + type: string + started_at: + format: date-time + type: string + turns: + items: + $ref: '#/components/schemas/Turn' + type: array + required: + - session_id + - session_name + - turns + - started_at + title: A single session of an interaction with an Agentic System. + type: object + ShieldCallStep: + additionalProperties: false + properties: + completed_at: + format: date-time + type: string + response: + $ref: '#/components/schemas/ShieldResponse' + started_at: + format: date-time + type: string + step_id: + type: string + step_type: + const: shield_call + type: string + turn_id: + type: string + required: + - turn_id + - step_id + - step_type + - response + type: object + ShieldDefinition: + additionalProperties: false + properties: + description: + type: string + execution_config: + $ref: '#/components/schemas/RestAPIExecutionConfig' + on_violation_action: + $ref: '#/components/schemas/OnViolationAction' + parameters: + additionalProperties: + $ref: '#/components/schemas/ToolParamDefinition' + type: object + shield_type: + oneOf: + - $ref: '#/components/schemas/BuiltinShield' + - type: string + required: + - shield_type + - on_violation_action + type: object + ShieldResponse: + additionalProperties: false + properties: + is_violation: + type: boolean + shield_type: + oneOf: + - $ref: '#/components/schemas/BuiltinShield' + - type: string + violation_return_message: + type: string + violation_type: + type: string + required: + - shield_type + - is_violation + type: object + 
SpanEndPayload: + additionalProperties: false + properties: + status: + $ref: '#/components/schemas/SpanStatus' + type: + const: span_end + type: string + required: + - type + - status + type: object + SpanStartPayload: + additionalProperties: false + properties: + name: + type: string + parent_span_id: + type: string + type: + const: span_start + type: string + required: + - type + - name + type: object + SpanStatus: + enum: + - ok + - error + type: string + StopReason: + enum: + - end_of_turn + - end_of_message + - out_of_tokens + type: string + StructuredLogEvent: + additionalProperties: false + properties: + attributes: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + payload: + oneOf: + - $ref: '#/components/schemas/SpanStartPayload' + - $ref: '#/components/schemas/SpanEndPayload' + span_id: + type: string + timestamp: + format: date-time + type: string + trace_id: + type: string + type: + const: structured_log + type: string + required: + - trace_id + - span_id + - timestamp + - type + - payload + type: object + SupervisedFineTuneRequest: + additionalProperties: false + properties: + algorithm: + $ref: '#/components/schemas/FinetuningAlgorithm' + algorithm_config: + oneOf: + - $ref: '#/components/schemas/LoraFinetuningConfig' + - $ref: '#/components/schemas/QLoraFinetuningConfig' + - $ref: '#/components/schemas/DoraFinetuningConfig' + dataset: + $ref: '#/components/schemas/TrainEvalDataset' + hyperparam_search_config: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + job_uuid: + type: string + logger_config: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + model: + type: string + optimizer_config: + $ref: '#/components/schemas/OptimizerConfig' + 
training_config: + $ref: '#/components/schemas/TrainingConfig' + validation_dataset: + $ref: '#/components/schemas/TrainEvalDataset' + required: + - job_uuid + - model + - dataset + - validation_dataset + - algorithm + - algorithm_config + - optimizer_config + - training_config + - hyperparam_search_config + - logger_config + type: object + SyntheticDataGenerateRequest: + additionalProperties: false + properties: + dialogs: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/SystemMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + - $ref: '#/components/schemas/CompletionMessage' + type: array + filtering_function: + enum: + - none + - random + - top_k + - top_p + - top_k_top_p + - sigmoid + title: The type of filtering function. + type: string + model: + type: string + required: + - dialogs + - filtering_function + type: object + SyntheticDataGenerationResponse: + additionalProperties: false + properties: + statistics: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + synthetic_data: + items: + $ref: '#/components/schemas/ScoredDialogGenerations' + type: array + required: + - synthetic_data + title: Response from the synthetic data generation. Batch of (prompt, response, + score) tuples that pass the threshold. 
+ type: object + SystemMessage: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + role: + const: system + type: string + required: + - role + - content + type: object + TokenLogProbs: + additionalProperties: false + properties: + logprobs_by_token: + additionalProperties: + type: number + type: object + required: + - logprobs_by_token + type: object + ToolCall: + additionalProperties: false + properties: + arguments: + additionalProperties: + oneOf: + - type: string + - type: integer + - type: number + - type: boolean + - type: 'null' + - items: + oneOf: + - type: string + - type: integer + - type: number + - type: boolean + - type: 'null' + type: array + - additionalProperties: + oneOf: + - type: string + - type: integer + - type: number + - type: boolean + - type: 'null' + type: object + type: object + call_id: + type: string + tool_name: + oneOf: + - $ref: '#/components/schemas/BuiltinTool' + - type: string + required: + - call_id + - tool_name + - arguments + type: object + ToolCallDelta: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - $ref: '#/components/schemas/ToolCall' + parse_status: + $ref: '#/components/schemas/ToolCallParseStatus' + required: + - content + - parse_status + type: object + ToolCallParseStatus: + enum: + - started + - in_progress + - failure + - success + type: string + ToolChoice: + enum: + - auto + - required + type: string + ToolDefinition: + additionalProperties: false + properties: + description: + type: string + parameters: + additionalProperties: + $ref: '#/components/schemas/ToolParamDefinition' + type: object + tool_name: + oneOf: + - $ref: '#/components/schemas/BuiltinTool' + - type: string + required: + - tool_name + type: object + ToolExecutionStep: + additionalProperties: false + properties: + completed_at: + format: date-time + type: string + started_at: + format: date-time + type: string + step_id: + type: string + 
step_type: + const: tool_execution + type: string + tool_calls: + items: + $ref: '#/components/schemas/ToolCall' + type: array + tool_responses: + items: + $ref: '#/components/schemas/ToolResponse' + type: array + turn_id: + type: string + required: + - turn_id + - step_id + - step_type + - tool_calls + - tool_responses + type: object + ToolParamDefinition: + additionalProperties: false + properties: + description: + type: string + param_type: + type: string + required: + type: boolean + required: + - param_type + type: object + ToolPromptFormat: + description: "`json` --\n Refers to the json format for calling tools.\n\ + \ The json format takes the form like\n {\n \"type\": \"function\"\ + ,\n \"function\" : {\n \"name\": \"function_name\",\n \ + \ \"description\": \"function_description\",\n \"parameters\"\ + : {...}\n }\n }\n\n`function_tag` --\n This is an example of\ + \ how you could define\n your own user defined format for making tool calls.\n\ + \ The function_tag format looks like this,\n (parameters)\n\ + \nThe detailed prompts for each of these formats are added to llama cli" + enum: + - json + - function_tag + title: This Enum refers to the prompt format for calling custom / zero shot + tools + type: string + ToolResponse: + additionalProperties: false + properties: + call_id: + type: string + content: + oneOf: + - type: string + - items: + type: string + type: array + tool_name: + oneOf: + - $ref: '#/components/schemas/BuiltinTool' + - type: string + required: + - call_id + - tool_name + - content + type: object + ToolResponseMessage: + additionalProperties: false + properties: + call_id: + type: string + content: + oneOf: + - type: string + - items: + type: string + type: array + role: + const: ipython + type: string + tool_name: + oneOf: + - $ref: '#/components/schemas/BuiltinTool' + - type: string + required: + - role + - call_id + - tool_name + - content + type: object + Trace: + additionalProperties: false + properties: + end_time: + format: 
date-time + type: string + root_span_id: + type: string + start_time: + format: date-time + type: string + trace_id: + type: string + required: + - trace_id + - root_span_id + - start_time + type: object + TrainEvalDataset: + additionalProperties: false + properties: + columns: + additionalProperties: + $ref: '#/components/schemas/TrainEvalDatasetColumnType' + type: object + content_url: + $ref: '#/components/schemas/URL' + metadata: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + required: + - columns + - content_url + title: Dataset to be used for training or evaluating language models. + type: object + TrainEvalDatasetColumnType: + enum: + - dialog + - text + - media + - number + - json + type: string + TrainingConfig: + additionalProperties: false + properties: + batch_size: + type: integer + enable_activation_checkpointing: + type: boolean + fsdp_cpu_offload: + type: boolean + memory_efficient_fsdp_wrap: + type: boolean + n_epochs: + type: integer + n_iters: + type: integer + shuffle: + type: boolean + required: + - n_epochs + - batch_size + - shuffle + - n_iters + - enable_activation_checkpointing + - memory_efficient_fsdp_wrap + - fsdp_cpu_offload + type: object + Turn: + additionalProperties: false + properties: + completed_at: + format: date-time + type: string + input_messages: + items: + oneOf: + - $ref: '#/components/schemas/UserMessage' + - $ref: '#/components/schemas/ToolResponseMessage' + type: array + output_attachments: + items: + $ref: '#/components/schemas/Attachment' + type: array + output_message: + $ref: '#/components/schemas/CompletionMessage' + session_id: + type: string + started_at: + format: date-time + type: string + steps: + items: + oneOf: + - $ref: '#/components/schemas/InferenceStep' + - $ref: '#/components/schemas/ToolExecutionStep' + - $ref: '#/components/schemas/ShieldCallStep' + - $ref: 
'#/components/schemas/MemoryRetrievalStep' + type: array + turn_id: + type: string + required: + - turn_id + - session_id + - input_messages + - steps + - output_message + - output_attachments + - started_at + title: A single turn in an interaction with an Agentic System. + type: object + URL: + format: uri + pattern: ^(https?://|file://|data:) + type: string + UnstructuredLogEvent: + additionalProperties: false + properties: + attributes: + additionalProperties: + oneOf: + - type: 'null' + - type: boolean + - type: number + - type: string + - type: array + - type: object + type: object + message: + type: string + severity: + $ref: '#/components/schemas/LogSeverity' + span_id: + type: string + timestamp: + format: date-time + type: string + trace_id: + type: string + type: + const: unstructured_log + type: string + required: + - trace_id + - span_id + - timestamp + - type + - message + - severity + type: object + UpdateDocumentsRequest: + additionalProperties: false + properties: + bank_id: + type: string + documents: + items: + $ref: '#/components/schemas/MemoryBankDocument' + type: array + required: + - bank_id + - documents + type: object + UserMessage: + additionalProperties: false + properties: + content: + oneOf: + - type: string + - items: + type: string + type: array + context: + oneOf: + - type: string + - items: + type: string + type: array + role: + const: user + type: string + required: + - role + - content + type: object + WolframAlphaToolDefinition: + additionalProperties: false + properties: + api_key: + type: string + input_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + output_shields: + items: + $ref: '#/components/schemas/ShieldDefinition' + type: array + remote_execution: + $ref: '#/components/schemas/RestAPIExecutionConfig' + type: + const: wolfram_alpha + type: string + required: + - type + - api_key + type: object +info: + description: "This is the specification of the llama stack that provides\n \ + \ a set 
of endpoints and their corresponding interfaces that are tailored\ + \ to\n best leverage Llama Models. The specification is still in\ + \ draft and subject to change.\n Generated at 2024-09-18 19:27:39.955190" + title: '[DRAFT] Llama Stack Specification' + version: 0.0.1 +jsonSchemaDialect: https://json-schema.org/draft/2020-12/schema +openapi: 3.1.0 +paths: + /agents/create: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CreateAgentRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/AgentCreateResponse' + description: OK + tags: + - Agents + /agents/delete: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteAgentsRequest' + required: true + responses: + '200': + description: OK + tags: + - Agents + /agents/session/create: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CreateAgentSessionRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/AgentSessionCreateResponse' + description: OK + tags: + - Agents + /agents/session/delete: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteAgentsSessionRequest' + required: true + responses: + '200': + description: OK + tags: + - Agents + /agents/session/get: + post: + parameters: + - in: query + name: agent_id + required: true + schema: + type: string + - in: query + name: session_id + required: true + schema: + type: string + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/GetAgentsSessionRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/Session' + description: OK + tags: + - Agents + /agents/step/get: + get: + parameters: + - in: query 
+ name: agent_id + required: true + schema: + type: string + - in: query + name: turn_id + required: true + schema: + type: string + - in: query + name: step_id + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/AgentStepResponse' + description: OK + tags: + - Agents + /agents/turn/create: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CreateAgentTurnRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/AgentTurnResponseStreamChunk' + description: OK + tags: + - Agents + /agents/turn/get: + get: + parameters: + - in: query + name: agent_id + required: true + schema: + type: string + - in: query + name: turn_id + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/Turn' + description: OK + tags: + - Agents + /batch_inference/chat_completion: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/BatchChatCompletionRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/BatchChatCompletionResponse' + description: OK + tags: + - BatchInference + /batch_inference/completion: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/BatchCompletionRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/BatchCompletionResponse' + description: OK + tags: + - BatchInference + /datasets/create: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CreateDatasetRequest' + required: true + responses: + '200': + description: OK + tags: + - Datasets + /datasets/delete: + post: + parameters: [] + 
requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteDatasetRequest' + required: true + responses: + '200': + description: OK + tags: + - Datasets + /datasets/get: + get: + parameters: + - in: query + name: dataset_uuid + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/TrainEvalDataset' + description: OK + tags: + - Datasets + /evaluate/job/artifacts: + get: + parameters: + - in: query + name: job_uuid + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluationJobArtifactsResponse' + description: OK + tags: + - Evaluations + /evaluate/job/cancel: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CancelEvaluationJobRequest' + required: true + responses: + '200': + description: OK + tags: + - Evaluations + /evaluate/job/logs: + get: + parameters: + - in: query + name: job_uuid + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluationJobLogStream' + description: OK + tags: + - Evaluations + /evaluate/job/status: + get: + parameters: + - in: query + name: job_uuid + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluationJobStatusResponse' + description: OK + tags: + - Evaluations + /evaluate/jobs: + get: + parameters: [] + responses: + '200': + content: + application/jsonl: + schema: + $ref: '#/components/schemas/EvaluationJob' + description: OK + tags: + - Evaluations + /evaluate/question_answering/: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluateQuestionAnsweringRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: 
'#/components/schemas/EvaluationJob' + description: OK + tags: + - Evaluations + /evaluate/summarization/: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluateSummarizationRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluationJob' + description: OK + tags: + - Evaluations + /evaluate/text_generation/: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluateTextGenerationRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/EvaluationJob' + description: OK + tags: + - Evaluations + /inference/chat_completion: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/ChatCompletionRequest' + required: true + responses: + '200': + content: + text/event-stream: + schema: + oneOf: + - $ref: '#/components/schemas/ChatCompletionResponse' + - $ref: '#/components/schemas/ChatCompletionResponseStreamChunk' + description: Chat completion response. **OR** SSE-stream of these events. + tags: + - Inference + /inference/completion: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CompletionRequest' + required: true + responses: + '200': + content: + application/json: + schema: + oneOf: + - $ref: '#/components/schemas/CompletionResponse' + - $ref: '#/components/schemas/CompletionResponseStreamChunk' + description: Completion response. **OR** streamed completion response. 
+ tags: + - Inference + /inference/embeddings: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/EmbeddingsRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/EmbeddingsResponse' + description: OK + tags: + - Inference + /memory_bank/documents/delete: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/DeleteDocumentsRequest' + required: true + responses: + '200': + description: OK + tags: + - Memory + /memory_bank/documents/get: + post: + parameters: + - in: query + name: bank_id + required: true + schema: + type: string + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/GetDocumentsRequest' + required: true + responses: + '200': + content: + application/jsonl: + schema: + $ref: '#/components/schemas/MemoryBankDocument' + description: OK + tags: + - Memory + /memory_bank/insert: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/InsertDocumentsRequest' + required: true + responses: + '200': + description: OK + tags: + - Memory + /memory_bank/query: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/QueryDocumentsRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/QueryDocumentsResponse' + description: OK + tags: + - Memory + /memory_bank/update: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/UpdateDocumentsRequest' + required: true + responses: + '200': + description: OK + tags: + - Memory + /memory_banks/create: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CreateMemoryBankRequest' + required: true + responses: + '200': + content: + 
application/json: + schema: + $ref: '#/components/schemas/MemoryBank' + description: OK + tags: + - Memory + /memory_banks/drop: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/DropMemoryBankRequest' + required: true + responses: + '200': + content: + application/json: + schema: + type: string + description: OK + tags: + - Memory + /memory_banks/get: + get: + parameters: + - in: query + name: bank_id + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + oneOf: + - $ref: '#/components/schemas/MemoryBank' + - type: 'null' + description: OK + tags: + - Memory + /memory_banks/list: + get: + parameters: [] + responses: + '200': + content: + application/jsonl: + schema: + $ref: '#/components/schemas/MemoryBank' + description: OK + tags: + - Memory + /post_training/job/artifacts: + get: + parameters: + - in: query + name: job_uuid + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/PostTrainingJobArtifactsResponse' + description: OK + tags: + - PostTraining + /post_training/job/cancel: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/CancelTrainingJobRequest' + required: true + responses: + '200': + description: OK + tags: + - PostTraining + /post_training/job/logs: + get: + parameters: + - in: query + name: job_uuid + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/PostTrainingJobLogStream' + description: OK + tags: + - PostTraining + /post_training/job/status: + get: + parameters: + - in: query + name: job_uuid + required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/PostTrainingJobStatusResponse' + description: OK + tags: + - PostTraining + 
/post_training/jobs: + get: + parameters: [] + responses: + '200': + content: + application/jsonl: + schema: + $ref: '#/components/schemas/PostTrainingJob' + description: OK + tags: + - PostTraining + /post_training/preference_optimize: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/PreferenceOptimizeRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/PostTrainingJob' + description: OK + tags: + - PostTraining + /post_training/supervised_fine_tune: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/SupervisedFineTuneRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/PostTrainingJob' + description: OK + tags: + - PostTraining + /reward_scoring/score: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RewardScoreRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/RewardScoringResponse' + description: OK + tags: + - RewardScoring + /safety/run_shields: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/RunShieldsRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/RunShieldResponse' + description: OK + tags: + - Safety + /synthetic_data_generation/generate: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/SyntheticDataGenerateRequest' + required: true + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/SyntheticDataGenerationResponse' + description: OK + tags: + - SyntheticDataGeneration + /telemetry/get_trace: + get: + parameters: + - in: query + name: trace_id + 
required: true + schema: + type: string + responses: + '200': + content: + application/json: + schema: + $ref: '#/components/schemas/Trace' + description: OK + tags: + - Telemetry + /telemetry/log_event: + post: + parameters: [] + requestBody: + content: + application/json: + schema: + $ref: '#/components/schemas/LogEventRequest' + required: true + responses: + '200': + description: OK + tags: + - Telemetry +security: +- Default: [] +servers: +- url: http://any-hosted-llama-stack.com +tags: +- name: BatchInference +- name: PostTraining +- name: Inference +- name: Safety +- name: RewardScoring +- name: Telemetry +- name: Evaluations +- name: SyntheticDataGeneration +- name: Memory +- name: Agents +- name: Datasets +- description: + name: BuiltinTool +- description: + name: CompletionMessage +- description: + name: SamplingParams +- description: + name: SamplingStrategy +- description: + name: StopReason +- description: + name: SystemMessage +- description: + name: ToolCall +- description: + name: ToolChoice +- description: + name: ToolDefinition +- description: + name: ToolParamDefinition +- description: "This Enum refers to the prompt format for calling custom / zero shot\ + \ tools\n\n`json` --\n Refers to the json format for calling tools.\n The\ + \ json format takes the form like\n {\n \"type\": \"function\",\n \ + \ \"function\" : {\n \"name\": \"function_name\",\n \ + \ \"description\": \"function_description\",\n \"parameters\": {...}\n\ + \ }\n }\n\n`function_tag` --\n This is an example of how you could\ + \ define\n your own user defined format for making tool calls.\n The function_tag\ + \ format looks like this,\n (parameters)\n\ + \nThe detailed prompts for each of these formats are added to llama cli\n\n" + name: ToolPromptFormat +- description: + name: ToolResponseMessage +- description: + name: UserMessage +- description: + name: BatchChatCompletionRequest +- description: + name: BatchChatCompletionResponse +- description: + name: 
BatchCompletionRequest +- description: + name: BatchCompletionResponse +- description: + name: CancelEvaluationJobRequest +- description: + name: CancelTrainingJobRequest +- description: + name: ChatCompletionRequest +- description: 'Chat completion response. + + + ' + name: ChatCompletionResponse +- description: 'Chat completion response event. + + + ' + name: ChatCompletionResponseEvent +- description: + name: ChatCompletionResponseEventType +- description: 'SSE-stream of these events. + + + ' + name: ChatCompletionResponseStreamChunk +- description: + name: TokenLogProbs +- description: + name: ToolCallDelta +- description: + name: ToolCallParseStatus +- description: + name: CompletionRequest +- description: 'Completion response. + + + ' + name: CompletionResponse +- description: 'streamed completion response. + + + ' + name: CompletionResponseStreamChunk +- description: + name: AgentConfig +- description: + name: BuiltinShield +- description: + name: CodeInterpreterToolDefinition +- description: + name: FunctionCallToolDefinition +- description: + name: OnViolationAction +- description: + name: PhotogenToolDefinition +- description: + name: RestAPIExecutionConfig +- description: + name: RestAPIMethod +- description: + name: SearchToolDefinition +- description: + name: ShieldDefinition +- description: + name: URL +- description: + name: WolframAlphaToolDefinition +- description: + name: CreateAgentRequest +- description: + name: AgentCreateResponse +- description: + name: CreateAgentSessionRequest +- description: + name: AgentSessionCreateResponse +- description: + name: Attachment +- description: + name: CreateAgentTurnRequest +- description: 'Streamed agent execution response. 
+ + + ' + name: AgentTurnResponseEvent +- description: + name: AgentTurnResponseStepCompletePayload +- description: + name: AgentTurnResponseStepProgressPayload +- description: + name: AgentTurnResponseStepStartPayload +- description: + name: AgentTurnResponseStreamChunk +- description: + name: AgentTurnResponseTurnCompletePayload +- description: + name: AgentTurnResponseTurnStartPayload +- description: + name: InferenceStep +- description: + name: MemoryRetrievalStep +- description: + name: ShieldCallStep +- description: + name: ShieldResponse +- description: + name: ToolExecutionStep +- description: + name: ToolResponse +- description: 'A single turn in an interaction with an Agentic System. + + + ' + name: Turn +- description: 'Dataset to be used for training or evaluating language models. + + + ' + name: TrainEvalDataset +- description: + name: TrainEvalDatasetColumnType +- description: + name: CreateDatasetRequest +- description: + name: CreateMemoryBankRequest +- description: + name: MemoryBank +- description: + name: DeleteAgentsRequest +- description: + name: DeleteAgentsSessionRequest +- description: + name: DeleteDatasetRequest +- description: + name: DeleteDocumentsRequest +- description: + name: DropMemoryBankRequest +- description: + name: EmbeddingsRequest +- description: + name: EmbeddingsResponse +- description: + name: EvaluateQuestionAnsweringRequest +- description: + name: EvaluationJob +- description: + name: EvaluateSummarizationRequest +- description: + name: EvaluateTextGenerationRequest +- description: + name: GetAgentsSessionRequest +- description: 'A single session of an interaction with an Agentic System. + + + ' + name: Session +- description: + name: AgentStepResponse +- description: + name: GetDocumentsRequest +- description: + name: MemoryBankDocument +- description: 'Artifacts of a evaluation job. 
+ + + ' + name: EvaluationJobArtifactsResponse +- description: + name: EvaluationJobLogStream +- description: + name: EvaluationJobStatusResponse +- description: + name: Trace +- description: 'Checkpoint created during training runs + + + ' + name: Checkpoint +- description: 'Artifacts of a finetuning job. + + + ' + name: PostTrainingJobArtifactsResponse +- description: 'Stream of logs from a finetuning job. + + + ' + name: PostTrainingJobLogStream +- description: + name: PostTrainingJobStatus +- description: 'Status of a finetuning job. + + + ' + name: PostTrainingJobStatusResponse +- description: + name: PostTrainingJob +- description: + name: InsertDocumentsRequest +- description: + name: LogSeverity +- description: + name: MetricEvent +- description: + name: SpanEndPayload +- description: + name: SpanStartPayload +- description: + name: SpanStatus +- description: + name: StructuredLogEvent +- description: + name: UnstructuredLogEvent +- description: + name: LogEventRequest +- description: + name: DPOAlignmentConfig +- description: + name: OptimizerConfig +- description: + name: RLHFAlgorithm +- description: + name: TrainingConfig +- description: + name: PreferenceOptimizeRequest +- description: + name: QueryDocumentsRequest +- description: + name: QueryDocumentsResponse +- description: + name: DialogGenerations +- description: + name: RewardScoreRequest +- description: 'Response from the reward scoring. Batch of (prompt, response, score) + tuples that pass the threshold. 
+ + + ' + name: RewardScoringResponse +- description: + name: ScoredDialogGenerations +- description: + name: ScoredMessage +- description: + name: RunShieldsRequest +- description: + name: RunShieldResponse +- description: + name: DoraFinetuningConfig +- description: + name: FinetuningAlgorithm +- description: + name: LoraFinetuningConfig +- description: + name: QLoraFinetuningConfig +- description: + name: SupervisedFineTuneRequest +- description: + name: SyntheticDataGenerateRequest +- description: 'Response from the synthetic data generation. Batch of (prompt, response, + score) tuples that pass the threshold. + + + ' + name: SyntheticDataGenerationResponse +- description: + name: UpdateDocumentsRequest +x-tagGroups: +- name: Operations + tags: + - Agents + - BatchInference + - Datasets + - Evaluations + - Inference + - Memory + - PostTraining + - RewardScoring + - Safety + - SyntheticDataGeneration + - Telemetry +- name: Types + tags: + - AgentConfig + - AgentCreateResponse + - AgentSessionCreateResponse + - AgentStepResponse + - AgentTurnResponseEvent + - AgentTurnResponseStepCompletePayload + - AgentTurnResponseStepProgressPayload + - AgentTurnResponseStepStartPayload + - AgentTurnResponseStreamChunk + - AgentTurnResponseTurnCompletePayload + - AgentTurnResponseTurnStartPayload + - Attachment + - BatchChatCompletionRequest + - BatchChatCompletionResponse + - BatchCompletionRequest + - BatchCompletionResponse + - BuiltinShield + - BuiltinTool + - CancelEvaluationJobRequest + - CancelTrainingJobRequest + - ChatCompletionRequest + - ChatCompletionResponse + - ChatCompletionResponseEvent + - ChatCompletionResponseEventType + - ChatCompletionResponseStreamChunk + - Checkpoint + - CodeInterpreterToolDefinition + - CompletionMessage + - CompletionRequest + - CompletionResponse + - CompletionResponseStreamChunk + - CreateAgentRequest + - CreateAgentSessionRequest + - CreateAgentTurnRequest + - CreateDatasetRequest + - CreateMemoryBankRequest + - DPOAlignmentConfig + 
- DeleteAgentsRequest + - DeleteAgentsSessionRequest + - DeleteDatasetRequest + - DeleteDocumentsRequest + - DialogGenerations + - DoraFinetuningConfig + - DropMemoryBankRequest + - EmbeddingsRequest + - EmbeddingsResponse + - EvaluateQuestionAnsweringRequest + - EvaluateSummarizationRequest + - EvaluateTextGenerationRequest + - EvaluationJob + - EvaluationJobArtifactsResponse + - EvaluationJobLogStream + - EvaluationJobStatusResponse + - FinetuningAlgorithm + - FunctionCallToolDefinition + - GetAgentsSessionRequest + - GetDocumentsRequest + - InferenceStep + - InsertDocumentsRequest + - LogEventRequest + - LogSeverity + - LoraFinetuningConfig + - MemoryBank + - MemoryBankDocument + - MemoryRetrievalStep + - MetricEvent + - OnViolationAction + - OptimizerConfig + - PhotogenToolDefinition + - PostTrainingJob + - PostTrainingJobArtifactsResponse + - PostTrainingJobLogStream + - PostTrainingJobStatus + - PostTrainingJobStatusResponse + - PreferenceOptimizeRequest + - QLoraFinetuningConfig + - QueryDocumentsRequest + - QueryDocumentsResponse + - RLHFAlgorithm + - RestAPIExecutionConfig + - RestAPIMethod + - RewardScoreRequest + - RewardScoringResponse + - RunShieldResponse + - RunShieldsRequest + - SamplingParams + - SamplingStrategy + - ScoredDialogGenerations + - ScoredMessage + - SearchToolDefinition + - Session + - ShieldCallStep + - ShieldDefinition + - ShieldResponse + - SpanEndPayload + - SpanStartPayload + - SpanStatus + - StopReason + - StructuredLogEvent + - SupervisedFineTuneRequest + - SyntheticDataGenerateRequest + - SyntheticDataGenerationResponse + - SystemMessage + - TokenLogProbs + - ToolCall + - ToolCallDelta + - ToolCallParseStatus + - ToolChoice + - ToolDefinition + - ToolExecutionStep + - ToolParamDefinition + - ToolPromptFormat + - ToolResponse + - ToolResponseMessage + - Trace + - TrainEvalDataset + - TrainEvalDatasetColumnType + - TrainingConfig + - Turn + - URL + - UnstructuredLogEvent + - UpdateDocumentsRequest + - UserMessage + - 
WolframAlphaToolDefinition
diff --git a/rfcs/openapi_generator/README.md b/docs/openapi_generator/README.md
similarity index 100%
rename from rfcs/openapi_generator/README.md
rename to docs/openapi_generator/README.md
diff --git a/rfcs/openapi_generator/generate.py b/docs/openapi_generator/generate.py
similarity index 100%
rename from rfcs/openapi_generator/generate.py
rename to docs/openapi_generator/generate.py
diff --git a/rfcs/openapi_generator/pyopenapi/README.md b/docs/openapi_generator/pyopenapi/README.md
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/README.md
rename to docs/openapi_generator/pyopenapi/README.md
diff --git a/rfcs/openapi_generator/pyopenapi/__init__.py b/docs/openapi_generator/pyopenapi/__init__.py
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/__init__.py
rename to docs/openapi_generator/pyopenapi/__init__.py
diff --git a/rfcs/openapi_generator/pyopenapi/generator.py b/docs/openapi_generator/pyopenapi/generator.py
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/generator.py
rename to docs/openapi_generator/pyopenapi/generator.py
diff --git a/rfcs/openapi_generator/pyopenapi/operations.py b/docs/openapi_generator/pyopenapi/operations.py
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/operations.py
rename to docs/openapi_generator/pyopenapi/operations.py
diff --git a/rfcs/openapi_generator/pyopenapi/options.py b/docs/openapi_generator/pyopenapi/options.py
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/options.py
rename to docs/openapi_generator/pyopenapi/options.py
diff --git a/rfcs/openapi_generator/pyopenapi/specification.py b/docs/openapi_generator/pyopenapi/specification.py
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/specification.py
rename to docs/openapi_generator/pyopenapi/specification.py
diff --git a/rfcs/openapi_generator/pyopenapi/template.html b/docs/openapi_generator/pyopenapi/template.html
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/template.html
rename to docs/openapi_generator/pyopenapi/template.html
diff --git a/rfcs/openapi_generator/pyopenapi/utility.py b/docs/openapi_generator/pyopenapi/utility.py
similarity index 100%
rename from rfcs/openapi_generator/pyopenapi/utility.py
rename to docs/openapi_generator/pyopenapi/utility.py
diff --git a/rfcs/openapi_generator/run_openapi_generator.sh b/docs/openapi_generator/run_openapi_generator.sh
similarity index 91%
rename from rfcs/openapi_generator/run_openapi_generator.sh
rename to docs/openapi_generator/run_openapi_generator.sh
index 1b2f979cc..ec95948d7 100755
--- a/rfcs/openapi_generator/run_openapi_generator.sh
+++ b/docs/openapi_generator/run_openapi_generator.sh
@@ -28,4 +28,4 @@ if [ ${#missing_packages[@]} -ne 0 ]; then
   exit 1
 fi
 
-PYTHONPATH=$PYTHONPATH:../.. python -m rfcs.openapi_generator.generate $*
+PYTHONPATH=$PYTHONPATH:../.. python -m docs.openapi_generator.generate $*
diff --git a/rfcs/RFC-0001-llama-stack-assets/agentic-system.png b/docs/resources/agentic-system.png
similarity index 100%
rename from rfcs/RFC-0001-llama-stack-assets/agentic-system.png
rename to docs/resources/agentic-system.png
diff --git a/docs/resources/list-templates.png b/docs/resources/list-templates.png
new file mode 100644
index 000000000..5b17641ef
Binary files /dev/null and b/docs/resources/list-templates.png differ
diff --git a/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.html b/docs/resources/llama-stack-spec.html
similarity index 100%
rename from rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.html
rename to docs/resources/llama-stack-spec.html
diff --git a/rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.yaml b/docs/resources/llama-stack-spec.yaml
similarity index 100%
rename from rfcs/RFC-0001-llama-stack-assets/llama-stack-spec.yaml
rename to docs/resources/llama-stack-spec.yaml
diff --git a/rfcs/RFC-0001-llama-stack-assets/llama-stack.png b/docs/resources/llama-stack.png
similarity index 100%
rename from rfcs/RFC-0001-llama-stack-assets/llama-stack.png
rename to docs/resources/llama-stack.png
diff --git a/rfcs/RFC-0001-llama-stack-assets/model-lifecycle.png b/docs/resources/model-lifecycle.png
similarity index 100%
rename from rfcs/RFC-0001-llama-stack-assets/model-lifecycle.png
rename to docs/resources/model-lifecycle.png
diff --git a/llama_stack/apis/memory/client.py b/llama_stack/apis/memory/client.py
index d2845326b..7b61cd830 100644
--- a/llama_stack/apis/memory/client.py
+++ b/llama_stack/apis/memory/client.py
@@ -13,12 +13,12 @@ from typing import Any, Dict, List, Optional
 
 import fire
 import httpx
-
-from llama_stack.distribution.datatypes import RemoteProviderConfig
 from termcolor import cprint
 
-from .memory import *  # noqa: F403
-from .common.file_utils import data_url_from_file
+from llama_stack.distribution.datatypes import RemoteProviderConfig
+
+from llama_stack.apis.memory import *  # noqa: F403
+from llama_stack.providers.utils.memory.file_utils import data_url_from_file
 
 
 async def get_client_impl(config: RemoteProviderConfig, _deps: Any) -> Memory:
diff --git a/llama_stack/cli/stack/build.py b/llama_stack/cli/stack/build.py
index f35c6ceab..0393fb708 100644
--- a/llama_stack/cli/stack/build.py
+++ b/llama_stack/cli/stack/build.py
@@ -212,7 +212,7 @@ class StackBuild(Subcommand):
                 providers_for_api = all_providers[api]
 
                 api_provider = prompt(
-                    "> Enter the API provider for the {} API: (default=meta-reference): ".format(
+                    "> Enter provider for the {} API: (default=meta-reference): ".format(
                         api.value
                     ),
                     validator=Validator.from_callable(
diff --git a/llama_stack/cli/stack/configure.py b/llama_stack/cli/stack/configure.py
index b0aa1c3ab..37366f620 100644
--- a/llama_stack/cli/stack/configure.py
+++ b/llama_stack/cli/stack/configure.py
@@ -53,46 +53,61 @@ class StackConfigure(Subcommand):
         from termcolor import cprint
 
         docker_image = None
+        build_config_file = Path(args.config)
+
+        if build_config_file.exists():
+            with open(build_config_file, "r") as f:
+                build_config = BuildConfig(**yaml.safe_load(f))
+                self._configure_llama_distribution(build_config, args.output_dir)
+                return
+
+        # if we get here, we need to try to find the conda build config file
+        cprint(
+            f"Could not find {build_config_file}. Trying conda build name instead...",
+            color="green",
+        )
         conda_dir = Path(os.getenv("CONDA_PREFIX")).parent / f"llamastack-{args.config}"
         build_config_file = Path(conda_dir) / f"{args.config}-build.yaml"
 
-        if not build_config_file.exists():
-            cprint(
-                f"Could not find {build_config_file}. Trying docker image name instead...",
-                color="green",
-            )
-            docker_image = args.config
+        if build_config_file.exists():
+            with open(build_config_file, "r") as f:
+                build_config = BuildConfig(**yaml.safe_load(f))
 
-            builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
-            if args.output_dir:
-                builds_dir = Path(output_dir)
-            os.makedirs(builds_dir, exist_ok=True)
+            self._configure_llama_distribution(build_config, args.output_dir)
+            return
 
-            script = pkg_resources.resource_filename(
-                "llama_stack", "distribution/configure_container.sh"
-            )
-            script_args = [script, docker_image, str(builds_dir)]
+        # if we get here, we need to try to find the docker image
+        cprint(
+            f"Could not find {build_config_file}. Trying docker image name instead...",
+            color="green",
+        )
+        docker_image = args.config
+        builds_dir = BUILDS_BASE_DIR / ImageType.docker.value
+        if args.output_dir:
+            builds_dir = Path(output_dir)
+        os.makedirs(builds_dir, exist_ok=True)
 
-            return_code = run_with_pty(script_args)
+        script = pkg_resources.resource_filename(
+            "llama_stack", "distribution/configure_container.sh"
+        )
+        script_args = [script, docker_image, str(builds_dir)]
 
-            # we have regenerated the build config file with script, now check if it exists
-            if return_code != 0:
-                self.parser.error(
-                    f"Can not find {build_config_file}. Please run llama stack build first or check if docker image exists"
-                )
+        return_code = run_with_pty(script_args)
 
-            build_name = docker_image.removeprefix("llamastack-")
-            saved_file = str(builds_dir / f"{build_name}-run.yaml")
-            cprint(
-                f"YAML configuration has been written to {saved_file}. You can now run `llama stack run {saved_file}`",
-                color="green",
+        # we have regenerated the build config file with script, now check if it exists
+        if return_code != 0:
+            self.parser.error(
+                f"Failed to configure container {docker_image} with return code {return_code}. Please run `llama stack build first`. "
             )
             return
 
-        with open(build_config_file, "r") as f:
-            build_config = BuildConfig(**yaml.safe_load(f))
-
-        self._configure_llama_distribution(build_config, args.output_dir)
+        build_name = docker_image.removeprefix("llamastack-")
+        saved_file = str(builds_dir / f"{build_name}-run.yaml")
+        cprint(
+            f"YAML configuration has been written to {saved_file}. You can now run `llama stack run {saved_file}`",
+            color="green",
+        )
+        return
 
     def _configure_llama_distribution(
         self,
diff --git a/llama_stack/distribution/configure.py b/llama_stack/distribution/configure.py
index c54bb27b6..ab1f31de6 100644
--- a/llama_stack/distribution/configure.py
+++ b/llama_stack/distribution/configure.py
@@ -30,12 +30,8 @@ def make_routing_entry_type(config_class: Any):
 def configure_api_providers(
     config: StackRunConfig, spec: DistributionSpec
 ) -> StackRunConfig:
-    cprint("Configuring APIs to serve...", "white", attrs=["bold"])
-    print("Enter comma-separated list of APIs to serve:")
-
     apis = config.apis_to_serve or list(spec.providers.keys())
     config.apis_to_serve = [a for a in apis if a != "telemetry"]
-    print("")
 
     apis = [v.value for v in stack_apis()]
     all_providers = api_providers()
diff --git a/llama_stack/providers/impls/meta_reference/agents/safety.py b/llama_stack/providers/impls/meta_reference/agents/safety.py
index f7148ddce..7363fa0b1 100644
--- a/llama_stack/providers/impls/meta_reference/agents/safety.py
+++ b/llama_stack/providers/impls/meta_reference/agents/safety.py
@@ -7,15 +7,14 @@
 from typing import List
 
 from llama_models.llama3.api.datatypes import Message, Role, UserMessage
+from termcolor import cprint
 
 from llama_stack.apis.safety import (
     OnViolationAction,
-    RunShieldRequest,
     Safety,
     ShieldDefinition,
     ShieldResponse,
 )
-from termcolor import cprint
 
 
 class SafetyException(Exception):  # noqa: N818
@@ -45,10 +44,8 @@ class ShieldRunnerMixin:
             messages[0] = UserMessage(content=messages[0].content)
 
         res = await self.safety_api.run_shields(
-            RunShieldRequest(
-                messages=messages,
-                shields=shields,
-            )
+            messages=messages,
+            shields=shields,
         )
 
         results = res.responses
diff --git a/llama_stack/providers/impls/meta_reference/inference/config.py b/llama_stack/providers/impls/meta_reference/inference/config.py
index 27943cb2c..8e3d3ed3c 100644
--- a/llama_stack/providers/impls/meta_reference/inference/config.py
+++ b/llama_stack/providers/impls/meta_reference/inference/config.py
@@ -11,10 +11,10 @@ from llama_models.datatypes import ModelFamily
 from llama_models.schema_utils import json_schema_type
 from llama_models.sku_list import all_registered_models, resolve_model
 
-from llama_stack.apis.inference import QuantizationConfig
-
 from pydantic import BaseModel, Field, field_validator
 
+from llama_stack.apis.inference import QuantizationConfig
+
 
 @json_schema_type
 class MetaReferenceImplConfig(BaseModel):
@@ -24,7 +24,7 @@ class MetaReferenceImplConfig(BaseModel):
     )
     quantization: Optional[QuantizationConfig] = None
     torch_seed: Optional[int] = None
-    max_seq_len: int
+    max_seq_len: int = 4096
     max_batch_size: int = 1
 
     @field_validator("model")
diff --git a/requirements.txt b/requirements.txt
index 3741f048f..e339bd62c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -2,7 +2,8 @@ blobfile
 fire
 httpx
 huggingface-hub
-llama-models>=0.0.18
+llama-models>=0.0.19
+prompt-toolkit
 python-dotenv
 pydantic
 requests
diff --git a/rfcs/RFC-0001-llama-stack.md b/rfcs/RFC-0001-llama-stack.md
index 137b15d11..0968e1c64 100644
--- a/rfcs/RFC-0001-llama-stack.md
+++ b/rfcs/RFC-0001-llama-stack.md
@@ -21,7 +21,7 @@ Meta releases weights of both the pretrained and instruction fine-tuned Llama mo
 
 ### Model Lifecycle
 
-![Figure 1: Model Life Cycle](RFC-0001-llama-stack-assets/model-lifecycle.png)
+![Figure 1: Model Life Cycle](../docs/resources/model-lifecycle.png)
 
 For each of the operations that need to be performed (e.g. fine tuning, inference, evals etc) during the model life cycle, we identified the capabilities as toolchain APIs that are needed. Some of these capabilities are primitive operations like inference while other capabilities like synthetic data generation are composed of other capabilities. The list of APIs we have identified to support the lifecycle of Llama models is below:
 
@@ -35,7 +35,7 @@ For each of the operations that need to be performed (e.g. fine tuning, inferenc
 
 ### Agentic System
 
-![Figure 2: Agentic System](RFC-0001-llama-stack-assets/agentic-system.png)
+![Figure 2: Agentic System](../docs/resources/agentic-system.png)
 
 In addition to the model lifecycle, we considered the different components involved in an agentic system. Specifically around tool calling and shields. Since the model may decide to call tools, a single model inference call is not enough. What’s needed is an agentic loop consisting of tool calls and inference. The model provides separate tokens representing end-of-message and end-of-turn. A message represents a possible stopping point for execution where the model can inform the execution environment that a tool call needs to be made. The execution environment, upon execution, adds back the result to the context window and makes another inference call. This process can get repeated until an end-of-turn token is generated. Note that as of today, in the OSS world, such a “loop” is often coded explicitly via elaborate prompt engineering using a ReAct pattern (typically) or preconstructed execution graph. Llama 3.1 (and future Llamas) attempts to absorb this multi-step reasoning loop inside the main model itself.
 
@@ -60,12 +60,12 @@ The sequence diagram that details the steps is [here](https://github.com/meta-ll
 
 We define the Llama Stack as a layer cake shown below.
 
-![Figure 3: Llama Stack](RFC-0001-llama-stack-assets/llama-stack.png)
+![Figure 3: Llama Stack](../docs/resources/llama-stack.png)
 
 
 
 
-The API is defined in the [YAML](RFC-0001-llama-stack-assets/llama-stack-spec.yaml) and [HTML](RFC-0001-llama-stack-assets/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in (api/datatypes.py and api/endpoints.py) files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
+The API is defined in the [YAML](../docs/llama-stack-spec.yaml) and [HTML](../docs/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in (api/datatypes.py and api/endpoints.py) files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
 
 
 
diff --git a/setup.py b/setup.py
index 4c10cfbfc..dd72abcde 100644
--- a/setup.py
+++ b/setup.py
@@ -16,7 +16,7 @@ def read_requirements():
 
 setup(
     name="llama_stack",
-    version="0.0.18",
+    version="0.0.20",
     author="Meta Llama",
     author_email="llama-oss@meta.com",
     description="Llama Stack",
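
Note on the `llama_stack/cli/stack/configure.py` hunk in this patch: `llama stack configure` now resolves its argument in three steps (a path to a build YAML file, then a conda build name, then a docker image name). The lookup order can be sketched as a standalone function. This is a minimal illustration under assumptions, not the code in the patch: `resolve_build_target` and its `conda_prefix` parameter are hypothetical names, and unlike the patch the sketch skips the conda step when `CONDA_PREFIX` is unset rather than passing `None` to `Path`.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def resolve_build_target(
    config: str, conda_prefix: Optional[str] = None
) -> Tuple[str, str]:
    """Return (kind, value) using the fallback order of the configure.py hunk.

    Hypothetical helper for illustration only; it is not part of llama_stack.
    """
    # Step 1: treat the argument as a path to an existing build YAML file.
    candidate = Path(config)
    if candidate.exists():
        return ("file", str(candidate))

    # Step 2: treat it as a conda build name, looked up next to the active env:
    # <envs dir>/llamastack-<name>/<name>-build.yaml
    prefix = conda_prefix if conda_prefix is not None else os.getenv("CONDA_PREFIX")
    if prefix:  # assumption: skip this step when no conda env is active
        conda_file = (
            Path(prefix).parent / f"llamastack-{config}" / f"{config}-build.yaml"
        )
        if conda_file.exists():
            return ("conda", str(conda_file))

    # Step 3: fall back to treating the argument as a docker image name.
    return ("docker", config)
```

With an active conda environment, `resolve_build_target("my-local-llama-stack")` would take the `conda` branch only if `llamastack-my-local-llama-stack/my-local-llama-stack-build.yaml` exists in the parent directory of `CONDA_PREFIX` (and no file of that name exists in the working directory); otherwise the name is handed on as a docker image name.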