forked from phoenix-oss/llama-stack-mirror
moving rfc->docs
This commit is contained in:
parent
2c1ad10710
commit
5ec64ac68c
6 changed files with 4 additions and 4 deletions
Before Width: | Height: | Size: 128 KiB After Width: | Height: | Size: 128 KiB |
Before Width: | Height: | Size: 71 KiB After Width: | Height: | Size: 71 KiB |
Before Width: | Height: | Size: 17 KiB After Width: | Height: | Size: 17 KiB |
|
@ -21,7 +21,7 @@ Meta releases weights of both the pretrained and instruction fine-tuned Llama mo
|
|||
|
||||
### Model Lifecycle
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
For each of the operations that need to be performed (e.g. fine tuning, inference, evals etc) during the model life cycle, we identified the capabilities as toolchain APIs that are needed. Some of these capabilities are primitive operations like inference while other capabilities like synthetic data generation are composed of other capabilities. The list of APIs we have identified to support the lifecycle of Llama models is below:
|
||||
|
@ -35,7 +35,7 @@ For each of the operations that need to be performed (e.g. fine tuning, inferenc
|
|||
|
||||
### Agentic System
|
||||
|
||||

|
||||

|
||||
|
||||
In addition to the model lifecycle, we considered the different components involved in an agentic system. Specifically around tool calling and shields. Since the model may decide to call tools, a single model inference call is not enough. What’s needed is an agentic loop consisting of tool calls and inference. The model provides separate tokens representing end-of-message and end-of-turn. A message represents a possible stopping point for execution where the model can inform the execution environment that a tool call needs to be made. The execution environment, upon execution, adds back the result to the context window and makes another inference call. This process can get repeated until an end-of-turn token is generated.
|
||||
Note that as of today, in the OSS world, such a “loop” is often coded explicitly via elaborate prompt engineering using a ReAct pattern (typically) or preconstructed execution graph. Llama 3.1 (and future Llamas) attempts to absorb this multi-step reasoning loop inside the main model itself.
|
||||
|
@ -60,12 +60,12 @@ The sequence diagram that details the steps is [here](https://github.com/meta-ll
|
|||
|
||||
We define the Llama Stack as a layer cake shown below.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
|
||||
|
||||
The API is defined in the [YAML](RFC-0001-llama-stack-assets/llama-stack-spec.yaml) and [HTML](RFC-0001-llama-stack-assets/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in (api/datatypes.py and api/endpoints.py) files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
|
||||
The API is defined in the [YAML](../docs/llama-stack-spec.yaml) and [HTML](../docs/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in (api/datatypes.py and api/endpoints.py) files that are in the llama-models, llama-stack, and llama-agentic-system repositories.
|
||||
|
||||
|
||||
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue