moving rfc->docs

Xi Yan 2024-09-18 16:54:24 -07:00
parent 2c1ad10710
commit 5ec64ac68c
6 changed files with 4 additions and 4 deletions

Binary image file removed (128 KiB); contents not shown.

File diff suppressed because it is too large.

File diff suppressed because it is too large.

Binary image file removed (71 KiB); contents not shown.

Binary image file removed (17 KiB); contents not shown.


@@ -21,7 +21,7 @@ Meta releases weights of both the pretrained and instruction fine-tuned Llama mo
### Model Lifecycle
-![Figure 1: Model Life Cycle](RFC-0001-llama-stack-assets/model-lifecycle.png)
+![Figure 1: Model Life Cycle](../docs/resources/model-lifecycle.png)
For each of the operations that need to be performed during the model life cycle (e.g. fine-tuning, inference, evals, etc.), we identified the capabilities that are needed as toolchain APIs. Some of these capabilities are primitive operations, like inference, while others, like synthetic data generation, are composed of other capabilities. The list of APIs we have identified to support the lifecycle of Llama models is below:
@@ -35,7 +35,7 @@ For each of the operations that need to be performed (e.g. fine tuning, inferenc
### Agentic System
-![Figure 2: Agentic System](RFC-0001-llama-stack-assets/agentic-system.png)
+![Figure 2: Agentic System](../docs/resources/agentic-system.png)
In addition to the model lifecycle, we considered the different components involved in an agentic system, specifically tool calling and shields. Since the model may decide to call tools, a single model inference call is not enough. What's needed is an agentic loop consisting of tool calls and inference. The model provides separate tokens representing end-of-message and end-of-turn. A message represents a possible stopping point for execution, where the model can inform the execution environment that a tool call needs to be made. The execution environment, upon executing the tool, adds the result back to the context window and makes another inference call. This process is repeated until an end-of-turn token is generated.
Note that, as of today in the OSS world, such a “loop” is often coded explicitly via elaborate prompt engineering, typically using a ReAct pattern or a preconstructed execution graph. Llama 3.1 (and future Llamas) attempts to absorb this multi-step reasoning loop inside the main model itself.
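As a minimal sketch of such an agentic loop (the `Step` shape and the `generate`/`run_tool` callables below are hypothetical stand-ins, not the actual llama-stack interfaces):

```python
# Sketch of the agentic loop described above: run inference, execute any
# requested tool call, feed the result back, and stop at end-of-turn.
from dataclasses import dataclass, field

@dataclass
class Step:
    stop_reason: str                      # "end_of_message" or "end_of_turn"
    content: str = ""
    tool_name: str = ""
    tool_args: dict = field(default_factory=dict)

def agentic_turn(generate, run_tool, messages):
    while True:
        step = generate(messages)         # one model inference call
        if step.stop_reason == "end_of_turn":
            return step.content           # model has finished the turn
        # "end_of_message": the model paused so a tool call can be made
        result = run_tool(step.tool_name, step.tool_args)
        # add the tool result back to the context window and infer again
        messages.append({"role": "tool", "content": result})
```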
@@ -60,12 +60,12 @@ The sequence diagram that details the steps is [here](https://github.com/meta-ll
We define the Llama Stack as a layer cake shown below.
-![Figure 3: Llama Stack](RFC-0001-llama-stack-assets/llama-stack.png)
+![Figure 3: Llama Stack](../docs/resources/llama-stack.png)
-The API is defined in the [YAML](RFC-0001-llama-stack-assets/llama-stack-spec.yaml) and [HTML](RFC-0001-llama-stack-assets/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in the api/datatypes.py and api/endpoints.py files in the llama-models, llama-stack, and llama-agentic-system repositories.
+The API is defined in the [YAML](../docs/llama-stack-spec.yaml) and [HTML](../docs/llama-stack-spec.html) files. These files were generated using the Pydantic definitions in the api/datatypes.py and api/endpoints.py files in the llama-models, llama-stack, and llama-agentic-system repositories.
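To illustrate how Pydantic definitions can drive a generated spec like this, here is a minimal sketch using Pydantic v2's JSON Schema export (`CompletionRequest` and its fields are invented for the example, not the real llama-stack datatypes):

```python
# Sketch: Pydantic models can emit JSON Schema, which a generator can
# assemble into OpenAPI YAML/HTML documents like the spec files above.
import json
from pydantic import BaseModel

class CompletionRequest(BaseModel):
    model: str            # illustrative field names only
    prompt: str
    max_tokens: int = 512

# Emit the JSON Schema for this datatype; a spec generator would stitch
# such schemas together with endpoint definitions into the full spec.
print(json.dumps(CompletionRequest.model_json_schema(), indent=2))
```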