mirror of https://github.com/meta-llama/llama-stack.git
chore: remove "rfc" directory and move original rfc to "docs" (#2718)
# What does this PR do?

The "rfc" directory has only a single document in it, and it's the original RFC for creating Llama Stack. Simplify the project directory structure by moving this document into the "docs" directory and renaming it to "original_rfc" to preserve the context of the doc.

## Why did you do this?

A simplified top-level directory structure helps keep the project simpler and prevents misleading new contributors into thinking we use it (we really don't).

---------

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Co-authored-by: raghotham <raghotham@gmail.com>
This commit is contained in:

parent 9f04bc6d1a
commit 5fe3027cbf

1 changed file with 6 additions and 4 deletions
@@ -1,5 +1,7 @@
# The Llama Stack API

+*Originally authored Jul 23, 2024*
+
**Authors:**

* Meta: @raghotham, @ashwinb, @hjshah, @jspisak
@@ -24,7 +26,7 @@ Meta releases weights of both the pretrained and instruction fine-tuned Llama mo
### Model Lifecycle



For each of the operations that need to be performed during the model lifecycle (e.g. fine-tuning, inference, evals, etc.), we identified the capabilities that are needed as toolchain APIs. Some of these capabilities are primitive operations, like inference, while other capabilities, like synthetic data generation, are composed of other capabilities. The list of APIs we have identified to support the lifecycle of Llama models is below:
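The API list itself lies outside this hunk and is unchanged. To make the "primitive operation" end of that spectrum concrete, below is a minimal sketch of a single inference call using the present-day llama-stack-client Python SDK (which post-dates this RFC); the server URL and model ID are placeholder values, and the exact client surface may differ across SDK versions.

```python
# Minimal sketch: one primitive inference call via the llama-stack-client
# Python SDK. The base_url and model_id are placeholders, not values
# prescribed by the RFC.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about layer cakes."}],
)
print(response.completion_message.content)
```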
@@ -37,7 +39,7 @@ For each of the operations that need to be performed (e.g. fine tuning, inferenc
### Agentic System



In addition to the model lifecycle, we considered the different components involved in an agentic system, specifically around tool calling and shields. Since the model may decide to call tools, a single model inference call is not enough. What’s needed is an agentic loop consisting of tool calls and inference. The model provides separate tokens representing end-of-message and end-of-turn. A message represents a possible stopping point for execution, where the model can inform the execution environment that a tool call needs to be made. The execution environment, upon execution, adds the result back to the context window and makes another inference call. This process can be repeated until an end-of-turn token is generated.

Note that as of today, in the OSS world, such a “loop” is often coded explicitly via elaborate prompt engineering using a ReAct pattern (typically) or a preconstructed execution graph. Llama 3.1 (and future Llamas) attempts to absorb this multi-step reasoning loop inside the main model itself.
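To make the loop concrete, here is a minimal sketch in Python. `run_inference` and `execute_tool` are hypothetical stand-ins for an inference backend and a tool runtime (they are not APIs defined by this RFC), and the stop-reason strings stand in for the end-of-message and end-of-turn tokens described above.

```python
# A sketch of the agentic loop: inference, then tool execution, repeated
# until the model emits end-of-turn. `run_inference` and `execute_tool`
# are hypothetical callables supplied by the execution environment.
def agentic_turn(messages, run_inference, execute_tool):
    """Drive one turn: alternate inference and tool calls until end-of-turn."""
    while True:
        result = run_inference(messages)          # one model inference call
        messages.append(result.message)           # grow the context window
        if result.stop_reason == "end_of_turn":   # the model finished the turn
            return messages
        # Otherwise the model stopped at end-of-message to request tool calls.
        for call in result.message.tool_calls:
            output = execute_tool(call)
            # Add the tool result back to the context, then loop for
            # another inference call.
            messages.append(
                {"role": "tool", "call_id": call.id, "content": output}
            )
```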
@@ -63,9 +65,9 @@ The sequence diagram that details the steps is [here](https://github.com/meta-ll
We define the Llama Stack as a layer cake shown below.



-The API is defined in the [YAML](../docs/_static/llama-stack-spec.yaml) and [HTML](../docs/_static/llama-stack-spec.html) files.
+The API is defined in the [YAML](_static/llama-stack-spec.yaml) and [HTML](_static/llama-stack-spec.html) files.
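Since the spec is standard OpenAPI, it can also be inspected programmatically. A quick sketch, assuming PyYAML and a locally downloaded copy of the YAML file (the filename is illustrative):

```python
# List every endpoint declared in the published OpenAPI spec. Assumes the
# YAML file was downloaded to the working directory; requires PyYAML.
import yaml

with open("llama-stack-spec.yaml") as f:
    spec = yaml.safe_load(f)

# Each entry under "paths" maps HTTP methods to operation objects.
for path, ops in spec["paths"].items():
    for method in ops:
        print(f"{method.upper():7s} {path}")
```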
## Sample implementations