Rename inline -> local (#24)

* Rename the "inline" distribution to "local"
* further rename

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>

parent dd15671f7f
commit 416097a9ea

6 changed files with 19 additions and 33 deletions
@@ -37,7 +37,7 @@ A provider can also be just a pointer to a remote REST service -- for example, c
 
 ## Llama Stack Distribution
 
-A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by inline code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well always using the same uniform set of APIs for developing Generative AI applications.
+A Distribution is where APIs and Providers are assembled together to provide a consistent whole to the end application developer. You can mix-and-match providers -- some could be backed by local code and some could be remote. As a hobbyist, you can serve a small model locally, but can choose a cloud provider for a large model. Regardless, the higher level APIs your app needs to work with don't need to change at all. You can even imagine moving across the server / mobile-device boundary as well always using the same uniform set of APIs for developing Generative AI applications.
 
 ## Installation
 
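To make the mix-and-match idea in the paragraph above concrete, here is a minimal sketch (not part of this commit) of how a distribution pairing local inference with a remote safety provider could be declared. It reuses the `DistributionSpec`, `Api`, `api_providers`, and `remote_spec` names that appear in the code touched later in this diff; the import paths and the `local-inference-remote-safety` spec id are assumptions made purely for illustration.

```python
# Illustrative sketch only -- not part of this commit.
# The import paths below are assumptions; only the names DistributionSpec, Api,
# api_providers, and remote_spec are taken from code visible in this diff.
from llama_toolchain.distribution.datatypes import Api, DistributionSpec  # assumed path
from llama_toolchain.distribution.registry import api_providers, remote_spec  # assumed path

providers = api_providers()

# A hypothetical spec that serves inference from local code but delegates
# safety checks to a remote REST provider.
mixed_spec = DistributionSpec(
    spec_id="local-inference-remote-safety",  # made-up id, for illustration
    description="Local inference, remote safety",
    provider_specs={
        Api.inference: providers[Api.inference]["meta-reference"],  # local code
        Api.safety: remote_spec(Api.safety),                        # remote REST service
        Api.agentic_system: providers[Api.agentic_system]["meta-reference"],
    },
)
```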
@@ -208,7 +208,7 @@ $ llama distribution list
 +---------------+---------------------------------------------+----------------------------------------------------------------------+
 | Spec ID       | ProviderSpecs                               | Description                                                          |
 +---------------+---------------------------------------------+----------------------------------------------------------------------+
-| inline        | {                                           | Use code from `llama_toolchain` itself to serve all llama stack APIs |
+| local         | {                                           | Use code from `llama_toolchain` itself to serve all llama stack APIs |
 |               | "inference": "meta-reference",              |                                                                      |
 |               | "safety": "meta-reference",                 |                                                                      |
 |               | "agentic_system": "meta-reference"          |                                                                      |
@@ -220,7 +220,7 @@ $ llama distribution list
 |               | "agentic_system": "agentic_system-remote"   |                                                                      |
 |               | }                                           |                                                                      |
 +---------------+---------------------------------------------+----------------------------------------------------------------------+
-| ollama-inline | {                                           | Like local-source, but use ollama for running LLM inference          |
+| local-ollama  | {                                           | Like local, but use ollama for running LLM inference                 |
 |               | "inference": "meta-ollama",                 |                                                                      |
 |               | "safety": "meta-reference",                 |                                                                      |
 |               | "agentic_system": "meta-reference"          |                                                                      |
@@ -229,16 +229,16 @@ $ llama distribution list
 
 ```
 
-As you can see above, each “spec” details the “providers” that make up that spec. For eg. The inline uses the “meta-reference” provider for inference while the ollama-inline relies on a different provider ( ollama ) for inference.
+As you can see above, each “spec” details the “providers” that make up that spec. For eg. The `local` spec uses the “meta-reference” provider for inference while the `local-ollama` spec relies on a different provider ( ollama ) for inference.
 
-Lets install the fully local implementation of the llama-stack – named `inline` above.
+Lets install the fully local implementation of the llama-stack – named `local` above.
 
 To install a distro, we run a simple command providing 2 inputs –
 - **Spec Id** of the distribution that we want to install ( as obtained from the list command )
 - A **Name** by which this installation will be known locally.
 
 ```
-llama distribution install --spec inline --name inline_llama_8b
+llama distribution install --spec local --name local_llama_8b
 ```
 
 This will create a new conda environment (name can be passed optionally) and install dependencies (via pip) as required by the distro.
@@ -246,12 +246,12 @@ This will create a new conda environment (name can be passed optionally) and ins
 Once it runs successfully , you should see some outputs in the form
 
 ```
-$ llama distribution install --spec inline --name inline_llama_8b
+$ llama distribution install --spec local --name local_llama_8b
 ....
 ....
 Successfully installed cfgv-3.4.0 distlib-0.3.8 identify-2.6.0 libcst-1.4.0 llama_toolchain-0.0.2 moreorless-0.4.0 nodeenv-1.9.1 pre-commit-3.8.0 stdlibs-2024.5.15 toml-0.10.2 tomlkit-0.13.0 trailrunner-1.4.0 ufmt-2.7.0 usort-1.0.8 virtualenv-20.26.3
 
-Distribution `inline_llama_8b` (with spec inline) has been installed successfully!
+Distribution `local_llama_8b` (with spec local) has been installed successfully!
 ```
 
 Next step is to configure the distribution that you just installed. We provide a simple CLI tool to enable simple configuration.
@@ -260,12 +260,12 @@ It will ask for some details like model name, paths to models, etc.
 
 NOTE: You will have to download the models if not done already. Follow instructions here on how to download using the llama cli
 ```
-llama distribution configure --name inline_llama_8b
+llama distribution configure --name local_llama_8b
 ```
 
 Here is an example screenshot of how the cli will guide you to fill the configuration
 ```
-$ llama distribution configure --name inline_llama_8b
+$ llama distribution configure --name local_llama_8b
 
 Configuring API surface: inference
 Enter value for model (required): Meta-Llama3.1-8B-Instruct
@@ -278,7 +278,7 @@ Do you want to configure llama_guard_shield? (y/n): n
 Do you want to configure prompt_guard_shield? (y/n): n
 Configuring API surface: agentic_system
 
-YAML configuration has been written to ~/.llama/distributions/inline0/config.yaml
+YAML configuration has been written to ~/.llama/distributions/local0/config.yaml
 ```
 
 As you can see, we did basic configuration above and configured inference to run on model Meta-Llama3.1-8B-Instruct ( obtained from the llama model list command ).
@@ -290,12 +290,12 @@ For how these configurations are stored as yaml, checkout the file printed at th
 
 Now let’s start the distribution using the cli.
 ```
-llama distribution start --name inline_llama_8b --port 5000
+llama distribution start --name local_llama_8b --port 5000
 ```
 You should see the distribution start and print the APIs that it is supporting,
 
 ```
-$ llama distribution start --name inline_llama_8b --port 5000
+$ llama distribution start --name local_llama_8b --port 5000
 
 > initializing model parallel with size 1
 > initializing ddp with size 1
@@ -329,7 +329,7 @@ Lets test with a client
 
 ```
 cd /path/to/llama-toolchain
-conda activate <env-for-distro> # ( Eg. local_inline in above example )
+conda activate <env-for-distribution> # ( Eg. local_llama_8b in above example )
 
 python -m llama_toolchain.inference.client localhost 5000
 ```
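The hunk above keeps the documented way of testing the running distribution: the bundled `llama_toolchain.inference.client` module pointed at `localhost 5000`. For readers who want to poke at the server directly over HTTP instead, a rough, hypothetical sketch follows; the `/inference/chat_completion` route and the request fields are assumptions and are not confirmed anywhere in this diff.

```python
# Rough sketch, not taken from this repository: the route name and payload
# schema below are assumptions and may not match the actual server.
import requests

resp = requests.post(
    "http://localhost:5000/inference/chat_completion",  # assumed route
    json={
        "model": "Meta-Llama3.1-8B-Instruct",  # the model configured earlier
        "messages": [{"role": "user", "content": "Hello!"}],  # assumed schema
    },
    timeout=60,
)
print(resp.status_code)
print(resp.text)
```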
@@ -36,7 +36,7 @@ class DistributionInstall(Subcommand):
         self.parser.add_argument(
             "--spec",
             type=str,
-            help="Distribution spec to install (try ollama-inline)",
+            help="Distribution spec to install (try local-ollama)",
             required=True,
             choices=[d.spec_id for d in available_distribution_specs()],
         )
@@ -1,14 +0,0 @@
-inference_config:
-  impl_config:
-    impl_type: "inline"
-    checkpoint_config:
-      checkpoint:
-        checkpoint_type: "pytorch"
-        checkpoint_dir: {checkpoint_dir}/
-        tokenizer_path: {checkpoint_dir}/tokenizer.model
-        model_parallel_size: {model_parallel_size}
-        quantization_format: bf16
-    quantization: null
-    torch_seed: null
-    max_seq_len: 16384
-    max_batch_size: 1
|
@ -96,7 +96,7 @@ ensure_conda_env_python310() {
|
||||||
|
|
||||||
if [ "$#" -ne 3 ]; then
|
if [ "$#" -ne 3 ]; then
|
||||||
echo "Usage: $0 <environment_name> <distribution_name> <pip_dependencies>" >&2
|
echo "Usage: $0 <environment_name> <distribution_name> <pip_dependencies>" >&2
|
||||||
echo "Example: $0 my_env local-inline 'numpy pandas scipy'" >&2
|
echo "Example: $0 my_env local-llama-8b 'numpy pandas scipy'" >&2
|
||||||
exit 1
|
exit 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
|
|
|
@@ -28,7 +28,7 @@ def available_distribution_specs() -> List[DistributionSpec]:
    providers = api_providers()
    return [
        DistributionSpec(
-            spec_id="inline",
+            spec_id="local",
            description="Use code from `llama_toolchain` itself to serve all llama stack APIs",
            provider_specs={
                Api.inference: providers[Api.inference]["meta-reference"],
|
@ -42,8 +42,8 @@ def available_distribution_specs() -> List[DistributionSpec]:
|
||||||
provider_specs={x: remote_spec(x) for x in providers},
|
provider_specs={x: remote_spec(x) for x in providers},
|
||||||
),
|
),
|
||||||
DistributionSpec(
|
DistributionSpec(
|
||||||
spec_id="ollama-inline",
|
spec_id="local-ollama",
|
||||||
description="Like local-source, but use ollama for running LLM inference",
|
description="Like local, but use ollama for running LLM inference",
|
||||||
provider_specs={
|
provider_specs={
|
||||||
Api.inference: providers[Api.inference]["meta-ollama"],
|
Api.inference: providers[Api.inference]["meta-ollama"],
|
||||||
Api.safety: providers[Api.safety]["meta-reference"],
|
Api.safety: providers[Api.safety]["meta-reference"],
|
||||||
|
|
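As a quick sanity check on how the rename propagates, here is a small self-contained sketch of the relationship between the spec ids defined in this registry and the `--spec` choices offered by `llama distribution install` (see the argparse hunk above). It uses a simplified stand-in dataclass rather than the real `llama_toolchain` types, and only lists the two renamed specs.

```python
# Self-contained stand-in, not the real llama_toolchain code: DistributionSpec
# here is a simplified dataclass so the snippet runs on its own.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DistributionSpec:
    spec_id: str
    description: str
    provider_specs: Dict[str, str] = field(default_factory=dict)


def available_distribution_specs() -> List[DistributionSpec]:
    # Mirrors the renamed ids introduced by this commit: "local" and "local-ollama".
    return [
        DistributionSpec(
            spec_id="local",
            description="Use code from `llama_toolchain` itself to serve all llama stack APIs",
        ),
        DistributionSpec(
            spec_id="local-ollama",
            description="Like local, but use ollama for running LLM inference",
        ),
    ]


# This is what the --spec argument's `choices` expression evaluates to after the rename.
choices = [d.spec_id for d in available_distribution_specs()]
print(choices)  # ['local', 'local-ollama']
```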