added templates and enhanced readme (#307)

Co-authored-by: Justin Lee <justinai@fb.com>
Justin Lee 2024-10-24 17:07:06 -07:00 committed by GitHub
parent 3e1c3fdb3f
commit b6d8246b82
5 changed files with 293 additions and 136 deletions

.github/ISSUE_TEMPLATE/bug.yml (new file, 77 lines)

@@ -0,0 +1,77 @@
name: 🐛 Bug Report
description: Create a report to help us reproduce and fix the bug

body:
  - type: markdown
    attributes:
      value: >
        #### Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the
        existing and past issues](https://github.com/meta-llama/llama-stack/issues).

  - type: textarea
    id: system-info
    attributes:
      label: System Info
      description: |
        Please share your system info with us. You can use the following command to capture your environment information
        python -m "torch.utils.collect_env"
      placeholder: |
        PyTorch version, CUDA version, GPU type, #num of GPUs...
    validations:
      required: true

  - type: checkboxes
    id: information-scripts-examples
    attributes:
      label: Information
      description: 'The problem arises when using:'
      options:
        - label: "The official example scripts"
        - label: "My own modified scripts"

  - type: textarea
    id: bug-description
    attributes:
      label: 🐛 Describe the bug
      description: |
        Please provide a clear and concise description of what the bug is.

        Please also paste or describe the results you observe instead of the expected results.
      placeholder: |
        A clear and concise description of what the bug is.

        ```llama stack
        # Command that you used for running the examples
        ```
        Description of the results
    validations:
      required: true

  - type: textarea
    attributes:
      label: Error logs
      description: |
        If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
      placeholder: |
        ```
        The error message you got, with the full traceback.
        ```
    validations:
      required: true

  - type: textarea
    id: expected-behavior
    validations:
      required: true
    attributes:
      label: Expected behavior
      description: "A clear and concise description of what you would expect to happen."

  - type: markdown
    attributes:
      value: >
        Thanks for contributing 🎉!
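The template above asks reporters to capture their environment with `python -m "torch.utils.collect_env"`, which fails when PyTorch is not installed. As a rough illustration of what that step collects, here is a hypothetical stdlib-only fallback (the helper name and the exact fields are assumptions, not part of the template):

```python
import importlib.util
import platform
import sys

def collect_basic_env() -> dict:
    """Gather a minimal subset of the environment details the bug template
    asks for. This is a stdlib-only sketch; torch's collect_env gives far
    more detail (CUDA version, GPU type, etc.) when torch is installed."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        # Record whether torch is importable without actually importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }

if __name__ == "__main__":
    for key, value in collect_basic_env().items():
        print(f"{key}: {value}")
```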


@@ -0,0 +1,31 @@
name: 🚀 Feature request
description: Submit a proposal/request for a new llama-stack feature

body:
  - type: textarea
    id: feature-pitch
    attributes:
      label: 🚀 The feature, motivation and pitch
      description: >
        A clear and concise description of the feature proposal. Please outline the motivation for the proposal. Is your feature request related to a specific problem? e.g., *"I'm working on X and would like Y to be possible"*. If this is related to another GitHub issue, please link here too.
    validations:
      required: true

  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives
      description: >
        A description of any alternative solutions or features you've considered, if any.

  - type: textarea
    id: additional-context
    attributes:
      label: Additional context
      description: >
        Add any other context or screenshots about the feature request.

  - type: markdown
    attributes:
      value: >
        Thanks for contributing 🎉!

.github/PULL_REQUEST_TEMPLATE.md (new file, 31 lines)

@@ -0,0 +1,31 @@
# What does this PR do?

Closes # (issue)

## Feature/Issue validation/testing/test plan

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration or test plan.

- [ ] Test A
Logs for Test A

- [ ] Test B
Logs for Test B

## Sources

Please link relevant resources if necessary.

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Was this discussed/approved via a GitHub issue? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?

Thanks for contributing 🎉!


@@ -65,23 +65,30 @@ A Distribution is where APIs and Providers are assembled together to provide a c
| Dell-TGI | [Local TGI + Chroma](https://hub.docker.com/repository/docker/llamastack/llamastack-local-tgi-chroma/general) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

## Installation

You have two ways to install this repository:

1. **Install as a package**:
   You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command:
   ```bash
   pip install llama-stack
   ```

2. **Install from source**:
   If you prefer to install from the source code, follow these steps:
   ```bash
   mkdir -p ~/local
   cd ~/local
   git clone git@github.com:meta-llama/llama-stack.git

   conda create -n stack python=3.10
   conda activate stack

   cd llama-stack
   $CONDA_PREFIX/bin/pip install -e .
   ```

## Documentations
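After either install path, a quick check that the package is importable can save a debugging round trip. A minimal stdlib sketch; note that `llama_stack` as the importable module name is an assumption based on the `llama-stack` PyPI name:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    # `llama_stack` is assumed to be the importable name for the
    # `llama-stack` PyPI package; adjust if the distribution differs.
    print("llama-stack installed:", is_installed("llama_stack"))
```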


@@ -5,163 +5,174 @@ This guide will walk you through the steps to get started on end-to-end flow for
## Installation

The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-stack` package.

You have two ways to install this repository:

1. **Install as a package**:
   You can install the repository directly from [PyPI](https://pypi.org/project/llama-stack/) by running the following command:
   ```bash
   pip install llama-stack
   ```

2. **Install from source**:
   If you prefer to install from the source code, follow these steps:
   ```bash
   mkdir -p ~/local
   cd ~/local
   git clone git@github.com:meta-llama/llama-stack.git

   conda create -n stack python=3.10
   conda activate stack

   cd llama-stack
   $CONDA_PREFIX/bin/pip install -e .
   ```

For what you can do with the Llama CLI, please refer to [CLI Reference](./cli_reference.md).
## Starting Up Llama Stack Server

You have two ways to start up Llama Stack server:

1. **Starting up server via docker**:

We provide 2 pre-built Docker images of Llama Stack distribution, which can be found in the following links.
- [llamastack-local-gpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-gpu/general)
  - This is a packaged version with our local meta-reference implementations, where you will be running inference locally with downloaded Llama model checkpoints.
- [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general)
  - This is a lite version with remote inference where you can hook up to your favourite remote inference framework (e.g. ollama, fireworks, together, tgi) for running inference without GPU.

> [!NOTE]
> For GPU inference, you need to set these environment variables specifying the local directory containing your model checkpoints, and enable GPU inference to start running the docker container.
```
export LLAMA_CHECKPOINT_DIR=~/.llama
```

> [!NOTE]
> `~/.llama` should be the path containing downloaded weights of Llama models.

To download llama models, use
```
llama download --model-id Llama3.1-8B-Instruct
```

To download and start running a pre-built docker container, you may use the following commands:
```
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
```

> [!TIP]
> Pro Tip: We may use `docker compose up` for starting up a distribution with remote providers (e.g. TGI) using [llamastack-local-cpu](https://hub.docker.com/repository/docker/llamastack/llamastack-local-cpu/general). You can check out [these scripts](../distributions/) to help you get started.
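Once the container starts, it can take a while before port 5000 accepts connections, especially while model checkpoints load. This stdlib sketch polls for readiness; the host and port mirror the `-p 5000:5000` mapping in the `docker run` command above, and the helper name is our own:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 5.0) -> bool:
    """Poll until a TCP connect to (host, port) succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False

if __name__ == "__main__":
    # Port 5000 matches the -p 5000:5000 mapping in the docker run command.
    ready = wait_for_port("127.0.0.1", 5000, timeout=120.0)
    print("server ready" if ready else "timed out waiting for server")
```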
2. **Build->Configure->Run Llama Stack server via conda**:

You may also build a LlamaStack distribution from scratch, configure it, and start running the distribution. This is useful for developing on LlamaStack.

**`llama stack build`**
- You'll be prompted to enter build information interactively.
```
llama stack build

> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local-stack
> Enter the image type you want your distribution to be built with (docker or conda): conda

Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference

> (Optional) Enter a short description for your Llama Stack distribution:

Build spec configuration saved at ~/.conda/envs/llamastack-my-local-stack/my-local-stack-build.yaml
You can now run `llama stack configure my-local-stack`
```
**`llama stack configure`**
- Run `llama stack configure <name>` with the name you have previously defined in `build` step.
```
llama stack configure <name>
```
- You will be prompted to enter configurations for your Llama Stack
```
$ llama stack configure my-local-stack

Could not find my-local-stack. Trying conda build name instead...
Configuring API `inference`...
=== Configuring provider `meta-reference` for API inference...
Enter value for model (default: Llama3.1-8B-Instruct) (required):
Do you want to configure quantization? (y/n): n
Enter value for torch_seed (optional):
Enter value for max_seq_len (default: 4096) (required):
Enter value for max_batch_size (default: 1) (required):

Configuring API `safety`...
=== Configuring provider `meta-reference` for API safety...
Do you want to configure llama_guard_shield? (y/n): n
Do you want to configure prompt_guard_shield? (y/n): n

Configuring API `agents`...
=== Configuring provider `meta-reference` for API agents...
Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite):

Configuring SqliteKVStoreConfig:
Enter value for namespace (optional):
Enter value for db_path (default: /home/xiyan/.llama/runtime/kvstore.db) (required):

Configuring API `memory`...
=== Configuring provider `meta-reference` for API memory...
> Please enter the supported memory bank type your provider has for memory: vector

Configuring API `telemetry`...
=== Configuring provider `meta-reference` for API telemetry...

> YAML configuration has been written to ~/.llama/builds/conda/my-local-stack-run.yaml.
You can now run `llama stack run my-local-stack --port PORT`
```
**`llama stack run`**
- Run `llama stack run <name>` with the name you have previously defined.
```
llama stack run my-local-stack

...
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
...
Finished model load YES READY
Serving POST /inference/chat_completion
Serving POST /inference/completion
Serving POST /inference/embeddings
Serving POST /memory_banks/create
Serving DELETE /memory_bank/documents/delete
Serving DELETE /memory_banks/drop
Serving GET /memory_bank/documents/get
Serving GET /memory_banks/get
Serving POST /memory_bank/insert
Serving GET /memory_banks/list
Serving POST /memory_bank/query
Serving POST /memory_bank/update
Serving POST /safety/run_shield
Serving POST /agentic_system/create
Serving POST /agentic_system/session/create
Serving POST /agentic_system/turn/create
Serving POST /agentic_system/delete
Serving POST /agentic_system/session/delete
Serving POST /agentic_system/session/get
Serving POST /agentic_system/step/get
Serving POST /agentic_system/turn/get
Serving GET /telemetry/get_trace
Serving POST /telemetry/log_event
Listening on :::5000
INFO:     Started server process [587053]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
```
## Testing with client
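The endpoints listed in the startup log can be exercised directly once the server is running. A minimal stdlib sketch that POSTs to `/inference/chat_completion` on the port used above; note that the request body shape (`model`, `messages`, `stream`) is an assumption, not confirmed by this page, so check the actual API reference before relying on it:

```python
import json
import urllib.request

def build_chat_request(model: str, user_message: str) -> dict:
    """Build a chat_completion payload. The field names here are assumed,
    not taken from the server log above."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,
    }

def post_chat_completion(base_url: str, payload: dict) -> str:
    """POST to the /inference/chat_completion endpoint listed in the
    startup log and return the raw response body."""
    req = urllib.request.Request(
        f"{base_url}/inference/chat_completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    payload = build_chat_request("Llama3.1-8B-Instruct", "Hello!")
    # Requires a running server, e.g. `llama stack run my-local-stack --port 5000`.
    print(post_chat_completion("http://localhost:5000", payload))
```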