- Moved environment variable parsing and `setup_logging()` call from
module level to proper initialization points
- Added explicit `setup_logging()` calls in `server.py::create_app()`
and `library_client.py::AsyncLlamaStackAsLibraryClient.__init__()`
Module-level side effects are bad practice and can cause issues with
import order, testing, and circular dependencies. The previous
implementation ran logging setup on every import of the log module,
which is unpredictable and difficult to control.
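A minimal sketch of the pattern (the module layout and environment variable name below are illustrative, not the actual code):
```python
# log.py -- importing this module no longer has side effects
import logging
import os

def setup_logging() -> None:
    """Parse the environment and configure logging; called explicitly by entry points."""
    # hypothetical variable name, read here rather than at import time
    level = os.environ.get("LLAMA_STACK_LOG_LEVEL", "INFO").upper()
    logging.basicConfig(level=level, format="%(asctime)s %(name)s: %(levelname)s %(message)s")

# server.py -- illustrative shape of the explicit initialization point
def create_app():
    setup_logging()  # runs once at app creation, not on import of the log module
    ...
```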
---------
Co-authored-by: Claude <noreply@anthropic.com>
This change removes the `llama model` and `llama download` subcommands
from the CLI, replacing them with recommendations to use the Hugging
Face CLI instead.
Rationale for this change:
- The model management functionality was largely duplicating what the
Hugging Face CLI already provides, leading to unnecessary maintenance
overhead (the possible exception being the direct download source from
Meta)
- Maintaining our own implementation required fixing bugs and keeping up
with changes in model repositories and download mechanisms
- The Hugging Face CLI is more mature, widely adopted, and better
maintained
- This allows us to focus on the core Llama Stack functionality rather
than reimplementing model management tools
Changes made:
- Removed all model-related CLI commands and their implementations
- Updated documentation to recommend using `huggingface-cli` for model
downloads
- Removed Meta-specific download logic and statements
- Simplified the CLI to focus solely on stack management operations
Users should now use:
- `huggingface-cli download` for downloading models
- `huggingface-cli scan-cache` for listing downloaded models
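For example, the programmatic equivalents via the `huggingface_hub` Python API (the model id below is only an example):
```python
from huggingface_hub import scan_cache_dir, snapshot_download

# Equivalent of `huggingface-cli download`: fetch a model snapshot into the local cache
local_dir = snapshot_download(repo_id="meta-llama/Llama-3.1-8B-Instruct")

# Equivalent of `huggingface-cli scan-cache`: list what is already cached
for repo in scan_cache_dir().repos:
    print(repo.repo_id, repo.size_on_disk_str)
```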
This is a breaking change as it removes previously available CLI
commands.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Mainly tried to cover the entire llama_stack/apis directory; we only
have one left. Some excludes were effectively no-ops.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Add help descriptions to the CLI subcommands so they show up in the
`--help` listings:
```
before:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download,remove}
$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...
Operations for the Llama Stack / Distributions
options:
-h, --help show this help message and exit
--version show program's version number and exit
stack_subcommands:
{build,list-apis,list-providers,run}
===================
after:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...
Welcome to the Llama CLI
options:
-h, --help show this help message and exit
subcommands:
{model,stack,download,verify-download}
model Work with llama models
stack Operations for the Llama Stack / Distributions
download Download a model from llama.meta.com or Hugging Face Hub
verify-download Verify integrity of downloaded model files
$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...
Work with llama models
options:
-h, --help show this help message and exit
model_subcommands:
{download,list,prompt-format,describe,verify-download,remove}
download Download a model from llama.meta.com or Hugging Face Hub
list Show available llama models
prompt-format Show llama model message formats
describe Show details about a llama model
verify-download Verify the downloaded checkpoints' checksums for models downloaded from Meta
remove Remove the downloaded llama model
$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...
Operations for the Llama Stack / Distributions
options:
-h, --help show this help message and exit
--version show program's version number and exit
stack_subcommands:
{build,list-apis,list-providers,run}
build Build a Llama stack container
list-apis List APIs part of the Llama Stack implementation
list-providers Show available Llama Stack Providers for an API
run Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```
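The added descriptions presumably come from passing `help=` strings when the subparsers are registered; a minimal argparse sketch of the mechanism (not the actual CLI code):
```python
import argparse

parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
subparsers = parser.add_subparsers(title="subcommands")

# With help= set, argparse lists each subcommand alongside its description,
# as in the "after" output above; without it, only the bare {model,stack,...} line shows.
subparsers.add_parser("model", help="Work with llama models")
subparsers.add_parser("stack", help="Operations for the Llama Stack / Distributions")

parser.print_help()
```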
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
It is important to verify large checkpoints downloaded via `llama model
download` because subtle corruptions can easily happen with large file
system writes. This PR adds a `verify-download` subcommand. Note that
verification itself is a very time-consuming process (and will take
several **minutes** for the 405B model), hence this is a separate
subcommand (and not part of the download, which can already be
time-consuming), and there are spinners and a bit of a "show" around it
in the implementation.
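A minimal sketch of the kind of check involved, hashing in chunks so multi-gigabyte checkpoints never have to fit in memory (the hash algorithm and manifest shape are assumptions, not the actual implementation):
```python
import hashlib
from pathlib import Path

def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 one 1 MiB chunk at a time."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(model_dir: Path, manifest: dict[str, str]) -> bool:
    """manifest maps a relative file name to its expected hex digest (hypothetical format)."""
    return all(file_digest(model_dir / name) == digest for name, digest in manifest.items())
```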
## Test Plan
<img width="1012" alt="image"
src="https://github.com/user-attachments/assets/f82b0d42-2a15-4917-b85e-6d3cd7d31e55">
* API Keys passed from Client instead of distro configuration
* delete distribution registry
* Rename the "package" word away
* Introduce a "Router" layer for providers
Some providers need to be factorized and considered as thin routing
layers on top of other providers. Consider two examples:
- The inference API should be a routing layer over inference providers,
routed using the "model" key
- The memory banks API is another instance where various memory bank
types will be provided by independent providers (e.g., a vector store
is served by Chroma while a keyvalue memory can be served by Redis or
PGVector)
This commit introduces a generalized routing layer for this purpose;
see the sketch after this change list.
* update `apis_to_serve`
* llama_toolchain -> llama_stack
* Codemod from llama_toolchain -> llama_stack
- added providers/registry
- cleaned up api/ subdirectories and moved impls away
- restructured api/api.py
- `from llama_stack.apis.<api> import foo` should work now
- update imports to use `llama_stack.apis.<api>`
- update many other imports
- added __init__, fixed some registry imports
- updated registry imports
- create_agentic_system -> create_agent
- AgenticSystem -> Agent
* Moved some stuff out of common/; re-generated OpenAPI spec
* llama-toolchain -> llama-stack (hyphens)
* add control plane API
* add redis adapter + sqlite provider
* move core -> distribution
* Some more toolchain -> stack changes
* small naming shenanigans
* Removing custom tool and agent utilities and moving them client side
* Move control plane to distribution server for now
* Remove control plane from API list
* do not pull in the codeshield dependency unintentionally
* Add "fire" as a dependency
* add back event loggers
* stack configure fixes
* use brave instead of bing in the example client
* add init file so it gets packaged
* add init files so it gets packaged
* Update MANIFEST
* bug fix
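A minimal sketch of the routing idea from the "Router" item above (class names and registry shape are illustrative, not the actual implementation):
```python
from typing import Protocol

class InferenceProvider(Protocol):
    def chat_completion(self, model: str, prompt: str) -> str: ...

class InferenceRouter:
    """Thin routing layer: forward each request to the provider serving that model."""

    def __init__(self, routes: dict[str, InferenceProvider]):
        self.routes = routes  # routing key is the request's "model" field

    def chat_completion(self, model: str, prompt: str) -> str:
        provider = self.routes.get(model)
        if provider is None:
            raise ValueError(f"no provider registered for model {model!r}")
        return provider.chat_completion(model, prompt)
```
The same shape applies to memory banks: a router keyed by bank type could dispatch vector stores to Chroma and key-value banks to Redis or PGVector.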
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
Co-authored-by: Xi Yan <xiyan@meta.com>
Co-authored-by: Ashwin Bharambe <ashwin@meta.com>
2024-09-17 19:51:35 -07:00
Renamed from llama_toolchain/cli/llama.py