# What does this PR do?
Fixes the instructions in the quickstart README so that new developers and users can
run it without issues.
## Test Plan
None
## Before submitting
- [X] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [X] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [X] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Co-authored-by: Henry Tai <henrytai@fb.com>
# What does this PR do?
Adds a new method, `build_model_alias_with_just_llama_model`, which is
needed for cases like Ollama's quantized models that have neither a
Hugging Face repo nor an entry in the SKU list.
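The difference between a regular alias and a "just the Llama model" alias can be sketched as follows. This is a minimal illustration, not the llama-stack code: only the method name comes from this PR, while `ModelAlias`, its fields, both signatures, and the example model names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelAlias:
    """Hypothetical stand-in for a provider model alias entry."""
    provider_model_id: str  # provider-side id, e.g. an Ollama tag
    llama_model: str        # the Llama model this provider id serves
    aliases: List[str] = field(default_factory=list)


def build_model_alias(provider_model_id: str, llama_model: str, hf_repo: str) -> ModelAlias:
    # Usual case (assumed): the model is also addressable by its Hugging Face repo name.
    return ModelAlias(provider_model_id, llama_model, aliases=[hf_repo])


def build_model_alias_with_just_llama_model(provider_model_id: str, llama_model: str) -> ModelAlias:
    # Quantized tags have no HF repo or SKU entry, so no extra aliases are added;
    # only the provider id -> Llama model link is recorded.
    return ModelAlias(provider_model_id, llama_model)


# Illustrative only: an Ollama quantized tag mapped to a Llama model name.
quantized = build_model_alias_with_just_llama_model("llama3.2:3b", "Llama3.2-3B-Instruct")
```

Keeping the Llama model link while dropping the HF alias is what lets such a model still be resolved to a known Llama model.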
## Test Plan
`pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_text_inference.py`
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
# What does this PR do?
Adds an `/alpha/` prefix to all the REST API URLs.
Also switches them all from underscores to hyphens, which is more
standard practice.
(This is based on feedback from our partners.)
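The renaming rule can be expressed as a small path transformation. This is a hedged sketch: the helper name and the example routes are illustrative, and leaving `{...}` path parameters untouched is an assumption about how parameters are handled.

```python
def to_alpha_route(route: str) -> str:
    """Prefix a route with /alpha and replace underscores with hyphens."""
    if not route.startswith("/"):
        raise ValueError(f"route must start with '/': {route!r}")
    segments = route.split("/")
    # Assumption: path parameters such as "{model_id}" keep their names;
    # every other segment is hyphenated.
    converted = [s if s.startswith("{") else s.replace("_", "-") for s in segments]
    return "/alpha" + "/".join(converted)


# Illustrative route, not taken from the PR.
new_route = to_alpha_route("/inference/chat_completion")
```

Here `new_route` is `/alpha/inference/chat-completion`.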
## Test Plan
The Stack itself does not need updating. However, client SDKs and
documentation will need to be updated.
# What does this PR do?
- Fix an issue with `llama stack build` when using the `together` template
<img width="669" alt="image"
src="https://github.com/user-attachments/assets/1cbef052-d902-40b9-98f8-37efb494d117">
- For builds from templates, copy the
`templates/<template-name>/run.yaml` file to
`~/.llama/distributions/<name>/<name>-run.yaml` instead of regenerating
the run config.
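Copying the template's run config instead of regenerating it might look like the sketch below. The function name is hypothetical, and the destination directory is passed in rather than hard-coded to `~/.llama/distributions`, so the caller decides what the `<name>` directory is called.

```python
import shutil
from pathlib import Path


def copy_template_run_config(templates_dir: Path, template_name: str, dist_dir: Path, name: str) -> Path:
    """Copy <templates_dir>/<template_name>/run.yaml to <dist_dir>/<name>-run.yaml (hypothetical helper)."""
    src = templates_dir / template_name / "run.yaml"
    if not src.is_file():
        raise FileNotFoundError(f"template run config not found: {src}")
    dist_dir.mkdir(parents=True, exist_ok=True)
    dst = dist_dir / f"{name}-run.yaml"
    # Reuse the template's run config verbatim rather than rebuilding it.
    shutil.copyfile(src, dst)
    return dst
```

Copying keeps any hand-tuned settings in the template's `run.yaml` intact, which a regenerated config could lose.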
## Test Plan
```
$ llama stack build --template together --image-type conda
..
Build spec configuration saved at /opt/anaconda3/envs/llamastack-together/together-build.yaml
Build Successful! Next steps:
1. Set the environment variables: LLAMASTACK_PORT, TOGETHER_API_KEY
2. `llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml`
```
```
$ llama stack run /Users/xiyan/.llama/distributions/llamastack-together/together-run.yaml
```
```
$ llama-stack-client models list
$ pytest -v -s -m remote agents/test_agents.py --env REMOTE_STACK_URL=http://localhost:5000 --inference-model meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
```
<img width="764" alt="image"
src="https://github.com/user-attachments/assets/b805b6c5-a316-4561-8fe3-24fc3b1f8b80">
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
Removes another Pydantic `model_` protected-namespace warning and converts
the old-style `class Config` workaround to the new-style `model_config`.

Also includes a whitespace change to get past the pre-commit flake8 hook,
which failed with:

- `llama_stack/cli/download.py:296:85: E226 missing whitespace around arithmetic operator`
- `llama_stack/cli/download.py:297:54: E226 missing whitespace around arithmetic operator`
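The new-style workaround looks roughly like this in Pydantic v2. The model and its fields are hypothetical; `ConfigDict(protected_namespaces=())` is the documented way to turn off the protected-namespace check, whose default set of prefixes varies across Pydantic v2 releases.

```python
from pydantic import BaseModel, ConfigDict


class ModelEntry(BaseModel):
    # New style: configuration lives in a `model_config` attribute.
    # The old style was a nested `class Config: protected_namespaces = ()`.
    # An empty tuple disables the protected-namespace check, so fields that
    # start with "model_" do not trigger a warning.
    model_config = ConfigDict(protected_namespaces=())

    model_id: str
    provider_id: str


entry = ModelEntry(model_id="my-model", provider_id="ollama")
```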
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
# What does this PR do?
Adds a link to the Kotlin package in the README docs.
# What does this PR do?
Adds support for more quantized Ollama models.
## Test Plan
Tested with an Ollama Docker container running the Llama 3.2 3B 4-bit model.
```
root@docker-desktop:/# ollama ps
NAME           ID              SIZE      PROCESSOR    UNTIL
llama3.2:3b    a80c4f17acd5    3.5 GB    100% CPU     3 minutes from now
```
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
This PR adds a method to the stack that returns the `StackRunConfig` object
for a given template name. It will be used to instantiate a direct client
without needing an explicit run.yaml.
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
This PR allows models to be registered with a provider as long as the user
specifies a Llama model, even if the model does not match our prebuilt
provider-specific mapping.
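The registration rule can be sketched as a lookup with a fallback. Every name here is hypothetical, and the sketch deliberately leaves out details the PR does not describe (for example, what happens if a user-supplied Llama model disagrees with the prebuilt mapping).

```python
from typing import Dict, Optional, Set


def resolve_llama_model(
    provider_model_id: str,
    llama_model: Optional[str],
    prebuilt_mapping: Dict[str, str],
    known_llama_models: Set[str],
) -> str:
    """Return the Llama model a provider model is registered as (hypothetical helper)."""
    # 1. Use the prebuilt provider-specific mapping when it has an entry.
    if provider_model_id in prebuilt_mapping:
        return prebuilt_mapping[provider_model_id]
    # 2. Otherwise, accept the registration if the user names a known Llama model.
    if llama_model is not None and llama_model in known_llama_models:
        return llama_model
    raise ValueError(f"cannot register {provider_model_id!r}: specify a known Llama model")
```

For example, with an empty `prebuilt_mapping`, `resolve_llama_model("my-finetune", "Llama3.1-8B-Instruct", {}, {"Llama3.1-8B-Instruct"})` returns `"Llama3.1-8B-Instruct"`.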
Test:
`pytest -v -s llama_stack/providers/tests/inference/test_model_registration.py -m "together" --env TOGETHER_API_KEY=<KEY>`
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
# What does this PR do?
Automatically generates the following for the distributions:
- build.yaml
- run.yaml
- run-with-safety.yaml
- parts of the markdown docs
## Test Plan
At this point, this only updates the YAMLs and the docs. Some testing
(especially with Ollama and vLLM) has been performed, but much more is
needed.
**Summary:**
Extend the shorthand run command so it succeeds when the config exists
under DISTRIBS_BASE_DIR (i.e. `~/.llama/distributions`).
For example, suppose you created a new stack with `llama stack build`
and named it "my-awesome-llama-stack".
```
$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack): my-awesome-llama-stack
```
Previously, to run that stack you had to pass the full config path:
```
llama stack run ~/.llama/distributions/llamastack-my-awesome-llama-stack/my-awesome-llama-stack-run.yaml
```
With this change, you can start it using the stack name instead of the
full path:
```
llama stack run my-awesome-llama-stack
```
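The lookup behind the shorthand can be sketched as follows, assuming the `llamastack-<name>/<name>-run.yaml` layout shown in the examples here. The function name and the `base_dir` parameter (added so the sketch can be tested) are hypothetical, `DISTRIBS_BASE_DIR` is a local stand-in for the constant named in this PR, and the real CLI reports the error through its argument parser rather than raising.

```python
from pathlib import Path
from typing import Optional

# Local stand-in for the DISTRIBS_BASE_DIR constant mentioned above.
DISTRIBS_BASE_DIR = Path("~/.llama/distributions").expanduser()


def resolve_run_config(config: str, base_dir: Optional[Path] = None) -> Path:
    """Resolve the argument of `llama stack run` to a run.yaml path (hypothetical helper)."""
    base = base_dir if base_dir is not None else DISTRIBS_BASE_DIR
    path = Path(config)
    if path.is_file():
        return path  # an explicit config path keeps working as before
    # Shorthand: treat the argument as a stack name.
    candidate = base / f"llamastack-{config}" / f"{config}-run.yaml"
    if candidate.is_file():
        return candidate
    raise FileNotFoundError(
        f"File {candidate} does not exist. Please run `llama stack build` "
        "to generate (and optionally edit) a run.yaml file"
    )
```

Checking for an existing file first keeps full paths working, so the shorthand only applies when the argument is not already a valid path.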
**Test Plan:**
Verify that the command fails when the stack doesn't exist:
```
python3 -m llama_stack.cli.llama stack run my-test-stack
```
Output [FAILURE]
```
usage: llama stack run [-h] [--port PORT] [--disable-ipv6] config
llama stack run: error: File /Users/vladimirivic/.llama/distributions/llamastack-my-test-stack/my-test-stack-run.yaml does not exist. Please run `llama stack build` to generate (and optionally edit) a run.yaml file
```
Create a new stack using `llama stack build` and name it `my-test-stack`.
Verify that the command now runs successfully:
```
python3 -m llama_stack.cli.llama stack run my-test-stack
```
Output [SUCCESS]
```
Listening on ['::', '0.0.0.0']:5000
INFO: Started server process [80146]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:5000 (Press CTRL+C to quit)
```
`faiss.serialize_index` returns a NumPy array, which we first save to a
buffer and then write to SQLite. Since the data is stored as JSON, we
base64-encode it.
The read path does the reverse: we base64-decode the data, read it into a
NumPy array, and then call `faiss.deserialize_index`.
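The round trip can be sketched without Faiss by using a stand-in `uint8` array in place of `faiss.serialize_index(index)`. This is an illustration under assumptions: the helper names are hypothetical, and `np.save`/`np.load` are one possible buffer format, not necessarily the one the provider uses.

```python
import base64
import io
import json

import numpy as np


def encode_index_bytes(np_index: np.ndarray) -> str:
    """Write the array to an in-memory buffer, then base64-encode it for JSON."""
    buffer = io.BytesIO()
    np.save(buffer, np_index)
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def decode_index_bytes(encoded: str) -> np.ndarray:
    """Base64-decode and read back the array (the input for faiss.deserialize_index)."""
    buffer = io.BytesIO(base64.b64decode(encoded))
    return np.load(buffer)


# Stand-in for faiss.serialize_index(index), so the sketch runs without faiss.
np_index = np.arange(16, dtype=np.uint8)
row = json.dumps({"faiss_index": encode_index_bytes(np_index)})  # value stored in SQLite
restored = decode_index_bytes(json.loads(row)["faiss_index"])
```

Base64 is needed because raw index bytes are not valid JSON string content; the encoded text survives the JSON round trip unchanged.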
Tests:
`torchrun $CONDA_PREFIX/bin/pytest -v -s -m "faiss" llama_stack/providers/tests/memory/test_memory.py`
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
# What does this PR do?
- move folder
## Test Plan
**Unit Test**
```
pytest -v -s -m "huggingface" datasetio/test_datasetio.py
```
**E2E**
```
llama stack run
```
```
llama-stack-client eval run_benchmark meta-reference-mmlu --num-examples 5 --output-dir ./ --eval-task-config ~/eval_task_config.json --visualize
```
<img width="657" alt="image"
src="https://github.com/user-attachments/assets/63d53f9d-6c7e-4667-af8c-9d16c91ae6e3">
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
The semantics of updating a resource are very tricky to reason about,
especially for memory banks and models. The best way forward is for the
user to unregister the resource and register a new one. We don't have
a compelling reason to support update APIs.
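The "unregister, then register" pattern can be sketched with a toy in-memory registry. `Resource`, `Registry`, and the identifiers below are hypothetical and only illustrate the replacement workflow, not the llama-stack routing tables.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Resource:
    identifier: str
    provider_id: str


class Registry:
    """Toy registry with register/unregister but deliberately no update()."""

    def __init__(self) -> None:
        self._resources: Dict[str, Resource] = {}

    def register(self, resource: Resource) -> None:
        if resource.identifier in self._resources:
            raise ValueError(f"{resource.identifier} is already registered; unregister it first")
        self._resources[resource.identifier] = resource

    def unregister(self, identifier: str) -> None:
        if identifier not in self._resources:
            raise KeyError(identifier)
        del self._resources[identifier]

    def get(self, identifier: str) -> Resource:
        return self._resources[identifier]


# "Updating" a resource means unregistering the old one and registering its replacement.
registry = Registry()
registry.register(Resource("my-bank", "faiss"))
registry.unregister("my-bank")
registry.register(Resource("my-bank", "chroma"))
```

Because resources are immutable once registered, no code path has to reason about a half-applied update.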
Tests:
- `pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000`
- `pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0`
- `$CONDA_PREFIX/bin/pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_model_registration.py`
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>