Charlie Doern
1ae61e8d5f
fix: replace all instances of --yaml-config with --config (#2196)
# What does this PR do?
start_stack.sh was using --yaml-config, which is deprecated, and a
number of distro docs still mentioned --yaml-config. This replaces all
instances of --yaml-config, and the logic around it, with --config.

Resolves #2189
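For illustration, the rename is a drop-in flag swap in any launch command; a minimal sketch, assuming a typical `llama stack run` invocation and config path (neither is taken from this diff):

```bash
# before: the deprecated flag, as previously used by start_stack.sh and the distro docs
llama stack run --yaml-config ./run.yaml

# after: same behavior with the supported flag
llama stack run --config ./run.yaml
```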
Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-05-16 14:31:12 -07:00
Sébastien Han
43e623eea6
chore: remove last instances of code-interpreter provider (#2143)
The code-interpreter provider was removed in https://github.com/meta-llama/llama-stack/pull/2087; this cleans up its last remaining references.
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-12 10:54:43 -07:00
Hardik Shah
581e8ae562
fix: docker run with --pull always to fetch the latest image (#1733)
As titled: run docker with `--pull always` so the container runtime checks the registry for a newer image instead of reusing a stale local copy.
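A minimal sketch of the effect (the port and image name here are illustrative, not from this change):

```bash
# Without --pull, docker/podman will reuse a cached local image even when the
# tag has been updated upstream; --pull always re-checks the registry each run.
docker run --pull always -p 8321:8321 llamastack/distribution-dell  # image name assumed
```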
2025-03-20 15:35:48 -07:00
Ashwin Bharambe
992f865b2e
chore: move embedding deps to RAG tool where they are needed (#1210)
`EMBEDDING_DEPS` were wrongly associated with `vector_io` providers.
They are needed by
https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/utils/memory/vector_store.py#L142
and related code, which is used by the RAG tool; as such they should only be
required by the `inline::rag-runtime` provider.
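As a rough spot check (the grep target `sentence-transformers` is an assumed embedding package, not named in this PR), a distro image should now carry embedding deps only if its template includes `inline::rag-runtime`:

```bash
# build a distro image, then inspect its python environment for embedding packages
llama stack build --template=dell --image-type=container
podman run --rm --entrypoint pip localhost/distribution-dell:dev list | grep -i sentence-transformers
```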
2025-02-21 11:33:41 -08:00
Ellis Tarn
afca9d92f9
fix: Readthedocs cannot parse comments, resulting in docs bugs (#1033)
2025-02-10 16:35:16 -05:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
- Added new template `dell` and its documentation
- Updated docs
- [minor] uv fix I came across
- Ran codegen for all templates
Tested with
```bash
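# Endpoints consumed by the dell distro: DEH_URL points at the TGI inference
# server, CHROMA_URL at the Chroma vector store started below.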
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321
# build the stack template
llama stack build --template=dell
# start the TGI inference server
podman run --rm -it --network host \
  -v $HOME/.cache/huggingface:/data \
  -e HF_TOKEN=$HF_TOKEN \
  -p $INFERENCE_PORT:$INFERENCE_PORT \
  --gpus $CUDA_VISIBLE_DEVICES \
  ghcr.io/huggingface/text-generation-inference \
  --dtype bfloat16 --usage-stats off --sharded false \
  --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL \
  --port $INFERENCE_PORT --hostname 0.0.0.0
# start chroma-db for vector-io (aka RAG)
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)
# build docker
llama stack build --template=dell --image-type=container
# run llama stack server (via docker)
# NOTE: mount the llama-stack / llama-models directories if testing local changes
podman run -it \
  --network host \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  -v /home/hjshah/git/llama-stack:/app/llama-stack-source \
  -v /home/hjshah/git/llama-models:/app/llama-models-source \
  localhost/distribution-dell:dev \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env DEH_URL=$DEH_URL \
  --env CHROMA_URL=$CHROMA_URL
# test the server
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py
```
---------
Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00