Commit graph

170 commits

Author SHA1 Message Date
jhpiedrahitao
63e3c5812d update sambanova models 2025-04-15 10:26:57 -05:00
jhpiedrahitao
172a918fe3 Merge branch 'main' into feat/litellm_sambanova_usage 2025-04-11 19:28:02 -05:00
Sébastien Han
edd9aaac3b
fix: use torchao 0.8.0 for inference (#1925)
# What does this PR do?

While building the "experimental-post-training" distribution, we
encountered a torchao version conflict: inference requires version 0.5.0,
while training currently depends on version 0.8.0.

Resolves this error:

```
  × No solution found when resolving dependencies:
  ╰─▶ Because you require torchao==0.5.0 and torchao==0.8.0, we can conclude that your requirements are unsatisfiable.
ERROR    2025-04-10 10:41:22,597 llama_stack.distribution.build:128 uncategorized: Failed to build target test with
         return code 1
```
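
The resolver error above comes from two exact pins on the same package. A minimal sketch of that kind of check, assuming requirements arrive as plain `name==version` strings; the helper `find_conflicting_pins` is hypothetical and is not part of llama-stack or uv:

```python
from collections import defaultdict


def find_conflicting_pins(requirements):
    """Return {package: sorted versions} for packages pinned to more than one exact version."""
    pins = defaultdict(set)
    for req in requirements:
        # Only exact pins ("name==version") are considered in this sketch.
        if "==" in req:
            name, version = req.split("==", 1)
            pins[name.strip().lower()].add(version.strip())
    return {name: sorted(versions) for name, versions in pins.items() if len(versions) > 1}


# Inference pinned torchao 0.5.0 while training needed 0.8.0.
conflicts = find_conflicting_pins(["torchao==0.5.0", "torchao==0.8.0", "torch"])
print(conflicts)
```

Pinning both providers to the same version, as this PR does with 0.8.0, makes the returned mapping empty.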

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-10 13:39:20 -07:00
Jorge Piedrahita Ortiz
13c660f5a5
Merge branch 'meta-llama:main' into feat/litellm_sambanova_usage 2025-04-10 11:01:51 -05:00
Yuan Tang
1be66d754e
docs: Redirect instructions for additional hardware accelerators for remote vLLM provider (#1923)
# What does this PR do?

The vLLM website just added a [new index page for installing on different
hardware
accelerators](https://docs.vllm.ai/en/latest/getting_started/installation.html).
This PR adds a link to that page, with additional edits to make sure
readers are aware that the use of GPUs on this page is for
demonstration purposes only.

This closes https://github.com/meta-llama/llama-stack/issues/1813.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-04-10 10:04:17 +02:00
AlexHe99
983f6feeb8
docs: Update remote-vllm.md with AMD GPU vLLM server supported. (#1858)
Add content on using an AMD GPU as the vLLM server. Split the original
section into two subsections:
1. AMD vLLM server
2. NVIDIA vLLM server (original)


---------

Signed-off-by: Alex He <alehe@amd.com>
2025-04-08 21:35:32 -07:00
Jorge Piedrahita Ortiz
aff9e18f9f
Merge branch 'meta-llama:main' into feat/litellm_sambanova_usage 2025-04-08 16:02:58 -05:00
jhpiedrahitao
8e6b622923 add llama4 maverick to sambanova inference models 2025-04-08 16:02:08 -05:00
ehhuang
031a40bec0
fix: type (#1898)
# What does this PR do?


## Test Plan
2025-04-08 09:07:25 -07:00
jhpiedrahitao
ecf84a3b2f fix typo in sambanova llama4 scout model name 2025-04-08 08:06:38 -05:00
Jorge Piedrahita Ortiz
49bf4211da
Merge branch 'main' into feat/litellm_sambanova_usage 2025-04-08 07:40:56 -05:00
ehhuang
7b4eb0967e
test: verification on provider's OAI endpoints (#1893)
# What does this PR do?


## Test Plan
export MODEL=accounts/fireworks/models/llama4-scout-instruct-basic;
LLAMA_STACK_CONFIG=verification pytest -s -v tests/integration/inference
--vision-model $MODEL --text-model $MODEL
2025-04-07 23:06:28 -07:00
Ashwin Bharambe
530d4bdfe1
refactor: move all llama code to models/llama out of meta reference (#1887)
# What does this PR do?

Move around bits. This makes the copies from llama-models _much_ easier
to maintain and ensures we don't entangle meta-reference specific
tidbits into llama-models code even by accident.

Also, kills the meta-reference-quantized-gpu distro and rolls
quantization deps into meta-reference-gpu.

## Test Plan

```
LLAMA_MODELS_DEBUG=1 \
  with-proxy llama stack run meta-reference-gpu \
  --env INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct \
   --env INFERENCE_CHECKPOINT_DIR=<DIR> \
   --env MODEL_PARALLEL_SIZE=4 \
   --env QUANTIZATION_TYPE=fp8_mixed
```

Start a server with and without quantization. Point integration tests to
it using:

```
pytest -s -v  tests/integration/inference/test_text_inference.py \
   --stack-config http://localhost:8321 --text-model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-04-07 15:03:58 -07:00
jhpiedrahitao
085cc7beed update get_apikey in adaptor get_params 2025-04-02 15:43:14 -05:00
jhpiedrahitao
9c9f9577e2 Merge branch 'main' into feat/litellm_sambanova_usage 2025-04-01 07:57:21 -05:00
Dmitry Rogozhkin
935e706b15
docs: fix remote-vllm instructions (#1805)
# What does this PR do?

* Fix location of `run.yaml` relative to the cloned llama stack
repository
* Drop `-it` from `docker run` commands, as it's not needed when running
services

## Test Plan

* Verified running the llama stack following the updated instructions

CC: @ashwinb

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-03-27 10:19:51 -04:00
Sébastien Han
e3578b1c1b
chore: remove distributions dir (#1809)
# What does this PR do?

Followup on https://github.com/meta-llama/llama-stack/pull/1801. Move
the deps files to llama_stack/templates.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-03-27 09:03:39 -04:00
ehhuang
2f38851751
chore: Revert "chore(telemetry): remove service_name entirely" (#1785)
Reverts meta-llama/llama-stack#1755 closes #1781
2025-03-25 14:42:05 -07:00
Rashmi Pawar
1a73f8305b
feat: Add nemo customizer (#1448)
# What does this PR do?

This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack
post-training module. The integration enables users to fine-tune models
using NVIDIA's cloud-based customization service through a consistent
Llama Stack interface.



## Test Plan
Yet to be done

Things pending under this PR:

- [x] Integration of fine-tuned model(new checkpoint) for inference with
nvidia llm distribution
- [x] distribution integration of API
- [x] Add test cases for customizer(In Progress)
- [x] Documentation

```

LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py 

============================= test session starts ==============================
platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}}
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items

tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED [ 50%]
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED [100%]

======================== 2 passed, 1 warning in 0.10s =========================
```
cc: @mattf @dglogo @sumitb

---------

Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>
2025-03-25 11:01:10 -07:00
Jorge Piedrahita Ortiz
8783dd8162
Merge branch 'main' into feat/litellm_sambanova_usage 2025-03-24 08:02:40 -05:00
ehhuang
b9fbfed216
chore(telemetry): remove service_name entirely (#1755)
# What does this PR do?


## Test Plan

LLAMA_STACK_CONFIG=dev pytest -s -v
tests/integration/agents/test_agents.py::test_custom_tool
--safety-shield meta-llama/Llama-Guard-3-8B --text-model
accounts/fireworks/models/llama-v3p1-8b-instruct

and verify trace in jaeger UI
https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#
2025-03-21 15:11:56 -07:00
ehhuang
34f89bfbd6
feat(telemetry): use zero-width space to avoid clutter (#1754)
# What does this PR do?
Before 
<img width="858" alt="image"
src="https://github.com/user-attachments/assets/6cefb1ae-5603-4818-85ea-a0c337b986bc"
/>

Note the redundant 'llama-stack' in front of every span

## Test Plan
<img width="1171" alt="image"
src="https://github.com/user-attachments/assets/bdc5fd5b-ff1f-4f10-8b40-cff2ea93dd1f"
/>
2025-03-21 12:02:10 -07:00
Yuan Tang
dce9a24a6c
test: Add default vLLM URL in remote-vllm template (#1736)
# What does this PR do?

This is to avoid errors like the following when running inference
integration tests:

```
ERROR tests/integration/inference/test_text_inference.py::test_text_completion_stop_sequence[txt=8B-inference:completion:stop_sequence] - llama_stack.distribution.stack.EnvVarError: Environment variable 'VLLM_URL' not set or empty at providers.inference[0].config.url
```

It's also good to have a default, which is consistent with vLLM API
server.
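
A minimal sketch of the fallback behavior described above, assuming `http://localhost:8000/v1` as the default (vLLM's API server listens on port 8000 by default; the exact value used in the template is an assumption here). The helper `resolve_url` is hypothetical and only illustrates treating an unset or empty variable the same way:

```python
import os


def resolve_url(env=None, name="VLLM_URL", default="http://localhost:8000/v1"):
    """Return the configured URL, falling back to a default when unset or empty."""
    env = os.environ if env is None else env
    value = env.get(name, "")
    # Treat an empty string the same as a missing variable.
    return value if value else default


print(resolve_url({}))
print(resolve_url({"VLLM_URL": "http://vllm.example:9000/v1"}))
```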

## Test Plan

Integration tests can run without the error above.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-21 07:31:59 -07:00
Botao Chen
9114bef484
fix: fix experimental-post-training template (#1740)
## What does this PR do?

fix the template to make it compatible with the latest dataset and eval
api change

## test 
run `llama stack run
llama_stack/templates/experimental-post-training/run.yaml` and spin up
the llama stack server successfully
2025-03-20 23:07:19 -07:00
Hardik Shah
127bac6869
fix: Default to port 8321 everywhere (#1734)
As titled, moved all instances of 5001 to 8321
2025-03-20 15:50:41 -07:00
Hardik Shah
581e8ae562
fix: docker run with --pull always to fetch the latest image (#1733)
As titled
2025-03-20 15:35:48 -07:00
Yuan Tang
f5a5c5d459
docs: Add instruction on enabling tool calling for remote vLLM (#1719)
# What does this PR do?

This PR adds a link to tool calling instructions in vLLM. Users have
asked about this many times, e.g.
https://github.com/meta-llama/llama-stack/issues/1648#issuecomment-2740642077

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-03-20 15:18:17 -07:00
Botao Chen
f369871083
feat: [New Eval Benchmark] IfEval (#1708)
# What does this PR do?
This PR adds IfEval, a new open eval benchmark based on the paper
https://arxiv.org/abs/2311.07911, to measure a model's
instruction-following capability.


## Test Plan
Spin up a llama stack server with the open-benchmark template.

Run `llama-stack-client --endpoint xxx eval run-benchmark
"meta-reference-ifeval" --model-id "meta-llama/Llama-3.3-70B-Instruct"
--output-dir "/home/markchen1015/" --num-examples 20` on the client side and
get the aggregated eval results.
2025-03-19 16:39:59 -07:00
yyymeta
d117bfe597
feat: [new open benchmark] DocVQA (#1647)
# What does this PR do?
DocVQA asks the model to look at a picture and answer a question posed in
text, giving a text answer based on textual information in the picture. These
questions often require understanding the relative positions of text
within the picture.

The original dataset is defined in "Task1" of
https://www.docvqa.org/datasets


## Test Plan
Set up the llama server with:

```
llama stack run ./llama_stack/templates/open-benchmark/run.yaml
```


then send traffic:

```
 llama-stack-client eval run-benchmark "meta-reference-docvqa"  --model-id   meta-llama/Llama-3.3-70B-Instruct     --output-dir /tmp/gpqa    --num-examples   200
```
2025-03-19 14:56:14 -07:00
Jorge Piedrahita Ortiz
02a4f9ac59
Merge branch 'main' into feat/litellm_sambanova_usage 2025-03-19 16:51:59 -05:00
Botao Chen
ab777ef5cd
fix: fix open-benchmark template (#1695)
## What does this PR do?
The open-benchmark template has been broken since the datasets API refactor,
for two reasons:
- provider_id and provider_resource_id are no longer needed
- the type in run.yaml will be resolved as dict

This PR fixes both issues.

## Test 
Successfully spin up a llama stack server with `llama stack run
llama_stack/templates/open-benchmark/run.yaml`
2025-03-19 11:27:11 -07:00
jhpiedrahitao
43d6c6fade update run template 2025-03-18 16:43:19 -05:00
Jorge Piedrahita Ortiz
5bd1bd30e2
Merge branch 'main' into feat/litellm_sambanova_usage 2025-03-18 12:06:58 -05:00
Luis Tomas Bolivar
168cbcbb92
fix: Add the option to not verify SSL at remote-vllm provider (#1585)
# What does this PR do?
Add the option to not verify SSL certificates for the remote-vllm
provider. This allows the llama stack server to talk to remote LLMs that
have self-signed certificates.

Partially addresses #1545
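
A minimal sketch of what skipping certificate verification looks like with Python's standard `ssl` module. The `verify` flag and the helper name `make_ssl_context` are hypothetical; the provider's actual config field is not stated here:

```python
import ssl


def make_ssl_context(verify=True):
    """Build a client SSL context; verify=False accepts self-signed certificates."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be disabled before verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


insecure = make_ssl_context(verify=False)
print(insecure.verify_mode == ssl.CERT_NONE)
```

Disabling verification removes protection against man-in-the-middle attacks, so it is best limited to trusted networks and test setups.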
2025-03-18 09:33:35 -04:00
Xi Yan
5287b437ae
feat(api): (1/n) datasets api clean up (#1573)
## PR Stack
- https://github.com/meta-llama/llama-stack/pull/1573
- https://github.com/meta-llama/llama-stack/pull/1625
- https://github.com/meta-llama/llama-stack/pull/1656
- https://github.com/meta-llama/llama-stack/pull/1657
- https://github.com/meta-llama/llama-stack/pull/1658
- https://github.com/meta-llama/llama-stack/pull/1659
- https://github.com/meta-llama/llama-stack/pull/1660

**Client SDK**
- https://github.com/meta-llama/llama-stack-client-python/pull/203

**CI**
- 1391130488
<img width="1042" alt="image"
src="https://github.com/user-attachments/assets/69636067-376d-436b-9204-896e2dd490ca"
/>
-- the test_rag_agent_with_attachments is flaky and not related to this
PR

## Doc
<img width="789" alt="image"
src="https://github.com/user-attachments/assets/b88390f3-73d6-4483-b09a-a192064e32d9"
/>


## Client Usage
```python
client.datasets.register(
    source={
        "type": "uri",
        "uri": "lsfs://mydata.jsonl",
    },
    schema="jsonl_messages",
    # optional 
    dataset_id="my_first_train_data"
)

# quick prototype debugging
client.datasets.register(
    data_reference={
        "type": "rows",
        "rows": [
            # each row is a dict
            {"messages": [...]},
        ],
    },
    schema="jsonl_messages",
)
```

## Test Plan
- CI:
1387805545

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/datasets/test_datasets.py
```

```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/integration/scoring/test_scoring.py
```

```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
2025-03-17 16:55:45 -07:00
cdgamarose-nv
252a487085
feat: added nvidia as safety provider (#1248)
# What does this PR do?
Adds nvidia as a safety provider by interfacing with the nemo guardrails
microservice.
This enables checking the user’s input or the LLM’s output against input and
output guardrails by using the `/v1/guardrails/checks` endpoint of the
[guardrails
API](https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/guides/checks-guide.html).

## Test Plan
Deploy nemo guardrails service following the documentation:
https://developer.nvidia.com/docs/nemo-microservices/guardrails/source/getting-started/deploy-docker.html

### Standalone:
```bash
(venv) local-cdgamarose@a1u1g-rome-0153:~/llama-stack$ pytest -v -s llama_stack/providers/tests/safety/test_safety.py --providers inference=nvidia,safety=nvidia --safety-shield meta/llama-3.1-8b-instruct

=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.10.12, pytest-8.3.4, pluggy-1.5.0 -- /localhome/local-cdgamarose/llama-stack/venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.10.12', 'Platform': 'Linux-5.15.0-122-generic-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'html': '4.1.1'}}
rootdir: /localhome/local-cdgamarose/llama-stack
configfile: pyproject.toml
plugins: metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, html-4.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items

llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_shield_list[--inference=nvidia:safety=nvidia] Initializing NVIDIASafetyAdapter(http://0.0.0.0:7331)...
PASSED
llama_stack/providers/tests/safety/test_safety.py::TestSafety::test_run_shield[--inference=nvidia:safety=nvidia] PASSED

============================================================================== 2 passed, 2 warnings in 4.78s ==============================================================================

```
### Distribution:
```
llama stack run llama_stack/templates/nvidia/run-with-safety.yaml
curl -v -X 'POST' "http://localhost:8321/v1/safety/run-shield" -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"shield_id": "meta/llama-3.1-8b-instruct", "messages":[{"role": "user", "content": "you are stupid"}]}'
{"violation":{"violation_level":"error","user_message":"Sorry I cannot do this.","metadata":{"self check input":{"status":"blocked"}}}}
```
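
The same run-shield call can be sketched in Python with only the standard library. The endpoint, headers, and payload mirror the curl command above; the helper name `build_run_shield_request` is hypothetical, and the request is built but not sent:

```python
import json
import urllib.request


def build_run_shield_request(base_url, shield_id, user_text):
    """Build (but do not send) a POST request for the run-shield endpoint."""
    payload = {
        "shield_id": shield_id,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/safety/run-shield",
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


request = build_run_shield_request(
    "http://localhost:8321", "meta/llama-3.1-8b-instruct", "you are stupid"
)
# Sending it requires a running server: urllib.request.urlopen(request)
print(request.full_url)
```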


---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-03-17 14:39:23 -07:00
Jorge Piedrahita Ortiz
716cb09056
Merge branch 'meta-llama:main' into feat/litellm_sambanova_usage 2025-03-17 09:42:15 -05:00
yyymeta
a626b7bce3
feat: [new open benchmark] BFCL_v3 (#1578)
# What does this PR do?
Create a new dataset, BFCL_v3, from
https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html

Each question asks the model to perform a task described in natural
language and provides a set of available functions with their schemas
for the model to choose from. The model must write the function call,
including the function name and parameters, that achieves the stated
purpose. The results are validated against the provided ground truth by
checking their ASTs, to make sure that the generated function call and
the ground-truth function call are syntactically and semantically
equivalent.
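
A minimal sketch of an AST-based comparison, assuming both calls are written in Python syntax with literal arguments. This is not the benchmark's scorer; the helpers `parse_call` and `calls_equivalent` are hypothetical:

```python
import ast


def parse_call(source):
    """Parse a single call expression into (function name, positional args, keyword args)."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a function call")
    name = ast.unparse(node.func)
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, args, kwargs


def calls_equivalent(generated, expected):
    """Two calls match when name, positional values, and keyword values agree."""
    return parse_call(generated) == parse_call(expected)


# Keyword order and quoting style differ, but the calls are the same.
print(calls_equivalent("get_weather(city='Paris', unit='C')", 'get_weather(unit="C", city="Paris")'))
```

Comparing parsed structures rather than raw strings ignores formatting differences such as whitespace, quote style, and keyword order.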



## Test Plan

Start the server with:

```
llama stack run ./llama_stack/templates/ollama/run.yaml
```

then send traffic
```
 llama-stack-client eval run-benchmark "bfcl"  --model-id   meta-llama/Llama-3.2-3B-Instruct    --output-dir /tmp/gpqa    --num-examples   2
```




2025-03-14 12:50:49 -07:00
Xi Yan
9617468d13
fix: passthrough provider template + fix (#1612)
# What does this PR do?

- Fix issue w/ passthrough provider



## Test Plan
llama stack run

2025-03-13 09:44:26 -07:00
Ashwin Bharambe
d072b5fa0c
test: add unit test to ensure all config types are instantiable (#1601) 2025-03-12 22:29:58 -07:00
jhpiedrahitao
bed93ec552 update dev template 2025-03-12 15:29:10 -05:00
Jorge Piedrahita Ortiz
e49bcd46fe
Merge branch 'meta-llama:main' into feat/litellm_sambanova_usage 2025-03-12 15:12:42 -05:00
jhpiedrahitao
397eed9630 Sambanova now using LiteLLM openai-compat, models and template updated 2025-03-12 15:08:27 -05:00
Xi Yan
c7139b0b67
fix: fix precommit (#1594)
# What does this PR do?

- fix precommit


## Test Plan
CI

2025-03-12 11:59:21 -07:00
Botao Chen
0b0be70605
feat: Add open benchmark template codegen (#1579)
## What does this PR do?

As title, add codegen for open-benchmark template

## test 

Checked the newly generated run.yaml file; it is identical before and
after the change.

Also adds a small improvement to the together template so that a missing
TOGETHER_API_KEY won't crash the server, consistent with the user
experience of other remote providers.
2025-03-12 11:12:08 -07:00
Dinesh Yeduguru
85501ed875
fix: remove Llama-3.2-1B-Instruct for fireworks (#1558)
# What does this PR do?
Remove Llama-3.2-1B-Instruct for fireworks, as it no longer appears to
be hosted on the website.


## Test Plan

python distro_codegen.py
2025-03-11 11:19:29 -07:00
Botao Chen
e3edca7739
feat: [new open benchmark] Math 500 (#1538)
## What does this PR do?
Created a new math_500 open benchmark based on OpenAI's [Let's Verify
Step by Step](https://arxiv.org/abs/2305.20050) paper and Hugging Face's
[HuggingFaceH4/MATH-500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)
dataset.

The challenging part of this benchmark is parsing the generated and
expected answers and verifying that they are the same. For the parsing, we
refer to [Minerva: Solving Quantitative Reasoning Problems with Language
Models](https://research.google/blog/minerva-solving-quantitative-reasoning-problems-with-language-models/).

To simplify the parsing logic, as a next step we also plan to follow
what [simple-evals](https://github.com/openai/simple-evals) does:
using an LLM as a judge to check whether the generated answer matches the
expected answer.
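
A minimal sketch of the parse-and-compare step, assuming the model is prompted to put its final answer in `\boxed{...}` (an assumption for illustration; this is not the parser used in this PR). The helpers `extract_boxed` and `answers_match` are hypothetical:

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 0
    begin = start + len(marker)
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # This brace closes the \boxed{ group itself.
                return text[begin:i]
            depth -= 1
    return None  # unbalanced braces


def answers_match(generated, expected):
    """Compare final answers after stripping all whitespace."""
    a, b = extract_boxed(generated), extract_boxed(expected)
    if a is None or b is None:
        return False
    return "".join(a.split()) == "".join(b.split())


print(extract_boxed("The answer is \\boxed{\\frac{1}{2}}."))
```

String comparison like this misses mathematically equal but differently written answers (for example `0.5` versus `\frac{1}{2}`), which is one reason an LLM-as-judge approach can simplify the logic.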


## Test Plan
On the server side, spin up a server with the open-benchmark template: `llama
stack run llama_stack/templates/open-benchmark/run.yaml`

On the client side, issue an open benchmark eval request `llama-stack-client
--endpoint xxx eval run-benchmark "meta-reference-math-500" --model-id
"meta-llama/Llama-3.3-70B-Instruct" --output-dir "/home/markchen1015/"
--num-examples 20` and get the aggregated eval results
<img width="238" alt="Screenshot 2025-03-10 at 7 57 04 PM"
src="https://github.com/user-attachments/assets/2c9da042-3b70-470e-a7c4-69f4cc24d1fb"
/>

Checked the generated answers and the related scoring; they make sense.
2025-03-10 20:38:28 -07:00
Ashwin Bharambe
dc84bc755a
fix: revert to using faiss for ollama distro (#1530)
This is unfortunate because `sqlite-vec` seems promising, but its pip
package is not quite complete. It does not have a binary for arm64 (I
think, or maybe it even lacks 64-bit builds?), which causes the arm64
container to fail with:
```
File "/usr/local/lib/python3.10/site-packages/sqlite_vec/init.py", line 17, in load
    conn.load_extension(loadable_path())
sqlite3.OperationalError: /usr/local/lib/python3.10/site-packages/sqlite_vec/vec0.so: wrong ELF class: ELFCLASS32
```
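
The `ELFCLASS32` in the error refers to byte 4 (`EI_CLASS`) of the ELF header, which is 1 for 32-bit objects and 2 for 64-bit objects. A minimal sketch for checking which class a shared library was built as; the helper names are hypothetical:

```python
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(header):
    """Return the ELF class name from the first bytes of a file."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Byte 4 (EI_CLASS) of e_ident is 1 for 32-bit objects and 2 for 64-bit objects.
    return ELF_CLASSES.get(header[4], "unknown")


def elf_class_of_file(path):
    """Read just enough of the file at path to classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


print(elf_class(b"\x7fELF\x01"))
```

Running `elf_class_of_file` on the `vec0.so` path from the traceback would show whether the packaged binary is a 32-bit build.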

To get around this, I tried to install from source via `uv pip install
sqlite-vec --no-binary=sqlite-vec`; however, it lacks even a source
distribution, which makes that impossible.

## Test Plan

Build the container locally using: 

```bash
LLAMA_STACK_DIR=. llama stack build --template ollama --image-type container
```

Run the container as: 

```
podman run --privileged -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
    -v ~/.llama:/root/.llama \
    --env INFERENCE_MODEL=$INFERENCE_MODEL \
    --env OLLAMA_URL=http://host.containers.internal:11434 \
    -v ~/local/llama-stack:/app/llama-stack-source \
    localhost/distribution-ollama:dev --port $LLAMA_STACK_PORT
```

Verify the container starts up correctly. Without this patch, it would
encounter the ELFCLASS32 error.
2025-03-10 16:15:17 -07:00
Reid
0b8cb830b9
docs: update ollama doc url (#1508)
# What does this PR do?

It should have been changed in this PR:
https://github.com/meta-llama/llama-stack/pull/1190/files#diff-53e3f35ced54ee5e57dc8b0d3b04770ed84f2f6434c6f492f42569b3c2810ecd


Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-10 13:04:59 -07:00
Botao Chen
ade76e4a69
fix: update the open benchmark eval doc (#1497)
## What does this PR do?
add proper links to the doc

## test
preview the doc 

<img width="1304" alt="Screenshot 2025-03-07 at 3 03 22 PM"
src="https://github.com/user-attachments/assets/0a0e2a3d-2420-4af0-99c3-a4786855fae0"
/>

<img width="1303" alt="Screenshot 2025-03-07 at 3 03 32 PM"
src="https://github.com/user-attachments/assets/e11844e7-ee8a-4a64-8617-abafa02b2868"
/>
2025-03-07 15:05:27 -08:00