From 254adf2e101862fe2c83eaed3cb94effc65fddc3 Mon Sep 17 00:00:00 2001
From: "mergify[bot]" <37929162+mergify[bot]@users.noreply.github.com>
Date: Mon, 8 Dec 2025 12:15:55 +0100
Subject: [PATCH] chore(docs): Remove Llama 4 support details from README
(backport #4178) (#4323)
This is an automatic backport of pull request #4178 done by
[Mergify](https://mergify.com).
Co-authored-by: raghotham
---
README.md | 77 -------------------------------------------------------
1 file changed, 77 deletions(-)
diff --git a/README.md b/README.md
index bb8587855..af1272586 100644
--- a/README.md
+++ b/README.md
@@ -10,83 +10,6 @@
[**Quick Start**](https://llamastack.github.io/docs/getting_started/quickstart) | [**Documentation**](https://llamastack.github.io/docs) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)
-### ✨🎉 Llama 4 Support 🎉✨
-We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.
-
-<details>
-
-<summary>👋 Click here to see how to run Llama 4 models on Llama Stack</summary>
-
-\
-*Note: you need an 8xH100 GPU host to run these models.*
-
-```bash
-pip install -U llama_stack
-
-MODEL="Llama-4-Scout-17B-16E-Instruct"
-# download the model weights (requires access to the gated meta-llama repo on Hugging Face)
-huggingface-cli download meta-llama/$MODEL --local-dir ~/.llama/$MODEL
-
-# install dependencies for the distribution
-llama stack list-deps meta-reference-gpu | xargs -L1 uv pip install
-
-# start a llama stack server (runs in the foreground; use a second terminal for the steps below)
-INFERENCE_MODEL=meta-llama/$MODEL llama stack run meta-reference-gpu
-
-# install client to interact with the server
-pip install llama-stack-client
-```
-### CLI
-```bash
-# Run a chat completion
-MODEL="Llama-4-Scout-17B-16E-Instruct"
-
-llama-stack-client --endpoint http://localhost:8321 \
-  inference chat-completion \
-  --model-id meta-llama/$MODEL \
-  --message "write a haiku for meta's llama 4 models"
-
-OpenAIChatCompletion(
- ...
- choices=[
- OpenAIChatCompletionChoice(
- finish_reason='stop',
- index=0,
- message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
- role='assistant',
- content='...**Silent minds awaken,** \n**Whispers of billions of words,** \n**Reasoning breaks the night.** \n\nā \n*This haiku blends the essence of LLaMA 4\'s capabilities with nature-inspired metaphor, evoking its vast training data and transformative potential.*',
- ...
- ),
- ...
- )
- ],
- ...
-)
-```
-### Python SDK
-```python
-from llama_stack_client import LlamaStackClient
-
-client = LlamaStackClient(base_url="http://localhost:8321")
-
-model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
-prompt = "Write a haiku about coding"
-
-print(f"User> {prompt}")
-response = client.chat.completions.create(
- model=model_id,
- messages=[
- {"role": "system", "content": "You are a helpful assistant."},
- {"role": "user", "content": prompt},
- ],
-)
-print(f"Assistant> {response.choices[0].message.content}")
-```
-As more providers start supporting Llama 4, you can use them in Llama Stack as well; we are adding more to the list. Stay tuned!
-
-</details>
-
### 🚀 One-Line Installer 🛠️
To try Llama Stack locally, run: