diff --git a/README.md b/README.md
index 639e7280d..5360f4ff0 100644
--- a/README.md
+++ b/README.md
@@ -10,83 +10,6 @@
 [**Quick Start**](https://llamastack.github.io/docs/getting_started/quickstart) | [**Documentation**](https://llamastack.github.io/docs) | [**Colab Notebook**](./docs/getting_started.ipynb) | [**Discord**](https://discord.gg/llama-stack)
 
-### āœØšŸŽ‰ Llama 4 Support šŸŽ‰āœØ
-We released [Version 0.2.0](https://github.com/meta-llama/llama-stack/releases/tag/v0.2.0) with support for the Llama 4 herd of models released by Meta.
-
-<details>
-
-<summary>šŸ‘‹ Click here to see how to run Llama 4 models on Llama Stack</summary>
-
-*Note: you need a host with 8x H100 GPUs to run these models.*
-
-```bash
-pip install -U llama_stack
-
-MODEL="Llama-4-Scout-17B-16E-Instruct"
-# download the model weights from Hugging Face
-huggingface-cli download meta-llama/$MODEL --local-dir ~/.llama/$MODEL
-
-# install dependencies for the distribution
-llama stack list-deps meta-reference-gpu | xargs -L1 uv pip install
-
-# start a llama stack server
-INFERENCE_MODEL=meta-llama/$MODEL llama stack run meta-reference-gpu
-
-# install the client to interact with the server
-pip install llama-stack-client
-```
-### CLI
-```bash
-# Run a chat completion
-MODEL="Llama-4-Scout-17B-16E-Instruct"
-
-llama-stack-client --endpoint http://localhost:8321 \
-inference chat-completion \
---model-id meta-llama/$MODEL \
---message "write a haiku for meta's llama 4 models"
-
-OpenAIChatCompletion(
-    ...
-    choices=[
-        OpenAIChatCompletionChoice(
-            finish_reason='stop',
-            index=0,
-            message=OpenAIChatCompletionChoiceMessageOpenAIAssistantMessageParam(
-                role='assistant',
-                content='...**Silent minds awaken,** \n**Whispers of billions of words,** \n**Reasoning breaks the night.** \n\n— \n*This haiku blends the essence of LLaMA 4\'s capabilities with nature-inspired metaphor, evoking its vast training data and transformative potential.*',
-                ...
-            ),
-            ...
-        )
-    ],
-    ...
-)
-```
-### Python SDK
-```python
-from llama_stack_client import LlamaStackClient
-
-client = LlamaStackClient(base_url="http://localhost:8321")
-
-model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
-prompt = "Write a haiku about coding"
-
-print(f"User> {prompt}")
-response = client.chat.completions.create(
-    model=model_id,
-    messages=[
-        {"role": "system", "content": "You are a helpful assistant."},
-        {"role": "user", "content": prompt},
-    ],
-)
-print(f"Assistant> {response.choices[0].message.content}")
-```
-As more providers add support for Llama 4, you will be able to use them through Llama Stack as well. We are adding to the list; stay tuned!
-
-
-</details>
-
 ### šŸš€ One-Line Installer šŸš€
 
 To try Llama Stack locally, run:
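For anyone still using the removed quickstart above, a streaming variant of its Python SDK example may be useful. This is a minimal sketch, assuming a server is running on `http://localhost:8321` and that the installed `llama-stack-client` version supports the OpenAI-style `stream=True` flag and delta-based chunks; verify both against your client version before relying on it.

```python
# Hedged sketch: streaming variant of the removed Python SDK example.
# Assumes a Llama Stack server on localhost:8321 and that
# llama-stack-client mirrors the OpenAI streaming interface
# (stream=True yielding chunks with choices[0].delta.content).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

stream = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True,  # yield incremental chunks instead of one full response
)

print("Assistant> ", end="")
for chunk in stream:
    # Each chunk carries a delta with the next piece of generated text.
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
```

Streaming prints tokens as they arrive instead of waiting for the full completion, which is noticeably more responsive for long generations on large models like Llama 4.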