diff --git a/docs/my-website/docs/providers/llamafile.md b/docs/my-website/docs/providers/llamafile.md
new file mode 100644
index 0000000000..3539bc2eb4
--- /dev/null
+++ b/docs/my-website/docs/providers/llamafile.md
@@ -0,0 +1,158 @@
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Llamafile
+
+LiteLLM supports all models on Llamafile.
+
+| Property                  | Details                                                                                                                              |
+|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
+| Description               | llamafile lets you distribute and run LLMs with a single file. [Docs](https://github.com/Mozilla-Ocho/llamafile/blob/main/README.md)   |
+| Provider Route on LiteLLM | `llamafile/` (for OpenAI compatible server)                                                                                            |
+| Provider Doc              | [llamafile ↗](https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/README.md#api-endpoints)                            |
+| Supported Endpoints       | `/chat/completions`, `/embeddings`, `/completions`                                                                                     |
+
+
+# Quick Start
+
+## Usage - litellm.completion (calling OpenAI compatible endpoint)
+
+llamafile provides an OpenAI-compatible endpoint for chat completions - here's how to call it with LiteLLM.
+
+To use litellm to call llamafile, add the following to your completion call:
+
+* `model="llamafile/<your-llamafile-model-name>"`
+* `api_base="your-hosted-llamafile-url"` - e.g. `http://localhost:8080/v1`
+
+```python
+import litellm
+
+messages = [{"role": "user", "content": "Hey, how's it going?"}]
+
+response = litellm.completion(
+    model="llamafile/mistralai/mistral-7b-instruct-v0.2",  # pass the llamafile model name for completeness
+    messages=messages,
+    api_base="http://localhost:8080/v1",
+    temperature=0.2,
+    max_tokens=80,
+)
+
+print(response)
+```
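+
+Streaming works the same way as for any other OpenAI-compatible provider in LiteLLM - pass `stream=True`. A minimal sketch, assuming the same locally hosted model and `api_base` as above:
+
+```python
+import litellm
+
+response = litellm.completion(
+    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
+    messages=[{"role": "user", "content": "Write a haiku about single-file LLMs"}],
+    api_base="http://localhost:8080/v1",
+    stream=True,
+)
+
+# Chunks follow the OpenAI streaming format; `delta.content` may be None on some chunks
+for chunk in response:
+    print(chunk.choices[0].delta.content or "", end="")
+```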
+
+
+## Usage - LiteLLM Proxy Server (calling OpenAI compatible endpoint)
+
+Here's how to call an OpenAI-compatible endpoint with the LiteLLM Proxy Server.
+
+1. Modify the config.yaml
+
+   ```yaml
+   model_list:
+     - model_name: my-model
+       litellm_params:
+         model: llamafile/mistralai/mistral-7b-instruct-v0.2 # add llamafile/ prefix to route as OpenAI provider
+         api_base: http://localhost:8080/v1                  # add api base for OpenAI compatible provider
+   ```
+
+1. Start the proxy
+
+   ```bash
+   $ litellm --config /path/to/config.yaml
+   ```
+
+1. Send Request to LiteLLM Proxy Server
+
+   <Tabs>
+   <TabItem value="openai" label="OpenAI Python v1.0.0+">
+
+   ```python
+   import openai
+
+   client = openai.OpenAI(
+       api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
+       base_url="http://0.0.0.0:4000" # litellm-proxy-base url
+   )
+
+   response = client.chat.completions.create(
+       model="my-model",
+       messages=[
+           {
+               "role": "user",
+               "content": "what llm are you"
+           }
+       ],
+   )
+
+   print(response)
+   ```
+
+   </TabItem>
+   <TabItem value="curl" label="Curl Request">
+
+   ```shell
+   curl --location 'http://0.0.0.0:4000/chat/completions' \
+   --header 'Authorization: Bearer sk-1234' \
+   --header 'Content-Type: application/json' \
+   --data '{
+       "model": "my-model",
+       "messages": [
+           {
+               "role": "user",
+               "content": "what llm are you"
+           }
+       ]
+   }'
+   ```
+
+   </TabItem>
+   </Tabs>
+
+## Embeddings
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import embedding
+import os
+
+os.environ["LLAMAFILE_API_BASE"] = "http://localhost:8080/v1"
+
+response = embedding(
+    model="llamafile/sentence-transformers/all-MiniLM-L6-v2",
+    input=["Hello world"],
+)
+
+print(response)
+```
+
+</TabItem>
+<TabItem value="proxy" label="LiteLLM PROXY Server">
+
+1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: my-model
+    litellm_params:
+      model: llamafile/sentence-transformers/all-MiniLM-L6-v2 # add llamafile/ prefix to route as OpenAI provider
+      api_base: http://localhost:8080/v1                      # add api base for OpenAI compatible provider
+```
+
+2. Start the proxy
+
+```bash
+$ litellm --config /path/to/config.yaml
+
+# RUNNING on http://0.0.0.0:4000
+```
+
+3. Test it!
+
+```bash
+curl -L -X POST 'http://0.0.0.0:4000/embeddings' \
+-H 'Authorization: Bearer sk-1234' \
+-H 'Content-Type: application/json' \
+-d '{"input": ["hello world"], "model": "my-model"}'
+```
+
+[See OpenAI SDK/Langchain/etc. examples](../proxy/user_keys.md#embeddings)
+
+</TabItem>
+</Tabs>
\ No newline at end of file
diff --git a/docs/my-website/docs/providers/openai_compatible.md b/docs/my-website/docs/providers/openai_compatible.md
index c7f9bf6f40..2f11379a8d 100644
--- a/docs/my-website/docs/providers/openai_compatible.md
+++ b/docs/my-website/docs/providers/openai_compatible.md
@@ -3,13 +3,26 @@ import TabItem from '@theme/TabItem';
 
 # OpenAI-Compatible Endpoints
 
+:::info
+
+Selecting `openai` as the provider routes your request to an OpenAI-compatible endpoint using the upstream
+[official OpenAI Python API library](https://github.com/openai/openai-python/blob/main/README.md).
+
+This library **requires** an API key for all requests, either through the `api_key` parameter
+or the `OPENAI_API_KEY` environment variable.
+
+If you don’t want to provide a fake API key in each request, consider using a provider that directly matches your
+OpenAI-compatible endpoint, such as [`hosted_vllm`](/docs/providers/vllm) or [`llamafile`](/docs/providers/llamafile).
+
+:::
+
 To call models hosted behind an openai proxy, make 2 changes:
 
 1. For `/chat/completions`: Put `openai/` in front of your model name, so litellm knows you're trying to call an openai `/chat/completions` endpoint.
 
-2. For `/completions`: Put `text-completion-openai/` in front of your model name, so litellm knows you're trying to call an openai `/completions` endpoint. [NOT REQUIRED for `openai/` endpoints called via `/v1/completions` route].
+1. For `/completions`: Put `text-completion-openai/` in front of your model name, so litellm knows you're trying to call an openai `/completions` endpoint. [NOT REQUIRED for `openai/` endpoints called via `/v1/completions` route].
 
-2. **Do NOT** add anything additional to the base url e.g. `/v1/embedding`. LiteLLM uses the openai-client to make these calls, and that automatically adds the relevant endpoints.
+1. **Do NOT** add anything additional to the base url e.g. `/v1/embedding`. LiteLLM uses the openai-client to make these calls, and that automatically adds the relevant endpoints.
 
 ## Usage - completion
 
diff --git a/docs/my-website/sidebars.js b/docs/my-website/sidebars.js
index 60030a59bb..3954f663da 100644
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@@ -229,6 +229,7 @@ const sidebars = {
         "providers/fireworks_ai",
         "providers/clarifai",
         "providers/vllm",
+        "providers/llamafile",
         "providers/infinity",
         "providers/xinference",
         "providers/cloudflare_workers",
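
For reference, the routing difference called out in the new `openai_compatible.md` note looks like this from the SDK - a minimal sketch, assuming the same local llamafile server and model used in `llamafile.md` above:

```python
import litellm

messages = [{"role": "user", "content": "Hello from a local llamafile server"}]

# Generic `openai/` route: the underlying OpenAI client requires an api_key,
# so a placeholder must be supplied (a local llamafile server will typically ignore it).
response_openai_route = litellm.completion(
    model="openai/mistralai/mistral-7b-instruct-v0.2",
    api_base="http://localhost:8080/v1",
    api_key="fake-key",  # placeholder only
    messages=messages,
)

# `llamafile/` route: no API key needed.
response_llamafile_route = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
    api_base="http://localhost:8080/v1",
    messages=messages,
)

print(response_openai_route.choices[0].message.content)
print(response_llamafile_route.choices[0].message.content)
```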