Update docs for OpenAI compatible providers, add Llamafile docs, include Llamafile in the sidebar

Peter Wilson 2025-04-22 16:36:07 +01:00
parent 174a1aa007
commit db4e40d410
3 changed files with 174 additions and 2 deletions


@@ -0,0 +1,158 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Llamafile
LiteLLM supports all models on Llamafile.
| Property | Details |
|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| Description | llamafile lets you distribute and run LLMs with a single file. [Docs](https://github.com/Mozilla-Ocho/llamafile/blob/main/README.md) |
| Provider Route on LiteLLM | `llamafile/` (for OpenAI compatible server) |
| Provider Doc | [llamafile ↗](https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/README.md#api-endpoints) |
| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions` |
# Quick Start
## Usage - litellm.completion (calling OpenAI compatible endpoint)
Llamafile provides an OpenAI-compatible endpoint for chat completions. Here's how to call it with LiteLLM.
To use LiteLLM to call llamafile, add the following to your completion call:
* `model="llamafile/<your-llamafile-model-name>"`
* `api_base="<your-llamafile-server-url>"` (e.g. `http://localhost:8080/v1`)
```python
import litellm

messages = [{"role": "user", "content": "Hey, how's it going?"}]

response = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",  # pass the llamafile model name for completeness
    messages=messages,
    api_base="http://localhost:8080/v1",
    temperature=0.2,
    max_tokens=80,
)
print(response)
```
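Streaming works through the same route, since llamafile's server exposes the standard OpenAI streaming interface. A minimal sketch, reusing the model name and local `api_base` assumed in the example above:
```python
import litellm

response = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
    api_base="http://localhost:8080/v1",
    stream=True,  # stream tokens as they are generated
)

for chunk in response:
    # chunks follow the OpenAI streaming format; content may be None on some chunks
    print(chunk.choices[0].delta.content or "", end="")
```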
## Usage - LiteLLM Proxy Server (calling OpenAI compatible endpoint)
Here's how to call an OpenAI-compatible endpoint with the LiteLLM Proxy Server.
1. Modify the config.yaml
```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: llamafile/mistralai/mistral-7b-instruct-v0.2  # add llamafile/ prefix to route as OpenAI provider
      api_base: http://localhost:8080/v1                   # add api base for OpenAI compatible provider
```
1. Start the proxy
```bash
$ litellm --config /path/to/config.yaml
```
1. Send Request to LiteLLM Proxy Server
<Tabs>
<TabItem value="openai" label="OpenAI Python v1.0.0+">
```python
import openai
client = openai.OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)
response = client.chat.completions.create(
model="my-model",
messages = [
{
"role": "user",
"content": "what llm are you"
}
],
)
print(response)
```
</TabItem>
<TabItem value="curl" label="curl">
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "my-model",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
    ]
}'
```
</TabItem>
</Tabs>
## Embeddings
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import embedding
import os

os.environ["LLAMAFILE_API_BASE"] = "http://localhost:8080/v1"

# use a distinct variable name so the imported `embedding` function isn't shadowed
response = embedding(model="llamafile/sentence-transformers/all-MiniLM-L6-v2", input=["Hello world"])
print(response)
```
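If you prefer not to set an environment variable, the base URL can also be passed per call; a minimal sketch, assuming `embedding` accepts `api_base` the same way `completion` does:
```python
from litellm import embedding

# api_base passed directly instead of via LLAMAFILE_API_BASE
response = embedding(
    model="llamafile/sentence-transformers/all-MiniLM-L6-v2",
    input=["Hello world"],
    api_base="http://localhost:8080/v1",
)
print(response)
```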
</TabItem>
<TabItem value="proxy" label="PROXY">
1. Setup config.yaml
```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: llamafile/sentence-transformers/all-MiniLM-L6-v2  # add llamafile/ prefix to route as OpenAI provider
      api_base: http://localhost:8080/v1                       # add api base for OpenAI compatible provider
```
1. Start the proxy
```bash
$ litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
```
1. Test it!
```bash
curl -L -X POST 'http://0.0.0.0:4000/embeddings' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"input": ["hello world"], "model": "my-model"}'
```
[See OpenAI SDK/Langchain/etc. examples](../proxy/user_keys.md#embeddings)
</TabItem>
</Tabs>
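The table above also lists `/completions` as a supported endpoint. A minimal sketch of calling it from the SDK, assuming LiteLLM's `text_completion` helper accepts the `llamafile/` prefix and reusing the local server from the earlier examples:
```python
from litellm import text_completion

response = text_completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
    prompt="The capital of France is",
    api_base="http://localhost:8080/v1",
    max_tokens=10,
)
print(response)
```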


@@ -3,13 +3,26 @@ import TabItem from '@theme/TabItem';
# OpenAI-Compatible Endpoints
:::info
Selecting `openai` as the provider routes your request to an OpenAI-compatible endpoint using the upstream
[official OpenAI Python API library](https://github.com/openai/openai-python/blob/main/README.md).
This library **requires** an API key for all requests, either through the `api_key` parameter
or the `OPENAI_API_KEY` environment variable.
If you don't want to provide a fake API key in each request, consider using a provider that directly matches your
OpenAI-compatible endpoint, such as [`hosted_vllm`](/docs/providers/vllm) or [`llamafile`](/docs/providers/llamafile).
:::
To call models hosted behind an openai proxy, make 2 changes:
1. For `/chat/completions`: Put `openai/` in front of your model name, so litellm knows you're trying to call an openai `/chat/completions` endpoint (see the sketch after this list).
2. For `/completions`: Put `text-completion-openai/` in front of your model name, so litellm knows you're trying to call an openai `/completions` endpoint. [NOT REQUIRED for `openai/` endpoints called via `/v1/completions` route].
3. **Do NOT** add anything additional to the base url e.g. `/v1/embedding`. LiteLLM uses the openai-client to make these calls, and that automatically adds the relevant endpoints.
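The sketch below illustrates the first pattern and the placeholder API key mentioned in the note above; the model name and local `api_base` are assumptions, not values from this page:
```python
import litellm

# Any non-empty api_key satisfies the OpenAI client; a local OpenAI-compatible
# server typically ignores it.
response = litellm.completion(
    model="openai/mistralai/mistral-7b-instruct-v0.2",  # openai/ prefix -> /chat/completions
    api_key="fake-key",
    api_base="http://localhost:8080/v1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```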
## Usage - completion


@@ -229,6 +229,7 @@ const sidebars = {
"providers/fireworks_ai",
"providers/clarifai",
"providers/vllm",
"providers/llamafile",
"providers/infinity",
"providers/xinference",
"providers/cloudflare_workers",