Update docs for OpenAI compatible providers, add Llamafile docs, include Llamafile in the sidebar

Peter Wilson 2025-04-22 16:36:07 +01:00
parent 174a1aa007
commit db4e40d410
3 changed files with 174 additions and 2 deletions


@@ -0,0 +1,158 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Llamafile
LiteLLM supports all models on Llamafile.
| Property | Details |
|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| Description | llamafile lets you distribute and run LLMs with a single file. [Docs](https://github.com/Mozilla-Ocho/llamafile/blob/main/README.md) |
| Provider Route on LiteLLM | `llamafile/` (for OpenAI compatible server) |
| Provider Doc | [llamafile ↗](https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/README.md#api-endpoints) |
| Supported Endpoints | `/chat/completions`, `/embeddings`, `/completions` |
# Quick Start
## Usage - litellm.completion (calling OpenAI compatible endpoint)
Llamafile provides an OpenAI-compatible endpoint for chat completions. Here's how to call it with LiteLLM.
To use LiteLLM to call llamafile, add the following to your completion call:
* `model="llamafile/<your-llamafile-model-name>"`
* `api_base="<your-llamafile-server-url>"` (e.g. `http://localhost:8080/v1`)
```python
import litellm

messages = [{"role": "user", "content": "Hey, how's it going?"}]

response = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",  # pass the llamafile model name for completeness
    messages=messages,
    api_base="http://localhost:8080/v1",
    temperature=0.2,
    max_tokens=80,
)
print(response)
```
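Streaming works through the same route, since llamafile's server exposes the standard OpenAI streaming interface. A minimal sketch, reusing the model name and local `api_base` assumed in the example above:
```python
import litellm

response = litellm.completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
    messages=[{"role": "user", "content": "Write a haiku about local LLMs."}],
    api_base="http://localhost:8080/v1",
    stream=True,  # stream tokens as they are generated
)

for chunk in response:
    # chunks follow the OpenAI streaming format; content may be None on some chunks
    print(chunk.choices[0].delta.content or "", end="")
```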
## Usage - LiteLLM Proxy Server (calling OpenAI compatible endpoint)
Here's how to call an OpenAI-compatible endpoint with the LiteLLM Proxy Server.
1. Modify the config.yaml
```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: llamafile/mistralai/mistral-7b-instruct-v0.2  # add llamafile/ prefix to route as OpenAI provider
      api_base: http://localhost:8080/v1                   # add api base for OpenAI compatible provider
```
1. Start the proxy
```bash
$ litellm --config /path/to/config.yaml
```
1. Send Request to LiteLLM Proxy Server
<Tabs>
<TabItem value="openai" label="OpenAI Python v1.0.0+">
```python
import openai
client = openai.OpenAI(
api_key="sk-1234", # pass litellm proxy key, if you're using virtual keys
base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)
response = client.chat.completions.create(
model="my-model",
messages = [
{
"role": "user",
"content": "what llm are you"
}
],
)
print(response)
```
</TabItem>
<TabItem value="curl" label="curl">
```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
"model": "my-model",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
    ]
}'
```
</TabItem>
</Tabs>
## Embeddings
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import embedding
import os

os.environ["LLAMAFILE_API_BASE"] = "http://localhost:8080/v1"

# use a distinct variable name so the imported `embedding` function isn't shadowed
response = embedding(model="llamafile/sentence-transformers/all-MiniLM-L6-v2", input=["Hello world"])
print(response)
```
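If you prefer not to set an environment variable, the base URL can also be passed per call; a minimal sketch, assuming `embedding` accepts `api_base` the same way `completion` does:
```python
from litellm import embedding

# api_base passed directly instead of via LLAMAFILE_API_BASE
response = embedding(
    model="llamafile/sentence-transformers/all-MiniLM-L6-v2",
    input=["Hello world"],
    api_base="http://localhost:8080/v1",
)
print(response)
```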
</TabItem>
<TabItem value="proxy" label="PROXY">
1. Setup config.yaml
```yaml
model_list:
  - model_name: my-model
    litellm_params:
      model: llamafile/sentence-transformers/all-MiniLM-L6-v2  # add llamafile/ prefix to route as OpenAI provider
      api_base: http://localhost:8080/v1                       # add api base for OpenAI compatible provider
```
1. Start the proxy
```bash
$ litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
```
1. Test it!
```bash
curl -L -X POST 'http://0.0.0.0:4000/embeddings' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"input": ["hello world"], "model": "my-model"}'
```
[See OpenAI SDK/Langchain/etc. examples](../proxy/user_keys.md#embeddings)
</TabItem>
</Tabs>
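The table above also lists `/completions` as a supported endpoint. A minimal sketch of calling it from the SDK, assuming LiteLLM's `text_completion` helper accepts the `llamafile/` prefix and reusing the local server from the earlier examples:
```python
from litellm import text_completion

response = text_completion(
    model="llamafile/mistralai/mistral-7b-instruct-v0.2",
    prompt="The capital of France is",
    api_base="http://localhost:8080/v1",
    max_tokens=10,
)
print(response)
```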


@@ -3,13 +3,26 @@ import TabItem from '@theme/TabItem';
# OpenAI-Compatible Endpoints
:::info
Selecting `openai` as the provider routes your request to an OpenAI-compatible endpoint using the upstream
[official OpenAI Python API library](https://github.com/openai/openai-python/blob/main/README.md).
This library **requires** an API key for all requests, either through the `api_key` parameter
or the `OPENAI_API_KEY` environment variable.
If you don't want to provide a fake API key in each request, consider using a provider that directly matches your
OpenAI-compatible endpoint, such as [`hosted_vllm`](/docs/providers/vllm) or [`llamafile`](/docs/providers/llamafile).
:::
To call models hosted behind an openai proxy, make 2 changes:
1. For `/chat/completions`: Put `openai/` in front of your model name, so litellm knows you're trying to call an openai `/chat/completions` endpoint (see the sketch after this list).
2. For `/completions`: Put `text-completion-openai/` in front of your model name, so litellm knows you're trying to call an openai `/completions` endpoint. [NOT REQUIRED for `openai/` endpoints called via `/v1/completions` route].
3. **Do NOT** add anything additional to the base url e.g. `/v1/embedding`. LiteLLM uses the openai-client to make these calls, and that automatically adds the relevant endpoints.
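The sketch below illustrates the first pattern and the placeholder API key mentioned in the note above; the model name and local `api_base` are assumptions, not values from this page:
```python
import litellm

# Any non-empty api_key satisfies the OpenAI client; a local OpenAI-compatible
# server typically ignores it.
response = litellm.completion(
    model="openai/mistralai/mistral-7b-instruct-v0.2",  # openai/ prefix -> /chat/completions
    api_key="fake-key",
    api_base="http://localhost:8080/v1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```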
## Usage - completion


@@ -229,6 +229,7 @@ const sidebars = {
"providers/fireworks_ai",
"providers/clarifai",
"providers/vllm",
"providers/llamafile",
"providers/infinity",
"providers/xinference",
"providers/cloudflare_workers",