docs(proxy_server): doc cleanup

Krrish Dholakia 2023-10-07 17:28:55 -07:00
parent 051b21b61f
commit 51e5e2b8d5
3 changed files with 105 additions and 68 deletions

.gitignore

@@ -4,3 +4,4 @@ litellm_uuid.txt
__pycache__/
bun.lockb
**/.DS_Store
.aider*


@@ -3,38 +3,25 @@ import TabItem from '@theme/TabItem';
# OpenAI Proxy Server

A local, fast, and lightweight OpenAI-compatible server to call 100+ LLM APIs.

## Usage
Call Ollama models through your OpenAI proxy.

### Start Proxy
```shell
pip install litellm
```
```shell
$ litellm --model ollama/codellama
#INFO: Ollama running on http://0.0.0.0:8000
```

### Test
In a new shell, run:
```shell
$ litellm --test
```
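You can also hit the proxy's OpenAI-compatible `/chat/completions` endpoint directly; a minimal sketch in Python, assuming the default `http://0.0.0.0:8000` address and the `requests` package:
```python
import requests

# send a chat request to the local proxy's OpenAI-compatible endpoint
response = requests.post(
    "http://0.0.0.0:8000/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "what do you know?"}
        ]
    },
)
print(response.json())
```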
### Replace OpenAI Base
```python
import openai
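# sketch (not part of the original snippet): point the OpenAI client at the local proxy
openai.api_base = "http://0.0.0.0:8000"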
@@ -145,6 +132,81 @@ $ litellm --model command-nightly
[**Jump to Code**](https://github.com/BerriAI/litellm/blob/fef4146396d5d87006259e00095a62e3900d6bb4/litellm/proxy.py#L36)
## [tutorial]: Use with Aider/AutoGen/Continue-Dev
Here's how to use the proxy to test codellama, mistral, and other models against different GitHub repos.
```shell
pip install litellm
```
```shell
$ ollama pull codellama # our local CodeLlama
$ litellm --model ollama/codellama --temperature 0.3 --max_tokens 2048
```
Implementation for different repos
<Tabs>
<TabItem value="aider" label="Aider">
```shell
$ pip install aider-chat
$ aider --openai-api-base http://0.0.0.0:8000 --openai-api-key fake-key
```
</TabItem>
<TabItem value="continue-dev" label="ContinueDev">
Continue-Dev brings ChatGPT to VSCode. See how to [install it here](https://continue.dev/docs/quickstart).
In the [config.py](https://continue.dev/docs/reference/Models/openai) set this as your default model.
```python
default=OpenAI(
api_key="IGNORED",
model="fake-model-name",
context_length=2048,
api_base="http://your_litellm_hostname:8000"
),
```
Credits [@vividfog](https://github.com/jmorganca/ollama/issues/305#issuecomment-1751848077) for this tutorial.
</TabItem>
<TabItem value="autogen" label="AutoGen">
```shell
pip install pyautogen
```
```python
from autogen import AssistantAgent, UserProxyAgent, oai
config_list=[
    {
        "model": "my-fake-model",
        "api_base": "http://localhost:8000/v1",  # litellm compatible endpoint
        "api_type": "open_ai",
        "api_key": "NULL",  # just a placeholder
    }
]
response = oai.Completion.create(config_list=config_list, prompt="Hi")
print(response) # works fine
assistant = AssistantAgent("assistant")
user_proxy = UserProxyAgent("user_proxy")
user_proxy.initiate_chat(assistant, message="Plot a chart of META and TESLA stock price change YTD.", config_list=config_list)
# fails with the error: openai.error.AuthenticationError: No API key provided.
```
Credits [@victordibia](https://github.com/microsoft/autogen/issues/45#issuecomment-1749921972) for this tutorial.
</TabItem>
</Tabs>
:::note
**Contribute** Using this server with a project? Contribute your tutorial here!
:::
## Configure Model
To save api keys and/or customize model prompt, run:
@@ -207,44 +269,3 @@ This will host a ChatCompletions API at: https://api.litellm.ai/44508ad4
</TabItem>
</Tabs>
## Tutorial - using HuggingFace LLMs with aider
[Aider](https://github.com/paul-gauthier/aider) is an AI pair-programming tool that runs in your terminal.
But it only accepts OpenAI API calls.
In this tutorial we'll use Aider with WizardCoder (hosted on HF Inference Endpoints).
[NOTE]: To learn how to deploy a model on Huggingface
### Step 1: Install aider and litellm
```shell
$ pip install aider-chat litellm
```
### Step 2: Spin up local proxy
Save your Huggingface API key in your local environment (you can also do this via .env):
```shell
$ export HUGGINGFACE_API_KEY=my-huggingface-api-key
```
Point your local proxy to your model endpoint
```shell
$ litellm \
--model huggingface/WizardLM/WizardCoder-Python-34B-V1.0 \
--api_base https://my-endpoint.huggingface.com
```
This will host a local proxy api at: **http://0.0.0.0:8000**
### Step 3: Replace OpenAI API base in Aider
Aider lets you set the OpenAI API base, so let's point it at our proxy instead.
```shell
$ aider --openai-api-base http://0.0.0.0:8000
```
And that's it!
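Once the proxy is up, other OpenAI-compatible clients can use the same endpoint. Here's a minimal sketch with the pre-1.0 `openai` Python package; the base URL, placeholder key, and prompt are illustrative assumptions matching the proxy started in Step 2:
```python
import openai

openai.api_base = "http://0.0.0.0:8000"  # point the client at the local proxy
openai.api_key = "anything"              # the proxy doesn't need a real OpenAI key

# the proxy forwards requests to the model it was started with (WizardCoder here)
response = openai.ChatCompletion.create(
    model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response["choices"][0]["message"]["content"])
```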


@@ -1,4 +1,4 @@
import sys, os, platform
sys.path.insert(
    0, os.path.abspath("../..")
)  # Adds the parent directory to the system path
@@ -19,7 +19,7 @@ print()
import litellm
from fastapi import FastAPI, Request
from fastapi.routing import APIRouter
from fastapi.responses import StreamingResponse, FileResponse
import json

app = FastAPI()
@@ -203,4 +203,19 @@ async def chat_completion(request: Request):
    print_verbose(f"response: {response}")
    return response
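
# Serve the local Ollama server log through the proxy (path assumes Ollama's
# default macOS log location)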
@router.get("/ollama_logs")
async def retrieve_server_log(request: Request):
    filepath = os.path.expanduser('~/.ollama/logs/server.log')
    return FileResponse(filepath)
# @router.get("/ollama_logs")
# async def chat_completion(request: Request):
# if platform.system() == "Darwin":
# print("This is a MacOS system.")
# elif platform.system() == "Linux":
# print("This is a Linux system.")
# else:
# print("This is an unknown operating system.")
app.include_router(router)