(docs) proxy

This commit is contained in:
ishaan-jaff 2023-11-07 09:48:58 -08:00
parent 50564ab38e
commit 87d0b72a4a


@@ -4,20 +4,11 @@ import TabItem from '@theme/TabItem';
# 💥 Evaluate LLMs - OpenAI Proxy Server
A simple, fast, and lightweight **OpenAI-compatible server** to call 100+ LLM APIs.
LiteLLM Server supports:
* Call 100+ LLMs [Huggingface/Bedrock/TogetherAI/etc.](#other-supported-models) in the OpenAI `ChatCompletions` & `Completions` format
* Set custom prompt templates + model-specific configs (`temperature`, `max_tokens`, etc.)
* Caching Responses
[**See Code**](https://github.com/BerriAI/litellm/tree/main/litellm_server)
:::info
We want to learn how we can make the server better! Meet the [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
join our [discord](https://discord.gg/wuPM9dRgDw)
:::
## Quick Start
@@ -347,162 +338,6 @@ $ cd ./litellm/litellm_server
$ uvicorn main:app --host 0.0.0.0 --port 8000
```
## Setting LLM API Keys
This server supports two ways of passing API keys to litellm:
- Environment Variables - By default, the server assumes the LLM API keys are set as environment variables
- Dynamic Variables passed to `/chat/completions`
  - Set `AUTH_STRATEGY=DYNAMIC` in the environment
  - Pass the required auth params `api_key`, `api_base`, `api_version` with the request params (see the sketch below)
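A minimal sketch of a dynamic-auth request (the local endpoint, the Azure placeholders, and the `api_version` value are illustrative, not prescriptive):
```shell
# Assumes the server is running locally on port 8000 with AUTH_STRATEGY=DYNAMIC set
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "azure/<your-deployment-name>",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "api_key": "<your-azure-api-key>",
    "api_base": "<your-azure-api-base>",
    "api_version": "2023-07-01-preview"
  }'
```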
<Tabs>
<TabItem value="gcp-run" label="Google Cloud Run">
#### Deploy on Google Cloud Run
**Click the button** to deploy to Google Cloud Run
[![Deploy](https://deploy.cloud.run/button.svg)](https://l.linklyhq.com/l/1uHtX)
On a successful deploy, your Cloud Run shell will show this output
<Image img={require('../img/cloud_run0.png')} />
### Testing your deployed server
**Assuming the required keys are set as Environment Variables**
https://litellm-7yjrj3ha2q-uc.a.run.app is our example server; substitute it with the URL of your deployed Cloud Run app
<Tabs>
<TabItem value="openai" label="OpenAI">
```shell
curl https://litellm-7yjrj3ha2q-uc.a.run.app/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
</TabItem>
<TabItem value="azure" label="Azure">
```shell
curl https://litellm-7yjrj3ha2q-uc.a.run.app/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "azure/<your-deployment-name>",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
</TabItem>
<TabItem value="anthropic" label="Anthropic">
```shell
curl https://litellm-7yjrj3ha2q-uc.a.run.app/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-2",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7,
}'
```
</TabItem>
</Tabs>
### Set LLM API Keys
#### Environment Variables
More info [here](https://cloud.google.com/run/docs/configuring/services/environment-variables#console)
1. In the Google Cloud console, go to Cloud Run: [Go to Cloud Run](https://console.cloud.google.com/run)
2. Click on the **litellm** service
<Image img={require('../img/cloud_run1.png')} />
3. Click **Edit and Deploy New Revision**
<Image img={require('../img/cloud_run2.png')} />
4. Enter your Environment Variables
Example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`
<Image img={require('../img/cloud_run3.png')} />
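Alternatively, the same variables can be set from the CLI with `gcloud` (a sketch; the service name and region must match your deployment):
```shell
# Creates a new revision of the "litellm" Cloud Run service with updated env vars
gcloud run services update litellm \
  --region <your-region> \
  --update-env-vars OPENAI_API_KEY=<your-openai-key>,ANTHROPIC_API_KEY=<your-anthropic-key>
```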
</TabItem>
<TabItem value="render" label="Render">
#### Deploy on Render
**Click the button** to deploy to Render
[![Deploy](https://render.com/images/deploy-to-render-button.svg)](https://l.linklyhq.com/l/1uHsr)
On a successful deploy, https://dashboard.render.com/ should display the following
<Image img={require('../img/render1.png')} />
<Image img={require('../img/render2.png')} />
</TabItem>
<TabItem value="aws-apprunner" label="AWS App Runner">
#### Deploy on AWS App Runner
1. Fork LiteLLM https://github.com/BerriAI/litellm
2. Navigate to App Runner on the AWS Console: https://console.aws.amazon.com/apprunner/home#/services
3. Follow the steps in the video below
<iframe width="800" height="450" src="https://www.loom.com/embed/5fccced4dde8461a8caeee97addb2231?sid=eac60660-073e-455e-a737-b3d05a5a756a" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
4. Test your deployed endpoint
**Assuming the required keys are set as Environment Variables** Example: `OPENAI_API_KEY`
https://b2w6emmkzp.us-east-1.awsapprunner.com is our example server; substitute it with your deployed App Runner endpoint
<Tabs>
<TabItem value="openai" label="OpenAI">
```shell
curl https://b2w6emmkzp.us-east-1.awsapprunner.com/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-3.5-turbo",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
</TabItem>
<TabItem value="azure" label="Azure">
```shell
curl https://b2w6emmkzp.us-east-1.awsapprunner.com/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "azure/<your-deployment-name>",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7
}'
```
</TabItem>
<TabItem value="anthropic" label="Anthropic">
```shell
curl https://b2w6emmkzp.us-east-1.awsapprunner.com/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "claude-2",
"messages": [{"role": "user", "content": "Say this is a test!"}],
"temperature": 0.7,
}'
```
</TabItem>
</Tabs>
</TabItem>
</Tabs>
## Advanced
### Caching - Completion() and Embedding() Responses
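A minimal sketch of wiring up a Redis-backed cache via environment variables (the variable names assume litellm's Redis caching convention; verify them against the litellm repo before relying on this):
```shell
# Assumed env vars for litellm's Redis cache (verify names in the litellm repo)
export REDIS_HOST="<your-redis-host>"
export REDIS_PORT="<your-redis-port>"
export REDIS_PASSWORD="<your-redis-password>"
```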