ci(config.yml): run prisma generate before testing

commit e44d3e51aa (parent 7f269e92c5)

2 changed files with 67 additions and 53 deletions

@@ -45,6 +45,13 @@ jobs:
          paths:
            - ./venv
          key: v1-dependencies-{{ checksum ".circleci/requirements.txt" }}
      - run:
          name: Run prisma ./entrypoint.sh
          command: |
            set +e            # tolerate a non-zero exit from the entrypoint
            chmod +x entrypoint.sh
            ./entrypoint.sh
            set -e            # restore exit-on-error
      - run:
          name: Black Formatting
          command: |

@@ -77,7 +77,65 @@ print(response)

Router provides 4 strategies for routing your calls across multiple deployments:

<Tabs>
<TabItem value="simple-shuffle" label="Weighted Pick">
<TabItem value="latency-based" label="Latency-Based">

Picks the deployment with the lowest response time.

It caches and updates response times for each deployment, based on when a request was sent to and received back from that deployment.

[**How to test**](https://github.com/BerriAI/litellm/blob/main/litellm/tests/test_lowest_latency_routing.py)

```python
from litellm import Router
import asyncio

model_list = [{ ... }]

# init router
router = Router(model_list=model_list, routing_strategy="latency-based-routing") # 👈 set routing strategy

## CALL 1+2
tasks = []
response = None
final_response = None
for _ in range(2):
    tasks.append(router.acompletion(model=model, messages=messages))
response = await asyncio.gather(*tasks)

if response is not None:
    ## CALL 3
    await asyncio.sleep(1)  # let the cache update happen
    picked_deployment = router.lowestlatency_logger.get_available_deployments(
        model_group=model, healthy_deployments=router.healthy_deployments
    )
    final_response = await router.acompletion(model=model, messages=messages)
    print(f"min deployment id: {picked_deployment}")
    print(f"model id: {final_response._hidden_params['model_id']}")
    assert (
        final_response._hidden_params["model_id"]
        == picked_deployment["model_info"]["id"]
    )
```

### Set Time Window

Set the time window for how far back to look when averaging latency for a deployment.

**In Router**
```python
router = Router(..., routing_strategy_args={"ttl": 10})
```

**In Proxy**

```yaml
router_settings:
  routing_strategy_args: {"ttl": 10}
```

</TabItem>
<TabItem value="simple-shuffle" label="(Default) Weighted Pick">

**Default** Picks a deployment based on the provided **Requests per minute (rpm) or Tokens per minute (tpm)**.

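For a concrete picture of the weighted pick, here is a minimal sketch assuming each deployment's `rpm` weight is declared inside its `litellm_params` (the exact placement of the `rpm`/`tpm` fields may differ across litellm versions); the router then shuffles traffic across deployments in proportion to those weights:

```python
import os
import asyncio
from litellm import Router

# Two deployments of the same model group; the router shuffles between them,
# weighted by the declared requests-per-minute capacity.
model_list = [{
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "azure/chatgpt-v-2",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
        "rpm": 900,   # receives roughly 90% of requests
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": {
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "rpm": 100,   # receives roughly 10% of requests
    }
}]

router = Router(model_list=model_list, routing_strategy="simple-shuffle")

async def main():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )
    print(response)

asyncio.run(main())
```
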
@@ -235,58 +293,7 @@ asyncio.run(router_acompletion())

```

</TabItem>
<TabItem value="latency-based" label="Latency-Based">

Picks the deployment with the lowest response time.

It caches and updates response times for each deployment, based on when a request was sent to and received back from that deployment.

[**How to test**](https://github.com/BerriAI/litellm/blob/main/litellm/tests/test_lowest_latency_routing.py)

```python
from litellm import Router
import asyncio
import os

model_list = [{ # list of model deployments
    "model_name": "gpt-3.5-turbo", # model alias
    "litellm_params": { # params for litellm completion/embedding call
        "model": "azure/chatgpt-v-2", # actual model name
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": { # params for litellm completion/embedding call
        "model": "azure/chatgpt-functioncalling",
        "api_key": os.getenv("AZURE_API_KEY"),
        "api_version": os.getenv("AZURE_API_VERSION"),
        "api_base": os.getenv("AZURE_API_BASE"),
    }
}, {
    "model_name": "gpt-3.5-turbo",
    "litellm_params": { # params for litellm completion/embedding call
        "model": "gpt-3.5-turbo",
        "api_key": os.getenv("OPENAI_API_KEY"),
    }
}]

# init router
router = Router(model_list=model_list, routing_strategy="latency-based-routing")

async def router_acompletion():
    response = await router.acompletion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Hey, how's it going?"}]
    )
    print(response)
    return response

asyncio.run(router_acompletion())
```

</TabItem>
</Tabs>

## Basic Reliability

@@ -608,4 +615,4 @@ def __init__(
        "latency-based-routing",
    ] = "simple-shuffle",
):
```
```