forked from phoenix/litellm-mirror
docs: add time.sleep() between streaming calls
LiteLLM's cache appears to be updated in the background. Without this `time.sleep()` call, both responses take `0.8s` to return, but after adding it, the second response returns in `0.006s`.
parent f1147696a3
commit 0533f77138
1 changed file with 4 additions and 1 deletion
````diff
@@ -51,8 +51,10 @@ LiteLLM can cache your streamed responses for you
 ### Usage
 ```python
 import litellm
+import time
 from litellm import completion
 from litellm.caching import Cache
+
 litellm.cache = Cache(type="hosted")

 # Make completion calls
@@ -64,6 +66,7 @@ response1 = completion(
 for chunk in response1:
     print(chunk)

+time.sleep(1) # cache is updated asynchronously

 response2 = completion(
     model="gpt-3.5-turbo",
@@ -72,4 +75,4 @@ response2 = completion(
     caching=True)
 for chunk in response2:
     print(chunk)
-```
\ No newline at end of file
+```
````
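For anyone who wants to reproduce the timings quoted above, here is a minimal sketch of the measurement. The `Cache(type="hosted")` setup, the model, and the `stream=True`/`caching=True` flags come straight from the snippet in the diff; the `messages` payload and the `timed_stream` helper are illustrative additions, not part of the commit.

```python
import time

import litellm
from litellm import completion
from litellm.caching import Cache

# Use LiteLLM's hosted cache, as in the docs snippet above.
litellm.cache = Cache(type="hosted")

def timed_stream(label):
    # Stream a completion and report how long it takes to drain.
    start = time.perf_counter()
    response = completion(
        model="gpt-3.5-turbo",
        # Illustrative prompt; the original snippet elides the messages.
        messages=[{"role": "user", "content": "write a one-liner about caching"}],
        stream=True,
        caching=True)
    for chunk in response:
        pass  # consume the stream
    print(f"{label}: {time.perf_counter() - start:.3f}s")

timed_stream("first call (cache miss)")
time.sleep(1)  # give the hosted cache time to update in the background
timed_stream("second call (cache hit)")
```

The one-second pause mirrors the commit's observation: the hosted cache is populated in the background after the first stream finishes, so a second call fired immediately can still miss it, while a call made after the sleep returns from cache in milliseconds.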