Compare commits


11 commits

Author SHA1 Message Date
Krrish Dholakia
361d010068 test: update test to handle model overloaded error 2024-11-20 22:02:33 +05:30
Krrish Dholakia
f6ebba7538 test: fix test 2024-11-20 14:04:30 +05:30
Krrish Dholakia
4ddbf1395f test: remove duplicate test 2024-11-20 05:53:43 +05:30
Krrish Dholakia
f2e6c9c9d8 test: handle overloaded anthropic model error 2024-11-20 05:30:01 +05:30
Krrish Dholakia
7bc9f1299c docs(vertex.md): refactor docs 2024-11-20 05:28:35 +05:30
Krrish Dholakia
353558e978 test: fix test 2024-11-20 05:19:52 +05:30
Krrish Dholakia
aed1cd3283 fix: fix linting error 2024-11-20 05:17:27 +05:30
Krrish Dholakia
5be647dd76 build(ui/): add team admins via proxy ui 2024-11-20 05:11:28 +05:30
Krrish Dholakia
07ba537970 fix(databricks/chat.py): handle max_retries optional param handling for openai-like calls
Fixes issue with calling finetuned vertex ai models via databricks route
2024-11-20 04:56:42 +05:30
e89dcccdd9
(feat): Add timestamp_granularities parameter to transcription API (#6457)
* Add timestamp_granularities parameter to transcription API

* add param to the local test
2024-11-20 04:34:33 +05:30
Krrish Dholakia
f4ec93fbc3 fix(anthropic/chat/transformation.py): add json schema as values: json_schema
fixes passing pydantic obj to anthropic

Fixes https://github.com/BerriAI/litellm/issues/6766
2024-11-19 22:04:39 +05:30
15 changed files with 200 additions and 193 deletions

View file

@@ -572,6 +572,96 @@ Here's how to use Vertex AI with the LiteLLM Proxy Server
</Tabs>
## Authentication - vertex_project, vertex_location, etc.
Set your vertex credentials via:
- dynamic params
OR
- env vars
### **Dynamic Params**
You can set:
- `vertex_credentials` (str) - can be a json string or filepath to your vertex ai service account.json
- `vertex_location` (str) - place where vertex model is deployed (us-central1, asia-southeast1, etc.)
- `vertex_project` Optional[str] - use if the vertex project is different from the one in `vertex_credentials`
as dynamic params for a `litellm.completion` call.
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import completion
import json

## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

response = completion(
    model="vertex_ai/gemini-pro",
    messages=[{"content": "You are a good bot.", "role": "system"}, {"content": "Hello, how are you?", "role": "user"}],
    vertex_credentials=vertex_credentials_json,
    vertex_project="my-special-project",
    vertex_location="my-special-location"
)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
```yaml
model_list:
  - model_name: gemini-1.5-pro
    litellm_params:
      model: gemini-1.5-pro
      vertex_credentials: os.environ/VERTEX_FILE_PATH_ENV_VAR # os.environ["VERTEX_FILE_PATH_ENV_VAR"] = "/path/to/service_account.json"
      vertex_project: "my-special-project"
      vertex_location: "my-special-location"
```
</TabItem>
</Tabs>
### **Environment Variables**
You can set:
- `GOOGLE_APPLICATION_CREDENTIALS` - filepath to your service_account.json (used by the vertex sdk directly).
- `VERTEXAI_LOCATION` - place where the vertex model is deployed (us-central1, asia-southeast1, etc.)
- `VERTEXAI_PROJECT` - Optional[str] - use if the vertex project is different from the one in `vertex_credentials`
1. GOOGLE_APPLICATION_CREDENTIALS
```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json"
```
2. VERTEXAI_LOCATION
```bash
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
```
3. VERTEXAI_PROJECT
```bash
export VERTEXAI_PROJECT="my-test-project" # ONLY use if model project is different from service account project
```
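Once these variables are exported, the Vertex-specific params can be dropped from the call itself. A minimal sketch (the `gemini-pro` route here is just an illustration):
```python
import litellm

# credentials, location, and project are picked up from
# GOOGLE_APPLICATION_CREDENTIALS / VERTEXAI_LOCATION / VERTEXAI_PROJECT
response = litellm.completion(
    model="vertex_ai/gemini-pro",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)
print(response.choices[0].message.content)
```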
## Specifying Safety Settings
In certain use-cases you may need to make calls to the models and pass [safety settings](https://ai.google.dev/docs/safety_setting_gemini) different from the defaults. To do so, simply pass the `safety_settings` argument to `completion` or `acompletion`. For example:
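A minimal sketch, assuming Gemini's standard harm-category and threshold names:
```python
from litellm import completion

response = completion(
    model="vertex_ai/gemini-pro",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    safety_settings=[
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    ],
)
```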
@@ -2303,97 +2393,6 @@ print("response from proxy", response)
</TabItem>
</Tabs>
## Authentication - vertex_project, vertex_location, etc.
Set your vertex credentials via:
- dynamic params
OR
- env vars
### **Dynamic Params**
You can set:
- `vertex_credentials` (str) - can be a json string or filepath to your vertex ai service account.json
- `vertex_location` (str) - place where vertex model is deployed (us-central1, asia-southeast1, etc.)
- `vertex_project` Optional[str] - use if the vertex project is different from the one in `vertex_credentials`
as dynamic params for a `litellm.completion` call.
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import completion
import json

## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'

# Load the JSON file
with open(file_path, 'r') as file:
    vertex_credentials = json.load(file)

# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)

response = completion(
    model="vertex_ai/gemini-pro",
    messages=[{"content": "You are a good bot.", "role": "system"}, {"content": "Hello, how are you?", "role": "user"}],
    vertex_credentials=vertex_credentials_json,
    vertex_project="my-special-project",
    vertex_location="my-special-location"
)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
```yaml
model_list:
  - model_name: gemini-1.5-pro
    litellm_params:
      model: gemini-1.5-pro
      vertex_credentials: os.environ/VERTEX_FILE_PATH_ENV_VAR # os.environ["VERTEX_FILE_PATH_ENV_VAR"] = "/path/to/service_account.json"
      vertex_project: "my-special-project"
      vertex_location: "my-special-location"
```
</TabItem>
</Tabs>
### **Environment Variables**
You can set:
- `GOOGLE_APPLICATION_CREDENTIALS` - filepath to your service_account.json (used by the vertex sdk directly).
- `VERTEXAI_LOCATION` - place where the vertex model is deployed (us-central1, asia-southeast1, etc.)
- `VERTEXAI_PROJECT` - Optional[str] - use if the vertex project is different from the one in `vertex_credentials`
1. GOOGLE_APPLICATION_CREDENTIALS
```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service_account.json"
```
2. VERTEXAI_LOCATION
```bash
export VERTEXAI_LOCATION="us-central1" # can be any vertex location
```
3. VERTEXAI_PROJECT
```bash
export VERTEXAI_PROJECT="my-test-project" # ONLY use if model project is different from service account project
```
## Extra
### Using `GOOGLE_APPLICATION_CREDENTIALS`

View file

@@ -374,7 +374,7 @@ class AnthropicConfig:
_input_schema["additionalProperties"] = True
_input_schema["properties"] = {}
else:
_input_schema["properties"] = json_schema
_input_schema["properties"] = {"values": json_schema}
_tool = AnthropicMessagesTool(name="json_tool_call", input_schema=_input_schema)
return _tool
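The change above nests the caller's JSON schema under a `values` key in the forced `json_tool_call` tool, which is what lets a Pydantic object be passed as `response_format` for Anthropic models. A hedged usage sketch (the `Joke` class is illustrative; the test added in this compare does the same with its own Pydantic model):
```python
import litellm
from pydantic import BaseModel


class Joke(BaseModel):
    setup: str
    punchline: str


# Pydantic class passed straight through as response_format;
# the transformation wraps its schema as {"values": json_schema}.
response = litellm.completion(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    response_format=Joke,
)
print(response.choices[0].message.content)
```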

View file

@@ -470,6 +470,9 @@ class DatabricksChatCompletion(BaseLLM):
optional_params[k] = v
stream: bool = optional_params.get("stream", None) or False
optional_params.pop(
"max_retries", None
) # [TODO] add max retry support at llm api call level
optional_params["stream"] = stream
data = {
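The net effect is that `max_retries` is accepted at the LiteLLM layer but stripped before the OpenAI-like request body is built, which is what unblocks finetuned Vertex AI models called via the Databricks-style route. A rough sketch of the call pattern, with the endpoint id and project as placeholders (the `vertex_ai/openai/<endpoint-id>` route string is an assumption based on the model-garden test in this compare):
```python
import litellm

# max_retries no longer leaks into the provider request for openai-like calls
response = litellm.completion(
    model="vertex_ai/openai/1234567890123456789",  # hypothetical finetuned endpoint id
    messages=[{"role": "user", "content": "Hello"}],
    vertex_project="my-project",       # placeholder
    vertex_location="us-central1",
    max_retries=3,
)
```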

View file

@@ -4729,6 +4729,7 @@ def transcription(
response_format: Optional[
Literal["json", "text", "srt", "verbose_json", "vtt"]
] = None,
timestamp_granularities: Optional[List[Literal["word", "segment"]]] = None,
temperature: Optional[int] = None, # openai defaults this to 0
## LITELLM PARAMS ##
user: Optional[str] = None,
@@ -4778,6 +4779,7 @@ def transcription(
language=language,
prompt=prompt,
response_format=response_format,
timestamp_granularities=timestamp_granularities,
temperature=temperature,
custom_llm_provider=custom_llm_provider,
drop_params=drop_params,
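A hedged usage sketch of the new parameter; per the updated transcription test further down, word-level granularities are paired with `verbose_json` output (the audio file path is a placeholder):
```python
import litellm

with open("gettysburg.wav", "rb") as audio_file:  # placeholder audio file
    transcript = litellm.transcription(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )
print(transcript)
```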

View file

@@ -1884,7 +1884,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 264,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-5-haiku-20241022": {
"max_tokens": 8192,
@@ -1900,7 +1901,8 @@
"tool_use_system_prompt_tokens": 264,
"supports_assistant_prefill": true,
"supports_prompt_caching": true,
"supports_pdf_input": true
"supports_pdf_input": true,
"supports_response_schema": true
},
"claude-3-opus-20240229": {
"max_tokens": 4096,
@@ -1916,7 +1918,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 395,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-sonnet-20240229": {
"max_tokens": 4096,
@@ -1930,7 +1933,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-5-sonnet-20240620": {
"max_tokens": 8192,
@@ -1946,7 +1950,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-5-sonnet-20241022": {
"max_tokens": 8192,
@@ -1962,7 +1967,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"text-bison": {
"max_tokens": 2048,
@@ -3852,22 +3858,6 @@
"supports_function_calling": true,
"tool_use_system_prompt_tokens": 264
},
"anthropic/claude-3-5-sonnet-20241022": {
"max_tokens": 8192,
"max_input_tokens": 200000,
"max_output_tokens": 8192,
"input_cost_per_token": 0.000003,
"output_cost_per_token": 0.000015,
"cache_creation_input_token_cost": 0.00000375,
"cache_read_input_token_cost": 0.0000003,
"litellm_provider": "anthropic",
"mode": "chat",
"supports_function_calling": true,
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
},
"openrouter/anthropic/claude-3.5-sonnet": {
"max_tokens": 8192,
"max_input_tokens": 200000,

View file

@@ -2125,6 +2125,7 @@ def get_optional_params_transcription(
prompt: Optional[str] = None,
response_format: Optional[str] = None,
temperature: Optional[int] = None,
timestamp_granularities: Optional[List[Literal["word", "segment"]]] = None,
custom_llm_provider: Optional[str] = None,
drop_params: Optional[bool] = None,
**kwargs,

View file

@@ -1884,7 +1884,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 264,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-5-haiku-20241022": {
"max_tokens": 8192,
@@ -1900,7 +1901,8 @@
"tool_use_system_prompt_tokens": 264,
"supports_assistant_prefill": true,
"supports_prompt_caching": true,
"supports_pdf_input": true
"supports_pdf_input": true,
"supports_response_schema": true
},
"claude-3-opus-20240229": {
"max_tokens": 4096,
@@ -1916,7 +1918,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 395,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-sonnet-20240229": {
"max_tokens": 4096,
@@ -1930,7 +1933,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-5-sonnet-20240620": {
"max_tokens": 8192,
@@ -1946,7 +1950,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"claude-3-5-sonnet-20241022": {
"max_tokens": 8192,
@@ -1962,7 +1967,8 @@
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
"supports_prompt_caching": true,
"supports_response_schema": true
},
"text-bison": {
"max_tokens": 2048,
@@ -3852,22 +3858,6 @@
"supports_function_calling": true,
"tool_use_system_prompt_tokens": 264
},
"anthropic/claude-3-5-sonnet-20241022": {
"max_tokens": 8192,
"max_input_tokens": 200000,
"max_output_tokens": 8192,
"input_cost_per_token": 0.000003,
"output_cost_per_token": 0.000015,
"cache_creation_input_token_cost": 0.00000375,
"cache_read_input_token_cost": 0.0000003,
"litellm_provider": "anthropic",
"mode": "chat",
"supports_function_calling": true,
"supports_vision": true,
"tool_use_system_prompt_tokens": 159,
"supports_assistant_prefill": true,
"supports_prompt_caching": true
},
"openrouter/anthropic/claude-3.5-sonnet": {
"max_tokens": 8192,
"max_input_tokens": 200000,

View file

@@ -42,11 +42,14 @@ class BaseLLMChatTest(ABC):
"content": [{"type": "text", "text": "Hello, how are you?"}],
}
]
response = litellm.completion(
**base_completion_call_args,
messages=messages,
)
assert response is not None
try:
response = litellm.completion(
**base_completion_call_args,
messages=messages,
)
assert response is not None
except litellm.InternalServerError:
pass
# for OpenAI the content contains the JSON schema, so we need to assert that the content is not None
assert response.choices[0].message.content is not None
@@ -89,6 +92,36 @@ class BaseLLMChatTest(ABC):
# relevant issue: https://github.com/BerriAI/litellm/issues/6741
assert response.choices[0].message.content is not None
def test_json_response_pydantic_obj(self):
from pydantic import BaseModel
from litellm.utils import supports_response_schema
os.environ["LITELLM_LOCAL_MODEL_COST_MAP"] = "True"
litellm.model_cost = litellm.get_model_cost_map(url="")
class TestModel(BaseModel):
first_response: str
base_completion_call_args = self.get_base_completion_call_args()
if not supports_response_schema(base_completion_call_args["model"], None):
pytest.skip("Model does not support response schema")
try:
res = litellm.completion(
**base_completion_call_args,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{
"role": "user",
"content": "What is the capital of France?",
},
],
response_format=TestModel,
)
assert res is not None
except litellm.InternalServerError:
pytest.skip("Model is overloaded")
def test_json_response_format_stream(self):
"""
Test that the JSON response format with streaming is supported by the LLM API

View file

@@ -657,7 +657,7 @@ def test_create_json_tool_call_for_response_format():
_input_schema = tool.get("input_schema")
assert _input_schema is not None
assert _input_schema.get("type") == "object"
assert _input_schema.get("properties") == custom_schema
assert _input_schema.get("properties") == {"values": custom_schema}
assert "additionalProperties" not in _input_schema

View file

@@ -923,7 +923,6 @@ def test_watsonx_text_top_k():
assert optional_params["top_k"] == 10
def test_together_ai_model_params():
optional_params = get_optional_params(
model="together_ai", custom_llm_provider="together_ai", logprobs=1
@@ -931,6 +930,7 @@ def test_together_ai_model_params():
print(optional_params)
assert optional_params["logprobs"] == 1
def test_forward_user_param():
from litellm.utils import get_supported_openai_params, get_optional_params
@@ -943,6 +943,7 @@ def test_forward_user_param():
assert optional_params["metadata"]["user_id"] == "test_user"
def test_lm_studio_embedding_params():
optional_params = get_optional_params_embeddings(
model="lm_studio/gemma2-9b-it",

View file

@@ -3129,9 +3129,12 @@ async def test_vertexai_embedding_finetuned(respx_mock: MockRouter):
assert all(isinstance(x, float) for x in embedding["embedding"])
@pytest.mark.parametrize("max_retries", [None, 3])
@pytest.mark.asyncio
@pytest.mark.respx
async def test_vertexai_model_garden_model_completion(respx_mock: MockRouter):
async def test_vertexai_model_garden_model_completion(
respx_mock: MockRouter, max_retries
):
"""
Relevant issue: https://github.com/BerriAI/litellm/issues/6480
@@ -3189,6 +3192,7 @@ async def test_vertexai_model_garden_model_completion(respx_mock: MockRouter):
messages=messages,
vertex_project="633608382793",
vertex_location="us-central1",
max_retries=max_retries,
)
# Assert request was made correctly

View file

@@ -1222,32 +1222,6 @@ def test_completion_mistral_api_modified_input():
pytest.fail(f"Error occurred: {e}")
def test_completion_claude2_1():
try:
litellm.set_verbose = True
print("claude2.1 test request")
messages = [
{
"role": "system",
"content": "Your goal is generate a joke on the topic user gives.",
},
{"role": "user", "content": "Generate a 3 liner joke for me"},
]
# test without max tokens
response = completion(model="claude-2.1", messages=messages)
# Add any assertions here to check the response
print(response)
print(response.usage)
print(response.usage.completion_tokens)
print(response["usage"]["completion_tokens"])
# print("new cost tracking")
except Exception as e:
pytest.fail(f"Error occurred: {e}")
# test_completion_claude2_1()
@pytest.mark.asyncio
async def test_acompletion_claude2_1():
try:
@@ -1268,6 +1242,8 @@ async def test_acompletion_claude2_1():
print(response.usage.completion_tokens)
print(response["usage"]["completion_tokens"])
# print("new cost tracking")
except litellm.InternalServerError:
pytest.skip("model is overloaded.")
except Exception as e:
pytest.fail(f"Error occurred: {e}")
@@ -4514,19 +4490,22 @@ async def test_dynamic_azure_params(stream, sync_mode):
@pytest.mark.flaky(retries=3, delay=1)
async def test_completion_ai21_chat():
litellm.set_verbose = True
response = await litellm.acompletion(
model="jamba-1.5-large",
user="ishaan",
tool_choice="auto",
seed=123,
messages=[{"role": "user", "content": "what does the document say"}],
documents=[
{
"content": "hello world",
"metadata": {"source": "google", "author": "ishaan"},
}
],
)
try:
response = await litellm.acompletion(
model="jamba-1.5-large",
user="ishaan",
tool_choice="auto",
seed=123,
messages=[{"role": "user", "content": "what does the document say"}],
documents=[
{
"content": "hello world",
"metadata": {"source": "google", "author": "ishaan"},
}
],
)
except litellm.InternalServerError:
pytest.skip("Model is overloaded")
@pytest.mark.parametrize(

View file

@@ -51,10 +51,15 @@ from litellm import Router
),
],
)
@pytest.mark.parametrize("response_format", ["json", "vtt"])
@pytest.mark.parametrize(
"response_format, timestamp_granularities",
[("json", None), ("vtt", None), ("verbose_json", ["word"])],
)
@pytest.mark.parametrize("sync_mode", [True, False])
@pytest.mark.asyncio
async def test_transcription(model, api_key, api_base, response_format, sync_mode):
async def test_transcription(
model, api_key, api_base, response_format, sync_mode, timestamp_granularities
):
if sync_mode:
transcript = litellm.transcription(
model=model,
@@ -62,6 +67,7 @@ async def test_transcription(model, api_key, api_base, response_format, sync_mode):
api_key=api_key,
api_base=api_base,
response_format=response_format,
timestamp_granularities=timestamp_granularities,
drop_params=True,
)
else:

View file

@@ -314,13 +314,6 @@ const AdminPanel: React.FC<AdminPanelProps> = ({
className="px-3 py-2 border rounded-md w-full"
/>
</Form.Item>
{/* <div className="text-center mb-4">OR</div>
<Form.Item label="User ID" name="user_id" className="mb-4">
<Input
name="user_id"
className="px-3 py-2 border rounded-md w-full"
/>
</Form.Item> */}
</>
<div style={{ textAlign: "right", marginTop: "10px" }} className="mt-4">
<Button2 htmlType="submit">Add member</Button2>

View file

@@ -381,7 +381,7 @@ const Team: React.FC<TeamProps> = ({
if (accessToken != null && teams != null) {
message.info("Adding Member");
const user_role: Member = {
role: "user",
role: formValues.role,
user_email: formValues.user_email,
user_id: formValues.user_id,
};
@@ -809,6 +809,12 @@ const Team: React.FC<TeamProps> = ({
className="px-3 py-2 border rounded-md w-full"
/>
</Form.Item>
<Form.Item label="Member Role" name="role" className="mb-4">
<Select2 defaultValue="user">
<Select2.Option value="user">user</Select2.Option>
<Select2.Option value="admin">admin</Select2.Option>
</Select2>
</Form.Item>
</>
<div style={{ textAlign: "right", marginTop: "10px" }}>
<Button2 htmlType="submit">Add member</Button2>