docs(completion-docs): adds more details on provider-specific params

2025-04-27 11:43:54 +00:00 · 2023-10-07 13:49:19 -07:00 · 2023-10-07 13:49:19 -07:00 · c6d36fb59d
commit c6d36fb59d
parent b4501f0241
8 changed files with 749 additions and 35 deletions
--- a/docs/my-website/docs/completion/input.md
+++ b/docs/my-website/docs/completion/input.md
@ -1,5 +1,426 @@
-# Input Format
+import Tabs from '@theme/Tabs';
-The Input params are **exactly the same** as the <a href="https://platform.openai.com/docs/api-reference/chat/create" target="_blank" rel="noopener noreferrer">OpenAI Create chat completion</a>, and let you call 100+ models in the same format. 
+import TabItem from '@theme/TabItem';
 # Input Params
 ## Common Params 
 LiteLLM accepts and translates the [OpenAI Chat Completion params](https://platform.openai.com/docs/api-reference/chat/create) across all providers. 
 ### usage
 ```python
 import litellm, os
 # set env variables
 os.environ["OPENAI_API_KEY"] = "your-openai-key"
 ## SET MAX TOKENS - via completion() 
 response = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 print(response)
 ```
 ### translated OpenAI params
 This is a list of OpenAI params we translate across providers.
 This list is constantly being updated.
 | Provider | temperature | max_tokens | top_p | stream | stop | n | presence_penalty | frequency_penalty | functions | function_call |
 |---|---|---|---|---|---|---|---|---|---|---|
 |Anthropic| ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |  |   |
 |OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 |Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |   |  |   |
 |Cohere| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |   |   |
 |Huggingface| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |    |
 |Openrouter| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 |AI21| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |   |
 |VertexAI| ✅ | ✅ |  | ✅ |  |  |  |  |  |   |
 |Bedrock| ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |  |   |
 |Sagemaker| ✅ | ✅ |  | ✅ |  |  |  |  |  |   |
 |TogetherAI| ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |  |   |
 |AlephAlpha| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |   |  |   |
 |Palm| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |   |
 |NLP Cloud| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |   |
 |Petals| ✅ | ✅ |  | ✅ | |  |   |  |  |   |
 |Ollama| ✅ | ✅ | ✅ | ✅ | ✅ |  |   | ✅ |  |   |
 :::note
 By default, LiteLLM raises an exception if the OpenAI param passed in isn't supported by the provider.
 To drop the param instead, set `litellm.drop_params = True`.
 ::: 
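
The note above can be pictured with a small standalone sketch. It does not call LiteLLM: `SUPPORTED_PARAMS`, `check_params`, and the use of `ValueError` are hypothetical stand-ins chosen for illustration (the library's real support matrix and exception type live inside LiteLLM). The two rows are copied from the table above.

```python
# Hypothetical support matrix, copied from two rows of the table above.
SUPPORTED_PARAMS = {
    "anthropic": {"temperature", "max_tokens", "top_p", "stream", "stop"},
    "vertex_ai": {"temperature", "max_tokens", "stream"},
}


def check_params(provider: str, params: dict, drop_params: bool = False) -> dict:
    """Return the params that may be sent to `provider`.

    Raises by default on unsupported params; silently drops them
    when `drop_params` is True.
    """
    supported = SUPPORTED_PARAMS[provider]
    unsupported = [name for name in params if name not in supported]
    if unsupported and not drop_params:
        # Stand-in for the exception LiteLLM raises by default.
        raise ValueError(f"{provider} does not support: {unsupported}")
    return {name: value for name, value in params.items() if name in supported}


# Drop mode: `n` is not supported for Anthropic in the table, so it is removed.
print(check_params("anthropic", {"max_tokens": 10, "n": 2}, drop_params=True))
```

Raising by default surfaces silent misconfiguration early; dropping is convenient when the same call site targets many providers.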
 ## Provider-specific Params
 Providers might offer params not supported by OpenAI (e.g. top_k). You can pass those in 2 ways: 
 - via completion(): We'll pass the non-OpenAI param straight to the provider as part of the request body.
    - e.g. `completion(model="claude-instant-1", top_k=3)`
 - via provider-specific config variable (e.g. `litellm.OpenAIConfig()`). 
 <Tabs>
 <TabItem value="openai" label="OpenAI">
 ```python
 import litellm, os
 # set env variables
 os.environ["OPENAI_API_KEY"] = "your-openai-key"
 ## SET MAX TOKENS - via completion() 
 response_1 = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.OpenAIConfig(max_tokens=200)
 response_2 = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="openai-text" label="OpenAI Text Completion">
 ```python
 import litellm, os
 # set env variables
 os.environ["OPENAI_API_KEY"] = "your-openai-key"
 ## SET MAX TOKENS - via completion() 
 response_1 = litellm.completion(
            model="text-davinci-003",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.OpenAITextCompletionConfig(max_tokens=200)
 response_2 = litellm.completion(
            model="text-davinci-003",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="azure-openai" label="Azure OpenAI">
 ```python
 import litellm, os
 # set env variables
 os.environ["AZURE_API_KEY"] = "your-azure-api-key"
 os.environ["AZURE_API_BASE"] = "your-azure-api-base"
 os.environ["AZURE_API_TYPE"] = "azure" # [OPTIONAL] 
 os.environ["AZURE_API_VERSION"] = "2023-07-01-preview" # [OPTIONAL]
 ## SET MAX TOKENS - via completion() 
 response_1 = litellm.completion(
            model="azure/chatgpt-v-2",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.AzureOpenAIConfig(max_tokens=200)
 response_2 = litellm.completion(
            model="azure/chatgpt-v-2",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="anthropic" label="Anthropic">
 ```python
 import litellm, os 
 # set env variables
 os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="claude-instant-1",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.AnthropicConfig(max_tokens_to_sample=200)
 response_2 = litellm.completion(
            model="claude-instant-1",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="huggingface" label="Huggingface">
 ```python
 import litellm, os 
 # set env variables
 os.environ["HUGGINGFACE_API_KEY"] = "your-huggingface-key" #[OPTIONAL]
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            api_base="https://your-huggingface-api-endpoint",
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.HuggingfaceConfig(max_new_tokens=200)
 response_2 = litellm.completion(
            model="huggingface/mistralai/Mistral-7B-Instruct-v0.1",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            api_base="https://your-huggingface-api-endpoint"
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="together_ai" label="TogetherAI">
 ```python
 import litellm, os 
 # set env variables
 os.environ["TOGETHERAI_API_KEY"] = "your-togetherai-key" 
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="together_ai/togethercomputer/llama-2-70b-chat",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.TogetherAIConfig(max_tokens_to_sample=200)
 response_2 = litellm.completion(
            model="together_ai/togethercomputer/llama-2-70b-chat",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="replicate" label="Replicate">
 ```python
 import litellm, os 
 # set env variables
 os.environ["REPLICATE_API_KEY"] = "your-replicate-key" 
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.ReplicateConfig(max_new_tokens=200)
 response_2 = litellm.completion(
            model="replicate/meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="petals" label="Petals">
 ```python
 import litellm
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="petals/petals-team/StableBeluga2",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            api_base="https://chat.petals.dev/api/v1/generate",
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.PetalsConfig(max_new_tokens=200)
 response_2 = litellm.completion(
            model="petals/petals-team/StableBeluga2",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            api_base="https://chat.petals.dev/api/v1/generate",
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="palm" label="Palm">
 ```python
 import litellm, os 
 # set env variables
 os.environ["PALM_API_KEY"] = "your-palm-key"  
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="palm/chat-bison",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.PalmConfig(maxOutputTokens=200)
 response_2 = litellm.completion(
            model="palm/chat-bison",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="ai21" label="AI21">
 ```python
 import litellm, os 
 # set env variables
 os.environ["AI21_API_KEY"] = "your-ai21-key"  
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="j2-mid",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.AI21Config(maxOutputTokens=200)
 response_2 = litellm.completion(
            model="j2-mid",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 <TabItem value="cohere" label="Cohere">
 ```python
 import litellm, os 
 # set env variables
 os.environ["COHERE_API_KEY"] = "your-cohere-key"   
 ## SET MAX TOKENS - via completion()
 response_1 = litellm.completion(
            model="command-nightly",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
 response_1_text = response_1.choices[0].message.content
 ## SET MAX TOKENS - via config
 litellm.CohereConfig(max_tokens=200)
 response_2 = litellm.completion(
            model="command-nightly",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
 response_2_text = response_2.choices[0].message.content
 ## TEST OUTPUT
 assert len(response_2_text) > len(response_1_text)
 ```
 </TabItem>
 </Tabs>
 [**Check out the tutorial!**](../tutorials/provider_specific_params.md)
 ## Input - Request Body
 # Request Body
@ -54,31 +475,3 @@ The Input params are **exactly the same** as the <a href="https://platform.opena
 - `logit_bias`: *map (optional)* - Used to modify the probability of specific tokens appearing in the completion.
 - `user`: *string (optional)* - A unique identifier representing your end-user. This can help OpenAI to monitor and detect abuse.
 # Params supported across providers
 This is a list of openai params we translate across providers. You can  send any provider-specific param by just including it in completion(). 
 E.g. If Anthropic supports top_k, then `completion(model="claude-2", .., top_k=3)` would send the value straight to Anthropic.
 This list is constantly being updated.
 | Provider | temperature | max_tokens | top_p | stream | stop | n | presence_penalty | frequency_penalty | functions | function_call |
 |---|---|---|---|---|---|---|---|---|---|---|
 |Anthropic| ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |  |   |
 |OpenAI| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 |Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |   |  |   |
 |Cohere| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |   |   |
 |Huggingface| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |    |
 |Openrouter| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
 |AI21| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |   |
 |VertexAI| ✅ | ✅ |  | ✅ |  |  |  |  |  |   |
 |Bedrock| ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |  |   |
 |Sagemaker| ✅ | ✅ |  | ✅ |  |  |  |  |  |   |
 |TogetherAI| ✅ | ✅ | ✅ | ✅ | ✅ |  |  |   |  |   |
 |AlephAlpha| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |   |  |   |
 |Palm| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |   |
 |NLP Cloud| ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |  |  |  |   |
 |Petals| ✅ | ✅ |  | ✅ | |  |   |  |  |   |
 |Ollama| ✅ | ✅ | ✅ | ✅ | ✅ |  |   | ✅ |  |   |
 By default, LiteLLM raises an exception if the param being passed in isn't supported. However, if you want to just drop the param, instead of raising an exception, just set `litellm.drop_params = True`. 
--- a/docs/my-website/docs/tutorials/provider_specific_params.md
+++ b/docs/my-website/docs/tutorials/provider_specific_params.md
@ -0,0 +1,34 @@
 ### Setting provider-specific Params
 Goal: Set max tokens across OpenAI + Cohere
 **1. via completion**
 LiteLLM will automatically translate max_tokens to the naming convention followed by that specific model provider.
 ```python
 from litellm import completion
 import os
 ## set ENV variables 
 os.environ["OPENAI_API_KEY"] = "your-openai-key" 
 os.environ["COHERE_API_KEY"] = "your-cohere-key" 
 messages = [{ "content": "Hello, how are you?","role": "user"}]
 # openai call
 response = completion(model="gpt-3.5-turbo", messages=messages, max_tokens=100)
 # cohere call
 response = completion(model="command-nightly", messages=messages, max_tokens=100)
 print(response)
 ```
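
As a rough picture of what "translate" means here, the sketch below maps OpenAI's `max_tokens` onto provider-side names. It is not LiteLLM's implementation: `MAX_TOKENS_NAME` and `translate_max_tokens` are hypothetical, and the names in the mapping are the ones used in the provider config examples in `docs/completion/input.md`.

```python
# Provider-side name for OpenAI's `max_tokens`, as used in the config
# examples in docs/completion/input.md. Hypothetical lookup table.
MAX_TOKENS_NAME = {
    "openai": "max_tokens",
    "cohere": "max_tokens",
    "anthropic": "max_tokens_to_sample",
    "huggingface": "max_new_tokens",
}


def translate_max_tokens(provider: str, max_tokens: int) -> dict:
    """Build the provider-specific request fragment for a token limit."""
    return {MAX_TOKENS_NAME[provider]: max_tokens}


# The same OpenAI-style value, expressed per provider.
for provider in ("openai", "cohere", "anthropic", "huggingface"):
    print(provider, translate_max_tokens(provider, 100))
```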
 **2. via provider-specific config**
 For every provider on LiteLLM, we expose that provider's own params, following its naming conventions. You can set them for that provider via `litellm.<provider_name>Config`. 
 All provider configs are typed and have docstrings, so you should see them autocompleted in VSCode with an explanation of what each param means. 
 Here's an example of setting max tokens through provider configs. 
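
A minimal, self-contained sketch of the idea, assuming the class-level config pattern and the "per-call value wins" merge added in this commit. It does not import LiteLLM: `DemoConfig` and `merge_params` are hypothetical names standing in for a provider config such as `litellm.OpenAIConfig` and for the merge loop in `completion()`.

```python
import types
from typing import Optional


class DemoConfig:
    """Hypothetical stand-in for a provider config class."""

    max_tokens: Optional[int] = None
    temperature: Optional[float] = None

    def __init__(self, max_tokens: Optional[int] = None,
                 temperature: Optional[float] = None) -> None:
        # Values are stored on the class itself, so they act as
        # process-wide defaults for every later call.
        given = {"max_tokens": max_tokens, "temperature": temperature}
        for key, value in given.items():
            if value is not None:
                setattr(self.__class__, key, value)

    @classmethod
    def get_config(cls) -> dict:
        # Keep only plain, non-None class attributes.
        return {k: v for k, v in cls.__dict__.items()
                if not k.startswith("__")
                and not isinstance(v, (types.FunctionType, classmethod, staticmethod))
                and v is not None}


def merge_params(call_params: dict, config: dict) -> dict:
    """Fill in config defaults without overriding per-call params."""
    merged = dict(call_params)
    for key, value in config.items():
        if key not in merged:
            merged[key] = value
    return merged


DemoConfig(max_tokens=200)  # set a default once
print(merge_params({}, DemoConfig.get_config()))                  # config applies
print(merge_params({"max_tokens": 10}, DemoConfig.get_config()))  # per-call value wins
```

Because the values live on the class, a config set once stays in effect until it is set again; passing the param directly to `completion()` is the way to override it for a single call.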
--- a/litellm/__init__.py
+++ b/litellm/__init__.py
@ -325,6 +325,7 @@ from .llms.vertex_ai import VertexAIConfig
 from .llms.sagemaker import SagemakerConfig
 from .llms.ollama import OllamaConfig
 from .llms.bedrock import AmazonTitanConfig, AmazonAI21Config, AmazonAnthropicConfig, AmazonCohereConfig
 from .llms.openai import OpenAIConfig, OpenAITextCompletionConfig, AzureOpenAIConfig
 from .main import *  # type: ignore
 from .integrations import *
 from .exceptions import (
--- a/litellm/__pycache__/__init__.cpython-311.pyc
+++ b/litellm/__pycache__/__init__.cpython-311.pyc
--- a/litellm/__pycache__/main.cpython-311.pyc
+++ b/litellm/__pycache__/main.cpython-311.pyc
--- a/litellm/llms/openai.py
+++ b/litellm/llms/openai.py
@ -0,0 +1,184 @@
 from typing import Optional, Union
 import types
 # This file just has the openai config classes. 
 # For implementation check out completion() in main.py
 class OpenAIConfig():
    """
    Reference: https://platform.openai.com/docs/api-reference/chat/create
    The class `OpenAIConfig` provides configuration for OpenAI's Chat API interface. Below are the parameters:
    - `frequency_penalty` (number or null): Defaults to 0. Allows a value between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, thereby minimizing repetition.
    - `function_call` (string or object): This optional parameter controls how the model calls functions.
    - `functions` (array): An optional parameter. It is a list of functions for which the model may generate JSON inputs.
    - `logit_bias` (map): This optional parameter modifies the likelihood of specified tokens appearing in the completion.
    - `max_tokens` (integer or null): This optional parameter helps to set the maximum number of tokens to generate in the chat completion.
    - `n` (integer or null): This optional parameter helps to set how many chat completion choices to generate for each input message.
    - `presence_penalty` (number or null): Defaults to 0. It penalizes new tokens based on if they appear in the text so far, hence increasing the model's likelihood to talk about new topics.
    - `stop` (string / array / null): Specifies up to 4 sequences where the API will stop generating further tokens.
    - `temperature` (number or null): Defines the sampling temperature to use, varying between 0 and 2.
    - `top_p` (number or null): An alternative to sampling with temperature, used for nucleus sampling. 
    """
    frequency_penalty: Optional[float]=None
    function_call: Optional[Union[str, dict]]=None
    functions: Optional[list]=None
    logit_bias: Optional[dict]=None
    max_tokens: Optional[int]=None
    n: Optional[int]=None
    presence_penalty: Optional[float]=None
    stop: Optional[Union[str, list]]=None
    temperature: Optional[float]=None
    top_p: Optional[float]=None
    def __init__(self,
                 frequency_penalty: Optional[float]=None,
                 function_call: Optional[Union[str, dict]]=None,
                 functions: Optional[list]=None,
                 logit_bias: Optional[dict]=None,
                 max_tokens: Optional[int]=None,
                 n: Optional[int]=None,
                 presence_penalty: Optional[float]=None,
                 stop: Optional[Union[str, list]]=None,
                 temperature: Optional[float]=None,
                 top_p: Optional[float]=None,) -> None:
        locals_ = locals()
        for key, value in locals_.items():
            if key != 'self' and value is not None:
                setattr(self.__class__, key, value)
    @classmethod
    def get_config(cls):
        return {k: v for k, v in cls.__dict__.items() 
                if not k.startswith('__') 
                and not isinstance(v, (types.FunctionType, types.BuiltinFunctionType, classmethod, staticmethod)) 
                and v is not None}
 class OpenAITextCompletionConfig():
    """
    Reference: https://platform.openai.com/docs/api-reference/completions/create
    The class `OpenAITextCompletionConfig` provides configuration for OpenAI's text completion API interface. Below are the parameters:
    - `best_of` (integer or null): This optional parameter generates server-side completions and returns the one with the highest log probability per token.
    - `echo` (boolean or null): This optional parameter will echo back the prompt in addition to the completion.
    - `frequency_penalty` (number or null): Defaults to 0. It is a number from -2.0 to 2.0, where positive values decrease the model's likelihood to repeat the same line.
    - `logit_bias` (map): This optional parameter modifies the likelihood of specified tokens appearing in the completion.
    - `logprobs` (integer or null): This optional parameter includes the log probabilities on the most likely tokens as well as the chosen tokens.
    - `max_tokens` (integer or null): This optional parameter sets the maximum number of tokens to generate in the completion.
    - `n` (integer or null): This optional parameter sets how many completions to generate for each prompt.
    - `presence_penalty` (number or null): Defaults to 0 and can be between -2.0 and 2.0. Positive values increase the model's likelihood to talk about new topics.
    - `stop` (string / array / null): Specifies up to 4 sequences where the API will stop generating further tokens.
    - `suffix` (string or null): Defines the suffix that comes after a completion of inserted text.
    - `temperature` (number or null): This optional parameter defines the sampling temperature to use.
    - `top_p` (number or null): An alternative to sampling with temperature, used for nucleus sampling.
    """
    best_of: Optional[int]=None
    echo: Optional[bool]=None
    frequency_penalty: Optional[float]=None
    logit_bias: Optional[dict]=None
    logprobs: Optional[int]=None
    max_tokens: Optional[int]=None
    n: Optional[int]=None
    presence_penalty: Optional[float]=None
    stop: Optional[Union[str, list]]=None
    suffix: Optional[str]=None
    temperature: Optional[float]=None
    top_p: Optional[float]=None
    def __init__(self,
                 best_of: Optional[int]=None,
                 echo: Optional[bool]=None,
                 frequency_penalty: Optional[float]=None,
                 logit_bias: Optional[dict]=None,
                 logprobs: Optional[int]=None,
                 max_tokens: Optional[int]=None,
                 n: Optional[int]=None,
                 presence_penalty: Optional[float]=None,
                 stop: Optional[Union[str, list]]=None,
                 suffix: Optional[str]=None,
                 temperature: Optional[float]=None,
                 top_p: Optional[float]=None) -> None:
        locals_ = locals()
        for key, value in locals_.items():
            if key != 'self' and value is not None:
                setattr(self.__class__, key, value)
    @classmethod
    def get_config(cls):
        return {k: v for k, v in cls.__dict__.items() 
                if not k.startswith('__') 
                and not isinstance(v, (types.FunctionType, types.BuiltinFunctionType, classmethod, staticmethod)) 
                and v is not None}
 class AzureOpenAIConfig(OpenAIConfig):
    """
    Reference: https://platform.openai.com/docs/api-reference/chat/create
    The class `AzureOpenAIConfig` provides configuration for OpenAI's Chat API interface, for use with Azure. It inherits from `OpenAIConfig`. Below are the parameters:
    - `frequency_penalty` (number or null): Defaults to 0. Allows a value between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, thereby minimizing repetition.
    - `function_call` (string or object): This optional parameter controls how the model calls functions.
    - `functions` (array): An optional parameter. It is a list of functions for which the model may generate JSON inputs.
    - `logit_bias` (map): This optional parameter modifies the likelihood of specified tokens appearing in the completion.
    - `max_tokens` (integer or null): This optional parameter helps to set the maximum number of tokens to generate in the chat completion.
    - `n` (integer or null): This optional parameter helps to set how many chat completion choices to generate for each input message.
    - `presence_penalty` (number or null): Defaults to 0. It penalizes new tokens based on if they appear in the text so far, hence increasing the model's likelihood to talk about new topics.
    - `stop` (string / array / null): Specifies up to 4 sequences where the API will stop generating further tokens.
    - `temperature` (number or null): Defines the sampling temperature to use, varying between 0 and 2.
    - `top_p` (number or null): An alternative to sampling with temperature, used for nucleus sampling. 
    """
    def __init__(self,
                 frequency_penalty: Optional[float] = None,
                 function_call: Optional[Union[str, dict]] = None,
                 functions: Optional[list] = None,
                 logit_bias: Optional[dict] = None,
                 max_tokens: Optional[int] = None,
                 n: Optional[int] = None,
                 presence_penalty: Optional[float] = None,
                 stop: Optional[Union[str, list]] = None,
                 temperature: Optional[float] = None,
                 top_p: Optional[float] = None) -> None:
        super().__init__(frequency_penalty, 
                         function_call, 
                         functions, 
                         logit_bias, 
                         max_tokens, 
                         n, 
                         presence_penalty, 
                         stop, 
                         temperature, 
                         top_p)
--- a/litellm/main.py
+++ b/litellm/main.py
@ -66,7 +66,6 @@ from litellm.utils import (
 ####### ENVIRONMENT VARIABLES ###################
 dotenv.load_dotenv()  # Loading env variables using dotenv
 ####### COMPLETION ENDPOINTS ################
 async def acompletion(*args, **kwargs):
@ -310,6 +309,12 @@ def completion(
                get_secret("AZURE_API_KEY")
            )
            ## LOAD CONFIG - if set
            config=litellm.AzureOpenAIConfig.get_config()
            for k, v in config.items():
                if k not in optional_params: # completion(top_k=3) > azure_config(top_k=3) <- allows for dynamic variables to be passed in
                    optional_params[k] = v
            ## LOGGING
            logging.pre_call(
                input=messages,
@ -368,6 +373,13 @@ def completion(
                litellm.openai_key or
                get_secret("OPENAI_API_KEY")
            )
            ## LOAD CONFIG - if set
            config=litellm.OpenAIConfig.get_config()
            for k, v in config.items():
                if k not in optional_params: # completion(top_k=3) > openai_config(top_k=3) <- allows for dynamic variables to be passed in
                    optional_params[k] = v
            ## LOGGING
            logging.pre_call(
                input=messages,
@ -436,6 +448,12 @@ def completion(
                get_secret("OPENAI_API_KEY")
            )
            ## LOAD CONFIG - if set
            config=litellm.OpenAITextCompletionConfig.get_config()
            for k, v in config.items():
                if k not in optional_params: # completion(top_k=3) > openai_text_config(top_k=3) <- allows for dynamic variables to be passed in
                    optional_params[k] = v
            if litellm.organization:
                openai.organization = litellm.organization
--- a/litellm/tests/test_provider_specific_config.py
+++ b/litellm/tests/test_provider_specific_config.py
@ -50,7 +50,7 @@ def claude_test_completion():
    try:
        # OVERRIDE WITH DYNAMIC MAX TOKENS
        response_1 = litellm.completion(
-            model="claude-instant-1",
+            model="together_ai/togethercomputer/llama-2-70b-chat",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
            max_tokens=10
        )
@ -60,7 +60,7 @@ def claude_test_completion():
        # USE CONFIG TOKENS
        response_2 = litellm.completion(
-            model="claude-instant-1",
+            model="together_ai/togethercomputer/llama-2-70b-chat",
            messages=[{ "content": "Hello, how are you?","role": "user"}],
        )
        # Add any assertions here to check the response
@ -393,3 +393,87 @@ def bedrock_test_completion():
        pytest.fail(f"Error occurred: {e}")
 # bedrock_test_completion()
 # OpenAI Chat Completion
 def openai_test_completion():
    litellm.OpenAIConfig(max_tokens=10)
    # litellm.set_verbose=True
    try:
        # OVERRIDE WITH DYNAMIC MAX TOKENS
        response_1 = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{ "content": "Hello, how are you? Be as verbose as possible","role": "user"}],
            max_tokens=100
        )
        response_1_text = response_1.choices[0].message.content
        print(f"response_1_text: {response_1_text}")
        # USE CONFIG TOKENS
        response_2 = litellm.completion(
            model="gpt-3.5-turbo",
            messages=[{ "content": "Hello, how are you? Be as verbose as possible","role": "user"}],
        )
        response_2_text = response_2.choices[0].message.content
        print(f"response_2_text: {response_2_text}")
        assert len(response_2_text) < len(response_1_text)
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")
 # openai_test_completion()
 # OpenAI Text Completion
 def openai_text_completion_test():
    litellm.OpenAITextCompletionConfig(max_tokens=10)
    # litellm.set_verbose=True
    try:
        # OVERRIDE WITH DYNAMIC MAX TOKENS
        response_1 = litellm.completion(
            model="text-davinci-003",
            messages=[{ "content": "Hello, how are you? Be as verbose as possible","role": "user"}],
            max_tokens=100
        )
        response_1_text = response_1.choices[0].message.content
        print(f"response_1_text: {response_1_text}")
        # USE CONFIG TOKENS
        response_2 = litellm.completion(
            model="text-davinci-003",
            messages=[{ "content": "Hello, how are you? Be as verbose as possible","role": "user"}],
        )
        response_2_text = response_2.choices[0].message.content
        print(f"response_2_text: {response_2_text}")
        assert len(response_2_text) < len(response_1_text)
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")
 # openai_text_completion_test()
 # Azure OpenAI 
 def azure_openai_test_completion():
    litellm.AzureOpenAIConfig(max_tokens=10)
    # litellm.set_verbose=True
    try:
        # OVERRIDE WITH DYNAMIC MAX TOKENS
        response_1 = litellm.completion(
            model="azure/chatgpt-v-2",
            messages=[{ "content": "Hello, how are you? Be as verbose as possible","role": "user"}],
            max_tokens=100
        )
        response_1_text = response_1.choices[0].message.content
        print(f"response_1_text: {response_1_text}")
        # USE CONFIG TOKENS
        response_2 = litellm.completion(
            model="azure/chatgpt-v-2",
            messages=[{ "content": "Hello, how are you? Be as verbose as possible","role": "user"}],
        )
        response_2_text = response_2.choices[0].message.content
        print(f"response_2_text: {response_2_text}")
        assert len(response_2_text) < len(response_1_text)
    except Exception as e:
        pytest.fail(f"Error occurred: {e}")
 # azure_openai_test_completion()