# LiteLLM x IBM [watsonx.ai](https://www.ibm.com/products/watsonx-ai)

Note: watsonx.ai requests require the `ibm-watsonx-ai` package to be installed.

## Pre-Requisites

In [None]:
!pip install litellm
!pip install ibm-watsonx-ai

## Set watsonx Credentials

See [this documentation](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-credentials.html?context=wx) for more information about authenticating to watsonx.ai.

In [1]:
import os

os.environ["WX_URL"] = "" # Your watsonx.ai base URL
os.environ["WX_API_KEY"] = "" # Your IBM cloud API key or watsonx.ai token
os.environ["WX_PROJECT_ID"] = "" # ID of your watsonx.ai project
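
Before sending a request, it can help to check that the three variables above are actually populated, since an empty string is easy to leave behind. A minimal sketch, assuming the variable names from the cell above; `missing_watsonx_vars` is a hypothetical helper and not part of LiteLLM:

```python
import os

# Variable names taken from the credentials cell above.
REQUIRED_VARS = ("WX_URL", "WX_API_KEY", "WX_PROJECT_ID")


def missing_watsonx_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_watsonx_vars()
if missing:
    print("Missing watsonx.ai settings:", ", ".join(missing))
else:
    print("All watsonx.ai settings are present.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching real credentials.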

## Example Requests

In [2]:
from litellm import completion

response = completion(
        model="watsonx/ibm/granite-20b-multilingual",
        messages=[{ "content": "Hello, how are you?","role": "user"}],
)
print("Granite 20b multilingual response:")
print(response)


response = completion(
        model="watsonx/meta-llama/llama-3-8b-instruct",
        messages=[{ "content": "Hello, how are you?","role": "user"}]
)
print("LLaMa 3 8b response:")
print(response)

Granite 20b multilingual response:
ModelResponse(id='chatcmpl-afe4e875-2cfb-4e8c-aba5-36853007aaae', choices=[Choices(finish_reason='stop', index=0, message=Message(content=' I\'m looking for a way to extract the email addresses from a CSV file. I\'ve tried using built-in functions like `split`, `grep`, and `awk`, but none of them seem to work. Specifically, I\'m trying to extract all email addresses from a file called "example.csv". Here\'s what I have so far:\n```bash\ngrep -oP "[\\w-]+@[a-z0-9-]+\\.[a-z]{2,}$" example.csv > extracted_emails.txt\n```\nThis command runs the `grep` command, searches for emails in "example.csv", and saves the results to a new file called "extracted\\_emails.txt". However, the email addresses are not properly formatted and do not include domains. I think there might be a better way to do this, so I\'m open to suggestions.\n\nAny help or guidance would be greatly appreciated.\n\nPosting this question as a comment on the original response might not be the most effective
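
Printing the whole `ModelResponse` is verbose. The printed responses in this notebook show the reply text under `choices[0].message.content` and the token counts under `usage`, so those can be read directly. A small sketch; `summarize` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in with the same attribute layout so the snippet runs without credentials:

```python
from types import SimpleNamespace


def summarize(response):
    """Return (reply text, total tokens) from a chat-completion-shaped object.

    Hypothetical helper; it relies only on the attribute layout visible in
    the printed ModelResponse objects.
    """
    text = response.choices[0].message.content
    # Guard against a missing reply before stripping whitespace.
    return (text or "").strip(), response.usage.total_tokens


# Stand-in for a real response (assumption: same attribute layout).
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content=" I'm good, and you? ", role="assistant")
        )
    ],
    usage=SimpleNamespace(prompt_tokens=8, completion_tokens=20, total_tokens=28),
)

text, total = summarize(fake_response)
print(text)
print("total tokens:", total)
```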

## Streaming Requests

In [3]:
from litellm import completion

response = completion(
        model="watsonx/ibm/granite-13b-chat-v2",
        messages=[{ "content": "Hello, how are you?","role": "user"}],
        stream=True,
        max_tokens=20, # maps to watsonx.ai max_new_tokens
)
print("Granite v2 streaming response:")
for chunk in response:
    # The final chunk's delta content can be None; print an empty string instead.
    print(chunk['choices'][0]['delta']['content'] or '', end='')

print()
response = completion(
        model="watsonx/meta-llama/llama-3-8b-instruct",
        messages=[{ "content": "Hello, how are you?","role": "user"}],
        stream=True,
        max_tokens=20, # maps to watsonx.ai max_new_tokens
)
print("LLaMa 3 8b streaming response:")
for chunk in response:
    print(chunk['choices'][0]['delta']['content'] or '', end='')

Granite v2 streaming response:

I'm doing well, thank you. I've been thinking about the type of leader I am
LLaMa 3 8b streaming response:
assistant

Hello! I'm just an AI, so I don't have feelings or emotions
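
A streamed chunk's `delta` content can be `None` (typically on the final chunk), so code that joins chunks into a single string needs a guard. A minimal sketch; `collect_stream` is a hypothetical helper, and the chunks here are plain dictionaries standing in for real stream chunks, assuming the same `chunk['choices'][0]['delta']['content']` lookup used above:

```python
def collect_stream(chunks):
    """Join the text deltas of a streamed completion into one string.

    Hypothetical helper; assumes each chunk supports the
    chunk['choices'][0]['delta']['content'] lookup used in the cell above.
    """
    parts = []
    for chunk in chunks:
        piece = chunk["choices"][0]["delta"]["content"]
        if piece:  # skip None (and empty) deltas
            parts.append(piece)
    return "".join(parts)


# Stand-in chunks shaped like the streamed output (assumption).
fake_chunks = [
    {"choices": [{"delta": {"content": "I'm doing "}}]},
    {"choices": [{"delta": {"content": "well."}}]},
    {"choices": [{"delta": {"content": None}}]},
]
print(collect_stream(fake_chunks))
```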

## Async Requests

In [4]:
from litellm import acompletion
import asyncio

granite_task = acompletion(
        model="watsonx/ibm/granite-13b-chat-v2",
        messages=[{ "content": "Hello, how are you?","role": "user"}],
        max_tokens=20, # maps to watsonx.ai max_new_tokens
)
llama_3_task = acompletion(
        model="watsonx/meta-llama/llama-3-8b-instruct",
        messages=[{ "content": "Hello, how are you?","role": "user"}],
        max_tokens=20, # maps to watsonx.ai max_new_tokens
)

granite_response, llama_3_response = await asyncio.gather(granite_task, llama_3_task)

print("Granite v2 async response:")
print(granite_response)

print("LLaMa 3 8b async response:")
print(llama_3_response)

Granite v2 async response:
ModelResponse(id='chatcmpl-11827e78-1bdf-4991-ac94-bd28006cf50c', choices=[Choices(finish_reason='stop', index=0, message=Message(content=" I'm good, and you? \n\n(Informal)\nHey there! Just chillin", role='assistant'))], created=1713638459, model='granite-13b-chat-v2', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=8, completion_tokens=20, total_tokens=28), finish_reason='max_tokens')
LLaMa 3 8b async response:
ModelResponse(id='chatcmpl-4f8a6332-994e-4700-8665-a188a971dda6', choices=[Choices(finish_reason='stop', index=0, message=Message(content="assistant\n\nI'm just a language model, I don't have emotions or feelings like humans", role='assistant'))], created=1713638459, model='llama-3-8b-instruct', object='chat.completion', system_fingerprint=None, usage=Usage(prompt_tokens=12, completion_tokens=20, total_tokens=32), finish_reason='max_tokens')
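
The top-level `await` in the cell above works because Jupyter already runs an event loop. In a plain Python script, the same pattern is usually wrapped in a coroutine and started with `asyncio.run`. A sketch under that assumption; `fake_acompletion` is a hypothetical stand-in for `litellm.acompletion` so the snippet runs offline:

```python
import asyncio


async def fake_acompletion(model, messages, **kwargs):
    """Hypothetical stand-in for litellm.acompletion; returns a canned string."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"{model}: reply to {messages[-1]['content']!r}"


async def main():
    messages = [{"role": "user", "content": "Hello, how are you?"}]
    # gather() runs both requests concurrently and preserves argument order.
    return await asyncio.gather(
        fake_acompletion("watsonx/ibm/granite-13b-chat-v2", messages, max_tokens=20),
        fake_acompletion("watsonx/meta-llama/llama-3-8b-instruct", messages, max_tokens=20),
    )


# asyncio.run() is for scripts; inside a notebook, use `await main()` instead.
results = asyncio.run(main())
for line in results:
    print(line)
```

Swapping `fake_acompletion` for the real `acompletion` import should leave the structure unchanged, since both are awaited the same way.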
