Ayush Thakur 2025-04-24 01:01:03 -07:00 committed by GitHub
commit b331afba81
10 changed files with 173 additions and 62 deletions


@@ -1,61 +0,0 @@
import Image from '@theme/IdealImage';
# Weights & Biases - Logging LLM Input/Output
:::tip
This integration is community maintained. Please open an issue at
https://github.com/BerriAI/litellm if you run into a bug.
:::
Weights & Biases helps AI developers build better models faster: https://wandb.ai
<Image img={require('../../img/wandb.png')} />
:::info
We want to learn how we can make the callbacks better! Meet the LiteLLM [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or
join our [Discord](https://discord.gg/wuPM9dRgDw)
:::
## Pre-Requisites
Install `wandb` and `litellm` for this integration:
```shell
pip install wandb litellm
```
## Quick Start
Use just 2 lines of code to instantly log your responses **across all providers** with Weights & Biases:
```python
litellm.success_callback = ["wandb"]
```
```python
# pip install wandb
import litellm
import os
os.environ["WANDB_API_KEY"] = ""
# LLM API Keys
os.environ["OPENAI_API_KEY"] = ""
# set wandb as a callback, litellm will send the data to Weights & Biases
litellm.success_callback = ["wandb"]
# openai call
response = litellm.completion(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Hi 👋 - i'm openai"}
    ],
)
```
## Support & Talk to Founders
- [Schedule Demo 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai


@@ -0,0 +1,171 @@
import Image from '@theme/IdealImage';
# Weights & Biases Weave - Tracing and Evaluation
## What is W&B Weave?
Weights and Biases (W&B) Weave is a framework for tracking, experimenting with, evaluating, deploying, and improving LLM-based applications. Designed for flexibility and scalability, Weave supports every stage of your LLM application development workflow.
W&B Weave's integration with LiteLLM enables you to trace, version, and debug your LLM applications, and to easily evaluate your AI systems with the flexibility of LiteLLM.
Get started with just 2 lines of code and track your LiteLLM calls with W&B Weave. Learn more about W&B Weave [here](https://weave-docs.wandb.ai).
<Image img={require('../../img/weave_litellm.png')} />
## Quick Start
Install W&B Weave
```shell
pip install weave
```
Use just 2 lines of code to instantly log your responses **across all providers** with Weave:
```python
import weave
weave_client = weave.init("my-llm-application")
```
You will be asked to set your W&B API key for authentication. Get your free API key [here](https://wandb.ai/authorize).
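If you prefer to authenticate non-interactively (for example in CI), a minimal sketch is to set the `WANDB_API_KEY` environment variable before calling `weave.init`:
```python
import os
import weave

# Setting the key up front avoids the interactive prompt.
# Paste your key from https://wandb.ai/authorize (left empty here on purpose).
os.environ["WANDB_API_KEY"] = ""

weave_client = weave.init("my-llm-application")
```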
Once done, you can use LiteLLM as usual.
```python
import litellm
import os
# Set your LLM provider's API key
os.environ["OPENAI_API_KEY"] = ""
# Call LiteLLM with the model you want to use
messages = [
    {"role": "user", "content": "What is the meaning of life?"}
]
response = litellm.completion(model="gpt-4o", messages=messages)
print(response)
```
You will see a Weave URL in stdout. Open it to see the trace, cost, token usage, and more!
<Image img={require('../../img/weave_trace.png')} />
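In addition to the Weave UI, the usage numbers are also available locally on the response object. A small sketch, assuming the `response` from the call above:
```python
# Token counts are attached to the response by LiteLLM.
print(response.usage)

# completion_cost() estimates the dollar cost of a completion response.
print(litellm.completion_cost(completion_response=response))
```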
## Building a simple LLM application
Now let's use LiteLLM and W&B Weave to build a simple LLM application that translates text from a source language to a target language.
The function `translate` takes a text and a target language, and returns the translated text using the model of your choice. Note that `translate` is decorated with [`weave.op()`](https://weave-docs.wandb.ai/guides/tracking/ops). This is how W&B Weave knows that the function is part of your application; when it is called, it is traced along with its inputs and outputs.
Since the underlying LiteLLM calls are automatically traced, you can see a nested trace of the LiteLLM call(s) with details like the model, cost, and token usage.
```python
@weave.op()
def translate(text: str, target_language: str, model: str) -> str:
    response = litellm.completion(
        model=model,
        messages=[
            {"role": "user", "content": f"Translate '{text}' to {target_language}"}
        ],
    )
    return response.choices[0].message.content

print(translate("Hello, how are you?", "French", "gpt-4o"))
```
<Image img={require('../../img/weave_trace_application.png')} />
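Because LiteLLM normalizes the API across providers, the same traced function can target another provider by changing only the model string. A sketch, assuming an Anthropic key is set in your environment (the model name here is just an illustration):
```python
# Same traced function, different provider -- only the model string changes.
# Assumes os.environ["ANTHROPIC_API_KEY"] is set.
print(translate("Hello, how are you?", "German", "claude-3-5-sonnet-20240620"))
```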
## Building an evaluation pipeline
LiteLLM is powerful for building evaluation pipelines because of the flexibility it provides: you can swap models and providers behind a single API. Together with W&B Weave, building such pipelines is straightforward.
Below we build an evaluation pipeline to evaluate an LLM's ability to solve maths problems. First we need an evaluation dataset.
```python
samples = [
    {"question": "What is the sum of 45 and 67?", "answer": "112"},
    {"question": "If a triangle has sides 3 cm, 4 cm, and 5 cm, what is its area?", "answer": "6 square cm"},
    {"question": "What is the derivative of x^2 + 3x with respect to x?", "answer": "2x + 3"},
    {"question": "What is the result of 12 multiplied by 8?", "answer": "96"},
    {"question": "What is the value of 10! (10 factorial)?", "answer": "3628800"}
]
```
Next, we write a simple function that takes a sample question and returns the solution to the problem. We write it as a method (`predict`) of a `SimpleMathsSolver` class that inherits from [`weave.Model`](https://weave-docs.wandb.ai/guides/core-types/models). This lets us easily track the attributes (hyperparameters) of our model.
```python
class SimpleMathsSolver(weave.Model):
    model_name: str
    temperature: float

    @weave.op()
    def predict(self, question: str) -> str:
        response = litellm.completion(
            model=self.model_name,
            temperature=self.temperature,  # pass the tracked hyperparameter through to the call
            messages=[
                {
                    "role": "system",
                    "content": "You are given maths problems. Think step by step to solve it. Only return the exact answer without any explanation in \\boxed{}"
                },
                {
                    "role": "user",
                    "content": question
                }
            ],
        )
        return response.choices[0].message.content

maths_solver = SimpleMathsSolver(
    model_name="gpt-4o",
    temperature=0.0,
)

print(maths_solver.predict("What is 2+3?"))
```
<Image img={require('../../img/weave_maths_solver.png')} />
Now that we have the dataset and the model, let's define a simple exact-match evaluation metric and set up our evaluation pipeline using [`weave.Evaluation`](https://weave-docs.wandb.ai/guides/core-types/evaluations).
```python
import re
import asyncio

@weave.op()
def exact_match(answer: str, output: str):
    # Extract the model's answer from the \boxed{} wrapper and compare it
    # with the reference answer.
    pattern = r"\\boxed\{(.+?)\}"
    match = re.search(pattern, output)
    if match:
        extracted_value = match.group(1)
        return extracted_value == answer
    else:
        return None

evaluation_pipeline = weave.Evaluation(
    dataset=samples, scorers=[exact_match]
)

asyncio.run(evaluation_pipeline.evaluate(maths_solver))
```
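Note that `Evaluation.evaluate` is a coroutine. `asyncio.run` works in a plain script, but in a notebook where an event loop is already running you can await it directly — a sketch:
```python
# In a Jupyter notebook the event loop is already running, so await directly:
results = await evaluation_pipeline.evaluate(maths_solver)
print(results)
```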
The evaluation page will look like the one below. Here you can see the overall score as well as the score for each sample. This is a powerful way to debug the limitations of your LLM application while keeping everything that matters organized and tracked.
<Image img={require('../../img/weave_evaluation.png')} />
Now say you want to compare the performance of your current model against a different model using the comparison feature in the UI. LiteLLM's flexibility makes swapping models trivial, and the W&B Weave evaluation pipeline helps you do the comparison in a structured way.
```python
new_maths_solver = SimpleMathsSolver(
    model_name="gpt-3.5-turbo",
    temperature=0.0,
)
asyncio.run(evaluation_pipeline.evaluate(new_maths_solver))
```
<Image img={require('../../img/weave_comparison_view.png')} />
## Support
* For advanced usage of Weave, visit the [Weave documentation](https://weave-docs.wandb.ai).
* For any questions or issues with this integration, please [submit an issue](https://github.com/wandb/weave/issues/new?template=Blank+issue) on our [GitHub](https://github.com/wandb/weave) repository!

6 binary image files added (sizes: 507 KiB, 552 KiB, 516 KiB, 154 KiB, 245 KiB, 420 KiB); contents not shown.


@@ -434,7 +434,7 @@ const sidebars = {
"observability/helicone_integration",
"observability/openmeter",
"observability/promptlayer_integration",
"observability/wandb_integration",
"observability/weave_integration",
"observability/slack_integration",
"observability/athina_integration",
"observability/greenscale_integration",


@@ -197,6 +197,7 @@ class WeightsBiasesLogger:
         try:
             print_verbose(f"W&B Logging - Enters logging function for model {kwargs}")
+            print_verbose("`WeightsBiasesLogger` is deprecated. Please use the new W&B `weave` integration instead.")
             run = wandb.init()
             print_verbose(response_obj)