Commit graph

47 commits

Author SHA1 Message Date
raghotham
5a422e236c
chore: make cprint write to stderr (#2250)
Also do sys.exit(1) in case of errors
2025-05-24 23:39:57 -07:00
Ashwin Bharambe
51945f1e57
feat: accept MCP authorization headers for MCP toolgroups (#2230)
The most interesting MCP servers are those with an authorization wall in
front of them. This PR uses the existing `provider_data` mechanism of
passing provider API keys for passing MCP access tokens (in fact,
arbitrary headers in the style of the OpenAI Responses API) from the
client through to the MCP server.

```
class MCPProviderDataValidator(BaseModel):
    # mcp_endpoint => list of headers to send
    mcp_headers: dict[str, list[str]] | None = None
```

Note how we must stuff the headers for all MCP endpoints into a single
"MCPProviderDataValidator". Unlike existing providers (e.g., Together
and Fireworks for inference) where we could name the provider api keys
clearly (`together_api_key`, `fireworks_api_key`), we cannot name these
keys for MCP. We have a single generic MCP provider which can serve
multiple "toolgroups". So we use a dict to combine all the headers for
all MCP endpoints you may want to use in an agentic call.


## Test Plan

See the added integration test for usage.
2025-05-23 08:52:18 -07:00
Michael Anstis
0cc0731189
fix: Pass external_config_dir to BuildConfig (#2190)
# What does this PR do?

The `external_config_dir` configuration parameter is not being passed to
the `BuildConfig` for `LlamaStackAsLibraryClient`.

This prevents _plugin_ providers from being loaded when `llama-stack` is
uses as a library.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
I ran `LlamaStackAsLibraryClient` with a configuration file that
contained `external_config_dir` and related configuration.

It does not work without this change: _external_ providers are not
resolved.

It does work with this change 👍 

[//]: # (## Documentation)
2025-05-19 14:01:28 +02:00
Sébastien Han
1a529705da
chore: more mypy fixes (#2029)
# What does this PR do?

Mainly tried to cover the entire llama_stack/apis directory, we only
have one left. Some excludes were just noop.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-06 09:52:31 -07:00
Ihar Hrachyshka
9e6561a1ec
chore: enable pyupgrade fixes (#1806)
# What does this PR do?

The goal of this PR is code base modernization.

Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)

Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-01 14:23:50 -07:00
Sébastien Han
69554158fa
feat: add health to all providers through providers endpoint (#1418)
The `/v1/providers` now reports the health status of each
provider when implemented.

```
curl -L http://127.0.0.1:8321/v1/providers|jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4072  100  4072    0     0   246k      0 --:--:-- --:--:-- --:--:--  248k
{
  "data": [
    {
      "api": "inference",
      "provider_id": "ollama",
      "provider_type": "remote::ollama",
      "config": {
        "url": "http://localhost:11434"
      },
      "health": {
        "status": "OK"
      }
    },
    {
      "api": "vector_io",
      "provider_id": "faiss",
      "provider_type": "inline::faiss",
      "config": {
        "kvstore": {
          "type": "sqlite",
          "namespace": null,
          "db_path": "/Users/leseb/.llama/distributions/ollama/faiss_store.db"
        }
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "safety",
      "provider_id": "llama-guard",
      "provider_type": "inline::llama-guard",
      "config": {
        "excluded_categories": []
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "agents",
      "provider_id": "meta-reference",
      "provider_type": "inline::meta-reference",
      "config": {
        "persistence_store": {
          "type": "sqlite",
          "namespace": null,
          "db_path": "/Users/leseb/.llama/distributions/ollama/agents_store.db"
        }
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "telemetry",
      "provider_id": "meta-reference",
      "provider_type": "inline::meta-reference",
      "config": {
        "service_name": "llama-stack",
        "sinks": "console,sqlite",
        "sqlite_db_path": "/Users/leseb/.llama/distributions/ollama/trace_store.db"
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "eval",
      "provider_id": "meta-reference",
      "provider_type": "inline::meta-reference",
      "config": {
        "kvstore": {
          "type": "sqlite",
          "namespace": null,
          "db_path": "/Users/leseb/.llama/distributions/ollama/meta_reference_eval.db"
        }
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "datasetio",
      "provider_id": "huggingface",
      "provider_type": "remote::huggingface",
      "config": {
        "kvstore": {
          "type": "sqlite",
          "namespace": null,
          "db_path": "/Users/leseb/.llama/distributions/ollama/huggingface_datasetio.db"
        }
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "datasetio",
      "provider_id": "localfs",
      "provider_type": "inline::localfs",
      "config": {
        "kvstore": {
          "type": "sqlite",
          "namespace": null,
          "db_path": "/Users/leseb/.llama/distributions/ollama/localfs_datasetio.db"
        }
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "scoring",
      "provider_id": "basic",
      "provider_type": "inline::basic",
      "config": {},
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "scoring",
      "provider_id": "llm-as-judge",
      "provider_type": "inline::llm-as-judge",
      "config": {},
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "scoring",
      "provider_id": "braintrust",
      "provider_type": "inline::braintrust",
      "config": {
        "openai_api_key": "********"
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "tool_runtime",
      "provider_id": "brave-search",
      "provider_type": "remote::brave-search",
      "config": {
        "api_key": "********",
        "max_results": 3
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "tool_runtime",
      "provider_id": "tavily-search",
      "provider_type": "remote::tavily-search",
      "config": {
        "api_key": "********",
        "max_results": 3
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "tool_runtime",
      "provider_id": "code-interpreter",
      "provider_type": "inline::code-interpreter",
      "config": {},
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "tool_runtime",
      "provider_id": "rag-runtime",
      "provider_type": "inline::rag-runtime",
      "config": {},
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "tool_runtime",
      "provider_id": "model-context-protocol",
      "provider_type": "remote::model-context-protocol",
      "config": {},
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    },
    {
      "api": "tool_runtime",
      "provider_id": "wolfram-alpha",
      "provider_type": "remote::wolfram-alpha",
      "config": {
        "api_key": "********"
      },
      "health": {
        "status": "Not Implemented",
        "message": "Provider does not implement health check"
      }
    }
  ]
}
```

Per providers too:

```
curl -L http://127.0.0.1:8321/v1/providers/ollama
{"api":"inference","provider_id":"ollama","provider_type":"remote::ollama","config":{"url":"http://localhost:11434"},"health":{"status":"OK"}}
```

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-04-14 11:59:36 +02:00
ehhuang
3a2314dcef
fix(telemetry): library client does not log span (#1833) 2025-03-29 14:55:31 -07:00
ehhuang
06788643b3
feat(telemetry): clean up spans (#1760) 2025-03-21 20:05:11 -07:00
ehhuang
34f89bfbd6
feat(telemetry): use zero-width space to avoid clutter (#1754)
# What does this PR do?
Before 
<img width="858" alt="image"
src="https://github.com/user-attachments/assets/6cefb1ae-5603-4818-85ea-a0c337b986bc"
/>

Note the redundant 'llama-stack' in front of every span

## Test Plan
<img width="1171" alt="image"
src="https://github.com/user-attachments/assets/bdc5fd5b-ff1f-4f10-8b40-cff2ea93dd1f"
/>
2025-03-21 12:02:10 -07:00
Dinesh Yeduguru
58d08d100e
feat: Add back inference metrics and preserve context variables across asyncio boundary (#1552)
# What does this PR do?
This PR adds back the changes in #1300  which were reverted in  #1476 .

It also adds logic to preserve context variables across asyncio
boundary. this is needed with the library client since the async
generator logic yields control to code outside the event loop, and on
resuming, does not have the same context as before and this requires
preserving the context vars.

address #1477 
## Test Plan


```
 curl --request POST \
  --url http://localhost:8321/v1/inference/chat-completion \
  --header 'content-type: application/json' \
  --data '{
  "model_id": "meta-llama/Llama-3.1-70B-Instruct",
  "messages": [
    {
      "role": "user",
      "content": {
        "type": "text",
        "text": "where do humans live"
      }
    }
  ],
  "stream": false
}' | jq .

{
  "metrics": [
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549084Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "prompt_tokens",
      "value": 10,
      "unit": "tokens"
    },
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549449Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "completion_tokens",
      "value": 369,
      "unit": "tokens"
    },
    {
      "trace_id": "kCZwO3tyQC-FuAGb",
      "span_id": "bsP_5a5O",
      "timestamp": "2025-03-11T16:47:38.549457Z",
      "attributes": {
        "model_id": "meta-llama/Llama-3.1-70B-Instruct",
        "provider_id": "fireworks"
      },
      "type": "metric",
      "metric": "total_tokens",
      "value": 379,
      "unit": "tokens"
    }
  ],
  "completion_message": {
    "role": "assistant",
    "content": "Humans live on the planet Earth, specifically on its landmasses and in its oceans. Here's a breakdown of where humans live:\n\n1. **Continents:** Humans inhabit all seven continents:\n\t* Africa\n\t* Antarctica ( temporary residents, mostly scientists and researchers)\n\t* Asia\n\t* Australia\n\t* Europe\n\t* North America\n\t* South America\n2. **Countries:** There are 196 countries recognized by the United Nations, and humans live in almost all of them.\n3. **Cities and towns:** Many humans live in urban areas, such as cities and towns, which are often located near coastlines, rivers, or other bodies of water.\n4. **Rural areas:** Some humans live in rural areas, such as villages, farms, and countryside.\n5. **Islands:** Humans inhabit many islands around the world, including those in the Pacific, Indian, and Atlantic Oceans.\n6. **Mountains and highlands:** Humans live in mountainous regions, such as the Himalayas, the Andes, and the Rocky Mountains.\n7. **Deserts:** Some humans live in desert regions, such as the Sahara, the Mojave, and the Atacama.\n8. **Coastal areas:** Many humans live in coastal areas, such as beaches, ports, and coastal cities.\n9. **Underwater habitats:** A few humans live in underwater habitats, such as research stations and submarines.\n10. **Space:** A small number of humans have lived in space, including astronauts on the International Space Station and those who have visited the Moon.\n\nOverall, humans can be found living in almost every environment on Earth, from the frozen tundra to the hottest deserts, and from the highest mountains to the deepest oceans.",
    "stop_reason": "end_of_turn",
    "tool_calls": []
  },
  "logprobs": null
}

```

Orignal repro no longer showing any error:
```
LLAMA_STACK_DISABLE_VERSION_CHECK=true llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
python -m examples.agents.e2e_loop_with_client_tools localhost 8321
```

client logs:
https://gist.github.com/dineshyv/047c7e87b18a5792aa660e311ea53166
server logs:
https://gist.github.com/dineshyv/97a2174099619e9916c7c490be26e559
2025-03-12 12:01:03 -07:00
Ashwin Bharambe
70ff226b6a fix(library_client): ensure pending asyncio tasks like generator athrow are executed 2025-03-09 16:17:27 -07:00
Ashwin Bharambe
205661bc78
fix: Use re-entrancy and concurrency safe context managers for provider data (#1498)
Concurrent requests should not trample (or reuse) each others' provider
data. Provider data should be scoped to each request.

## Test Plan

Set the uvicorn server to have a single worker process + thread by
updating the config:
```python
    uvicorn_config = {
        ...
        "workers": 1,
        "loop": "asyncio",
    }
```

Then perform the following steps on `origin/main` (without this change).

(1) Run the server using `llama stack run dev` without having
`FIREWORKS_API_KEY` in the environment.

(2) Run a test by specifying the FIREWORKS_API_KEY env var so it gets
stored in the thread local
```
pytest -s -v tests/integration/inference/test_text_inference.py \
    --stack-config http://localhost:8321 \
    --text-model accounts/fireworks/models/llama-v3p1-8b-instruct \
    -k test_text_chat_completion_with_tool_calling_and_streaming \
     --env FIREWORKS_API_KEY=<...>
``` 
Ensure you don't have any other API keys in the environment (otherwise
the bug will not reproduce due to other specifics in our testing code.)
Verify this works.

(3) Run the same command again without specifying FIREWORKS_API_KEY. See
that the request actually succeeds when it *should have failed*.


----
Now do the same tests on this branch, verify step (3) results in
failure.

Finally, run the full `test_text_inference.py` test suite with this
change, verify it succeeds.
2025-03-08 22:56:30 -08:00
ehhuang
7cff9f504f
fix: raise error when request param failed to convert (#1339)
# Summary:
This led to extremely hard to debug messages.
Before:

llama_stack/distribution/library_client.py:275: in request
    response = await self._call_non_streaming(
llama_stack/distribution/library_client.py:322: in _call_non_streaming
    result = await matched_func(**body)
llama_stack/providers/utils/telemetry/trace_protocol.py:102: in
async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/providers/inline/agents/meta_reference/agents.py:80: in
create_agent
    value=agent_config.model_dump_json(),
E   AttributeError: 'dict' object has no attribute 'model_dump_json'

After:

E ValueError: Failed to convert parameter {'model':
'meta-llama/Llama-3.1-8B-Instruct', 'instructions': 'You are a helpful
assistant', 'sampling_params': {'strategy': {'type': 'top_p',
'temperature': 0.0001, 'top_p': 0.9}}, 'toolgroups': [{'name':
'builtin::rag'}], 'input_shields': ['meta-llama/Llama-Guard-3-8B'],
'output_shields': ['meta-llama/Llama-Guard-3-8B'],
'enable_session_persistence': False} into <class
'llama_stack.apis.agents.agents.AgentConfig'>: 2 validation errors for
AgentConfig
E   toolgroups.0.str
E Input should be a valid string [type=string_type, input_value={'name':
'builtin::rag'}, input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/string_type
E   toolgroups.0.AgentToolGroupWithArgs.args
E Field required [type=missing, input_value={'name': 'builtin::rag'},
input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/missing

# Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
2025-03-01 10:39:05 -08:00
Ashwin Bharambe
23b65b6cee
fix(test): update client-sdk tests to handle tool format parametrization better (#1287)
# What does this PR do?

Tool format depends on the model. @ehhuang introduced a
`get_default_tool_prompt_format` function for this purpose. We should
use that instead of hacky model ID matching we had before.

Secondly, non llama models don't have this concept so testing with those
models should work as is.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```bash
for distro in fireworks ollama; do
  LLAMA_STACK_CONFIG=$distro \
    pytest -s -v tests/client-sdk/inference/test_text_inference.py \
       --inference-model=meta-llama/Llama-3.2-3B-Instruct \
       --vision-inference-model=""
done

LLAMA_STACK_CONFIG=dev \
   pytest -s -v tests/client-sdk/inference/test_text_inference.py \
       --inference-model=openai/gpt-4o \
       --vision-inference-model=""

```

[//]: # (## Documentation)
2025-02-26 21:16:00 -08:00
Ashwin Bharambe
6227e1e3b9
fix: update virtualenv building so llamastack- prefix is not added, make notebook experience easier (#1225)
Make sure venv behaves like conda (no prefix is added to image_name) and
`--image-type venv` inside a notebook "just works" without any fiddling
2025-02-23 16:57:11 -08:00
Ashwin Bharambe
e7d261ef4a Fix test infra, sentence embeddings mixin 2025-02-21 15:11:46 -08:00
Xi Yan
37cf60b732
style: remove prints in codebase (#1146)
# What does this PR do?
- replace prints in codebase with logger
- update print_table to use rich Table

## Test Plan
- library client script in
https://github.com/meta-llama/llama-stack/pull/1145

```
llama stack list-providers
```
<img width="1407" alt="image"
src="https://github.com/user-attachments/assets/906b4f54-9e42-4e55-8968-7e3aa45525b2"
/>


[//]: # (## Documentation)
2025-02-18 19:41:37 -08:00
Xi Yan
e8cb9e0adb
fix: direct client pydantic type casting (#1145)
# What does this PR do?
- Closes #1142 
- Root cause is due to having `Union[str, AgenToolGroupWithArgs]`

## Test Plan
- Test with script described in issue. 

- Print out final converted pydantic object
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>


[//]: # (## Documentation)
2025-02-18 16:07:54 -08:00
Hardik Shah
b0b696cb4f
fix: regex pattern matching to support :path suffix in the routes (#1089)
This PR fixes client sdk test failure --
3720312204

by updating the regex matching pattern to also consider `:path` in the
routes
2025-02-13 18:18:23 -08:00
Sébastien Han
e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do?

- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 10:06:21 -08:00
Hardik Shah
c335ed8765 raise when client initialize fails 2025-02-07 12:24:07 -08:00
Charlie Doern
2a4a612373
fix: Ensure a better error stack trace when llama-stack is not built (#950)
# What does this PR do?

currently this is the output when you run a distribution locally without
running `llama stack build`:

```
Traceback (most recent call last):
  File "/Users/charliedoern/Documents/llama-sdk.py", line 25, in <module>
    models = client.models.list()
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 107, in list
    raise exc
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 95, in list
    return self._get(
           ^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/_base_client.py", line 1212, in get
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 168, in request
    return asyncio.run(self.async_client.request(*args, **kwargs))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 258, in request
    if not self.endpoint_impls:
           ^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncLlamaStackAsLibraryClient' object has no attribute 'endpoint_impls'
```

the intended exception is never raised, add an except for an
AttributeError so users can catch when they call things like
`models.list()` and so that a more useful error telling them that the
client is not properly initialized is printed.

## Test Plan

Please describe:
- I ran the script found here:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#run-inference-with-python-sdk
locally with the changes in this PR and the exception was caught
successfully.

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-07 09:47:02 -08:00
Charlie Doern
26aef50bc5
if client.initialize fails, the example should exit (#954)
# What does this PR do?

the example script can gracefully exit if the boolean returned from
initialize is used properly

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-02-04 13:54:21 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
Lint check in main branch is failing. This fixes the lint check after we
moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We
need to move to a `ruff.toml` file as well as fixing and ignoring some
additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Ashwin Bharambe
0d96070af9
Update OpenAPI generator to add param and field documentation (#896)
We desperately need to document our APIs. This is the basic requirement
of having a Spec :)

This PR updates the OpenAPI generator so documentation for request
parameters and object fields can be properly added to the OpenAPI specs.
From there, this should get picked by Stainless, etc.

## Test Plan:

Updated client-sdk (See
https://github.com/meta-llama/llama-stack-client-python/pull/104) and
then ran:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=../../llama_stack/templates/fireworks/run.yaml pytest -s -v inference/test_inference.py agents/test_agents.py
```
2025-01-29 10:04:30 -08:00
Dinesh Yeduguru
cb11336886
remove logger handler only in notebook (#868)
remove logger handler only in notebook
2025-01-23 16:58:17 -08:00
Dinesh Yeduguru
05f6b44da7
Fix telemetry (#787)
# What does this PR do?

PR fixes couple of issues with telemetry:
1) The REST refactor changed the method from get_span_tree to
query_span_tree, which is causing the server side to return empty spans
2) Library client has introduced a new event loop, which required
changing the location of where start and end trace are called


## Test Plan

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/agents/test_agents.py -k
"test_builtin_tool_web_search"


And querying for spans from the agent run using the library client.
2025-01-16 10:36:13 -08:00
Dinesh Yeduguru
8fd9bcb8cd
fix routing in library client (#776)
# What does this PR do?

Library client needs to match the impl based on both the path and
method. Since path is no longer static, this PR uses the inefficient way
of using regexes computed based on the annotated route path to match
against the incoming request path. The variables now also can come to
the impl from both path or the body, which is also handled cleanly by
finding all the regex matches.



## Test Plan


LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml"
pytest -v tests/client-sdk/agents/test_agents.py
2025-01-15 15:59:45 -08:00
Xi Yan
194d12b304
[bugfix] fix streaming GeneratorExit exception with LlamaStackAsLibraryClient (#760)
# What does this PR do?

#### Issue
- Using Jupyter notebook with LlamaStackAsLibraryClient + streaming
gives exception
```
Exception ignored in: <async_generator object HTTP11ConnectionByteStream.__aiter__ at 0x32a95a740>
Traceback (most recent call last):
  File "/opt/anaconda3/envs/fresh/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 404, in _aiter_
    yield part
RuntimeError: async generator ignored GeneratorExit
```

- Reproduce w/
https://github.com/meta-llama/llama-stack/blob/notebook-streaming-debug/inline.ipynb

#### Fix
- Issue likely comes from stream_across_asyncio_run_boundary closing
connection too soon when interacting in jupyter environment
- This uses an alternative way to convert AsyncStream to SyncStream
return type by sync version of LlamaStackAsLibraryClient, which calls
AsyncLlamaStackAsLibraryClient calling async impls under the hood

#### Additional changes
- Moved tracing logic into AsyncLlamaStackAsLibraryClient.request s.t.
streaming / non-streaming request for LlamaStackAsLibraryClient shares
same code

## Test Plan

- Test w/ together & fireworks & ollama with streaming and non-streaming
using notebook in:
https://github.com/meta-llama/llama-stack/blob/notebook-streaming-debug/inline.ipynb
- Note: need to restart kernel and run pip install -e . in jupyter
interpreter for local code change to take effect

<img width="826" alt="image"
src="https://github.com/user-attachments/assets/5f90985d-1aee-452c-a599-2157f5654fea"
/>


## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-01-14 10:58:46 -08:00
Dinesh Yeduguru
314806cde3
Add provider data passing for library client (#750)
# What does this PR do?

This PR adds the provider data passing for the library client and
changes the provider's api keys be unique


## Test Plan

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-fireworks/fireworks-run.yaml"
pytest -v tests/client-sdk/agents/test_agents.py

run.yaml:
https://gist.github.com/dineshyv/0c10b5c7d0a2fb7ba4f0ecc8dcf860d1
2025-01-13 15:12:10 -08:00
Dinesh Yeduguru
a5c57cd381
agents to use tools api (#673)
# What does this PR do?

PR #639 introduced the notion of Tools API and ability to invoke tools
through API just as any resource. This PR changes the Agents to start
using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig
2) Agent gets the corresponding tool definitions for the specified tools
and pass along to the model
3) Attachements are now named as Documents and their behavior is mostly
unchanged from user perspective
4) You can specify args that can be injected to a tool call through
Agent config. This is especially useful in case of memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent
inject these as well into the tool call.
6) All tests have been migrated to use new tools API and fixtures
including client SDK tests
7) Telemetry just works with tools API because of our trace protocol
decorator


## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

pytest -s -v -k together  llama_stack/providers/tests/tools/test_tools.py \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct

LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994

Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing
2025-01-08 19:01:00 -08:00
Dinesh Yeduguru
0bc5d05243
remove default logger handlers when using libcli with notebook (#718)
# What does this PR do?

Remove the default log handlers for notebook to avoid polluting logs
2025-01-06 13:06:22 -08:00
Ashwin Bharambe
e3f187fb83 Redact sensitive information from configs when printing, etc. 2025-01-02 13:54:02 -08:00
Dinesh Yeduguru
8b8d1c1ef4
fix trace starting in library client (#655)
# What does this PR do?

Because of the way library client sets up async io boundaries, tracing
was broken with streaming. This PR fixes the tracing to start at the
right way to caputre the life time of async gen functions correctly.

Test plan:
Script ran:
https://gist.github.com/yanxi0830/f6645129e55ab12de3cd6ec71564c69e

Before: No spans returned for a session


Now: We see spans
<img width="1678" alt="Screenshot 2024-12-18 at 9 50 46 PM"
src="https://github.com/user-attachments/assets/58a3b0dd-a41c-489a-b89a-075e698a2c03"
/>
2024-12-19 16:13:52 -08:00
Ashwin Bharambe
8de8eb03c8
Update the "InterleavedTextMedia" type (#635)
## What does this PR do?

This is a long-pending change and particularly important to get done
now.

Specifically:
- we cannot "localize" (aka download) any URLs from media attachments
anywhere near our modeling code. it must be done within llama-stack.
- `PIL.Image` is infesting all our APIs via `ImageMedia ->
InterleavedTextMedia` and that cannot be right at all. Anything in the
API surface must be "naturally serializable". We need a standard `{
type: "image", image_url: "<...>" }` which is more extensible
- `UserMessage`, `SystemMessage`, etc. are moved completely to
llama-stack from the llama-models repository.

See https://github.com/meta-llama/llama-models/pull/244 for the
corresponding PR in llama-models.

## Test Plan

```bash
cd llama_stack/providers/tests

pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py
pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py
pytest -s -v -k chroma memory/test_memory.py \
  --env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar

pytest -s -v -k fireworks agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct
```

Updated the client sdk (see PR ...), installed the SDK in the same
environment and then ran the SDK tests:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py
LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py

# this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly
INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py
```
2024-12-17 11:18:31 -08:00
Ashwin Bharambe
eb37fba9da Small fix to library client 2024-12-16 14:08:30 -08:00
Dinesh Yeduguru
e128f2547a
add tracing back to the lib cli (#595)
Adds back all the tracing logic removed from library client. also adds
back the logging to agent_instance.
2024-12-11 08:44:20 -08:00
Dinesh Yeduguru
2e3d3a62a5 Revert "add tracing to library client (#591)"
This reverts commit bc1fddf1df.
2024-12-10 08:50:20 -08:00
Dinesh Yeduguru
16d103842a Revert "await end_trace in libcli"
This reverts commit 7615da78b8.
2024-12-10 08:47:32 -08:00
Dinesh Yeduguru
f969b561ea Revert "Disable telemetry in library client for now"
This reverts commit 176ebddf47.
2024-12-10 08:47:18 -08:00
Ashwin Bharambe
176ebddf47 Disable telemetry in library client for now 2024-12-09 22:17:25 -08:00
Ashwin Bharambe
a4d8a6009a
Fixes for library client (#587)
Library client used _server_ side types which was no bueno. The fix here
is not the completely correct fix but it is good for enough and for the
demo notebook.
2024-12-09 17:14:37 -08:00
Dinesh Yeduguru
7615da78b8 await end_trace in libcli 2024-12-09 15:54:42 -08:00
Dinesh Yeduguru
bc1fddf1df
add tracing to library client (#591) 2024-12-09 15:46:26 -08:00
Ashwin Bharambe
a2170353af better detection for jupyter 2024-12-09 09:38:11 -08:00
Ashwin Bharambe
e951852848 Miscellaneous fixes around telemetry, library client and run yaml autogen
Also add a `venv` image-type for llama stack build
2024-12-08 20:40:22 -08:00
Ashwin Bharambe
14f973a64f
Make LlamaStackLibraryClient work correctly (#581)
This PR does a few things:

- it moves "direct client" to llama-stack repo instead of being in the
llama-stack-client-python repo
- renames it to `LlamaStackLibraryClient`
- actually makes synchronous generators work 
- makes streaming and non-streaming work properly

In many ways, this PR makes things finally "work"

## Test Plan

See a `library_client_test.py` I added. This isn't really quite a test
yet but it demonstrates that this mode now works. Here's the invocation
and the response:

```
INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct python llama_stack/distribution/tests/library_client_test.py ollama
```


![image](https://github.com/user-attachments/assets/17d4e116-4457-4755-a14e-d9a668801fe0)
2024-12-07 14:59:36 -08:00