Merge branch 'rag_scoring_fn_1' into rag_scoring_fn_2

This commit is contained in:
Xi Yan 2024-12-30 17:20:35 -08:00
commit dbecff60a4
128 changed files with 6391 additions and 493 deletions

2
.github/CODEOWNERS vendored
View file

@ -2,4 +2,4 @@
# These owners will be the default owners for everything in # These owners will be the default owners for everything in
# the repo. Unless a later match takes precedence, # the repo. Unless a later match takes precedence,
* @ashwinb @yanxi0830 @hardikjshah @dltn @raghotham @dineshyv * @ashwinb @yanxi0830 @hardikjshah @dltn @raghotham @dineshyv @vladimirivic

View file

@ -127,7 +127,7 @@ You have two ways to install this repository:
conda activate stack conda activate stack
cd llama-stack cd llama-stack
$CONDA_PREFIX/bin/pip install -e . pip install -e .
``` ```
## Documentation ## Documentation

4636
docs/getting_started.ipynb Normal file

File diff suppressed because one or more lines are too long

View file

@ -43,7 +43,7 @@ Configuration for this is available at `distributions/ollama/run.yaml`.
### 3. Use the Llama Stack client SDK ### 3. Use the Llama Stack client SDK
You can interact with the Llama Stack server using various client SDKs. We will use the Python SDK which you can install using: You can interact with the Llama Stack server using various client SDKs. We will use the Python SDK which you can install using the following command. Note that you must be using Python 3.10 or newer:
```bash ```bash
pip install llama-stack-client pip install llama-stack-client
``` ```
@ -51,7 +51,8 @@ pip install llama-stack-client
Let's use the `llama-stack-client` CLI to check the connectivity to the server. Let's use the `llama-stack-client` CLI to check the connectivity to the server.
```bash ```bash
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT models list llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT
llama-stack-client models list
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ identifier ┃ provider_id ┃ provider_resource_id ┃ metadata ┃ ┃ identifier ┃ provider_id ┃ provider_resource_id ┃ metadata ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
@ -61,7 +62,7 @@ llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT models list
You can test basic Llama inference completion using the CLI too. You can test basic Llama inference completion using the CLI too.
```bash ```bash
llama-stack-client --endpoint http://localhost:$LLAMA_STACK_PORT \ llama-stack-client \
inference chat-completion \ inference chat-completion \
--message "hello, what model are you?" --message "hello, what model are you?"
``` ```
@ -153,10 +154,3 @@ if __name__ == "__main__":
- Learn how to [Build Llama Stacks](../distributions/index.md) - Learn how to [Build Llama Stacks](../distributions/index.md)
- See [References](../references/index.md) for more details about the llama CLI and Python SDK - See [References](../references/index.md) for more details about the llama CLI and Python SDK
- For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository. - For example applications and more detailed tutorials, visit our [llama-stack-apps](https://github.com/meta-llama/llama-stack-apps/tree/main/examples) repository.
## Thinking out aloud here in terms of what to write in the docs
- how to get a llama stack server running
- what are all the different client sdks
- what are the components of building agents

View file

@ -16,7 +16,7 @@ Interactive pages for users to play with and explore Llama Stack API capabilitie
##### Chatbot ##### Chatbot
```{eval-rst} ```{eval-rst}
.. video:: https://github.com/user-attachments/assets/6ca617e8-32ca-49b2-9774-185020ff5204 .. video:: https://github.com/user-attachments/assets/8d2ef802-5812-4a28-96e1-316038c84cbf
:autoplay: :autoplay:
:playsinline: :playsinline:
:muted: :muted:

View file

@ -47,7 +47,7 @@ This first example walks you through how to evaluate a model candidate served by
- [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to access models to answer short, fact-seeking questions. - [SimpleQA](https://openai.com/index/introducing-simpleqa/): Benchmark designed to access models to answer short, fact-seeking questions.
#### 1.1 Running MMMU #### 1.1 Running MMMU
- We will use a pre-processed MMMU dataset from [llamastack/mmmu](https://huggingface.co/datasets/llamastack/mmmu). The preprocessing code is shown in in this [Github Gist](https://gist.github.com/yanxi0830/118e9c560227d27132a7fd10e2c92840). The dataset is obtained by transforming the original [MMMU/MMMU](https://huggingface.co/datasets/MMMU/MMMU) dataset into correct format by `inference/chat-completion` API. - We will use a pre-processed MMMU dataset from [llamastack/mmmu](https://huggingface.co/datasets/llamastack/mmmu). The preprocessing code is shown in this [GitHub Gist](https://gist.github.com/yanxi0830/118e9c560227d27132a7fd10e2c92840). The dataset is obtained by transforming the original [MMMU/MMMU](https://huggingface.co/datasets/MMMU/MMMU) dataset into correct format by `inference/chat-completion` API.
```python ```python
import datasets import datasets

View file

@ -67,7 +67,7 @@
"from termcolor import cprint\n", "from termcolor import cprint\n",
"\n", "\n",
"from llama_stack.distribution.datatypes import RemoteProviderConfig\n", "from llama_stack.distribution.datatypes import RemoteProviderConfig\n",
"from llama_stack.apis.safety import * # noqa: F403\n", "from llama_stack.apis.safety import Safety\n",
"from llama_stack_client import LlamaStackClient\n", "from llama_stack_client import LlamaStackClient\n",
"\n", "\n",
"\n", "\n",
@ -127,7 +127,7 @@
"name": "python", "name": "python",
"nbconvert_exporter": "python", "nbconvert_exporter": "python",
"pygments_lexer": "ipython3", "pygments_lexer": "ipython3",
"version": "3.10.15" "version": "3.11.10"
} }
}, },
"nbformat": 4, "nbformat": 4,

View file

@ -18,18 +18,30 @@ from typing import (
Union, Union,
) )
from llama_models.llama3.api.datatypes import ToolParamDefinition
from llama_models.schema_utils import json_schema_type, webmethod from llama_models.schema_utils import json_schema_type, webmethod
from pydantic import BaseModel, ConfigDict, Field from pydantic import BaseModel, ConfigDict, Field
from typing_extensions import Annotated from typing_extensions import Annotated
from llama_stack.providers.utils.telemetry.trace_protocol import trace_protocol
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_stack.apis.common.deployment_types import * # noqa: F403
from llama_stack.apis.inference import * # noqa: F403
from llama_stack.apis.safety import * # noqa: F403
from llama_stack.apis.memory import * # noqa: F403
from llama_stack.apis.common.content_types import InterleavedContent, URL from llama_stack.apis.common.content_types import InterleavedContent, URL
from llama_stack.apis.common.deployment_types import RestAPIExecutionConfig
from llama_stack.apis.inference import (
CompletionMessage,
SamplingParams,
ToolCall,
ToolCallDelta,
ToolChoice,
ToolPromptFormat,
ToolResponse,
ToolResponseMessage,
UserMessage,
)
from llama_stack.apis.memory import MemoryBank
from llama_stack.apis.safety import SafetyViolation
from llama_stack.providers.utils.telemetry.trace_protocol import trace_protocol
@json_schema_type @json_schema_type

View file

@ -6,13 +6,14 @@
from typing import Optional from typing import Optional
from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_models.llama3.api.datatypes import ToolPromptFormat
from llama_models.llama3.api.tool_utils import ToolUtils from llama_models.llama3.api.tool_utils import ToolUtils
from termcolor import cprint from termcolor import cprint
from llama_stack.apis.agents import AgentTurnResponseEventType, StepType from llama_stack.apis.agents import AgentTurnResponseEventType, StepType
from llama_stack.apis.inference import ToolResponseMessage
class LogEvent: class LogEvent:
def __init__( def __init__(

View file

@ -10,8 +10,16 @@ from llama_models.schema_utils import json_schema_type, webmethod
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_stack.apis.inference import (
from llama_stack.apis.inference import * # noqa: F403 CompletionMessage,
InterleavedContent,
LogProbConfig,
Message,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
@json_schema_type @json_schema_type

View file

@ -4,11 +4,12 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
import base64
from typing import Annotated, List, Literal, Optional, Union from typing import Annotated, List, Literal, Optional, Union
from llama_models.schema_utils import json_schema_type, register_schema from llama_models.schema_utils import json_schema_type, register_schema
from pydantic import BaseModel, Field, model_validator from pydantic import BaseModel, Field, field_serializer, model_validator
@json_schema_type @json_schema_type
@ -27,6 +28,12 @@ class _URLOrData(BaseModel):
return values return values
return {"url": values} return {"url": values}
@field_serializer("data")
def serialize_data(self, data: Optional[bytes], _info):
if data is None:
return None
return base64.b64encode(data).decode("utf-8")
@json_schema_type @json_schema_type
class ImageContentItem(_URLOrData): class ImageContentItem(_URLOrData):

View file

@ -9,7 +9,7 @@ from typing import Any, Dict, List, Optional, Protocol, runtime_checkable
from llama_models.schema_utils import json_schema_type, webmethod from llama_models.schema_utils import json_schema_type, webmethod
from pydantic import BaseModel from pydantic import BaseModel
from llama_stack.apis.datasets import * # noqa: F403 from llama_stack.apis.datasets import Dataset
@json_schema_type @json_schema_type

View file

@ -4,18 +4,18 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import Literal, Optional, Protocol, Union from typing import Any, Dict, List, Literal, Optional, Protocol, Union
from llama_models.llama3.api.datatypes import BaseModel, Field
from llama_models.schema_utils import json_schema_type, webmethod
from typing_extensions import Annotated from typing_extensions import Annotated
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_models.schema_utils import json_schema_type, webmethod
from llama_stack.apis.scoring_functions import * # noqa: F403
from llama_stack.apis.agents import AgentConfig from llama_stack.apis.agents import AgentConfig
from llama_stack.apis.common.job_types import Job, JobStatus from llama_stack.apis.common.job_types import Job, JobStatus
from llama_stack.apis.scoring import * # noqa: F403
from llama_stack.apis.eval_tasks import * # noqa: F403
from llama_stack.apis.inference import SamplingParams, SystemMessage from llama_stack.apis.inference import SamplingParams, SystemMessage
from llama_stack.apis.scoring import ScoringResult
from llama_stack.apis.scoring_functions import ScoringFnParams
@json_schema_type @json_schema_type

View file

@ -7,7 +7,9 @@
from enum import Enum from enum import Enum
from typing import ( from typing import (
Any,
AsyncIterator, AsyncIterator,
Dict,
List, List,
Literal, Literal,
Optional, Optional,
@ -32,8 +34,9 @@ from typing_extensions import Annotated
from llama_stack.apis.common.content_types import InterleavedContent from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.models import Model
from llama_stack.providers.utils.telemetry.trace_protocol import trace_protocol from llama_stack.providers.utils.telemetry.trace_protocol import trace_protocol
from llama_stack.apis.models import * # noqa: F403
class LogProbConfig(BaseModel): class LogProbConfig(BaseModel):

View file

@ -7,17 +7,17 @@
from datetime import datetime from datetime import datetime
from enum import Enum from enum import Enum
from typing import Any, Dict, List, Optional, Protocol, Union from typing import Any, Dict, List, Literal, Optional, Protocol, Union
from llama_models.schema_utils import json_schema_type, webmethod from llama_models.schema_utils import json_schema_type, webmethod
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from typing_extensions import Annotated from typing_extensions import Annotated
from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_stack.apis.common.content_types import URL
from llama_stack.apis.common.job_types import JobStatus from llama_stack.apis.common.job_types import JobStatus
from llama_stack.apis.datasets import * # noqa: F403 from llama_stack.apis.common.training_types import Checkpoint
from llama_stack.apis.common.training_types import * # noqa: F403
@json_schema_type @json_schema_type

View file

@ -18,6 +18,8 @@ class ResourceType(Enum):
dataset = "dataset" dataset = "dataset"
scoring_function = "scoring_function" scoring_function = "scoring_function"
eval_task = "eval_task" eval_task = "eval_task"
tool = "tool"
tool_group = "tool_group"
class Resource(BaseModel): class Resource(BaseModel):

View file

@ -4,13 +4,12 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import Any, Dict, List, Protocol, runtime_checkable from typing import Any, Dict, List, Optional, Protocol, runtime_checkable
from llama_models.schema_utils import json_schema_type, webmethod from llama_models.schema_utils import json_schema_type, webmethod
from pydantic import BaseModel from pydantic import BaseModel
from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_stack.apis.scoring_functions import ScoringFn, ScoringFnParams
from llama_stack.apis.scoring_functions import * # noqa: F403
# mapping of metric to value # mapping of metric to value

View file

@ -6,13 +6,12 @@
from enum import Enum from enum import Enum
from typing import Any, Dict, List, Optional, Protocol from typing import Any, Dict, List, Optional, Protocol, Union
from llama_models.schema_utils import json_schema_type, webmethod from llama_models.schema_utils import json_schema_type, webmethod
from pydantic import BaseModel from pydantic import BaseModel
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_stack.apis.inference import Message from llama_stack.apis.inference import Message

View file

@ -0,0 +1,7 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from .tools import * # noqa: F401 F403

View file

@ -0,0 +1,141 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from typing import Annotated, Any, Dict, List, Literal, Optional, Union
from llama_models.llama3.api.datatypes import ToolPromptFormat
from llama_models.schema_utils import json_schema_type, register_schema, webmethod
from pydantic import BaseModel, Field
from typing_extensions import Protocol, runtime_checkable
from llama_stack.apis.common.content_types import InterleavedContent, URL
from llama_stack.apis.resource import Resource, ResourceType
from llama_stack.providers.utils.telemetry.trace_protocol import trace_protocol
@json_schema_type
class ToolParameter(BaseModel):
name: str
parameter_type: str
description: str
@json_schema_type
class Tool(Resource):
type: Literal[ResourceType.tool.value] = ResourceType.tool.value
tool_group: str
description: str
parameters: List[ToolParameter]
provider_id: Optional[str] = None
metadata: Optional[Dict[str, Any]] = None
tool_prompt_format: Optional[ToolPromptFormat] = Field(
default=ToolPromptFormat.json
)
@json_schema_type
class ToolDef(BaseModel):
name: str
description: str
parameters: List[ToolParameter]
metadata: Dict[str, Any]
tool_prompt_format: Optional[ToolPromptFormat] = Field(
default=ToolPromptFormat.json
)
@json_schema_type
class MCPToolGroupDef(BaseModel):
"""
A tool group that is defined by in a model context protocol server.
Refer to https://modelcontextprotocol.io/docs/concepts/tools for more information.
"""
type: Literal["model_context_protocol"] = "model_context_protocol"
endpoint: URL
@json_schema_type
class UserDefinedToolGroupDef(BaseModel):
type: Literal["user_defined"] = "user_defined"
tools: List[ToolDef]
ToolGroupDef = register_schema(
Annotated[
Union[MCPToolGroupDef, UserDefinedToolGroupDef], Field(discriminator="type")
],
name="ToolGroup",
)
class ToolGroup(Resource):
type: Literal[ResourceType.tool_group.value] = ResourceType.tool_group.value
@json_schema_type
class ToolInvocationResult(BaseModel):
content: InterleavedContent
error_message: Optional[str] = None
error_code: Optional[int] = None
class ToolStore(Protocol):
def get_tool(self, tool_name: str) -> Tool: ...
@runtime_checkable
@trace_protocol
class ToolGroups(Protocol):
@webmethod(route="/toolgroups/register", method="POST")
async def register_tool_group(
self,
tool_group_id: str,
tool_group: ToolGroupDef,
provider_id: Optional[str] = None,
) -> None:
"""Register a tool group"""
...
@webmethod(route="/toolgroups/get", method="GET")
async def get_tool_group(
self,
tool_group_id: str,
) -> ToolGroup: ...
@webmethod(route="/toolgroups/list", method="GET")
async def list_tool_groups(self) -> List[ToolGroup]:
"""List tool groups with optional provider"""
...
@webmethod(route="/tools/list", method="GET")
async def list_tools(self, tool_group_id: Optional[str] = None) -> List[Tool]:
"""List tools with optional tool group"""
...
@webmethod(route="/tools/get", method="GET")
async def get_tool(self, tool_name: str) -> Tool: ...
@webmethod(route="/toolgroups/unregister", method="POST")
async def unregister_tool_group(self, tool_group_id: str) -> None:
"""Unregister a tool group"""
...
@runtime_checkable
@trace_protocol
class ToolRuntime(Protocol):
tool_store: ToolStore
@webmethod(route="/tool-runtime/discover", method="POST")
async def discover_tools(self, tool_group: ToolGroupDef) -> List[ToolDef]: ...
@webmethod(route="/tool-runtime/invoke", method="POST")
async def invoke_tool(
self, tool_name: str, args: Dict[str, Any]
) -> ToolInvocationResult:
"""Run a tool with the given arguments"""
...

View file

@ -6,11 +6,12 @@
from typing import Any, Dict, Optional from typing import Any, Dict, Optional
from pydantic import BaseModel, ConfigDict, Field from llama_models.datatypes import CheckpointQuantizationFormat
from llama_models.llama3.api.datatypes import SamplingParams
from llama_models.datatypes import * # noqa: F403
from llama_models.sku_list import LlamaDownloadInfo from llama_models.sku_list import LlamaDownloadInfo
from pydantic import BaseModel, ConfigDict, Field
class PromptGuardModel(BaseModel): class PromptGuardModel(BaseModel):
"""Make a 'fake' Model-like object for Prompt Guard. Eventually this will be removed.""" """Make a 'fake' Model-like object for Prompt Guard. Eventually this will be removed."""

View file

@ -3,21 +3,28 @@
# #
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
import argparse import argparse
from llama_stack.cli.subcommand import Subcommand
from llama_stack.distribution.datatypes import * # noqa: F403
import os import os
import shutil import shutil
from functools import lru_cache from functools import lru_cache
from pathlib import Path from pathlib import Path
from typing import List, Optional
import pkg_resources import pkg_resources
from llama_stack.cli.subcommand import Subcommand
from llama_stack.distribution.datatypes import (
BuildConfig,
DistributionSpec,
Provider,
StackRunConfig,
)
from llama_stack.distribution.distribution import get_provider_registry from llama_stack.distribution.distribution import get_provider_registry
from llama_stack.distribution.resolver import InvalidProviderError from llama_stack.distribution.resolver import InvalidProviderError
from llama_stack.distribution.utils.dynamic import instantiate_class_type from llama_stack.distribution.utils.dynamic import instantiate_class_type
from llama_stack.providers.datatypes import Api
TEMPLATES_PATH = Path(__file__).parent.parent.parent / "templates" TEMPLATES_PATH = Path(__file__).parent.parent.parent / "templates"
@ -100,7 +107,7 @@ class StackBuild(Subcommand):
build_config.image_type = args.image_type build_config.image_type = args.image_type
else: else:
self.parser.error( self.parser.error(
f"Please specify a image-type (docker | conda) for {args.template}" f"Please specify a image-type (docker | conda | venv) for {args.template}"
) )
self._run_stack_build_command_from_build_config( self._run_stack_build_command_from_build_config(
build_config, template_name=args.template build_config, template_name=args.template
@ -122,7 +129,7 @@ class StackBuild(Subcommand):
) )
image_type = prompt( image_type = prompt(
"> Enter the image type you want your Llama Stack to be built as (docker or conda): ", "> Enter the image type you want your Llama Stack to be built as (docker or conda or venv): ",
validator=Validator.from_callable( validator=Validator.from_callable(
lambda x: x in ["docker", "conda", "venv"], lambda x: x in ["docker", "conda", "venv"],
error_message="Invalid image type, please enter conda or docker or venv", error_message="Invalid image type, please enter conda or docker or venv",

View file

@ -6,21 +6,22 @@
import logging import logging
from enum import Enum from enum import Enum
from typing import List
from pathlib import Path
from typing import Dict, List
import pkg_resources import pkg_resources
from pydantic import BaseModel from pydantic import BaseModel
from termcolor import cprint from termcolor import cprint
from llama_stack.distribution.utils.exec import run_with_pty from llama_stack.distribution.datatypes import BuildConfig, Provider
from llama_stack.distribution.datatypes import * # noqa: F403
from pathlib import Path
from llama_stack.distribution.distribution import get_provider_registry from llama_stack.distribution.distribution import get_provider_registry
from llama_stack.distribution.utils.config_dirs import BUILDS_BASE_DIR from llama_stack.distribution.utils.config_dirs import BUILDS_BASE_DIR
from llama_stack.distribution.utils.exec import run_with_pty
from llama_stack.providers.datatypes import Api
log = logging.getLogger(__name__) log = logging.getLogger(__name__)

View file

@ -6,10 +6,14 @@
import logging import logging
import textwrap import textwrap
from typing import Any from typing import Any, Dict
from llama_stack.distribution.datatypes import * # noqa: F403
from llama_stack.distribution.datatypes import (
DistributionSpec,
LLAMA_STACK_RUN_CONFIG_VERSION,
Provider,
StackRunConfig,
)
from llama_stack.distribution.distribution import ( from llama_stack.distribution.distribution import (
builtin_automatically_routed_apis, builtin_automatically_routed_apis,
get_provider_registry, get_provider_registry,
@ -17,10 +21,7 @@ from llama_stack.distribution.distribution import (
from llama_stack.distribution.utils.dynamic import instantiate_class_type from llama_stack.distribution.utils.dynamic import instantiate_class_type
from llama_stack.distribution.utils.prompt_for_config import prompt_for_config from llama_stack.distribution.utils.prompt_for_config import prompt_for_config
from llama_stack.providers.datatypes import Api, ProviderSpec
from llama_stack.apis.models import * # noqa: F403
from llama_stack.apis.shields import * # noqa: F403
from llama_stack.apis.memory_banks import * # noqa: F403
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)

View file

@ -4,23 +4,24 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import Dict, List, Optional, Union from typing import Annotated, Any, Dict, List, Optional, Union
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from llama_stack.providers.datatypes import * # noqa: F403
from llama_stack.apis.models import * # noqa: F403
from llama_stack.apis.shields import * # noqa: F403
from llama_stack.apis.memory_banks import * # noqa: F403
from llama_stack.apis.datasets import * # noqa: F403
from llama_stack.apis.scoring_functions import * # noqa: F403
from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Dataset, DatasetInput
from llama_stack.apis.eval import Eval from llama_stack.apis.eval import Eval
from llama_stack.apis.eval_tasks import EvalTaskInput from llama_stack.apis.eval_tasks import EvalTask, EvalTaskInput
from llama_stack.apis.inference import Inference from llama_stack.apis.inference import Inference
from llama_stack.apis.memory import Memory from llama_stack.apis.memory import Memory
from llama_stack.apis.memory_banks import MemoryBank, MemoryBankInput
from llama_stack.apis.models import Model, ModelInput
from llama_stack.apis.safety import Safety from llama_stack.apis.safety import Safety
from llama_stack.apis.scoring import Scoring from llama_stack.apis.scoring import Scoring
from llama_stack.apis.scoring_functions import ScoringFn, ScoringFnInput
from llama_stack.apis.shields import Shield, ShieldInput
from llama_stack.apis.tools import Tool, ToolGroup, ToolRuntime
from llama_stack.providers.datatypes import Api, ProviderSpec
from llama_stack.providers.utils.kvstore.config import KVStoreConfig from llama_stack.providers.utils.kvstore.config import KVStoreConfig
LLAMA_STACK_BUILD_CONFIG_VERSION = "2" LLAMA_STACK_BUILD_CONFIG_VERSION = "2"
@ -37,6 +38,8 @@ RoutableObject = Union[
Dataset, Dataset,
ScoringFn, ScoringFn,
EvalTask, EvalTask,
Tool,
ToolGroup,
] ]
@ -48,6 +51,8 @@ RoutableObjectWithProvider = Annotated[
Dataset, Dataset,
ScoringFn, ScoringFn,
EvalTask, EvalTask,
Tool,
ToolGroup,
], ],
Field(discriminator="type"), Field(discriminator="type"),
] ]
@ -59,6 +64,7 @@ RoutedProtocol = Union[
DatasetIO, DatasetIO,
Scoring, Scoring,
Eval, Eval,
ToolRuntime,
] ]

View file

@ -47,6 +47,10 @@ def builtin_automatically_routed_apis() -> List[AutoRoutedApiInfo]:
routing_table_api=Api.eval_tasks, routing_table_api=Api.eval_tasks,
router_api=Api.eval, router_api=Api.eval,
), ),
AutoRoutedApiInfo(
routing_table_api=Api.tool_groups,
router_api=Api.tool_runtime,
),
] ]

View file

@ -5,12 +5,12 @@
# the root directory of this source tree. # the root directory of this source tree.
from typing import Dict, List from typing import Dict, List
from llama_stack.apis.inspect import * # noqa: F403
from pydantic import BaseModel from pydantic import BaseModel
from llama_stack.apis.inspect import HealthInfo, Inspect, ProviderInfo, RouteInfo
from llama_stack.distribution.datatypes import StackRunConfig
from llama_stack.distribution.server.endpoints import get_all_api_endpoints from llama_stack.distribution.server.endpoints import get_all_api_endpoints
from llama_stack.providers.datatypes import * # noqa: F403
from llama_stack.distribution.datatypes import * # noqa: F403
class DistributionInspectConfig(BaseModel): class DistributionInspectConfig(BaseModel):

View file

@ -67,6 +67,7 @@ def in_notebook():
def stream_across_asyncio_run_boundary( def stream_across_asyncio_run_boundary(
async_gen_maker, async_gen_maker,
pool_executor: ThreadPoolExecutor, pool_executor: ThreadPoolExecutor,
path: Optional[str] = None,
) -> Generator[T, None, None]: ) -> Generator[T, None, None]:
result_queue = queue.Queue() result_queue = queue.Queue()
stop_event = threading.Event() stop_event = threading.Event()
@ -74,6 +75,7 @@ def stream_across_asyncio_run_boundary(
async def consumer(): async def consumer():
# make sure we make the generator in the event loop context # make sure we make the generator in the event loop context
gen = await async_gen_maker() gen = await async_gen_maker()
await start_trace(path, {"__location__": "library_client"})
try: try:
async for item in await gen: async for item in await gen:
result_queue.put(item) result_queue.put(item)
@ -85,6 +87,7 @@ def stream_across_asyncio_run_boundary(
finally: finally:
result_queue.put(StopIteration) result_queue.put(StopIteration)
stop_event.set() stop_event.set()
await end_trace()
def run_async(): def run_async():
# Run our own loop to avoid double async generator cleanup which is done # Run our own loop to avoid double async generator cleanup which is done
@ -186,14 +189,34 @@ class LlamaStackAsLibraryClient(LlamaStackClient):
return asyncio.run(self.async_client.initialize()) return asyncio.run(self.async_client.initialize())
def _get_path(
self,
cast_to: Any,
options: Any,
*,
stream=False,
stream_cls=None,
):
return options.url
def request(self, *args, **kwargs): def request(self, *args, **kwargs):
path = self._get_path(*args, **kwargs)
if kwargs.get("stream"): if kwargs.get("stream"):
return stream_across_asyncio_run_boundary( return stream_across_asyncio_run_boundary(
lambda: self.async_client.request(*args, **kwargs), lambda: self.async_client.request(*args, **kwargs),
self.pool_executor, self.pool_executor,
path=path,
) )
else: else:
return asyncio.run(self.async_client.request(*args, **kwargs))
async def _traced_request():
await start_trace(path, {"__location__": "library_client"})
try:
return await self.async_client.request(*args, **kwargs)
finally:
await end_trace()
return asyncio.run(_traced_request())
class AsyncLlamaStackAsLibraryClient(AsyncLlamaStackClient): class AsyncLlamaStackAsLibraryClient(AsyncLlamaStackClient):
@ -206,7 +229,10 @@ class AsyncLlamaStackAsLibraryClient(AsyncLlamaStackClient):
# when using the library client, we should not log to console since many # when using the library client, we should not log to console since many
# of our logs are intended for server-side usage # of our logs are intended for server-side usage
os.environ["TELEMETRY_SINKS"] = "sqlite" current_sinks = os.environ.get("TELEMETRY_SINKS", "sqlite").split(",")
os.environ["TELEMETRY_SINKS"] = ",".join(
sink for sink in current_sinks if sink != "console"
)
if config_path_or_template_name.endswith(".yaml"): if config_path_or_template_name.endswith(".yaml"):
config_path = Path(config_path_or_template_name) config_path = Path(config_path_or_template_name)
@ -295,41 +321,37 @@ class AsyncLlamaStackAsLibraryClient(AsyncLlamaStackClient):
body = options.params or {} body = options.params or {}
body |= options.json_data or {} body |= options.json_data or {}
await start_trace(path, {"__location__": "library_client"}) func = self.endpoint_impls.get(path)
try: if not func:
func = self.endpoint_impls.get(path) raise ValueError(f"No endpoint found for {path}")
if not func:
raise ValueError(f"No endpoint found for {path}")
body = self._convert_body(path, body) body = self._convert_body(path, body)
result = await func(**body) result = await func(**body)
json_content = json.dumps(convert_pydantic_to_json_value(result)) json_content = json.dumps(convert_pydantic_to_json_value(result))
mock_response = httpx.Response( mock_response = httpx.Response(
status_code=httpx.codes.OK, status_code=httpx.codes.OK,
content=json_content.encode("utf-8"), content=json_content.encode("utf-8"),
headers={ headers={
"Content-Type": "application/json", "Content-Type": "application/json",
}, },
request=httpx.Request( request=httpx.Request(
method=options.method, method=options.method,
url=options.url, url=options.url,
params=options.params, params=options.params,
headers=options.headers, headers=options.headers,
json=options.json_data, json=options.json_data,
), ),
) )
response = APIResponse( response = APIResponse(
raw=mock_response, raw=mock_response,
client=self, client=self,
cast_to=cast_to, cast_to=cast_to,
options=options, options=options,
stream=False, stream=False,
stream_cls=None, stream_cls=None,
) )
return response.parse() return response.parse()
finally:
await end_trace()
async def _call_streaming( async def _call_streaming(
self, self,
@ -341,51 +363,47 @@ class AsyncLlamaStackAsLibraryClient(AsyncLlamaStackClient):
path = options.url path = options.url
body = options.params or {} body = options.params or {}
body |= options.json_data or {} body |= options.json_data or {}
await start_trace(path, {"__location__": "library_client"}) func = self.endpoint_impls.get(path)
try: if not func:
func = self.endpoint_impls.get(path) raise ValueError(f"No endpoint found for {path}")
if not func:
raise ValueError(f"No endpoint found for {path}")
body = self._convert_body(path, body) body = self._convert_body(path, body)
async def gen(): async def gen():
async for chunk in await func(**body): async for chunk in await func(**body):
data = json.dumps(convert_pydantic_to_json_value(chunk)) data = json.dumps(convert_pydantic_to_json_value(chunk))
sse_event = f"data: {data}\n\n" sse_event = f"data: {data}\n\n"
yield sse_event.encode("utf-8") yield sse_event.encode("utf-8")
mock_response = httpx.Response( mock_response = httpx.Response(
status_code=httpx.codes.OK, status_code=httpx.codes.OK,
content=gen(), content=gen(),
headers={ headers={
"Content-Type": "application/json", "Content-Type": "application/json",
}, },
request=httpx.Request( request=httpx.Request(
method=options.method, method=options.method,
url=options.url, url=options.url,
params=options.params, params=options.params,
headers=options.headers, headers=options.headers,
json=options.json_data, json=options.json_data,
), ),
) )
# we use asynchronous impl always internally and channel all requests to AsyncLlamaStackClient # we use asynchronous impl always internally and channel all requests to AsyncLlamaStackClient
# however, the top-level caller may be a SyncAPIClient -- so its stream_cls might be a Stream (SyncStream) # however, the top-level caller may be a SyncAPIClient -- so its stream_cls might be a Stream (SyncStream)
# so we need to convert it to AsyncStream # so we need to convert it to AsyncStream
args = get_args(stream_cls) args = get_args(stream_cls)
stream_cls = AsyncStream[args[0]] stream_cls = AsyncStream[args[0]]
response = AsyncAPIResponse( response = AsyncAPIResponse(
raw=mock_response, raw=mock_response,
client=self, client=self,
cast_to=cast_to, cast_to=cast_to,
options=options, options=options,
stream=True, stream=True,
stream_cls=stream_cls, stream_cls=stream_cls,
) )
return await response.parse() return await response.parse()
finally:
await end_trace()
def _convert_body(self, path: str, body: Optional[dict] = None) -> dict: def _convert_body(self, path: str, body: Optional[dict] = None) -> dict:
if not body: if not body:

View file

@ -6,14 +6,10 @@
import importlib import importlib
import inspect import inspect
from typing import Any, Dict, List, Set
from llama_stack.providers.datatypes import * # noqa: F403
from llama_stack.distribution.datatypes import * # noqa: F403
import logging import logging
from typing import Any, Dict, List, Set
from llama_stack.apis.agents import Agents from llama_stack.apis.agents import Agents
from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Datasets from llama_stack.apis.datasets import Datasets
@ -30,11 +26,34 @@ from llama_stack.apis.scoring import Scoring
from llama_stack.apis.scoring_functions import ScoringFunctions from llama_stack.apis.scoring_functions import ScoringFunctions
from llama_stack.apis.shields import Shields from llama_stack.apis.shields import Shields
from llama_stack.apis.telemetry import Telemetry from llama_stack.apis.telemetry import Telemetry
from llama_stack.apis.tools import ToolGroups, ToolRuntime
from llama_stack.distribution.client import get_client_impl from llama_stack.distribution.client import get_client_impl
from llama_stack.distribution.datatypes import (
AutoRoutedProviderSpec,
Provider,
RoutingTableProviderSpec,
StackRunConfig,
)
from llama_stack.distribution.distribution import builtin_automatically_routed_apis from llama_stack.distribution.distribution import builtin_automatically_routed_apis
from llama_stack.distribution.store import DistributionRegistry from llama_stack.distribution.store import DistributionRegistry
from llama_stack.distribution.utils.dynamic import instantiate_class_type from llama_stack.distribution.utils.dynamic import instantiate_class_type
from llama_stack.providers.datatypes import (
Api,
DatasetsProtocolPrivate,
EvalTasksProtocolPrivate,
InlineProviderSpec,
MemoryBanksProtocolPrivate,
ModelsProtocolPrivate,
ProviderSpec,
RemoteProviderConfig,
RemoteProviderSpec,
ScoringFunctionsProtocolPrivate,
ShieldsProtocolPrivate,
ToolsProtocolPrivate,
)
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
@ -60,12 +79,15 @@ def api_protocol_map() -> Dict[Api, Any]:
Api.eval: Eval, Api.eval: Eval,
Api.eval_tasks: EvalTasks, Api.eval_tasks: EvalTasks,
Api.post_training: PostTraining, Api.post_training: PostTraining,
Api.tool_groups: ToolGroups,
Api.tool_runtime: ToolRuntime,
} }
def additional_protocols_map() -> Dict[Api, Any]: def additional_protocols_map() -> Dict[Api, Any]:
return { return {
Api.inference: (ModelsProtocolPrivate, Models, Api.models), Api.inference: (ModelsProtocolPrivate, Models, Api.models),
Api.tool_groups: (ToolsProtocolPrivate, ToolGroups, Api.tool_groups),
Api.memory: (MemoryBanksProtocolPrivate, MemoryBanks, Api.memory_banks), Api.memory: (MemoryBanksProtocolPrivate, MemoryBanks, Api.memory_banks),
Api.safety: (ShieldsProtocolPrivate, Shields, Api.shields), Api.safety: (ShieldsProtocolPrivate, Shields, Api.shields),
Api.datasetio: (DatasetsProtocolPrivate, Datasets, Api.datasets), Api.datasetio: (DatasetsProtocolPrivate, Datasets, Api.datasets),

View file

@ -4,11 +4,12 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import Any from typing import Any, Dict
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.distribution.datatypes import RoutedProtocol
from llama_stack.distribution.store import DistributionRegistry from llama_stack.distribution.store import DistributionRegistry
from llama_stack.providers.datatypes import Api, RoutingTable
from .routing_tables import ( from .routing_tables import (
DatasetsRoutingTable, DatasetsRoutingTable,
@ -17,6 +18,7 @@ from .routing_tables import (
ModelsRoutingTable, ModelsRoutingTable,
ScoringFunctionsRoutingTable, ScoringFunctionsRoutingTable,
ShieldsRoutingTable, ShieldsRoutingTable,
ToolGroupsRoutingTable,
) )
@ -33,6 +35,7 @@ async def get_routing_table_impl(
"datasets": DatasetsRoutingTable, "datasets": DatasetsRoutingTable,
"scoring_functions": ScoringFunctionsRoutingTable, "scoring_functions": ScoringFunctionsRoutingTable,
"eval_tasks": EvalTasksRoutingTable, "eval_tasks": EvalTasksRoutingTable,
"tool_groups": ToolGroupsRoutingTable,
} }
if api.value not in api_to_tables: if api.value not in api_to_tables:
@ -51,6 +54,7 @@ async def get_auto_router_impl(api: Api, routing_table: RoutingTable, _deps) ->
MemoryRouter, MemoryRouter,
SafetyRouter, SafetyRouter,
ScoringRouter, ScoringRouter,
ToolRuntimeRouter,
) )
api_to_routers = { api_to_routers = {
@ -60,6 +64,7 @@ async def get_auto_router_impl(api: Api, routing_table: RoutingTable, _deps) ->
"datasetio": DatasetIORouter, "datasetio": DatasetIORouter,
"scoring": ScoringRouter, "scoring": ScoringRouter,
"eval": EvalRouter, "eval": EvalRouter,
"tool_runtime": ToolRuntimeRouter,
} }
if api.value not in api_to_routers: if api.value not in api_to_routers:
raise ValueError(f"API {api.value} not found in router map") raise ValueError(f"API {api.value} not found in router map")

View file

@ -6,15 +6,40 @@
from typing import Any, AsyncGenerator, Dict, List, Optional from typing import Any, AsyncGenerator, Dict, List, Optional
from llama_stack.apis.datasetio.datasetio import DatasetIO from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.datasetio import DatasetIO, PaginatedRowsResult
from llama_stack.apis.eval import (
AppEvalTaskConfig,
Eval,
EvalTaskConfig,
EvaluateResponse,
Job,
JobStatus,
)
from llama_stack.apis.inference import (
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.apis.memory import Memory, MemoryBankDocument, QueryDocumentsResponse
from llama_stack.apis.memory_banks.memory_banks import BankParams from llama_stack.apis.memory_banks.memory_banks import BankParams
from llama_stack.distribution.datatypes import RoutingTable from llama_stack.apis.models import ModelType
from llama_stack.apis.memory import * # noqa: F403 from llama_stack.apis.safety import RunShieldResponse, Safety
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.scoring import (
from llama_stack.apis.safety import * # noqa: F403 ScoreBatchResponse,
from llama_stack.apis.datasetio import * # noqa: F403 ScoreResponse,
from llama_stack.apis.scoring import * # noqa: F403 Scoring,
from llama_stack.apis.eval import * # noqa: F403 ScoringFnParams,
)
from llama_stack.apis.shields import Shield
from llama_stack.apis.tools import Tool, ToolGroupDef, ToolRuntime
from llama_stack.providers.datatypes import RoutingTable
class MemoryRouter(Memory): class MemoryRouter(Memory):
@ -329,7 +354,6 @@ class EvalRouter(Eval):
task_config=task_config, task_config=task_config,
) )
@webmethod(route="/eval/evaluate_rows", method="POST")
async def evaluate_rows( async def evaluate_rows(
self, self,
task_id: str, task_id: str,
@ -372,3 +396,28 @@ class EvalRouter(Eval):
task_id, task_id,
job_id, job_id,
) )
class ToolRuntimeRouter(ToolRuntime):
def __init__(
self,
routing_table: RoutingTable,
) -> None:
self.routing_table = routing_table
async def initialize(self) -> None:
pass
async def shutdown(self) -> None:
pass
async def invoke_tool(self, tool_name: str, args: Dict[str, Any]) -> Any:
return await self.routing_table.get_provider_impl(tool_name).invoke_tool(
tool_name=tool_name,
args=args,
)
async def discover_tools(self, tool_group: ToolGroupDef) -> List[Tool]:
return await self.routing_table.get_provider_impl(
tool_group.name
).discover_tools(tool_group)

View file

@ -8,19 +8,40 @@ from typing import Any, Dict, List, Optional
from pydantic import parse_obj_as from pydantic import parse_obj_as
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_stack.apis.models import * # noqa: F403
from llama_stack.apis.shields import * # noqa: F403
from llama_stack.apis.memory_banks import * # noqa: F403
from llama_stack.apis.datasets import * # noqa: F403
from llama_stack.apis.eval_tasks import * # noqa: F403
from llama_stack.apis.common.content_types import URL from llama_stack.apis.common.content_types import URL
from llama_stack.apis.common.type_system import ParamType from llama_stack.apis.common.type_system import ParamType
from llama_stack.apis.datasets import Dataset, Datasets
from llama_stack.apis.eval_tasks import EvalTask, EvalTasks
from llama_stack.apis.memory_banks import (
BankParams,
MemoryBank,
MemoryBanks,
MemoryBankType,
)
from llama_stack.apis.models import Model, Models, ModelType
from llama_stack.apis.resource import ResourceType
from llama_stack.apis.scoring_functions import (
ScoringFn,
ScoringFnParams,
ScoringFunctions,
)
from llama_stack.apis.shields import Shield, Shields
from llama_stack.apis.tools import (
MCPToolGroupDef,
Tool,
ToolGroup,
ToolGroupDef,
ToolGroups,
UserDefinedToolGroupDef,
)
from llama_stack.distribution.datatypes import (
RoutableObject,
RoutableObjectWithProvider,
RoutedProtocol,
)
from llama_stack.distribution.store import DistributionRegistry from llama_stack.distribution.store import DistributionRegistry
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import Api, RoutingTable
def get_impl_api(p: Any) -> Api: def get_impl_api(p: Any) -> Api:
@ -45,6 +66,8 @@ async def register_object_with_provider(obj: RoutableObject, p: Any) -> Routable
return await p.register_scoring_function(obj) return await p.register_scoring_function(obj)
elif api == Api.eval: elif api == Api.eval:
return await p.register_eval_task(obj) return await p.register_eval_task(obj)
elif api == Api.tool_runtime:
return await p.register_tool(obj)
else: else:
raise ValueError(f"Unknown API {api} for registering object with provider") raise ValueError(f"Unknown API {api} for registering object with provider")
@ -57,6 +80,8 @@ async def unregister_object_from_provider(obj: RoutableObject, p: Any) -> None:
return await p.unregister_model(obj.identifier) return await p.unregister_model(obj.identifier)
elif api == Api.datasetio: elif api == Api.datasetio:
return await p.unregister_dataset(obj.identifier) return await p.unregister_dataset(obj.identifier)
elif api == Api.tool_runtime:
return await p.unregister_tool(obj.identifier)
else: else:
raise ValueError(f"Unregister not supported for {api}") raise ValueError(f"Unregister not supported for {api}")
@ -104,6 +129,8 @@ class CommonRoutingTableImpl(RoutingTable):
await add_objects(scoring_functions, pid, ScoringFn) await add_objects(scoring_functions, pid, ScoringFn)
elif api == Api.eval: elif api == Api.eval:
p.eval_task_store = self p.eval_task_store = self
elif api == Api.tool_runtime:
p.tool_store = self
async def shutdown(self) -> None: async def shutdown(self) -> None:
for p in self.impls_by_provider_id.values(): for p in self.impls_by_provider_id.values():
@ -125,6 +152,8 @@ class CommonRoutingTableImpl(RoutingTable):
return ("Scoring", "scoring_function") return ("Scoring", "scoring_function")
elif isinstance(self, EvalTasksRoutingTable): elif isinstance(self, EvalTasksRoutingTable):
return ("Eval", "eval_task") return ("Eval", "eval_task")
elif isinstance(self, ToolGroupsRoutingTable):
return ("Tools", "tool")
else: else:
raise ValueError("Unknown routing table type") raise ValueError("Unknown routing table type")
@ -461,3 +490,88 @@ class EvalTasksRoutingTable(CommonRoutingTableImpl, EvalTasks):
provider_resource_id=provider_eval_task_id, provider_resource_id=provider_eval_task_id,
) )
await self.register_object(eval_task) await self.register_object(eval_task)
class ToolGroupsRoutingTable(CommonRoutingTableImpl, ToolGroups):
async def list_tools(self, tool_group_id: Optional[str] = None) -> List[Tool]:
tools = await self.get_all_with_type("tool")
if tool_group_id:
tools = [tool for tool in tools if tool.tool_group == tool_group_id]
return tools
async def list_tool_groups(self) -> List[ToolGroup]:
return await self.get_all_with_type("tool_group")
async def get_tool_group(self, tool_group_id: str) -> ToolGroup:
return await self.get_object_by_identifier("tool_group", tool_group_id)
async def get_tool(self, tool_name: str) -> Tool:
return await self.get_object_by_identifier("tool", tool_name)
async def register_tool_group(
self,
tool_group_id: str,
tool_group: ToolGroupDef,
provider_id: Optional[str] = None,
) -> None:
tools = []
tool_defs = []
if provider_id is None:
if len(self.impls_by_provider_id.keys()) > 1:
raise ValueError(
f"No provider_id specified and multiple providers available. Please specify a provider_id. Available providers: {', '.join(self.impls_by_provider_id.keys())}"
)
provider_id = list(self.impls_by_provider_id.keys())[0]
if isinstance(tool_group, MCPToolGroupDef):
tool_defs = await self.impls_by_provider_id[provider_id].discover_tools(
tool_group
)
elif isinstance(tool_group, UserDefinedToolGroupDef):
tool_defs = tool_group.tools
else:
raise ValueError(f"Unknown tool group: {tool_group}")
for tool_def in tool_defs:
tools.append(
Tool(
identifier=tool_def.name,
tool_group=tool_group_id,
description=tool_def.description,
parameters=tool_def.parameters,
provider_id=provider_id,
tool_prompt_format=tool_def.tool_prompt_format,
provider_resource_id=tool_def.name,
metadata=tool_def.metadata,
)
)
for tool in tools:
existing_tool = await self.get_tool(tool.identifier)
# Compare existing and new object if one exists
if existing_tool:
existing_dict = existing_tool.model_dump()
new_dict = tool.model_dump()
if existing_dict != new_dict:
raise ValueError(
f"Object {tool.identifier} already exists in registry. Please use a different identifier."
)
await self.register_object(tool)
await self.dist_registry.register(
ToolGroup(
identifier=tool_group_id,
provider_id=provider_id,
provider_resource_id=tool_group_id,
)
)
async def unregister_tool_group(self, tool_group_id: str) -> None:
tool_group = await self.get_tool_group(tool_group_id)
if tool_group is None:
raise ValueError(f"Tool group {tool_group_id} not found")
tools = await self.list_tools(tool_group_id)
for tool in tools:
await self.unregister_object(tool)
await self.unregister_object(tool_group)

View file

@ -28,14 +28,9 @@ from pydantic import BaseModel, ValidationError
from termcolor import cprint from termcolor import cprint
from typing_extensions import Annotated from typing_extensions import Annotated
from llama_stack.distribution.distribution import builtin_automatically_routed_apis from llama_stack.distribution.datatypes import StackRunConfig
from llama_stack.providers.utils.telemetry.tracing import ( from llama_stack.distribution.distribution import builtin_automatically_routed_apis
end_trace,
setup_logger,
start_trace,
)
from llama_stack.distribution.datatypes import * # noqa: F403
from llama_stack.distribution.request_headers import set_request_provider_data from llama_stack.distribution.request_headers import set_request_provider_data
from llama_stack.distribution.resolver import InvalidProviderError from llama_stack.distribution.resolver import InvalidProviderError
from llama_stack.distribution.stack import ( from llama_stack.distribution.stack import (
@ -43,11 +38,19 @@ from llama_stack.distribution.stack import (
replace_env_vars, replace_env_vars,
validate_env_pair, validate_env_pair,
) )
from llama_stack.providers.datatypes import Api
from llama_stack.providers.inline.telemetry.meta_reference.config import TelemetryConfig from llama_stack.providers.inline.telemetry.meta_reference.config import TelemetryConfig
from llama_stack.providers.inline.telemetry.meta_reference.telemetry import ( from llama_stack.providers.inline.telemetry.meta_reference.telemetry import (
TelemetryAdapter, TelemetryAdapter,
) )
from llama_stack.providers.utils.telemetry.tracing import (
end_trace,
setup_logger,
start_trace,
)
from .endpoints import get_all_api_endpoints from .endpoints import get_all_api_endpoints

View file

@ -8,32 +8,31 @@ import logging
import os import os
import re import re
from pathlib import Path from pathlib import Path
from typing import Any, Dict from typing import Any, Dict, Optional
import pkg_resources import pkg_resources
import yaml import yaml
from termcolor import colored from termcolor import colored
from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_stack.apis.agents import Agents
from llama_stack.apis.agents import * # noqa: F403 from llama_stack.apis.batch_inference import BatchInference
from llama_stack.apis.datasets import * # noqa: F403 from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasetio import * # noqa: F403 from llama_stack.apis.datasets import Datasets
from llama_stack.apis.scoring import * # noqa: F403 from llama_stack.apis.eval import Eval
from llama_stack.apis.scoring_functions import * # noqa: F403 from llama_stack.apis.eval_tasks import EvalTasks
from llama_stack.apis.eval import * # noqa: F403 from llama_stack.apis.inference import Inference
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.inspect import Inspect
from llama_stack.apis.batch_inference import * # noqa: F403 from llama_stack.apis.memory import Memory
from llama_stack.apis.memory import * # noqa: F403 from llama_stack.apis.memory_banks import MemoryBanks
from llama_stack.apis.telemetry import * # noqa: F403 from llama_stack.apis.models import Models
from llama_stack.apis.post_training import * # noqa: F403 from llama_stack.apis.post_training import PostTraining
from llama_stack.apis.synthetic_data_generation import * # noqa: F403 from llama_stack.apis.safety import Safety
from llama_stack.apis.safety import * # noqa: F403 from llama_stack.apis.scoring import Scoring
from llama_stack.apis.models import * # noqa: F403 from llama_stack.apis.scoring_functions import ScoringFunctions
from llama_stack.apis.memory_banks import * # noqa: F403 from llama_stack.apis.shields import Shields
from llama_stack.apis.shields import * # noqa: F403 from llama_stack.apis.synthetic_data_generation import SyntheticDataGeneration
from llama_stack.apis.inspect import * # noqa: F403 from llama_stack.apis.telemetry import Telemetry
from llama_stack.apis.eval_tasks import * # noqa: F403
from llama_stack.distribution.datatypes import StackRunConfig from llama_stack.distribution.datatypes import StackRunConfig
from llama_stack.distribution.distribution import get_provider_registry from llama_stack.distribution.distribution import get_provider_registry

View file

@ -13,11 +13,8 @@ import pydantic
from llama_stack.distribution.datatypes import KVStoreConfig, RoutableObjectWithProvider from llama_stack.distribution.datatypes import KVStoreConfig, RoutableObjectWithProvider
from llama_stack.distribution.utils.config_dirs import DISTRIBS_BASE_DIR from llama_stack.distribution.utils.config_dirs import DISTRIBS_BASE_DIR
from llama_stack.providers.utils.kvstore import ( from llama_stack.providers.utils.kvstore import KVStore, kvstore_impl
KVStore, from llama_stack.providers.utils.kvstore.config import SqliteKVStoreConfig
kvstore_impl,
SqliteKVStoreConfig,
)
class DistributionRegistry(Protocol): class DistributionRegistry(Protocol):

View file

@ -8,11 +8,14 @@ import os
import pytest import pytest
import pytest_asyncio import pytest_asyncio
from llama_stack.distribution.store import * # noqa F403
from llama_stack.apis.inference import Model from llama_stack.apis.inference import Model
from llama_stack.apis.memory_banks import VectorMemoryBank from llama_stack.apis.memory_banks import VectorMemoryBank
from llama_stack.distribution.store.registry import (
CachedDiskDistributionRegistry,
DiskDistributionRegistry,
)
from llama_stack.providers.utils.kvstore import kvstore_impl, SqliteKVStoreConfig from llama_stack.providers.utils.kvstore import kvstore_impl, SqliteKVStoreConfig
from llama_stack.distribution.datatypes import * # noqa F403
@pytest.fixture @pytest.fixture

View file

@ -17,6 +17,7 @@ from llama_stack.apis.memory_banks.memory_banks import MemoryBank
from llama_stack.apis.models import Model from llama_stack.apis.models import Model
from llama_stack.apis.scoring_functions import ScoringFn from llama_stack.apis.scoring_functions import ScoringFn
from llama_stack.apis.shields import Shield from llama_stack.apis.shields import Shield
from llama_stack.apis.tools import Tool
@json_schema_type @json_schema_type
@ -29,6 +30,7 @@ class Api(Enum):
scoring = "scoring" scoring = "scoring"
eval = "eval" eval = "eval"
post_training = "post_training" post_training = "post_training"
tool_runtime = "tool_runtime"
telemetry = "telemetry" telemetry = "telemetry"
@ -38,6 +40,7 @@ class Api(Enum):
datasets = "datasets" datasets = "datasets"
scoring_functions = "scoring_functions" scoring_functions = "scoring_functions"
eval_tasks = "eval_tasks" eval_tasks = "eval_tasks"
tool_groups = "tool_groups"
# built-in API # built-in API
inspect = "inspect" inspect = "inspect"
@ -75,6 +78,12 @@ class EvalTasksProtocolPrivate(Protocol):
async def register_eval_task(self, eval_task: EvalTask) -> None: ... async def register_eval_task(self, eval_task: EvalTask) -> None: ...
class ToolsProtocolPrivate(Protocol):
async def register_tool(self, tool: Tool) -> None: ...
async def unregister_tool(self, tool_id: str) -> None: ...
@json_schema_type @json_schema_type
class ProviderSpec(BaseModel): class ProviderSpec(BaseModel):
api: Api api: Api

View file

@ -13,19 +13,64 @@ import secrets
import string import string
import uuid import uuid
from datetime import datetime from datetime import datetime
from typing import AsyncGenerator, List, Tuple from typing import AsyncGenerator, Dict, List, Optional, Tuple
from urllib.parse import urlparse from urllib.parse import urlparse
import httpx import httpx
from llama_models.llama3.api.datatypes import BuiltinTool
from llama_stack.apis.agents import * # noqa: F403 from llama_stack.apis.agents import (
from llama_stack.apis.inference import * # noqa: F403 AgentConfig,
from llama_stack.apis.memory import * # noqa: F403 AgentTool,
from llama_stack.apis.memory_banks import * # noqa: F403 AgentTurnCreateRequest,
from llama_stack.apis.safety import * # noqa: F403 AgentTurnResponseEvent,
AgentTurnResponseEventType,
AgentTurnResponseStepCompletePayload,
AgentTurnResponseStepProgressPayload,
AgentTurnResponseStepStartPayload,
AgentTurnResponseStreamChunk,
AgentTurnResponseTurnCompletePayload,
AgentTurnResponseTurnStartPayload,
Attachment,
CodeInterpreterToolDefinition,
FunctionCallToolDefinition,
InferenceStep,
MemoryRetrievalStep,
MemoryToolDefinition,
PhotogenToolDefinition,
SearchToolDefinition,
ShieldCallStep,
StepType,
ToolExecutionStep,
Turn,
WolframAlphaToolDefinition,
)
from llama_stack.apis.common.content_types import InterleavedContent, TextContentItem from llama_stack.apis.common.content_types import (
InterleavedContent,
TextContentItem,
URL,
)
from llama_stack.apis.inference import (
ChatCompletionResponseEventType,
CompletionMessage,
Inference,
Message,
SamplingParams,
StopReason,
SystemMessage,
ToolCallDelta,
ToolCallParseStatus,
ToolChoice,
ToolDefinition,
ToolResponse,
ToolResponseMessage,
UserMessage,
)
from llama_stack.apis.memory import Memory, MemoryBankDocument, QueryDocumentsResponse
from llama_stack.apis.memory_banks import MemoryBanks, VectorMemoryBankParams
from llama_stack.apis.safety import Safety
from llama_stack.providers.utils.kvstore import KVStore from llama_stack.providers.utils.kvstore import KVStore
from llama_stack.providers.utils.memory.vector_store import concat_interleaved_content from llama_stack.providers.utils.memory.vector_store import concat_interleaved_content

View file

@ -9,15 +9,26 @@ import logging
import shutil import shutil
import tempfile import tempfile
import uuid import uuid
from typing import AsyncGenerator from typing import AsyncGenerator, List, Optional, Union
from termcolor import colored from termcolor import colored
from llama_stack.apis.inference import Inference from llama_stack.apis.agents import (
AgentConfig,
AgentCreateResponse,
Agents,
AgentSessionCreateResponse,
AgentStepResponse,
AgentTurnCreateRequest,
Attachment,
Session,
Turn,
)
from llama_stack.apis.inference import Inference, ToolResponseMessage, UserMessage
from llama_stack.apis.memory import Memory from llama_stack.apis.memory import Memory
from llama_stack.apis.memory_banks import MemoryBanks from llama_stack.apis.memory_banks import MemoryBanks
from llama_stack.apis.safety import Safety from llama_stack.apis.safety import Safety
from llama_stack.apis.agents import * # noqa: F403
from llama_stack.providers.utils.kvstore import InmemoryKVStoreImpl, kvstore_impl from llama_stack.providers.utils.kvstore import InmemoryKVStoreImpl, kvstore_impl

View file

@ -10,9 +10,11 @@ import uuid
from datetime import datetime from datetime import datetime
from typing import List, Optional from typing import List, Optional
from llama_stack.apis.agents import * # noqa: F403
from pydantic import BaseModel from pydantic import BaseModel
from llama_stack.apis.agents import Turn
from llama_stack.providers.utils.kvstore import KVStore from llama_stack.providers.utils.kvstore import KVStore
log = logging.getLogger(__name__) log = logging.getLogger(__name__)

View file

@ -7,8 +7,6 @@
from typing import List from typing import List
from jinja2 import Template from jinja2 import Template
from llama_models.llama3.api import * # noqa: F403
from llama_stack.apis.agents import ( from llama_stack.apis.agents import (
DefaultMemoryQueryGeneratorConfig, DefaultMemoryQueryGeneratorConfig,
@ -16,7 +14,7 @@ from llama_stack.apis.agents import (
MemoryQueryGenerator, MemoryQueryGenerator,
MemoryQueryGeneratorConfig, MemoryQueryGeneratorConfig,
) )
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.inference import Message, UserMessage
from llama_stack.providers.utils.inference.prompt_adapter import ( from llama_stack.providers.utils.inference.prompt_adapter import (
interleaved_content_as_str, interleaved_content_as_str,
) )

View file

@ -9,7 +9,9 @@ import logging
from typing import List from typing import List
from llama_stack.apis.safety import * # noqa: F403 from llama_stack.apis.inference import Message
from llama_stack.apis.safety import Safety, SafetyViolation, ViolationLevel
log = logging.getLogger(__name__) log = logging.getLogger(__name__)

View file

@ -8,10 +8,26 @@ from typing import AsyncIterator, List, Optional, Union
import pytest import pytest
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.agents import (
from llama_stack.apis.memory import * # noqa: F403 AgentConfig,
from llama_stack.apis.safety import * # noqa: F403 AgentTurnCreateRequest,
from llama_stack.apis.agents import * # noqa: F403 AgentTurnResponseTurnCompletePayload,
)
from llama_stack.apis.inference import (
ChatCompletionResponse,
ChatCompletionResponseEvent,
ChatCompletionResponseStreamChunk,
CompletionMessage,
Message,
ResponseFormat,
SamplingParams,
ToolChoice,
ToolDefinition,
UserMessage,
)
from llama_stack.apis.memory import MemoryBank
from llama_stack.apis.safety import RunShieldResponse
from ..agents import ( from ..agents import (
AGENT_INSTANCES_BY_ID, AGENT_INSTANCES_BY_ID,

View file

@ -7,7 +7,7 @@
from typing import List from typing import List
from llama_stack.apis.inference import Message from llama_stack.apis.inference import Message
from llama_stack.apis.safety import * # noqa: F403 from llama_stack.apis.safety import Safety
from ..safety import ShieldRunnerMixin from ..safety import ShieldRunnerMixin
from .builtin import BaseTool from .builtin import BaseTool

View file

@ -3,7 +3,7 @@
# #
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from llama_stack.apis.datasetio import * # noqa: F401, F403 from pydantic import BaseModel
class LocalFSDatasetIOConfig(BaseModel): ... class LocalFSDatasetIOConfig(BaseModel): ...

View file

@ -3,18 +3,19 @@
# #
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import Any, Dict, List, Optional
import pandas
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_stack.apis.datasetio import * # noqa: F403
import base64 import base64
import os import os
from abc import ABC, abstractmethod from abc import ABC, abstractmethod
from dataclasses import dataclass from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse from urllib.parse import urlparse
import pandas
from llama_stack.apis.common.content_types import URL
from llama_stack.apis.datasetio import DatasetIO, PaginatedRowsResult
from llama_stack.apis.datasets import Dataset
from llama_stack.providers.datatypes import DatasetsProtocolPrivate from llama_stack.providers.datatypes import DatasetsProtocolPrivate
from llama_stack.providers.utils.datasetio.url_utils import get_dataframe_from_url from llama_stack.providers.utils.datasetio.url_utils import get_dataframe_from_url

View file

@ -8,6 +8,11 @@ from typing import Any, Dict, List, Optional
from tqdm import tqdm from tqdm import tqdm
from llama_stack.apis.agents import Agents from llama_stack.apis.agents import Agents
from llama_stack.apis.common.type_system import (
ChatCompletionInputType,
CompletionInputType,
StringType,
)
from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Datasets from llama_stack.apis.datasets import Datasets
from llama_stack.apis.eval_tasks import EvalTask from llama_stack.apis.eval_tasks import EvalTask

View file

@ -6,11 +6,10 @@
from typing import Any, Dict, Optional from typing import Any, Dict, Optional
from llama_models.datatypes import * # noqa: F403
from llama_stack.apis.inference import * # noqa: F401, F403
from pydantic import BaseModel, field_validator from pydantic import BaseModel, field_validator
from llama_stack.apis.inference import QuantizationConfig
from llama_stack.providers.utils.inference import supported_inference_models from llama_stack.providers.utils.inference import supported_inference_models

View file

@ -32,11 +32,16 @@ from llama_models.llama3.reference_impl.multimodal.model import (
CrossAttentionTransformer, CrossAttentionTransformer,
) )
from llama_models.sku_list import resolve_model from llama_models.sku_list import resolve_model
from pydantic import BaseModel
from llama_stack.apis.inference import * # noqa: F403
from lmformatenforcer import JsonSchemaParser, TokenEnforcer, TokenEnforcerTokenizerData from lmformatenforcer import JsonSchemaParser, TokenEnforcer, TokenEnforcerTokenizerData
from pydantic import BaseModel
from llama_stack.apis.inference import (
Fp8QuantizationConfig,
Int4QuantizationConfig,
ResponseFormat,
ResponseFormatType,
)
from llama_stack.distribution.utils.model_utils import model_local_dir from llama_stack.distribution.utils.model_utils import model_local_dir
from llama_stack.providers.utils.inference.prompt_adapter import ( from llama_stack.providers.utils.inference.prompt_adapter import (
@ -44,12 +49,7 @@ from llama_stack.providers.utils.inference.prompt_adapter import (
CompletionRequestWithRawContent, CompletionRequestWithRawContent,
) )
from .config import ( from .config import MetaReferenceInferenceConfig, MetaReferenceQuantizedInferenceConfig
Fp8QuantizationConfig,
Int4QuantizationConfig,
MetaReferenceInferenceConfig,
MetaReferenceQuantizedInferenceConfig,
)
log = logging.getLogger(__name__) log = logging.getLogger(__name__)

View file

@ -14,7 +14,10 @@ from llama_models.llama3.api.datatypes import Model
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
from llama_models.sku_list import resolve_model from llama_models.sku_list import resolve_model
from llama_stack.apis.inference import ChatCompletionRequest, CompletionRequest from llama_stack.providers.utils.inference.prompt_adapter import (
ChatCompletionRequestWithRawContent,
CompletionRequestWithRawContent,
)
from .config import MetaReferenceInferenceConfig from .config import MetaReferenceInferenceConfig
from .generation import Llama, model_checkpoint_dir from .generation import Llama, model_checkpoint_dir
@ -27,9 +30,9 @@ class ModelRunner:
# the `task` object is the same that is sent to `ModelParallelProcessGroup.run_inference()` # the `task` object is the same that is sent to `ModelParallelProcessGroup.run_inference()`
def __call__(self, req: Any): def __call__(self, req: Any):
if isinstance(req, ChatCompletionRequest): if isinstance(req, ChatCompletionRequestWithRawContent):
return self.llama.chat_completion(req) return self.llama.chat_completion(req)
elif isinstance(req, CompletionRequest): elif isinstance(req, CompletionRequestWithRawContent):
return self.llama.completion(req) return self.llama.completion(req)
else: else:
raise ValueError(f"Unexpected task type {type(req)}") raise ValueError(f"Unexpected task type {type(req)}")
@ -100,7 +103,7 @@ class LlamaModelParallelGenerator:
def completion( def completion(
self, self,
request: CompletionRequest, request: CompletionRequestWithRawContent,
) -> Generator: ) -> Generator:
req_obj = deepcopy(request) req_obj = deepcopy(request)
gen = self.group.run_inference(req_obj) gen = self.group.run_inference(req_obj)
@ -108,7 +111,7 @@ class LlamaModelParallelGenerator:
def chat_completion( def chat_completion(
self, self,
request: ChatCompletionRequest, request: ChatCompletionRequestWithRawContent,
) -> Generator: ) -> Generator:
req_obj = deepcopy(request) req_obj = deepcopy(request)
gen = self.group.run_inference(req_obj) gen = self.group.run_inference(req_obj)

View file

@ -34,7 +34,10 @@ from pydantic import BaseModel, Field
from torch.distributed.launcher.api import elastic_launch, LaunchConfig from torch.distributed.launcher.api import elastic_launch, LaunchConfig
from typing_extensions import Annotated from typing_extensions import Annotated
from llama_stack.apis.inference import ChatCompletionRequest, CompletionRequest from llama_stack.providers.utils.inference.prompt_adapter import (
ChatCompletionRequestWithRawContent,
CompletionRequestWithRawContent,
)
from .generation import TokenResult from .generation import TokenResult
@ -79,7 +82,7 @@ class TaskRequest(BaseModel):
type: Literal[ProcessingMessageName.task_request] = ( type: Literal[ProcessingMessageName.task_request] = (
ProcessingMessageName.task_request ProcessingMessageName.task_request
) )
task: Union[CompletionRequest, ChatCompletionRequest] task: Union[CompletionRequestWithRawContent, ChatCompletionRequestWithRawContent]
class TaskResponse(BaseModel): class TaskResponse(BaseModel):
@ -264,9 +267,6 @@ def launch_dist_group(
init_model_cb: Callable, init_model_cb: Callable,
**kwargs, **kwargs,
) -> None: ) -> None:
id = uuid.uuid4().hex
dist_url = f"file:///tmp/llama3_{id}_{time.time()}"
with tempfile.TemporaryDirectory() as tmpdir: with tempfile.TemporaryDirectory() as tmpdir:
# TODO: track workers and if they terminate, tell parent process about it so cleanup can happen # TODO: track workers and if they terminate, tell parent process about it so cleanup can happen
launch_config = LaunchConfig( launch_config = LaunchConfig(
@ -315,7 +315,7 @@ def start_model_parallel_process(
# wait until the model is loaded; rank 0 will send a message to indicate it's ready # wait until the model is loaded; rank 0 will send a message to indicate it's ready
request_socket.send(encode_msg(ReadyRequest())) request_socket.send(encode_msg(ReadyRequest()))
response = request_socket.recv() _response = request_socket.recv()
log.info("Loaded model...") log.info("Loaded model...")
return request_socket, process return request_socket, process
@ -349,7 +349,10 @@ class ModelParallelProcessGroup:
self.started = False self.started = False
def run_inference( def run_inference(
self, req: Union[CompletionRequest, ChatCompletionRequest] self,
req: Union[
CompletionRequestWithRawContent, ChatCompletionRequestWithRawContent
],
) -> Generator: ) -> Generator:
assert not self.running, "inference already running" assert not self.running, "inference already running"

View file

@ -7,10 +7,10 @@
import logging import logging
import os import os
import uuid import uuid
from typing import AsyncGenerator, Optional from typing import AsyncGenerator, List, Optional
from llama_models.llama3.api.chat_format import ChatFormat from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
from llama_models.sku_list import resolve_model from llama_models.sku_list import resolve_model
@ -18,9 +18,26 @@ from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.sampling_params import SamplingParams as VLLMSamplingParams from vllm.sampling_params import SamplingParams as VLLMSamplingParams
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
ChatCompletionResponseStreamChunk,
CompletionResponse,
CompletionResponseStreamChunk,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.apis.models import Model
from llama_stack.providers.datatypes import Model, ModelsProtocolPrivate from llama_stack.providers.datatypes import ModelsProtocolPrivate
from llama_stack.providers.utils.inference.openai_compat import ( from llama_stack.providers.utils.inference.openai_compat import (
OpenAICompatCompletionChoice, OpenAICompatCompletionChoice,
OpenAICompatCompletionResponse, OpenAICompatCompletionResponse,

View file

@ -16,11 +16,14 @@ import faiss
import numpy as np import numpy as np
from numpy.typing import NDArray from numpy.typing import NDArray
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_stack.apis.memory import * # noqa: F403
from llama_stack.apis.inference import InterleavedContent from llama_stack.apis.inference import InterleavedContent
from llama_stack.apis.memory_banks import MemoryBankType, VectorMemoryBank from llama_stack.apis.memory import (
Chunk,
Memory,
MemoryBankDocument,
QueryDocumentsResponse,
)
from llama_stack.apis.memory_banks import MemoryBank, MemoryBankType, VectorMemoryBank
from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate
from llama_stack.providers.utils.kvstore import kvstore_impl from llama_stack.providers.utils.kvstore import kvstore_impl
from llama_stack.providers.utils.memory.vector_store import ( from llama_stack.providers.utils.memory.vector_store import (

View file

@ -14,11 +14,10 @@ from enum import Enum
from typing import Any, Callable, Dict, List from typing import Any, Callable, Dict, List
import torch import torch
from llama_stack.apis.datasets import Datasets
from llama_stack.apis.common.type_system import * # noqa
from llama_models.datatypes import Model from llama_models.datatypes import Model
from llama_models.sku_list import resolve_model from llama_models.sku_list import resolve_model
from llama_stack.apis.common.type_system import ParamType from llama_stack.apis.common.type_system import ParamType, StringType
from llama_stack.apis.datasets import Datasets
from torchtune.models.llama3 import llama3_tokenizer, lora_llama3_8b from torchtune.models.llama3 import llama3_tokenizer, lora_llama3_8b
from torchtune.models.llama3._tokenizer import Llama3Tokenizer from torchtune.models.llama3._tokenizer import Llama3Tokenizer

View file

@ -3,11 +3,26 @@
# #
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from datetime import datetime
from typing import Any, Dict, List, Optional
from llama_models.schema_utils import webmethod
from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Datasets
from llama_stack.apis.post_training import (
AlgorithmConfig,
DPOAlignmentConfig,
JobStatus,
LoraFinetuningConfig,
PostTrainingJob,
PostTrainingJobArtifactsResponse,
PostTrainingJobStatusResponse,
TrainingConfig,
)
from llama_stack.providers.inline.post_training.torchtune.config import ( from llama_stack.providers.inline.post_training.torchtune.config import (
TorchtunePostTrainingConfig, TorchtunePostTrainingConfig,
) )
from llama_stack.apis.post_training import * # noqa
from llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device import ( from llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device import (
LoraFinetuningSingleDevice, LoraFinetuningSingleDevice,
) )

View file

@ -14,27 +14,33 @@ from typing import Any, Dict, List, Optional, Tuple
import torch import torch
from llama_models.sku_list import resolve_model from llama_models.sku_list import resolve_model
from llama_stack.apis.common.training_types import PostTrainingMetric
from llama_stack.apis.datasetio import DatasetIO from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Datasets
from llama_stack.apis.post_training import (
AlgorithmConfig,
Checkpoint,
LoraFinetuningConfig,
OptimizerConfig,
TrainingConfig,
)
from llama_stack.distribution.utils.config_dirs import DEFAULT_CHECKPOINT_DIR from llama_stack.distribution.utils.config_dirs import DEFAULT_CHECKPOINT_DIR
from llama_stack.providers.inline.post_training.torchtune.common.checkpointer import (
TorchtuneCheckpointer,
)
from torch import nn
from torchtune import utils as torchtune_utils
from torchtune.training.metric_logging import DiskLogger
from tqdm import tqdm
from llama_stack.apis.post_training import * # noqa
from llama_stack.distribution.utils.model_utils import model_local_dir from llama_stack.distribution.utils.model_utils import model_local_dir
from llama_stack.providers.inline.post_training.torchtune.common import utils from llama_stack.providers.inline.post_training.torchtune.common import utils
from llama_stack.providers.inline.post_training.torchtune.common.checkpointer import (
TorchtuneCheckpointer,
)
from llama_stack.providers.inline.post_training.torchtune.config import ( from llama_stack.providers.inline.post_training.torchtune.config import (
TorchtunePostTrainingConfig, TorchtunePostTrainingConfig,
) )
from llama_stack.providers.inline.post_training.torchtune.datasets.sft import SFTDataset from llama_stack.providers.inline.post_training.torchtune.datasets.sft import SFTDataset
from torch import nn
from torch.optim import Optimizer from torch.optim import Optimizer
from torch.utils.data import DataLoader, DistributedSampler from torch.utils.data import DataLoader, DistributedSampler
from torchtune import modules, training from torchtune import modules, training, utils as torchtune_utils
from torchtune.data import AlpacaToMessages, padded_collate_sft from torchtune.data import AlpacaToMessages, padded_collate_sft
from torchtune.modules.loss import CEWithChunkedOutputLoss from torchtune.modules.loss import CEWithChunkedOutputLoss
@ -43,11 +49,12 @@ from torchtune.modules.peft import (
get_adapter_state_dict, get_adapter_state_dict,
get_lora_module_names, get_lora_module_names,
get_merged_lora_ckpt, get_merged_lora_ckpt,
load_dora_magnitudes,
set_trainable_params, set_trainable_params,
validate_missing_and_unexpected_for_lora, validate_missing_and_unexpected_for_lora,
) )
from torchtune.training.lr_schedulers import get_cosine_schedule_with_warmup from torchtune.training.lr_schedulers import get_cosine_schedule_with_warmup
from torchtune.training.metric_logging import DiskLogger
from tqdm import tqdm
log = logging.getLogger(__name__) log = logging.getLogger(__name__)
@ -110,6 +117,10 @@ class LoraFinetuningSingleDevice:
self.checkpoint_dir = config.checkpoint_dir self.checkpoint_dir = config.checkpoint_dir
else: else:
model = resolve_model(self.model_id) model = resolve_model(self.model_id)
if model is None:
raise ValueError(
f"{self.model_id} not found. Your model id should be in the llama models SKU list"
)
self.checkpoint_dir = model_checkpoint_dir(model) self.checkpoint_dir = model_checkpoint_dir(model)
self._output_dir = str(DEFAULT_CHECKPOINT_DIR) self._output_dir = str(DEFAULT_CHECKPOINT_DIR)
@ -277,7 +288,6 @@ class LoraFinetuningSingleDevice:
for m in model.modules(): for m in model.modules():
if hasattr(m, "initialize_dora_magnitude"): if hasattr(m, "initialize_dora_magnitude"):
m.initialize_dora_magnitude() m.initialize_dora_magnitude()
load_dora_magnitudes(model)
if lora_weights_state_dict: if lora_weights_state_dict:
lora_missing, lora_unexpected = model.load_state_dict( lora_missing, lora_unexpected = model.load_state_dict(
lora_weights_state_dict, strict=False lora_weights_state_dict, strict=False

View file

@ -7,8 +7,14 @@
import logging import logging
from typing import Any, Dict, List from typing import Any, Dict, List
from llama_stack.apis.safety import * # noqa: F403
from llama_stack.apis.inference import Message from llama_stack.apis.inference import Message
from llama_stack.apis.safety import (
RunShieldResponse,
Safety,
SafetyViolation,
ViolationLevel,
)
from llama_stack.apis.shields import Shield
from llama_stack.providers.utils.inference.prompt_adapter import ( from llama_stack.providers.utils.inference.prompt_adapter import (
interleaved_content_as_str, interleaved_content_as_str,
) )

View file

@ -9,10 +9,24 @@ import re
from string import Template from string import Template
from typing import Any, Dict, List, Optional from typing import Any, Dict, List, Optional
from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_models.datatypes import CoreModelId
from llama_stack.apis.inference import * # noqa: F403 from llama_models.llama3.api.datatypes import Role
from llama_stack.apis.safety import * # noqa: F403
from llama_stack.apis.common.content_types import ImageContentItem, TextContentItem from llama_stack.apis.common.content_types import ImageContentItem, TextContentItem
from llama_stack.apis.inference import (
ChatCompletionResponseEventType,
Inference,
Message,
UserMessage,
)
from llama_stack.apis.safety import (
RunShieldResponse,
Safety,
SafetyViolation,
ViolationLevel,
)
from llama_stack.apis.shields import Shield
from llama_stack.distribution.datatypes import Api from llama_stack.distribution.datatypes import Api
from llama_stack.providers.datatypes import ShieldsProtocolPrivate from llama_stack.providers.datatypes import ShieldsProtocolPrivate

View file

@ -11,11 +11,16 @@ import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer from transformers import AutoModelForSequenceClassification, AutoTokenizer
from llama_stack.distribution.utils.model_utils import model_local_dir from llama_stack.apis.inference import Message
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.safety import (
from llama_stack.apis.safety import * # noqa: F403 RunShieldResponse,
from llama_models.llama3.api.datatypes import * # noqa: F403 Safety,
SafetyViolation,
ViolationLevel,
)
from llama_stack.apis.shields import Shield
from llama_stack.distribution.utils.model_utils import model_local_dir
from llama_stack.providers.datatypes import ShieldsProtocolPrivate from llama_stack.providers.datatypes import ShieldsProtocolPrivate
from llama_stack.providers.utils.inference.prompt_adapter import ( from llama_stack.providers.utils.inference.prompt_adapter import (
interleaved_content_as_str, interleaved_content_as_str,

View file

@ -3,14 +3,17 @@
# #
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import List from typing import Any, Dict, List, Optional
from llama_models.llama3.api.datatypes import * # noqa: F403 from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.scoring import * # noqa: F403 from llama_stack.apis.datasets import Datasets
from llama_stack.apis.scoring_functions import * # noqa: F403 from llama_stack.apis.scoring import (
from llama_stack.apis.common.type_system import * # noqa: F403 ScoreBatchResponse,
from llama_stack.apis.datasetio import * # noqa: F403 ScoreResponse,
from llama_stack.apis.datasets import * # noqa: F403 Scoring,
ScoringResult,
)
from llama_stack.apis.scoring_functions import ScoringFn, ScoringFnParams
from llama_stack.providers.datatypes import ScoringFunctionsProtocolPrivate from llama_stack.providers.datatypes import ScoringFunctionsProtocolPrivate
from llama_stack.providers.utils.common.data_schema_validator_mixin import ( from llama_stack.providers.utils.common.data_schema_validator_mixin import (
DataSchemaValidatorMixin, DataSchemaValidatorMixin,

View file

@ -3,16 +3,8 @@
# #
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import List
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_stack.apis.scoring import * # noqa: F403
from llama_stack.apis.scoring_functions import * # noqa: F403
from llama_stack.apis.common.type_system import * # noqa: F403
from llama_stack.apis.datasetio import * # noqa: F403
from llama_stack.apis.datasets import * # noqa: F403
import os import os
from typing import Any, Dict, List, Optional
from autoevals.llm import Factuality from autoevals.llm import Factuality
from autoevals.ragas import ( from autoevals.ragas import (
@ -27,6 +19,17 @@ from autoevals.ragas import (
) )
from pydantic import BaseModel from pydantic import BaseModel
from llama_stack.apis.datasetio import DatasetIO
from llama_stack.apis.datasets import Datasets
from llama_stack.apis.scoring import (
ScoreBatchResponse,
ScoreResponse,
Scoring,
ScoringResult,
ScoringResultRow,
)
from llama_stack.apis.scoring_functions import AggregationFunctionType, ScoringFn
from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.distribution.request_headers import NeedsRequestProviderData
from llama_stack.providers.datatypes import ScoringFunctionsProtocolPrivate from llama_stack.providers.datatypes import ScoringFunctionsProtocolPrivate
from llama_stack.providers.utils.common.data_schema_validator_mixin import ( from llama_stack.providers.utils.common.data_schema_validator_mixin import (

View file

@ -3,7 +3,9 @@
# #
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from llama_stack.apis.scoring import * # noqa: F401, F403 from typing import Any, Dict, Optional
from pydantic import BaseModel, Field
class BraintrustScoringConfig(BaseModel): class BraintrustScoringConfig(BaseModel):

View file

@ -17,6 +17,22 @@ from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.semconv.resource import ResourceAttributes from opentelemetry.semconv.resource import ResourceAttributes
from llama_stack.apis.telemetry import (
Event,
MetricEvent,
QueryCondition,
SpanEndPayload,
SpanStartPayload,
SpanStatus,
SpanWithStatus,
StructuredLogEvent,
Telemetry,
Trace,
UnstructuredLogEvent,
)
from llama_stack.distribution.datatypes import Api
from llama_stack.providers.inline.telemetry.meta_reference.console_span_processor import ( from llama_stack.providers.inline.telemetry.meta_reference.console_span_processor import (
ConsoleSpanProcessor, ConsoleSpanProcessor,
) )
@ -27,10 +43,6 @@ from llama_stack.providers.inline.telemetry.meta_reference.sqlite_span_processor
from llama_stack.providers.utils.telemetry.dataset_mixin import TelemetryDatasetMixin from llama_stack.providers.utils.telemetry.dataset_mixin import TelemetryDatasetMixin
from llama_stack.providers.utils.telemetry.sqlite_trace_store import SQLiteTraceStore from llama_stack.providers.utils.telemetry.sqlite_trace_store import SQLiteTraceStore
from llama_stack.apis.telemetry import * # noqa: F403
from llama_stack.distribution.datatypes import Api
from .config import TelemetryConfig, TelemetrySink from .config import TelemetryConfig, TelemetrySink
_GLOBAL_STORAGE = { _GLOBAL_STORAGE = {

View file

@ -4,12 +4,10 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from llama_stack.apis.telemetry import Telemetry
from .config import SampleConfig from .config import SampleConfig
from llama_stack.apis.telemetry import * # noqa: F403
class SampleTelemetryImpl(Telemetry): class SampleTelemetryImpl(Telemetry):
def __init__(self, config: SampleConfig): def __init__(self, config: SampleConfig):
self.config = config self.config = config

View file

@ -0,0 +1,20 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from pydantic import BaseModel
from .brave_search import BraveSearchToolRuntimeImpl
from .config import BraveSearchToolConfig
class BraveSearchToolProviderDataValidator(BaseModel):
api_key: str
async def get_provider_impl(config: BraveSearchToolConfig, _deps):
impl = BraveSearchToolRuntimeImpl(config)
await impl.initialize()
return impl

View file

@ -0,0 +1,123 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from typing import Any, Dict, List
import requests
from llama_stack.apis.tools import Tool, ToolGroupDef, ToolInvocationResult, ToolRuntime
from llama_stack.distribution.request_headers import NeedsRequestProviderData
from llama_stack.providers.datatypes import ToolsProtocolPrivate
from .config import BraveSearchToolConfig
class BraveSearchToolRuntimeImpl(
ToolsProtocolPrivate, ToolRuntime, NeedsRequestProviderData
):
def __init__(self, config: BraveSearchToolConfig):
self.config = config
async def initialize(self):
pass
async def register_tool(self, tool: Tool):
if tool.identifier != "brave_search":
raise ValueError(f"Tool identifier {tool.identifier} is not supported")
async def unregister_tool(self, tool_id: str) -> None:
return
def _get_api_key(self) -> str:
if self.config.api_key:
return self.config.api_key
provider_data = self.get_request_provider_data()
if provider_data is None or not provider_data.api_key:
raise ValueError(
'Pass Search provider\'s API Key in the header X-LlamaStack-ProviderData as { "api_key": <your api key>}'
)
return provider_data.api_key
async def discover_tools(self, tool_group: ToolGroupDef) -> List[Tool]:
raise NotImplementedError("Brave search tool group not supported")
async def invoke_tool(
self, tool_name: str, args: Dict[str, Any]
) -> ToolInvocationResult:
api_key = self._get_api_key()
url = "https://api.search.brave.com/res/v1/web/search"
headers = {
"X-Subscription-Token": api_key,
"Accept-Encoding": "gzip",
"Accept": "application/json",
}
payload = {"q": args["query"]}
response = requests.get(url=url, params=payload, headers=headers)
response.raise_for_status()
results = self._clean_brave_response(response.json())
content_items = "\n".join([str(result) for result in results])
return ToolInvocationResult(
content=content_items,
)
def _clean_brave_response(self, search_response):
clean_response = []
if "mixed" in search_response:
mixed_results = search_response["mixed"]
for m in mixed_results["main"][: self.config.max_results]:
r_type = m["type"]
results = search_response[r_type]["results"]
cleaned = self._clean_result_by_type(r_type, results, m.get("index"))
clean_response.append(cleaned)
return clean_response
def _clean_result_by_type(self, r_type, results, idx=None):
type_cleaners = {
"web": (
["type", "title", "url", "description", "date", "extra_snippets"],
lambda x: x[idx],
),
"faq": (["type", "question", "answer", "title", "url"], lambda x: x),
"infobox": (
["type", "title", "url", "description", "long_desc"],
lambda x: x[idx],
),
"videos": (["type", "url", "title", "description", "date"], lambda x: x),
"locations": (
[
"type",
"title",
"url",
"description",
"coordinates",
"postal_address",
"contact",
"rating",
"distance",
"zoom_level",
],
lambda x: x,
),
"news": (["type", "title", "url", "description"], lambda x: x),
}
if r_type not in type_cleaners:
return ""
selected_keys, result_selector = type_cleaners[r_type]
results = result_selector(results)
if isinstance(results, list):
cleaned = [
{k: v for k, v in item.items() if k in selected_keys}
for item in results
]
else:
cleaned = {k: v for k, v in results.items() if k in selected_keys}
return str(cleaned)

View file

@ -0,0 +1,20 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from typing import Optional
from pydantic import BaseModel, Field
class BraveSearchToolConfig(BaseModel):
api_key: Optional[str] = Field(
default=None,
description="The Brave Search API Key",
)
max_results: int = Field(
default=3,
description="The maximum number of results to return",
)

View file

@ -6,7 +6,13 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
InlineProviderSpec,
ProviderSpec,
remote_provider_spec,
)
from llama_stack.providers.utils.kvstore import kvstore_dependencies from llama_stack.providers.utils.kvstore import kvstore_dependencies

View file

@ -6,7 +6,13 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
InlineProviderSpec,
ProviderSpec,
remote_provider_spec,
)
def available_providers() -> List[ProviderSpec]: def available_providers() -> List[ProviderSpec]:

View file

@ -6,7 +6,7 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
def available_providers() -> List[ProviderSpec]: def available_providers() -> List[ProviderSpec]:

View file

@ -6,8 +6,13 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
InlineProviderSpec,
ProviderSpec,
remote_provider_spec,
)
META_REFERENCE_DEPS = [ META_REFERENCE_DEPS = [
"accelerate", "accelerate",

View file

@ -6,8 +6,13 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
InlineProviderSpec,
ProviderSpec,
remote_provider_spec,
)
EMBEDDING_DEPS = [ EMBEDDING_DEPS = [
"blobfile", "blobfile",

View file

@ -6,7 +6,7 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
def available_providers() -> List[ProviderSpec]: def available_providers() -> List[ProviderSpec]:

View file

@ -6,7 +6,7 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import ( from llama_stack.providers.datatypes import (
AdapterSpec, AdapterSpec,
Api, Api,
InlineProviderSpec, InlineProviderSpec,

View file

@ -6,7 +6,7 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import Api, InlineProviderSpec, ProviderSpec
def available_providers() -> List[ProviderSpec]: def available_providers() -> List[ProviderSpec]:

View file

@ -6,7 +6,13 @@
from typing import List from typing import List
from llama_stack.distribution.datatypes import * # noqa: F403 from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
InlineProviderSpec,
ProviderSpec,
remote_provider_spec,
)
def available_providers() -> List[ProviderSpec]: def available_providers() -> List[ProviderSpec]:

View file

@ -0,0 +1,37 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from typing import List
from llama_stack.providers.datatypes import (
AdapterSpec,
Api,
InlineProviderSpec,
ProviderSpec,
remote_provider_spec,
)
def available_providers() -> List[ProviderSpec]:
return [
InlineProviderSpec(
api=Api.tool_runtime,
provider_type="inline::brave-search",
pip_packages=[],
module="llama_stack.providers.inline.tool_runtime.brave_search",
config_class="llama_stack.providers.inline.tool_runtime.brave_search.config.BraveSearchToolConfig",
provider_data_validator="llama_stack.providers.inline.tool_runtime.brave_search.BraveSearchToolProviderDataValidator",
),
remote_provider_spec(
api=Api.tool_runtime,
adapter=AdapterSpec(
adapter_type="model-context-protocol",
module="llama_stack.providers.remote.tool_runtime.model_context_protocol",
config_class="llama_stack.providers.remote.tool_runtime.model_context_protocol.config.ModelContextProtocolConfig",
pip_packages=["mcp"],
),
),
]

View file

@ -4,12 +4,10 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from llama_stack.apis.agents import Agents
from .config import SampleConfig from .config import SampleConfig
from llama_stack.apis.agents import * # noqa: F403
class SampleAgentsImpl(Agents): class SampleAgentsImpl(Agents):
def __init__(self, config: SampleConfig): def __init__(self, config: SampleConfig):
self.config = config self.config = config

View file

@ -5,11 +5,11 @@
# the root directory of this source tree. # the root directory of this source tree.
from typing import Any, Dict, List, Optional from typing import Any, Dict, List, Optional
from llama_stack.apis.datasetio import * # noqa: F403
import datasets as hf_datasets import datasets as hf_datasets
from llama_stack.apis.datasetio import DatasetIO, PaginatedRowsResult
from llama_stack.apis.datasets import Dataset
from llama_stack.providers.datatypes import DatasetsProtocolPrivate from llama_stack.providers.datatypes import DatasetsProtocolPrivate
from llama_stack.providers.utils.datasetio.url_utils import get_dataframe_from_url from llama_stack.providers.utils.datasetio.url_utils import get_dataframe_from_url
from llama_stack.providers.utils.kvstore import kvstore_impl from llama_stack.providers.utils.kvstore import kvstore_impl

View file

@ -4,8 +4,8 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import * # noqa: F403
import json import json
from typing import AsyncGenerator, AsyncIterator, Dict, List, Optional, Union
from botocore.client import BaseClient from botocore.client import BaseClient
from llama_models.datatypes import CoreModelId from llama_models.datatypes import CoreModelId
@ -13,6 +13,24 @@ from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
ChatCompletionResponseStreamChunk,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig
from llama_stack.providers.utils.bedrock.client import create_bedrock_client
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (
build_model_alias, build_model_alias,
ModelRegistryHelper, ModelRegistryHelper,
@ -29,11 +47,6 @@ from llama_stack.providers.utils.inference.prompt_adapter import (
interleaved_content_as_str, interleaved_content_as_str,
) )
from llama_stack.apis.inference import * # noqa: F403
from llama_stack.providers.remote.inference.bedrock.config import BedrockConfig
from llama_stack.providers.utils.bedrock.client import create_bedrock_client
MODEL_ALIASES = [ MODEL_ALIASES = [
build_model_alias( build_model_alias(

View file

@ -4,17 +4,31 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import AsyncGenerator from typing import AsyncGenerator, List, Optional, Union
from cerebras.cloud.sdk import AsyncCerebras from cerebras.cloud.sdk import AsyncCerebras
from llama_models.datatypes import CoreModelId
from llama_models.llama3.api.chat_format import ChatFormat from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.inference import (
from llama_models.datatypes import CoreModelId ChatCompletionRequest,
CompletionRequest,
CompletionResponse,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (
build_model_alias, build_model_alias,

View file

@ -4,7 +4,7 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import AsyncGenerator from typing import AsyncGenerator, List, Optional
from llama_models.datatypes import CoreModelId from llama_models.datatypes import CoreModelId
@ -14,7 +14,20 @@ from llama_models.llama3.api.tokenizer import Tokenizer
from openai import OpenAI from openai import OpenAI
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (
build_model_alias, build_model_alias,

View file

@ -22,7 +22,7 @@ class FireworksImplConfig(BaseModel):
) )
@classmethod @classmethod
def sample_run_config(cls) -> Dict[str, Any]: def sample_run_config(cls, __distro_dir__: str) -> Dict[str, Any]:
return { return {
"url": "https://api.fireworks.ai/inference/v1", "url": "https://api.fireworks.ai/inference/v1",
"api_key": "${env.FIREWORKS_API_KEY}", "api_key": "${env.FIREWORKS_API_KEY}",

View file

@ -11,7 +11,24 @@ from llama_models.datatypes import CoreModelId
from llama_models.llama3.api.chat_format import ChatFormat from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
from llama_stack.apis.inference import * # noqa: F403
from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
CompletionRequest,
CompletionResponse,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
ResponseFormatType,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.distribution.request_headers import NeedsRequestProviderData
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (
build_model_alias, build_model_alias,
@ -65,6 +82,10 @@ MODEL_ALIASES = [
"fireworks/llama-v3p2-90b-vision-instruct", "fireworks/llama-v3p2-90b-vision-instruct",
CoreModelId.llama3_2_90b_vision_instruct.value, CoreModelId.llama3_2_90b_vision_instruct.value,
), ),
build_model_alias(
"fireworks/llama-v3p3-70b-instruct",
CoreModelId.llama3_3_70b_instruct.value,
),
build_model_alias( build_model_alias(
"fireworks/llama-guard-3-8b", "fireworks/llama-guard-3-8b",
CoreModelId.llama_guard_3_8b.value, CoreModelId.llama_guard_3_8b.value,

View file

@ -10,9 +10,7 @@ from typing import Any, AsyncGenerator, Dict, Generator, List, Optional
from llama_models.llama3.api.datatypes import ( from llama_models.llama3.api.datatypes import (
BuiltinTool, BuiltinTool,
CompletionMessage,
StopReason, StopReason,
TokenLogProbs,
ToolCall, ToolCall,
ToolDefinition, ToolDefinition,
) )
@ -42,12 +40,14 @@ from llama_stack.apis.inference import (
ChatCompletionResponseEvent, ChatCompletionResponseEvent,
ChatCompletionResponseEventType, ChatCompletionResponseEventType,
ChatCompletionResponseStreamChunk, ChatCompletionResponseStreamChunk,
CompletionMessage,
CompletionRequest, CompletionRequest,
CompletionResponse, CompletionResponse,
CompletionResponseStreamChunk, CompletionResponseStreamChunk,
JsonSchemaResponseFormat, JsonSchemaResponseFormat,
Message, Message,
SystemMessage, SystemMessage,
TokenLogProbs,
ToolCallDelta, ToolCallDelta,
ToolCallParseStatus, ToolCallParseStatus,
ToolResponseMessage, ToolResponseMessage,

View file

@ -5,7 +5,7 @@
# the root directory of this source tree. # the root directory of this source tree.
import logging import logging
from typing import AsyncGenerator from typing import AsyncGenerator, List, Optional, Union
import httpx import httpx
from llama_models.datatypes import CoreModelId from llama_models.datatypes import CoreModelId
@ -14,15 +14,33 @@ from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
from ollama import AsyncClient from ollama import AsyncClient
from llama_stack.apis.common.content_types import (
ImageContentItem,
InterleavedContent,
TextContentItem,
)
from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
CompletionRequest,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.apis.models import Model, ModelType
from llama_stack.providers.datatypes import ModelsProtocolPrivate
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (
build_model_alias, build_model_alias,
build_model_alias_with_just_provider_model_id, build_model_alias_with_just_provider_model_id,
ModelRegistryHelper, ModelRegistryHelper,
) )
from llama_stack.apis.inference import * # noqa: F403
from llama_stack.apis.common.content_types import ImageContentItem, TextContentItem
from llama_stack.providers.datatypes import ModelsProtocolPrivate
from llama_stack.providers.utils.inference.openai_compat import ( from llama_stack.providers.utils.inference.openai_compat import (
get_sampling_options, get_sampling_options,
OpenAICompatCompletionChoice, OpenAICompatCompletionChoice,
@ -100,6 +118,10 @@ model_aliases = [
"llama3.2-vision:90b", "llama3.2-vision:90b",
CoreModelId.llama3_2_90b_vision_instruct.value, CoreModelId.llama3_2_90b_vision_instruct.value,
), ),
build_model_alias(
"llama3.3:70b",
CoreModelId.llama3_3_70b_instruct.value,
),
# The Llama Guard models don't have their full fp16 versions # The Llama Guard models don't have their full fp16 versions
# so we are going to alias their default version to the canonical SKU # so we are going to alias their default version to the canonical SKU
build_model_alias( build_model_alias(

View file

@ -4,12 +4,11 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from llama_stack.apis.inference import Inference
from llama_stack.apis.models import Model
from .config import SampleConfig from .config import SampleConfig
from llama_stack.apis.inference import * # noqa: F403
class SampleInferenceImpl(Inference): class SampleInferenceImpl(Inference):
def __init__(self, config: SampleConfig): def __init__(self, config: SampleConfig):
self.config = config self.config = config

View file

@ -13,10 +13,25 @@ from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
from llama_models.sku_list import all_registered_models from llama_models.sku_list import all_registered_models
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.models import * # noqa: F403 from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
CompletionRequest,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
ResponseFormatType,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.apis.models import Model
from llama_stack.providers.datatypes import Model, ModelsProtocolPrivate from llama_stack.providers.datatypes import ModelsProtocolPrivate
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (
build_model_alias, build_model_alias,
ModelRegistryHelper, ModelRegistryHelper,

View file

@ -4,7 +4,7 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from typing import AsyncGenerator from typing import AsyncGenerator, List, Optional, Union
from llama_models.datatypes import CoreModelId from llama_models.datatypes import CoreModelId
@ -14,7 +14,22 @@ from llama_models.llama3.api.tokenizer import Tokenizer
from together import Together from together import Together
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
CompletionRequest,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
ResponseFormatType,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.distribution.request_headers import NeedsRequestProviderData
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (
build_model_alias, build_model_alias,

View file

@ -5,7 +5,7 @@
# the root directory of this source tree. # the root directory of this source tree.
import logging import logging
from typing import AsyncGenerator from typing import AsyncGenerator, List, Optional, Union
from llama_models.llama3.api.chat_format import ChatFormat from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer from llama_models.llama3.api.tokenizer import Tokenizer
@ -13,7 +13,25 @@ from llama_models.sku_list import all_registered_models
from openai import OpenAI from openai import OpenAI
from llama_stack.apis.inference import * # noqa: F403 from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.inference import (
ChatCompletionRequest,
ChatCompletionResponse,
CompletionRequest,
CompletionResponse,
CompletionResponseStreamChunk,
EmbeddingsResponse,
Inference,
LogProbConfig,
Message,
ResponseFormat,
ResponseFormatType,
SamplingParams,
ToolChoice,
ToolDefinition,
ToolPromptFormat,
)
from llama_stack.apis.models import Model, ModelType
from llama_stack.providers.datatypes import ModelsProtocolPrivate from llama_stack.providers.datatypes import ModelsProtocolPrivate
from llama_stack.providers.utils.inference.model_registry import ( from llama_stack.providers.utils.inference.model_registry import (

View file

@ -12,8 +12,14 @@ from urllib.parse import urlparse
import chromadb import chromadb
from numpy.typing import NDArray from numpy.typing import NDArray
from llama_stack.apis.memory import * # noqa: F403 from llama_stack.apis.inference import InterleavedContent
from llama_stack.apis.memory_banks import MemoryBankType from llama_stack.apis.memory import (
Chunk,
Memory,
MemoryBankDocument,
QueryDocumentsResponse,
)
from llama_stack.apis.memory_banks import MemoryBank, MemoryBankType
from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate
from llama_stack.providers.inline.memory.chroma import ChromaInlineImplConfig from llama_stack.providers.inline.memory.chroma import ChromaInlineImplConfig
from llama_stack.providers.utils.memory.vector_store import ( from llama_stack.providers.utils.memory.vector_store import (

View file

@ -5,7 +5,7 @@
# the root directory of this source tree. # the root directory of this source tree.
import logging import logging
from typing import List, Tuple from typing import Any, Dict, List, Optional, Tuple
import psycopg2 import psycopg2
from numpy.typing import NDArray from numpy.typing import NDArray
@ -14,8 +14,14 @@ from psycopg2.extras import execute_values, Json
from pydantic import BaseModel, parse_obj_as from pydantic import BaseModel, parse_obj_as
from llama_stack.apis.memory import * # noqa: F403 from llama_stack.apis.inference import InterleavedContent
from llama_stack.apis.memory_banks import MemoryBankType, VectorMemoryBank from llama_stack.apis.memory import (
Chunk,
Memory,
MemoryBankDocument,
QueryDocumentsResponse,
)
from llama_stack.apis.memory_banks import MemoryBank, MemoryBankType, VectorMemoryBank
from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate
from llama_stack.providers.utils.memory.vector_store import ( from llama_stack.providers.utils.memory.vector_store import (

View file

@ -6,16 +6,21 @@
import logging import logging
import uuid import uuid
from typing import Any, Dict, List from typing import Any, Dict, List, Optional
from numpy.typing import NDArray from numpy.typing import NDArray
from qdrant_client import AsyncQdrantClient, models from qdrant_client import AsyncQdrantClient, models
from qdrant_client.models import PointStruct from qdrant_client.models import PointStruct
from llama_stack.apis.memory_banks import * # noqa: F403 from llama_stack.apis.inference import InterleavedContent
from llama_stack.apis.memory import (
Chunk,
Memory,
MemoryBankDocument,
QueryDocumentsResponse,
)
from llama_stack.apis.memory_banks import MemoryBank, MemoryBankType
from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate
from llama_stack.apis.memory import * # noqa: F403
from llama_stack.providers.remote.memory.qdrant.config import QdrantConfig from llama_stack.providers.remote.memory.qdrant.config import QdrantConfig
from llama_stack.providers.utils.memory.vector_store import ( from llama_stack.providers.utils.memory.vector_store import (
BankWithIndex, BankWithIndex,

View file

@ -4,12 +4,11 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from llama_stack.apis.memory import Memory
from llama_stack.apis.memory_banks import MemoryBank
from .config import SampleConfig from .config import SampleConfig
from llama_stack.apis.memory import * # noqa: F403
class SampleMemoryImpl(Memory): class SampleMemoryImpl(Memory):
def __init__(self, config: SampleConfig): def __init__(self, config: SampleConfig):
self.config = config self.config = config

View file

@ -14,8 +14,14 @@ from numpy.typing import NDArray
from weaviate.classes.init import Auth from weaviate.classes.init import Auth
from weaviate.classes.query import Filter from weaviate.classes.query import Filter
from llama_stack.apis.memory import * # noqa: F403 from llama_stack.apis.common.content_types import InterleavedContent
from llama_stack.apis.memory_banks import MemoryBankType from llama_stack.apis.memory import (
Chunk,
Memory,
MemoryBankDocument,
QueryDocumentsResponse,
)
from llama_stack.apis.memory_banks import MemoryBank, MemoryBankType
from llama_stack.distribution.request_headers import NeedsRequestProviderData from llama_stack.distribution.request_headers import NeedsRequestProviderData
from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate from llama_stack.providers.datatypes import Api, MemoryBanksProtocolPrivate
from llama_stack.providers.utils.memory.vector_store import ( from llama_stack.providers.utils.memory.vector_store import (

View file

@ -9,8 +9,15 @@ import logging
from typing import Any, Dict, List from typing import Any, Dict, List
from llama_stack.apis.safety import * # noqa from llama_stack.apis.inference import Message
from llama_models.llama3.api.datatypes import * # noqa: F403
from llama_stack.apis.safety import (
RunShieldResponse,
Safety,
SafetyViolation,
ViolationLevel,
)
from llama_stack.apis.shields import Shield
from llama_stack.providers.datatypes import ShieldsProtocolPrivate from llama_stack.providers.datatypes import ShieldsProtocolPrivate
from llama_stack.providers.utils.bedrock.client import create_bedrock_client from llama_stack.providers.utils.bedrock.client import create_bedrock_client

View file

@ -4,12 +4,11 @@
# This source code is licensed under the terms described in the LICENSE file in # This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree. # the root directory of this source tree.
from llama_stack.apis.safety import Safety
from llama_stack.apis.shields import Shield
from .config import SampleConfig from .config import SampleConfig
from llama_stack.apis.safety import * # noqa: F403
class SampleSafetyImpl(Safety): class SampleSafetyImpl(Safety):
def __init__(self, config: SampleConfig): def __init__(self, config: SampleConfig):
self.config = config self.config = config

View file

@ -0,0 +1,21 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from pydantic import BaseModel
from .config import ModelContextProtocolConfig
from .model_context_protocol import ModelContextProtocolToolRuntimeImpl
class ModelContextProtocolToolProviderDataValidator(BaseModel):
api_key: str
async def get_adapter_impl(config: ModelContextProtocolConfig, _deps):
impl = ModelContextProtocolToolRuntimeImpl(config)
await impl.initialize()
return impl

View file

@ -0,0 +1,11 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from pydantic import BaseModel
class ModelContextProtocolConfig(BaseModel):
pass

View file

@ -0,0 +1,84 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
from typing import Any, Dict, List
from urllib.parse import urlparse
from llama_stack.apis.tools import (
MCPToolGroupDef,
ToolDef,
ToolGroupDef,
ToolInvocationResult,
ToolParameter,
ToolRuntime,
)
from llama_stack.providers.datatypes import ToolsProtocolPrivate
from mcp import ClientSession
from mcp.client.sse import sse_client
from .config import ModelContextProtocolConfig
class ModelContextProtocolToolRuntimeImpl(ToolsProtocolPrivate, ToolRuntime):
def __init__(self, config: ModelContextProtocolConfig):
self.config = config
async def initialize(self):
pass
async def discover_tools(self, tool_group: ToolGroupDef) -> List[ToolDef]:
if not isinstance(tool_group, MCPToolGroupDef):
raise ValueError(f"Unsupported tool group type: {type(tool_group)}")
tools = []
async with sse_client(tool_group.endpoint.uri) as streams:
async with ClientSession(*streams) as session:
await session.initialize()
tools_result = await session.list_tools()
for tool in tools_result.tools:
parameters = []
for param_name, param_schema in tool.inputSchema.get(
"properties", {}
).items():
parameters.append(
ToolParameter(
name=param_name,
parameter_type=param_schema.get("type", "string"),
description=param_schema.get("description", ""),
)
)
tools.append(
ToolDef(
name=tool.name,
description=tool.description,
parameters=parameters,
metadata={
"endpoint": tool_group.endpoint.uri,
},
)
)
return tools
async def invoke_tool(
self, tool_name: str, args: Dict[str, Any]
) -> ToolInvocationResult:
tool = await self.tool_store.get_tool(tool_name)
if tool.metadata is None or tool.metadata.get("endpoint") is None:
raise ValueError(f"Tool {tool_name} does not have metadata")
endpoint = tool.metadata.get("endpoint")
if urlparse(endpoint).scheme not in ("http", "https"):
raise ValueError(f"Endpoint {endpoint} is not a valid HTTP(S) URL")
async with sse_client(endpoint) as streams:
async with ClientSession(*streams) as session:
await session.initialize()
result = await session.call_tool(tool.identifier, args)
return ToolInvocationResult(
content="\n".join([result.model_dump_json() for result in result.content]),
error_code=1 if result.isError else 0,
)

Some files were not shown because too many files have changed in this diff Show more