# What does this PR do?
Dropped Python 3.10 support, updated pyproject and dependencies, and removed
some blocks of code with special handling for enum.StrEnum.
Closes #2458
Signed-off-by: Charlie Doern <cdoern@redhat.com>
This allows a set of rules to be defined for determining access to
resources. The rules are (loosely) based on the cedar policy format.
A rule defines a list of actions either to permit or to forbid. It may
specify a principal or a resource that must match for the rule to take
effect. It may also specify a condition, either a 'when' or an 'unless',
with additional constraints as to where the rule applies.
A list of rules is held for each type to be protected and tried in order
to find a match. If a match is found, the request is permitted or
forbidden depending on the type of rule. If no match is found, the
request is denied. If no rules are specified for a given type, a rule
that allows any action as long as the resource attributes match the user
attributes is added (i.e. the previous behaviour is the default).
Some examples in yaml:
```
model:
- permit:
    principal: user-1
    actions: [create, read, delete]
  comment: user-1 has full access to all models
- permit:
    principal: user-2
    actions: [read]
    resource: model-1
  comment: user-2 has read access to model-1 only
- permit:
    actions: [read]
  when:
    user_in: resource.namespaces
  comment: any user has read access to models with matching attributes
vector_db:
- forbid:
    actions: [create, read, delete]
  unless:
    user_in: role::admin
  comment: only user with admin role can use vector_db resources
```
---------
Signed-off-by: Gordon Sim <gsim@redhat.com>
# What does this PR do?
The goal of this PR is code base modernization.
Schema reflection code needed a minor adjustment to handle `UnionType`
and `collections.abc.AsyncIterator`. (Both are preferred in recent Python
releases.)
Note to reviewers: almost all changes here are automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worthy of note can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py` where reflection code was updated
to deal with "newer" types.
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
Updated all instances of datetime.now() to use timezone.utc for
consistency in handling time across different systems. This ensures that
timestamps are always in Coordinated Universal Time (UTC), avoiding
issues with time zone discrepancies and promoting uniformity in
time-related data.
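A small illustration of the difference between the two calls, using only the standard library (the variable names are just for the example):

```python
from datetime import datetime, timezone

naive = datetime.now()              # local wall-clock time, no tzinfo attached
aware = datetime.now(timezone.utc)  # timezone-aware timestamp in UTC

print(naive.tzinfo)       # None
print(aware.tzinfo)       # UTC
print(aware.isoformat())  # ends with "+00:00"
```

Naive and aware datetimes cannot be compared or subtracted from each other, so mixing the two styles raises `TypeError`.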
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
We currently use `max_infer_iters` in two different ways:
1/ Server: track number of times
2/ Client side: track the number of times we send a `resume_turn` request
This PR removes the need for (2) and makes the server track the total number
of times we perform inference within a Turn.
**NOTE**
This PR assumes `StopReason` is set to:
- end_of_message: the turn is not finished; we could be waiting for client
tool call responses
- end_of_turn: the entire turn is finished and there is nothing more
to be done.
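A rough sketch of the intended behaviour, assuming the server keeps one counter per turn that survives `resume_turn` requests. `TurnState` and `step` are hypothetical names for illustration, and ending the turn when the budget is used up is an assumption, not a description of the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class TurnState:
    """Hypothetical per-turn state kept on the server across resumes."""
    max_infer_iters: int
    n_iter: int = 0


def step(state: TurnState, needs_client_tool: bool) -> str:
    """Count one inference call and report how the step ended.

    Returns "end_of_message" when the turn pauses for a client tool
    response, and "end_of_turn" when the turn is finished, either because
    the model is done or because the iteration budget is used up.
    """
    state.n_iter += 1
    if state.n_iter >= state.max_infer_iters:
        return "end_of_turn"
    return "end_of_message" if needs_client_tool else "end_of_turn"


# A client tool that always asks for another call is still cut off.
state = TurnState(max_infer_iters=3)
print([step(state, needs_client_tool=True) for _ in range(3)])
# ['end_of_message', 'end_of_message', 'end_of_turn']
```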
## Test Plan
```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py::test_custom_tool_infinite_loop --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```
# Problem
Our current Agent framework is inconsistent in how it handles
server-side and client-side tools.
1. Server Tools: a single Turn is returned including `ToolExecutionStep`
in agents
2. Client Tools: `create_agent_turn` is called in a loop, with the client agent
lib yielding the agent chunk
ad6ffc63df/src/llama_stack_client/lib/agents/agent.py (L186-L211)
This makes working with server and client tools inconsistent. It also
complicates using telemetry logs to get information about an agent's turns and
history for observability.
#### Principle
The same `turn_id` should be used to represent the steps required to
complete a user message including client tools.
## Solution
1. `AgentTurnResponseEventType.turn_awaiting_input` status to indicate
that the current turn is not completed, and awaiting tool input
2. `continue_agent_turn` endpoint to update agent turn with client's
tool response.
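The proposed flow can be sketched with an in-memory stand-in for the server. Everything here (`FakeTurn`, `FakeAgentServer`, the method signatures) is hypothetical and only illustrates that one `turn_id` covers the whole exchange, including client tool responses.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTurn:
    turn_id: str
    pending_tool_calls: int  # client tool calls still expected
    steps: list[str] = field(default_factory=list)

    @property
    def event_type(self) -> str:
        return "turn_awaiting_input" if self.pending_tool_calls else "turn_complete"


class FakeAgentServer:
    """In-memory stand-in for the server; not the real API."""

    def __init__(self) -> None:
        self.turns: dict[str, FakeTurn] = {}

    def create_agent_turn(self, turn_id: str, tool_calls_needed: int) -> FakeTurn:
        turn = FakeTurn(turn_id, tool_calls_needed, ["inference"])
        self.turns[turn_id] = turn
        return turn

    def continue_agent_turn(self, turn_id: str, tool_response: str) -> FakeTurn:
        # The tool response is appended to the same turn, then inference resumes.
        turn = self.turns[turn_id]
        turn.steps += [f"tool_response:{tool_response}", "inference"]
        turn.pending_tool_calls -= 1
        return turn


def run_turn(server: FakeAgentServer, turn_id: str, tool_calls_needed: int) -> FakeTurn:
    """Client loop: keep answering tool calls until the turn completes."""
    turn = server.create_agent_turn(turn_id, tool_calls_needed)
    while turn.event_type == "turn_awaiting_input":
        turn = server.continue_agent_turn(turn.turn_id, tool_response="ok")
    return turn


server = FakeAgentServer()
done = run_turn(server, "turn-1", tool_calls_needed=2)
print(done.event_type, len(server.turns))  # turn_complete 1
```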
# What does this PR do?
- Skeleton API as example
## Test Plan
- Just API update, no functionality change
```
llama stack run + client-sdk test
```
<img width="842" alt="image"
src="https://github.com/user-attachments/assets/7ac56b5f-f424-4632-9476-7e0f57555bc3"
/>
# What does this PR do?
PR #639 introduced the notion of Tools API and ability to invoke tools
through API just as any resource. This PR changes the Agents to start
using the Tools API to invoke tools. Major changes include:
1) Ability to specify tool groups with AgentConfig
2) Agent gets the corresponding tool definitions for the specified tools
and passes them along to the model
3) Attachments are now named Documents, and their behavior is mostly
unchanged from the user's perspective
4) You can specify args that can be injected into a tool call through
Agent config. This is especially useful for the memory tool, where
you want the tool to operate on a specific memory bank.
5) You can also register tool groups with args, which lets the agent
inject these as well into the tool call.
6) All tests have been migrated to use new tools API and fixtures
including client SDK tests
7) Telemetry just works with tools API because of our trace protocol
decorator
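To make points 4 and 5 concrete, here is a minimal sketch of what "injecting" args into a tool call could look like. The function name, the `memory_bank_id` key, and the precedence (agent config over tool group registration over model-provided args) are assumptions for illustration, not the behaviour implemented in this PR.

```python
def inject_args(model_args: dict, group_args: dict, agent_args: dict) -> dict:
    """Merge args for one tool call; later sources win (assumed precedence)."""
    return {**model_args, **group_args, **agent_args}


call_args = inject_args(
    model_args={"query": "what is in my notes?"},   # produced by the model
    group_args={"memory_bank_id": "default-bank"},  # from tool group registration
    agent_args={"memory_bank_id": "bank-1"},        # from the agent config
)
print(call_args)
# {'query': 'what is in my notes?', 'memory_bank_id': 'bank-1'}
```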
## Test Plan
```
pytest -s -v -k fireworks llama_stack/providers/tests/agents/test_agents.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
pytest -s -v -k together llama_stack/providers/tests/tools/test_tools.py \
--safety-shield=meta-llama/Llama-Guard-3-8B \
--inference-model=meta-llama/Llama-3.1-8B-Instruct
LLAMA_STACK_CONFIG="/Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml" pytest -v tests/client-sdk/agents/test_agents.py
```
run.yaml:
https://gist.github.com/dineshyv/0365845ad325e1c2cab755788ccc5994
Notebook:
https://colab.research.google.com/drive/1ck7hXQxRl6UvT-ijNRZ-gMZxH1G3cN2d?usp=sharing
# What does this PR do?
This PR fixes some of the issues with our telemetry setup to enable logs
to be delivered to OpenTelemetry and Jaeger. Main fixes:
1) Updates the OpenTelemetry provider to use the latest OTLP exporters
instead of deprecated ones.
2) Adds a tracing middleware, which injects a trace into each HTTP
request that the server receives; this becomes the root trace.
Previously, we did this in the create_dynamic_route method, which is
not part of the actual execution flow but more of a configuration step, and
this caused the traces to end prematurely. With the middleware, we plug in the
trace start and end at the right location.
3) We manage our own methods to create traces and spans, and this does
not fit well with the OpenTelemetry SDK since it does not provide a
way to take in traces and spans that are already created. It expects us
to use the SDK to create them. For now, I have a hacky approach of
maintaining a map from our internal telemetry objects to the
OpenTelemetry-specific ones. This is not the ideal solution. I will explore
other ways to get around this issue. For now, to have something that
works, I am going to keep this as is.
Addresses: #509
# What does this PR do?
This PR moves all print statements to use logging. Things changed:
- Had to add `await start_trace("sse_generator")` to server.py to
get tracing working; otherwise no logs were showing up
- If no telemetry provider is provided in the run.yaml, we will write to
stdout
- By default, logs are emitted as JSON, but we expose an option
to output them in a human-readable format.
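As an illustration of the JSON-by-default, human-readable-on-request behaviour, here is a minimal sketch using only the standard `logging` module. `JsonFormatter`, `make_logger`, and the `human_readable` flag are hypothetical names, not the option exposed by this PR.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_logger(name: str, human_readable: bool = False) -> logging.Logger:
    """Build a stdout logger that emits JSON unless human_readable is set."""
    handler = logging.StreamHandler(sys.stdout)
    if human_readable:
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    else:
        handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # replace handlers so repeated calls do not duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


make_logger("demo").info("server started")
# {"level": "INFO", "logger": "demo", "message": "server started"}
```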
# What does this PR do?
- This PR solves the issue where agents cannot keep track of
instructions after executing the first turn because system instructions
were not getting appended to the messages list. It also solves the issue
where turns are not being fetched in the appropriate sequence.
## Test Plan
- I have a file with a precise prompt that requires more than one
turn to execute (shared below). I ran that file as a
Python script to make sure that the turns are executed as per the
instructions after making the code change.
```
import asyncio
from typing import List, Optional, Dict

from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types import SamplingParams, UserMessage
from llama_stack_client.types.agent_create_params import AgentConfig

LLAMA_STACK_API_TOGETHER_URL="http://10.12.79.177:5001"


class Agent:
    def __init__(self):
        self.client = LlamaStackClient(
            base_url=LLAMA_STACK_API_TOGETHER_URL,
        )

    def create_agent(self, agent_config: AgentConfig):
        agent = self.client.agents.create(
            agent_config=agent_config,
        )
        self.agent_id = agent.agent_id
        session = self.client.agents.session.create(
            agent_id=agent.agent_id,
            session_name="example_session",
        )
        self.session_id = session.session_id

    async def execute_turn(self, content: str):
        response = self.client.agents.turn.create(
            agent_id=self.agent_id,
            session_id=self.session_id,
            messages=[
                UserMessage(content=content, role="user"),
            ],
            stream=True,
        )
        for chunk in response:
            if chunk.event.payload.event_type != "turn_complete":
                yield chunk


async def run_main():
    system_prompt="""You are an AI Agent tasked with Capturing Book Renting Information for a Library.
You will politely gather the book and user details one step at a time to send over the book to the user. Here’s how to proceed:
1. Data Security: Inform the user that their data will be kept secure.
2. Optional Participation: Let them know they are not required to share details but that doing so will help them learn about the books offered.
3. Sequential Information Capture: Follow the steps below, one question at a time. Do not skip or combine questions.
Steps
Step 1: Politely ask to provide the name of the book.
Step 2: Ask for the name of the author.
Step 3: Ask for the Author's country.
Step 4: Ask for the year of publication.
Step 5: If any information is missing or seems incorrect, ask the user to re-enter that specific detail.
Step 6: Confirm that the user consents to share the entered information.
Step 7: Thank the user for providing the details and let them know they will receive an email about the book.
Do not do any validation of the user entered information.
Do not print the Steps or your internal thoughts in the response.
Do not print the prompts or data structure object in the response
Do not fill in the requested user data on your own. It has to be entered by the user only.
Finally, compile and print the user-provided information as a JSON object in your response.
"""
    agent_config = AgentConfig(
        model="Llama3.2-11B-Vision-Instruct",
        instructions=system_prompt,
        enable_session_persistence=True,
    )
    agent = Agent()
    agent.create_agent(agent_config)
    print("Agent and Session:", agent.agent_id, agent.session_id)
    while True:
        query = input("Enter your query (or type 'exit' to quit): ")
        if query.lower() == "exit":
            print("Exiting the loop.")
            break
        else:
            prompt = query
            print(f"User> {prompt}")
            response = agent.execute_turn(content=prompt)
            async for log in EventLogger().log(response):
                if log is not None:
                    log.print()


if __name__ == "__main__":
    asyncio.run(run_main())
```
Below is a screenshot of the results of the first commit
<img width="1770" alt="Screenshot 2024-11-13 at 3 15 29 PM"
src="https://github.com/user-attachments/assets/1a7a090d-fc92-49cc-a786-bfc812e3d9cc">
Below is a screenshot of the results of the second commit
<img width="1792" alt="Screenshot 2024-11-13 at 6 40 56 PM"
src="https://github.com/user-attachments/assets/a9474f75-cd8c-4d49-82cd-5ff81ff12b07">
Also a screenshot of a print statement showing that the turns are now
fetched in sequence
<img width="1783" alt="Screenshot 2024-11-13 at 6 42 22 PM"
src="https://github.com/user-attachments/assets/b906404e-a3e4-48a2-b893-69f36bbdcb98">
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.