Commit graph

59 commits

Author SHA1 Message Date
slekkala1
1ac320b7e6
chore: remove dead code (#3729)
# What does this PR do?
Removing some dead code found by vulture; checked with Claude that
there are no remaining references or imports for it.


## Test Plan
CI
2025-10-07 20:26:02 -07:00
slekkala1
c2d97a9db9
chore: fix flaky unit test and add proper shutdown for file batches (#3725)
# What does this PR do?
Have been running into flaky unit test failures:
5217035494
This PR fixes them by:
1. Shutting down properly, cancelling any stale file batch tasks
running in the background.
2. Using `unique_kvstore_config` so the tests don't share the same db
path, maintaining test isolation.
## Test Plan
Ran unit test locally and CI
2025-10-07 14:23:14 -07:00
Francisco Arceo
d5b136ac66
feat: Enabling Annotations in Responses (#3698)
# What does this PR do?
Implements annotations for the `file_search` tool.

Also adds some logs and tests.

## How does this work? 
1. **Citation Markers**: The model inserts `<|file-id|>` markers during
generation, following instructions included with the search results
2. **Post-Processing**: Markers are extracted with a regex to calculate
character positions and create `AnnotationFileCitation` objects (see the
sketch below)
3. **File Mapping**: Filename metadata is stored during vector store
operations for proper citation display
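
As a rough illustration of step 2, the post-processing pass might look like
this (a minimal sketch; `extract_annotations` and the marker format details
are assumptions, not the actual implementation):

```python
import re

# Markers like <|file-abc123|> are stripped from the model output while
# recording the character position where each one occurred.
MARKER_RE = re.compile(r"<\|(file-[^|]+)\|>")


def extract_annotations(raw_text: str, file_id_to_name: dict[str, str]):
    annotations, parts, last_end, removed = [], [], 0, 0
    for match in MARKER_RE.finditer(raw_text):
        parts.append(raw_text[last_end:match.start()])
        annotations.append({
            "type": "file_citation",
            "file_id": match.group(1),
            "filename": file_id_to_name.get(match.group(1), match.group(1)),
            # position in the cleaned text, after earlier markers are removed
            "index": match.start() - removed,
        })
        removed += match.end() - match.start()
        last_end = match.end()
    parts.append(raw_text[last_end:])
    return "".join(parts), annotations
```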

## Example 
This is the updated `quickstart.py` script, which uses `extra_body`
to register the embedding model.

```python
import io, requests
from openai import OpenAI

url="https://www.paulgraham.com/greatwork.html"
model = "gpt-4o-mini"
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

vs = client.vector_stores.create(
    name="my_citations_db",
    extra_body={
        "embedding_model": "ollama/nomic-embed-text:latest",
        "embedding_dimension": 768,
    }
)
response = requests.get(url)
pseudo_file = io.BytesIO(str(response.content).encode('utf-8'))
file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id)

resp = client.responses.create(
    model=model,
    input="How do you do great work? Use our existing knowledge_search tool.",
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)

print(resp)
```

<details>
<summary> Example of the full response </summary>

```python
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores/vs_0f6f7e35-f48b-4850-8604-8117d9a50e0a/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"
Response(id='resp-28f5793d-3272-4de3-81f6-8cbf107d5bcd', created_at=1759797954.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-4o-mini', object='response', output=[ResponseFileSearchToolCall(id='call_xWtvEQETN5GNiRLLiBIDKntg', queries=['how to do great work tips'], status='completed', type='file_search_call', results=[Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.3722624322210302, text='\\\'re looking where few have looked before.<br /><br />One sign that you\\\'re suited for some kind of work is when you like\\neven the parts that other people find tedious or frightening.<br /><br />But fields aren\\\'t people; you don\\\'t owe them any loyalty. If in the\\ncourse of working on one thing you discover another that\\\'s more\\nexciting, don\\\'t be afraid to switch.<br /><br />If you\\\'re making something for people, make sure it\\\'s something\\nthey actually want. The best way to do this is to make something\\nyou yourself want. Write the story you want to read; build the tool\\nyou want to use. Since your friends probably have similar interests,\\nthis will also get you your initial audience.<br /><br />This <i>should</i> follow from the excitingness rule. Obviously the most\\nexciting story to write will be the one you want to read. The reason\\nI mention this case explicitly is that so many people get it wrong.\\nInstead of making what they want, they try to make what some\\nimaginary, more sophisticated audience wants. And once you go down\\nthat route, you\\\'re lost.\\n<font color=#dddddd>[<a href="#f6n"><font color=#dddddd>6</font></a>]</font><br /><br />There are a lot of forces that will lead you astray when you\\\'re\\ntrying to figure out what to work on. Pretentiousness, fashion,\\nfear, money, politics, other people\\\'s wishes, eminent frauds. But\\nif you stick to what you find genuinely interesting, you\\\'ll be proof\\nagainst all of them. If you\\\'re interested, you\\\'re not astray.<br /><br /><br /><br /><br /><br />\\nFollowing your interests may sound like a rather passive strategy,\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. 
But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.2532794607643494, text=' with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good colleagues\\noffer <i>surprising</i> insights. They can see and do things that you\\ncan\\\'t. So if you have a handful of colleagues good enough to keep\\nyou on your toes in this sense, you\\\'re probably over the threshold.<br /><br />Most of us can benefit from collaborating with colleagues, but some\\nprojects require people on a larger scale, and starting one of those\\nis not for everyone. If you want to run a project like that, you\\\'ll\\nhave to become a manager, and managing well takes aptitude and\\ninterest like any other kind of work. If you don\\\'t have them, there\\nis no middle path: you must either force yourself to learn management\\nas a second language, or avoid such projects.\\n<font color=#dddddd>[<a href="#f27n"><font color=#dddddd>27</font></a>]</font><br /><br /><br /><br /><br /><br />\\nHusband your morale. It\\\'s the basis of everything when you\\\'re working\\non ambitious projects. You have to nurture and protect it like a\\nliving organism.<br /><br />Morale starts with your view of life. 
You\\\'re more likely to do great\\nwork if you\\\'re an optimist, and more likely to if you think of\\nyourself as lucky than if you think of yourself as a victim.<br /><br />Indeed, work can to some extent protect you from your problems. If\\nyou choose work that\\\'s pure, its very difficulties will serve as a\\nrefuge from the difficulties of everyday life. If this is escapism,\\nit\\\'s a very productive form of it, and one that has been used by\\nsome of the greatest minds in history.<br /><br />Morale compounds via work: high morale helps you do good work, which\\nincreases your morale and helps you do even'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1973485818164222, text=' your\\nability and interest can take you. And you can only answer that by\\ntrying.<br /><br />Many more people could try to do great work than do. What holds\\nthem back is a combination of modesty and fear. It seems presumptuous\\nto try to be Newton or Shakespeare. It also seems hard; surely if\\nyou tried something like that, you\\\'d fail. Presumably the calculation\\nis rarely explicit. Few people consciously decide not to try to do\\ngreat work. But that\\\'s what\\\'s going on subconsciously; they shy\\naway from the question.<br /><br />So I\\\'m going to pull a sneaky trick on you. Do you want to do great\\nwork, or not? Now you have to decide consciously. Sorry about that.\\nI wouldn\\\'t have done it to a general audience. But we already know\\nyou\\\'re interested.<br /><br />Don\\\'t worry about being presumptuous. You don\\\'t have to tell anyone.\\nAnd if it\\\'s too hard and you fail, so what? Lots of people have\\nworse problems than that. In fact you\\\'ll be lucky if it\\\'s the worst\\nproblem you have.<br /><br />Yes, you\\\'ll have to work hard. But again, lots of people have to\\nwork hard. And if you\\\'re working on something you find very\\ninteresting, which you necessarily will if you\\\'re on the right path,\\nthe work will probably feel less burdensome than a lot of your\\npeers\\\'.<br /><br />The discoveries are out there, waiting to be made. Why not by you?<br /><br /><br /><br /><br /><br /><br /><br /><br /><br />\\n<b>Notes</b><br /><br />[<a name="f1n"><font color=#000000>1</font></a>]\\nI don\\\'t think you could give a precise definition of what\\ncounts as great work. Doing great work means doing something important\\nso well that you expand people\\\'s ideas of what\\\'s possible. But\\nthere\\\'s no threshold for importance. It\\\'s a matter of degree, and\\noften hard to judge at the time anyway. So I\\\'d rather people focused\\non developing their interests rather than worrying about whether\\nthey\\\'re important or not. Just try to do something amazing, and\\nleave it to future generations to say if you succeeded.<br /><br />[<a name="f2n"><font color=#000000>2</font></a>]\\nA lot of standup comedy is based on noticing anomalies in\\neveryday life. "Did you ever notice...?" New ideas come from doing\\nthis about nontrivial things. Which may help explain why people\\\'s\\nreaction to a new idea is often the first half of laughing: Ha!<br /><br />[<a name="f3n"><font color=#000000>3</font></a>]\\nThat second qualifier is critical. 
If you\\\'re excited about\\nsomething most authorities discount, but you can\\\'t give a more\\nprecise explanation than "they don\\\'t get it," then you\\\'re starting\\nto drift into the territory of cranks.<br /><br />[<a name="f4n"><font color=#000000>4</font></a>]\\nFinding something to work on is not simply a matter of finding\\na match between the current version of you and a list of known\\nproblems. You\\\'ll often have to coevolve with the problem. That\\\'s\\nwhy it can sometimes be so hard to figure out what to work on. The\\nsearch space is huge. It\\\'s the cartesian product of all possible\\nt'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1764591706535943, text='\\noptimistic, and even though one of the sources of their optimism\\nis ignorance, in this case ignorance can sometimes beat knowledge.<br /><br />Try to finish what you start, though, even if it turns out to be\\nmore work than you expected. Finishing things is not just an exercise\\nin tidiness or self-discipline. In many projects a lot of the best\\nwork happens in what was meant to be the final stage.<br /><br />Another permissible lie is to exaggerate the importance of what\\nyou\\\'re working on, at least in your own mind. If that helps you\\ndiscover something new, it may turn out not to have been a lie after\\nall.\\n<font color=#dddddd>[<a href="#f7n"><font color=#dddddd>7</font></a>]</font><br /><br /><br /><br /><br /><br />\\nSince there are two senses of starting work &mdash; per day and per\\nproject &mdash; there are also two forms of procrastination. Per-project\\nprocrastination is far the more dangerous. You put off starting\\nthat ambitious project from year to year because the time isn\\\'t\\nquite right. When you\\\'re procrastinating in units of years, you can\\nget a lot not done.\\n<font color=#dddddd>[<a href="#f8n"><font color=#dddddd>8</font></a>]</font><br /><br />One reason per-project procrastination is so dangerous is that it\\nusually camouflages itself as work. You\\\'re not just sitting around\\ndoing nothing; you\\\'re working industriously on something else. So\\nper-project procrastination doesn\\\'t set off the alarms that per-day\\nprocrastination does. You\\\'re too busy to notice it.<br /><br />The way to beat it is to stop occasionally and ask yourself: Am I\\nworking on what I most want to work on? When you\\\'re young it\\\'s ok\\nif the answer is sometimes no, but this gets increasingly dangerous\\nas you get older.\\n<font color=#dddddd>[<a href="#f9n"><font color=#dddddd>9</font></a>]</font><br /><br /><br /><br /><br /><br />\\nGreat work usually entails spending what would seem to most people\\nan unreasonable amount of time on a problem. You can\\\'t think of\\nthis time as a cost, or it will seem too high. You have to find the\\nwork sufficiently engaging as it\\\'s happening.<br /><br />There may be some jobs where you have to work diligently for years\\nat things you hate before you get to the good part, but this is not\\nhow great work happens. Great work happens by focusing consistently\\non something you\\\'re genuinely interested in. When you pause to take\\nstock, you\\\'re surprised how far you\\\'ve come.<br /><br />The reason we\\\'re surprised is that we underestimate the cumulative\\neffect of work. Writing a page a day doesn\\\'t sound like much, but\\nif you do it every day you\\\'ll write a book a year. That\\\'s the key:\\nconsistency. 
People who do great things don\\\'t get a lot done every\\nday. They get something done, rather than nothing.<br /><br />If you do work that compounds, you\\\'ll get exponential growth. Most\\npeople who do this do it unconsciously, but it\\\'s worth stopping to\\nthink about. Learning, for example, is an instance of this phenomenon:\\nthe more you learn about something, the easier it is to learn more.\\nGrowing an audience is another: the more fans you have, the more\\nnew fans they\\\'ll bring you.<br /><br />'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.174069664815369, text='\\ninside.<br /><br /><br /><br /><br /><br />Let\\\'s talk a little more about the complicated business of figuring\\nout what to work on. The main reason it\\\'s hard is that you can\\\'t\\ntell what most kinds of work are like except by doing them. Which\\nmeans the four steps overlap: you may have to work at something for\\nyears before you know how much you like it or how good you are at\\nit. And in the meantime you\\\'re not doing, and thus not learning\\nabout, most other kinds of work. So in the worst case you choose\\nlate based on very incomplete information.\\n<font color=#dddddd>[<a href="#f4n"><font color=#dddddd>4</font></a>]</font><br /><br />The nature of ambition exacerbates this problem. Ambition comes in\\ntwo forms, one that precedes interest in the subject and one that\\ngrows out of it. Most people who do great work have a mix, and the\\nmore you have of the former, the harder it will be to decide what\\nto do.<br /><br />The educational systems in most countries pretend it\\\'s easy. They\\nexpect you to commit to a field long before you could know what\\nit\\\'s really like. And as a result an ambitious person on an optimal\\ntrajectory will often read to the system as an instance of breakage.<br /><br />It would be better if they at least admitted it &mdash; if they admitted\\nthat the system not only can\\\'t do much to help you figure out what\\nto work on, but is designed on the assumption that you\\\'ll somehow\\nmagically guess as a teenager. They don\\\'t tell you, but I will:\\nwhen it comes to figuring out what to work on, you\\\'re on your own.\\nSome people get lucky and do guess correctly, but the rest will\\nfind themselves scrambling diagonally across tracks laid down on\\nthe assumption that everyone does.<br /><br />What should you do if you\\\'re young and ambitious but don\\\'t know\\nwhat to work on? What you should <i>not</i> do is drift along passively,\\nassuming the problem will solve itself. You need to take action.\\nBut there is no systematic procedure you can follow. When you read\\nbiographies of people who\\\'ve done great work, it\\\'s remarkable how\\nmuch luck is involved. They discover what to work on as a result\\nof a chance meeting, or by reading a book they happen to pick up.\\nSo you need to make yourself a big target for luck, and the way to\\ndo that is to be curious. Try lots of things, meet lots of people,\\nread lots of books, ask lots of questions.\\n<font color=#dddddd>[<a href="#f5n"><font color=#dddddd>5</font></a>]</font><br /><br />When in doubt, optimize for interestingness. Fields change as you\\nlearn more about them. What mathematicians do, for example, is very\\ndifferent from what you do in high school math classes. So you need\\nto give different types of work a chance to show you what they\\\'re\\nlike. 
But a field should become <i>increasingly</i> interesting as you\\nlearn more about it. If it doesn\\\'t, it\\\'s probably not for you.<br /><br />Don\\\'t worry if you find you\\\'re interested in different things than\\nother people. The stranger your tastes in interestingness, the\\nbetter. Strange tastes are often strong ones, and a strong taste\\nfor work means you\\\'ll be productive. And you\\\'re more likely to find\\nnew things if you'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.158095578895721, text='. Don\\\'t copy the manner of\\nan eminent 50 year old professor if you\\\'re 18, for example, or the\\nidiom of a Renaissance poem hundreds of years later.<br /><br />Some of the features of things you admire are flaws they succeeded\\ndespite. Indeed, the features that are easiest to imitate are the\\nmost likely to be the flaws.<br /><br />This is particularly true for behavior. Some talented people are\\njerks, and this sometimes makes it seem to the inexperienced that\\nbeing a jerk is part of being talented. It isn\\\'t; being talented\\nis merely how they get away with it.<br /><br />One of the most powerful kinds of copying is to copy something from\\none field into another. History is so full of chance discoveries\\nof this type that it\\\'s probably worth giving chance a hand by\\ndeliberately learning about other kinds of work. You can take ideas\\nfrom quite distant fields if you let them be metaphors.<br /><br />Negative examples can be as inspiring as positive ones. In fact you\\ncan sometimes learn more from things done badly than from things\\ndone well; sometimes it only becomes clear what\\\'s needed when it\\\'s\\nmissing.<br /><br /><br /><br /><br /><br />\\nIf a lot of the best people in your field are collected in one\\nplace, it\\\'s usually a good idea to visit for a while. It will\\nincrease your ambition, and also, by showing you that these people\\nare human, increase your self-confidence.\\n<font color=#dddddd>[<a href="#f26n"><font color=#dddddd>26</font></a>]</font><br /><br />If you\\\'re earnest you\\\'ll probably get a warmer welcome than you\\nmight expect. Most people who are very good at something are happy\\nto talk about it with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. 
In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1566747762241967, text=',\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if you do that you\\\'ll find you get diminishing returns:\\nfatigue will make you stupid, and eventually even damage your health.\\nThe point at which work yields diminishing returns depends on the\\ntype. Some of the hardest types you might only be able to do for\\nfour or five hours a day.<br /><br />Ideally those hours will be contiguous. To the extent you can, try\\nto arrange your life so you have big blocks of time to work in.\\nYou\\\'ll shy away from hard tasks if you know you might be interrupted.<br /><br />It will probably be harder to start working than to keep working.\\nYou\\\'ll often have to trick yourself to get over that initial\\nthreshold. Don\\\'t worry about this; it\\\'s the nature of work, not a\\nflaw in your character. Work has a sort of activation energy, both\\nper day and per project. And since this threshold is fake in the\\nsense that it\\\'s higher than the energy required to keep going, it\\\'s\\nok to tell yourself a lie of corresponding magnitude to get over\\nit.<br /><br />It\\\'s usually a mistake to lie to yourself if you want to do great\\nwork, but this is one of the rare cases where it isn\\\'t. 
When I\\\'m\\nreluctant to start work in the morning, I often trick myself by\\nsaying "I\\\'ll just read over what I\\\'ve got so far." Five minutes\\nlater I\\\'ve found something that seems mistaken or incomplete, and\\nI\\\'m off.<br /><br />Similar techniques work for starting new projects. It\\\'s ok to lie\\nto yourself about how much work a project will entail, for example.\\nLots of great things began with someone saying "How hard could it\\nbe?"<br /><br />This is one case where the young have an advantage. They\\\'re more'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1349744395573516, text=' audience\\nin the traditional sense. Either way it doesn\\\'t need to be big.\\nThe value of an audience doesn\\\'t grow anything like linearly with\\nits size. Which is bad news if you\\\'re famous, but good news if\\nyou\\\'re just starting out, because it means a small but dedicated\\naudience can be enough to sustain you. If a handful of people\\ngenuinely love what you\\\'re doing, that\\\'s enough.<br /><br />To the extent you can, avoid letting intermediaries come between\\nyou and your audience. In some types of work this is inevitable,\\nbut it\\\'s so liberating to escape it that you might be better off\\nswitching to an adjacent type if that will let you go direct.\\n<font color=#dddddd>[<a href="#f28n"><font color=#dddddd>28</font></a>]</font><br /><br />The people you spend time with will also have a big effect on your\\nmorale. You\\\'ll find there are some who increase your energy and\\nothers who decrease it, and the effect someone has is not always\\nwhat you\\\'d expect. Seek out the people who increase your energy and\\navoid those who decrease it. Though of course if there\\\'s someone\\nyou need to take care of, that takes precedence.<br /><br />Don\\\'t marry someone who doesn\\\'t understand that you need to work,\\nor sees your work as competition for your attention. If you\\\'re\\nambitious, you need to work; it\\\'s almost like a medical condition;\\nso someone who won\\\'t let you work either doesn\\\'t understand you,\\nor does and doesn\\\'t care.<br /><br />Ultimately morale is physical. You think with your body, so it\\\'s\\nimportant to take care of it. That means exercising regularly,\\neating and sleeping well, and avoiding the more dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. 
So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.123214818076958, text='b\'<html><head><meta name="Keywords" content="" /><title>How to Do Great Work</title><!-- <META NAME="ROBOTS" CONTENT="NOODP"> -->\\n<link rel="shortcut icon" href="http://ycombinator.com/arc/arc.png">\\n</head><body bgcolor="#ffffff" background="https://s.turbifycdn.com/aah/paulgraham/bel-6.gif" text="#000000" link="#000099" vlink="#464646"><table border="0" cellspacing="0" cellpadding="0"><tr valign="top"><td><map name=118ab66adb24b4f><area shape=rect coords="0,0,67,21" href="index.html"><area shape=rect coords="0,21,67,42" href="articles.html"><area shape=rect coords="0,42,67,63" href="http://www.amazon.com/gp/product/0596006624"><area shape=rect coords="0,63,67,84" href="books.html"><area shape=rect coords="0,84,67,105" href="http://ycombinator.com"><area shape=rect coords="0,105,67,126" href="arc.html"><area shape=rect coords="0,126,67,147" href="bel.html"><area shape=rect coords="0,147,67,168" href="lisp.html"><area shape=rect coords="0,168,67,189" href="antispam.html"><area shape=rect coords="0,189,67,210" href="kedrosky.html"><area shape=rect coords="0,210,67,231" href="faq.html"><area shape=rect coords="0,231,67,252" href="raq.html"><area shape=rect coords="0,252,67,273" href="quo.html"><area shape=rect coords="0,273,67,294" href="rss.html"><area shape=rect coords="0,294,67,315" href="bio.html"><area shape=rect coords="0,315,67,336" href="https://twitter.com/paulg"><area shape=rect coords="0,336,67,357" href="https://mas.to/@paulg"></map><img src="https://s.turbifycdn.com/aah/paulgraham/bel-7.gif" width="69" height="357" usemap=#118ab66adb24b4f border="0" hspace="0" vspace="0" ismap /></td><td><img src="https://sep.turbifycdn.com/ca/Img/trans_1x1.gif" height="1" width="26" border="0" /></td><td><a href="index.html"><img src="https://s.turbifycdn.com/aah/paulgraham/bel-8.gif" width="410" height="45" border="0" hspace="0" vspace="0" /></a><br /><br /><table border="0" cellspacing="0" cellpadding="0" width="435"><tr valign="top"><td width="435"><img src="https://s.turbifycdn.com/aah/paulgraham/how-to-do-great-work-2.gif" width="185" height="18" border="0" hspace="0" vspace="0" alt="How to Do Great Work" /><br /><br /><font size="2" face="verdana">July 2023<br /><br />If you collected lists of techniques for doing great work in a lot\\nof different fields, what would the intersection look like? I decided\\nto find out'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1193194369249235, text=' dangerous kinds of\\ndrugs. 
Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied with a single\\nword, my bet would be on "curiosity."<br /><br />That doesn\\\'t translate directly to advice. It\\\'s not enough just to\\nbe curious, and you can\\\'t command curiosity anyway. But you can\\nnurture it and let it drive you.<br /><br />Curiosity is the key to all four steps in doing great work: it will\\nchoose the field for you, get you to the frontier, cause you to\\nnotice the gaps in it, and drive you to explore them. The whole\\nprocess is a kind of dance with curiosity.<br /><br /><br /><br /><br /><br />\\nBelieve it or not, I tried to make this essay as short as I could.\\nBut its length at least means it acts as a filter. If you made it\\nthis far, you must be interested in doing great work. And if so\\nyou\\\'re already further along than you might realize, because the\\nset of people willing to want to is small.<br /><br />The factors in doing great work are factors in the literal,\\nmathematical sense, and they are: ability, interest, effort, and\\nluck. Luck by definition you can\\\'t do anything about, so we can\\nignore that. And we can assume effort, if you do in fact want to\\ndo great work. So the problem boils down to ability and interest.\\nCan you find a kind of work where your ability and interest will\\ncombine to yield an explosion of new ideas?<br /><br />Here there are grounds for optimism. There are so many different\\nways to do great work, and even more that are still undiscovered.\\nOut of all those different types of work, the one you\\\'re most suited\\nfor is probably a pretty close match. Probably a comically close\\nmatch. 
It\\\'s just a question of finding it, and how far into it')]), ResponseOutputMessage(id='msg_3591ea71-8b35-4efd-a5ad-c1c250801971', content=[ResponseOutputText(annotations=[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')], text='To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. 
Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=False, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier=None, status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity=None), top_logprobs=None, truncation=None, usage=None, user=None)

In [34]: resp.output[1].content[0].text
Out[34]: 'To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.'
```
</details>

The relevant output looks like this:

```python
>>> resp.output[1].content[0].annotations
[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')]
```
And
```python
In [144]: print(resp.output[1].content[0].text)
To do great work, consider the following principles:

1. **Follow Your Interests**: Engage in work that genuinely excites you.
If you find an area intriguing, pursue it without being overly concerned
about external pressures or norms. You should create things that you
would want for yourself, as this often aligns with what others in your
circle might want too.

2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should
be tempered by genuine interest. Instead of detailed planning for the
future, focus on exciting projects that keep your options open. This
approach, known as "staying upwind," allows for adaptability and can
lead to unforeseen achievements.

3. **Choose Quality Colleagues**: Collaborating with talented colleagues
can significantly affect your own work. Seek out individuals who offer
surprising insights and whom you admire. The presence of good colleagues
can elevate the quality of your work and inspire you.

4. **Maintain High Morale**: Your attitude towards work and life affects
your performance. Cultivating optimism and viewing yourself as lucky
rather than victimized can boost your productivity. It’s essential to
care for your physical health as well since it directly impacts your
mental faculties and morale.

5. **Be Consistent**: Great work often comes from cumulative effort.
Daily progress, even in small amounts, can result in substantial
achievements over time. Emphasize consistency and make the work
engaging, as this reduces the perceived burden of hard labor.

6. **Embrace Curiosity**: Curiosity is a driving force that can guide
you in selecting fields of interest, pushing you to explore uncharted
territories. Allow it to shape your work and continually seek knowledge
and insights.

By focusing on these aspects, you can create an environment conducive to
great work and personal fulfillment.
```

And the code below outputs only periods, confirming that the position/index behaves as expected: each annotation lands at the end of a sentence.

```python
In [41]: [resp.output[1].content[0].text[j.index] for j in resp.output[1].content[0].annotations]
Out[41]: ['.', '.', '.', '.', '.', '.']
```

## Test Plan
Unit tests added.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-07 14:00:56 -04:00
slekkala1
bba9957edd
feat(api): Add vector store file batches api (#3642)
# What does this PR do?

Add OpenAI-compatible vector store file batches API. This functionality
is needed to attach many files to a vector store as a batch.
https://github.com/llamastack/llama-stack/issues/3533

API Stubs have been merged
https://github.com/llamastack/llama-stack/pull/3615
Adds persistence for file batches as discussed in diff
https://github.com/llamastack/llama-stack/pull/3544
(Used Claude Code for generation; reviewed by me)
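
For reference, attaching many files as one batch through the
OpenAI-compatible surface looks roughly like this (a sketch assuming a
local Llama Stack server and pre-uploaded files; the file IDs are
hypothetical):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

vs = client.vector_stores.create(name="batch_demo")
batch = client.vector_stores.file_batches.create(
    vector_store_id=vs.id,
    file_ids=["file-aaa", "file-bbb", "file-ccc"],  # hypothetical file IDs
)

# Batches process in the background; poll until done.
while batch.status == "in_progress":
    batch = client.vector_stores.file_batches.retrieve(
        batch.id, vector_store_id=vs.id
    )
print(batch.status, batch.file_counts)
```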


## Test Plan
1. Unit tests pass
2. Verified the cc-vec integration with LlamaStackClient works with
the file batches API: https://github.com/raghotham/cc-vec
3. Integration tests pass
2025-10-06 16:58:22 -07:00
Christian Zaccaria
bcdbb53be3
feat: implement keyword and hybrid search for Weaviate provider (#3264)
# What does this PR do?
- This PR implements keyword and hybrid search for Weaviate DB based on
its inbuilt functions.
- Added fixtures to conftest.py for Weaviate.
- Enabled integration tests for remote Weaviate on all 3 search modes.

Closes #3010 
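
The provider delegates to Weaviate's own query helpers; roughly like this
(a sketch using the weaviate-client v4 API with a hypothetical collection
name, not the provider's actual code):

```python
import weaviate

client = weaviate.connect_to_local()
collection = client.collections.get("Chunks")  # hypothetical collection

# Keyword search via Weaviate's built-in BM25
kw = collection.query.bm25(query="great work", limit=5)

# Hybrid search: alpha blends vector (1.0) and keyword (0.0) scoring
hy = collection.query.hybrid(query="great work", alpha=0.5, limit=5)

for obj in hy.objects:
    print(obj.properties)
client.close()
```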

## Test Plan
Unit tests and integration tests should pass on this PR.
2025-10-03 10:22:30 +02:00
slekkala1
cc64093ae4
feat(api): Add Vector Store File batches api stub (#3615)
# What does this PR do?
Adding API stubs for the vector store file batches APIs
https://github.com/llamastack/llama-stack/issues/3533
API Ref:
https://platform.openai.com/docs/api-reference/vector-stores-file-batches

## Test Plan
CI
2025-09-30 12:07:33 -07:00
Matthew Farrellee
478b4ff1e6
chore(migrate apis): move VectorDBWithIndex from embeddings to openai_embeddings (#3294)
# What does this PR do?

migrates VectorDBWithIndex to use openai_embeddings

part of #2365 
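
In client terms, the index's embedding calls now flow through the
OpenAI-style endpoint; roughly (a sketch against a local server, with an
assumed registered embedding model):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

resp = client.embeddings.create(
    model="ollama/nomic-embed-text:latest",  # assumed registered model
    input=["chunk one", "chunk two"],
)
vectors = [d.embedding for d in resp.data]
```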

## Test Plan

existing unit tests
2025-08-31 14:48:35 -07:00
Mustafa Elbehery
c3b2b06974
refactor(logging): rename llama_stack logger categories (#3065)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR renames categories of llama_stack loggers.

This PR aligns logging categories with the package names and
incorporates review feedback from the initial
https://github.com/meta-llama/llama-stack/pull/2868. It is a follow-up
to #3061.
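
Concretely, call sites use the shared helper with a category matching the
package (a sketch; assumes llama_stack's `get_logger` helper keeps this
shape):

```python
from llama_stack.log import get_logger

# category now mirrors the package name, e.g. "vector_io" for modules
# under llama_stack.providers.*.vector_io
logger = get_logger(name=__name__, category="vector_io")
logger.info("collection created")
```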


Replaces https://github.com/meta-llama/llama-stack/pull/2868
Part of https://github.com/meta-llama/llama-stack/issues/2865

cc @leseb @rhuss

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
2025-08-21 17:31:04 -07:00
Mustafa Elbehery
3f8df167f3
chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)
# What does this PR do?

This PR adds a step in pre-commit to enforce using `llama_stack` logger.

Currently, various parts of the codebase use different loggers. Since
a custom `llama_stack` logger already exists and is used in the
codebase, it is better to standardize on it.
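
The check itself could be as simple as a script that scans staged files for
direct `logging` imports (a hypothetical sketch of what such a hook might
run, not the actual hook):

```python
import pathlib
import re
import sys

# Flag files that import the stdlib logging module directly instead of
# using llama_stack's logger.
BAD_IMPORT = re.compile(r"^\s*import logging\b", re.MULTILINE)

failed = False
for path in sys.argv[1:]:
    text = pathlib.Path(path).read_text()
    if BAD_IMPORT.search(text):
        print(f"{path}: use llama_stack's get_logger instead of logging")
        failed = True
sys.exit(1 if failed else 0)
```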

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
2025-08-20 07:15:35 -04:00
Ashwin Bharambe
3d90117891
chore(tests): fix responses and vector_io tests (#3119)
Some fixes to MCP tests. And a bunch of fixes for Vector providers.

I also enabled a bunch of Vector IO tests to be used with
`LlamaStackLibraryClient`

## Test Plan

Run Responses tests with llama stack library client:
```
pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \
   --text-model openai/gpt-4o \
  --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \
  -k "client_with_models"
```

Do the same with `-k openai_client`

The rest should be taken care of by CI.
2025-08-12 16:15:53 -07:00
Varsha
e3928e6a29
feat: Implement hybrid search in Milvus (#2644)
# What does this PR do?
This PR implements hybrid search for Milvus DB based on the built-in
Milvus support.

To test:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=long --disable-warnings --asyncio-mode=auto
```
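
Under the hood, pymilvus (>= 2.4) exposes hybrid search directly; the
provider can lean on it roughly like this (a sketch with placeholder
vectors and hypothetical field/collection names):

```python
from pymilvus import AnnSearchRequest, Collection, RRFRanker, connections

connections.connect(host="localhost", port="19530")  # assumes local Milvus
collection = Collection("chunks")  # hypothetical collection

dense_req = AnnSearchRequest(
    data=[[0.1] * 768],            # placeholder dense query vector
    anns_field="vector",
    param={"metric_type": "COSINE"},
    limit=10,
)
sparse_req = AnnSearchRequest(
    data=[{102: 0.8, 7741: 0.4}],  # placeholder sparse (keyword) vector
    anns_field="sparse_vector",
    param={"metric_type": "IP"},
    limit=10,
)

# Fuse the two result lists with reciprocal rank fusion
hits = collection.hybrid_search([dense_req, sparse_req], RRFRanker(), limit=5)
```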

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-08-07 09:42:03 +02:00
Varsha
3c2aee610d
refactor: Remove double filtering based on score threshold (#3019)
# What does this PR do?
Remove score_threshold based check from `OpenAIVectorStoreMixin`

Closes: https://github.com/meta-llama/llama-stack/issues/3018


## Test Plan
2025-08-02 15:57:03 -07:00
Francisco Arceo
33cca26154
chore: Enabling Integration tests for Weaviate (#2882)
# What does this PR do?

This PR (1) enables the files API for Weaviate and (2) enables
integration tests for Weaviate, which adds a Docker container to the
GitHub action.

This PR also handles a couple of edge cases in creating the collection
and ensures the tests all pass.

## Test Plan
CI enabled

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-31 20:29:50 -04:00
Nathan Weinberg
cd5c6a2fcd
chore: standardize vector store not found error (#2968)
# What does this PR do?
1. Creates a new `VectorStoreNotFoundError` class
2. Implements the new class where appropriate 

Relates to #2379
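
The class presumably looks something like this (a sketch; the exact base
class and message format are assumptions):

```python
class VectorStoreNotFoundError(ValueError):
    """Raised when a vector store with the given ID does not exist."""

    def __init__(self, vector_store_id: str) -> None:
        super().__init__(f"Vector store '{vector_store_id}' not found")
        self.vector_store_id = vector_store_id
```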

Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
2025-07-30 15:19:16 -07:00
Derek Higgins
52201612de
feat: implement chunk deletion for vector stores (#2701)
Add support for deleting individual chunks from vector stores

- Add abstract remove_chunk() method to EmbeddingIndex base class
- Implement chunk deletion for Faiss provider, SQLite Vec, Milvus,
PGVector
- Placeholder implementations with NotImplementedError for
Chroma/Qdrant/Weaviate
- Integrate chunk deletion into OpenAI vector store file deletion flow
- removed xfail from
test_openai_vector_store_delete_file_removes_from_vector_store

Closes: #2477
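
The abstract hook described above presumably has roughly this shape (a
sketch; the exact signature is an assumption):

```python
from abc import ABC, abstractmethod


class EmbeddingIndex(ABC):
    @abstractmethod
    async def remove_chunk(self, chunk_id: str) -> None:
        """Remove a single chunk from the underlying index.

        Providers without support yet raise NotImplementedError.
        """
        ...
```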

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-07-25 10:30:30 -04:00
Francisco Arceo
2aba2c1236
chore: Moving vector store and vector store files helper methods to openai_vector_store_mixin (#2863)
# What does this PR do?
Moving vector store and vector store files helper methods to
`openai_vector_store_mixin.py`


## Test Plan
The tests are already supported in the CI and tests the inline providers
and current integration tests.

Note that the `vector_index` fixture will test `milvus_vec_adapter`,
`faiss_vec_adapter`, and `sqlite_vec_adapter` in
`tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`.

Additionally, the integration tests in `integration-vector-io-tests.yml`
runs `tests/integration/vector_io` tests for the following providers:
```python
vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"]
```

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-23 13:35:48 -04:00
Francisco Arceo
20c3197952
chore: Making name optional in openai_create_vector_store (#2858)
# What does this PR do?
chore: Making name optional in openai_create_vector_store


# Closes https://github.com/meta-llama/llama-stack/issues/2706

## Test Plan
CI and unit tests

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-22 13:31:31 -04:00
Francisco Arceo
31b088978a
fix: Fix /vector-stores/create API when vector store with duplicate name (#2617)
# What does this PR do?

Resolves https://github.com/meta-llama/llama-stack/issues/2735

Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).

This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.

Two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. The vector store ID has to be referenced by the ID, the name is not
reliable as every `client.vector_stores.create` results in a new vector
store.
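
In other words, the identifier is now minted server-side, e.g. (a one-line sketch; the exact helper in the codebase may differ):

```python
import uuid

# Two stores created with the same name still get distinct IDs.
vector_store_id = f"vs_{uuid.uuid4()}"
print(vector_store_id)  # e.g. vs_3f2a9c1e-...
```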

NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.

## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv

LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```

Unit tests and test script below 👇 

<details> 
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>

```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os

# Initialize colorama for color support in terminal
init(autoreset=True)

# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
global DEMO_VECTOR_STORE_ID
global DEMO_VECTOR_STORE_ID2


def colored_print(color, text):
    """Prints text to the console with the specified color."""
    print(f"{color}{text}{Style.RESET_ALL}")


def log_and_print(color, message, level=logging.INFO):
    """Logs a message and prints it to the console with the specified color."""
    logging.log(level, message)
    colored_print(color, message)


def run_tests(client, prefix="openai"):
    """
    Runs all tests using the provided OpenAI client and saves the output
    to JSON files with the given prefix.
    """
    # Create the directory if it doesn't exist
    os.makedirs('openai_testing', exist_ok=True)

    # Default values in case tests fail
    global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
    DEMO_VECTOR_STORE_ID = None
    DEMO_VECTOR_STORE_ID2 = None

    def test_idempotent_vector_store_creation():
        """
        Test that creating a vector store with the same name is idempotent.
        """
        log_and_print(Fore.BLUE, "Starting vector store creation test...")
        try:
            vector_store = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Attempt to create the same vector store again
            vector_store2 = client.vector_stores.create(
                name=DEMO_VECTOR_STORE_NAME,
            )

            # Check instead of assert
            if vector_store2.id != vector_store.id:
                log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)

            global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
            DEMO_VECTOR_STORE_ID = vector_store.id
            DEMO_VECTOR_STORE_ID2 = vector_store2.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
        except Exception as e:
            log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            # Create a fallback vector store ID if needed
            if 'vector_store' in locals() and vector_store:
                DEMO_VECTOR_STORE_ID = vector_store.id
            return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2

    def test_vector_store_list():
        """
        Test listing vector stores.
        """
        log_and_print(Fore.BLUE, "Starting vector store list test...")
        try:
            vector_stores = client.vector_stores.list()

            # Check instead of assert
            if not isinstance(vector_stores, pagination.SyncCursorPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Vector store list test passed!")

            vector_stores_data = vector_stores.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
                json.dump(vector_stores_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_retrieve_vector_store():
        """
        Test retrieving a specific vector store.
        """
        log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            vector_store = client.vector_stores.retrieve(
                vector_store_id=DEMO_VECTOR_STORE_ID,
            )

            # Check instead of assert
            if vector_store.id != DEMO_VECTOR_STORE_ID:
                log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Retrieve vector store test passed!")

            vector_store_data = vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
                json.dump(vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_modify_vector_store():
        """
        Test modifying a vector store.
        """
        log_and_print(Fore.BLUE, "Starting modify vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            updated_vector_store = client.vector_stores.update(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                name="Updated Support FAQ FJA",
            )

            # Check instead of assert
            if updated_vector_store.name != "Updated Support FAQ FJA":
                log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Modify vector store test passed!")

            updated_vector_store_data = updated_vector_store.to_dict()
            log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
                json.dump(updated_vector_store_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_delete_vector_store():
        """
        Test deleting a vector store.
        """
        log_and_print(Fore.BLUE, "Starting delete vector store test...")
        if not DEMO_VECTOR_STORE_ID2:
            log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
                          level=logging.WARNING)
            return

        try:
            response = client.vector_stores.delete(
                vector_store_id=DEMO_VECTOR_STORE_ID2,
            )

            log_and_print(Fore.GREEN, "Delete vector store test passed!")

            response_data = response.to_dict()
            log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
            with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
                json.dump(response_data, f, indent=2)
        except Exception as e:
            log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_create_vector_store_file():
        log_and_print(Fore.BLUE, "Starting create vector store file test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            # create jsonl of files as an example
            with open("mydata.jsonl", "w") as f:
                f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
                f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')

            # Create a simple text file if my_data_small.txt doesn't exist
            if not os.path.exists("my_data_small.txt"):
                with open("my_data_small.txt", "w") as f:
                    f.write("This is a test file for vector store testing.\n")

            created_file = client.files.create(
                file=open("my_data_small.txt", "rb"),
                purpose="assistants",
            )

            created_file_data = created_file.to_dict()
            log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
                json.dump(created_file_data, f, indent=2)

            retrieved_files = client.files.retrieve(created_file.id)
            retrieved_files_data = retrieved_files.to_dict()
            log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
            with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
                json.dump(retrieved_files_data, f, indent=2)

            vector_store_file = client.vector_stores.files.create(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                file_id=created_file.id,
            )
            log_and_print(Fore.GREEN, "Create vector store file test passed!")
        except Exception as e:
            log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    def test_search_vector_store():
        """
        Test searching a vector store.
        """
        log_and_print(Fore.BLUE, "Starting search vector store test...")
        if not DEMO_VECTOR_STORE_ID:
            log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
                          level=logging.WARNING)
            return

        try:
            query = "What is the banana policy?"
            search_results = client.vector_stores.search(
                vector_store_id=DEMO_VECTOR_STORE_ID,
                query=query,
                max_num_results=10,
                ranking_options={
                    'ranker': 'default-2024-11-15',
                    'score_threshold': 0.0,
                },
                rewrite_query=False,
            )

            # Check instead of assert
            if not isinstance(search_results, pagination.SyncPage):
                log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
                              level=logging.WARNING)
            else:
                log_and_print(Fore.GREEN, "Search vector store test passed!")

            search_results_dict = search_results.to_dict()
            log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
            with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
                json.dump(search_results_dict, f, indent=2)

            log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
        except Exception as e:
            log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())

    # Run all tests in sequence, even if some fail
    test_results = []

    try:
        result = test_idempotent_vector_store_creation()
        if result and len(result) == 2:
            DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
        test_results.append(True)
    except Exception as e:
        log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
        test_results.append(False)

    for test_func in [
        test_vector_store_list,
        test_retrieve_vector_store,
        test_modify_vector_store,
        test_delete_vector_store,
        test_create_vector_store_file,
        test_search_vector_store
    ]:
        try:
            test_func()
            test_results.append(True)
        except Exception as e:
            log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
            logging.error(traceback.format_exc())
            test_results.append(False)

    if all(test_results):
        log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
    else:
        failed_count = test_results.count(False)
        log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
    parser.add_argument(
        "--provider",
        type=str,
        default="llama",
        choices=["openai", "llama", "both"],
        help="Specify which environment to test: openai, llama, or both. Default is both.",
    )
    args = parser.parse_args()

    try:
        if args.provider in ("openai", "both"):
            openai_client = OpenAI()
            run_tests(openai_client, prefix="openai")

        if args.provider in ("llama", "both"):
            llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
            run_tests(llama_client, prefix="llama")

        log_and_print(Fore.GREEN, "All tests completed!")

    except Exception as e:
        log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
        logging.error(traceback.format_exc())
```
</details>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-15 11:24:41 -04:00
Francisco Arceo
33f0d83ad3
chore: Move vector store kvstore implementation into openai_vector_store_mixin.py (#2748) 2025-07-14 18:10:35 -04:00
Derek Higgins
f77d4d91f5
fix: handle encoding errors when adding files to vector store (#2574)
- Add try-catch block around data.decode() to handle UnicodeDecodeError
- Implement UTF-8 fallback when detected encoding fails
- Return empty string when both encodings fail
- add unit tests
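
A minimal sketch of that fallback chain (the helper name is hypothetical; the real code wraps the provider's file-parsing path):

```python
def decode_with_fallback(data: bytes, detected_encoding: str) -> str:
    """Decode bytes with the detected encoding, then UTF-8, else give up."""
    try:
        return data.decode(detected_encoding)
    except (UnicodeDecodeError, LookupError):
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError:
            return ""  # both encodings failed; skip this file's content
```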

Fixes #2572: UnicodeDecodeError when uploading files with problematic
encodings

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-04 12:10:18 +02:00
Francisco Arceo
ea80ea63ac
chore: Updating chunk id generation to ensure uniqueness (#2618)
# What does this PR do?
This handles an edge case in `generate_chunk_id` where the concatenation
of `document_id` and `chunk_text` is not unique. Adding the window
location ensures uniqueness.
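
A sketch of the idea (the signature and hashing scheme are assumptions; the point is that the window span disambiguates identical text):

```python
import hashlib
import uuid

def generate_chunk_id(document_id: str, chunk_text: str, chunk_window: str | None = None) -> str:
    # Identical text at different offsets of the same document now hashes
    # differently because the window span is part of the hash input.
    hash_input = f"{document_id}:{chunk_text}"
    if chunk_window:
        hash_input += f":{chunk_window}"
    return str(uuid.UUID(hashlib.md5(hash_input.encode()).hexdigest()))
```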

## Test Plan
Added unit test

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-07-04 10:26:35 +05:30
Francisco Arceo
cc19b56c87
chore: OpenAI compatibility for Milvus (#2470)
# What does this PR do?
Closes https://github.com/meta-llama/llama-stack/issues/2461



## Test Plan
Tested with the `ollama` distribution template and updated the vector_io
provider to:
```yaml
vector_io:
- provider_id: milvus
  provider_type: inline::milvus
  config:
    db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/milvus_store.db
    kvstore:
      type: sqlite
      db_name: milvus_registry.db
```

Ran the stack
```bash
llama stack run ./llama_stack/templates/ollama/run.yaml --image-type venv --env OLLAMA_URL="http://0.0.0.0:11434"
```

Ran the tests:
```
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py  --embedding-model all-MiniLM-L6-v2
```
Output passed.

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-27 16:00:36 -07:00
Sébastien Han
ac5fd57387
chore: remove nested imports (#2515)
# What does this PR do?

* Given that our API packages use `import *` in `__init__.py`, we don't
need to import from `llama_stack.apis.models.models` but simply from
`llama_stack.apis.models`. The decision to use `import *` is debatable and
should probably be revisited at some point.

* Remove unneeded Ruff F401 rule
* Consolidate the Ruff F403 rule in pyproject.toml

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-06-26 08:01:05 +05:30
Francisco Arceo
82f13fe83e
feat: Add ChunkMetadata to Chunk (#2497)
# What does this PR do?
Adding `ChunkMetadata` so we can properly delete embeddings later.

More specifically, this PR refactors and extends the chunk metadata
handling in the vector database and introduces a distinction between
metadata used for model context and backend-only metadata required for
chunk management, storage, and retrieval. It also improves chunk ID
generation and propagation throughout the stack, enhances test coverage,
and adds new utility modules.

```python
class ChunkMetadata(BaseModel):
    """
    `ChunkMetadata` is backend metadata for a `Chunk` that is used to store additional information about the chunk that
        will NOT be inserted into the context during inference, but is required for backend functionality.
        Use `metadata` in `Chunk` for metadata that will be used during inference.
    """
    document_id: str | None = None
    chunk_id: str | None = None
    source: str | None = None
    created_timestamp: int | None = None
    updated_timestamp: int | None = None
    chunk_window: str | None = None
    chunk_tokenizer: str | None = None
    chunk_embedding_model: str | None = None
    chunk_embedding_dimension: int | None = None
    content_token_count: int | None = None
    metadata_token_count: int | None = None
```
Eventually we can migrate the document_id out of the `metadata` field.
I've introduced the changes so that `ChunkMetadata` is backwards
compatible with `metadata`.

<!-- If resolving an issue, uncomment and update the line below -->
Closes https://github.com/meta-llama/llama-stack/issues/2501 

## Test Plan
Added unit tests

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-25 15:55:23 -04:00
Varsha
cfee63bd0d
feat: Add search_mode support to OpenAI vector store API (#2500)
# What does this PR do?
Add search_mode parameter (vector/keyword/hybrid) to
openai_search_vector_store method. Fixes OpenAPI
code generation by using str instead of Literal type.
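
From the OpenAI client, the new mode can be exercised by passing the Llama Stack extension through `extra_body` (a hedged example; the wire-level parameter name is an assumption):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

results = client.vector_stores.search(
    vector_store_id="vs_1234",             # placeholder store ID
    query="What is the return policy?",
    extra_body={"search_mode": "hybrid"},  # "vector" | "keyword" | "hybrid"
)
```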

Closes: #2459 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-06-24 20:38:47 -04:00
Ashwin Bharambe
73c18feac4
fix: update the signature of openai_list_files_in_vector_store in all VectorIO impls (#2503) 2025-06-24 18:55:56 +05:30
Ben Browning
f394c7f2d9
feat: Add missing Vector Store Files API surface (#2468)
# What does this PR do?

This adds the ability to list, retrieve, update, and delete Vector Store
Files. It implements these new APIs for the faiss and sqlite-vec
providers, since those are the two that also have the rest of the vector
store files implementation.

Closes #2445 

## Test Plan

### test_openai_vector_stores Integration Tests

There are a number of new integration tests added, which I ran for each
provider as outlined below.

faiss (from ollama distro):

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```

sqlite-vec (from starter distro):

```
llama stack run llama_stack/templates/starter/run.yaml

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io/test_openai_vector_stores.py \
  --embedding-model=all-MiniLM-L6-v2
```

### file_search verification tests

I also ensured the file_search verification tests continue to work, both
for faiss and sqlite-vec.

faiss (ollama distro):

```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run llama_stack/templates/ollama/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```


sqlite-vec (starter distro):

```
llama stack run llama_stack/templates/starter/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=together/meta-llama/Llama-3.2-3B-Instruct-Turbo
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-19 11:08:24 -04:00
ehhuang
db2cd9e8f3
feat: support filters in file search (#2472)
# What does this PR do?
Move to use vector_stores.search for file search tool in Responses,
which supports filters.

closes #2435 

## Test Plan
Added e2e test with filters.
```
llama stack run llama_stack/templates/fireworks/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search and filters' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.3-70B-Instruct
```
2025-06-18 21:50:55 -07:00
Varsha
2e8054bede
feat: Implement hybrid search in SQLite-vec (#2312)
# What does this PR do?
Add support for hybrid search mode in the SQLite-vec provider, which
combines keyword and vector search for better results. The implementation:

- Adds hybrid search mode as a new option alongside vector and keyword
search
- Implements query_hybrid method in SQLiteVecIndex that:
  - First performs keyword search to get candidate matches
  - Then applies vector similarity search on those candidates
- Updates documentation to reflect the new search mode

This change improves search quality by leveraging both semantic similarity
and keyword matching, while maintaining backward compatibility with
existing vector and keyword search modes.
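
Schematically, the keyword-first flow could look like this (a sketch under assumed helper names `query_keyword` / `query_vector`, not the provider's exact code):

```python
class SQLiteVecIndex:
    async def query_hybrid(self, embedding, query_str: str, k: int):
        # 1. Keyword (FTS) search over-fetches to build a candidate set.
        keyword_hits = await self.query_keyword(query_str, k=k * 5)
        candidate_ids = {c.chunk_id for c in keyword_hits}

        # 2. Vector search, keeping only chunks that also matched keywords,
        #    so the final ranking is by semantic similarity.
        vector_hits = await self.query_vector(embedding, k=k * 5)
        return [c for c in vector_hits if c.chunk_id in candidate_ids][:k]
```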

## Test Plan
```
pytest tests/unit/providers/vector_io/test_sqlite_vec.py -v -s --tb=short
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:217: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
=============================================================================================== test session starts ===============================================================================================
platform darwin -- Python 3.10.16, pytest-8.3.5, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '8.3.5', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'asyncio': '0.26.0', 'nbval': '0.11.0', 'cov': '6.1.1'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, anyio-4.8.0, asyncio-0.26.0, nbval-0.11.0, cov-6.1.1
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 10 items                                                                                                                                                                                                

tests/unit/providers/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_full_text_search_k_greater_than_results PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_no_keyword_matches PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_score_threshold PASSED
tests/unit/providers/vector_io/test_sqlite_vec.py::test_query_chunks_hybrid_different_embedding PASSED
```

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-06-13 15:54:06 -04:00
Ben Browning
941f505eb0
feat: File search tool for Responses API (#2426)
# What does this PR do?

This is an initial working prototype of wiring up the `file_search`
builtin tool for the Responses API to our existing rag knowledge search
tool.

This is me seeing what I could pull together on top of the bits we
already have merged. This may not be the ideal way to implement this,
and things like how I shuffle the vector store ids from the original
response API tool request to the actual tool execution feel a bit hacky
(grep for `tool_kwargs["vector_db_ids"]` in `_execute_tool_call` to see
what I mean).

## Test Plan

I stubbed in some new tests to exercise this using text and pdf
documents.

Note that this is currently under tests/verification only because it
sometimes flakes with tool calling of the small Llama-3.2-3B model we
run in CI (and that I use as an example below). We'd want to make the
test a bit more robust in some way if we moved this over to
tests/integration and ran it in CI.

### OpenAI SaaS (to verify test correctness)

```
pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=https://api.openai.com/v1 \
  --model=gpt-4o
```

### Fireworks with faiss vector store

```
llama stack run llama_stack/templates/fireworks/run.yaml

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.3-70B-Instruct
```

### Ollama with faiss vector store

This sometimes flakes on Ollama because the quantized small model
doesn't always choose to call the tool to answer the user's question.
But, it often works.

```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

pytest -sv tests/verifications/openai_api/test_responses.py \
  -k'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=meta-llama/Llama-3.2-3B-Instruct
```

### OpenAI provider with sqlite-vec vector store

```
llama stack run ./llama_stack/templates/starter/run.yaml --image-type venv

 pytest -sv tests/verifications/openai_api/test_responses.py \
  -k 'file_search' \
  --base-url=http://localhost:8321/v1/openai/v1 \
  --model=openai/gpt-4o-mini
```

### Ensure existing vector store integration tests still pass

```
ollama run llama3.2:3b

INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
llama stack run ./llama_stack/templates/ollama/run.yaml \
  --image-type venv \
  --env OLLAMA_URL="http://0.0.0.0:11434"

LLAMA_STACK_CONFIG=http://localhost:8321 \
pytest -sv tests/integration/vector_io \
  --text-model "meta-llama/Llama-3.2-3B-Instruct" \
  --embedding-model=all-MiniLM-L6-v2
```

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-06-13 14:32:48 -04:00
Hardik Shah
fef670b024
feat: update openai tests to work with both clients (#2442)
https://github.com/meta-llama/llama-stack-client-python/pull/238 updated
llama-stack-client to also support Open AI endpoints for embeddings,
files, vector-stores. This updates the test to test all configs --
openai sdk, llama stack sdk and library-as-client.
2025-06-12 16:30:23 -07:00
Hardik Shah
0bc1747ed8
feat: update search for vector_stores (#2441)
Updated the `search` functionality return response to match openai. 

## Test Plan
```
pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
2025-06-12 15:34:22 -07:00
Hardik Shah
de37a04c3e
fix: set appropriate defaults for params (#2434)
Setting defaults to `| None`, since otherwise the parameters get marked
as required in the OpenAPI spec.
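
For illustration, the difference on a hypothetical request model (names are placeholders):

```python
from pydantic import BaseModel

class SearchRequest(BaseModel):       # hypothetical model for illustration
    query: str                        # no default -> required in the spec
    max_num_results: int | None = 10  # "| None" keeps it optional
    rewrite_query: bool | None = None

print(SearchRequest(query="hi").max_num_results)  # 10
```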
2025-06-11 17:30:34 -07:00
Hardik Shah
d55100d9b7
feat: OpenAIVectorIOMixin for vector_stores common logic (#2427)
Extracts common OpenAI vector-store code into its own mixin so that all
providers can share the same core logic.
This also makes it easy for Llama Stack to support both vector-stores
and Llama Stack APIs in the interim so that both share the same
underlying vector-dbs.

Each provider contains storage specific logic to `create / edit / delete
/ list` vector dbs while the plumbing logic is standardized in the
common code.

Ensured that this works well with both faiss and sqlite-vec.
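
The division of labor, roughly (a toy sketch; the real mixin and hook names may differ):

```python
import uuid

class OpenAIVectorStoreMixin:
    """Common plumbing shared by every provider (sketch)."""

    async def openai_create_vector_store(self, name: str | None = None) -> dict:
        store_id = f"vs_{uuid.uuid4()}"
        info = {"id": store_id, "name": name, "object": "vector_store"}
        await self._save_openai_vector_store(store_id, info)  # provider hook
        return info

class InMemoryAdapter(OpenAIVectorStoreMixin):
    """A toy provider: only the storage-specific hook lives here."""

    def __init__(self) -> None:
        self.stores: dict[str, dict] = {}

    async def _save_openai_vector_store(self, store_id: str, info: dict) -> None:
        self.stores[store_id] = info
```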

### Test Plan 
```
llama stack run starter
pytest -sv --stack-config http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2
```
2025-06-11 15:40:57 -07:00
Francisco Arceo
f328436831
feat: Enable ingestion of precomputed embeddings (#2317)
2025-05-31 04:03:37 -06:00
Varsha
e92301f2d7
feat(sqlite-vec): enable keyword search for sqlite-vec (#1439)
# What does this PR do?
This PR introduces support for keyword-based FTS5 search with BM25
relevance scoring. It makes changes to the existing EmbeddingIndex base
class in order to support `search_mode` and `query_str` parameters that
can be used for keyword-based search implementations.
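
At the SQL level, FTS5's `bm25()` auxiliary function does the relevance scoring (a self-contained sketch, assuming your SQLite build ships FTS5; lower scores rank better):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(chunk_id, content)")
conn.execute("INSERT INTO chunks_fts VALUES ('c1', 'how to reset your password')")
conn.execute("INSERT INTO chunks_fts VALUES ('c2', 'available shipping options')")

rows = conn.execute(
    "SELECT chunk_id, bm25(chunks_fts) AS score "
    "FROM chunks_fts WHERE chunks_fts MATCH ? ORDER BY score LIMIT 5",
    ("password",),
).fetchall()
print(rows)  # [('c1', <score>)]
```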

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
run 
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
Output:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
====================================================== test session starts =======================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.4-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0
asyncio: mode=auto, asyncio_default_fixture_loop_scope=None
collected 7 items                                                                                                                

llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_fts PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```


For reference, with the implementation, the FTS table looks like the following:
```
Chunk ID: 9fbc39ce-c729-64a2-260f-c5ec9bb2a33e, Content: Sentence 0 from document 0
Chunk ID: 94062914-3e23-44cf-1e50-9e25821ba882, Content: Sentence 1 from document 0
Chunk ID: e6cfd559-4641-33ba-6ce1-7038226495eb, Content: Sentence 2 from document 0
Chunk ID: 1383af9b-f1f0-f417-4de5-65fe9456cc20, Content: Sentence 3 from document 0
Chunk ID: 2db19b1a-de14-353b-f4e1-085e8463361c, Content: Sentence 4 from document 0
Chunk ID: 9faf986a-f028-7714-068a-1c795e8f2598, Content: Sentence 5 from document 0
Chunk ID: ef593ead-5a4a-392f-7ad8-471a50f033e8, Content: Sentence 6 from document 0
Chunk ID: e161950f-021f-7300-4d05-3166738b94cf, Content: Sentence 7 from document 0
Chunk ID: 90610fc4-67c1-e740-f043-709c5978867a, Content: Sentence 8 from document 0
Chunk ID: 97712879-6fff-98ad-0558-e9f42e6b81d3, Content: Sentence 9 from document 0
Chunk ID: aea70411-51df-61ba-d2f0-cb2b5972c210, Content: Sentence 0 from document 1
Chunk ID: b678a463-7b84-92b8-abb2-27e9a1977e3c, Content: Sentence 1 from document 1
Chunk ID: 27bd63da-909c-1606-a109-75bdb9479882, Content: Sentence 2 from document 1
Chunk ID: a2ad49ad-f9be-5372-e0c7-7b0221d0b53e, Content: Sentence 3 from document 1
Chunk ID: cac53bcd-1965-082a-c0f4-ceee7323fc70, Content: Sentence 4 from document 1
```

Query results:
Result 1: Sentence 5 from document 0
Result 2: Sentence 5 from document 1
Result 3: Sentence 5 from document 2

[//]: # (## Documentation)

---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-05-21 15:24:24 -04:00
Francisco Arceo
8e7ab146f8
feat: Adding support for customizing chunk context in RAG insertion and querying (#2134)
# What does this PR do?
This PR allows users to customize the template used for chunks when they
are inserted into the context. Additionally, this enables metadata
injection into the context of an LLM for RAG. It makes a naive and crude
assumption that each chunk should include the metadata; this is obviously
redundant when multiple chunks are returned from the same document.
Removing any duplication of chunks would require much more significant
changes, so this is a reasonable first step that unblocks users requesting
this enhancement in
https://github.com/meta-llama/llama-stack/issues/1767.

In the future, this can be extended to support citations.


List of Changes:
- `llama_stack/apis/tools/rag_tool.py`
    - Added `chunk_template` field in `RAGQueryConfig` (see the sketch after this list).
- Added `field_validator` to validate the `chunk_template` field in
`RAGQueryConfig`.
- Ensured the `chunk_template` field includes placeholders `{index}` and
`{chunk.content}`.
- Updated the `query` method to use the `chunk_template` for formatting
chunk text content.
- `llama_stack/providers/inline/tool_runtime/rag/memory.py`
- Modified the `insert` method to pass `doc.metadata` for chunk
creation.
- Enhanced the `query` method to format results using `chunk_template`
and exclude unnecessary metadata fields like `token_count`.
- `llama_stack/providers/utils/memory/vector_store.py`
- Updated `make_overlapped_chunks` to include metadata serialization and
token count for both content and metadata.
    - Added error handling for metadata serialization issues.
- `pyproject.toml`
- Added `pydantic.field_validator` as a recognized `classmethod`
decorator in the linting configuration.
- `tests/integration/tool_runtime/test_rag_tool.py`
- Refactored test assertions to separate `assert_valid_chunk_response`
and `assert_valid_text_response`.
- Added integration tests to validate `chunk_template` functionality
with and without metadata inclusion.
- Included a test case to ensure `chunk_template` validation errors are
raised appropriately.
- `tests/unit/rag/test_vector_store.py`
- Added unit tests for `make_overlapped_chunks`, verifying chunk
creation with overlapping tokens and metadata integrity.
- Added tests to handle metadata serialization errors, ensuring proper
exception handling.
- `docs/_static/llama-stack-spec.html`
- Added a new `chunk_template` field of type `string` with a default
template for formatting retrieved chunks in RAGQueryConfig.
    - Updated the `required` fields to include `chunk_template`.
- `docs/_static/llama-stack-spec.yaml`
- Introduced `chunk_template` field with a default value for
RAGQueryConfig.
- Updated the required configuration list to include `chunk_template`.
- `docs/source/building_applications/rag.md`
- Documented the `chunk_template` configuration, explaining how to
customize metadata formatting in RAG queries.
- Added examples demonstrating the usage of the `chunk_template` field
in RAG tool queries.
    - Highlighted default values for `RAG` agent configurations.
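
A hedged sketch of the customization (the default template shown is approximate, and the client wiring is illustrative):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Both placeholders are required; the field_validator rejects templates
# missing {index} or {chunk.content}.
response = client.tool_runtime.rag_tool.query(
    vector_db_ids=["my_docs"],  # placeholder vector DB id
    content="How do you do great work?",
    query_config={
        "chunk_template": "Result {index}\nContent: {chunk.content}\nMetadata: {metadata}\n",
    },
)
```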

# Resolves https://github.com/meta-llama/llama-stack/issues/1767

## Test Plan
Updated both `test_vector_store.py` and `test_rag_tool.py` and tested
end-to-end with a script.

I also tested the quickstart to enable this and specified this metadata:
```python
document = RAGDocument(
    document_id="document_1",
    content=source,
    mime_type="text/html",
    metadata={"author": "Paul Graham", "title": "How to do great work"},
)
```
Which produced the output below: 

![Screenshot 2025-05-13 at 10 53 43 PM](https://github.com/user-attachments/assets/bb199d04-501e-4217-9c44-4699d43d5519)

This highlights the usefulness of the additional metadata. Notice how
the metadata is redundant for different chunks of the same document. I
think we can update that in a subsequent PR.

# Documentation
I've added a brief note about this in the documentation to explain it
to users, and updated the API documentation.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-05-14 21:56:20 -04:00
Kevin Postlethwait
a57985eeac
fix: add check for interleavedContent (#1973)
# What does this PR do?
Checks for RAGDocument of type InterleavedContent

I noticed when stepping through the code that the supported content
types for `RAGDocument` include `InterleavedContent`. This type was not
checked before `doc.content` was regex-matched, which could cause a
runtime error. This change adds an explicit check for the type.
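
A minimal sketch of the kind of guard this adds (names are illustrative,
not the exact implementation):

```python
def content_to_matchable_text(content) -> str:
    # Only plain strings are safe to regex-match against URL/data-URI patterns.
    if isinstance(content, str):
        return content
    # InterleavedContent items (e.g. images) have no faithful text form,
    # so fail explicitly instead of hitting a confusing runtime error later.
    raise ValueError(f"Unsupported content type for RAG ingestion: {type(content)}")
```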

The only other part I'm unclear on is how to handle the `ImageContent`
type, since it would always just return `<image>`, which seems like
undesired behavior. Should the `InterleavedContent` type be removed from
`RAGDocument` and replaced with `URI | str`?


---------

Signed-off-by: Kevin <kpostlet@redhat.com>
2025-05-06 09:55:07 -07:00
Ihar Hrachyshka
9e6561a1ec
chore: enable pyupgrade fixes (#1806)
# What does this PR do?

The goal of this PR is code base modernization.

Schema reflection code needed a minor adjustment to handle UnionTypes
and collections.abc.AsyncIterator. (Both are preferred for latest Python
releases.)

Note to reviewers: almost all changes here were automatically generated
by pyupgrade. Some additional unused imports were cleaned up. The only
change worth noting can be found under `docs/openapi_generator` and
`llama_stack/strong_typing/schema.py`, where reflection code was updated
to deal with "newer" types.
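
For context, typical pyupgrade rewrites on a modern Python baseline look
like this (illustrative, not taken from the diff):

```python
# Before
from typing import AsyncIterator, Dict, Optional

async def stream_rows(query: Optional[str]) -> AsyncIterator[Dict[str, str]]: ...

# After: PEP 604 unions, builtin generics, and collections.abc.AsyncIterator
from collections.abc import AsyncIterator

async def stream_rows(query: str | None) -> AsyncIterator[dict[str, str]]: ...
```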

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-05-01 14:23:50 -07:00
Ihar Hrachyshka
8234cdf1a5
fix(deps): move chardet and pypdf imports inline where used (#1434)
# What does this PR do?

Fix import errors due to `chardet` and `pypdf` not being installed while
imported from `url_utils.py`.

Closes #1432
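
The pattern is roughly the following (a hedged sketch;
`detect_and_decode` is an illustrative name, not the actual helper): the
import moves from module scope into the function that needs it, so
importing `url_utils` no longer requires the optional dependency to be
installed.

```python
def detect_and_decode(raw: bytes) -> str:
    # Imported lazily: only code paths that actually sniff encodings
    # need chardet to be installed.
    import chardet

    encoding = chardet.detect(raw)["encoding"] or "utf-8"
    return raw.decode(encoding, errors="replace")
```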

## Test Plan

Now able to run the server with the config.


Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-03-06 17:09:14 -08:00
Ashwin Bharambe
8bbd52bb9f
chore: remove dependency on llama_models completely (#1344) 2025-03-01 12:48:08 -08:00
Sébastien Han
e4a1579e63
build: format codebase imports using ruff linter (#1028)
# What does this PR do?

- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-13 10:06:21 -08:00
Hardik Shah
a84e7669f0
feat: Add a new template for dell (#978)
- Added new template `dell` and its documentation 
- Update docs 
- [minor] uv fix I came across 
- codegen for all templates 

Tested with 

```bash
export INFERENCE_PORT=8181
export DEH_URL=http://0.0.0.0:$INFERENCE_PORT
export INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
export CHROMADB_HOST=localhost
export CHROMADB_PORT=6601
export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT
export CUDA_VISIBLE_DEVICES=0
export LLAMA_STACK_PORT=8321

# build the stack template 
llama stack build --template=dell 

# start the TGI inference server 
podman run --rm -it --network host -v $HOME/.cache/huggingface:/data -e HF_TOKEN=$HF_TOKEN -p $INFERENCE_PORT:$INFERENCE_PORT --gpus $CUDA_VISIBLE_DEVICES ghcr.io/huggingface/text-generation-inference --dtype bfloat16 --usage-stats off --sharded false --cuda-memory-fraction 0.7 --model-id $INFERENCE_MODEL --port $INFERENCE_PORT --hostname 0.0.0.0

# start chroma-db for vector-io ( aka RAG )
podman run --rm -it --network host --name chromadb -v .:/chroma/chroma -e IS_PERSISTENT=TRUE chromadb/chroma:latest --port $CHROMADB_PORT --host $(hostname)

# build docker 
llama stack build --template=dell --image-type=container

# run llama stack server ( via docker )
# NOTE: mount the llama-stack / llama-models directories if testing local changes
podman run -it \
  --network host \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  -v /home/hjshah/git/llama-stack:/app/llama-stack-source \
  -v /home/hjshah/git/llama-models:/app/llama-models-source \
  localhost/distribution-dell:dev \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env DEH_URL=$DEH_URL \
  --env CHROMA_URL=$CHROMA_URL

# test the server 
cd <PATH_TO_LLAMA_STACK_REPO>
LLAMA_STACK_BASE_URL=http://0.0.0.0:$LLAMA_STACK_PORT pytest -s -v tests/client-sdk/agents/test_agents.py

```

---------

Co-authored-by: Hardik Shah <hjshah@fb.com>
2025-02-06 14:14:39 -08:00
Yuan Tang
34ab7a3b6c
Fix precommit check after moving to ruff (#927)
The lint check on the main branch is failing. This fixes the lint check
after we moved to ruff in
https://github.com/meta-llama/llama-stack/pull/921. We also need to move
to a `ruff.toml` file, as well as fix and ignore some additional checks.

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-02 06:46:45 -08:00
Ashwin Bharambe
1a7490470a
[memory refactor][3/n] Introduce RAGToolRuntime as a specialized sub-protocol (#832)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Third part:
- we need to make `tool_runtime.rag_tool.query_context()` and
`tool_runtime.rag_tool.insert_documents()` methods work smoothly with
complete type safety. To that end, we introduce a sub-resource path
`tool-runtime/rag-tool/` and make changes to the resolver to make things
work.
- the PR updates the agents implementation to directly call these typed
APIs for memory accesses rather than going through the complex, untyped
"invoke_tool" API. The code looks much nicer and simpler, as expected.
- a number of hacks still remain in the server resolver implementation;
we will live with some and fix others

Note that we must make sure the client SDKs are able to handle this
subresource complexity also. Stainless has support for subresources, so
this should be possible but beware.
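
Roughly, the agents code moves from the untyped call to the typed
sub-resource. A sketch under the method names above (argument names and
shapes are assumptions, not the exact signatures):

```python
# Before: untyped, stringly-typed dispatch (tool name and kwargs illustrative)
result = await tool_runtime.invoke_tool(
    tool_name="query_memory",
    kwargs={"query": user_query, "memory_bank_ids": ["agent_memory"]},
)

# After: typed sub-resource methods under tool-runtime/rag-tool/
await tool_runtime.rag_tool.insert_documents(
    documents=documents,            # typed document objects
    memory_bank_id="agent_memory",  # illustrative id
)
context = await tool_runtime.rag_tool.query_context(
    content=user_query,
    memory_bank_ids=["agent_memory"],
)
```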

## Test Plan

Our RAG test is sad (it doesn't actually check the RAG output), but I
verified that the implementation works. I will work on fixing the RAG
test afterwards.

```bash
pytest -s -v tests/agents/test_agents.py -k "rag and together" --safety-shield=meta-llama/Llama-Guard-3-8B
```
2025-01-22 10:04:16 -08:00
Ashwin Bharambe
78a481bb22
[memory refactor][2/n] Update faiss and make it pass tests (#830)
See https://github.com/meta-llama/llama-stack/issues/827 for the broader
design.

Second part:

- updates routing table / router code 
- updates the faiss implementation


## Test Plan

```
pytest -s -v -k sentence test_vector_io.py --env EMBEDDING_DIMENSION=384
```
2025-01-22 10:02:15 -08:00
Xi Yan
3c72c034e6
[remove import *] clean up import *'s (#689)
# What does this PR do?

- as title, cleaning up `import *`'s (see the sketch after this list)
- upgrade tests to make them more robust to bad model outputs
- remove import *'s in llama_stack/apis/* (skip __init__ modules)
<img width="465" alt="image"
src="https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2"
/>

- ran `sh run_openapi_generator.sh`; no types get affected
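
The cleanup follows the usual pattern (names are illustrative):

```python
# Before: wildcard import hides where names come from
# from llama_stack.apis.inference import *

# After: explicit imports make the dependency surface visible
from llama_stack.apis.inference import ChatCompletionRequest, SamplingParams
```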

## Test Plan

### Providers Tests

**agents**
```
pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8
```

**inference**
```bash
# meta-reference
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

# together
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py
pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py

pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py 
```

**safety**
```
pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B
```

**memory**
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384
```

**scoring**
```
pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py
```


**datasetio**
```
pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py
pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py
```


**eval**
```
pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py
```

### Client-SDK Tests
```
LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk
```

### llama-stack-apps
```
PORT=5000
LOCALHOST=localhost

python -m examples.agents.hello $LOCALHOST $PORT
python -m examples.agents.inflation $LOCALHOST $PORT
python -m examples.agents.podcast_transcript $LOCALHOST $PORT
python -m examples.agents.rag_as_attachments $LOCALHOST $PORT
python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT
python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT
python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT

# Vision model
python -m examples.interior_design_assistant.app
python -m examples.agent_store.app $LOCALHOST $PORT
```

### CLI
```
which llama
llama model prompt-format -m Llama3.2-11B-Vision-Instruct
llama model list
llama stack list-apis
llama stack list-providers inference

llama stack build --template ollama --image-type conda
```

### Distributions Tests
**ollama**
```
llama stack build --template ollama --image-type conda
ollama run llama3.2:1b-instruct-fp16
llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct
```

**fireworks**
```
llama stack build --template fireworks --image-type conda
llama stack run ./llama_stack/templates/fireworks/run.yaml
```

**together**
```
llama stack build --template together --image-type conda
llama stack run ./llama_stack/templates/together/run.yaml
```

**tgi**
```
llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
```

2024-12-27 15:45:44 -08:00
Ashwin Bharambe
8de8eb03c8
Update the "InterleavedTextMedia" type (#635)
## What does this PR do?

This is a long-pending change and particularly important to get done
now.

Specifically:
- we cannot "localize" (aka download) any URLs from media attachments
anywhere near our modeling code. It must be done within llama-stack.
- `PIL.Image` is infesting all our APIs via `ImageMedia ->
InterleavedTextMedia`, and that cannot be right at all. Anything in the
API surface must be "naturally serializable". We need a standard `{
type: "image", image_url: "<...>" }`, which is more extensible (see the
sketch after this list).
- `UserMessage`, `SystemMessage`, etc. are moved completely to
llama-stack from the llama-models repository.
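
A hedged sketch of the serializable shape this points at (class and
field names assumed, not the final API):

```python
from pydantic import BaseModel

class ImageMedia(BaseModel):
    type: str = "image"
    image_url: str  # plain URL or data URI; no PIL.Image in the API surface

class TextMedia(BaseModel):
    type: str = "text"
    text: str

# "Interleaved" content is then just a list of such serializable items.
```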

See https://github.com/meta-llama/llama-models/pull/244 for the
corresponding PR in llama-models.

## Test Plan

```bash
cd llama_stack/providers/tests

pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py
pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py
pytest -s -v -k chroma memory/test_memory.py \
  --env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar

pytest -s -v -k fireworks agents/test_agents.py  \
   --safety-shield=meta-llama/Llama-Guard-3-8B \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct
```

Updated the client SDK (see PR ...), installed the SDK in the same
environment, and then ran the SDK tests:

```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py
LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py

# this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly
INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py
```
2024-12-17 11:18:31 -08:00
Dinesh Yeduguru
96e158eaac
Make embedding generation go through inference (#606)
This PR does the following:
1) Adds the ability to generate embeddings in all supported inference
providers.
2) Moves all the memory providers to use the inference API, and improves
the memory tests to set up the inference stack correctly and use the
embedding models.

This is a merge of #589 and #598
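
In rough terms, a memory provider now asks the inference API for
embeddings instead of computing them locally. A hedged sketch of the
call (method and field names assumed):

```python
# Inside a memory provider, given an `inference_api` handle and chunks to index:
response = await inference_api.embeddings(
    model="all-MiniLM-L6-v2",  # an embedding model served by the inference provider
    contents=[chunk.content for chunk in chunks],
)
vectors = response.embeddings  # one vector (list of floats) per input
```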
2024-12-12 11:47:50 -08:00
Aidan Do
1c03ba239e
[#342] RAG - fix PDF format in vector database (#551)
# What does this PR do?

Addresses issue (#342)

- PDFs uploaded from a URL were being loaded into the vector DB as raw
bytes
- Instead, this PR extracts text from the PDF when the mime_type is
"application/pdf" (see the sketch below)
- Adds tests to cover the new cases
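
The extraction step is roughly this (a sketch with pypdf; the real
helper lives in the memory/vector-store utilities):

```python
from io import BytesIO

from pypdf import PdfReader

def content_from_pdf(data: bytes) -> str:
    # Store extracted text in the vector DB rather than raw PDF bytes.
    reader = PdfReader(BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```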

## Test Plan

Ran these unit tests:

```bash
llama stack build --template meta-reference-gpu --image-type conda
conda activate llamastack-meta-reference-gpu
pip install pytest pytest-asyncio pypdf
pytest llama_stack/providers/tests/memory/test_vector_store.py -v
```

```
platform linux -- Python 3.10.15, pytest-8.3.3, pluggy-1.5.0 -- /home/ubuntu/1xa100-2/llama-stack/envs/bin/python
cachedir: .pytest_cache
rootdir: /home/ubuntu/1xa100-2/llama-stack
configfile: pyproject.toml
plugins: anyio-4.6.2.post1, asyncio-0.24.0, httpx-0.35.0
asyncio: mode=strict, default_loop_scope=None
collected 3 items                                                                                                                          

llama_stack/providers/tests/memory/test_vector_store.py::TestVectorStore::test_returns_content_from_pdf_data_uri PASSED              [ 33%]
llama_stack/providers/tests/memory/test_vector_store.py::TestVectorStore::test_downloads_pdf_and_returns_content PASSED              [ 66%]
llama_stack/providers/tests/memory/test_vector_store.py::TestVectorStore::test_downloads_pdf_and_returns_content_with_url_object PASSED [100%]

======================================================= 3 passed, 1 warning in 0.62s =======================================================
```

Tested manually via [this
script](afc8f8bebf/init.py)
to initialize and [this
script](afc8f8bebf/query.py)
to query

```bash
# Ran with meta-reference-gpu with safety
llama stack build --template meta-reference-gpu --image-type conda && llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
  --port 5001 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct

# Run init.py script
wget https://raw.githubusercontent.com/aidando73/llama-stack/afc8f8bebf70e1ad065d87e84692e1a3a45d9e19/init.py
pip install httpx==0.27.2 # Due to issue https://github.com/meta-llama/llama-stack-client-python/issues/54
python init.py
# Run query.py script
wget https://raw.githubusercontent.com/aidando73/llama-stack/afc8f8bebf70e1ad065d87e84692e1a3a45d9e19/query.py
python query.py
```

Should output valid text chunks

```
Chunk(content=' that it has a significantly\nlower violation rate than the competing standalone open source model, trading off a higher false refusal rate.\nLong-context safety. Long-context models are vulnerable to many-shot jailbreaking attacks without targeted\nmitigation (Anil et al., 2024). To address this, we finetune our models on SFT datasets that include examples\nof safe behavior in the presence of demonstrations of unsafe behavior in context. We develop a scalable\nmitigation strategy that significantly reduces VR, effectively neutralizing the impact of longer context attacks\neven for 256-shot attacks. This approach shows little to no impact on FRR and most helpfulness metrics.\nTo quantify the effectiveness of our long context safety mitigations, we use two additional benchmarking\nmethods: DocQA and Many-shot. For DocQA, short for “document question answering,” we use long documents\nwith information that could be utilized in adversarial ways. Models are provided both the document and a set\nof prompts related to the document in order to test whether the questions being related to information in the\ndocument affected the model’s ability to respond safely to the prompts. For Many-shot, following Anil et al.\n(2024), we construct a synthetic chat history composed of unsafe prompt-response pairs. A final prompt,\nunrelated to previous messages, is used to test whether the unsafe behavior in-context influenced the model\n45\nto response unsafely. The violation and false refusal rates for both DocQA and Many-shot are shown in\nFigure 20. We see that Llama 405B (with and without Llama Guard) is Pareto-better than the Comp. 2\nsystem across both violation rates and false refusal rates, across both DocQA and Many-shot. Relative to\nComp. 1, we find that Llama 405B is significantly safer, while coming at a trade off on false refusal.\nTool usage safety. The diversity of possible tools and the implementation of the tool usage call and integration\ninto the model make tool usage a challenging capability to fully mitigate (Wallace et al., 2024). We focus on\nthe search usecase. Violation and false refusal rates are shown in Figure 20. We tested against the Comp. 1\nsystem, where we find that Llama 405B is significantly safer, though has a slightly higher false refusal rate.\n5.4.5 Cybersecurity and Chemical/Biological Weapons Safety\nCyberSecurity evaluation results. To evaluate cybersecurity risk, we leverage the Cyber', document_id='num-0', token_count=512)0.7354530813978312
Chunk(content='.\nThrough careful ablations, we observe that mixing0.1% of synthetically generated long-context data with the\noriginal short-context data optimizes the performance across both short-context and long-context benchmarks.\nDPO. We observe that using only short context training data in DPO did not negatively impact long-context\nperformance as long as the SFT model is high quality in long context tasks. We suspect this is due to the\nfact that our DPO recipe has fewer optimizer steps than SFT. Given this finding, we keep the standard\nshort-context recipe for DPO on top of our long-context SFT checkpoints.\n4.3.5 Tool Use\nTeaching LLMs to use tools such as search engines or code interpreters hugely expands the range of tasks\nthey can solve, transforming them from pure chat models into more general assistants (Nakano et al., 2021;\nThoppilan et al., 2022; Parisi et al., 2022; Gao et al., 2023; Mialon et al., 2023a; Schick et al., 2024). We train\nLlama 3 to interact with the following tools:\n• Search engine. Llama 3 is trained to use Brave Search7 to answer questions about recent events that go\nbeyond its knowledge cutoff or that require retrieving a particular piece of information from the web.\n• Python interpreter. Llama 3 can generate and execute code to perform complex computations, read files\nuploaded by the user and solve tasks based on them such as question answering, summarization, data\nanalysis or visualization.\n7https://brave.com/search/api/\n24\n• Mathematical computational engine. Llama 3 can use the Wolfram Alpha API8 to more accurately solve\nmath, science problems, or retrieve accurate information from Wolfram’s database.\nThe resulting model is able to use these tools in a chat setup to solve the user’s queries, including in multi-turn\ndialogs. If a query requires multiple tool calls, the model can write a step-by-step plan, call the tools in\nsequence, and do reasoning after each tool call.\nWe also improve Llama 3’s zero-shot tool use capabilities — given in-context, potentially unseen tool definitions\nand a user query, we train the model to generate the correct tool call.\nImplementation. We implement our core tools as Python objects with different methods. Zero-shot tools can\nbe implemented as Python functions with descriptions, documentation (i.e., examples for', document_id='num-0', token_count=512)0.7350672465928054
Chunk(content=' Embeddings RoPE (θ = 500, 000)\nTable 3 Overview of the key hyperparameters of Llama 3. We display settings for 8B, 70B, and 405B language models.\n• We use a vocabulary with 128K tokens. Our token vocabulary combines 100K tokens from thetiktoken3\ntokenizer with 28K additional tokens to better support non-English languages. Compared to the Llama\n2 tokenizer, our new tokenizer improves compression rates on a sample of English data from 3.17 to\n3.94 characters per token. This enables the model to “read” more text for the same amount of training\ncompute. We also found that adding 28K tokens from select non-English languages improved both\ncompression ratios and downstream performance, with no impact on English tokenization.\n• We increase the RoPE base frequency hyperparameter to 500,000. This enables us to better support\nlonger contexts; Xiong et al. (2023) showed this value to be effective for context lengths up to 32,768.\nLlama 3 405B uses an architecture with 126 layers, a token representation dimension of 16,384, and 128\nattention heads; see Table 3 for details. This leads to a model size that is approximately compute-optimal\naccording to scaling laws on our data for our training budget of3.8 × 1025 FLOPs.\n3.2.1 Scaling Laws\nWe develop scaling laws (Hoffmann et al., 2022; Kaplan et al., 2020) to determine the optimal model size for\nour flagship model given our pre-training compute budget. In addition to determining the optimal model size,\na major challenge is to forecast the flagship model’s performance on downstream benchmark tasks, due to a\ncouple of issues: (1) Existing scaling laws typically predict only next-token prediction loss rather than specific\nbenchmark performance. (2) Scaling laws can be noisy and unreliable because they are developed based on\npre-training runs conducted with small compute budgets (Wei et al., 2022b).\nTo address these challenges, we implement a two-stage methodology to develop scaling laws that accurately\npredict downstream benchmark performance:\n1. We first establish a correlation between the compute-optimal model’s negative log-likelihood on down-\nstream tasks and the training FLOPs.\n2. Next, we correlate the negative log-likelihood on downstream tasks with task accuracy, utilizing both', document_id='num-0', token_count=512)0.7172908346230037
```

## Before submitting

- [x] N/A - This PR fixes a typo or improves the docs (you can dismiss
the other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] N/A - Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.
2024-12-10 21:33:27 -08:00