llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-06-28 02:53:30 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	2e5bfcd42a	Update Telemetry API so OpenAPI generation can work (#640 ) We cannot use recursive types because not only does our OpenAPI generator not like them, even if it did, it is not easy for all client languages to automatically construct proper APIs (especially considering garbage collection) around them. For now, we can return a `Dict[str, SpanWithStatus]` instead of `SpanWithChildren` and rely on the client to reconstruct the tree. Also fixed a super subtle issue with the OpenAPI generation process (monkey-patching of json_schema_type wasn't working because of import reordering.)	2024-12-16 13:00:14 -08:00
Xi Yan	d97cfaa9d9	[docs] add openapi spec to docs (#508 ) # What does this PR do? - modify openapi generator to add coming soon tag for unimplemented api - sphinx-redocs extension for openapi spec to readthedocs page ## Test Plan https://github.com/user-attachments/assets/b4c7eebc-2361-4198-a987-dbfbcff914cf ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-22 17:54:32 -08:00
Ashwin Bharambe	8ed79ad0f3	Fix the pyopenapi generator avoid potential circular imports	2024-11-18 23:37:52 -08:00
Ashwin Bharambe	0dc7f5fa89	Add version to REST API url (#478 ) # What does this PR do? Adds a `/alpha/` prefix to all the REST API urls. Also makes them all use hyphens instead of underscores as is more standard practice. (This is based on feedback from our partners.) ## Test Plan The Stack itself does not need updating. However, client SDKs and documentation will need to be updated.	2024-11-18 22:44:14 -08:00
Ashwin Bharambe	bba6edd06b	Fix OpenAPI generation to have text/event-stream for streamable methods	2024-11-14 12:51:38 -08:00
Ashwin Bharambe	d9d271a684	Allow specifying resources in StackRunConfig (#425 ) # What does this PR do? This PR brings back the facility to not force registration of resources onto the user. This is not just annoying but actually not feasible sometimes. For example, you may have a Stack which boots up with private providers for inference for models A and B. There is no way for the user to actually know which model is being served by these providers now (to be able to register it.) How will this avoid the users needing to do registration? In a follow-up diff, I will make sure I update the sample run.yaml files so they list the models served by the distributions explicitly. So when users do `llama stack build --template <...>` and run it, their distributions come up with the right set of models they expect. For self-hosted distributions, it also allows us to have a place to explicit list the models that need to be served to make the "complete" stack (including safety, e.g.) ## Test Plan Started ollama locally with two lightweight models: Llama3.2-3B-Instruct and Llama-Guard-3-1B. Updated all the tests including agents. Here's the tests I ran so far: ```bash pytest -s -v -m "fireworks and llama_3b" test_text_inference.py::TestInference \ --env FIREWORKS_API_KEY=... pytest -s -v -m "ollama and llama_3b" test_text_inference.py::TestInference pytest -s -v -m ollama test_safety.py pytest -s -v -m faiss test_memory.py pytest -s -v -m ollama test_agents.py \ --inference-model=Llama3.2-3B-Instruct --safety-model=Llama-Guard-3-1B ``` Found a few bugs here and there pre-existing that these test runs fixed.	2024-11-12 10:58:49 -08:00
Ashwin Bharambe	47e7c2dc15	Fix openapi generator and regenerator OpenAPI types	2024-11-11 18:44:38 -08:00
Xi Yan	2b7d70ba86	[Evals API][11/n] huggingface dataset provider + mmlu scoring fn (#392 ) * wip * scoring fn api * eval api * eval task * evaluate api update * pre commit * unwrap context -> config * config field doc * typo * naming fix * separate benchmark / app eval * api name * rename * wip tests * wip * datasetio test * delete unused * fixture * scoring resolve * fix scoring register * scoring test pass * score batch * scoring fix * fix eval * test eval works * huggingface provider * datasetdef files * mmlu scoring fn * test wip * remove type ignore * api refactor * add default task_eval_id for routing * add eval_id for jobs * remove type ignore * huggingface provider * wip huggingface register * only keep 1 run_eval * fix optional * register task required * register task required * delete old tests * fix * mmlu loose * refactor * msg * fix tests * move benchmark task def to file * msg * gen openapi * openapi gen * move dataset to hf llamastack repo * remove todo * refactor * add register model to unit test * rename * register to client * delete preregistered dataset/eval task * comments * huggingface -> remote adapter * openapi gen	2024-11-11 14:49:50 -05:00
Ashwin Bharambe	37b330b4ef	add dynamic clients for all APIs (#348 ) * add dynamic clients for all APIs * fix openapi generator * inference + memory + agents tests now pass with "remote" providers * Add docstring which fixes openapi generator :/	2024-10-31 14:46:25 -07:00
Xi Yan	abdf7cddf3	[Evals API][4/n] evals with generation meta-reference impl (#303 ) * wip * dataset validation * test_scoring * cleanup * clean up test * comments * error checking * dataset client * test client: * datasetio client * clean up * basic scoring function works * scorer wip * equality scorer * score batch impl * score batch * update scoring test * refactor * validate scorer input * address comments * evals with generation * add all rows scores to ScoringResult * minor typing * bugfix * scoring function def rename * rebase name * refactor * address comments * Update iOS inference instructions for new quantization * Small updates to quantization config * Fix score threshold in faiss * Bump version to 0.0.45 * Handle both ipv6 and ipv4 interfaces together * update manifest for build templates * Update getting_started.md * chatcompletion & completion input type validation * inclusion->subsetof * error checking * scoring_function -> scoring_fn rename, scorer -> scoring_fn rename * address comments * [Evals API][5/n] fixes to generate openapi spec (#323) * generate openapi * typing comment, dataset -> dataset_id * remove custom type * sample eval run.yaml --------- Co-authored-by: Dalton Flanagan <6599399+dltn@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2024-10-25 13:12:39 -07:00
Dalton Flanagan	8c3010553f	Fix agents path in generate.py	2024-10-10 11:41:03 -04:00
Ashwin Bharambe	8d049000e3	Add an introspection "Api.inspect" API	2024-10-02 15:41:14 -07:00
Ashwin Bharambe	ec4fc800cc	[API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92 ) This is yet another of those large PRs (hopefully we will have less and less of them as things mature fast). This one introduces substantial improvements and some simplifications to the stack. Most important bits: * Agents reference implementation now has support for session / turn persistence. The default implementation uses sqlite but there's also support for using Redis. * We have re-architected the structure of the Stack APIs to allow for more flexible routing. The motivating use cases are: - routing model A to ollama and model B to a remote provider like Together - routing shield A to local impl while shield B to a remote provider like Bedrock - routing a vector memory bank to Weaviate while routing a keyvalue memory bank to Redis * Support for provider specific parameters to be passed from the clients. A client can pass data using `x_llamastack_provider_data` parameter which can be type-checked and provided to the Adapter implementations.	2024-09-23 14:22:22 -07:00
Ashwin Bharambe	942cb87a3c	remove apis/stack.py	2024-09-20 09:37:08 -07:00
Xi Yan	f3f5873e9e	regenerate openapi spec	2024-09-18 19:28:05 -07:00
Xi Yan	2c1ad10710	move openapi from rfcs->docs	2024-09-18 16:09:17 -07:00

16 commits