# What does this PR do?
- The Braintrust scoring provider currently requires the OPENAI_API_KEY env variable to be set
- Allow the key to also be passed via request headers (e.g., like the Together / Fireworks API keys); see the sketch below
- Fixes pytest with the agents dependency
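A minimal sketch of the intended lookup order inside the provider; the function and parameter names here are illustrative, not the provider's actual implementation:
```python
import os
from typing import Optional


def resolve_openai_api_key(provider_data: Optional[dict]) -> str:
    """Prefer a per-request key from provider_data, falling back to the env var."""
    if provider_data and provider_data.get("openai_api_key"):
        return provider_data["openai_api_key"]
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise ValueError(
            "Pass openai_api_key via provider_data or set OPENAI_API_KEY"
        )
    return key
```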
## Test Plan
**E2E**
```
llama stack run
```
```yaml
scoring:
  - provider_id: braintrust-0
    provider_type: inline::braintrust
    config: {}
```
**Client**
```python
self.client = LlamaStackClient(
    base_url=os.environ.get("LLAMA_STACK_ENDPOINT", "http://localhost:5000"),
    provider_data={
        "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
    },
)
```
- run `llama-stack-client eval run_scoring`
**Unit Test**
```
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
```
```
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py --env OPENAI_API_KEY=$OPENAI_API_KEY
```
<img width="745" alt="image"
src="https://github.com/user-attachments/assets/68f5cdda-f6c8-496d-8b4f-1b3dabeca9c2">
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
# What does this PR do?
This PR fixes several issues with our telemetry setup so that logs can be delivered to OpenTelemetry and Jaeger. Main fixes:
1) Updates the OpenTelemetry provider to use the latest OTLP exporters instead of the deprecated ones.
2) Adds a tracing middleware, which starts a root trace for each HTTP request the server receives (see the middleware sketch below). Previously we did this in the create_dynamic_route method, which is route configuration rather than the actual execution flow, so traces ended prematurely. With middleware, the trace starts and ends at the right places.
3) We manage our own methods to create traces and spans, which does not fit well with the OpenTelemetry SDK: it does not provide a way to take in traces and spans that are already created and expects us to create them through the SDK. For now, I have a hacky approach of maintaining a map from our internal telemetry objects to the OpenTelemetry-specific ones. This is not the ideal solution; I will explore other ways around this issue, but to have something that works, I am keeping this as is.
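A rough sketch of the middleware idea, assuming a Starlette-style server; the `start_trace` / `end_trace` functions below are placeholders for the stack's own tracing helpers, not the exact implementation:
```python
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request


async def start_trace(name: str) -> None:
    ...  # placeholder for the stack's trace-start helper


async def end_trace() -> None:
    ...  # placeholder for the stack's trace-end helper


class TracingMiddleware(BaseHTTPMiddleware):
    """Starts the root trace when a request enters the server and ends it
    after the response is produced, so the trace covers actual execution."""

    async def dispatch(self, request: Request, call_next):
        await start_trace(request.url.path)
        try:
            return await call_next(request)
        finally:
            await end_trace()
```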
Addresses: #509
# What does this PR do?
This PR moves all print statements to use logging. Things changed:
- Had to add `await start_trace("sse_generator")` to server.py to actually get tracing working; otherwise we were not seeing any logs
- If no telemetry provider is provided in the run.yaml, we will write to stdout
- By default, logs are emitted in JSON, but we expose an option to configure human-readable output
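A minimal sketch of the formatter switch, assuming a standard-library logging setup (the `human_readable` option name is illustrative):
```python
import json
import logging


class JSONFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "timestamp": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def setup_logging(human_readable: bool = False) -> None:
    handler = logging.StreamHandler()  # console output when no telemetry provider is configured
    if human_readable:
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    else:
        handler.setFormatter(JSONFormatter())  # JSON is the default
    logging.basicConfig(level=logging.INFO, handlers=[handler], force=True)
```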
When running with Docker, the idea is that users should be able to work purely with the `llama stack` CLI. They should not need to know about the existence of any YAMLs unless they want to. This PR enables that.
The docker command now doesn't need to volume-mount a YAML and can simply be:
```bash
docker run -v ~/.llama/:/root/.llama \
--env A=a --env B=b
```
## Test Plan
Check with conda first (no regressions):
```bash
LLAMA_STACK_DIR=. llama stack build --template ollama
llama stack run ollama --port 5001
# server starts up correctly
```
Check with Docker:
```bash
# build the docker
LLAMA_STACK_DIR=. llama stack build --template ollama --image-type docker
export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"
docker run -it -p 5001:5001 \
-v ~/.llama:/root/.llama \
-v $PWD:/app/llama-stack-source \
localhost/distribution-ollama:dev \
--port 5001 \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env OLLAMA_URL=http://host.docker.internal:11434
```
Note that volume mounting to `/app/llama-stack-source` is only needed because we built the Docker image with uncommitted source code.
# What does this PR do?
Remove a check which skips provider registration if a resource is already in the stack registry. Since we do not reconcile state with the provider, register should always call into the provider's register endpoint.
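A minimal sketch of the resulting registration flow; the names below are illustrative, not the actual routing table code:
```python
from typing import Any, Protocol


class Registry(Protocol):
    async def register(self, obj: Any) -> None: ...


class ProviderImpl(Protocol):
    async def register(self, obj: Any) -> None: ...


async def register_object(obj: Any, provider: ProviderImpl, registry: Registry) -> None:
    # Previously: if the object was already in the stack registry, we returned
    # early and never called the provider. Since the stack does not reconcile
    # state with the provider, always forward the registration.
    await provider.register(obj)
    await registry.register(obj)
```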
## Test Plan
```
# stack run
╰─❯ llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml
#register memory bank
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0
Memory Bank Configuration:
{
│ 'memory_bank_type': 'vector',
│ 'chunk_size_in_tokens': 512,
│ 'embedding_model': 'all-MiniLM-L6-v2',
│ 'overlap_size_in_tokens': 64
}
#register again
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0
Memory Bank Configuration:
{
│ 'memory_bank_type': 'vector',
│ 'chunk_size_in_tokens': 512,
│ 'embedding_model': 'all-MiniLM-L6-v2',
│ 'overlap_size_in_tokens': 64
}
```
# What does this PR do?
Adds a `/alpha/` prefix to all the REST API URLs.
Also makes them all use hyphens instead of underscores, which is more standard practice.
(This is based on feedback from our partners.)
## Test Plan
The Stack itself does not need updating. However, client SDKs and
documentation will need to be updated.
This PR adds a method to the Stack that returns the StackRunConfig object for a given template name. This will be used to instantiate a direct client without the need for an explicit run.yaml.
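A hedged usage sketch; the helper name `get_stack_run_config_from_template`, the `LlamaStackDirectClient` constructor, and their import paths are assumptions about the API shape rather than confirmed signatures:
```python
# NOTE: import paths and constructor signature are assumed for illustration.
from llama_stack.distribution.stack import get_stack_run_config_from_template
from llama_stack.distribution.client import LlamaStackDirectClient

run_config = get_stack_run_config_from_template("together")  # no explicit run.yaml needed
client = LlamaStackDirectClient(config=run_config)
```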
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
# What does this PR do?
Automatically generates
- build.yaml
- run.yaml
- run-with-safety.yaml
- parts of markdown docs
for the distributions.
## Test Plan
At this point, this only updates the YAMLs and the docs. Some testing (especially with ollama and vllm) has been performed, but much more testing is needed.
The semantics of an Update on resources are very tricky to reason about, especially for memory banks and models. The best way forward here is for the user to unregister and register a new resource. We don't have a compelling reason to support update APIs.
Tests:
```
pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "chroma" --env CHROMA_HOST=localhost --env CHROMA_PORT=8000

pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "pgvector" --env PGVECTOR_DB=postgres --env PGVECTOR_USER=postgres --env PGVECTOR_PASSWORD=mysecretpassword --env PGVECTOR_HOST=0.0.0.0

$CONDA_PREFIX/bin/pytest -v -s -m "ollama" llama_stack/providers/tests/inference/test_model_registration.py
```
---------
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
We are calling the initialize function on the registry in the common routing table impl, which is incorrect since the common routing table is the base class inherited by each resource's routing table. This change removes that call and moves initialization to creation time, so the registry is initialized once when the server starts.
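A rough sketch of the intended shape, with illustrative names (not the actual implementation): the registry is initialized once when the server constructs its implementations, and each routing table receives an already-initialized registry.
```python
class Registry:
    async def initialize(self) -> None:
        ...  # load persisted resources into memory (details elided)


class CommonRoutingTableImpl:
    def __init__(self, registry: Registry) -> None:
        # the registry arrives already initialized; no initialize() call here
        self.registry = registry


class ModelsRoutingTable(CommonRoutingTableImpl): ...


class MemoryBanksRoutingTable(CommonRoutingTableImpl): ...


async def construct_stack() -> tuple[CommonRoutingTableImpl, ...]:
    registry = Registry()
    await registry.initialize()  # exactly once, when the server starts
    return ModelsRoutingTable(registry), MemoryBanksRoutingTable(registry)
```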
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
This PR makes the following changes:
1) Fixes the get_all and initialize impls to actually read the values returned from the kvstore range call, not the keys.
2) Fixes the start_key and end_key so the range query works correctly after the key format changes.
3) Makes the cache registry thread safe, since initialize is called multiple times, once for each routing table.
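A hedged sketch of the shape of the fix, with assumed key bounds and method names (the actual kvstore interface and key format may differ):
```python
import asyncio
import json
from typing import Any, Protocol


class KVStore(Protocol):
    async def range(self, start_key: str, end_key: str) -> list[str]: ...


class CachedRegistry:
    def __init__(self, kvstore: KVStore) -> None:
        self._kvstore = kvstore
        self._cache: list[Any] = []
        self._lock = asyncio.Lock()  # several routing tables may call initialize concurrently
        self._initialized = False

    async def initialize(self) -> None:
        async with self._lock:
            if self._initialized:
                return
            # The fix: iterate over the *values* returned by range(), not the keys,
            # with start/end keys matching the new key format (bounds assumed here).
            for value in await self._kvstore.range("registry:v1:", "registry:v1:\xff"):
                self._cache.append(json.loads(value))
            self._initialized = True
```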
Tests:
* Start stack
* Register dataset
* Kill stack
* Bring stack up
* dataset list
```
llama-stack-client datasets list
+--------------+---------------+---------------------------------------------------------------------------------+---------+
| identifier | provider_id | metadata | type |
+==============+===============+=================================================================================+=========+
| alpaca | huggingface-0 | {} | dataset |
+--------------+---------------+---------------------------------------------------------------------------------+---------+
| mmlu | huggingface-0 | {'path': 'llama-stack/evals', 'name': 'evals__mmlu__details', 'split': 'train'} | dataset |
+--------------+---------------+---------------------------------------------------------------------------------+---------+
```
Co-authored-by: Dinesh Yeduguru <dineshyv@fb.com>
# What does this PR do?
We'd like our docker steps to require _ZERO EDITS_ to a YAML file in order to get going. This is often not possible because, depending on the provider, we do need some configuration input from the user. Environment variables are the best way to obtain this information.
This PR allows our run.yaml to contain `${env.FOO_BAR}` placeholders which can be replaced using `docker run -e FOO_BAR=baz` (and the `docker compose` equivalent).
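A minimal sketch of the substitution logic, assuming the `${env.VAR}` / `${env.VAR:default}` syntax shown in the snippets below; the stack's actual implementation may differ in details such as error handling:
```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{env\.(?P<name>[A-Z0-9_]+)(?::(?P<default>[^}]*))?\}")


def substitute_env_vars(text: str) -> str:
    """Replace ${env.VAR} and ${env.VAR:default} placeholders with env values."""

    def replace(match: re.Match) -> str:
        name = match.group("name")
        default = match.group("default")
        value = os.environ.get(name, default)
        if value is None:
            raise ValueError(f"Environment variable {name} is not set and has no default")
        return value

    return _ENV_PATTERN.sub(replace, text)


# e.g. substitute_env_vars("${env.MAX_TOKENS:4096}") -> "4096" unless MAX_TOKENS is set
```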
## Test Plan
For remote-vllm, example `run.yaml` snippet looks like this:
```yaml
providers:
inference:
# serves main inference model
- provider_id: vllm-0
provider_type: remote::vllm
config:
# NOTE: replace with "localhost" if you are running in "host" network mode
url: ${env.LLAMA_INFERENCE_VLLM_URL:http://host.docker.internal:5100/v1}
max_tokens: ${env.MAX_TOKENS:4096}
api_token: fake
# serves safety llama_guard model
- provider_id: vllm-1
provider_type: remote::vllm
config:
# NOTE: replace with "localhost" if you are running in "host" network mode
url: ${env.LLAMA_SAFETY_VLLM_URL:http://host.docker.internal:5101/v1}
max_tokens: ${env.MAX_TOKENS:4096}
api_token: fake
```
`compose.yaml` snippet looks like this:
```yaml
llamastack:
depends_on:
- vllm-0
- vllm-1
# image: llamastack/distribution-remote-vllm
image: llamastack/distribution-remote-vllm:test-0.0.52rc3
volumes:
- ~/.llama:/root/.llama
- ~/local/llama-stack/distributions/remote-vllm/run.yaml:/root/llamastack-run-remote-vllm.yaml
# network_mode: "host"
environment:
- LLAMA_INFERENCE_VLLM_URL=${LLAMA_INFERENCE_VLLM_URL:-http://host.docker.internal:5100/v1}
- LLAMA_INFERENCE_MODEL=${LLAMA_INFERENCE_MODEL:-Llama3.1-8B-Instruct}
- MAX_TOKENS=${MAX_TOKENS:-4096}
- SQLITE_STORE_DIR=${SQLITE_STORE_DIR:-$HOME/.llama/distributions/remote-vllm}
- LLAMA_SAFETY_VLLM_URL=${LLAMA_SAFETY_VLLM_URL:-http://host.docker.internal:5101/v1}
- LLAMA_SAFETY_MODEL=${LLAMA_SAFETY_MODEL:-Llama-Guard-3-1B}
```
# What does this PR do?
- API updates: rename `schema` to `dataset_schema` in register_dataset to resolve a pydantic naming conflict (see the sketch below)
- Note: this OpenAPI update will be synced with the llama-stack-client-python SDK.
cc @dineshyv
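A short sketch of why the rename is needed: a field called `schema` shadows pydantic's built-in `BaseModel.schema()` method, so the resource now uses `dataset_schema`. The field types below are illustrative only:
```python
from typing import Any, Dict

from pydantic import BaseModel, Field


class DatasetDefSketch(BaseModel):
    identifier: str
    # Previously named `schema`, which collides with BaseModel.schema();
    # the column-name -> column-type mapping is now `dataset_schema`.
    dataset_schema: Dict[str, Any] = Field(default_factory=dict)
```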
## Test Plan
```
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```
## Sources
Please link relevant resources if necessary.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.