llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-08-13 13:19:57 +00:00

Author	SHA1	Message	Date
Connor Hack	8f60a3a55d	Clean up job names	2024-11-22 15:07:08 -08:00
Ashwin Bharambe	c2c53d0272	More doc cleanup	2024-11-22 14:37:22 -08:00
Connor Hack	cbd69d06c3	Clean up checkpoint directory setting	2024-11-22 14:22:31 -08:00
Ashwin Bharambe	900b0556e7	Much more documentation work, things are getting a bit consumable right now	2024-11-22 14:06:18 -08:00
Ashwin Bharambe	98e213e96c	More docs work	2024-11-22 14:06:18 -08:00
Ashwin Bharambe	eb2063bc3d	Updates to the main doc page	2024-11-22 14:06:18 -08:00
dltn	eaf4fbef75	another print -> log fix	2024-11-22 13:35:34 -08:00
dltn	302a0145e5	we do want prints in print_pip_install_help	2024-11-22 13:32:54 -08:00
Dalton Flanagan	b007b062f3	Fix `llama stack build` in 0.0.54 (#505) # What does this PR do? Safety provider `inline::meta-reference` is now deprecated. However, we * aren't checking / printing the deprecation message in `llama stack build` * make the deprecated (unusable) provider So I (1) added checking and (2) made `inline::llama-guard` the default ## Test Plan Before ``` Traceback (most recent call last): File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module> sys.exit(main()) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main parser.run(args) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run args.func(args) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command self._run_stack_build_command_from_build_config(build_config) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 305, in _run_stack_build_command_from_build_config self._generate_run_config(build_config, build_dir) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 226, in _generate_run_config config_type = instantiate_class_type( File "/home/dalton/all/llama-stack/llama_stack/distribution/utils/dynamic.py", line 12, in instantiate_class_type module = importlib.import_module(module_name) File "/home/dalton/.conda/envs/nov22/lib/python3.10/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1050, in _gcd_import File "<frozen importlib._bootstrap>", line 1027, in _find_and_load File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked ModuleNotFoundError: No module named 'llama_stack.providers.inline.safety.meta_reference' ``` After ``` Traceback (most recent call last): File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module> sys.exit(main()) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main parser.run(args) File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run args.func(args) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command self._run_stack_build_command_from_build_config(build_config) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 309, in _run_stack_build_command_from_build_config self._generate_run_config(build_config, build_dir) File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 228, in _generate_run_config raise InvalidProviderError(p.deprecation_error) llama_stack.distribution.resolver.InvalidProviderError: Provider `inline::meta-reference` for API `safety` does not work with the latest Llama Stack. - if you are using Llama Guard v3, please use the `inline::llama-guard` provider instead. - if you are using Prompt Guard, please use the `inline::prompt-guard` provider instead. - if you are using Code Scanner, please use the `inline::code-scanner` provider instead. ``` <img width="469" alt="Screenshot 2024-11-22 at 4 10 24 PM" src="https://github.com/user-attachments/assets/8c2e09fe-379a-4504-b246-7925f80a6ed6"> ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-11-22 16:23:44 -05:00
Connor Hack	d1d8f859e6	Update checkpointd directory setting	2024-11-22 12:51:34 -08:00
Connor Hack	7f5e0dd3db	Refactor test run to support shorthand model names	2024-11-22 12:30:13 -08:00
Connor Hack	9c07e0189a	Fix syntax error	2024-11-22 11:16:17 -08:00
Connor Hack	0e9ed3688d	Remove unnecessary env vars	2024-11-22 10:58:17 -08:00
Connor Hack	1481a67365	Test new provider name	2024-11-22 10:22:12 -08:00
Connor Hack	377896a4c5	Remove testing llama-stack RC	2024-11-22 09:46:14 -08:00
Connor Hack	143e91f23d	Add manual provider back for testing	2024-11-22 09:18:29 -08:00
Connor Hack	25e23a1dfe	Add debug statement for PROVIDER_ID	2024-11-22 08:56:53 -08:00
Connor Hack	496879795e	Dynamically change provider in tests	2024-11-22 07:22:04 -08:00
Chacksu	4136accf48	Merge branch 'meta-llama:main' into main	2024-11-21 19:49:53 -05:00
Connor Hack	046eec9793	Remove testing llama-stack RC	2024-11-21 16:35:00 -08:00
Ashwin Bharambe	2137b0af40	Bump version to 0.0.54	2024-11-21 16:28:30 -08:00
Ashwin Bharambe	c1025ebfdb	Delete some dead code	2024-11-21 15:20:06 -08:00
Ashwin Bharambe	a0a00f1345	Update telemetry to have TEXT be the default log format	2024-11-21 15:18:45 -08:00
Connor Hack	318c98807c	Pre-emptively test llama stack RC	2024-11-21 15:15:43 -08:00
Chacksu	94bfd9a1d1	Merge branch 'meta-llama:main' into main	2024-11-21 18:07:53 -05:00
Xi Yan	945db5dac2	fix logging	2024-11-21 15:02:57 -08:00
Ashwin Bharambe	d790be28b3	Don't skip meta-reference for the tests	2024-11-21 13:29:53 -08:00
Ashwin Bharambe	55c55b9f51	Update Quick Start significantly	2024-11-21 13:20:55 -08:00
Chacksu	19bc7e8942	Merge branch 'meta-llama:main' into main	2024-11-21 15:47:54 -05:00
Xi Yan	654722da7d	fix model id for llm_as_judge_405b	2024-11-21 11:34:49 -08:00
Dinesh Yeduguru	6395dadc2b	use logging instead of prints (#499) # What does this PR do? This PR moves all print statements to use logging. Things changed: - Had to add `await start_trace("sse_generator")` to server.py to actually get tracing working. else was not seeing any logs - If no telemetry provider is provided in the run.yaml, we will write to stdout - by default, the logs are going to be in JSON, but we expose an option to configure to output in a human readable way.	2024-11-21 11:32:53 -08:00
liyunlu0618	4e1105e563	Fix fp8 quantization script. (#500 ) # What does this PR do? Fix fp8 quantization script. ## Test Plan ``` sh run_quantize_checkpoint.sh localhost fp8 /home/yll/fp8_test/ /home/yll/fp8_test/quantized_2 /home/yll/fp8_test/tokenizer.model 1 1 ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [x] Ran pre-commit to handle lint / formatting issues. - [x] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [x] Updated relevant documentation. - [x] Wrote necessary unit or integration tests. Co-authored-by: Yunlu Li <yll@meta.com>	2024-11-21 09:15:28 -08:00
Chacksu	09302347d3	Merge branch 'meta-llama:main' into main	2024-11-21 10:21:49 -05:00
Ashwin Bharambe	cf079a22a0	Plurals	2024-11-20 23:24:59 -08:00
Ashwin Bharambe	cd6ccb664c	Integrate distro docs into the restructured docs	2024-11-20 23:20:05 -08:00
Ashwin Bharambe	2411a44833	Update more distribution docs to be simpler and partially codegen'ed	2024-11-20 22:03:44 -08:00
Connor Hack	490c5fb730	Undo None check and temporarily move if model check before builder	2024-11-20 19:17:44 -08:00
Connor Hack	16ffe19a20	Account for if a permitted model is None	2024-11-20 18:48:59 -08:00
Chacksu	05f1041bfa	Merge branch 'meta-llama:main' into main	2024-11-20 19:21:20 -05:00
Ashwin Bharambe	e84d4436b5	Since we are pushing for HF repos, we should accept them in inference configs (#497) # What does this PR do? As the title says. ## Test Plan This needs `8752149f58` to also land. So the next package (0.0.54) will make this work properly. The test is: ```bash pytest -v -s -m "llama_3b and meta_reference" test_model_registration.py ```	2024-11-20 16:14:37 -08:00
Dinesh Yeduguru	b3f9e8b2f2	Restructure docs (#494) Rendered docs at: https://llama-stack.readthedocs.io/en/doc-simplify/	2024-11-20 15:54:47 -08:00
Chacksu	0ec4ddd179	Merge branch 'meta-llama:main' into main	2024-11-20 18:46:45 -05:00
Ashwin Bharambe	068ac00a3b	Don't depend on templates.py when print llama stack build messages (#496)	2024-11-20 15:44:49 -08:00
Chacksu	a5acb59407	Merge branch 'meta-llama:main' into main	2024-11-20 18:30:01 -05:00
Connor Hack	2795731434	Update model name for mete-reference template	2024-11-20 14:40:37 -08:00
Ashwin Bharambe	00816cc8ef	make sure codegen doesn't cause spurious diffs for no reason	2024-11-20 13:56:30 -08:00
Chacksu	edfd92d81f	Merge branch 'meta-llama:main' into main	2024-11-20 16:12:38 -05:00
Ashwin Bharambe	681322731b	Make run yaml optional so dockers can start with just --env (#492 ) When running with dockers, the idea is that users be able to work purely with the `llama stack` CLI. They should not need to know about the existence of any YAMLs unless they need to. This PR enables it. The docker command now doesn't need to volume mount a yaml and can simply be: ```bash docker run -v ~/.llama/:/root/.llama \ --env A=a --env B=b ``` ## Test Plan Check with conda first (no regressions): ```bash LLAMA_STACK_DIR=. llama stack build --template ollama llama stack run ollama --port 5001 # server starts up correctly ``` Check with docker ```bash # build the docker LLAMA_STACK_DIR=. llama stack build --template ollama --image-type docker export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" docker run -it -p 5001:5001 \ -v ~/.llama:/root/.llama \ -v $PWD:/app/llama-stack-source \ localhost/distribution-ollama:dev \ --port 5001 \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env OLLAMA_URL=http://host.docker.internal:11434 ``` Note that volume mounting to `/app/llama-stack-source` is only needed because we built the docker with uncommitted source code.	2024-11-20 13:11:40 -08:00
Dinesh Yeduguru	1d8d0593af	register with provider even if present in stack (#491 ) # What does this PR do? Remove a check which skips provider registration if a resource is already in stack registry. Since we do not reconcile state with provider, register should always call into provider's register endpoint. ## Test Plan ``` # stack run ╰─❯ llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml #register memory bank ❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0 Memory Bank Configuration: { │ 'memory_bank_type': 'vector', │ 'chunk_size_in_tokens': 512, │ 'embedding_model': 'all-MiniLM-L6-v2', │ 'overlap_size_in_tokens': 64 } #register again ❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0 Memory Bank Configuration: { │ 'memory_bank_type': 'vector', │ 'chunk_size_in_tokens': 512, │ 'embedding_model': 'all-MiniLM-L6-v2', │ 'overlap_size_in_tokens': 64 } ```	2024-11-20 11:05:50 -08:00
Dinesh Yeduguru	91e7efbc91	fall to back to read from chroma/pgvector when not in cache (#489 ) # What does this PR do? The chroma provider maintains a cache but does not sync up with chroma on a cold start. this change adds a fallback to read from chroma on a cache miss. ## Test Plan ```bash #start stack llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml # Add documents PYTHONPATH=. python -m examples.agents.rag_with_memory_bank localhost 5000 No available shields. Disable safety. Using model: Llama3.1-8B-Instruct Created session_id=b951b14f-a9d2-43a3-8b80-d80114d58322 for Agent(0687a251-6906-4081-8d4c-f52e19db9dd7) memory_retrieval> Retrieved context from banks: ['test_bank']. ==== Here are the retrieved documents for relevant context: === START-RETRIEVED-CONTEXT === id:num-1; content:_ the template from Llama2 to better support multiturn conversations. The same text in the Lla... > inference> Based on the retrieved documentation, the top 5 topics that were explained are: ............... # Kill stack # Bootup stack llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml # Run a RAG app with just the agent flow. it discovers the previously added documents No available shields. Disable safety. Using model: Llama3.1-8B-Instruct Created session_id=7a30c1a7-c87e-4787-936c-d0306589fe5d for Agent(b30420f3-c928-498a-887b-d084f0f3806c) memory_retrieval> Retrieved context from banks: ['test_bank']. ==== Here are the retrieved documents for relevant context: === START-RETRIEVED-CONTEXT === id:num-1; content:_ the template from Llama2 to better support multiturn conversations. The same text in the Lla... > inference> Based on the provided documentation, the top 5 topics that were explained are: ..... ```	2024-11-20 10:30:23 -08:00
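
The cache-miss fallback described in commit 91e7efbc91 can be illustrated with a minimal sketch. This is not the llama-stack implementation: `BackingStore`, `MemoryBankProvider`, and their methods are hypothetical names invented here, and a plain dictionary stands in for a persistent store such as chroma or pgvector. The only idea taken from the commit message is that a provider whose cache is empty after a cold start should read from the store instead of failing.

```python
from typing import Dict, Optional


class BackingStore:
    """Hypothetical stand-in for a persistent store (e.g. chroma or pgvector)."""

    def __init__(self) -> None:
        self._banks: Dict[str, dict] = {}

    def save(self, bank_id: str, config: dict) -> None:
        self._banks[bank_id] = config

    def load(self, bank_id: str) -> Optional[dict]:
        # Returns None when the bank was never persisted.
        return self._banks.get(bank_id)


class MemoryBankProvider:
    """Hypothetical provider that keeps an in-process cache in front of the store."""

    def __init__(self, store: BackingStore) -> None:
        self.store = store
        self.cache: Dict[str, dict] = {}  # empty after every cold start

    def register(self, bank_id: str, config: dict) -> None:
        # Persist first, then cache, so a restart can recover the bank.
        self.store.save(bank_id, config)
        self.cache[bank_id] = config

    def get(self, bank_id: str) -> dict:
        if bank_id in self.cache:
            return self.cache[bank_id]
        # Cache miss: fall back to the persistent store.
        config = self.store.load(bank_id)
        if config is None:
            raise KeyError(f"unknown memory bank: {bank_id}")
        self.cache[bank_id] = config  # repopulate for later lookups
        return config
```

Creating a second `MemoryBankProvider` over the same `BackingStore` simulates the "kill stack, boot up stack" step in the test plan: the new instance starts with an empty cache, and `get` still finds the previously registered bank.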


1025 commits