1025 commits

Author SHA1 Message Date
Connor Hack
8f60a3a55d Clean up job names 2024-11-22 15:07:08 -08:00
Ashwin Bharambe
c2c53d0272 More doc cleanup 2024-11-22 14:37:22 -08:00
Connor Hack
cbd69d06c3 Clean up checkpoint directory setting 2024-11-22 14:22:31 -08:00
Ashwin Bharambe
900b0556e7 Much more documentation work, things are getting a bit consumable right now 2024-11-22 14:06:18 -08:00
Ashwin Bharambe
98e213e96c More docs work 2024-11-22 14:06:18 -08:00
Ashwin Bharambe
eb2063bc3d Updates to the main doc page 2024-11-22 14:06:18 -08:00
dltn
eaf4fbef75 another print -> log fix 2024-11-22 13:35:34 -08:00
dltn
302a0145e5 we do want prints in print_pip_install_help 2024-11-22 13:32:54 -08:00
Dalton Flanagan
b007b062f3
Fix llama stack build in 0.0.54 (#505)
# What does this PR do?

Safety provider `inline::meta-reference` is now deprecated. However, we

* aren't checking for / printing the deprecation message in `llama stack build`
* still make the deprecated (unusable) provider the default

So I (1) added checking and (2) made `inline::llama-guard` the default
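
A minimal sketch of the kind of check described above, assuming a provider spec that carries an optional `deprecation_error` string (the "After" traceback below suggests such a field exists). The `ProviderSpec` class and `check_providers` helper here are hypothetical stand-ins for illustration, not the actual llama-stack code:

```python
from dataclasses import dataclass
from typing import List, Optional


class InvalidProviderError(Exception):
    """Raised when a build config names a provider that can no longer be used."""


@dataclass
class ProviderSpec:
    # Hypothetical stand-in: only the fields needed for this illustration.
    api: str
    provider_type: str
    deprecation_error: Optional[str] = None


def check_providers(specs: List[ProviderSpec]) -> None:
    """Fail early with the provider's own message instead of an import error."""
    for p in specs:
        if p.deprecation_error:
            raise InvalidProviderError(p.deprecation_error)
```

Running a check like this before any provider module is imported is what turns the `ModuleNotFoundError` into an actionable message.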

## Test Plan

Before

```
Traceback (most recent call last):
  File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command
    self._run_stack_build_command_from_build_config(build_config)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 305, in _run_stack_build_command_from_build_config
    self._generate_run_config(build_config, build_dir)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 226, in _generate_run_config
    config_type = instantiate_class_type(
  File "/home/dalton/all/llama-stack/llama_stack/distribution/utils/dynamic.py", line 12, in instantiate_class_type
    module = importlib.import_module(module_name)
  File "/home/dalton/.conda/envs/nov22/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'llama_stack.providers.inline.safety.meta_reference'
```

After

```
Traceback (most recent call last):
  File "/home/dalton/.conda/envs/nov22/bin/llama", line 8, in <module>
    sys.exit(main())
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 177, in _run_stack_build_command
    self._run_stack_build_command_from_build_config(build_config)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 309, in _run_stack_build_command_from_build_config
    self._generate_run_config(build_config, build_dir)
  File "/home/dalton/all/llama-stack/llama_stack/cli/stack/build.py", line 228, in _generate_run_config
    raise InvalidProviderError(p.deprecation_error)
llama_stack.distribution.resolver.InvalidProviderError: 
Provider `inline::meta-reference` for API `safety` does not work with the latest Llama Stack.
- if you are using Llama Guard v3, please use the `inline::llama-guard` provider instead.
- if you are using Prompt Guard, please use the `inline::prompt-guard` provider instead.
- if you are using Code Scanner, please use the `inline::code-scanner` provider instead.
```

<img width="469" alt="Screenshot 2024-11-22 at 4 10 24 PM"
src="https://github.com/user-attachments/assets/8c2e09fe-379a-4504-b246-7925f80a6ed6">

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2024-11-22 16:23:44 -05:00
Connor Hack
d1d8f859e6 Update checkpoint directory setting 2024-11-22 12:51:34 -08:00
Connor Hack
7f5e0dd3db Refactor test run to support shorthand model names 2024-11-22 12:30:13 -08:00
Connor Hack
9c07e0189a Fix syntax error 2024-11-22 11:16:17 -08:00
Connor Hack
0e9ed3688d Remove unnecessary env vars 2024-11-22 10:58:17 -08:00
Connor Hack
1481a67365 Test new provider name 2024-11-22 10:22:12 -08:00
Connor Hack
377896a4c5 Remove testing llama-stack RC 2024-11-22 09:46:14 -08:00
Connor Hack
143e91f23d Add manual provider back for testing 2024-11-22 09:18:29 -08:00
Connor Hack
25e23a1dfe Add debug statement for PROVIDER_ID 2024-11-22 08:56:53 -08:00
Connor Hack
496879795e Dynamically change provider in tests 2024-11-22 07:22:04 -08:00
Chacksu
4136accf48
Merge branch 'meta-llama:main' into main 2024-11-21 19:49:53 -05:00
Connor Hack
046eec9793 Remove testing llama-stack RC 2024-11-21 16:35:00 -08:00
Ashwin Bharambe
2137b0af40 Bump version to 0.0.54 2024-11-21 16:28:30 -08:00
Ashwin Bharambe
c1025ebfdb Delete some dead code 2024-11-21 15:20:06 -08:00
Ashwin Bharambe
a0a00f1345 Update telemetry to have TEXT be the default log format 2024-11-21 15:18:45 -08:00
Connor Hack
318c98807c Pre-emptively test llama stack RC 2024-11-21 15:15:43 -08:00
Chacksu
94bfd9a1d1
Merge branch 'meta-llama:main' into main 2024-11-21 18:07:53 -05:00
Xi Yan
945db5dac2 fix logging 2024-11-21 15:02:57 -08:00
Ashwin Bharambe
d790be28b3 Don't skip meta-reference for the tests 2024-11-21 13:29:53 -08:00
Ashwin Bharambe
55c55b9f51 Update Quick Start significantly 2024-11-21 13:20:55 -08:00
Chacksu
19bc7e8942
Merge branch 'meta-llama:main' into main 2024-11-21 15:47:54 -05:00
Xi Yan
654722da7d fix model id for llm_as_judge_405b 2024-11-21 11:34:49 -08:00
Dinesh Yeduguru
6395dadc2b
use logging instead of prints (#499)
# What does this PR do?

This PR moves all print statements to logging. Changes:
- Added `await start_trace("sse_generator")` to server.py; without it, tracing did not work and no logs appeared.
- If no telemetry provider is configured in the run.yaml, logs are written to stdout.
- By default, logs are emitted as JSON, with an option to switch to a human-readable format.
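
The JSON-by-default behavior with a human-readable option could look roughly like the following, using only the standard library. This is a sketch under assumptions: `configure_logging`, the `"json"`/`"text"` values, and the logger name are hypothetical and do not reflect the actual telemetry provider's configuration keys:

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record):
        return json.dumps(
            {
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def configure_logging(log_format="json", stream=None):
    """Return a logger writing to stdout (or `stream`) in the chosen format."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    if log_format == "json":
        handler.setFormatter(JsonFormatter())
    else:
        # Human-readable alternative.
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger("stack_example")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```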
2024-11-21 11:32:53 -08:00
liyunlu0618
4e1105e563
Fix fp8 quantization script. (#500)
# What does this PR do?

Fix fp8 quantization script.

## Test Plan

```
sh run_quantize_checkpoint.sh localhost fp8 /home/yll/fp8_test/ /home/yll/fp8_test/quantized_2 /home/yll/fp8_test/tokenizer.model 1 1
```

## Before submitting

- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Ran pre-commit to handle lint / formatting issues.
- [x] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [x] Updated relevant documentation.
- [x] Wrote necessary unit or integration tests.

Co-authored-by: Yunlu Li <yll@meta.com>
2024-11-21 09:15:28 -08:00
Chacksu
09302347d3
Merge branch 'meta-llama:main' into main 2024-11-21 10:21:49 -05:00
Ashwin Bharambe
cf079a22a0 Plurals 2024-11-20 23:24:59 -08:00
Ashwin Bharambe
cd6ccb664c Integrate distro docs into the restructured docs 2024-11-20 23:20:05 -08:00
Ashwin Bharambe
2411a44833 Update more distribution docs to be simpler and partially codegen'ed 2024-11-20 22:03:44 -08:00
Connor Hack
490c5fb730 Undo None check and temporarily move if model check before builder 2024-11-20 19:17:44 -08:00
Connor Hack
16ffe19a20 Account for if a permitted model is None 2024-11-20 18:48:59 -08:00
Chacksu
05f1041bfa
Merge branch 'meta-llama:main' into main 2024-11-20 19:21:20 -05:00
Ashwin Bharambe
e84d4436b5
Since we are pushing for HF repos, we should accept them in inference configs (#497)
# What does this PR do?

As the title says. 

## Test Plan

This needs 8752149f58 to also land, so the next package (0.0.54) will make this work properly.

The test is:

```bash
pytest -v -s -m "llama_3b and meta_reference" test_model_registration.py
```
2024-11-20 16:14:37 -08:00
Dinesh Yeduguru
b3f9e8b2f2
Restructure docs (#494)
Rendered docs at: https://llama-stack.readthedocs.io/en/doc-simplify/
2024-11-20 15:54:47 -08:00
Chacksu
0ec4ddd179
Merge branch 'meta-llama:main' into main 2024-11-20 18:46:45 -05:00
Ashwin Bharambe
068ac00a3b
Don't depend on templates.py when printing llama stack build messages (#496) 2024-11-20 15:44:49 -08:00
Chacksu
a5acb59407
Merge branch 'meta-llama:main' into main 2024-11-20 18:30:01 -05:00
Connor Hack
2795731434 Update model name for meta-reference template 2024-11-20 14:40:37 -08:00
Ashwin Bharambe
00816cc8ef make sure codegen doesn't cause spurious diffs for no reason 2024-11-20 13:56:30 -08:00
Chacksu
edfd92d81f
Merge branch 'meta-llama:main' into main 2024-11-20 16:12:38 -05:00
Ashwin Bharambe
681322731b
Make run yaml optional so dockers can start with just --env (#492)
When running with Docker, users should be able to work purely with the
`llama stack` CLI. They should not need to know about any YAML files
unless they choose to. This PR enables that.

The docker command now doesn't need to volume mount a yaml and can
simply be:
```bash
docker run -v ~/.llama/:/root/.llama \
  --env A=a --env B=b
```
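
For illustration, repeated `--env KEY=VALUE` flags like the ones above can be collected with `argparse`. This is a hedged sketch, not the CLI's actual parser: `parse_env_pairs` and the default port are assumptions made for the example.

```python
import argparse


def parse_env_pairs(pairs):
    """Turn ["A=a", "B=b"] into {"A": "a", "B": "b"}, splitting on the first "="."""
    env = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        env[key] = value
    return env


parser = argparse.ArgumentParser()
parser.add_argument("--port", type=int, default=5000)  # default is an assumption
parser.add_argument("--env", action="append", default=[], metavar="KEY=VALUE")

args = parser.parse_args(
    [
        "--port", "5001",
        "--env", "INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct",
        "--env", "OLLAMA_URL=http://host.docker.internal:11434",
    ]
)
env = parse_env_pairs(args.env)
```

The resulting `env` mapping can then be substituted into a bundled run configuration, which is what lets the container start without a mounted YAML file.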

## Test Plan

Check with conda first (no regressions):
```bash
LLAMA_STACK_DIR=. llama stack build --template ollama
llama stack run ollama --port 5001

# server starts up correctly
```

Check with docker
```bash
# build the docker
LLAMA_STACK_DIR=. llama stack build --template ollama --image-type docker

export INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct"

docker run -it  -p 5001:5001 \
  -v ~/.llama:/root/.llama \
  -v $PWD:/app/llama-stack-source \
  localhost/distribution-ollama:dev \
  --port 5001 \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
```

Note that volume mounting to `/app/llama-stack-source` is only needed
because we built the docker with uncommitted source code.
2024-11-20 13:11:40 -08:00
Dinesh Yeduguru
1d8d0593af
register with provider even if present in stack (#491)
# What does this PR do?

Remove the check that skips provider registration when a resource is
already in the stack registry. Since we do not reconcile state with the
provider, register should always call into the provider's register endpoint.
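
A minimal sketch of the resulting behavior, with hypothetical `StackRegistry` and `RecordingProvider` classes standing in for the real routing table and provider:

```python
class RecordingProvider:
    """Hypothetical provider that records every register call it receives."""

    def __init__(self):
        self.calls = []

    def register(self, resource_id, params):
        self.calls.append(resource_id)


class StackRegistry:
    """Hypothetical stack-side registry that forwards every registration."""

    def __init__(self, provider):
        self.provider = provider
        self.resources = {}

    def register(self, resource_id, params):
        # No "already registered" short-circuit: the provider's state is not
        # reconciled with ours, so it must see every register call.
        self.provider.register(resource_id, params)
        self.resources[resource_id] = params


provider = RecordingProvider()
registry = StackRegistry(provider)
registry.register("your_memory_bank_name", {"memory_bank_type": "vector"})
registry.register("your_memory_bank_name", {"memory_bank_type": "vector"})
```

Registering the same identifier twice reaches the provider both times while the stack registry still holds a single entry.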


## Test Plan
```
# stack run
╰─❯ llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml

#register memory bank
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0

Memory Bank Configuration:
{
│   'memory_bank_type': 'vector',
│   'chunk_size_in_tokens': 512,
│   'embedding_model': 'all-MiniLM-L6-v2',
│   'overlap_size_in_tokens': 64
}

#register again
❯ llama-stack-client memory_banks register your_memory_bank_name --type vector --provider-id inline::faiss-0

Memory Bank Configuration:
{
│   'memory_bank_type': 'vector',
│   'chunk_size_in_tokens': 512,
│   'embedding_model': 'all-MiniLM-L6-v2',
│   'overlap_size_in_tokens': 64
}
```
2024-11-20 11:05:50 -08:00
Dinesh Yeduguru
91e7efbc91
fall back to reading from chroma/pgvector when not in cache (#489)
# What does this PR do?

The chroma provider maintains a cache but does not sync it with chroma
on a cold start. This change adds a fallback that reads from chroma on a
cache miss.
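
The fallback can be pictured as a read-through cache. The sketch below is an assumption-laden illustration: `BankCache` and the `fetch_from_store` callable are hypothetical and stand in for the provider's cache and its chroma/pgvector lookup.

```python
class BankCache:
    """In-memory cache that falls back to the backing store on a miss."""

    def __init__(self, fetch_from_store):
        self._cache = {}
        # Hypothetical callable: returns the stored bank, or None if absent.
        self._fetch = fetch_from_store

    def get(self, bank_id):
        if bank_id in self._cache:
            return self._cache[bank_id]
        # Cache miss (e.g. after a cold start): consult the store.
        bank = self._fetch(bank_id)
        if bank is None:
            raise KeyError(bank_id)
        self._cache[bank_id] = bank
        return bank
```

After a restart the cache is empty, so the first lookup for a previously created bank goes to the store and later lookups are served from memory.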


## Test Plan
```bash
#start stack
llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml
# Add documents
PYTHONPATH=. python -m examples.agents.rag_with_memory_bank localhost 5000

No available shields. Disable safety.
Using model: Llama3.1-8B-Instruct
Created session_id=b951b14f-a9d2-43a3-8b80-d80114d58322 for Agent(0687a251-6906-4081-8d4c-f52e19db9dd7)
memory_retrieval> Retrieved context from banks: ['test_bank'].
====
Here are the retrieved documents for relevant context:
=== START-RETRIEVED-CONTEXT ===
 id:num-1; content:_
the template from Llama2 to better support multiturn conversations. The same text
in the Lla...
>
inference> Based on the retrieved documentation, the top 5 topics that were explained are:
...............

# Kill stack
# Bootup stack
llama stack run /Users/dineshyv/.llama/distributions/llamastack-together/together-run.yaml
# Run a RAG app with just the agent flow. It discovers the previously added documents
No available shields. Disable safety.
Using model: Llama3.1-8B-Instruct
Created session_id=7a30c1a7-c87e-4787-936c-d0306589fe5d for Agent(b30420f3-c928-498a-887b-d084f0f3806c)
memory_retrieval> Retrieved context from banks: ['test_bank'].
====
Here are the retrieved documents for relevant context:
=== START-RETRIEVED-CONTEXT ===
 id:num-1; content:_
the template from Llama2 to better support multiturn conversations. The same text
in the Lla...
>
inference> Based on the provided documentation, the top 5 topics that were explained are:
.....
```
2024-11-20 10:30:23 -08:00