llama-stack/llama_stack/apis
Ashwin Bharambe 0d2eb3bd25 Use inference APIs for running llama guard
Test Plan:

First, start a TGI container serving the `meta-llama/Llama-Guard-3-8B`
model on port 5099. See https://github.com/meta-llama/llama-stack/pull/53 and its
description for instructions.
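
Before moving on, it can help to confirm that the container is actually up and serving the expected model. The following is a minimal sketch, not part of this commit: it assumes the TGI server exposes its usual `/info` endpoint with a `model_id` field, and the helper names `fetch_tgi_info` and `serves_model` are hypothetical. If the container was launched from a local path, the reported `model_id` may differ.

```python
import json
import urllib.request


def fetch_tgi_info(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/info and return the decoded JSON body.

    Assumption: the server is a TGI instance exposing `/info`.
    """
    url = f"{base_url.rstrip('/')}/info"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def serves_model(info: dict, expected: str) -> bool:
    """Return True if the reported model_id equals the expected model id."""
    return info.get("model_id") == expected


# Example usage (requires the container from the step above to be running):
#   info = fetch_tgi_info("http://localhost:5099")
#   print(serves_model(info, "meta-llama/Llama-Guard-3-8B"))
```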

Then run llama-stack with the following run config:

```yaml
image_name: safety
docker_image: null
conda_env: safety
apis_to_serve:
- models
- inference
- shields
- safety
api_providers:
  inference:
    providers:
    - remote::tgi
  safety:
    providers:
    - meta-reference
  telemetry:
    provider_id: meta-reference
    config: {}
routing_table:
  inference:
  - provider_id: remote::tgi
    config:
      url: http://localhost:5099
      api_token: null
      hf_endpoint_name: null
    routing_key: Llama-Guard-3-8B
  safety:
  - provider_id: meta-reference
    config:
      llama_guard_shield:
        model: Llama-Guard-3-8B
        excluded_categories: []
        disable_input_check: false
        disable_output_check: false
      prompt_guard_shield: null
    routing_key: llama_guard
```
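
The config above only works if its pieces line up: each `routing_table` entry names a provider declared under `api_providers`, and the Llama Guard shield's `model` matches a `routing_key` on the inference side, since the shield now runs through the inference API. The sketch below checks those two conditions on a trimmed copy of the config written as a Python dict. Both rules are inferred from this example rather than taken from llama-stack's own validation, and `check_run_config` is a hypothetical helper.

```python
# Trimmed copy of the run config above, keeping only the fields being checked.
run_config = {
    "api_providers": {
        "inference": {"providers": ["remote::tgi"]},
        "safety": {"providers": ["meta-reference"]},
    },
    "routing_table": {
        "inference": [
            {"provider_id": "remote::tgi", "routing_key": "Llama-Guard-3-8B"},
        ],
        "safety": [
            {
                "provider_id": "meta-reference",
                "routing_key": "llama_guard",
                "config": {"llama_guard_shield": {"model": "Llama-Guard-3-8B"}},
            },
        ],
    },
}


def check_run_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    providers = cfg["api_providers"]
    routes = cfg["routing_table"]

    # Assumed rule 1: every routed provider is declared for that API.
    for api, entries in routes.items():
        declared = providers.get(api, {}).get("providers", [])
        for entry in entries:
            if entry["provider_id"] not in declared:
                problems.append(
                    f"{api}: provider {entry['provider_id']!r} is not declared"
                )

    # Assumed rule 2: the shield's model is reachable via an inference route.
    inference_keys = {e["routing_key"] for e in routes.get("inference", [])}
    for entry in routes.get("safety", []):
        shield = (entry.get("config") or {}).get("llama_guard_shield")
        if shield and shield["model"] not in inference_keys:
            problems.append(
                f"safety: model {shield['model']!r} has no inference route"
            )
    return problems


print(check_run_config(run_config))
```

With the config as given, the list of problems is empty. Renaming the inference `routing_key` without updating the shield's `model` would be reported by the second check.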

Now run `python -m llama_stack.apis.safety.client localhost <port>`
and check that the llama_guard shield calls succeed. (The
injection_shield calls fail, as expected, because no router has been
set up for them.)
2024-09-24 17:02:57 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| agents | [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) | 2024-09-23 14:22:22 -07:00 |
| batch_inference | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| common | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| dataset | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| evals | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| inference | Use inference APIs for running llama guard | 2024-09-24 17:02:57 -07:00 |
| memory | Respect passed in embedding model | 2024-09-24 14:40:28 -07:00 |
| memory_banks | [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) | 2024-09-23 14:22:22 -07:00 |
| models | [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) | 2024-09-23 14:22:22 -07:00 |
| post_training | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| reward_scoring | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| safety | [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) | 2024-09-23 14:22:22 -07:00 |
| shields | [API Updates] Model / shield / memory-bank routing + agent persistence + support for private headers (#92) | 2024-09-23 14:22:22 -07:00 |
| synthetic_data_generation | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| telemetry | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |
| __init__.py | API Updates (#73) | 2024-09-17 19:51:35 -07:00 |