mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-07-19 11:20:03 +00:00
docs(mcp): add a few lines for how to specify Auth headers in MCP tools
This commit is contained in:
parent
2603f10f95
commit
8dcdce317d
7 changed files with 134 additions and 102 deletions
|
@ -1,4 +1,4 @@
|
|||
# Evaluation Concepts
|
||||
## Evaluation Concepts
|
||||
|
||||
The Llama Stack Evaluation flow allows you to run evaluations on your GenAI application datasets or pre-registered benchmarks.
|
||||
|
||||
|
@ -10,11 +10,7 @@ We introduce a set of APIs in Llama Stack for supporting running evaluations of
|
|||
This guide goes over the sets of APIs and developer experience flow of using Llama Stack to run evaluations for different use cases. Checkout our Colab notebook on working examples with evaluations [here](https://colab.research.google.com/drive/10CHyykee9j2OigaIcRv47BKG9mrNm0tJ?usp=sharing).
|
||||
|
||||
|
||||
## Evaluation Concepts
|
||||
|
||||
The Evaluation APIs are associated with a set of Resources as shown in the following diagram. Please visit the Resources section in our [Core Concepts](../concepts/index.md) guide for better high-level understanding.
|
||||
|
||||

|
||||
The Evaluation APIs are associated with a set of Resources. Please visit the Resources section in our [Core Concepts](../concepts/index.md) guide for better high-level understanding.
|
||||
|
||||
- **DatasetIO**: defines interface with datasets and data loaders.
|
||||
- Associated with `Dataset` resource.
|
||||
|
@ -24,9 +20,9 @@ The Evaluation APIs are associated with a set of Resources as shown in the follo
|
|||
- Associated with `Benchmark` resource.
|
||||
|
||||
|
||||
## Open-benchmark Eval
|
||||
### Open-benchmark Eval
|
||||
|
||||
### List of open-benchmarks Llama Stack support
|
||||
#### List of open-benchmarks Llama Stack support
|
||||
|
||||
Llama stack pre-registers several popular open-benchmarks to easily evaluate model perfomance via CLI.
|
||||
|
||||
|
@ -39,7 +35,7 @@ The list of open-benchmarks we currently support:
|
|||
|
||||
You can follow this [contributing guide](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
|
||||
|
||||
### Run evaluation on open-benchmarks via CLI
|
||||
#### Run evaluation on open-benchmarks via CLI
|
||||
|
||||
We have built-in functionality to run the supported open-benckmarks using llama-stack-client CLI
|
||||
|
||||
|
@ -74,7 +70,7 @@ evaluation results over there.
|
|||
|
||||
|
||||
|
||||
## What's Next?
|
||||
#### What's Next?
|
||||
|
||||
- Check out our Colab notebook on working examples with running benchmark evaluations [here](https://colab.research.google.com/github/meta-llama/llama-stack/blob/main/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb#scrollTo=mxLCsP4MvFqP).
|
||||
- Check out our [Building Applications - Evaluation](../building_applications/evals.md) guide for more details on how to use the Evaluation APIs to evaluate your applications.
|
||||
|
|
Loading…
Add table
Add a link
Reference in a new issue