mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-08-01 16:24:44 +00:00
add evals work
parent 7df97445a2
commit f3c6cbfd53
1 changed file with 11 additions and 2 deletions
13
CHANGELOG.md
@@ -11,10 +11,16 @@
 - Provider deprecation support
 - Comprehensive Zero-to-Hero notebooks and quickstart guides
 - Colab notebook integration for Llama Stack lesson
-- Remote::vllm provider with vision model support
 - Support for quantized models in Ollama
-- Vision models support for Together, Fireworks, Meta-Reference, and Ollama
+- Vision models support for Together, Fireworks, Meta-Reference, and Ollama, and vLLM
 - Bedrock distribution with safety shields support
+- Evals API with task registration and scoring functions
+- MMLU and SimpleQA benchmark scoring functions
+- Huggingface dataset provider integration for benchmarks
+- Support for custom dataset registration from local paths
+- Benchmark evaluation CLI tools with visualization tables
+- RAG evaluation scoring functions and metrics
+- Local persistence for datasets and eval tasks
 
 ### Changed
 - Split safety into distinct providers (llama-guard, prompt-guard, code-scanner)
@@ -25,6 +31,9 @@
 - Restructured folder organization for providers
 - Enhanced Docker build configuration
 - Added version prefixing for REST API routes
+- Enhanced evaluation task registration workflow
+- Improved benchmark evaluation output formatting
+- Restructured evals folder organization for better modularity
 
 ### Removed
 - `llama stack configure` command