Llama Stack Unit Tests
Unit Tests
Unit tests verify individual components and functions in isolation. They are fast, reliable, and don't require external services.
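To make that concrete, here is a minimal sketch of what a test in this style looks like. The helper being tested (normalize_model_id) is a made-up example, not a real Llama Stack API; only the pytest conventions and the isolated, no-external-services style are the point.

```python
# Hypothetical example: `normalize_model_id` is an illustrative helper,
# not part of the Llama Stack codebase.
import pytest


def normalize_model_id(model_id: str) -> str:
    """Toy component under test: trims and lowercases a model identifier."""
    if not model_id or not model_id.strip():
        raise ValueError("model_id must be non-empty")
    return model_id.strip().lower()


def test_normalize_model_id_strips_and_lowercases():
    # Pure in-process check: no network, no external services.
    assert normalize_model_id("  Llama-3.1-8B ") == "llama-3.1-8b"


def test_normalize_model_id_rejects_empty_input():
    with pytest.raises(ValueError):
        normalize_model_id("   ")
```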
Prerequisites
- Python Environment: Ensure you have Python 3.12+ installed
- uv Package Manager: Install uv if not already installed (one way to do so is sketched below)
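If uv is not yet on your machine, these are the commonly documented install paths; check the uv documentation for the current recommendation.

```bash
# Option 1: install uv into your current Python environment
pip install uv

# Option 2: use the standalone installer from Astral
curl -LsSf https://astral.sh/uv/install.sh | sh
```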
You can run the unit tests with:
./scripts/unit-tests.sh [PYTEST_ARGS]
Any additional arguments are passed to pytest. For example, you can specify a test directory, a specific test file, or any pytest flags (e.g., -vvv for verbosity). If no test directory is specified, it defaults to "tests/unit", e.g.:
./scripts/unit-tests.sh tests/unit/registry/test_registry.py -vvv
If you'd like to run against a non-default version of Python (the default is currently 3.12), pass the PYTHON_VERSION environment variable as follows:
source .venv/bin/activate
PYTHON_VERSION=3.13 ./scripts/unit-tests.sh
Test Configuration
- Test Discovery: Tests are automatically discovered in the tests/unit/ directory
- Async Support: Tests use --asyncio-mode=auto for automatic async test handling (see the sketch after this list)
- Coverage: Tests generate coverage reports in the htmlcov/ directory
- Python Version: Defaults to Python 3.12, but can be overridden with the PYTHON_VERSION environment variable
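Because the suite runs with --asyncio-mode=auto, async test functions are collected and awaited without an explicit @pytest.mark.asyncio decorator. The sketch below is a hypothetical illustration of that; fetch_status is a made-up coroutine, not a real project API.

```python
# Hypothetical async test: with --asyncio-mode=auto, pytest-asyncio runs
# `async def` tests directly, no @pytest.mark.asyncio marker required.
import asyncio


async def fetch_status() -> str:
    """Stand-in coroutine for an async component under test."""
    await asyncio.sleep(0)  # simulate an awaited operation
    return "ok"


async def test_fetch_status_returns_ok():
    assert await fetch_status() == "ok"
```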
Coverage Reports
After running tests, you can view coverage reports:
# Open HTML coverage report in browser
open htmlcov/index.html # macOS
xdg-open htmlcov/index.html # Linux
start htmlcov/index.html # Windows
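If you just want a quick terminal summary instead of the HTML report, the coverage CLI (installed alongside pytest-cov) can print one, assuming the .coverage data file was written to the repository root by the test run:

```bash
uv run coverage report -m
```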