forked from phoenix-oss/llama-stack-mirror
		
# What does this PR do?

The goal is to have a fairly complete set of provider and end-to-end (e2e) tests for the `/chat-completion` and `/completion` endpoints. The current test functions in each suite are listed below.

Provider tests:

```
grep -oE "def test_[a-zA-Z_+]*" llama_stack/providers/tests/inference/test_text_inference.py | cut -d' ' -f2
```

- test_model_list
- test_text_completion_non_streaming
- test_text_completion_streaming
- test_text_completion_logprobs_non_streaming
- test_text_completion_logprobs_streaming
- test_text_completion_structured_output
- test_text_chat_completion_non_streaming
- test_text_chat_completion_structured_output
- test_text_chat_completion_streaming
- test_text_chat_completion_with_tool_calling
- test_text_chat_completion_with_tool_calling_streaming

End-to-end (client SDK) tests:

```
grep -oE "def test_[a-zA-Z_+]*" tests/client-sdk/inference/test_text_inference.py | cut -d' ' -f2
```

- test_text_completion_non_streaming
- test_text_completion_streaming
- test_text_completion_log_probs_non_streaming
- test_text_completion_log_probs_streaming
- test_text_completion_structured_output
- test_text_chat_completion_non_streaming
- test_text_chat_completion_streaming
- test_text_chat_completion_with_tool_calling_and_non_streaming
- test_text_chat_completion_with_tool_calling_and_streaming
- test_text_chat_completion_with_tool_choice_required
- test_text_chat_completion_with_tool_choice_none
- test_text_chat_completion_structured_output
- test_text_chat_completion_tool_calling_tools_not_in_request

## Test plan

== Set up Ollama local server

`ollama serve` runs in the foreground, so load the model from a second shell:

```
OLLAMA_HOST=127.0.0.1:8321 with-proxy ollama serve
OLLAMA_HOST=127.0.0.1:8321 ollama run llama3.2:3b-instruct-fp16 --keepalive 60m
```

== Run a provider test

```
conda activate stack
OLLAMA_URL="http://localhost:8321" \
  pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \
  llama_stack/providers/tests/inference/test_text_inference.py::TestInference
```

== Run an e2e test

Build and start the Llama Stack server with the `ollama` template:

```
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
```

Then, in another shell, run the client SDK tests against it:

```
conda activate stack
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
  pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \
  tests/client-sdk/inference/test_text_inference.py
```
		
			
				
	
	
		
43 lines · 1.1 KiB · JSON
```json
{
    "sanity": {
        "data": {
            "content": "Complete the sentence using one word: Roses are red, violets are "
        }
    },
    "non_streaming": {
        "data": {
            "content": "Michael Jordan was born in ",
            "expected": "1963"
        }
    },
    "streaming": {
        "data": {
            "content": "Roses are red,"
        }
    },
    "log_probs": {
        "data": {
            "content": "Complete the sentence: Michael Jordan was born in "
        }
    },
    "logprobs_non_streaming": {
        "data": {
            "content": "Michael Jordan was born in "
        }
    },
    "logprobs_streaming": {
        "data": {
            "content": "Roses are red,"
        }
    },
    "structured_output": {
        "data": {
            "user_input": "Michael Jordan was born in 1963. He played basketball for the Chicago Bulls. He retired in 2003.",
            "expected": {
                "name": "Michael Jordan",
                "year_born": "1963",
                "year_retired": "2003"
            }
        }
    }
}
```
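The JSON above groups prompts by test case. Each case has a `data` object and, for some cases, an `expected` value. The following is a minimal sketch of how a test might load one case and check a model reply against it. It assumes the file is read from disk with the stdlib `json` module. `load_test_case`, `contains_expected`, and `matches_structured_output` are hypothetical helpers, not functions from llama-stack. The fixture's path is not shown on this page, so the caller supplies it.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def load_test_case(fixture_path: str | Path, case: str) -> dict[str, Any]:
    """Return the `data` dict for one named case, e.g. "non_streaming".

    Hypothetical helper: assumes each top-level key maps to an object
    with a single "data" field, as in the fixture above.
    """
    with open(fixture_path, encoding="utf-8") as f:
        cases = json.load(f)
    if case not in cases:
        raise KeyError(f"unknown test case {case!r}; available: {sorted(cases)}")
    return cases[case]["data"]


def contains_expected(response_text: str, expected: str) -> bool:
    """Loose check for cases such as "non_streaming": the expected answer
    (e.g. "1963") only has to appear somewhere in the model's reply."""
    return expected.lower() in response_text.lower()


def matches_structured_output(response_json: str, expected: dict[str, str]) -> bool:
    """Parse a JSON reply and compare each expected field.

    Values are compared as strings because the fixture stores years as
    strings ("1963"), while a model might emit them as JSON numbers.
    """
    parsed = json.loads(response_json)
    return all(str(parsed.get(key)) == value for key, value in expected.items())
```

A test could call `load_test_case(path, "structured_output")`, send `data["user_input"]` to the model, and then pass the reply and `data["expected"]` to `matches_structured_output`. Comparing values as strings is a deliberate leniency. Use a stricter equality check if the schema requires string-typed years.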