Mirror of https://github.com/meta-llama/llama-stack.git, synced 2025-06-28 19:04:19 +00:00
Add REST api example for chat_completion (#286)
This commit is contained in:
parent e45f121c77
commit 668a495aba
1 changed file with 24 additions and 1 deletion
@@ -169,7 +169,7 @@ conda activate <env> # any environment containing the llama-stack pip package w
 python -m llama_stack.apis.inference.client localhost 5000
 ```
 
-This will run the chat completion client and query the distribution’s /inference/chat_completion API.
+This will run the chat completion client and query the distribution’s `/inference/chat_completion` API.
 
 Here is an example output:
 ```
@@ -180,6 +180,29 @@ The moon glows softly in the midnight sky,
 A beacon of wonder, as it passes by.
 ```
 
+You may also send a POST request to the server:
+
+```
+curl http://localhost:5000/inference/chat_completion \
+-H "Content-Type: application/json" \
+-d '{
+    "model": "Llama3.1-8B-Instruct",
+    "messages": [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Write me a 2 sentence poem about the moon"}
+    ],
+    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}
+}'
+
+Output:
+{'completion_message': {'role': 'assistant',
+    'content': 'The moon glows softly in the midnight sky, \nA beacon of wonder, as it catches the eye.',
+    'stop_reason': 'out_of_tokens',
+    'tool_calls': []},
+ 'logprobs': null}
+
+```
+
 Similarly you can test safety (if you configured llama-guard and/or prompt-guard shields) by:
 
 ```
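The same request can also be sent from Python. Below is a minimal sketch, assuming the third-party `requests` package and the same local server, endpoint, and payload as in the curl example above; the response parsing assumes the JSON shape shown in the example output.

```
# Minimal sketch: POST a chat_completion request from Python.
# Assumes a llama-stack server listening on localhost:5000 and the
# `requests` package (pip install requests).
import requests

payload = {
    "model": "Llama3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write me a 2 sentence poem about the moon"},
    ],
    "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512},
}

response = requests.post(
    "http://localhost:5000/inference/chat_completion",
    headers={"Content-Type": "application/json"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

body = response.json()
# Assuming the response shape shown in the example output above:
# a completion_message dict holding the assistant's reply.
print(body["completion_message"]["content"])
```

Any HTTP client works the same way, since the server accepts the same JSON body shown in the curl invocation.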