Web updates to point to latest releases for Mobile SDK (#1650)
# What does this PR do?

Web updates to point to latest releases for Mobile SDK:
- point to `latest-release` branch for mobile sdk repos to minimize the number of change points on the site.
- updates to some instructions
This commit is contained in:
parent d2dda4af64
commit b56b06037c

4 changed files with 18 additions and 60 deletions
@@ -71,4 +71,4 @@ While there is a lot of flexibility to mix-and-match providers, often users will
**Locally Hosted Distro**: You may want to run Llama Stack on your own hardware. Typically though, you still need to use Inference via an external service. You can use providers like HuggingFace TGI, Fireworks, Together, etc. for this purpose. Or you may have access to GPUs and can run a [vLLM](https://github.com/vllm-project/vllm) or [NVIDIA NIM](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere&q=llama) instance. If you "just" have a regular desktop machine, you can use [Ollama](https://ollama.com/) for inference. To provide convenient quick access to these options, we provide a number of such pre-configured locally-hosted Distros.

-**On-device Distro**: Finally, you may want to run Llama Stack directly on an edge device (mobile phone or a tablet.) We provide Distros for iOS and Android (coming soon.)
+**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/ios_sdk.html) and [Android](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/android_sdk.html)
@@ -8,12 +8,12 @@ Features:
- Remote Inferencing: Perform inferencing tasks remotely with Llama models hosted on a remote connection (or serverless localhost).
- Simple Integration: With easy-to-use APIs, a developer can quickly integrate Llama Stack in their Android app. The difference with local vs remote inferencing is also minimal.

-Latest Release Notes: [v0.0.58](https://github.com/meta-llama/llama-stack-client-kotlin/releases/tag/v0.0.58)
+Latest Release Notes: [link](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release)

*Tagged releases are stable versions of the project. While we strive to maintain a stable main branch, it's not guaranteed to be free of bugs or issues.*

## Android Demo App

-Check out our demo app to see how to integrate Llama Stack into your Android app: [Android Demo App](https://github.com/meta-llama/llama-stack-apps/tree/android-kotlin-app-latest/examples/android_app)
+Check out our demo app to see how to integrate Llama Stack into your Android app: [Android Demo App](https://github.com/meta-llama/llama-stack-client-kotlin/tree/examples/android_app)

The key files in the app are `ExampleLlamaStackLocalInference.kt`, `ExampleLlamaStackRemoteInference.kts`, and `MainActivity.java`. With encompassed business logic, the app shows how to use Llama Stack for both the environments.
@@ -24,7 +24,7 @@ The key files in the app are `ExampleLlamaStackLocalInference.kt`, `ExampleLlama
Add the following dependency in your `build.gradle.kts` file:
```
dependencies {
-implementation("com.llama.llamastack:llama-stack-client-kotlin:0.0.58")
+implementation("com.llama.llamastack:llama-stack-client-kotlin:0.1.4.2")
}
```
This will download jar files in your gradle cache in a directory like `~/.gradle/caches/modules-2/files-2.1/com.llama.llamastack/`
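Not part of the diff, but for orientation: a minimal `build.gradle.kts` sketch of where this dependency sits. The Maven Central repository line is an assumption based on the Sonatype listing linked from the client SDK table elsewhere on the site; adapt it to your project's existing repository setup.

```kotlin
// Sketch only (not from this PR). Assumes the artifact resolves from Maven Central,
// as suggested by the Sonatype listing for com.llama.llamastack.
repositories {
    mavenCentral()
}

dependencies {
    // Version pinned to the one introduced by this change; check the
    // latest-release branch of llama-stack-client-kotlin for newer versions.
    implementation("com.llama.llamastack:llama-stack-client-kotlin:0.1.4.2")
}
```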
@@ -36,13 +36,13 @@ If you plan on doing remote inferencing this is sufficient to get started.
For local inferencing, it is required to include the ExecuTorch library into your app.

Include the ExecuTorch library by:
-1. Download the `download-prebuilt-et-lib.sh` script file from the [llama-stack-client-kotlin-client-local](https://github.com/meta-llama/llama-stack-client-kotlin/blob/release/0.0.58/llama-stack-client-kotlin-client-local/download-prebuilt-et-lib.sh) directory to your local machine.
+1. Download the `download-prebuilt-et-lib.sh` script file from the [llama-stack-client-kotlin-client-local](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/llama-stack-client-kotlin-client-local/download-prebuilt-et-lib.sh) directory to your local machine.
2. Move the script to the top level of your Android app where the app directory resides:
<p align="center">
-<img src="https://raw.githubusercontent.com/meta-llama/llama-stack-client-kotlin/refs/heads/release/0.0.58/doc/img/example_android_app_directory.png" style="width:300px">
+<img src="https://github.com/meta-llama/llama-stack-client-kotlin/blob/latest-release/doc/img/example_android_app_directory.png" style="width:300px">
</p>

-3. Run `sh download-prebuilt-et-lib.sh` to create an `app/libs` directory and download the `executorch.aar` in that path. This generates an ExecuTorch library for the XNNPACK delegate with commit: [0a12e33](https://github.com/pytorch/executorch/commit/0a12e33d22a3d44d1aa2af5f0d0673d45b962553).
+3. Run `sh download-prebuilt-et-lib.sh` to create an `app/libs` directory and download the `executorch.aar` in that path. This generates an ExecuTorch library for the XNNPACK delegate.
4. Add the `executorch.aar` dependency in your `build.gradle.kts` file:
```
dependencies {
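The hunk above is truncated inside the `dependencies { ... }` block. A sketch of what step 4 typically amounts to is shown below; it is not taken from the diff, and the local-file form of the AAR dependency is an assumption about how the downloaded `executorch.aar` gets wired in.

```kotlin
// Sketch only (not from this PR). Assumes download-prebuilt-et-lib.sh has placed
// executorch.aar under app/libs, as described in step 3 above.
dependencies {
    // Llama Stack Kotlin client
    implementation("com.llama.llamastack:llama-stack-client-kotlin:0.1.4.2")
    // Locally downloaded ExecuTorch AAR used for on-device (local) inference
    implementation(files("libs/executorch.aar"))
}
```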
@@ -58,12 +58,12 @@ Breaking down the demo app, this section will show the core pieces that are used
### Setup Remote Inferencing
Start a Llama Stack server on localhost. Here is an example of how you can do this using the firework.ai distribution:
```
conda create -n stack-fireworks python=3.10
conda activate stack-fireworks
-pip install llama-stack=0.0.58
+pip install --no-cache llama-stack==0.1.4
llama stack build --template fireworks --image-type conda
export FIREWORKS_API_KEY=<SOME_KEY>
-llama stack run /Users/<your_username>/.llama/distributions/llamastack-fireworks/fireworks-run.yaml --port=5050
+llama stack run fireworks --port 5050
```

Ensure the Llama Stack server version is the same as the Kotlin SDK Library for maximum compatibility.
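Not taken from this diff: a minimal Kotlin sketch of pointing the Android client at the server started above. The builder class and its package path are assumed from the llama-stack-client-kotlin README, and 10.0.2.2 is the usual Android-emulator alias for the host machine's localhost; verify both against the latest-release branch and your own setup.

```kotlin
// Sketch only (not part of this PR). The class and package name below are
// assumptions based on the llama-stack-client-kotlin README; check the
// latest-release branch before relying on them.
import com.llama.llamastack.client.okhttp.LlamaStackClientOkHttpClient

// 10.0.2.2 is the Android emulator's usual alias for the host's localhost,
// where `llama stack run fireworks --port 5050` is listening.
val client = LlamaStackClientOkHttpClient.builder()
    .baseUrl("http://10.0.2.2:5050")
    .build()
```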
@@ -146,7 +146,7 @@ The purpose of this section is to share more details with users that would like
### Prerequisite

You must complete the following steps:
-1. Clone the repo (`git clone https://github.com/meta-llama/llama-stack-client-kotlin.git -b release/0.0.58`)
+1. Clone the repo (`git clone https://github.com/meta-llama/llama-stack-client-kotlin.git -b latest-release`)
2. Port the appropriate ExecuTorch libraries over into your Llama Stack Kotlin library environment.
```
cd llama-stack-client-kotlin-client-local
@@ -1,9 +1,8 @@
# iOS SDK

-We offer both remote and on-device use of Llama Stack in Swift via two components:
+We offer both remote and on-device use of Llama Stack in Swift via a single SDK [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/) that contains two components:

-1. [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/)
-2. [LocalInferenceImpl](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/inline/ios/inference)
+1. LlamaStackClient for remote
+2. Local Inference for on-device

```{image} ../../../_static/remote_or_local.gif
:alt: Seamlessly switching between local, on-device inference and remote hosted inference
@@ -42,7 +41,7 @@ let request = Components.Schemas.CreateAgentTurnRequest(
// ...
```

-Check out [iOSCalendarAssistant](https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant) for a complete app demo.
+Check out [iOSCalendarAssistant](https://github.com/meta-llama/llama-stack-client-swift/tree/main/examples/ios_calendar_assistant) for a complete app demo.

## LocalInference
@@ -58,7 +57,7 @@ let inference = LocalInference(queue: runnerQueue)
let agents = LocalAgents(inference: self.inference)
```

-Check out [iOSCalendarAssistantWithLocalInf](https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant) for a complete app demo.
+Check out [iOSCalendarAssistantWithLocalInf](https://github.com/meta-llama/llama-stack-client-swift/tree/main/examples/ios_calendar_assistant) for a complete app demo.

### Installation
@@ -68,47 +67,6 @@ We're working on making LocalInference easier to set up. For now, you'll need to
1. Install [Cmake](https://cmake.org/) for the executorch build`
1. Drag `LocalInference.xcodeproj` into your project
1. Add `LocalInference` as a framework in your app target
-1. Add a package dependency on https://github.com/pytorch/executorch (branch latest)
-1. Add all the kernels / backends from executorch (but not exectuorch itself!) as frameworks in your app target:
-- backend_coreml
-- backend_mps
-- backend_xnnpack
-- kernels_custom
-- kernels_optimized
-- kernels_portable
-- kernels_quantized
-1. In "Build Settings" > "Other Linker Flags" > "Any iOS Simulator SDK", add:
-
-```
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_optimized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_custom-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_quantized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_coreml-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_mps-simulator-release.a
-```
-
-1. In "Build Settings" > "Other Linker Flags" > "Any iOS SDK", add:
-
-```
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_optimized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_custom-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libkernels_quantized-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_coreml-simulator-release.a
--force_load
-$(BUILT_PRODUCTS_DIR)/libbackend_mps-simulator-release.a
-```
-
### Preparing a model
@@ -38,9 +38,9 @@ We have a number of client-side SDKs available for different languages.
| **Language** | **Client SDK** | **Package** |
| :----: | :----: | :----: |
| Python | [llama-stack-client-python](https://github.com/meta-llama/llama-stack-client-python) | [](https://pypi.org/project/llama_stack_client/)
-| Swift | [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift) | [](https://swiftpackageindex.com/meta-llama/llama-stack-client-swift)
+| Swift | [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/tree/latest-release) | [](https://swiftpackageindex.com/meta-llama/llama-stack-client-swift)
| Node | [llama-stack-client-node](https://github.com/meta-llama/llama-stack-client-node) | [](https://npmjs.org/package/llama-stack-client)
-| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin) | [](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin)
+| Kotlin | [llama-stack-client-kotlin](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release) | [](https://central.sonatype.com/artifact/com.llama.llamastack/llama-stack-client-kotlin)

## Supported Llama Stack Implementations