[docs] update documentations (#356)

* move docs -> source * Add files via upload * mv image * Add files via upload * colocate iOS setup doc * delete image * Add files via upload * fix * delete image * Add files via upload * Update developer_cookbook.md * toctree * wip subfolder * docs update * subfolder * updates * name * updates * index * updates * refactor structure * depth * docs * content * docs * getting started * distributions * fireworks * fireworks * update * theme * theme * theme * pdj theme * pytorch theme * css * theme * agents example * format * index * headers * copy button * test tabs * test tabs * fix * tabs * tab * tabs * sphinx_design * quick start commands * size * width * css * css * download models * asthetic fix * tab format * update * css * width * css * docs * tab based * tab * tabs * docs * style * image * css * color * typo * update docs * missing links * list templates * links * links update * troubleshooting * fix * distributions * docs * fix table * kill llamastack-local-gpu/cpu * Update index.md * Update index.md * mv ios_setup.md * Update ios_setup.md * Add remote_or_local.gif * Update ios_setup.md * release notes * typos * Add ios_setup to index * nav bar * hide torctree * ios image * links update * rename * rename * docs * rename * links * distributions * distributions * distributions * distributions * remove release * remote --------- Co-authored-by: dltn <6599399+dltn@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2024-11-04 16:52:38 -08:00 · 2024-11-04 16:52:38 -08:00 · c810a4184d
commit c810a4184d
parent ac93dd89cf
37 changed files with 1777 additions and 2154 deletions
--- a/docs/source/getting_started/distributions/ondevice_distro/index.md
+++ b/docs/source/getting_started/distributions/ondevice_distro/index.md
@@ -0,0 +1,9 @@
+# On-Device Distribution
+
+On-device distributions are Llama Stack distributions that run locally on your iOS / Android device.
+
+```{toctree}
+:maxdepth: 1
+
+ios_sdk
+```
--- a/docs/source/getting_started/distributions/ondevice_distro/ios_sdk.md
+++ b/docs/source/getting_started/distributions/ondevice_distro/ios_sdk.md
@@ -0,0 +1,176 @@
+# iOS SDK
+
+We offer both remote and on-device use of Llama Stack in Swift via two components:
+
+1. [llama-stack-client-swift](https://github.com/meta-llama/llama-stack-client-swift/)
+2. [LocalInferenceImpl](https://github.com/meta-llama/llama-stack/tree/main/llama_stack/providers/impls/ios/inference)
+
+```{image} ../../../../_static/remote_or_local.gif
+:alt: Seamlessly switching between local, on-device inference and remote hosted inference
+:width: 412px
+:align: center
+```
+
+## Remote Only
+
+If you don't want to run inference on-device, you can connect to any hosted Llama Stack distribution using `llama-stack-client-swift` alone (component 1 above).
+
+1. Add `https://github.com/meta-llama/llama-stack-client-swift/` as a Package Dependency in Xcode
+
+2. Add `LlamaStackClient` as a framework to your app target
+
+3. Call an API:
+
+```swift
+import Foundation
+import LlamaStackClient
+
+// Connect to a running Llama Stack distribution.
+let agents = RemoteAgents(url: URL(string: "http://localhost:5000")!)
+
+// `agentId` and `sessionId` are placeholders for IDs from your own agent/session setup.
+let request = Components.Schemas.CreateAgentTurnRequest(
+  agent_id: agentId,
+  messages: [
+    .UserMessage(Components.Schemas.UserMessage(
+      content: .case1("Hello Llama!"),
+      role: .user
+    ))
+  ],
+  session_id: sessionId,
+  stream: true
+)
+
+// With `stream: true`, the turn arrives as a sequence of chunks.
+for try await chunk in try await agents.createTurn(request: request) {
+  let payload = chunk.event.payload
+  // Handle each event payload here.
+}
+```
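+
+`URL(string:)` returns an optional, which is why the example force-unwraps it. If the server address comes from configuration rather than a literal, building and validating it can fail more gracefully. The sketch below is a minimal illustration assuming a plain HTTP distribution at a host and port of your choosing. `llamaStackURL(host:port:scheme:)` and `EndpointError` are hypothetical names and are not part of LlamaStackClient. On a physical device, `localhost` refers to the device itself. Point the client at the network address of the machine running the distribution instead.
+
+```swift
+import Foundation
+
+// Hypothetical error type for this sketch; not part of LlamaStackClient.
+enum EndpointError: Error {
+    case invalid(host: String, port: Int)
+}
+
+// Hypothetical helper: builds the base URL of a Llama Stack distribution
+// from its parts instead of force-unwrapping URL(string:).
+func llamaStackURL(host: String, port: Int, scheme: String = "http") throws -> URL {
+    var components = URLComponents()
+    components.scheme = scheme
+    components.host = host
+    components.port = port
+    // URLComponents.url is nil when the parts cannot form a valid URL.
+    guard let url = components.url else {
+        throw EndpointError.invalid(host: host, port: port)
+    }
+    return url
+}
+```
+
+With this helper, `let agents = try RemoteAgents(url: llamaStackURL(host: "localhost", port: 5000))` targets the same server as the literal in the example, and a malformed address surfaces as a thrown error instead of a crash.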
+
+Check out [iOSCalendarAssistant](https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant) for a complete app demo.
+
+## LocalInference
+
+LocalInference provides a local inference implementation powered by [executorch](https://github.com/pytorch/executorch/).
+
+Llama Stack currently supports on-device inference for iOS, with Android coming soon. You can run on-device inference on Android today using [executorch](https://github.com/pytorch/executorch/tree/main/examples/demo-apps/android/LlamaDemo), PyTorch’s on-device inference library.
+
+The APIs *work the same as remote*. The only difference is that you use the `LocalAgents` / `LocalInference` classes instead and pass in a `DispatchQueue`:
+
+```swift
+private let runnerQueue = DispatchQueue(label: "org.llamastack.stacksummary")
+private let inference: LocalInference
+private let agents: LocalAgents
+
+init() {
+  inference = LocalInference(queue: runnerQueue)
+  agents = LocalAgents(inference: inference)
+}
+```
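+
+This guide does not describe how `LocalInference` uses the queue internally. Presumably it runs blocking model work there, off the main thread. The sketch below illustrates the property that makes a dedicated queue useful: `DispatchQueue(label:)` creates a serial queue, so submitted work runs one item at a time, in order. `SerialRunner` is a hypothetical type for illustration only and is not part of LocalInference.
+
+```swift
+import Dispatch
+
+// Hypothetical type for illustration; not part of LocalInference.
+final class SerialRunner {
+    private let queue: DispatchQueue
+
+    init(label: String) {
+        // DispatchQueue(label:) with no extra attributes creates a serial queue.
+        queue = DispatchQueue(label: label)
+    }
+
+    // Enqueue work without blocking the caller; items run one at a time, in submission order.
+    func submit(_ work: @escaping () -> Void) {
+        queue.async(execute: work)
+    }
+
+    // Block the caller until everything submitted so far has finished.
+    func waitUntilIdle() {
+        queue.sync {}
+    }
+}
+
+let runner = SerialRunner(label: "org.llamastack.example")
+var completed: [Int] = []
+for step in 1...3 {
+    // Each closure starts only after the previous one returns, so no extra locking is needed.
+    runner.submit { completed.append(step) }
+}
+runner.waitUntilIdle()
+print(completed)  // [1, 2, 3]
+```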
+
+Check out [iOSCalendarAssistantWithLocalInf](https://github.com/meta-llama/llama-stack-apps/tree/main/examples/ios_calendar_assistant) for a complete app demo.
+
+### Installation
+
+We're working on making LocalInference easier to set up. For now, you'll need to import it via `.xcframework`:
+
+1. Initialize the executorch submodule in this repo, along with its dependencies: `git submodule update --init --recursive`
+1. Install [CMake](https://cmake.org/) for the executorch build
+1. Drag `LocalInference.xcodeproj` into your project
+1. Add `LocalInference` as a framework in your app target
+1. Add a package dependency on https://github.com/pytorch/executorch (branch `latest`)
+1. Add all the kernels / backends from executorch (but not executorch itself!) as frameworks in your app target:
+    - backend_coreml
+    - backend_mps
+    - backend_xnnpack
+    - kernels_custom
+    - kernels_optimized
+    - kernels_portable
+    - kernels_quantized
+1. In "Build Settings" > "Other Linker Flags" > "Any iOS Simulator SDK", add:
+    ```
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libkernels_optimized-simulator-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libkernels_custom-simulator-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libkernels_quantized-simulator-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-simulator-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libbackend_coreml-simulator-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libbackend_mps-simulator-release.a
+    ```
+
+1. In "Build Settings" > "Other Linker Flags" > "Any iOS SDK", add:
+
+    ```
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libkernels_optimized-ios-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libkernels_custom-ios-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libkernels_quantized-ios-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libbackend_xnnpack-ios-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libbackend_coreml-ios-release.a
+    -force_load
+    $(BUILT_PRODUCTS_DIR)/libbackend_mps-ios-release.a
+    ```
+
+### Preparing a model
+
+1. Prepare a `.pte` file [following the executorch docs](https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#step-2-prepare-model)
+2. Bundle the `.pte` and `tokenizer.model` files into your app
+
+We now support models quantized using SpinQuant and QAT+LoRA. These significantly improve throughput and time-to-first-token over BF16 (measured with the demo app on an iPhone 13 Pro):
+
+| Llama 3.2 1B | Tokens / second (total), Haiku | Tokens / second (total), Paragraph | Time-to-first-token (sec), Haiku | Time-to-first-token (sec), Paragraph |
+| :---- | :---- | :---- | :---- | :---- |
+| BF16 | 2.2 | 2.5 | 2.3 | 1.9 |
+| QAT+LoRA | 7.1 | 3.3 | 0.37 | 0.24 |
+| SpinQuant | 10.1 | 5.2 | 0.2 | 0.2 |
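+
+Taking BF16 as the baseline, the SpinQuant speedups follow directly from the table. The first two ratios are throughput ratios (SpinQuant divided by BF16) for Haiku and Paragraph. The last two are time-to-first-token ratios (BF16 divided by SpinQuant) for the same two cases:
+
+```{math}
+\frac{10.1}{2.2} \approx 4.6, \qquad \frac{5.2}{2.5} = 2.08, \qquad \frac{2.3}{0.2} = 11.5, \qquad \frac{1.9}{0.2} = 9.5
+```
+
+The same arithmetic for QAT+LoRA gives about 3.2× and 1.32× throughput. First tokens arrive about 6.2× and 7.9× sooner.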
+
+
+### Using LocalInference
+
+1. Instantiate LocalInference with a DispatchQueue. Optionally, pass it into your agents service:
+
+```swift
+init() {
+  runnerQueue = DispatchQueue(label: "org.meta.llamastack")
+  inferenceService = LocalInferenceService(queue: runnerQueue)
+  agentsService = LocalAgentsService(inference: inferenceService)
+}
+```
+
+2. Before making any inference calls, load your model from your bundle:
+
+```swift
+let mainBundle = Bundle.main
+inferenceService.loadModel(
+    modelPath: mainBundle.url(forResource: "llama32_1b_spinquant", withExtension: "pte"),
+    tokenizerPath: mainBundle.url(forResource: "tokenizer", withExtension: "model"),
+    completion: {_ in } // use to handle load failures
+)
+```
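+
+`Bundle.url(forResource:withExtension:)` returns `nil` when a file was not copied into the app bundle, for example when it is missing from the target's Copy Bundle Resources phase. In that case, a misnamed or missing `.pte` only shows up later as a load failure. The minimal sketch below checks for both files up front. `locateModelResources`, `ModelResources`, and `ModelResourceError` are hypothetical names, and the file names mirror the example above.
+
+```swift
+import Foundation
+
+// Hypothetical types for this sketch; not part of LocalInference.
+struct ModelResources {
+    let model: URL
+    let tokenizer: URL
+}
+
+enum ModelResourceError: Error {
+    case missing(String)
+}
+
+// Resolves the model and tokenizer files in a bundle, failing with the
+// name of the first file that is not present.
+func locateModelResources(
+    modelName: String,
+    tokenizerName: String = "tokenizer",
+    in bundle: Bundle = .main
+) throws -> ModelResources {
+    // url(forResource:withExtension:) returns nil when the file is not bundled.
+    guard let model = bundle.url(forResource: modelName, withExtension: "pte") else {
+        throw ModelResourceError.missing("\(modelName).pte")
+    }
+    guard let tokenizer = bundle.url(forResource: tokenizerName, withExtension: "model") else {
+        throw ModelResourceError.missing("\(tokenizerName).model")
+    }
+    return ModelResources(model: model, tokenizer: tokenizer)
+}
+```
+
+You could then call `try locateModelResources(modelName: "llama32_1b_spinquant")`, pass `resources.model` and `resources.tokenizer` as `modelPath` and `tokenizerPath`, and report the thrown error to the user instead of failing silently.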
+
+3. Make inference calls (or agents calls) as you normally would with Llama Stack:
+
+```swift
+// `text` is the user input to process, defined elsewhere in your app.
+for await chunk in try await agentsService.initAndCreateTurn(
+    messages: [
+        .UserMessage(Components.Schemas.UserMessage(
+            content: .case1("Call functions as needed to handle any actions in the following text:\n\n" + text),
+            role: .user))
+    ]
+) {
+    // Handle each streamed chunk here.
+}
+```
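+
+Because the turn is streamed, the loop body runs once per chunk as each one arrives, not once at the end. The sketch below isolates that consumption pattern, with a plain `AsyncStream<String>` standing in for the real chunk stream. `simulatedTurn` and `collectTurn` are hypothetical helpers, and real chunks carry structured event payloads rather than bare strings.
+
+```swift
+// Hypothetical stand-in for a streamed turn: yields text fragments, then finishes.
+func simulatedTurn(_ fragments: [String]) -> AsyncStream<String> {
+    AsyncStream<String> { continuation in
+        for fragment in fragments {
+            continuation.yield(fragment)
+        }
+        continuation.finish()
+    }
+}
+
+// Consumes the stream incrementally, like the `for await` loop above.
+func collectTurn(_ stream: AsyncStream<String>) async -> String {
+    var text = ""
+    for await fragment in stream {
+        // In an app, this is where you would update UI state for each chunk.
+        text += fragment
+    }
+    return text
+}
+```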
+
+### Troubleshooting
+
+If you receive errors like "missing package product" or "invalid checksum", try cleaning the build folder and resetting the Swift package cache:
+
+In Xcode, hold Option while opening the Product menu and choose Clean Build Folder Immediately. Then clear the caches from the command line:
+
+```bash
+rm -rf \
+  ~/Library/org.swift.swiftpm \
+  ~/Library/Caches/org.swift.swiftpm \
+  ~/Library/Caches/com.apple.dt.Xcode \
+  ~/Library/Developer/Xcode/DerivedData
+```