add everyting for docs

2023-07-29 07:00:13 -07:00 · 2023-07-29 07:00:13 -07:00 · 0fe8799f94
commit 0fe8799f94
parent de45a738ee
1015 changed files with 185353 additions and 0 deletions
--- a/docs/extras/integrations/document_loaders/embaas.ipynb
+++ b/docs/extras/integrations/document_loaders/embaas.ipynb
@ -0,0 +1,167 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# Embaas\n",
+    "[embaas](https://embaas.io) is a fully managed NLP API service that offers features like embedding generation, document text extraction, document to embeddings and more. You can choose a [variety of pre-trained models](https://embaas.io/docs/models/embeddings).\n",
+    "\n",
+    "### Prerequisites\n",
+    "Create a free embaas account at [https://embaas.io/register](https://embaas.io/register) and generate an [API key](https://embaas.io/dashboard/api-keys)\n",
+    "\n",
+    "### Document Text Extraction API\n",
+    "The document text extraction API allows you to extract the text from a given document. The API supports a variety of document formats, including PDF, mp3, mp4 and more. For a full list of supported formats, check out the API docs (link below)."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# Set API key\n",
+    "embaas_api_key = \"YOUR_API_KEY\"\n",
+    "# or set environment variable\n",
+    "os.environ[\"EMBAAS_API_KEY\"] = \"YOUR_API_KEY\""
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "#### Using a blob (bytes)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders.embaas import EmbaasBlobLoader\n",
+    "from langchain.document_loaders.blob_loaders import Blob"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "blob_loader = EmbaasBlobLoader()\n",
+    "blob = Blob.from_path(\"example.pdf\")\n",
+    "documents = blob_loader.load(blob)"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "# You can also directly create embeddings with your preferred embeddings model\n",
+    "blob_loader = EmbaasBlobLoader(params={\"model\": \"e5-large-v2\", \"should_embed\": True})\n",
+    "blob = Blob.from_path(\"example.pdf\")\n",
+    "documents = blob_loader.load(blob)\n",
+    "\n",
+    "print(documents[0][\"metadata\"][\"embedding\"])"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "start_time": "2023-06-12T22:19:48.366886Z",
+     "end_time": "2023-06-12T22:19:48.380467Z"
+    }
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "#### Using a file"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "from langchain.document_loaders.embaas import EmbaasLoader"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "outputs": [],
+   "source": [
+    "file_loader = EmbaasLoader(file_path=\"example.pdf\")\n",
+    "documents = file_loader.load()"
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "outputs": [],
+   "source": [
+    "# Disable automatic text splitting\n",
+    "file_loader = EmbaasLoader(file_path=\"example.mp3\", params={\"should_chunk\": False})\n",
+    "documents = file_loader.load()"
+   ],
+   "metadata": {
+    "collapsed": false,
+    "ExecuteTime": {
+     "start_time": "2023-06-12T22:24:31.880857Z",
+     "end_time": "2023-06-12T22:24:31.894665Z"
+    }
+   }
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "For more detailed information about the embaas document text extraction API, please refer to [the official embaas API documentation](https://embaas.io/api-reference)."
+   ],
+   "metadata": {
+    "collapsed": false
+   }
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}