trying to add docs

This commit is contained in:
ishaan-jaff 2023-07-29 07:06:56 -07:00
parent 0fe8799f94
commit 2cf949990e
834 changed files with 0 additions and 161273 deletions

View file

@ -1,255 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f08772b0",
"metadata": {},
"source": [
"# Alibaba Cloud MaxCompute\n",
"\n",
">[Alibaba Cloud MaxCompute](https://www.alibabacloud.com/product/maxcompute) (previously known as ODPS) is a general purpose, fully managed, multi-tenancy data processing platform for large-scale data warehousing. MaxCompute supports various data importing solutions and distributed computing models, enabling users to effectively query massive datasets, reduce production costs, and ensure data security.\n",
"\n",
"The `MaxComputeLoader` lets you execute a MaxCompute SQL query and loads the results as one document per row."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "067b7213",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Collecting pyodps\n",
" Downloading pyodps-0.11.4.post0-cp39-cp39-macosx_10_9_universal2.whl (2.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.0/2.0 MB\u001b[0m \u001b[31m1.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m0m\n",
"\u001b[?25hRequirement already satisfied: charset-normalizer>=2 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (3.1.0)\n",
"Requirement already satisfied: urllib3<2.0,>=1.26.0 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (1.26.15)\n",
"Requirement already satisfied: idna>=2.5 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (3.4)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /Users/newboy/anaconda3/envs/langchain/lib/python3.9/site-packages (from pyodps) (2023.5.7)\n",
"Installing collected packages: pyodps\n",
"Successfully installed pyodps-0.11.4.post0\n"
]
}
],
"source": [
"!pip install pyodps"
]
},
{
"cell_type": "markdown",
"id": "19641457",
"metadata": {},
"source": [
"## Basic Usage\n",
"To instantiate the loader you'll need a SQL query to execute, your MaxCompute endpoint and project name, and you access ID and secret access key. The access ID and secret access key can either be passed in direct via the `access_id` and `secret_access_key` parameters or they can be set as environment variables `MAX_COMPUTE_ACCESS_ID` and `MAX_COMPUTE_SECRET_ACCESS_KEY`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "71a0da4b",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain.document_loaders import MaxComputeLoader"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d4770c4a",
"metadata": {},
"outputs": [],
"source": [
"base_query = \"\"\"\n",
"SELECT *\n",
"FROM (\n",
" SELECT 1 AS id, 'content1' AS content, 'meta_info1' AS meta_info\n",
" UNION ALL\n",
" SELECT 2 AS id, 'content2' AS content, 'meta_info2' AS meta_info\n",
" UNION ALL\n",
" SELECT 3 AS id, 'content3' AS content, 'meta_info3' AS meta_info\n",
") mydata;\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1616c174",
"metadata": {},
"outputs": [],
"source": [
"endpoint = \"<ENDPOINT>\"\n",
"project = \"<PROJECT>\"\n",
"ACCESS_ID = \"<ACCESS ID>\"\n",
"SECRET_ACCESS_KEY = \"<SECRET ACCESS KEY>\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "e5c25041",
"metadata": {},
"outputs": [],
"source": [
"loader = MaxComputeLoader.from_params(\n",
" base_query,\n",
" endpoint,\n",
" project,\n",
" access_id=ACCESS_ID,\n",
" secret_access_key=SECRET_ACCESS_KEY,\n",
")\n",
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "311e74ea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='id: 1\\ncontent: content1\\nmeta_info: meta_info1', metadata={}), Document(page_content='id: 2\\ncontent: content2\\nmeta_info: meta_info2', metadata={}), Document(page_content='id: 3\\ncontent: content3\\nmeta_info: meta_info3', metadata={})]\n"
]
}
],
"source": [
"print(data)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "a4d8c388",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"id: 1\n",
"content: content1\n",
"meta_info: meta_info1\n"
]
}
],
"source": [
"print(data[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "f2422e6c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{}\n"
]
}
],
"source": [
"print(data[0].metadata)"
]
},
{
"cell_type": "markdown",
"id": "85e07e28",
"metadata": {},
"source": [
"## Specifying Which Columns are Content vs Metadata\n",
"You can configure which subset of columns should be loaded as the contents of the Document and which as the metadata using the `page_content_columns` and `metadata_columns` parameters."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "a7b9d726",
"metadata": {},
"outputs": [],
"source": [
"loader = MaxComputeLoader.from_params(\n",
" base_query,\n",
" endpoint,\n",
" project,\n",
" page_content_columns=[\"content\"], # Specify Document page content\n",
" metadata_columns=[\"id\", \"meta_info\"], # Specify Document metadata\n",
" access_id=ACCESS_ID,\n",
" secret_access_key=SECRET_ACCESS_KEY,\n",
")\n",
"data = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "532c19e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"content: content1\n"
]
}
],
"source": [
"print(data[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "5fe4990a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'id': 1, 'meta_info': 'meta_info1'}\n"
]
}
],
"source": [
"print(data[0].metadata)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}