diff --git a/docs/my-website/docs/tutorials/sagemaker_llms.md b/docs/my-website/docs/tutorials/sagemaker_llms.md
deleted file mode 100644
index 1fe9594ab5..0000000000
--- a/docs/my-website/docs/tutorials/sagemaker_llms.md
+++ /dev/null
@@ -1,72 +0,0 @@
-import Image from '@theme/IdealImage';
-
-# Deploy & Query Llama2-7B on Sagemaker
-
-This tutorial has 2 major components:
-1. Deploy Llama2-7B on Jumpstart
-2. Use LiteLLM to Query Llama2-7B on Sagemaker
-
-## Deploying Llama2-7B on AWS Sagemaker
-### Pre-requisites
-Ensure you have AWS quota for deploying your selected LLM. You can apply for a quota increase here: https://console.aws.amazon.com/servicequotas/home
-* ml.g5.48xlarge
-* ml.g5.2xlarge
-
-### Create an Amazon SageMaker domain to use Studio and Studio Notebooks
-
-- Head to AWS console https://aws.amazon.com/console/
-- Navigate to AWS Sagemaker from the console
-- On AWS Sagemaker select 'Create a Sagemaker Domain'
-
-
-### Deploying Llama2-7B using AWS Sagemaker Jumpstart
-
-- After creating your sagemaker domain, click 'Open Studio', which should take you to AWS sagemaker studio
-
-- On the left sidebar navigate to SageMaker Jumpstart -> Models, notebooks, solutions
-- Now select the LLM you want to deploy by clicking 'View Model' - (in this case select Llama2-7B)
-
-- Click `Deploy` for the Model you want to deploy
-
-
-- After deploying Llama2, copy your model endpoint
-
-
-### Use LiteLLM to Query Llama2-7B on Sagemaker
-
-#### Prerequisites
-* `pip install boto3`
-* `pip install litellm`
-* Create your AWS Access Key, get your `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. You can create a new aws access key on the aws console under `Security Credentials` under your profile
-
-#### Querying deployed Llama2-7b
-Set `model` = `sagemaker/` for `completion`. Use the model endpoint you got after deploying llama2-7b on sagemaker. If you used jumpstart your model endpoint will look like this `jumpstart-dft-meta-textgeneration-llama-2-7b`
-
-Code Example:
-```python
-from litellm import completion
-os.environ['AWS_ACCESS_KEY_ID'] = "your-access-key-id"
-os.environ['AWS_SECRET_ACCESS_KEY'] = "your-secret-key"
-
-response = completion(
- model="sagemaker/jumpstart-dft-meta-textgeneration-llama-2-7b",
- messages=[{'role': 'user', 'content': 'are you a llama'}],
- temperature=0.2, # optional params
- max_tokens=80,
- )
-
-```
-
-That's it! Happy completion()!
-
-#### Next Steps:
-- Add Caching: https://docs.litellm.ai/docs/caching/gpt_cache
-- Add Logging and Observability to your deployed LLM: https://docs.litellm.ai/docs/observability/callbacks
-
-
-
-
-
-
-
-
diff --git a/docs/my-website/sidebars.js b/docs/my-website/sidebars.js
index 3c3e1cbf97..ac9586d769 100644
--- a/docs/my-website/sidebars.js
+++ b/docs/my-website/sidebars.js
@@ -254,7 +254,6 @@ const sidebars = {
  "tutorials/huggingface_tutorial",
  "tutorials/TogetherAI_liteLLM",
  "tutorials/finetuned_chat_gpt",
- "tutorials/sagemaker_llms",
  "tutorials/text_completion",
  "tutorials/first_playground",
  "tutorials/model_fallbacks",