# LangChain + Llama Stack Document Processing

This repository contains two implementations of document processing built with LangChain and Llama Stack:

1. **`langchain_llamastack_updated.py`** - Interactive CLI version
2. **`langchain_llamastack_ray.py`** - Ray Serve API version

Both versions provide AI-powered document processing, including summarization, fact extraction, and question answering.

---
## 📋 Prerequisites

### System Requirements
- Python 3.12+
- Ray Serve (for the API version)
- Llama Stack server running on `http://localhost:8321/`
- Ollama or a compatible model server

### Required Python Packages
```bash
pip install llama-stack-client langchain langchain-core langchain-community
pip install beautifulsoup4 markdownify readability-lxml requests
pip install "ray[serve]" starlette  # For the Ray Serve version only
```
### Environment Setup
```bash
# Create and activate a virtual environment
python3.12 -m venv llama-env-py312
source llama-env-py312/bin/activate

# Install dependencies
pip install llama-stack-client langchain langchain-core langchain-community beautifulsoup4 markdownify readability-lxml requests "ray[serve]" starlette
```

---
## 🚀 Quick Start

### Start Llama Stack Server
Before running either version, ensure your Llama Stack server is running:

```bash
# Start Llama Stack server (example)
llama stack run your-config --port 8321
```

---
## 📖 Option 1: Interactive CLI Version (`langchain_llamastack_updated.py`)

### Features
- ✅ Interactive command-line interface
- ✅ Document loading from URLs and PDFs
- ✅ AI-powered summarization and fact extraction
- ✅ Question-answering based on document content
- ✅ Session-based document storage
### How to Run
```bash
# Activate environment
source llama-env-py312/bin/activate

# Run the interactive CLI
cd /home/omara/langchain_llamastack
python langchain_llamastack_updated.py
```
### Usage Commands
Once running, you can use these interactive commands:

```
🎯 Interactive Document Processing Demo
Commands:
  load <url_or_path>  - Process a document
  ask <question>      - Ask about the document
  summary             - Show document summary
  facts               - Show extracted facts
  help                - Show commands
  quit                - Exit demo
```
### Example Session
```
> load https://en.wikipedia.org/wiki/Artificial_intelligence
📄 Loading document from: https://en.wikipedia.org/wiki/Artificial_intelligence
✅ Loaded 45,832 characters
📝 Generating summary...
🔍 Extracting key facts...
✅ Processing complete!

> summary
📝 Summary:
Artificial intelligence (AI) is the simulation of human intelligence...

> ask What are the main types of AI?
💬 Q: What are the main types of AI?
📝 A: Based on the document, the main types of AI include...

> facts
🔍 Key Facts:
- AI was founded as an academic discipline in 1956
- Machine learning is a subset of AI...

> quit
👋 Thanks for exploring LangChain chains!
```

---
## 🌐 Option 2: Ray Serve API Version (`langchain_llamastack_ray.py`)

### Features
- ✅ RESTful HTTP API
- ✅ Persistent service (runs indefinitely)
- ✅ Multiple endpoints for different operations
- ✅ JSON request/response format
- ✅ Concurrent request handling
### How to Run
```bash
# Activate environment
source llama-env-py312/bin/activate

# Start the Ray Serve API
cd /home/omara/langchain_llamastack
python langchain_llamastack_ray.py
```
### Service Endpoints

| Method | Endpoint | Description | Parameters |
|--------|----------|-------------|------------|
| GET | `/` | Service status | None |
| POST | `/process` | Process document | `{"source": "url_or_path"}` |
| POST | `/ask` | Ask question | `{"question": "text", "source": "optional"}` |
| GET | `/summary` | Get summary | `?source=url` (optional) |
| GET | `/facts` | Get facts | `?source=url` (optional) |
| GET | `/docs` | List documents | None |
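
For orientation, the sketch below shows how a Ray Serve deployment can route these paths using plain Starlette requests. It is a minimal, hypothetical skeleton: the `DocService` name and handler bodies are placeholders, not the actual code in `langchain_llamastack_ray.py`.

```python
import time

from ray import serve
from starlette.requests import Request
from starlette.responses import JSONResponse

@serve.deployment
class DocService:
    def __init__(self):
        self.docs = {}  # source -> processed results (summary, facts, ...)

    async def __call__(self, request: Request) -> JSONResponse:
        path, method = request.url.path, request.method
        if path == "/" and method == "GET":
            return JSONResponse({"status": "running"})
        if path == "/process" and method == "POST":
            body = await request.json()
            source = body["source"]
            # Real processing (load + summarize + extract facts) goes here.
            self.docs[source] = {"summary": "...", "facts": "..."}
            return JSONResponse({"processed": source})
        if path == "/docs" and method == "GET":
            return JSONResponse({"documents": list(self.docs)})
        return JSONResponse({"error": "unknown route"}, status_code=404)

serve.run(DocService.bind(), route_prefix="/")
while True:  # keep the driver process alive so the service persists
    time.sleep(10)
```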
### API Usage Examples

#### Using curl:
```bash
# Check service status
curl http://localhost:8000/

# Process a document
curl -X POST http://localhost:8000/process \
  -H 'Content-Type: application/json' \
  -d '{"source": "https://en.wikipedia.org/wiki/Machine_learning"}'

# Ask a question
curl -X POST http://localhost:8000/ask \
  -H 'Content-Type: application/json' \
  -d '{"question": "What is machine learning?"}'

# Get summary
curl http://localhost:8000/summary

# Get facts
curl http://localhost:8000/facts

# List all processed documents
curl http://localhost:8000/docs
```
#### Using Python requests:
```python
import requests

# Process a document
response = requests.post(
    "http://localhost:8000/process",
    json={"source": "https://en.wikipedia.org/wiki/Deep_learning"},
)
print(response.json())

# Ask a question
response = requests.post(
    "http://localhost:8000/ask",
    json={"question": "What are neural networks?"},
)
print(response.json())

# Get facts
response = requests.get("http://localhost:8000/facts")
print(response.json())
```

---
## 🔧 Configuration

### Model Configuration
Both versions use these defaults:
- **Model ID**: `llama3.2:3b`
- **Llama Stack URL**: `http://localhost:8321/`

To change the model, edit the `model_id` parameter in the respective files.
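
As a minimal sketch of where these two settings come together, assuming the `llama-stack-client` inference API (the exact wiring inside the scripts may differ):

```python
from llama_stack_client import LlamaStackClient

# The two defaults from above: server URL and model ID.
client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="llama3.2:3b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```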
### Supported Document Types
- ✅ **URLs**: Any web page (main content extracted using readability)
- ✅ **PDF files**: Local or remote PDF documents
- ❌ Plain text files (support can be added if needed)
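
For reference, here is a minimal sketch of URL extraction with the `readability-lxml` and `markdownify` packages listed in the prerequisites; `load_url` is an illustrative helper, not the loader used in the scripts:

```python
import requests
from markdownify import markdownify as md
from readability import Document  # provided by readability-lxml

def load_url(url: str) -> str:
    """Fetch a page and return its main content as Markdown-ish text."""
    html = requests.get(url, timeout=30).text
    main_html = Document(html).summary()  # readability's main-content HTML
    return md(main_html)

text = load_url("https://en.wikipedia.org/wiki/Artificial_intelligence")
print(f"Loaded {len(text):,} characters")
```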
---
## 🛠️ Troubleshooting

### Common Issues

#### 1. Connection Refused to Llama Stack
**Error**: `Connection refused to http://localhost:8321/`

**Solution**:
- Ensure the Llama Stack server is running
- Check if port 8321 is correct
- Verify network connectivity
#### 2. Model Not Found
**Error**: `Model not found: llama3.2:3b`

**Solution**:
- Check available models: `curl http://localhost:8321/models/list`
- Update `model_id` in the code to match an available model
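
The same check from Python, assuming the `llama-stack-client` package (model objects expose an `identifier` field):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
for model in client.models.list():
    print(model.identifier)  # use one of these as model_id
```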
#### 3. Ray Serve Port Already in Use
**Error**: `Port 8000 already in use`

**Solution**:
```bash
# Kill the process using port 8000
lsof -ti :8000 | xargs kill -9

# Or use a different port by modifying the code
```
#### 4. Missing Dependencies
**Error**: `ModuleNotFoundError: No module named 'ray'`

**Solution**:
```bash
pip install "ray[serve]" starlette
```
### Debug Mode
To enable verbose logging, add this to the beginning of either file:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

---
## 📊 Performance Notes

### CLI Version
- **Pros**: Simple to use, interactive, good for testing
- **Cons**: Single-threaded, session-based only
- **Best for**: Development, testing, manual document analysis

### Ray Serve Version
- **Pros**: Concurrent requests, persistent service, API integration
- **Cons**: More complex setup, requires Ray
- **Best for**: Production, integration with other services, high throughput

---
## 🛑 Stopping Services

### CLI Version
- Press `Ctrl+C` or type `quit` at the interactive prompt

### Ray Serve Version
- Press `Ctrl+C` in the terminal running the service
- The service shuts down gracefully and cleans up resources

---
## 📝 Examples

### CLI Workflow
1. Start: `python langchain_llamastack_updated.py`
2. Load a document: `load https://arxiv.org/pdf/2103.00020.pdf`
3. Get a summary: `summary`
4. Ask questions: `ask What are the main contributions?`
5. Exit: `quit`

### API Workflow
1. Start: `python langchain_llamastack_ray.py`
2. Process: `curl -X POST http://localhost:8000/process -H 'Content-Type: application/json' -d '{"source": "https://example.com"}'`
3. Query: `curl -X POST http://localhost:8000/ask -H 'Content-Type: application/json' -d '{"question": "What is this about?"}'`
4. Stop: `Ctrl+C`

---
## 🤝 Contributing

To extend functionality:
1. Add new prompt templates for different analysis types (see the sketch below)
2. Support additional document formats
3. Add caching for processed documents
4. Implement user authentication for the API version
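
As an example of item 1, a new analysis type can start as a LangChain prompt template; the "critique" template below is hypothetical and would still need to be wired into the existing chains:

```python
from langchain_core.prompts import PromptTemplate

# Hypothetical "critique" analysis type, alongside summary and facts.
critique_prompt = PromptTemplate.from_template(
    "You are a careful reviewer. List the main weaknesses, gaps, and open "
    "questions in the following document:\n\n{document}\n\nCritique:"
)

print(critique_prompt.format(document="<document text here>"))
```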
---
## 📜 License

This project is for educational and research purposes.