Installation¶
Requirements¶
- Python 3.10 or higher
- pip or uv package manager
Installation Options¶
Basic Installation¶
The basic installation includes DuckDB storage and HTTP-only fetching:
Or with uv:
Browser Rendering Support¶
For JavaScript-heavy sites, install with browser support:
This installs Playwright for headless browser rendering.
After installation, set up Playwright:
DynamoDB Support¶
For cloud deployments with AWS DynamoDB:
Full Installation¶
Install all optional dependencies:
Development Installation¶
For contributing to the project:
Verifying Installation¶
Test your installation:
Bash
# Check CLI is available
ragcrawl --version
# Run a simple crawl
ragcrawl crawl https://example.com --max-pages 5 --output ./test-output
Dependencies¶
Core Dependencies¶
| Package | Purpose |
|---|---|
| crawl4ai | Web fetching and HTML-to-Markdown conversion |
| duckdb | Default local storage backend |
| httpx | Async HTTP client |
| pydantic | Data validation and configuration |
| structlog | Structured logging |
| xxhash | Fast content hashing |
| tiktoken | Token counting for chunking |
| click | Command-line interface |
Optional Dependencies¶
| Package | Purpose | Extra |
|---|---|---|
| playwright | Browser rendering | [browser] |
| pynamodb | DynamoDB ORM | [dynamodb] |
Troubleshooting¶
Playwright Installation Issues¶
If you encounter issues with Playwright:
Bash
# Install system dependencies (Ubuntu/Debian)
sudo apt-get install libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2
# Install browsers
playwright install chromium
DuckDB Permission Issues¶
Ensure the storage directory is writable:
Bash
# Create with proper permissions
mkdir -p ./data
chmod 755 ./data
ragcrawl crawl https://example.com --storage ./data/crawler.duckdb
AWS Credentials for DynamoDB¶
Set up AWS credentials for DynamoDB:
Bash
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_DEFAULT_REGION=us-east-1
Or use AWS profiles: