Contributing to ragcrawl¶
Thanks for taking the time to contribute! 🎉
Code of Conduct¶
By participating in this project, you agree to follow the rules in our Code of Conduct.
How to Contribute¶
Report Bugs¶
Open an Issue with:
- Reproduction steps
- Logs (redact any secrets)
- Environment details (OS, Python version, ragcrawl version)
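To collect the environment details listed above, something like the following could be pasted into a Python session. This is a minimal sketch using only the standard library; the function name environment_report is hypothetical and not part of ragcrawl, and the package name is assumed to match the installed distribution name.

```python
import platform
import sys
from importlib import metadata


def environment_report(package: str = "ragcrawl") -> str:
    """Build a short, paste-ready summary of OS, Python, and package versions."""
    try:
        # Reads the version from the installed distribution's metadata.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return "\n".join(
        [
            f"OS: {platform.platform()}",
            f"Python: {sys.version.split()[0]}",
            f"{package}: {version}",
        ]
    )


if __name__ == "__main__":
    print(environment_report())
```

Review the output before posting it, since platform strings can include machine-specific details.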
Request Features¶
Open an Issue describing:
- The use case and desired behavior
- Any constraints (scale, compliance, auth, JS rendering, etc.)
Submit Pull Requests¶
- Fork the repository
- Create a feature branch
- Make your changes
- Add/update tests
- Update documentation
- Open a Pull Request
Development Setup¶
Prerequisites¶
- Python 3.10+ (3.11 or 3.12 recommended)
- uv recommended (but pip works too)
Setup with uv (recommended)¶
git clone https://github.com/vamshirapolu/ragcrawl.git
cd ragcrawl
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"
Setup with pip¶
git clone https://github.com/vamshirapolu/ragcrawl.git
cd ragcrawl
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -U pip
pip install -e ".[dev]"
Running the Project¶
# Show CLI help
ragcrawl --help
# Run a simple crawl
ragcrawl crawl https://example.com --max-pages 10
Running Tests¶
# Run all tests
pytest
# Run with coverage
pytest --cov=ragcrawl --cov-report=html
# Run specific test file
pytest tests/unit/test_url_normalizer.py
# Run with verbose output
pytest -v
Linting and Formatting¶
We use Ruff for linting (ruff check) and formatting (ruff format).
Type Checking¶
We use mypy for type checking.
Documentation¶
We use MkDocs with the Material theme:
# Serve docs locally
mkdocs serve
# Build docs
mkdocs build
# Deploy docs (maintainers only)
mkdocs gh-deploy
Pull Request Guidelines¶
- Keep PRs focused - One feature/fix per PR when possible
- Add/adjust tests - When behavior changes
- Update docs - If you add/modify public-facing behavior
- Backwards compatibility - Keep it for public APIs, or clearly call out breaking changes
- Write clear commit messages - See guidance below
Commit Message Guidance¶
We suggest following the Conventional Commits format:
- feat: ... - New feature
- fix: ... - Bug fix
- docs: ... - Documentation only
- test: ... - Adding or updating tests
- refactor: ... - Code refactoring
- chore: ... - Build/tooling changes
Examples:
feat: add support for custom user agents
fix: handle 404 errors in sync job
docs: update installation instructions
test: add tests for URL normalization
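If you want to check a commit subject locally before pushing, the prefixes above can be matched with a small regular expression. This is an illustrative sketch only: is_conventional is a hypothetical helper, the project is not known to enforce this check, and optional scopes such as feat(cli): are deliberately left out because the list above does not mention them.

```python
import re

# Prefixes copied from the list above; the checker is a hypothetical helper,
# not something ragcrawl ships.
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")

# "<type>: <non-empty description>"
SUBJECT_RE = re.compile(rf"^({'|'.join(ALLOWED_TYPES)}): \S.*$")


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject starts with an allowed type prefix."""
    return SUBJECT_RE.match(subject) is not None
```

For example, is_conventional("fix: handle 404 errors in sync job") is accepted, while a subject without a prefix, or with a prefix but no description, is rejected.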
Code Style¶
- Follow PEP 8 (enforced by Ruff)
- Use type hints for all functions
- Write docstrings in Google style
- Keep functions focused and testable
- Prefer composition over inheritance
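As a rough illustration of the points above, here is what a small, focused function with type hints and a Google-style docstring might look like. The function strip_fragment is a made-up example and is not taken from the ragcrawl codebase.

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Remove the fragment part of a URL.

    Args:
        url: An absolute or relative URL, possibly ending in ``#fragment``.

    Returns:
        The URL without its fragment. URLs that have no fragment are
        returned unchanged.
    """
    # urldefrag splits the URL into (url, fragment); only the first is kept.
    return urldefrag(url).url
```

Because the function does one thing and has no side effects, it can be tested with plain input/output assertions.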
Testing Guidelines¶
- Write unit tests for new functionality
- Use pytest fixtures for common setup
- Mock external dependencies (HTTP, file system, etc.)
- Aim for high coverage, but focus on critical paths
- Add integration tests for end-to-end scenarios
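The mocking guideline above might look like this in practice. It is a sketch under assumptions: fetch_status and its client argument are hypothetical and do not reflect ragcrawl's real API, and the test uses unittest.mock from the standard library so that it never touches the network.

```python
from unittest.mock import Mock


def fetch_status(client, url: str) -> int:
    """Hypothetical helper: return the HTTP status code for ``url``."""
    response = client.get(url)
    return response.status_code


def test_fetch_status_uses_injected_client() -> None:
    # Stand-in for a real HTTP client; no request is actually sent.
    client = Mock()
    client.get.return_value = Mock(status_code=404)

    assert fetch_status(client, "https://example.com/missing") == 404
    # Verify the helper called the client exactly once with the given URL.
    client.get.assert_called_once_with("https://example.com/missing")
```

Passing the client in as an argument keeps the function easy to test. If several tests need the same mock client, its setup could be moved into a shared pytest fixture.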
Security¶
Please do not open public issues for security vulnerabilities. See Support for the preferred reporting path.
Questions?¶
If you have questions about contributing, feel free to:
- Open a GitHub Discussion (if enabled)
- Ask in an Issue
- Check our Support page
Thank you for contributing to ragcrawl! 🚀