A headless-Chrome web crawler that discovers same-host links and optionally saves HTML, Markdown, PDF, or screenshots. Use as a library or via the stealth-crawler CLI.
- Asynchronous, headless Chrome browsing via pydoll
- Discovers internal links starting from a root URL
- Optional content saving:
  - HTML
  - Markdown (via html2text)
  - PDF snapshots
  - PNG screenshots
- Rich progress bars with rich
- Configurable URL filtering (base, exclude)
- Pure-Python API and CLI
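
The link-discovery idea in the feature list can be illustrated with a small breadth-first traversal. This is only a sketch under stated assumptions, not the crawler's actual implementation: `FAKE_SITE`, `fetch_links`, and `discover` are hypothetical names, and the in-memory dictionary stands in for pages that the real tool would load in headless Chrome.

```python
import asyncio
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site": page URL -> hrefs found on that page.
FAKE_SITE = {
    "https://example.com/": ["/about", "/blog", "https://other.example.org/"],
    "https://example.com/about": ["/", "/team"],
    "https://example.com/blog": ["/blog/post-1", "/about"],
    "https://example.com/team": [],
    "https://example.com/blog/post-1": ["/"],
}


async def fetch_links(url: str) -> list[str]:
    """Stand-in for a browser page load; returns the raw hrefs for a URL."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return FAKE_SITE.get(url, [])


async def discover(root: str) -> list[str]:
    """Breadth-first discovery of same-host URLs reachable from root."""
    host = urlparse(root).netloc
    seen = {root}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for href in await fetch_links(url):
            absolute = urljoin(url, href)  # resolve relative links
            if urlparse(absolute).netloc != host:
                continue  # skip links that leave the starting host
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return sorted(seen)


if __name__ == "__main__":
    for found in asyncio.run(discover("https://example.com/")):
        print(found)
```

The `seen` set keeps each page from being visited twice, which is what lets the traversal terminate on sites whose pages link back to each other.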
Install the latest stable release:

    pip install stealth-crawler

Or in isolation:

    pipx install stealth-crawler

Or via other tools:

- uv

      uv venv .venv
      source .venv/bin/activate
      uv pip install stealth-crawler

- Poetry

      poetry add stealth-crawler

    # Discover URLs only
    stealth-crawler crawl https://example.com --urls-only

    # Crawl and save HTML + Markdown
    stealth-crawler crawl https://example.com \
      --save-html --save-md \
      --output-dir ./output

    # Exclude specific paths
    stealth-crawler crawl https://example.com \
      --exclude /private,/logout

Run stealth-crawler --help for full options.

    import asyncio

    from stealthcrawler import StealthCrawler

    crawler = StealthCrawler(
        base="https://example.com",
        exclude=["/admin"],
        save_html=True,
        save_md=True,
        output_dir="export",
    )
    urls = asyncio.run(crawler.crawl("https://example.com"))
    print(urls)

| Option | CLI flag | API param | Default |
|---|---|---|---|
| Base URL(s) | --base | base | start URL |
| Exclude paths | --exclude | exclude | none |
| Save HTML | --save-html | save_html | False |
| Save Markdown | --save-md | save_md | False |
| URLs only | --urls-only | urls_only | False |
| Output folder | --output-dir | output_dir | ./output |
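
To make the base and exclude options more concrete, here is a minimal sketch of how such a filter could behave. It is an illustration only: `parse_exclude` and `is_allowed` are hypothetical helpers, and the matching rules (same host plus path-prefix comparison) are an assumption, since the table does not specify how the crawler actually matches URLs. The one detail taken from the table is that base falls back to the start URL when it is not given.

```python
from urllib.parse import urlparse


def parse_exclude(value: str) -> list[str]:
    """Split a comma-separated value such as "/private,/logout" into paths."""
    return [part.strip() for part in value.split(",") if part.strip()]


def _is_under(url: str, base: str) -> bool:
    """True if url is on the same host as base and below base's path."""
    u, b = urlparse(url), urlparse(base)
    return u.netloc == b.netloc and u.path.startswith(b.path)


def is_allowed(url, start_url, base=None, exclude=None):
    """Assumed filter semantics; not taken from the library's source."""
    if base is None:
        bases = [start_url]  # default from the options table: start URL
    elif isinstance(base, str):
        bases = [base]
    else:
        bases = list(base)
    if not any(_is_under(url, b) for b in bases):
        return False
    path = urlparse(url).path
    # Assumption: exclude entries are treated as path prefixes.
    return not any(path.startswith(prefix) for prefix in (exclude or []))


if __name__ == "__main__":
    excluded = parse_exclude("/private,/logout")
    start = "https://example.com"
    for candidate in (
        "https://example.com/docs",
        "https://example.com/private/report",
        "https://other.example.org/",
    ):
        print(candidate, is_allowed(candidate, start, exclude=excluded))
```

Comparing the parsed host rather than using a plain string prefix avoids accepting look-alike hosts such as example.com.evil.org.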

- Run tests:

      pytest

- Check formatting & linting:

      black src tests
      ruff check src tests

- Fork the repository and create a feature branch.
- Set up your development environment:

      python3 -m venv .venv
      source .venv/bin/activate
      pip install -e ".[dev]"

  Or with uv:

      uv venv .venv
      source .venv/bin/activate
      uv pip install -e ".[dev]"

- Implement your changes, add tests, and run:

      black src tests
      ruff check src tests
      pytest

- Open a pull request against main.

This project is licensed under the GNU General Public License v3.0 or later (GPL-3.0-or-later). You are free to use, modify, and redistribute under the terms of the GPL. See LICENSE for full details.