Automated B2B Lead Generation Tool
Find qualified leads, extract emails, scrape company data, and push to your CRM
Features • Installation • Quick Start • CLI Commands • CRM Integration • Contributing
- Google Search Integration - Find companies by industry, location, and keywords
- LinkedIn Scraping - Extract company profiles and information
- Website Scraping - Deep scrape websites for contact information
- Pattern-Based Discovery - Generate potential emails from name + domain
- Confidence Scoring - Prioritize high-confidence email addresses
- Multi-Level Validation - Syntax, DNS MX, and optional SMTP verification
- Comprehensive Profiles - Name, industry, size, location, social links
- Technology Detection - Identify tech stack from website analysis
- Structured Data Extraction - JSON-LD and meta tag parsing
- CSV Export - Excel-compatible with customizable columns
- Excel Export - Formatted spreadsheets with multiple sheets
- JSON Export - API-ready format with NDJSON support
- Cold Outreach - Personalized first-touch templates
- Follow-ups - Multi-stage follow-up sequences
- Variable Substitution - Dynamic personalization with {{variables}}
- Salesforce - Create Leads, Contacts, and Accounts
- HubSpot - Create Contacts and Companies with associations
- Extensible - Easy to add new CRM adapters
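The Pattern-Based Discovery and Confidence Scoring features listed above amount to filling name placeholders into address templates and ranking the guesses. The sketch below is a hypothetical standalone illustration, not the project's EmailExtractor. The names generate_patterns and EmailCandidate are assumptions. The templates reuse the {first}/{last}/{f} placeholders from the common_patterns setting in the configuration section. The confidence weights are made-up placeholders, not measured values.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical templates and weights, for illustration only; a real tool
# would tune the weights from observed data.
PATTERNS = [
    ("{first}.{last}", 0.40),
    ("{first}{last}", 0.20),
    ("{f}{last}", 0.15),
    ("{first}", 0.10),
    ("{first}_{last}", 0.10),
    ("{last}", 0.05),
]


@dataclass
class EmailCandidate:
    email: str
    confidence: float


def generate_patterns(first: str, last: str, domain: str) -> List[EmailCandidate]:
    """Build candidate addresses for first/last at domain, best guess first."""
    first, last, domain = first.strip().lower(), last.strip().lower(), domain.strip().lower()
    values = {"first": first, "last": last, "f": first[:1], "l": last[:1]}
    seen = set()
    candidates = []
    for template, confidence in PATTERNS:
        local_part = template.format(**values)
        email = f"{local_part}@{domain}"
        # Skip empty local parts and duplicates (e.g. when first == last).
        if local_part and email not in seen:
            seen.add(email)
            candidates.append(EmailCandidate(email, confidence))
    # sorted() is stable, so equal weights keep their PATTERNS order.
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


for candidate in generate_patterns("John", "Doe", "acme.com"):
    print(f"{candidate.email} (confidence: {candidate.confidence:.0%})")
```

Sorting by confidence puts the most common corporate format first. A DNS or SMTP check can then stop at the first address that verifies.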
- Python 3.9 or higher
- pip package manager
# Clone the repository
git clone https://github.com/yourusername/sales-lead-scraper-tool.git
cd sales-lead-scraper-tool
# Create virtual environment
python -m venv venv
# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Install as editable package (optional)
pip install -e .

- Copy the example configuration:
cp .env.example .env
cp config/config.example.yaml config/config.yaml
- Edit .env with your CRM credentials:
# Salesforce
SALESFORCE_USERNAME=your_username
SALESFORCE_PASSWORD=your_password
SALESFORCE_SECURITY_TOKEN=your_token
# HubSpot
HUBSPOT_ACCESS_TOKEN=your_token

# Search for software companies
python -m cli.main search "software companies"
# Search with filters
python -m cli.main search "marketing agencies" -l "New York" -n 50
# Export results to Excel
python -m cli.main search "saas startups" -f excel -o leads.xlsx

# Basic extraction
python -m cli.main extract https://example.com
# Deep scrape with email validation
python -m cli.main extract https://example.com --deep --validate

# Generate potential emails from name
python -m cli.main generate John Doe acme.com
# With DNS validation
python -m cli.main generate John Doe acme.com --validate

# Push leads to Salesforce
python -m cli.main push leads.csv --crm salesforce --type lead
# Dry run to preview
python -m cli.main push leads.csv --crm hubspot --dry-run

| Command | Description | Example |
|---|---|---|
| `search` | Search for leads by keyword | `search "tech startups" -l "SF"` |
| `extract` | Extract data from website | `extract example.com --deep` |
| `generate` | Generate email patterns | `generate John Doe acme.com` |
| `export` | Convert between formats | `export leads.json -f csv` |
| `push` | Push leads to CRM | `push leads.csv --crm hubspot` |
| `template` | Generate email from template | `template cold_outreach -v name=John` |
| `validate-config` | Check configuration status | `validate-config` |
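The `template` command and the Variable Substitution feature fill {{variables}} in a stored template. Below is a minimal sketch of that substitution step using a regular expression. The name render is hypothetical. The project's TemplateEngine may instead use a full template library such as Jinja2, so treat this as an illustration of the idea, not the project's implementation.

```python
import re
from typing import Mapping

# Matches {{name}} or {{ name }}; the captured group is the variable name.
_VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: Mapping[str, object]) -> str:
    """Replace every {{name}} with variables[name], failing loudly on gaps."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"missing template variable: {name}")
        return str(variables[name])

    return _VARIABLE.sub(substitute, template)


body = render(
    "Hi {{first_name}},\n\nI'm {{ sender_name }} from {{sender_company}}.",
    {"first_name": "John", "sender_name": "Jane Smith", "sender_company": "Our Company"},
)
print(body)
```

Raising on a missing key, instead of leaving the placeholder in place, keeps a half-personalized "Hi {{first_name}}" from reaching a prospect.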
python -m cli.main --help # Show all commands
python -m cli.main search --help # Show command options
python -m cli.main --version  # Show version

from src.scrapers import GoogleScraper

with GoogleScraper() as scraper:
    results = scraper.search_companies(
        industry="technology",
        location="San Francisco",
        max_results=50
    )

for company in results:
    print(f"{company['title']} - {company['url']}")

from src.scrapers import WebsiteScraper
from src.extractors import EmailExtractor
from src.validators import EmailValidator, ValidationLevel
# Scrape website
with WebsiteScraper() as scraper:
    data = scraper.scrape_website("https://example.com", deep_scrape=True)
# Extract and validate emails
extractor = EmailExtractor()
validator = EmailValidator()
for email in data['emails']:
    result = validator.validate(email, ValidationLevel.DNS)
    if result.is_valid:
        print(f"✓ {email}")

from src.extractors import EmailExtractor
extractor = EmailExtractor()
patterns = extractor.generate_patterns("John", "Doe", "acme.com")
for p in patterns:
    print(f"{p.email} (confidence: {p.confidence:.0%})")

from src.exporters import CSVExporter, ExcelExporter
# Export to CSV (`leads` is the list of lead records collected earlier; it is not defined in this snippet)
csv_exporter = CSVExporter()
filepath = csv_exporter.export(leads, "my_leads.csv")
# Export to Excel with formatting
excel_exporter = ExcelExporter()
filepath = excel_exporter.export_with_summary(leads)

from src.integrations import SalesforceIntegration, HubSpotIntegration
from src.integrations.base_crm import CRMRecord, RecordType
# Create record
record = CRMRecord(
    record_type=RecordType.LEAD,
    first_name="John",
    last_name="Doe",
    email="john.doe@acme.com",
    company_name="Acme Inc"
)
# Push to Salesforce
with SalesforceIntegration() as sf:
    result = sf.create_lead(record)
    print(f"Created lead: {result.record_id}")
# Push to HubSpot
with HubSpotIntegration() as hs:
    result = hs.create_contact(record)
    print(f"Created contact: {result.record_id}")

from src.templates import TemplateEngine
engine = TemplateEngine()
# List available templates
print(engine.list_templates())
# Render template
email = engine.render_template("cold_outreach", {
    "first_name": "John",
    "company_name": "Acme Inc",
    "sender_name": "Jane Smith",
    "sender_company": "Our Company"
})
print(email)

- Get your Security Token from Salesforce Settings
- Add credentials to .env:
SALESFORCE_USERNAME=your_username
SALESFORCE_PASSWORD=your_password
SALESFORCE_SECURITY_TOKEN=your_token
SALESFORCE_DOMAIN=login  # Use "test" for sandbox

- Create a Private App in HubSpot Settings
- Grant scopes: crm.objects.contacts.write, crm.objects.companies.write
- Add access token to .env:
HUBSPOT_ACCESS_TOKEN=pat-na1-xxxxxxxx

sales-lead-scraper-tool/
├── cli/                    # Command-line interface
│   └── main.py             # CLI entry point
├── config/                 # Configuration
│   ├── settings.py         # Settings management
│   └── config.example.yaml
├── src/
│   ├── scrapers/           # Web scrapers
│   │   ├── google_scraper.py
│   │   ├── linkedin_scraper.py
│   │   └── website_scraper.py
│   ├── extractors/         # Data extractors
│   │   ├── email_extractor.py
│   │   └── company_extractor.py
│   ├── validators/         # Data validators
│   │   └── email_validator.py
│   ├── exporters/          # Export formats
│   │   ├── csv_exporter.py
│   │   ├── excel_exporter.py
│   │   └── json_exporter.py
│   ├── templates/          # Email templates
│   │   ├── template_engine.py
│   │   └── email_templates/
│   ├── integrations/       # CRM integrations
│   │   ├── salesforce.py
│   │   └── hubspot.py
│   └── utils/              # Utilities
│       ├── logger.py
│       ├── rate_limiter.py
│       └── proxy_manager.py
├── tests/                  # Test suite
├── output/                 # Default export directory
├── requirements.txt
└── README.md
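Request pacing is handled in utils/rate_limiter.py. The sketch below gives a rough idea of what such a module does: it enforces a minimum gap between requests, with min_interval playing the role of the rate_limit_delay setting in the configuration. The class name RateLimiter and its wait() method are hypothetical, not the module's actual API.

```python
import threading
import time
from typing import Optional


class RateLimiter:
    """Enforce a minimum interval (seconds) between successive wait() calls.

    Hypothetical sketch; min_interval corresponds to rate_limit_delay.
    """

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last_call: Optional[float] = None  # monotonic time of previous call

    def wait(self) -> float:
        """Sleep until min_interval has passed since the last call; return time slept."""
        with self._lock:
            delay = 0.0
            if self._last_call is not None:
                elapsed = time.monotonic() - self._last_call
                delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                time.sleep(delay)
            self._last_call = time.monotonic()
            return delay
```

A scraper would call limiter.wait() immediately before each HTTP request. The lock is deliberately held while sleeping, so concurrent threads queue up one interval apart instead of all firing as soon as the gap expires.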
# Run all tests
python -m pytest tests/ -v
# Run with coverage
python -m pytest tests/ -v --cov=src
# Run specific test file
python -m pytest tests/test_validators.py -v

scraper:
  rate_limit_delay: 2.0       # Seconds between requests
  max_retries: 3              # Retry failed requests
  timeout: 30                 # Request timeout
  use_proxy: false            # Enable proxy rotation
  respect_robots_txt: true    # Honor robots.txt

email:
  validate_dns: true          # Check MX records
  validate_smtp: false        # SMTP verification (slow)
  common_patterns:            # Email pattern templates
    - "{first}.{last}"
    - "{first}{last}"
    - "{f}{last}"

export:
  output_directory: "./output"
  csv_delimiter: ","
  include_timestamp: true
  default_format: "csv"

Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is intended for legitimate B2B lead generation purposes. Always:
- Respect website terms of service and robots.txt
- Comply with data protection regulations (GDPR, CCPA)
- Use rate limiting to avoid overloading servers
- Only contact businesses that may genuinely benefit from your services
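Python's standard urllib.robotparser module can handle both robots.txt compliance and request pacing. The sketch below assumes the robots.txt text has already been downloaded. In live use, RobotFileParser.set_url() followed by read() fetches it. The helper names and the LeadBot user-agent string are hypothetical, and this is not necessarily how the project's respect_robots_txt option is implemented.

```python
from typing import Optional
from urllib import robotparser
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Location of the robots.txt file that governs page_url."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def load_rules(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_delay(parser: robotparser.RobotFileParser, user_agent: str,
                 configured_delay: float) -> float:
    """Use the larger of our configured delay and the site's Crawl-delay."""
    crawl_delay: Optional[float] = parser.crawl_delay(user_agent)
    return max(configured_delay, float(crawl_delay or 0))


rules = load_rules("User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n")
print(robots_url("https://example.com/about/team"))
print(rules.can_fetch("LeadBot", "https://example.com/private/staff"))
print(polite_delay(rules, "LeadBot", configured_delay=2.0))
```

Taking the maximum means a site can slow the scraper down below the configured rate_limit_delay but never speed it up.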
- 📫 Open an issue for bug reports or feature requests
- 💬 Start a discussion for questions or ideas
Built with ❤️ for Sales & Marketing Teams