
Automated B2B lead generation tool built with Python. Find qualified leads, extract emails, scrape company data, and export to CSV. Includes email templates and CRM integrations (Salesforce, HubSpot). Perfect for sales teams and marketers.


codiebyheaart/sales-lead-scraper-tool


🚀 B2B Lead Scraper


Automated B2B Lead Generation Tool

Find qualified leads, extract emails, scrape company data, and push to your CRM

Features • Installation • Quick Start • CLI Commands • CRM Integration • Contributing


✨ Features

🔍 Lead Discovery

  • Google Search Integration - Find companies by industry, location, and keywords
  • LinkedIn Scraping - Extract company profiles and information
  • Website Scraping - Deep scrape websites for contact information

📧 Email Extraction

  • Pattern-Based Discovery - Generate potential emails from name + domain
  • Confidence Scoring - Prioritize high-confidence email addresses
  • Multi-Level Validation - Syntax, DNS MX, and optional SMTP verification

🏢 Company Data

  • Comprehensive Profiles - Name, industry, size, location, social links
  • Technology Detection - Identify tech stack from website analysis
  • Structured Data Extraction - JSON-LD and meta tag parsing
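
As a rough illustration of what JSON-LD and meta tag parsing involves, here is a minimal standard-library sketch. It is not the project's `company_extractor.py`; the class and function names (`JsonLdParser`, `extract_structured_data`) and the sample HTML are assumptions made for this example.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect JSON-LD blocks and <meta> tags from an HTML document."""

    def __init__(self):
        super().__init__()
        self.json_ld = []   # parsed JSON-LD objects
        self.meta = {}      # meta name/property -> content
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta":
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page


def extract_structured_data(html):
    """Return the JSON-LD objects and meta tags found in an HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return {"json_ld": parser.json_ld, "meta": parser.meta}


# Hypothetical page used only for this example
sample = """
<html><head>
<meta name="description" content="Acme builds widgets">
<meta property="og:site_name" content="Acme Inc">
<script type="application/ld+json">
{"@type": "Organization", "name": "Acme Inc", "url": "https://acme.com"}
</script>
</head><body></body></html>
"""
data = extract_structured_data(sample)
print(data["json_ld"][0]["name"])
print(data["meta"]["description"])
```

Malformed JSON-LD blocks are skipped silently here; a real extractor might log them instead.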

📊 Export Options

  • CSV Export - Excel-compatible with customizable columns
  • Excel Export - Formatted spreadsheets with multiple sheets
  • JSON Export - API-ready format with NDJSON support

📝 Email Templates

  • Cold Outreach - Personalized first-touch templates
  • Follow-ups - Multi-stage follow-up sequences
  • Variable Substitution - Dynamic personalization with {{variables}}

🔗 CRM Integrations

  • Salesforce - Create Leads, Contacts, and Accounts
  • HubSpot - Create Contacts and Companies with associations
  • Extensible - Easy to add new CRM adapters

📦 Installation

Prerequisites

  • Python 3.9 or higher
  • pip package manager

Quick Install

# Clone the repository
git clone https://github.com/codiebyheaart/sales-lead-scraper-tool.git
cd sales-lead-scraper-tool

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install as editable package (optional)
pip install -e .

Configuration

  1. Copy the example configuration:
cp .env.example .env
cp config/config.example.yaml config/config.yaml
  2. Edit .env with your CRM credentials:
# Salesforce
SALESFORCE_USERNAME=your_username
SALESFORCE_PASSWORD=your_password
SALESFORCE_SECURITY_TOKEN=your_token

# HubSpot
HUBSPOT_ACCESS_TOKEN=your_token
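
Before pushing to a CRM it can help to confirm that the variables above are actually set. The sketch below assumes the `.env` values have already been loaded into the process environment (how the project loads them is not shown here); `REQUIRED_VARS` and `missing_credentials` are hypothetical names, not part of the tool.

```python
import os

# Variable names taken from the .env example above
REQUIRED_VARS = {
    "salesforce": [
        "SALESFORCE_USERNAME",
        "SALESFORCE_PASSWORD",
        "SALESFORCE_SECURITY_TOKEN",
    ],
    "hubspot": ["HUBSPOT_ACCESS_TOKEN"],
}


def missing_credentials(crm, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS[crm] if not environ.get(name)]


# A stand-in environment so the example does not depend on your shell
example_env = {
    "SALESFORCE_USERNAME": "user@example.com",
    "HUBSPOT_ACCESS_TOKEN": "pat-na1-xxxxxxxx",
}
print(missing_credentials("salesforce", example_env))
print(missing_credentials("hubspot", example_env))
```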

🚀 Quick Start

Search for Leads

# Search for software companies
python -m cli.main search "software companies"

# Search with filters
python -m cli.main search "marketing agencies" -l "New York" -n 50

# Export results to Excel
python -m cli.main search "saas startups" -f excel -o leads.xlsx

Extract Emails from Website

# Basic extraction
python -m cli.main extract https://example.com

# Deep scrape with email validation
python -m cli.main extract https://example.com --deep --validate

Generate Email Patterns

# Generate potential emails from name
python -m cli.main generate John Doe acme.com

# With DNS validation
python -m cli.main generate John Doe acme.com --validate

Push to CRM

# Push leads to Salesforce
python -m cli.main push leads.csv --crm salesforce --type lead

# Dry run to preview
python -m cli.main push leads.csv --crm hubspot --dry-run

💻 CLI Commands

| Command | Description | Example |
|---|---|---|
| search | Search for leads by keyword | search "tech startups" -l "SF" |
| extract | Extract data from website | extract example.com --deep |
| generate | Generate email patterns | generate John Doe acme.com |
| export | Convert between formats | export leads.json -f csv |
| push | Push leads to CRM | push leads.csv --crm hubspot |
| template | Generate email from template | template cold_outreach -v name=John |
| validate-config | Check configuration status | validate-config |

Global Options

python -m cli.main --help           # Show all commands
python -m cli.main search --help    # Show command options
python -m cli.main --version        # Show version

📋 Python API Usage

Search for Companies

from src.scrapers import GoogleScraper

with GoogleScraper() as scraper:
    results = scraper.search_companies(
        industry="technology",
        location="San Francisco",
        max_results=50
    )
    
for company in results:
    print(f"{company['title']} - {company['url']}")

Extract Emails

from src.scrapers import WebsiteScraper
from src.validators import EmailValidator, ValidationLevel

# Scrape website
with WebsiteScraper() as scraper:
    data = scraper.scrape_website("https://example.com", deep_scrape=True)

# Validate the emails found during scraping
validator = EmailValidator()

for email in data['emails']:
    result = validator.validate(email, ValidationLevel.DNS)
    if result.is_valid:
        print(f"✓ {email}")
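
To make the idea of validation levels concrete, here is a self-contained sketch of a two-stage check (syntax, then an MX lookup). It is not the project's `EmailValidator`; `Level`, `validate_email`, and `fake_mx_lookup` are hypothetical names, and the DNS step is injected as a callable so the example runs without network access.

```python
import re
from enum import IntEnum


class Level(IntEnum):
    SYNTAX = 1
    DNS = 2


# Deliberately simple pattern; real-world address syntax is broader.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")


def validate_email(address, level=Level.SYNTAX, mx_lookup=None):
    """Return (is_valid, reason). mx_lookup(domain) -> bool is supplied by the caller."""
    if not EMAIL_RE.match(address):
        return False, "invalid syntax"
    if level >= Level.DNS:
        if mx_lookup is None:
            raise ValueError("DNS level requires an mx_lookup callable")
        domain = address.rsplit("@", 1)[1]
        if not mx_lookup(domain):
            return False, "no MX record"
    return True, "ok"


def fake_mx_lookup(domain):
    """Stand-in resolver for this example; a real one would query DNS."""
    return domain in {"acme.com"}


print(validate_email("john.doe@acme.com", Level.DNS, fake_mx_lookup))
print(validate_email("john.doe@nowhere.invalid", Level.DNS, fake_mx_lookup))
print(validate_email("not-an-email"))
```

Running the cheap syntax check first means the slower lookup is only attempted for addresses that could plausibly exist.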

Generate Email Patterns

from src.extractors import EmailExtractor

extractor = EmailExtractor()
patterns = extractor.generate_patterns("John", "Doe", "acme.com")

for p in patterns:
    print(f"{p.email} (confidence: {p.confidence:.0%})")

Export to CSV

from src.exporters import CSVExporter, ExcelExporter

# `leads` is the collection of lead records gathered earlier (e.g. scraper results)
# Export to CSV
csv_exporter = CSVExporter()
filepath = csv_exporter.export(leads, "my_leads.csv")

# Export to Excel with formatting
excel_exporter = ExcelExporter()
filepath = excel_exporter.export_with_summary(leads)
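
For readers curious what CSV export with customizable columns and NDJSON output amount to, here is a minimal standard-library sketch. It does not reproduce the project's exporters; `to_csv`, `to_ndjson`, and the lead field names are assumptions for illustration.

```python
import csv
import io
import json


def to_csv(leads, columns, delimiter=","):
    """Serialize lead dicts to CSV text, keeping only the chosen columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter=delimiter,
        extrasaction="ignore",   # drop keys that are not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(leads)      # missing keys are written as empty cells
    return buffer.getvalue()


def to_ndjson(leads):
    """Serialize lead dicts as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(lead, ensure_ascii=False) + "\n" for lead in leads)


# Hypothetical lead records
leads = [
    {"company": "Acme Inc", "email": "info@acme.com", "city": "New York"},
    {"company": "Globex", "email": "sales@globex.com"},
]
print(to_csv(leads, ["company", "email"]))
print(to_ndjson(leads))
```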

Push to CRM

from src.integrations import SalesforceIntegration, HubSpotIntegration
from src.integrations.base_crm import CRMRecord, RecordType

# Create record
record = CRMRecord(
    record_type=RecordType.LEAD,
    first_name="John",
    last_name="Doe",
    email="john.doe@acme.com",
    company_name="Acme Inc"
)

# Push to Salesforce
with SalesforceIntegration() as sf:
    result = sf.create_lead(record)
    print(f"Created lead: {result.record_id}")

# Push to HubSpot
with HubSpotIntegration() as hs:
    result = hs.create_contact(record)
    print(f"Created contact: {result.record_id}")

Use Email Templates

from src.templates import TemplateEngine

engine = TemplateEngine()

# List available templates
print(engine.list_templates())

# Render template
email = engine.render_template("cold_outreach", {
    "first_name": "John",
    "company_name": "Acme Inc",
    "sender_name": "Jane Smith",
    "sender_company": "Our Company"
})

print(email)
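
The `{{variables}}` substitution can be pictured with a few lines of standard-library code. This is only a sketch of the general technique, not the project's `TemplateEngine`; the `render` function and the template text are made up for the example.

```python
import re

# Matches {{name}} with optional whitespace inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace each {{name}} with variables[name]; fail loudly on missing names."""
    needed = {match.group(1) for match in PLACEHOLDER.finditer(template)}
    missing = needed - set(variables)
    if missing:
        raise KeyError(f"missing template variables: {sorted(missing)}")
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


# Hypothetical template text
template = "Hi {{first_name}}, this is {{ sender_name }} from {{sender_company}}."
print(render(template, {
    "first_name": "John",
    "sender_name": "Jane Smith",
    "sender_company": "Our Company",
}))
```

Raising on a missing variable is a design choice: it prevents an email going out with a literal `{{first_name}}` in it.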

🔗 CRM Integration

Salesforce Setup

  1. Get your Security Token from Salesforce Settings
  2. Add credentials to .env:
SALESFORCE_USERNAME=your_username@example.com
SALESFORCE_PASSWORD=your_password
SALESFORCE_SECURITY_TOKEN=your_token
SALESFORCE_DOMAIN=login  # Use "test" for sandbox

HubSpot Setup

  1. Create a Private App in HubSpot Settings
  2. Grant scopes: crm.objects.contacts.write, crm.objects.companies.write
  3. Add access token to .env:
HUBSPOT_ACCESS_TOKEN=pat-na1-xxxxxxxx
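
For orientation, the sketch below builds (but does not send) a request to HubSpot's CRM v3 contacts endpoint using the private app token as a Bearer token. It uses HubSpot's default contact property names (`firstname`, `lastname`, `email`, `company`); whether the project's `HubSpotIntegration` builds requests this way is an assumption, and `build_contact_request` is a hypothetical helper.

```python
import json
import urllib.request

HUBSPOT_CONTACTS_URL = "https://api.hubapi.com/crm/v3/objects/contacts"


def build_contact_request(token, first_name, last_name, email, company):
    """Prepare a POST request that would create one HubSpot contact."""
    body = {
        "properties": {
            "firstname": first_name,
            "lastname": last_name,
            "email": email,
            "company": company,
        }
    }
    return urllib.request.Request(
        HUBSPOT_CONTACTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_contact_request(
    "pat-na1-xxxxxxxx", "John", "Doe", "john.doe@acme.com", "Acme Inc"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending is left out on purpose: urllib.request.urlopen(request) would create the contact.
```

Inspecting the prepared request without sending it is similar in spirit to the CLI's `--dry-run` option.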

📁 Project Structure

sales-lead-scraper-tool/
├── cli/                    # Command-line interface
│   └── main.py            # CLI entry point
├── config/                 # Configuration
│   ├── settings.py        # Settings management
│   └── config.example.yaml
├── src/
│   ├── scrapers/          # Web scrapers
│   │   ├── google_scraper.py
│   │   ├── linkedin_scraper.py
│   │   └── website_scraper.py
│   ├── extractors/        # Data extractors
│   │   ├── email_extractor.py
│   │   └── company_extractor.py
│   ├── validators/        # Data validators
│   │   └── email_validator.py
│   ├── exporters/         # Export formats
│   │   ├── csv_exporter.py
│   │   ├── excel_exporter.py
│   │   └── json_exporter.py
│   ├── templates/         # Email templates
│   │   ├── template_engine.py
│   │   └── email_templates/
│   ├── integrations/      # CRM integrations
│   │   ├── salesforce.py
│   │   └── hubspot.py
│   └── utils/             # Utilities
│       ├── logger.py
│       ├── rate_limiter.py
│       └── proxy_manager.py
├── tests/                  # Test suite
├── output/                 # Default export directory
├── requirements.txt
└── README.md

🧪 Running Tests

# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ -v --cov=src

# Run specific test file
python -m pytest tests/test_validators.py -v

⚙️ Configuration Options

config.yaml

scraper:
  rate_limit_delay: 2.0      # Seconds between requests
  max_retries: 3             # Retry failed requests
  timeout: 30                # Request timeout
  use_proxy: false           # Enable proxy rotation
  respect_robots_txt: true   # Honor robots.txt

email:
  validate_dns: true         # Check MX records
  validate_smtp: false       # SMTP verification (slow)
  common_patterns:           # Email pattern templates
    - "{first}.{last}"
    - "{first}{last}"
    - "{f}{last}"

export:
  output_directory: "./output"
  csv_delimiter: ","
  include_timestamp: true
  default_format: "csv"
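
The `common_patterns` entries read like format-string templates. One plausible way to expand them into candidate addresses is sketched below; `expand_patterns` is a hypothetical helper, and the `{l}` placeholder (last initial) is an assumption that does not appear in the config above.

```python
def expand_patterns(first, last, domain, patterns):
    """Expand pattern templates into candidate addresses, in order, without duplicates."""
    first, last = first.strip().lower(), last.strip().lower()
    fields = {"first": first, "last": last, "f": first[:1], "l": last[:1]}
    seen, emails = set(), []
    for pattern in patterns:
        address = f"{pattern.format(**fields)}@{domain.lower()}"
        if address not in seen:
            seen.add(address)
            emails.append(address)
    return emails


# Patterns copied from the config.yaml example above
patterns = ["{first}.{last}", "{first}{last}", "{f}{last}"]
print(expand_patterns("John", "Doe", "acme.com", patterns))
# → ['john.doe@acme.com', 'johndoe@acme.com', 'jdoe@acme.com']
```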

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


⚠️ Disclaimer

This tool is intended for legitimate B2B lead generation purposes. Always:

  • Respect website terms of service and robots.txt
  • Comply with data protection regulations (GDPR, CCPA)
  • Use rate limiting to avoid overloading servers
  • Only contact businesses who may genuinely benefit from your services
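
The first and third points can be enforced in code. The following sketch uses Python's standard `urllib.robotparser` plus a small delay helper; it is independent of the project's `rate_limiter.py`, and `make_robot_checker` and `RateLimiter` are hypothetical names. The robots.txt content is passed in as text so the example runs offline.

```python
import time
import urllib.robotparser


def make_robot_checker(robots_txt, user_agent="*"):
    """Return a function url -> bool based on the given robots.txt text."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, delay_seconds, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Hypothetical robots.txt content
robots_txt = """User-agent: *
Disallow: /private/
"""
allowed = make_robot_checker(robots_txt)
print(allowed("https://example.com/about"))
print(allowed("https://example.com/private/team"))
```

A scraper loop would call `limiter.wait()` before each request and skip any URL for which `allowed(url)` is false.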

📞 Support

  • 📫 Open an issue for bug reports or feature requests
  • 💬 Start a discussion for questions or ideas

Built with ❤️ for Sales & Marketing Teams
