InspecTor is a command-line tool designed to extract metadata from websites, including .onion
sites, anonymously via the Tor network. It allows users to specify target URLs and retrieve various metadata fields such as emails, phone numbers, links, images, and more. The script supports concurrent requests, saving results to JSON or an SQLite database, and optional use of Selenium for dynamic content.
- Extract metadata from .onion websites
- Support for multiple URLs and input files
- Concurrent processing with a configurable number of threads
- Optional SSL verification
- Extraction of specific metadata fields
- Optional use of Selenium for dynamic content
- Output to JSON file or stdout
- Save results to SQLite database
- Human-readable output option
- Python 3.x
- Tor installed and running on 127.0.0.1:9050
- Chrome browser and ChromeDriver (if using Selenium)
The required Python packages are listed in requirements.txt:
requests
beautifulsoup4
selenium
fake-useragent
colorama
urllib3
phonenumbers
- Clone the repository:
  git clone https://github.com/noobosaurus-r3x/InspecTor.git
  cd InspecTor
- Install Python packages:
  pip install -r requirements.txt
- Install Tor:
  sudo apt update
  sudo apt install tor
- Start Tor service (a quick connectivity check is sketched after this list):
  sudo systemctl start tor
  sudo systemctl status tor
- Install Chrome and ChromeDriver (if using Selenium):
  - Chrome Browser: Download and install from the Google Chrome website.
  - ChromeDriver:
    - Find the version of your Chrome browser: google-chrome --version
    - Download the corresponding ChromeDriver.
    - Ensure chromedriver is in your system's PATH or specify the path in the script.
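Before the first run, it can be worth confirming that the Tor SOCKS proxy is reachable and that traffic actually exits through Tor. The following standalone check is not part of InspecTor; it assumes the default proxy address 127.0.0.1:9050 and needs the SOCKS extra for requests (pip install requests[socks]):

```python
import socket

import requests

TOR_PROXY = "socks5h://127.0.0.1:9050"  # socks5h: hostnames are resolved by Tor itself


def tor_socks_open(host="127.0.0.1", port=9050, timeout=3):
    """Return True if the Tor SOCKS port accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if not tor_socks_open():
        raise SystemExit("Tor SOCKS proxy is not reachable on 127.0.0.1:9050")
    # check.torproject.org reports whether the request arrived over Tor
    resp = requests.get(
        "https://check.torproject.org/api/ip",
        proxies={"http": TOR_PROXY, "https": TOR_PROXY},
        timeout=60,
    )
    print(resp.json())  # e.g. {"IsTor": true, "IP": "..."}
```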
Extract metadata from one or more URLs (both .onion and regular websites):
python3 InspecTor.py -u https://exampleonionsite1.onion https://www.example.com
Extract metadata from URLs listed in a file:
python3 InspecTor.py -f urls.txt
Force all traffic through Tor:
python3 InspecTor.py -u https://www.example.com --force-tor
- -u, --urls : List of .onion URLs to scrape.
- -f, --file : Path to a file containing .onion URLs, one per line.
- -o, --output : Output JSON file to save metadata (use "-" for stdout). Default is onion_site_metadata.json.
- --force-tor : Route all traffic through the Tor network, even for regular URLs.
- --verify-ssl : Enable SSL certificate verification (default: enabled).
- --no-verify-ssl : Disable SSL certificate verification.
- --use-selenium : Use Selenium for handling dynamic content.
- --max-workers : Maximum number of concurrent threads (default: 5).
- --database : SQLite database file to store metadata (default: metadata.db).
- --fields : Specify which metadata fields to extract. Available fields are listed below.
- --extract-all : Extract all available metadata fields.
- --human-readable, -hr : Output the results in a human-readable format.
- --default-region : Default region used when parsing phone numbers (e.g., FR for France); see the sketch below.
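As an illustration of what the default region changes, here is how the phonenumbers library (listed in requirements.txt) matches numbers when a region is supplied. This is a standalone sketch, not InspecTor's own code:

```python
import phonenumbers
from phonenumbers import PhoneNumberFormat, PhoneNumberMatcher

text = "Appelez le 01 23 45 67 89 ou le +44 20 7946 0958."

# With region "FR", numbers written in French national format are recognised;
# numbers already in full international form (+44 ...) are matched regardless
# of the region and everything is normalised to E.164.
for match in PhoneNumberMatcher(text, "FR"):
    print(phonenumbers.format_number(match.number, PhoneNumberFormat.E164))
```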
The following fields can be specified with the --fields argument; a small extraction sketch follows the list:
emails
phone_numbers
links
external_links
images
scripts
css_files
social_links
csp
server_technologies
crypto_wallets
headers
title
description
keywords
og_title
og_description
timestamp
http_headers
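For a rough idea of how fields such as title, emails, links, images, and scripts can be pulled out of a page, here is a minimal BeautifulSoup sketch. It is illustrative only and does not reproduce InspecTor's actual extraction logic:

```python
import re

from bs4 import BeautifulSoup

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def extract_basic_fields(html):
    """Toy extractor covering a few of the fields listed above."""
    soup = BeautifulSoup(html, "html.parser")
    description = soup.find("meta", attrs={"name": "description"})
    og_title = soup.find("meta", attrs={"property": "og:title"})
    return {
        "title": soup.title.string.strip() if soup.title and soup.title.string else None,
        "description": description.get("content") if description else None,
        "og_title": og_title.get("content") if og_title else None,
        "emails": sorted(set(EMAIL_RE.findall(soup.get_text()))),
        "links": [a["href"] for a in soup.find_all("a", href=True)],
        "images": [img["src"] for img in soup.find_all("img", src=True)],
        "scripts": [s["src"] for s in soup.find_all("script", src=True)],
    }


print(extract_basic_fields(
    "<html><head><title>Example</title></head>"
    "<body><a href='mailto:admin@example.onion'>admin@example.onion</a></body></html>"
))
```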
Extract only emails from a .onion site:
python3 InspecTor.py -u https://example.onion --fields emails -o emails.json
Extract phone numbers from a website that uses French phone numbers:
python3 InspecTor.py -u https://example.com --fields phone_numbers --default-region FR
Extract emails and links:
python3 InspecTor.py -u https://example.onion --fields emails links -o data.json
Extract all metadata:
python3 InspecTor.py -u https://example.onion --extract-all -o all_metadata.json
Extract emails and phone numbers:
python3 InspecTor.py -u https://example.com --fields emails phone_numbers -o contact_info.json
Disable SSL verification and use Selenium:
python3 InspecTor.py -u https://example.onion -o metadata.json --no-verify-ssl --use-selenium
Output results in a human-readable format:
python3 InspecTor.py -u https://example.onion --human-readable
Output JSON to stdout and pipe it to jq for formatting:
python3 InspecTor.py -u https://example.onion -o - | jq '.'
- JSON File: By default, the script saves the extracted metadata to onion_site_metadata.json. Use the -o argument to specify a different output file, or use "-" to output to stdout.
- SQLite Database: The script saves metadata to an SQLite database (metadata.db by default). Use the --database argument to specify a different database file. A minimal store-and-query sketch follows this list.
- Human-Readable: Use the --human-readable or -hr flag to print the results in a human-readable format with colored output.
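If you want to reuse stored results programmatically, the standard-library sqlite3 module is enough. The layout below (one row per URL with the fields kept as a JSON blob) is purely hypothetical and uses an in-memory database for the demo; check the real schema of your metadata.db with the sqlite3 shell's .schema command before querying it:

```python
import json
import sqlite3

# Hypothetical layout for illustration only; InspecTor's real schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (url TEXT PRIMARY KEY, data TEXT)")
conn.execute(
    "INSERT OR REPLACE INTO metadata (url, data) VALUES (?, ?)",
    ("http://example.onion", json.dumps({"emails": ["admin@example.onion"]})),
)
conn.commit()

for url, data in conn.execute("SELECT url, data FROM metadata"):
    print(url, json.loads(data).get("emails", []))
conn.close()
```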
- Tor Configuration: Ensure that the Tor service is running on 127.0.0.1:9050. The script routes all HTTP requests through the Tor SOCKS5 proxy.
- Selenium Usage: If the --use-selenium flag is used, the Chrome browser and ChromeDriver must be installed. Selenium is used to handle dynamic content that requires JavaScript execution; a sketch of a Tor-proxied headless Chrome session follows this list.
- SSL Verification: SSL certificate verification is enabled by default. Some .onion sites may have invalid certificates; use the --no-verify-ssl flag to disable SSL verification.
- Concurrency: The script uses multithreading to process multiple URLs concurrently. Adjust the number of workers with the --max-workers argument as needed.
- Dependencies: All Python dependencies are listed in requirements.txt. Install them using pip install -r requirements.txt.
- Tor Accessibility: If you're scraping .onion sites or using the --force-tor option, ensure that the Tor service is accessible and running properly. The script checks whether the Tor SOCKS5 proxy is open.
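For the Selenium path, the general pattern is a headless Chrome session pointed at the Tor SOCKS proxy. The snippet below sketches that setup in Selenium 4 style (ChromeDriver found on PATH or via Selenium Manager); it is an assumption about the setup, not the exact configuration InspecTor uses:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument("--proxy-server=socks5://127.0.0.1:9050")
# Often recommended alongside a SOCKS proxy so hostname lookups (including
# .onion addresses) are not resolved locally:
options.add_argument('--host-resolver-rules=MAP * ~NOTFOUND , EXCLUDE 127.0.0.1')

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://check.torproject.org/")
    print(driver.title)        # title after JavaScript has run
    html = driver.page_source  # rendered HTML, ready for parsing
finally:
    driver.quit()
```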
I am not a professional developer, and this tool could be improved with your help. Feel free to fork the repository and enhance it by adding features, fixing bugs, or optimizing the code. Your contributions are welcome and highly appreciated!
This project is licensed under the MIT License.