OpenScrape

An open-source web scraping tool with LLM-ready data extraction capabilities, built with TypeScript and Playwright.

Features

Extract text content from web pages
Capture screenshots
Extract links and images
Wait for specific elements before scraping
Output results in JSON format
Docker support for containerized execution
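
As a rough illustration of what the JSON output for extracted links and images might look like, here is a minimal TypeScript sketch. The ScrapeResult shape, its field names, and the normalizeUrls helper are assumptions made for this example and are not taken from OpenScrape's source; the actual output schema may differ.

```typescript
// Hypothetical result shape: field names are assumptions, not OpenScrape's real schema.
interface ScrapeResult {
  url: string;
  text?: string;
  links?: string[];
  images?: string[];
}

// Hypothetical helper: resolve raw href/src values against the page URL
// and drop duplicates, so relative and absolute forms collapse to one entry.
function normalizeUrls(raw: string[], pageUrl: string): string[] {
  const seen = new Set<string>();
  for (const value of raw) {
    try {
      seen.add(new URL(value, pageUrl).href);
    } catch {
      // Skip values that cannot be parsed as URLs.
    }
  }
  return Array.from(seen);
}

const result: ScrapeResult = {
  url: "https://example.com/",
  text: "Example Domain",
  links: normalizeUrls(
    ["/about", "https://example.com/about"],
    "https://example.com/"
  ),
};

// Serialize with two-space indentation, as a JSON output file might be written.
console.log(JSON.stringify(result, null, 2));
```

Resolving against the page URL is one possible design choice: it keeps the output self-contained, so a consumer does not need to know which page a relative link came from.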

Installation

Local Installation

# Clone the repository
git clone https://github.com/pureartisan/openscrape.git
cd openscrape

# Install dependencies
npm install

# Build the project
npm run build

Docker Installation

# Build the Docker image
docker build -t openscrape .

# Run the container
docker run -v "$(pwd)/output:/app/output" openscrape scrape -u "https://example.com" -o /app/output/result.json

Usage

Basic Usage

# Scrape a website and extract text
npx openscrape scrape -u "https://example.com" -t

# Take a screenshot
npx openscrape scrape -u "https://example.com" -s

# Extract links
npx openscrape scrape -u "https://example.com" -l

# Extract images
npx openscrape scrape -u "https://example.com" -i

# Wait for a specific element before scraping
npx openscrape scrape -u "https://example.com" -w "#main-content"

# Save results to a file
npx openscrape scrape -u "https://example.com" -t -o results.json

Command Line Options

-u, --url <url>: URL to scrape (required)
-w, --wait-for <selector>: Wait for a specific selector before scraping
-s, --screenshot: Take a screenshot of the page
-t, --text: Extract text content
-l, --links: Extract links
-i, --images: Extract images
-o, --output <file>: Output file path (JSON)
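
The options above could be parsed along the following lines. This is a sketch only, using Node's built-in util.parseArgs (Node 18.3 or later) rather than whatever argument parser OpenScrape actually uses; the ScrapeOptions type and the parseScrapeArgs function are hypothetical names introduced for this example.

```typescript
import { parseArgs } from "node:util";

// Hypothetical options shape mirroring the flags listed above.
interface ScrapeOptions {
  url: string;
  waitFor: string | undefined;
  screenshot: boolean;
  text: boolean;
  links: boolean;
  images: boolean;
  output: string | undefined;
}

// Hypothetical parser: maps the documented short and long flags to ScrapeOptions.
function parseScrapeArgs(argv: string[]): ScrapeOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      url: { type: "string", short: "u" },
      "wait-for": { type: "string", short: "w" },
      screenshot: { type: "boolean", short: "s" },
      text: { type: "boolean", short: "t" },
      links: { type: "boolean", short: "l" },
      images: { type: "boolean", short: "i" },
      output: { type: "string", short: "o" },
    },
    strict: true, // unknown flags raise an error
  });

  // --url is the only required option.
  if (!values.url) {
    throw new Error("Missing required option: -u, --url <url>");
  }

  return {
    url: values.url,
    waitFor: values["wait-for"],
    screenshot: values.screenshot ?? false,
    text: values.text ?? false,
    links: values.links ?? false,
    images: values.images ?? false,
    output: values.output,
  };
}

// Equivalent to: npx openscrape scrape -u "https://example.com" -t -o results.json
const options = parseScrapeArgs(["-u", "https://example.com", "-t", "-o", "results.json"]);
console.log(options);
```

Boolean flags default to false here when omitted; how OpenScrape itself behaves when no extraction flag is given is not specified above.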

Development

# Install dependencies
npm install

# Run in development mode
npm run dev

# Run tests
npm test

# Build the project
npm run build

License

MIT
