
A complete web scraping pipeline built with Python, Requests, and BeautifulSoup that collects perfume product details and stores them in SQLite, SQL Server, PostgreSQL, or MongoDB.


SamiraSiavash/Perfume_Scraper_Multi_DB


🌸 Perfume Scraper

A Python-based web scraping project designed to extract structured perfume product data from Liliome.com.
The scraper collects brand and product information and supports multiple database backends, so the same scraped data can be stored in either a relational or a NoSQL database.


📌 Features

✔ Robust HTTP session

  • Uses requests.Session with retry logic
  • Handles connection failures gracefully (safe_get())
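A minimal sketch of how a retrying session and a `safe_get()` helper might look. The retry count, backoff factor, status codes, and timeout below are assumptions chosen for illustration, not values taken from the project's scripts, and `build_session()` is a hypothetical helper name.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries=3, backoff=0.5):
    """Return a requests.Session that retries transient failures (assumed settings)."""
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


def safe_get(session, url, timeout=10):
    """Return the Response, or None if the request ultimately fails."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response
    except requests.RequestException as exc:
        # Connection errors, timeouts, bad URLs, and HTTP error statuses all land here.
        print(f"Request failed for {url}: {exc}")
        return None
```

Returning `None` instead of raising lets the calling loop skip a failed page and continue with the next one.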

✔ Web scraping

  • Extracts:
    • Brand name
    • English title
    • Persian title
    • Old price
    • New price
    • Product rating (Point)
    • Photo URL
  • Automatically discovers all available brands and their product pages

✔ Pagination handling

  • Detects number of pages for each brand using total_pages()
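One way a `total_pages()` helper could work is to read the largest number shown in the pagination links. The sketch below assumes a `ul.pagination` element with numbered links; that selector and the sample HTML are hypothetical and would need to match the real page markup.

```python
from bs4 import BeautifulSoup


def total_pages(html):
    """Return the highest page number in the pagination links, or 1 if there are none."""
    soup = BeautifulSoup(html, "html.parser")
    numbers = [
        int(link.get_text(strip=True))
        for link in soup.select("ul.pagination a")  # hypothetical selector
        if link.get_text(strip=True).isdigit()
    ]
    return max(numbers, default=1)


# Hypothetical pagination markup, for illustration only.
sample = """
<ul class="pagination">
  <li><a href="?page=1">1</a></li>
  <li><a href="?page=2">2</a></li>
  <li><a href="?page=3">3</a></li>
  <li><a href="?page=2">Next</a></li>
</ul>
"""
print(total_pages(sample))  # 3
```

Non-numeric links such as "Next" are ignored, and a page without any pagination block is treated as a single page.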

🗄️ Supported Databases

The project has been refactored to support multiple storage backends, making it easy to switch between databases:

  • SQLite – lightweight local storage
  • SQL Server – enterprise relational database
  • PostgreSQL – open-source relational database
  • MongoDB – NoSQL document-based storage
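To illustrate how one scraped record can feed both kinds of backend, here is a small sketch. The `Product` class and its `as_row()` and `as_document()` methods are hypothetical names, not part of the project; only the field names follow the Master table described in this README, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    brand: str
    english_name: str
    name: str
    point: Optional[float]
    old_price: Optional[int]
    new_price: Optional[int]
    photo: str

    def as_row(self):
        """Tuple in Master column order, suitable for a parameterized SQL INSERT."""
        return (self.brand, self.english_name, self.name, self.point,
                self.old_price, self.new_price, self.photo)

    def as_document(self):
        """Dict keyed by the Master column names, suitable for a document store."""
        return {
            "Brand": self.brand,
            "EnglishName": self.english_name,
            "Name": self.name,
            "Point": self.point,
            "OldPrice": self.old_price,
            "NewPrice": self.new_price,
            "Photo": self.photo,
        }


# Made-up sample record.
product = Product("demo-brand", "Demo Eau de Parfum", "ادکلن نمونه", 4.5,
                  1500000, 1250000, "https://example.com/demo.jpg")
print(product.as_row())
print(product.as_document())
```

Keeping the scraping code independent of the storage format means only the final write step differs between the SQL and MongoDB scripts.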

This design makes it possible to compare SQL and NoSQL data models using the same scraping logic. Two tables (collections in MongoDB) are created automatically:

Brands

| Column | Type | Description |
|---|---|---|
| Brand_ID | INTEGER | Primary key |
| Brand_Link | TEXT | URL of brand page |
| Brand_Name | TEXT | Extracted brand name |

Master

| Column | Type | Description |
|---|---|---|
| ID | INTEGER | Primary key |
| Brand | TEXT | Brand slug |
| EnglishName | TEXT | Product English title |
| Name | TEXT | Product Persian title |
| Point | FLOAT | Product rating |
| OldPrice | INTEGER | Old price |
| NewPrice | INTEGER | New price |
| Photo | TEXT | Image URL |
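For the SQLite backend, the two tables above could be created roughly as follows. This is a sketch based only on the column lists in this README; the project's actual DDL, constraints, and database path may differ. An in-memory database and a made-up brand row are used so the example leaves no file behind.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS Brands (
    Brand_ID   INTEGER PRIMARY KEY,
    Brand_Link TEXT,
    Brand_Name TEXT
);
CREATE TABLE IF NOT EXISTS Master (
    ID          INTEGER PRIMARY KEY,
    Brand       TEXT,
    EnglishName TEXT,
    Name        TEXT,
    Point       FLOAT,
    OldPrice    INTEGER,
    NewPrice    INTEGER,
    Photo       TEXT
);
"""


def create_tables(conn):
    """Create both tables if they do not exist yet."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # the project stores its data in db/Perfume.db
create_tables(conn)

# Made-up brand row; Brand_ID is assigned automatically by SQLite.
conn.execute(
    "INSERT INTO Brands (Brand_Link, Brand_Name) VALUES (?, ?)",
    ("https://example.com/brand/demo", "demo"),
)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]
print(tables)  # ['Brands', 'Master']
```

`CREATE TABLE IF NOT EXISTS` makes the setup step safe to run on every start.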

🛠 Technologies Used

  • Python 3
  • Requests
  • BeautifulSoup4
  • SQLite3
  • SQL Server
  • PostgreSQL
  • MongoDB
  • Retry & Timeout handling
  • Regex for price cleanup
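The regex-based price cleanup mentioned above might look like the sketch below. The `clean_price()` name and the sample price strings (including the currency word) are assumptions for illustration; the real site text may be formatted differently.

```python
import re


def clean_price(text):
    """Return the price as an int, or None when the text contains no digits."""
    if not text:
        return None
    # \D removes everything that is not a decimal digit. Persian digits count
    # as digits here, and int() can parse them directly.
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


print(clean_price("1,250,000 تومان"))   # 1250000
print(clean_price("۱٬۲۵۰٬۰۰۰ تومان"))   # 1250000
print(clean_price("ناموجود"))           # None
```

Returning `None` for text without digits keeps "out of stock" style labels from being stored as a price of zero.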

📁 Project Structure

Perfume_Scraper/
│
├── assets/
│   ├── mongodb_brands.png
│   ├── mongodb_master.png
│   ├── postgres_brands.png
│   ├── postgres_master.png
│   ├── sqlite_brands.png
│   ├── sqlite_master.png
│   ├── sqlserver_brands.png
│   └── sqlserver_master.png
│
├── db/
│   └── Perfume.db   # Automatically created database for SQLite
│
├── Scraper_MongoDB.py
├── Scraper_MongoDB_Safe.py
├── Scraper_PostgreSQL.py
├── Scraper_PostgreSQL_Safe.py
├── Scraper_SQL.py
├── Scraper_SQL_Safe.py
├── Scraper_SQLite.py
├── Scraper_SQLite_Safe.py
├── README.md
└── requirements.txt

🚀 How It Works

1️⃣ Load Liliome brand list

The script visits:

https://liliome.com/برندها-عطر-ادکلن-فروشگاه-عطر-لیلیوم

It finds all brand links and stores them in the Brands table.


2️⃣ For each brand:

  • Detects how many pages of products exist
  • Extracts products from each page
  • Saves structured data into the Master table
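The per-brand steps above can be sketched as a small loop. Everything here is illustrative: `scrape_brand()` and its callable parameters are hypothetical, and the `?page=N` URL pattern is an assumption rather than the site's confirmed scheme. Stub functions stand in for the network and the database so the example runs on its own.

```python
def scrape_brand(brand_url, fetch, count_pages, parse_products, save):
    """Walk every page of one brand and pass each product to save().

    Returns the number of products saved.
    """
    first_page = fetch(brand_url)
    if first_page is None:
        return 0  # brand page could not be downloaded

    saved = 0
    for page in range(1, count_pages(first_page) + 1):
        # Assumed URL scheme for later pages; page 1 reuses the HTML already fetched.
        html = first_page if page == 1 else fetch(f"{brand_url}?page={page}")
        if html is None:
            continue  # skip a page that failed to download
        for product in parse_products(html):
            save(product)
            saved += 1
    return saved


# Stubs replacing the HTTP layer, the parser, and the database.
pages = {
    "https://example.com/brand/demo": "p1",
    "https://example.com/brand/demo?page=2": "p2",
}
stored = []
count = scrape_brand(
    "https://example.com/brand/demo",
    fetch=pages.get,
    count_pages=lambda html: 2,
    parse_products=lambda html: [f"{html}-a", f"{html}-b"],
    save=stored.append,
)
print(count, stored)  # 4 ['p1-a', 'p1-b', 'p2-a', 'p2-b']
```

Passing the fetch, parse, and save steps as functions keeps the loop identical for every database backend.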

▶️ How to Run

  1. Clone the repository:
git clone https://github.com/SamiraSiavash/Perfume_Scraper_Multi_DB.git
cd Perfume_Scraper_Multi_DB
  2. Install dependencies:
pip install -r requirements.txt
  3. Run one of the scrapers with Python, for example `python Scraper_SQLite.py`. The available scripts are:
Scraper_MongoDB.py
Scraper_MongoDB_Safe.py
Scraper_PostgreSQL.py
Scraper_PostgreSQL_Safe.py
Scraper_SQL.py
Scraper_SQL_Safe.py
Scraper_SQLite.py
Scraper_SQLite_Safe.py

🖼 Screenshots

SQLite

![Brands Table](assets/sqlite_brands.png)
![Master Table](assets/sqlite_master.png)

SQL Server

![Brands Table](assets/sqlserver_brands.png)
![Master Table](assets/sqlserver_master.png)

PostgreSQL

![Brands Table](assets/postgres_brands.png)
![Master Table](assets/postgres_master.png)

MongoDB

![Brands Collection](assets/mongodb_brands.png)
![Master Collection](assets/mongodb_master.png)

📝 Notes

  • Website layouts may change; adjust the CSS selectors in the scripts to match the current page structure.
  • Always follow the target website's Terms of Service.

📄 License

MIT License (optional)


✨ Author

Samira Siavash

🔗 GitHub: https://github.com/SamiraSiavash

🔗 LinkedIn: https://linkedin.com/in/samira-siavash
