GITBOOK-2: No subject

hatamiarash7 · gitbook-bot · commit bf5e5e3876a8 · 2025-02-10T12:31:35.000Z
diff --git a/docs/README.md b/docs/README.md
@@ -0,0 +1,24 @@
+---
+icon: face-glasses
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Introduction
+
+The Netquack DuckDB extension simplifies working with domains, URIs, and web paths directly within your database queries. Whether you're extracting top-level domains (TLDs), parsing URI components, or analyzing web paths, Netquack provides a suite of functions for these tasks. It is built for data engineers, analysts, and developers.
+
+With Netquack, you can unlock deeper insights from your web-related datasets without the need for external tools or complex workflows.
+
+### What is DuckDB <a href="#what-is-duckdb" id="what-is-duckdb"></a>
+
+DuckDB is an in-process SQL OLAP database management system designed to efficiently handle analytical query workloads. It is lightweight, easy to integrate, and features an intuitive interface for querying and processing data directly within applications. DuckDB is gaining popularity for its performance and low overhead, making it an excellent choice for processing large datasets directly in various programming environments.
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
@@ -0,0 +1,22 @@
+# Table of contents
+
+* [Introduction](README.md)
+* [Why Netquack](why-netquack.md)
+
+## Getting Started
+
+* [Quickstart](getting-started/quickstart.md)
+* [How to build](getting-started/publish-your-docs.md)
+
+## Functions
+
+* [Extract Domain](functions/editor.md)
+* [Extract Subdomain](functions/extract-subdomain.md)
+* [Extract Path](functions/extract-path.md)
+* [Extract Host](functions/extract-host.md)
+* [Extract Schema](functions/extract-schema.md)
+* [Extract Query](functions/extract-query.md)
+* [Extract TLD](functions/extract-tld.md)
+* [Tranco](functions/tranco/README.md)
+  * [Get Tranco Rank](functions/tranco/get-tranco-rank.md)
+  * [Download / Update Tranco](functions/tranco/download-update-tranco.md)
diff --git a/docs/functions/editor.md b/docs/functions/editor.md
@@ -0,0 +1,38 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Extract Domain
+
+This function extracts the main domain from a URL. To find it, the extension fetches the public suffixes from the [publicsuffix.org](https://publicsuffix.org/) list and matches the URL against them.
+
+The public suffix list is downloaded automatically the first time the function is called. After that, the list is stored in the `public_suffix_list` table so that it is not downloaded again.
+
+```sql
+D SELECT extract_domain('a.example.com') as domain;
+┌─────────────┐
+│   domain    │
+│   varchar   │
+├─────────────┤
+│ example.com │
+└─────────────┘
+
+D SELECT extract_domain('https://b.a.example.com/path') as domain;
+┌─────────────┐
+│   domain    │
+│   varchar   │
+├─────────────┤
+│ example.com │
+└─────────────┘
+```
+
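+The examples above pass literal strings, but the same function can also be applied to a column. The following is a minimal sketch, assuming the extension is already loaded; the `visits` table and its `url` column are hypothetical names used only for illustration.
+
+```sql
+-- Hypothetical table for illustration; not part of Netquack.
+CREATE TABLE visits (url VARCHAR);
+INSERT INTO visits VALUES
+  ('https://b.a.example.com/path'),
+  ('a.example.com');
+
+-- Group URLs by their main domain and count the rows per domain.
+SELECT extract_domain(url) AS domain, count(*) AS hits
+FROM visits
+GROUP BY extract_domain(url)
+ORDER BY hits DESC;
+```
+
+Going by the documented results for these two inputs, both rows should fall under `example.com`.
+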
diff --git a/docs/functions/extract-host.md b/docs/functions/extract-host.md
@@ -0,0 +1,35 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Extract Host
+
+This function extracts the host from a URL.
+
+```sql
+D SELECT extract_host('https://b.a.example.com/path/path') as host;
+┌─────────────────┐
+│      host       │
+│     varchar     │
+├─────────────────┤
+│ b.a.example.com │
+└─────────────────┘
+
+D SELECT extract_host('example.com:443/path/image.png') as host;
+┌─────────────┐
+│    host     │
+│   varchar   │
+├─────────────┤
+│ example.com │
+└─────────────┘
+```
diff --git a/docs/functions/extract-path.md b/docs/functions/extract-path.md
@@ -0,0 +1,35 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Extract Path
+
+This function extracts the path from a URL.
+
+```sql
+D SELECT extract_path('https://b.a.example.com/path/path') as path;
+┌────────────┐
+│    path    │
+│  varchar   │
+├────────────┤
+│ /path/path │
+└────────────┘
+
+D SELECT extract_path('example.com/path/path/image.png') as path;
+┌──────────────────────┐
+│         path         │
+│       varchar        │
+├──────────────────────┤
+│ /path/path/image.png │
+└──────────────────────┘
+```
diff --git a/docs/functions/extract-query.md b/docs/functions/extract-query.md
@@ -0,0 +1,35 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Extract Query
+
+This function extracts the query string from a URL.
+
+```sql
+D SELECT extract_query_string('example.com?key=value') as query;
+┌───────────┐
+│   query   │
+│  varchar  │
+├───────────┤
+│ key=value │
+└───────────┘
+
+D SELECT extract_query_string('http://example.com.ac/path/?a=1&b=2&') as query;
+┌──────────┐
+│  query   │
+│ varchar  │
+├──────────┤
+│ a=1&b=2& │
+└──────────┘
+```
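+
+The URL functions can be combined in a single query to split a URL into its parts. This is a sketch only, assuming the extension is loaded; the sample URL and the column aliases are made up for illustration.
+
+```sql
+-- Break one URL into schema, host, path, and query string.
+SELECT
+  extract_schema(url)       AS schema,
+  extract_host(url)         AS host,
+  extract_path(url)         AS path,
+  extract_query_string(url) AS query
+FROM (VALUES ('https://b.a.example.com/path/path?key=value')) AS t(url);
+```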
diff --git a/docs/functions/extract-schema.md b/docs/functions/extract-schema.md
@@ -0,0 +1,48 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Extract Schema
+
+This function extracts the schema (the URI scheme) from a URL. Currently supported schemas:
+
+* `http` | `https`
+* `ftp`
+* `mailto`
+* `tel` | `sms`
+
+```sql
+D SELECT extract_schema('https://b.a.example.com/path/path') as schema;
+┌─────────┐
+│ schema  │
+│ varchar │
+├─────────┤
+│ https   │
+└─────────┘
+
+D SELECT extract_schema('mailto:someone@example.com') as schema;
+┌─────────┐
+│ schema  │
+│ varchar │
+├─────────┤
+│ mailto  │
+└─────────┘
+
+D SELECT extract_schema('tel:+123456789') as schema;
+┌─────────┐
+│ schema  │
+│ varchar │
+├─────────┤
+│ tel     │
+└─────────┘
+```
diff --git a/docs/functions/extract-subdomain.md b/docs/functions/extract-subdomain.md
@@ -0,0 +1,35 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Extract Subdomain
+
+This function extracts the subdomain from a URL. It uses the public suffix list to identify the TLD. See the [Extracting The Main Domain](https://github.com/hatamiarash7/duckdb-netquack#extracting-the-main-domain) section for more information about the public suffix list.
+
+```sql
+D SELECT extract_subdomain('http://a.b.example.com/path') as dns_record;
+┌────────────┐
+│ dns_record │
+│  varchar   │
+├────────────┤
+│ a.b        │
+└────────────┘
+
+D SELECT extract_subdomain('test.example.com.ac') as dns_record;
+┌────────────┐
+│ dns_record │
+│  varchar   │
+├────────────┤
+│ test       │
+└────────────┘
+```
diff --git a/docs/functions/extract-tld.md b/docs/functions/extract-tld.md
@@ -0,0 +1,35 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Extract TLD
+
+This function extracts the top-level domain (TLD) from a URL. It uses the public suffix list to identify the TLD. See the [Extracting The Main Domain](https://github.com/hatamiarash7/duckdb-netquack#extracting-the-main-domain) section for more information about the public suffix list.
+
+```sql
+D SELECT extract_tld('https://example.com.ac/path/path') as tld;
+┌─────────┐
+│   tld   │
+│ varchar │
+├─────────┤
+│ com.ac  │
+└─────────┘
+
+D SELECT extract_tld('a.example.com') as tld;
+┌─────────┐
+│   tld   │
+│ varchar │
+├─────────┤
+│ com     │
+└─────────┘
+```
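+
+Because `extract_subdomain`, `extract_domain`, and `extract_tld` all rely on the public suffix list, they can be used side by side to break a host name into its three parts. A minimal sketch, assuming the extension is loaded; the column aliases are illustrative.
+
+```sql
+-- Split one URL into subdomain, main domain, and TLD.
+SELECT
+  extract_subdomain(url) AS subdomain,
+  extract_domain(url)    AS domain,
+  extract_tld(url)       AS tld
+FROM (VALUES ('http://a.b.example.com/path')) AS t(url);
+```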
diff --git a/docs/functions/tranco/README.md b/docs/functions/tranco/README.md
@@ -0,0 +1,17 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Tranco
+
+Work with the [Tranco](https://tranco-list.eu/) list directly in your DuckDB database.
diff --git a/docs/functions/tranco/download-update-tranco.md b/docs/functions/tranco/download-update-tranco.md
@@ -0,0 +1,37 @@
+---
+layout:
+  title:
+    visible: true
+  description:
+    visible: false
+  tableOfContents:
+    visible: true
+  outline:
+    visible: true
+  pagination:
+    visible: true
+---
+
+# Download / Update Tranco
+
+Use the `update_tranco` function to download or update the [Tranco](https://tranco-list.eu/) list manually.
+
+```sql
+D SELECT update_tranco(true);
+┌─────────────────────────────────────┐
+│ update_tranco(CAST('t' AS BOOLEAN)) │
+│               varchar               │
+├─────────────────────────────────────┤
+│ Tranco list updated                 │
+└─────────────────────────────────────┘
+```
+
+This function fetches the latest Tranco list and saves it to the `tranco_list` table. After the call, a `tranco_lit_%Y-%m-%d.csv` file is left in the current directory. The extension reuses this file to avoid downloading the list again.
+
+To ignore the file and force the extension to download the list again, call the function with `true` as the parameter. To avoid downloading the list again, call it with `false`.
+
+```sql
+D SELECT update_tranco(false);
+```
+
+Because the latest Tranco list covers the previous day, you can also download a list manually and rename it to `tranco_lit_%Y-%m-%d.csv` to use it with the extension.
diff --git a/docs/functions/tranco/get-tranco-rank.md b/docs/functions/tranco/get-tranco-rank.md
diff --git a/docs/getting-started/publish-your-docs.md b/docs/getting-started/publish-your-docs.md
diff --git a/docs/getting-started/quickstart.md b/docs/getting-started/quickstart.md
diff --git a/docs/why-netquack.md b/docs/why-netquack.md