safe_store is a Python library that turns your local folders of documents into a powerful, private, and intelligent knowledge base. It does this by combining two complementary AI techniques in a single, seamless tool:
- Deep Semantic Search: It reads and understands the content of your files, allowing you to search by meaning and context, not just keywords.
- AI-Powered Knowledge Graph: It uses a Large Language Model (LLM) to automatically identify key entities (people, companies, concepts) and the relationships between them, building an interconnected web of your knowledge.
All of this happens entirely on your local machine, using a single, portable SQLite file. Your data never leaves your control.
safe_store is designed to grow with your needs. You can start with a simple, powerful RAG system in minutes, and then evolve it into a sophisticated knowledge engine.
## The Foundation: Retrieval-Augmented Generation (RAG)
RAG is the state-of-the-art technique for making Large Language Models (LLMs) answer questions about your private documents. The process is simple:
- Retrieve: Find the most relevant text chunks from your documents related to a user's query.
- Augment: Add those chunks as context to your prompt.
- Generate: Ask the LLM to generate an answer based only on the provided context.
SafeStore is the perfect tool for the "Retrieve" step. It uses vector embeddings to understand the meaning of your text, allowing you to find relevant passages even if they don't contain the exact keywords.
### Example: A Simple RAG Pipeline

```python
import safe_store

# 1. Create a store. This will create a 'my_notes.db' file.
store = safe_store.SafeStore(db_path="my_notes.db", vectorizer_name="st")

# 2. Add your documents. It will scan the folder and process all supported files.
with store:
    store.add_document("path/to/my_notes_and_articles/")

    # 3. Query the store to find context for your RAG prompt.
    user_query = "What were the main arguments about AI consciousness in my research?"
    context_chunks = store.query(user_query, top_k=3)

# 4. Build the prompt and send it to your LLM.
context_text = "\n\n".join([chunk['chunk_text'] for chunk in context_chunks])
prompt = f"""
Based on the following context, please answer the user's question.
Do not use any external knowledge.

Context:
---
{context_text}
---

Question: {user_query}
"""
# result = my_llm_function(prompt)  # Send to your LLM of choice
```

With just this, you have a powerful, private RAG system running on your local files.
## Level 2: Uncover Hidden Connections with a Knowledge Graph

### The Next Dimension: From Passages to a Web of Knowledge
Semantic search is great for finding relevant passages, but it struggles with questions about specific facts and relationships scattered across multiple documents.
GraphStore complements this by building a structured knowledge graph of the key instances (like the person "Geoffrey Hinton") and their relationships (like PIONEERED the concept "Backpropagation"). This allows you to ask precise, factual questions.
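A complete, runnable quickstart appears at the end of this document; the short sketch below shows just the core flow. It assumes `store` is an initialized SafeStore with documents already indexed, and `llm_executor` is any callable that sends a prompt to your LLM and returns its text response.

```python
# A minimal sketch of the GraphStore flow (see the full quickstart at the
# end of this document for a runnable version).
from safe_store import GraphStore

ontology = "Extract People and Companies. A Person can be a CEO_OF a Company."
graph = GraphStore(store=store, llm_executor_callback=llm_executor, ontology=ontology)

with graph:
    # Let the LLM extract entities and relationships from the indexed documents
    graph.build_graph_for_all_documents()
    # Ask a precise, factual question against the graph
    result = graph.query_graph("Who is the CEO of QuantumLeap AI?", output_mode="graph_only")

for rel in result.get("relationships", []):
    print(rel["type"])
```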
## Visualize Your Vector Space

Understanding the structure of your knowledge base can be challenging. safe_store provides a powerful tool to visually explore the semantic relationships within your documents.
The export_point_cloud() method performs a Principal Component Analysis (PCA) on all the vectors in your store to create a 2D "map" of your data. When combined with a simple web interface, this allows you to:
- See Clusters: Identify natural groupings of related content at a glance.
- Explore Relationships: Understand how different documents and topics relate to each other in the vector space.
- Debug and Refine: Visually inspect the results of different chunking strategies or vectorization models to see how they affect the semantic layout of your data.
**Example Visualization:**

*(This UI is generated by the example script below.)*
This entire interactive application, including the web server and the API to fetch chunk text on hover, is available as a complete, runnable example. It's the perfect starting point for building your own custom knowledge exploration tools.
Save the following code as `run_point_cloud_app.py` and execute it with `python run_point_cloud_app.py`.
```python
# examples/point_cloud_and_api.py
import safe_store
from pathlib import Path
import shutil
import json
import webbrowser
from http.server import HTTPServer, SimpleHTTPRequestHandler
import threading
import pipmaster as pm

# Ensure necessary packages for PCA and the example are installed
pm.ensure_packages(["scikit-learn", "pandas"])

# --- Helper Functions ---
def print_header(title):
    print("\n" + "=" * 10 + f" {title} " + "=" * 10)

def setup_environment():
    """Cleans up old files and creates new ones for the example."""
    print_header("Setting Up Example Environment")
    db_file = Path("point_cloud_example.db")
    doc_dir = Path("temp_docs_point_cloud")

    # Clean up the DB and its artifacts
    for p in [db_file, Path(f"{db_file}.lock"), Path(f"{db_file}-wal"), Path(f"{db_file}-shm")]:
        p.unlink(missing_ok=True)

    # Clean up and create the doc directory
    if doc_dir.exists():
        shutil.rmtree(doc_dir)
    doc_dir.mkdir(exist_ok=True)

    # Create sample documents with metadata
    (doc_dir / "animals.txt").write_text(
        "The quick brown fox jumps over the lazy dog. A fast red fox is athletic. The sleepy dog rests."
    )
    (doc_dir / "tech.txt").write_text(
        "Python is a versatile programming language. Many developers use Python for AI. RAG pipelines are a common use case."
    )
    (doc_dir / "space.txt").write_text(
        "The sun is a star at the center of our solar system. The Earth revolves around the sun. Space exploration is fascinating."
    )
    print("- Created sample documents and cleaned up old database.")
    return db_file, doc_dir

# --- Main Logic ---
DB_FILE, DOC_DIR = setup_environment()

print_header("Initializing SafeStore and Indexing Documents")

# Initialize SafeStore
store = safe_store.SafeStore(
    db_path=DB_FILE,
    vectorizer_name="st",
    vectorizer_config={"model": "all-MiniLM-L6-v2"},
    chunk_size=10,  # small chunks for more points
    chunk_overlap=2
)

# Add documents to the store with metadata
with store:
    store.add_document(DOC_DIR / "animals.txt", metadata={"topic": "animals", "source": "fiction"})
    store.add_document(DOC_DIR / "tech.txt", metadata={"topic": "technology", "source": "documentation"})
    store.add_document(DOC_DIR / "space.txt", metadata={"topic": "space", "source": "science"})
print("- Documents indexed successfully.")

# --- Data Export for Visualization ---
print_header("Exporting Point Cloud Data")
with store:
    point_cloud_data = store.export_point_cloud(output_format='dict')

# Save the data to a JSON file for the web page to fetch
web_dir = Path("point_cloud_web_app")
web_dir.mkdir(exist_ok=True)
data_file = web_dir / "data.json"
with open(data_file, "w") as f:
    json.dump(point_cloud_data, f)
print(f"- Point cloud data exported to {data_file}")

# --- Web Server and HTML Page ---
html_content = """
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>SafeStore | 2D Chunk Visualization</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <script src="https://cdn.plot.ly/plotly-2.32.0.min.js"></script>
</head>
<body class="bg-slate-50 dark:bg-slate-900 text-slate-800 dark:text-slate-200">
    <main class="container mx-auto p-8">
        <header class="text-center mb-12">
            <h1 class="text-4xl font-bold text-slate-900 dark:text-white">2D Document Chunk Visualization</h1>
            <p class="mt-2 text-lg text-slate-600 dark:text-slate-400">Interactive PCA plot of vectorized chunks. Hover to inspect.</p>
        </header>
        <div class="grid grid-cols-1 lg:grid-cols-5 gap-8">
            <div class="lg:col-span-3 bg-white dark:bg-slate-800 rounded-xl shadow-lg p-6 h-[70vh]">
                <div id="plot" class="w-full h-full"></div>
            </div>
            <div class="lg:col-span-2 bg-white dark:bg-slate-800 rounded-xl shadow-lg p-6">
                <h2 class="text-2xl font-semibold text-slate-900 dark:text-white mb-4">Chunk Inspector</h2>
                <div id="chunk-info-container" class="relative h-[calc(70vh-80px)]"></div>
            </div>
        </div>
    </main>
    <script>
        document.addEventListener('DOMContentLoaded', function() {
            // ... (JavaScript remains the same as in the example file) ...
        });
    </script>
</body>
</html>
"""
# (For brevity, the full JavaScript is in the example file but the structure is shown here)

# Write the HTML file
index_file = web_dir / "index.html"
index_file.write_text(html_content)

# Define a custom request handler to serve files and provide an API
class CustomHandler(SimpleHTTPRequestHandler):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=str(web_dir), **kwargs)

    def do_GET(self):
        if self.path.startswith('/chunk/'):
            try:
                chunk_id = int(self.path.split('/')[-1])
                with store:
                    chunk_data = store.get_chunk_by_id(chunk_id)
                if chunk_data:
                    self.send_response(200)
                    self.send_header('Content-type', 'application/json')
                    self.end_headers()
                    self.wfile.write(json.dumps(chunk_data).encode('utf-8'))
                else:
                    self.send_error(404, "Chunk not found")
            except Exception as e:
                self.send_error(500, str(e))
            return
        super().do_GET()

print(f"- Wrote web application files to '{web_dir.resolve()}'")

# --- Run Server ---
PORT = 8008
server_address = ('', PORT)
httpd = HTTPServer(server_address, CustomHandler)
url = f"http://localhost:{PORT}"

print_header("Starting Web Server")
print(f"Serving visualization at: {url}")
print("Please open the URL in your web browser.")
print("Press Ctrl+C to stop the server.")

# Open the browser shortly after the server starts
threading.Timer(1.5, lambda: webbrowser.open(url)).start()

try:
    httpd.serve_forever()
except KeyboardInterrupt:
    print("\n- Server stopped.")
finally:
    httpd.server_close()
```

## Discover Vectorizers and Models at Runtime

One of safe_store's most powerful features is its ability to self-document. You don't need to guess which vectorizers are available or what parameters they need. You can discover everything at runtime.
This makes it easy to experiment with different embedding models and build interactive tools that guide users through the setup process.
### Listing All Available Vectorizers

The `SafeStore.list_available_vectorizers()` class method scans the library for all built-in and custom vectorizers and returns their complete configuration metadata.
```python
import safe_store
import pprint

# Get a list of all available vectorizer configurations
available_vectorizers = safe_store.SafeStore.list_available_vectorizers()

# Pretty-print the result to see what's available
pprint.pprint(available_vectorizers)
```

This will produce a detailed output like this:

```python
[{'author': 'ParisNeo',
  'class_name': 'CohereVectorizer',
  'creation_date': '2025-10-10',
  'description': "A vectorizer that uses Cohere's API...",
  'input_parameters': [{'default': 'embed-english-v3.0',
                        'description': 'The name of the Cohere embedding model...',
                        'mandatory': True,
                        'name': 'model'},
                       {'default': '',
                        'description': 'Your Cohere API key...',
                        'mandatory': False,
                        'name': 'api_key'},
                       ...],
  'last_update_date': '2025-10-10',
  'name': 'cohere',
  'title': 'Cohere Vectorizer'},
 {'author': 'ParisNeo',
  'class_name': 'OllamaVectorizer',
  'name': 'ollama',
  'title': 'Ollama Vectorizer',
  ...},
 ...
]
```
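Because this metadata is plain Python data, you can also consume it programmatically. A small sketch (field names taken from the output above; it assumes the `cohere` vectorizer is present in the listing):

```python
# Inspect vectorizer metadata programmatically: find the 'cohere' entry
# and list its mandatory parameters.
import safe_store

vectorizers = safe_store.SafeStore.list_available_vectorizers()
cohere = next((v for v in vectorizers if v["name"] == "cohere"), None)
if cohere:
    mandatory = [p["name"] for p in cohere.get("input_parameters", []) if p.get("mandatory")]
    print(f"Mandatory parameters for 'cohere': {mandatory}")
```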
### Listing a Vectorizer's Available Models

Once you know which vectorizer you want to use, you can ask safe_store which specific models it supports. This is especially useful for API-based or local server-based vectorizers like `ollama`, which can have many different models available.
```python
import safe_store

# Example: List all embedding models available from a running Ollama server
try:
    # This requires a running Ollama instance to succeed
    ollama_models = safe_store.SafeStore.list_models("ollama")
    print("Available Ollama embedding models:")
    for model in ollama_models:
        print(f"- {model}")
except Exception as e:
    print(f"Could not list Ollama models. Is the server running? Error: {e}")
```

You can use this metadata to create an interactive setup script, guiding the user to choose and configure their desired vectorizer on the fly.
Full Interactive Example:
Copy and run this script. It will guide you through selecting and configuring a vectorizer, then initialize SafeStore with your choices.
```python
# interactive_setup.py
import safe_store
import pprint

def interactive_vectorizer_setup():
    """
    An interactive CLI to guide the user through selecting and configuring a vectorizer.
    """
    print("--- Welcome to the safe_store Interactive Vectorizer Setup ---")

    # 1. List all available vectorizers
    vectorizers = safe_store.SafeStore.list_available_vectorizers()
    print("\nAvailable Vectorizers:")
    for i, vec in enumerate(vectorizers):
        print(f"  [{i+1}] {vec['name']} - {vec.get('title', 'No Title')}")

    # 2. Get the user's choice
    choice = -1
    while choice < 0 or choice >= len(vectorizers):
        try:
            raw_choice = input(f"\nPlease select a vectorizer (1-{len(vectorizers)}): ")
            choice = int(raw_choice) - 1
            if not (0 <= choice < len(vectorizers)):
                print("Invalid selection. Please try again.")
        except ValueError:
            print("Please enter a number.")

    selected_vectorizer = vectorizers[choice]
    selected_name = selected_vectorizer['name']
    print(f"\nYou have selected: {selected_name}")
    print(f"Description: {selected_vectorizer.get('description', 'N/A').strip()}")

    # 3. Dynamically build the configuration dictionary
    vectorizer_config = {}
    print("\nPlease provide the following configuration values (press Enter to use default):")
    params = selected_vectorizer.get('input_parameters', [])
    if not params:
        print("This vectorizer requires no special configuration.")
    else:
        for param in params:
            param_name = param['name']
            description = param.get('description', 'No description.')
            default_value = param.get('default', None)

            prompt = f"- {param_name} ({description})"
            if default_value is not None:
                prompt += f" [default: {default_value}]: "
            else:
                prompt += ": "

            user_input = input(prompt)
            # Use user input if provided, otherwise use the default
            final_value = user_input if user_input else default_value

            # Simple type conversion for demonstration (can be expanded)
            if final_value is not None:
                if param.get('type') == 'int':
                    vectorizer_config[param_name] = int(final_value)
                elif param.get('type') == 'dict':
                    # For simplicity, we don't parse dicts here, but a real app might use json.loads
                    vectorizer_config[param_name] = final_value
                else:
                    vectorizer_config[param_name] = str(final_value)

    # 4. Initialize SafeStore with the dynamically created configuration
    print("\n--- Configuration Complete ---")
    print(f"Vectorizer Name: '{selected_name}'")
    print("Vectorizer Config:")
    pprint.pprint(vectorizer_config)

    try:
        print("\nInitializing SafeStore with your configuration...")
        store = safe_store.SafeStore(
            db_path=f"{selected_name}_store.db",
            vectorizer_name=selected_name,
            vectorizer_config=vectorizer_config
        )
        print("\n✅ SafeStore initialized successfully!")
        print(f"Database file is at: {selected_name}_store.db")
        store.close()
    except Exception as e:
        print(f"\n❌ Failed to initialize SafeStore: {e}")

if __name__ == "__main__":
    interactive_vectorizer_setup()
```

This script demonstrates how the self-documenting nature of safe_store enables you to build powerful, user-friendly applications on top of it.
## Chunking Strategies: Character vs. Token

safe_store can chunk your documents based on character count (`character` strategy) or token count (`token` strategy). Using the `token` strategy is often more effective, as it aligns more closely with how Large Language Models (LLMs) process text.

When you select `chunking_strategy='token'`, safe_store handles tokenization intelligently:

1. **Vectorizer's native tokenizer:** If the chosen vectorizer (like a local `sentence-transformers` model) has its own tokenizer, safe_store will use it. This is the most accurate method, as the chunking tokens will perfectly match the vectorizer's tokens.
2. **Fallback to `tiktoken`:** Some vectorizers, especially those accessed via an API (like OpenAI or Cohere), do not expose their tokenizer for local use. In these cases, safe_store uses `tiktoken` (specifically the `cl100k_base` encoding) as a reliable fallback. `tiktoken` is the tokenizer used by modern OpenAI models and provides a very close approximation for many other models, ensuring your chunks are sized correctly for optimal performance.

You can also specify a custom tokenizer during SafeStore initialization if you have specific needs.
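As a minimal sketch, token-based chunking only requires switching the strategy at initialization; `chunk_size` and `chunk_overlap` are then counted in tokens rather than characters (the specific sizes below are illustrative, not recommended defaults):

```python
# A minimal sketch: token-based chunking at store creation.
# With a sentence-transformers model, its native tokenizer is used;
# API-only vectorizers fall back to tiktoken (cl100k_base), per the docs above.
import safe_store

store = safe_store.SafeStore(
    db_path="token_chunked.db",
    vectorizer_name="st",
    chunking_strategy="token",  # chunk by tokens instead of characters
    chunk_size=256,             # counted in tokens under this strategy
    chunk_overlap=32,
)
```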
## Enrich Your Documents with Metadata

Metadata is extra information about your documents that provides crucial context. You can attach a dictionary of key-value pairs to any document you add to safe_store.

**How to Add Metadata:**

Simply pass a dictionary to the `metadata` parameter when adding content.
```python
# Example of adding a document with metadata
doc_info = {
    "title": "Quantum Entanglement in Nanostructures",
    "author": "Dr. Alice Smith",
    "year": 2024,
    "topic": "Quantum Physics"
}

with store:
    store.add_document(
        "path/to/research_paper.txt",
        metadata=doc_info
    )
```

**How Metadata is Used in Queries:**
When you perform a query, the document's metadata is returned in two ways for maximum flexibility:

- **As a structured dictionary:** The `document_metadata` field contains the parsed metadata, which your application can use for filtering, logging, or display purposes.
- **Prepended to the `chunk_text`:** A human-readable version of the metadata is automatically added to the beginning of the returned `chunk_text`. This "just-in-time" context injection dramatically improves an LLM's ability to understand the source and relevance of the information, leading to better-quality responses without any extra work on your part.
A query result object looks like this:
```json
[
  {
    "chunk_id": 123,
    "similarity_percent": 95.4,
    "file_path": "/path/to/research_paper.txt",
    "document_metadata": {
      "title": "Quantum Entanglement in Nanostructures",
      "author": "Dr. Alice Smith",
      "year": 2024,
      "topic": "Quantum Physics"
    },
    "chunk_text": "--- Document Context ---\nTitle: Quantum Entanglement in Nanostructures\nAuthor: Dr. Alice Smith\nYear: 2024\nTopic: Quantum Physics\n------------------------\n\n...the actual text from the document chunk begins here..."
  }
]
```

## Reconstructing Document Text

After indexing, you may need to retrieve the full, original text of a document as it was processed by safe_store. The `reconstruct_document_text` method does this by fetching and reassembling all of a document's stored chunks.
```python
# Assuming 'store' is an initialized SafeStore instance
# with "path/to/research_paper.txt" already added.
full_text = store.reconstruct_document_text("path/to/research_paper.txt")

if full_text:
    print("--- Reconstructed Text ---")
    print(full_text[:500] + "...")

# Note: If a chunk_overlap was used during indexing, the reconstructed text
# will contain these repeated, overlapping segments. This method provides a
# raw reassembly of the stored data.
```

## Transform Chunks with a `chunk_processor`

For advanced RAG, you might need to transform the text of a chunk before it is vectorized and stored. The `chunk_processor` is a powerful hook that lets you do exactly that.
It's an optional callable that you can pass to add_document or add_text. The function receives the raw text of each chunk and the document's metadata, and it must return the string that you want to be stored and vectorized instead.
This enables powerful workflows like:
- Summarization: Replace long chunks with concise summaries generated by an LLM.
- Keyword Extraction: Prepend important keywords to each chunk to boost relevance for certain queries.
- Translation: Translate chunks into a different language before indexing.
- Formatting: Clean or reformat text in a specific way for your RAG pipeline.
### Example: Prepending Metadata to Each Chunk

```python
import safe_store

store = safe_store.SafeStore(db_path="processed_store.db")

def prepend_topic_processor(chunk_text: str, metadata: dict) -> str:
    """A processor that adds the 'topic' from metadata to the chunk text."""
    topic = metadata.get("topic", "general")
    return f"[Topic: {topic}] {chunk_text}"

with store:
    store.add_text(
        unique_id="processed_doc_1",
        text="This chunk is about quantum mechanics.",
        metadata={"topic": "Physics"},
        chunk_processor=prepend_topic_processor,
        force_reindex=True
    )

    # When you query this, the stored text will be:
    # "[Topic: Physics] This chunk is about quantum mechanics."
    # This can make the vector more specific to the topic.
    results = store.query("information related to physics", top_k=1)
    if results:
        print(results[0]['chunk_text'])

store.close()
```

This simple hook provides immense flexibility for customizing your data ingestion pipeline.
## Backup and Recovery

Because safe_store is built on a single, portable SQLite database file, ensuring the safety of your knowledge base is straightforward.

**Backup:**

To back up your entire store, simply make a copy of the main database file (e.g., `my_notes.db`). For a complete and safe backup, especially if the database might be in use, it's best to also copy the associated temporary files:

- `my_notes.db` (the main database file)
- `my_notes.db-shm` (the shared-memory file)
- `my_notes.db-wal` (the write-ahead log)
Copying these three files to a secure location (like a separate hard drive or a cloud storage folder) creates a complete snapshot of your store at that moment.
**Recovery:**

To recover from a backup, simply replace the corrupted or lost `.db`, `.db-shm`, and `.db-wal` files with the copies from your backup.
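A minimal sketch of both operations using plain file copies; it assumes the store is closed (or at least that no writer is active) while the files are copied, so the three files form a consistent snapshot:

```python
# A minimal backup/restore sketch using plain file copies.
# Close the SafeStore instance (or pause all writers) before copying.
import shutil
from pathlib import Path

SIDE_SUFFIXES = ["", "-shm", "-wal"]  # the SQLite WAL-mode file set

def backup_store(db_path: str, backup_dir: str) -> None:
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for suffix in SIDE_SUFFIXES:
        src = Path(f"{db_path}{suffix}")
        if src.exists():  # -shm/-wal may be absent when the store is closed
            shutil.copy2(src, dest / src.name)

def restore_store(backup_dir: str, db_path: str) -> None:
    db_name = Path(db_path).name
    for suffix in SIDE_SUFFIXES:
        src = Path(backup_dir) / f"{db_name}{suffix}"
        if src.exists():
            shutil.copy2(src, Path(f"{db_path}{suffix}"))

backup_store("my_notes.db", "backups/latest")
# ...later, after data loss:
# restore_store("backups/latest", "my_notes.db")
```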
This file-based approach avoids the complexity of database dumps and restores, giving you a simple and robust way to protect your data.
## Quickstart: From Semantic Search to Knowledge Graph

This example shows the end-to-end workflow: indexing a document, then building and querying a knowledge graph of its instances using a simple string-based ontology.
```python
import safe_store
from safe_store import GraphStore, LogLevel
from lollms_client import LollmsClient
from pathlib import Path
import shutil

# --- 0. Configuration & Cleanup ---
DB_FILE = "quickstart.db"
DOC_DIR = Path("temp_docs_qs")
if DOC_DIR.exists():
    shutil.rmtree(DOC_DIR)
DOC_DIR.mkdir()
Path(DB_FILE).unlink(missing_ok=True)

# --- 1. LLM Executor & Sample Document ---
def llm_executor(prompt: str) -> str:
    try:
        client = LollmsClient()
        return client.generate_code(prompt, language="json", temperature=0.1) or ""
    except Exception as e:
        raise ConnectionError(f"LLM call failed: {e}")

doc_path = DOC_DIR / "doc.txt"
doc_path.write_text("Dr. Aris Thorne is the CEO of QuantumLeap AI, a firm in Geneva.")

# --- 2. Level 1: Semantic Search with SafeStore ---
print("--- LEVEL 1: SEMANTIC SEARCH ---")
store = safe_store.SafeStore(db_path=DB_FILE, vectorizer_name="st", log_level=LogLevel.INFO)
with store:
    store.add_document(doc_path)
    results = store.query("who leads the AI firm in Geneva?", top_k=1)
    print(f"Semantic search result: '{results[0]['chunk_text']}'")

# --- 3. Level 2: Knowledge Graph with GraphStore ---
print("\n--- LEVEL 2: KNOWLEDGE GRAPH ---")
ontology = "Extract People and Companies. A Person can be a CEO_OF a Company."
try:
    graph_store = GraphStore(store=store, llm_executor_callback=llm_executor, ontology=ontology)
    with graph_store:
        graph_store.build_graph_for_all_documents()
        graph_result = graph_store.query_graph("Who is the CEO of QuantumLeap AI?", output_mode="graph_only")
    print("Graph query result:")
    for rel in graph_result.get('relationships', []):
        source = rel['source_node']['properties'].get('identifying_value')
        target = rel['target_node']['properties'].get('identifying_value')
        print(f"- Relationship: '{source}' --[{rel['type']}]--> '{target}'")
except ConnectionError as e:
    print(f"[SKIP] GraphStore part failed: {e}")
store.close()
```

## Installation

```bash
pip install safe-store
```

Install optional dependencies for the features you need:

```bash
pip install safe-store[sentence-transformers]
pip install safe-store[openai,ollama,cohere]
pip install safe-store[parsing]
pip install safe-store[encryption]
pip install safe-store[all]
```
---
## π‘ API Highlights
#### `SafeStore` (The Foundation)
* `__init__(db_path, vectorizer_name, ...)`: Creates or loads a database. The vectorizer is locked in at creation.
* `add_document(path, ...)`: Parses, chunks, vectorizes, and stores a document or an entire folder.
* `query(query_text, top_k, ...)`: Performs a semantic search and returns the most relevant text chunks for your RAG pipeline.
* `get_chunk_by_id(chunk_id)`: Retrieves the full text and metadata for a specific chunk by its ID.
* `reconstruct_document_text(file_path)`: Reassembles and returns the full, original text of a document by joining its stored chunks.
* `export_point_cloud()`: Exports all vectors as a 2D point cloud for visualization, using PCA for dimensionality reduction.
#### `GraphStore` (The Intelligence Layer)
* `__init__(store, llm_executor_callback, ontology)`: Creates the graph manager on an existing `SafeStore` instance.
* `build_graph_for_all_documents()`: Scans documents and uses an LLM to build the knowledge graph based on your ontology.
* `query_graph(natural_language_query, ...)`: Translates a question into a graph traversal, returning nodes, relationships, and/or the original source text.
* `add_node(...)`, `add_relationship(...)`: Manually edit the graph to add your own expert knowledge.
---
## π€ Contributing & License
Contributions are highly welcome! Please open an issue to discuss a new feature or submit a pull request on [GitHub](https://github.com/ParisNeo/safe_store).
Licensed under Apache 2.0. See [LICENSE](LICENSE).