hiroshitashir/AIDataRoleGuard


AIDataRoleGuard 🧠🛡️

AI-Powered SQL Governance & Query Rewriting Library

AIDataRoleGuard is an intelligent SQL rewriting engine designed for the age of autonomous data access. As AI agents increasingly generate and execute SQL queries, whether in analytics tools, automation scripts, or natural language interfaces, AIDataRoleGuard keeps sensitive data protected through role-based masking and granular permission enforcement.


📣 Why AIDataRoleGuard?

AI systems are amazing at discovering patterns, asking complex questions, and generating SQL on the fly, but they often lack contextual awareness of data sensitivity and organizational policy.

AIDataRoleGuard steps in as the safeguard between generated SQL and your database, rewriting queries to enforce access rules and protect sensitive information. Whether the SQL is written by humans or by AI, the library ensures it only retrieves what the user's role allows.

🧠 AI is great at asking questions. AIDataRoleGuard makes sure it never asks for answers it shouldn't see.


🏗️ Architecture Overview

AIDataRoleGuard acts as an intelligent middleware layer between your application and database, intercepting and rewriting SQL queries based on role-based policies:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   AI Agent      │    │  Application    │    │ AIDataRoleGuard │    │    Database     │
│   Generated     │───▶│     Layer       │───▶│   SQL Rewriter  │───▶│   (MySQL/PG/    │
│     SQL         │    │                 │    │                 │    │   SQLite/etc)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
                                              ┌─────────────────┐
                                              │  Role Config    │
                                              │  & Audit Log    │
                                              └─────────────────┘

How It Works:

  1. Query Interception: Captures SQL before it reaches the database
  2. Role Resolution: Identifies the current user's role and permissions
  3. Policy Application: Applies masking, filtering, and access rules
  4. Query Rewriting: Transforms the original SQL to enforce policies
  5. Audit Logging: Records all transformations for compliance and monitoring
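The five steps above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the library's actual API: `guardQuery`, `GuardDeps`, `UserContext`, and `AuditEntry` are hypothetical names introduced here, and the rewriting, auditing, and execution steps are passed in as plain callbacks.

```typescript
// Hypothetical types for illustration; not part of the library's documented API.
interface UserContext {
  userId: string;
  role: string;
}

interface AuditEntry {
  userId: string;
  role: string;
  original: string;
  rewritten: string;
}

interface GuardDeps {
  // Steps 3 and 4: returns the policy-enforcing version of the query for a role.
  rewriteForRole: (sql: string, role: string) => string;
  // Step 5: persists one audit record.
  audit: (entry: AuditEntry) => void;
  // Runs the final SQL against the database.
  execute: (sql: string) => unknown;
}

function guardQuery(sql: string, user: UserContext, deps: GuardDeps): unknown {
  // 1. Query interception: the caller hands the SQL to the guard, not the database.
  const original = sql.trim();
  // 2. Role resolution: in this sketch the role is read straight from the context.
  const role = user.role;
  // 3-4. Policy application and query rewriting.
  const rewritten = deps.rewriteForRole(original, role);
  // 5. Audit logging: record both versions before anything is executed.
  deps.audit({ userId: user.userId, role, original, rewritten });
  // Only the rewritten SQL ever reaches the database.
  return deps.execute(rewritten);
}
```

Keeping the database call behind `execute` means the application has no code path that sends the original, unrewritten SQL.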

🚀 Key Features

  • AI-Powered SQL Rewriting: Dynamically transforms SELECT, INSERT, UPDATE, and DELETE based on role config.
  • Policy-Driven Architecture: Use clean JSON/YAML to declare access, masking, and mutations.
  • Stack-Agnostic Design: Plug it into any backend that emits SQL: Node.js, Python, Java, Go.
  • Field-Level Masking: Replace sensitive columns with safe placeholders like '***'.
  • Audit Logging: Track every rewrite for compliance, debugging, and behavioral analytics.

🎯 Use Cases

🤖 AI-Powered Analytics Platforms

  • RAG Systems: Protect customer PII when AI agents query knowledge bases
  • Business Intelligence: Ensure AI-generated reports respect data governance
  • Natural Language Query: Shield sensitive data in conversational analytics tools
  • Automated Insights: Control what data AI can access for pattern discovery

🔧 Enterprise Applications

  • Multi-Tenant SaaS: Automatically enforce tenant isolation in shared databases
  • Role-Based Dashboards: Dynamically filter data based on user permissions
  • API Gateways: Add data governance to microservices without code changes
  • Data Exploration Tools: Allow analysts to explore data within security boundaries

🏢 Compliance & Governance

  • GDPR Compliance: Automatically mask personal data for non-privileged users
  • SOX Controls: Ensure financial data access follows strict authorization rules
  • Healthcare (HIPAA): Protect patient information in medical record systems
  • Financial Services: Maintain data privacy in banking and trading applications

🛠️ Development & Testing

  • Data Anonymization: Automatically sanitize production data for development environments
  • A/B Testing: Control feature access based on user segments and roles
  • Staging Environments: Ensure test environments don't expose sensitive production data

📜 Role Config Example

roles:
  - name: ai-agent
    permissions:
      - resource: customer_data
        actions: [select]
        mask: ["email", "creditCard", "ssn"]
  - name: admin
    permissions:
      - resource: customer_data
        actions: [select, insert, update, delete]
        mask: []

🔁 Query Rewrite Example

Input:

SELECT customerName, email FROM customer_data;

Output for ai-agent role:

SELECT customerName, '***' AS email FROM customer_data;

🏛️ Implementation Architecture

Core Components

1. SQL Processor & Rewriter

├── SQLProcessor
│   ├── processQuery(query: string, permissions: Permission[]) → string
│   ├── extractTableNames(query: string) → string[]
│   ├── extractColumns(query: string) → string[]
│   ├── maskColumns(sql: string, maskingRules: MaskingRule[]) → string
│   ├── addWhereFilters(sql: string, filters: string[]) → string
│   ├── injectRowLevelSecurity(sql: string, userContext: UserContext) → string
│   ├── validateBasicSyntax(query: string) → boolean
│   └── isSelectQuery(query: string) → boolean
  • All-in-one SQL processing - extraction, validation, and rewriting
  • Direct string manipulation using regex patterns
  • Fast and simple - perfect for AI-generated queries
  • Focus on SELECT statements - primary use case for AI agents
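As a rough illustration of the regex-based approach, the extraction helpers might look like the sketch below. The function names come from the component outline, but the bodies are assumptions: they only target simple, single-statement queries without subqueries, quoted identifiers, comments, or function calls that contain commas.

```typescript
// Regex-based helpers; a sketch that assumes simple, single-statement queries.

function isSelectQuery(query: string): boolean {
  // True when the first keyword of the statement is SELECT.
  return /^\s*select\b/i.test(query);
}

function extractTableNames(query: string): string[] {
  // Collect the identifier that follows each FROM or JOIN keyword.
  const pattern = /\b(?:from|join)\s+([A-Za-z_][\w.]*)/gi;
  const tables: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(query)) !== null) {
    tables.push(match[1]);
  }
  return tables;
}

function extractColumns(query: string): string[] {
  // Take the text between SELECT and the first FROM, then split on commas.
  // Naive on purpose: an expression such as COALESCE(a, b) would be split in two.
  const match = /^\s*select\s+([\s\S]*?)\s+from\b/i.exec(query);
  if (match === null) return [];
  return match[1].split(",").map((column) => column.trim());
}
```

For the query in the rewrite example, `extractTableNames` yields `customer_data` and `extractColumns` yields `customerName` and `email`.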

2. Role & Permission Engine

├── RoleEngine
│   ├── resolveRole(context: UserContext) → Role
│   ├── getPermissions(role: Role, resource: string) → Permission[]
│   ├── canAccess(role: Role, action: string, resource: string) → boolean
│   └── getMaskingRules(role: Role, resource: string) → MaskingRule[]
  • Manages role resolution from user context
  • Enforces permission checks
  • Provides masking rules for sensitive fields
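A minimal in-memory version of this engine could look like the following sketch. `InMemoryRoleEngine`, the simplified `Permission` and `Role` shapes (modeled on the basic role config, where `mask` is a list of column names), and the deny-by-default handling of unknown roles are all assumptions made for illustration.

```typescript
// Hypothetical in-memory shapes mirroring the basic YAML role config.
interface Permission {
  resource: string;
  actions: string[];
  mask: string[];
}

interface Role {
  name: string;
  permissions: Permission[];
}

class InMemoryRoleEngine {
  private readonly roles: Role[];

  constructor(roles: Role[]) {
    this.roles = roles;
  }

  // Unknown roles resolve to a role with no permissions (assumed deny-by-default).
  resolveRole(context: { role: string }): Role {
    const found = this.roles.filter((role) => role.name === context.role)[0];
    return found !== undefined ? found : { name: context.role, permissions: [] };
  }

  getPermissions(role: Role, resource: string): Permission[] {
    return role.permissions.filter((p) => p.resource === resource);
  }

  canAccess(role: Role, action: string, resource: string): boolean {
    return this.getPermissions(role, resource).some(
      (p) => p.actions.indexOf(action) !== -1
    );
  }

  // Union of masked columns across every permission on the resource.
  getMaskedColumns(role: Role, resource: string): string[] {
    const columns: string[] = [];
    for (const p of this.getPermissions(role, resource)) {
      for (const column of p.mask) {
        if (columns.indexOf(column) === -1) columns.push(column);
      }
    }
    return columns;
  }
}
```

Resolving unknown roles to an empty permission set means a misconfigured caller is denied access instead of inheriting another role's rights.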

3. Policy Configuration Manager

├── PolicyManager
│   ├── loadPolicies(source: string) → Policy[]
│   ├── validatePolicy(policy: Policy) → ValidationResult
│   ├── refreshPolicies() → void
│   └── getPolicyForResource(resource: string) → Policy
  • Loads role configurations from YAML/JSON
  • Validates policy syntax and logic
  • Supports hot-reloading of policies
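To make the validation step concrete, here is a hand-rolled structural check for one parsed role entry, such as the output of a YAML or JSON parser. It is a sketch only: `validateRolePolicy` and the exact error messages are hypothetical, and a schema library like joi or zod could do the same job.

```typescript
interface ValidationResult {
  valid: boolean;
  errors: string[];
}

// The four actions that appear in the role config examples.
const ALLOWED_ACTIONS = ["select", "insert", "update", "delete"];

// Checks the shape of one parsed role entry and collects every problem found.
function validateRolePolicy(role: unknown): ValidationResult {
  if (typeof role !== "object" || role === null) {
    return { valid: false, errors: ["role must be an object"] };
  }
  const errors: string[] = [];
  const r = role as { name?: unknown; permissions?: unknown };

  if (typeof r.name !== "string" || r.name.length === 0) {
    errors.push("role.name must be a non-empty string");
  }
  if (!Array.isArray(r.permissions)) {
    errors.push("role.permissions must be a list");
    return { valid: false, errors };
  }

  r.permissions.forEach((entry: unknown, i: number) => {
    if (typeof entry !== "object" || entry === null) {
      errors.push(`permissions[${i}] must be an object`);
      return;
    }
    const p = entry as { resource?: unknown; actions?: unknown };
    if (typeof p.resource !== "string") {
      errors.push(`permissions[${i}].resource must be a string`);
    }
    const actions: unknown[] = Array.isArray(p.actions) ? p.actions : [];
    if (actions.length === 0) {
      errors.push(`permissions[${i}].actions must be a non-empty list`);
    }
    actions.forEach((action: unknown) => {
      if (typeof action !== "string" || ALLOWED_ACTIONS.indexOf(action) === -1) {
        errors.push(`permissions[${i}]: unknown action "${String(action)}"`);
      }
    });
  });

  return { valid: errors.length === 0, errors };
}
```

Collecting all errors instead of stopping at the first one makes a rejected hot-reload easier to fix in a single pass.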

4. Audit & Logging System

├── AuditLogger
│   ├── logQueryRewrite(original: string, rewritten: string, context: AuditContext) → void
│   ├── logAccessAttempt(user: string, resource: string, action: string) → void
│   ├── logPolicyViolation(violation: PolicyViolation) → void
│   └── generateAuditReport(timeRange: TimeRange) → AuditReport
  • Records all query transformations
  • Tracks access attempts and violations
  • Generates compliance reports
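The logging and reporting behavior can be sketched with an in-memory store. `InMemoryAuditLogger` and the fields of `AuditContext`, `TimeRange`, and `AuditReport` are assumptions for illustration; the outline above does not define them, and a production logger would write to durable storage.

```typescript
// Hypothetical shapes; the component outline does not define these fields.
interface AuditContext {
  user: string;
  timestamp: Date;
}

interface TimeRange {
  from: Date;
  to: Date;
}

interface AuditReport {
  total: number;          // queries seen in the range
  rewrittenCount: number; // queries the rewriter actually changed
  byUser: Record<string, number>;
}

interface AuditRecord {
  original: string;
  rewritten: string;
  context: AuditContext;
}

class InMemoryAuditLogger {
  private readonly records: AuditRecord[] = [];

  logQueryRewrite(original: string, rewritten: string, context: AuditContext): void {
    this.records.push({ original, rewritten, context });
  }

  generateAuditReport(range: TimeRange): AuditReport {
    // Compare epoch milliseconds so the range check is explicit and inclusive.
    const inRange = this.records.filter((record) => {
      const t = record.context.timestamp.getTime();
      return t >= range.from.getTime() && t <= range.to.getTime();
    });
    const byUser: Record<string, number> = {};
    for (const record of inRange) {
      byUser[record.context.user] = (byUser[record.context.user] || 0) + 1;
    }
    return {
      total: inRange.length,
      rewrittenCount: inRange.filter((r) => r.original !== r.rewritten).length,
      byUser,
    };
  }
}
```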

System Architecture Layers

┌─────────────────────────────────────────────────────────────────┐
│                        Application Layer                        │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │   AI Agents     │  │   Web Apps      │  │   APIs          │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      AIDataRoleGuard Core                       │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ SQL Processor   │  │  Role Engine    │  │ Policy Manager  │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│  ┌─────────────────┐                       ┌─────────────────┐  │
│  │ Audit Logger    │                       │ Cache Layer     │  │
│  └─────────────────┘                       └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                         Database Layer                          │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │     MySQL       │  │   PostgreSQL    │  │    SQLite       │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Data Models

Role Configuration Schema

# Enhanced role configuration with more granular controls
roles:
  - name: ai-agent
    permissions:
      - resource: customer_data
        actions: [select]
        conditions:
          - field: created_date
            operator: ">="
            value: "2024-01-01"
        mask:
          - field: email
            type: partial
            pattern: "***@domain.com"
          - field: ssn
            type: full
            replacement: "***-**-****"
          - field: creditCard
            type: tokenize
            algorithm: "sha256"
        row_filters:
          - "status = 'active'"
          - "tenant_id = ${user.tenant_id}"
      - resource: analytics_data
        actions: [select, insert]
        aggregate_only: true  # Only allow aggregate functions
        
  - name: data-analyst
    permissions:
      - resource: customer_data
        actions: [select]
        mask:
          - field: ssn
            type: full
            replacement: "***-**-****"
        time_restrictions:
          - days: ["monday", "tuesday", "wednesday", "thursday", "friday"]
          - hours: ["09:00", "17:00"]
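The `row_filters` entry `tenant_id = ${user.tenant_id}` implies a substitution step before the filter is injected into SQL. The sketch below shows one way that could work; `renderRowFilter`, `toSqlLiteral`, and the `UserAttributes` type are hypothetical, and the quote-doubling escape assumes standard SQL string literals (where the database driver allows it, bound parameters are the safer choice).

```typescript
type UserAttributes = Record<string, string | number>;

// Renders a SQL literal: numbers as-is, strings single-quoted with quotes doubled.
function toSqlLiteral(value: string | number): string {
  if (typeof value === "number") {
    if (!isFinite(value)) throw new Error("non-finite number in row filter");
    return String(value);
  }
  return "'" + value.replace(/'/g, "''") + "'";
}

// Replaces each ${user.<name>} placeholder with the matching user attribute.
function renderRowFilter(template: string, user: UserAttributes): string {
  return template.replace(/\$\{user\.(\w+)\}/g, (_match: string, name: string) => {
    // Fail closed: a filter that cannot be rendered must not be silently dropped.
    if (!Object.prototype.hasOwnProperty.call(user, name)) {
      throw new Error(`row filter references unknown attribute user.${name}`);
    }
    return toSqlLiteral(user[name]);
  });
}
```

With a user whose `tenant_id` is the number 42, the template above renders as `tenant_id = 42`, while a filter without placeholders such as `status = 'active'` passes through unchanged.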

Implementation Strategy

  1. Phase 1: Core String Processor (MVP)

    • Regex-based SQL parsing for SELECT statements
    • Column masking with simple replacements
    • Basic WHERE clause injection
    • Target: 90% of AI-generated queries
  2. Phase 2: Enhanced String Processing

    • Support for INSERT/UPDATE/DELETE operations
    • Advanced masking patterns (partial, tokenization)
    • Multi-table query support
    • Target: Handle edge cases and expand coverage
  3. Phase 3: Performance & Scale

    • Query result caching
    • Policy hot-reloading
    • Audit logging optimization
    • Target: Production-ready performance
  4. Phase 4: AI-Enhanced Features

    • Query pattern analysis
    • Automated policy recommendations
    • Anomaly detection
    • Target: Intelligent governance

Core Implementation Example

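// Simplified supporting types so the example compiles on its own
interface MaskingRule { column: string; replacement: string; }
interface Permission {
  resource: string;
  maskingRules: MaskingRule[];
  rowFilters: string[];
}
interface AuditContext { role: string; }
interface RoleEngine {
  getPermissions(role: string): Permission[];
}
interface AuditLogger {
  logQueryRewrite(original: string, rewritten: string, context: AuditContext): void;
}

class AIDataRoleGuard {
  constructor(
    private sqlProcessor: SQLProcessor,
    private roleEngine: RoleEngine,
    private auditLogger: AuditLogger
  ) {}

  rewriteQuery(sql: string, userRole: string): string {
    // 1. Get permissions for user role
    const permissions = this.roleEngine.getPermissions(userRole);

    // 2. Process and rewrite query in one step
    const rewrittenSQL = this.sqlProcessor.processQuery(sql, permissions);

    // 3. Log the transformation
    this.auditLogger.logQueryRewrite(sql, rewrittenSQL, { role: userRole });

    return rewrittenSQL;
  }
}

class SQLProcessor {
  processQuery(sql: string, permissions: Permission[]): string {
    // 1. Keep only the permissions for tables the query references
    const tables = this.extractTableNames(sql).map((t) => t.toLowerCase());
    const relevant = permissions.filter((p) => tables.includes(p.resource.toLowerCase()));

    // 2. Apply masking rules
    let rewrittenSQL = this.maskColumns(sql, relevant.flatMap((p) => p.maskingRules));

    // 3. Add row-level filters
    rewrittenSQL = this.addWhereFilters(rewrittenSQL, relevant.flatMap((p) => p.rowFilters));

    return rewrittenSQL;
  }

  private extractTableNames(sql: string): string[] {
    // Collect the identifiers that follow FROM or JOIN
    return [...sql.matchAll(/\b(?:from|join)\s+(\w+)/gi)].map((m) => m[1]);
  }

  private maskColumns(sql: string, maskingRules: MaskingRule[]): string {
    // Rewrite only the select list (between SELECT and the first FROM) so that
    // column references in WHERE, JOIN, or ORDER BY clauses stay valid SQL
    const match = /^(\s*select\s+)([\s\S]*?)(\s+from\b[\s\S]*)$/i.exec(sql);
    if (!match) return sql;
    const [, head, selectList, tail] = match;

    let maskedList = selectList;
    for (const rule of maskingRules) {
      const pattern = new RegExp(`\\b${rule.column}\\b`, 'gi');
      maskedList = maskedList.replace(pattern, `'${rule.replacement}' AS ${rule.column}`);
    }

    return head + maskedList + tail;
  }

  private addWhereFilters(sql: string, filters: string[]): string {
    if (filters.length === 0) return sql;
    const policy = filters.map((f) => `(${f})`).join(' AND ');

    // Existing WHERE: parenthesize the original condition so that an OR in it
    // cannot bypass the injected filters
    const where = /\bwhere\s+([\s\S]*?)(\s*)(?=\b(?:group\s+by|having|order\s+by|limit)\b|;|$)/i;
    if (where.test(sql)) {
      return sql.replace(where, (_m, cond, ws) => `WHERE ${policy} AND (${cond})${ws}`);
    }

    // No WHERE: insert one right after the first table name
    return sql.replace(/\bfrom\s+\w+/i, `$& WHERE ${policy}`);
  }
}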

Why This Approach Works for AI Queries

  1. AI queries are predictable: Most AI systems generate clean, simple SELECT statements
  2. Fast processing: String manipulation avoids the cost of building and walking an AST
  3. Easy to debug: Each regex pattern can be read, tested, and modified in isolation
  4. Sufficient coverage: Aims to handle the 90%+ of AI-generated queries that are simple SELECT statements (the Phase 1 target)
  5. Simple testing: Each string transformation can be unit tested as an input/output pair

Technology Stack Recommendations

  • Core Logic: TypeScript/Node.js for cross-platform compatibility
  • String Processing: Native regex with well-tested patterns
  • Configuration: YAML/JSON with simple validation
  • Caching: In-memory LRU cache for policies (Redis optional)
  • Logging: Simple structured logging (JSON format)
  • Testing: Jest with comprehensive regex pattern tests
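For the in-memory LRU cache mentioned above, a small class built on `Map` is usually enough. This is a generic sketch, not code from the library: `LruCache` is a hypothetical name, and it relies on the fact that a JavaScript `Map` iterates keys in insertion order.

```typescript
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so the key becomes the most recently used entry.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    // Delete first so an update also refreshes the key's position.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

A policy cache could key entries by role name, for example `new LruCache<string, object>(100)`, and clear or rebuild the cache whenever policies are hot-reloaded.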

Key Dependencies (Minimal):

  • js-yaml - YAML configuration parsing
  • joi or zod - Configuration validation
  • winston - Structured logging
  • No SQL parsing libraries needed!
