Skip to content
View Lona44's full-sized avatar

Block or report Lona44

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Please don't include any personal information such as legal names or email addresses. Maximum 250 characters, markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
Lona44/README.md
Typing Animation

LinkedIn Website Location


πŸ”¬ About Me

Auckland-based AI safety researcher focused on systematic approaches to misalignment detection and mitigation. 2nd place winner of the prestigious Palisade Research AI Misalignment Bounty, demonstrating reproducible misalignment behaviors in advanced AI models including o3 and GPT-5.

Currently developing frameworks for comprehensive safety testing across multiple AI model architectures, with focus on boundary navigation and reward-hacking detection in constrained environments.

Current Research

Unified AI Misalignment Framework Comprehensive system for systematic AI safety testing across multiple model implementations and reasoning paradigms. Features independent validation/evaluation architecture to prevent self-assessment bias in safety testing.

LLM RAG Prompt Injections Security research examining prompt injection vulnerabilities in retrieval-augmented generation systems.

Technical Focus

  • AI Safety Testing: Framework development for systematic misalignment detection
  • Model Security: Prompt injection research and mitigation strategies
  • Research Infrastructure: Containerized, reproducible AI research environments
  • Multi-Model Analysis: Comparative safety evaluation across AI architectures

Technical Stack

Languages: Python, JavaScript, Bash AI/ML: OpenAI API, Anthropic API, LiteLLM Web Development: Full-stack web applications, responsive design Infrastructure: Docker, Docker Compose Research Tools: Systematic evaluation frameworks, automated testing pipelines Professional Development: Mission Ready HQ Full Stack Developer (August 2025)

Research Context

Contributing to AI safety research through Approxiom Research, focusing on boundary navigation behaviors and architectural vulnerabilities in AI safety systems. Work has contributed to findings on systematic approaches to identifying misalignment behaviors in autonomous agents.

πŸ† Research Achievements

Palisade Research Misalignment Bounty - 2nd Place Winner

  • Demonstrated reproducible misalignment behaviors in o3 and GPT-5 models
  • Identified AI agents' ability to overcome permission constraints and perform reward-hacking
  • Developed systematic methodology for testing boundary navigation in constrained environments
  • Contributed to understanding of architectural vulnerabilities in advanced AI systems

Recent Technical Contributions

  • Built multi-provider AI testing framework - Architected Docker-based system supporting OpenAI, Anthropic APIs with independent validation architecture, eliminating self-evaluation bias across 3+ model implementations
  • Developed automated code analysis suite - Created Python tools for technical debt assessment, duplicate code detection, and complexity analysis, reducing manual code review time and improving codebase maintainability
  • Implemented secure API routing system - Designed environment-configurable model selection preventing security vulnerabilities, with containerized deployment supporting multiple AI providers and failover mechanisms

πŸ›  Technical Expertise

Award Winner Full Stack Python AI Safety Docker


🀝 Let's Connect

Research Website Full Resume Email

Building systematic approaches to AI safety through reproducible research and open methodologies

Pinned Loading

  1. LLM-RAG-Prompt-Injections LLM-RAG-Prompt-Injections Public

    Python

  2. MissionReadyHQ-Missions MissionReadyHQ-Missions Public

    HTML

  3. RWResearch RWResearch Public

    Purely educational research

    Python

  4. unified-ai-misalignment-framework unified-ai-misalignment-framework Public

    Single framework for testing AI misalignment across OpenAI and Anthropic models. Automatic routing between reasoning/non-reasoning APIs, standardised outputs, shared scenarios. Supports GPT-5, o3, …

    Python