Lona Lona44

🔬 About Me

Auckland-based AI safety researcher focused on systematic approaches to misalignment detection and mitigation. 2nd place winner of the prestigious Palisade Research AI Misalignment Bounty, demonstrating reproducible misalignment behaviors in advanced AI models including o3 and GPT-5.

Currently developing frameworks for comprehensive safety testing across multiple AI model architectures, with focus on boundary navigation and reward-hacking detection in constrained environments.

Current Research

Unified AI Misalignment Framework Comprehensive system for systematic AI safety testing across multiple model implementations and reasoning paradigms. Features independent validation/evaluation architecture to prevent self-assessment bias in safety testing.

LLM RAG Prompt Injections Security research examining prompt injection vulnerabilities in retrieval-augmented generation systems.

Technical Focus

AI Safety Testing: Framework development for systematic misalignment detection
Model Security: Prompt injection research and mitigation strategies
Research Infrastructure: Containerized, reproducible AI research environments
Multi-Model Analysis: Comparative safety evaluation across AI architectures

Technical Stack

Languages: Python, JavaScript, Bash AI/ML: OpenAI API, Anthropic API, LiteLLM Web Development: Full-stack web applications, responsive design Infrastructure: Docker, Docker Compose Research Tools: Systematic evaluation frameworks, automated testing pipelines Professional Development: Mission Ready HQ Full Stack Developer (August 2025)

Research Context

Contributing to AI safety research through Approxiom Research, focusing on boundary navigation behaviors and architectural vulnerabilities in AI safety systems. Work has contributed to findings on systematic approaches to identifying misalignment behaviors in autonomous agents.

🏆 Research Achievements

Palisade Research Misalignment Bounty - 2nd Place Winner

Demonstrated reproducible misalignment behaviors in o3 and GPT-5 models
Identified AI agents' ability to overcome permission constraints and perform reward-hacking
Developed systematic methodology for testing boundary navigation in constrained environments
Contributed to understanding of architectural vulnerabilities in advanced AI systems

Recent Technical Contributions

Built multi-provider AI testing framework - Architected Docker-based system supporting OpenAI, Anthropic APIs with independent validation architecture, eliminating self-evaluation bias across 3+ model implementations
Developed automated code analysis suite - Created Python tools for technical debt assessment, duplicate code detection, and complexity analysis, reducing manual code review time and improving codebase maintainability
Implemented secure API routing system - Designed environment-configurable model selection preventing security vulnerabilities, with containerized deployment supporting multiple AI providers and failover mechanisms

🛠 Technical Expertise

🤝 Let's Connect

Building systematic approaches to AI safety through reproducible research and open methodologies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly