Closed
Description
Objective
Verify and debug Clustrix functionality with university-managed clusters that do not require API keys:
- SLURM cluster - Traditional HPC scheduler
- SSH cluster - Direct SSH-based execution
Context
Before investigating cloud provider issues, we need to establish that core cluster functionality works correctly with controlled university infrastructure.
Testing Approach
Local Jupyter Testing Required: University servers are reachable only over VPN, so testing must run from a local Jupyter server on a VPN-connected machine (not from Colab notebooks).
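A quick reachability check from the local machine can confirm the VPN is up before any cluster test runs. This is a minimal sketch using only the standard library; the function name `can_reach` and the default port 22 are assumptions for illustration, not part of Clustrix.

```python
import socket


def can_reach(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example (hypothetical hostname, replace with the real login node):
#   can_reach("login.cluster.example.edu")
```

A False result from a notebook cell usually means the VPN is down or the hostname is wrong, which is cheaper to find here than in a failed job submission.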
SLURM Cluster Testing
Configuration Requirements
- Cluster hostname and SSH access
- SLURM partition and account information
- SSH key authentication setup
- Module loading requirements (if any)
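The requirements above map fairly directly onto a SLURM batch script header. The sketch below shows one way to collect them and render the `#SBATCH` directives; the `SlurmSettings` class and its field names are hypothetical and do not reflect the actual schema in clustrix/config.py or the generator in clustrix/utils.py.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SlurmSettings:
    """Illustrative container; field names are assumptions, not Clustrix's schema."""
    partition: str
    account: str
    cores: int = 1
    memory: str = "4G"
    time_limit: str = "00:10:00"
    modules: List[str] = field(default_factory=list)


def render_batch_script(settings: SlurmSettings, command: str) -> str:
    """Render a batch script: sbatch directives, module loads, then the command."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={settings.partition}",
        f"#SBATCH --account={settings.account}",
        f"#SBATCH --cpus-per-task={settings.cores}",
        f"#SBATCH --mem={settings.memory}",
        f"#SBATCH --time={settings.time_limit}",
    ]
    # Module loads are optional; an empty list adds nothing.
    lines += [f"module load {name}" for name in settings.modules]
    lines.append(command)
    return "\n".join(lines) + "\n"
```

Rendering the script as a string first makes it easy to inspect in the notebook before anything is submitted to the cluster.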
Test Cases
- Basic job submission and execution
- Resource specification (cores, memory, time)
- Job status monitoring and result retrieval
- Parallel loop execution
- Error handling and cleanup
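For the submission and monitoring cases, two small helpers are often enough: one to pull the job ID out of the default `sbatch` output ("Submitted batch job <id>"), and one to bucket a SLURM state string into succeeded, failed, or still active. The helper names and the three buckets are assumptions for this sketch, not Clustrix's own status handling.

```python
import re

# Subset of SLURM job states treated as terminal in this sketch.
TERMINAL_OK = {"COMPLETED"}
TERMINAL_FAILED = {"FAILED", "CANCELLED", "TIMEOUT", "NODE_FAIL", "OUT_OF_MEMORY"}


def parse_job_id(sbatch_stdout: str) -> int:
    """Extract the job ID from sbatch's default 'Submitted batch job N' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_stdout!r}")
    return int(match.group(1))


def classify_state(state: str) -> str:
    """Map a state string to 'succeeded', 'failed', or 'active'."""
    words = state.split()
    # sacct may report 'CANCELLED by <uid>' or a truncated 'CANCELLED+'.
    name = words[0].rstrip("+") if words else ""
    if name in TERMINAL_OK:
        return "succeeded"
    if name in TERMINAL_FAILED:
        return "failed"
    # PENDING, RUNNING, COMPLETING, and anything unknown keep polling.
    return "active"
```

Treating unknown states as active is a deliberate choice here: a polling loop with a timeout will still end, whereas misreading a transient state as failure would abort a healthy job.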
SSH Cluster Testing
Configuration Requirements
- Remote server hostname and SSH access
- Python environment setup on remote server
- Working directory permissions
- SSH key authentication
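Most of these requirements can be checked with a single non-interactive SSH command before any Clustrix code runs. The sketch below only builds the argument list; `build_ssh_check` and the remote command string are hypothetical, while `BatchMode`, `ConnectTimeout`, and `-i` are standard OpenSSH options.

```python
from typing import List, Optional


def build_ssh_check(
    host: str,
    user: str,
    key_path: Optional[str] = None,
    timeout: int = 10,
) -> List[str]:
    """Build an ssh command that fails fast instead of prompting for a password."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout}"]
    if key_path:
        # Use a specific private key instead of the agent or defaults.
        cmd += ["-i", key_path]
    # Remote side: report the Python version, then confirm the login
    # directory is writable.
    cmd += [f"{user}@{host}", "python3 --version && test -w . && echo WORKDIR_OK"]
    return cmd
```

The list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`; a non-zero return code with `BatchMode=yes` typically points at key authentication rather than at the network.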
Test Cases
- Basic function execution over SSH
- File transfer and cleanup
- Environment replication
- Error handling and connection recovery
- Multiple concurrent jobs
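The concurrency and error-handling cases can be exercised with a small harness that submits several jobs at once and records failures per job instead of stopping at the first one. In this sketch `run_job` is a stand-in for whatever remote call is under test (for example, a Clustrix-decorated function); the harness itself is an assumption for illustration and uses only the standard library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_concurrently(
    run_job: Callable[[int], int],
    inputs: List[int],
    max_workers: int = 4,
) -> Tuple[Dict[int, int], Dict[int, str]]:
    """Run run_job on each input in parallel threads.

    Returns (results, errors), both keyed by input value, so inputs
    are expected to be unique.
    """
    results: Dict[int, int] = {}
    errors: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {value: pool.submit(run_job, value) for value in inputs}
        for value, future in futures.items():
            try:
                results[value] = future.result()
            except Exception as exc:
                # Keep going so one dropped connection does not hide the rest.
                errors[value] = repr(exc)
    return results, errors
```

Threads fit here because each job spends its time waiting on the network, not on local CPU.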
Implementation Plan
- Set Up Local Jupyter Environment: Configure a local development environment with VPN access
- SLURM Configuration: Create and test university SLURM cluster configuration
- SSH Configuration: Create and test university SSH server configuration
- Systematic Testing: Execute comprehensive test suite for both cluster types
- Documentation: Document any configuration requirements or limitations discovered
Success Criteria
- SLURM cluster executes jobs successfully with proper resource allocation
- SSH cluster executes jobs successfully with proper environment setup
- Both configurations handle errors gracefully
- Job submission and result retrieval complete in reasonable time given VPN and university network latency
- Configuration examples are documented for other university users
Related Files
- clustrix/executor.py (core execution logic)
- clustrix/config.py (configuration management)
- clustrix/utils.py (job script generation)
- tests/test_executor.py (execution tests)
Priority
High - Establishes baseline functionality before cloud provider debugging