
LLM-MusicTheory


A production-ready toolkit for designing and testing music theory prompts for large language models (LLMs). Features a modular architecture for composing reusable prompt components.

August 2025 migration: the primary dataset is now fux-counterpoint, with unified --file/--files identifiers (stems of the encoded filenames). The legacy --question/--questions flags are still accepted (hidden) for backward compatibility; treat them as aliases of --file/--files.

Quick environment bootstrap:

# Install Poetry (if not already)
curl -sSL https://install.python-poetry.org | python3 -
export PATH="$HOME/.local/bin:$PATH"
# Use in-project virtualenvs
poetry config virtualenvs.in-project true
# Install base dependencies
poetry install
# (Optionally) add providers: poetry install --with google --with anthropic
# Run tests
poetry run pytest -q

Python compatibility: tested on CPython 3.11–3.13 (any ^3.11 per pyproject).

πŸ“š Documentation

For detailed information, see the comprehensive documentation in the docs/ directory.

🎯 Built for researchers and developers working on AI music theory applications

✨ Key Features

  • πŸ”§ Modular Prompt Architecture: Compose prompts from reusable, testable components
  • πŸ€– Multi-LLM Provider Support: ChatGPT, Claude, and Gemini APIs
  • 🎡 Comprehensive Music Format Support: MEI, MusicXML, ABC notation, and Humdrum **kern
  • πŸ§ͺ Production-Grade Testing: 84% test coverage with comprehensive mock API validation
  • πŸ“Š Context-Aware Prompts: Toggle between contextual and non-contextual prompt modes
  • πŸ’Ύ Built-in Data Management: Integrated support for RCM exam questions and encoded music
  • πŸ› οΈ Developer Experience: Poetry dependency management, proper Python packaging, comprehensive documentation



⚑ Quick Start

Get up and running in under 2 minutes:

# 1. Clone and install
git clone https://github.com/liampond/LLM-MusicTheory.git
cd LLM-MusicTheory
poetry install

# 2. Configure API keys
cp .env.example .env
# Edit .env with your API keys (see Configuration section)

# 3. Test installation
poetry run pytest tests/test_models.py -v

# 4. Run your first prompt (new flags)
poetry run run-single --model chatgpt --file Q1b --datatype mei --context --dataset fux-counterpoint

# (Legacy alias still works) --question Q1b

# 5. Run batch processing (supports provider names or specific model names)
poetry run run-batch --models chatgpt,claude --files Q1b Q1c --datatypes mei,abc --dataset fux-counterpoint
# Or use specific model names:
poetry run run-batch --models gpt-4o,claude-3-sonnet --files Q1b --datatypes mei --dataset fux-counterpoint

πŸŽ‰ That's it! You're ready to start experimenting with music theory prompts.

πŸ”§ Installation

Prerequisites

Step-by-Step Setup

1. Clone the Repository

git clone https://github.com/liampond/LLM-MusicTheory.git
cd LLM-MusicTheory

2. Install Poetry (if needed)

Using pipx (recommended):

pip install pipx
pipx install poetry

Using official installer:

curl -sSL https://install.python-poetry.org | python3 -

3. Install Dependencies

# Install all dependencies in a virtual environment
poetry install

# Verify installation (requires an API key; see Configuration below)
poetry run run-single --model chatgpt --file Q1b --datatype mei --context --dataset fux-counterpoint

4. Verify Setup

# Run a quick test to ensure everything works
poetry run pytest tests/test_path_utils.py -v

If you see tests passing, you're ready to go! πŸŽ‰

βš™οΈ Configurationnstall Poetry (if you don't have it)**


Poetry Environment

Activate the Poetry-managed virtual environment and confirm your Python version:

poetry shell
python --version

Troubleshooting:

  • If you get a poetry: command not found error, make sure Poetry is on your PATH (export PATH="$HOME/.local/bin:$PATH") and restart your terminal.
  • If you get errors about missing dependencies, run poetry lock --no-update and then poetry install again.
  • If your virtual environment resolves the wrong Python version, point Poetry at the correct interpreter:
    poetry env use python3.11  # or your preferred version >=3.11

Environment Variables

API Keys Setup

You need to provide your own API keys for the LLM providers you want to use.

1. Copy the Environment Template

cp .env.example .env

2. Add Your API Keys

Edit .env and add your API keys:

# Add your actual API keys (one or more required)
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here  
GOOGLE_API_KEY=your-google-api-key-here
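
For reference, here is a minimal sketch of how these keys can be read at runtime. It assumes the project loads .env via python-dotenv; the actual loading logic lives in src/llm_music_theory/config/settings.py and may differ.

import os
from dotenv import load_dotenv  # assumption: python-dotenv is the loader

load_dotenv()  # reads .env from the project root into the process environment

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")
GOOGLE_API_KEY = os.environ.get("GOOGLE_API_KEY")

# At least one provider key must be present for the corresponding CLI runs.
if not any([OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY]):
    raise RuntimeError("No API key found; set at least one key in .env")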

3. Get API Keys

Provider    Sign Up                  Pricing            Free Tier
OpenAI      platform.openai.com      $0.002/1K tokens   $5 credit
Anthropic   console.anthropic.com    $0.003/1K tokens   $5 credit
Google      ai.google.dev            $0.001/1K tokens   1M tokens/day

πŸ’° Cost Management: Use Google's generous free tier for development. Monitor usage in provider dashboards.

Model Configuration

Default models are optimized for cost and performance. Customize in src/llm_music_theory/config/settings.py:

# Current defaults (cost-effective)
OPENAI_MODEL = "gpt-4o-mini"      # $0.0002/1K tokens
ANTHROPIC_MODEL = "claude-3-haiku" # $0.0003/1K tokens  
GOOGLE_MODEL = "gemini-1.5-flash"  # Free tier available

🎯 Usage

Command Line Interface

The toolkit provides two CLI commands for different use cases:

Single Prompt Execution

Run one prompt at a time for testing and development:

# Basic usage
poetry run run-single --model chatgpt --file Q1b --datatype mei --context --dataset fux-counterpoint

# Advanced usage with all parameters
poetry run run-single \
  --model claude \
  --file Q1a \
  --datatype musicxml \
  --context \
  --temperature 0.7 \
  --max-tokens 1000 \
  --save

Batch Processing

Run multiple prompts automatically for experiments:

# Test multiple models on the same prompt (provider names)
poetry run run-batch --models chatgpt,claude,gemini --files Q1b --datatypes mei

# Or use specific model names with auto-detection
poetry run run-batch --models gpt-4o,claude-3-sonnet,gemini-1.5-pro --files Q1b --datatypes mei

# Full experiment across all combinations
poetry run run-batch \
  --models chatgpt,claude \
  --files Q1a,Q1b,Q2a \
  --datatypes mei,musicxml \
  --context \
  --temperature 0.0

Available Options

Option         Required Description                              Example Values
--model(s)     βœ…       LLM provider(s)                          chatgpt, claude, gemini
--file(s)      βœ…       File ID(s); legacy alias --question(s)   Q1a, Q1b, Q2a
--datatype(s)  βœ…       Music encoding(s)                        mei, musicxml, abc, humdrum
--context      ❌       Include context guides                   flag (present = with context)
--temperature  ❌       Sampling creativity                      0.0 to 2.0 (default: 0.0)
--max-tokens   ❌       Response length limit                    500, 1000, 2000
--save         ❌       Save responses to files                  flag

Discovery Commands

Explore available data before running prompts:

# List available resources
poetry run run-single --list-files        # Preferred: shows Q1a, Q1b, Q2a, ...
poetry run run-single --list-questions    # Legacy alias for --list-files
poetry run run-single --list-datatypes    # Shows: mei, musicxml, abc, humdrum
poetry run run-single --list-guides       # Shows: harmonic_analysis, intervals, ...

# See everything at once
poetry run run-single --list-all

Python API

For programmatic usage and custom experiments:

from llm_music_theory.core.runner import PromptRunner
from llm_music_theory.core.dispatcher import get_llm

# Initialize LLM
llm = get_llm("chatgpt")

# Create and run prompt
runner = PromptRunner(
    model=llm,
    question_number="Q1b",
    datatype="mei", 
    context=True,
    temperature=0.0,
    save=True
)

response = runner.run()
print(f"LLM Response: {response}")

Advanced Python Usage

from llm_music_theory.prompts.prompt_builder import PromptBuilder
from llm_music_theory.models.base import PromptInput

# Custom prompt building
builder = PromptBuilder()
prompt_input = builder.build_prompt_input(
    question_number="Q1a",
    datatype="musicxml",
    context=True,
    temperature=0.5
)

# Direct LLM querying
llm = get_llm("claude")
response = llm.query(prompt_input)

Settings and Configuration

Settings can be changed in src/llm_music_theory/config/settings.py. Currently, the only configurable settings are the model identifiers; each model differs in price, performance, and niche. See each provider's documentation for available models, pricing, and their string identifiers.

Run a Single Prompt

You can run a single music theory prompt against any supported LLM with the run-single command (src/llm_music_theory/cli/run_single.py). It combines your modular prompt components, sends the query to the selected API, and prints the model's response.

Example command (new syntax):

poetry run run-single --model gemini --file Q1b --datatype mei --context --dataset fux-counterpoint

The legacy alias is still accepted: --question Q1b.

Common Flags (updated)

  • --model (required): LLM provider: chatgpt, claude, gemini
  • --file (required): File ID (stem of encoded file, e.g. Q1b)
  • --datatype (required): Encoding format: mei, musicxml, abc, humdrum
  • --context: Include contextual guides
  • --dataset: Dataset folder inside --data-dir (default: fux-counterpoint)
  • --temperature: Sampling temperature (default: 0.0)
  • --max-tokens: Optional max tokens
  • --save: Persist response under outputs
  • --data-dir: Root data directory (default: ./data)
  • --outputs-dir: Output root (default: ./outputs)

Legacy aliases: --question maps to --file (hidden); --examdate is retained for the old RCM layout but ignored for the new dataset.

Listing Available Resources

You can list the available files (and legacy questions), datatypes, and guides:

  • --list-files (preferred)
  • --list-questions (legacy alias β†’ same as list-files)
  • --list-datatypes
  • --list-guides

Example:

poetry run run-single --list-files --dataset fux-counterpoint

πŸ—οΈ Architecture

Dataset Layout (new)

data/
    fux-counterpoint/
        encoded/
            mei/        # MEI files (Q1b.mei, ...)
            musicxml/
            abc/
            humdrum/
        prompts/
            base/       # base_<datatype>.txt templates
            prompt.md   # unified question text (replaces per-question files)
        guides/       # optional contextual guide .txt/.md files

Legacy RCM layout (now renamed RCM6, still minimally supported for tests) used: data/RCM6/encoded/<ExamDate>/<datatype>/<Q>.mei and per-question prompt files under prompts/questions/<context|no_context>/<datatype>/Qx.txt.
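
To make the mapping concrete, here is a sketch of how a --file stem resolves against this layout. The helper below is hypothetical (the project's actual resolution lives in src/llm_music_theory/utils/path_utils.py), and the file extensions are assumptions.

from pathlib import Path

# Assumed extensions per datatype; the real mapping may differ.
EXTENSIONS = {"mei": ".mei", "musicxml": ".musicxml", "abc": ".abc", "humdrum": ".krn"}

def find_encoded_file(data_dir: Path, dataset: str, stem: str, datatype: str) -> Path:
    """Hypothetical helper: resolve a --file stem (e.g. 'Q1b') to an encoded file."""
    candidate = data_dir / dataset / "encoded" / datatype / f"{stem}{EXTENSIONS[datatype]}"
    if not candidate.exists():
        raise FileNotFoundError(candidate)
    return candidate

# find_encoded_file(Path("data"), "fux-counterpoint", "Q1b", "mei")
# -> data/fux-counterpoint/encoded/mei/Q1b.mei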

Project Structure

LLM-MusicTheory/
β”œβ”€β”€ src/llm_music_theory/           # Main package
β”‚   β”œβ”€β”€ cli/                        # Command-line interfaces
β”‚   β”‚   β”œβ”€β”€ run_single.py          # Single prompt execution
β”‚   β”‚   └── run_batch.py           # Batch processing
β”‚   β”œβ”€β”€ config/                     # Configuration management
β”‚   β”‚   └── settings.py            # Environment and model settings
β”‚   β”œβ”€β”€ core/                       # Core business logic
β”‚   β”‚   β”œβ”€β”€ dispatcher.py          # LLM provider selection
β”‚   β”‚   └── runner.py              # Prompt execution engine
β”‚   β”œβ”€β”€ models/                     # LLM provider implementations
β”‚   β”‚   β”œβ”€β”€ base.py                # Abstract base classes
β”‚   β”‚   β”œβ”€β”€ chatgpt.py             # OpenAI ChatGPT
β”‚   β”‚   β”œβ”€β”€ claude.py              # Anthropic Claude
β”‚   β”‚   └── gemini.py              # Google Gemini
β”‚   β”œβ”€β”€ prompts/                    # Prompt building system
β”‚   β”‚   └── prompt_builder.py      # Modular prompt composition
β”‚   └── utils/                      # Utility functions
β”‚       β”œβ”€β”€ logger.py              # Logging configuration
β”‚       └── path_utils.py          # File and path utilities
β”œβ”€β”€ data/RCM6/                      # Legacy data (read-only, formerly LLM-RCM)
β”‚   β”œβ”€β”€ encoded/                    # Music files in various formats
β”‚   β”œβ”€β”€ prompts/                    # Base prompt templates
β”‚   β”œβ”€β”€ guides/                     # Context guides for prompts
β”‚   └── questions/                  # Question templates
β”œβ”€β”€ tests/                          # Comprehensive test suite
β”œβ”€β”€ docs/                           # Documentation
β”‚   β”œβ”€β”€ user-guide.md               # Usage instructions
β”‚   β”œβ”€β”€ architecture.md             # System design
β”‚   β”œβ”€β”€ api-reference.md            # API documentation
β”‚   β”œβ”€β”€ development.md              # Development setup
β”‚   β”œβ”€β”€ examples.md                 # Usage examples
β”‚   └── scripts.md                  # Automation scripts
β”œβ”€β”€ examples/                       # Usage examples and tutorials
└── scripts/                       # Development and automation scripts

Design Principles

  • 🧩 Modular Architecture: Each component has a single responsibility
  • πŸ”Œ Provider Abstraction: Easy to add new LLM providers
  • πŸ§ͺ Testable Design: Comprehensive mocking for cost-free testing
  • πŸ“¦ Clean Packaging: Standard Python project structure
  • βš™οΈ Configuration-Driven: Environment-based settings management

Data Flow

  1. Input: User specifies model, question, datatype, and context
  2. Discovery: System locates required files using path utilities
  3. Composition: Prompt builder assembles modular components
  4. Dispatch: Core dispatcher selects and initializes LLM provider
  5. Execution: Runner sends prompt and handles response
  6. Output: Response returned to the user and optionally saved to a file (see the sketch below)
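
Put together, the flow maps onto the public Python API like this (a sketch using the get_llm and PromptRunner entry points documented above):

from llm_music_theory.core.dispatcher import get_llm   # step 4: dispatch
from llm_music_theory.core.runner import PromptRunner  # steps 2, 3, 5: discovery, composition, execution

llm = get_llm("gemini")        # select and initialize the provider
runner = PromptRunner(
    model=llm,
    question_number="Q1b",     # step 1: user-specified inputs
    datatype="mei",
    context=True,
    temperature=0.0,
    save=True,                 # step 6: optionally persist the response
)
print(runner.run())            # steps 5 and 6: execute and return the response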

πŸ§ͺ Testing

Comprehensive test suite with 84% coverage and zero API costs during testing.

Quick Test Commands

# Run all tests (recommended)
poetry run pytest

# Run with verbose output
poetry run pytest -v

# Run specific test categories
poetry run pytest tests/test_models.py        # LLM provider tests
poetry run pytest tests/test_path_utils.py    # File handling tests  
poetry run pytest tests/test_runner.py        # Core logic tests
poetry run pytest tests/test_integration.py   # CLI integration tests

# Quick Make targets
make test                                     # All tests
make test-models                              # Model implementation tests
make test-runner                              # Core runner tests
make test-integration                         # CLI workflow tests
make test-fast                                # Skip slow tests

Test Categories

Test Suite Purpose Coverage
test_models.py LLM provider implementations Mock API validation
test_path_utils.py File discovery and data loading Path resolution, data integrity
test_runner.py Core prompt execution logic Prompt building, parameterization
test_integration.py CLI command workflows End-to-end argument processing
test_comprehensive.py Real data validation Legacy data compatibility

Testing Philosophy

  • 🚫 No Real API Calls: All LLM interactions are mocked to avoid costs
  • πŸ“Š Comprehensive Coverage: Tests validate prompt construction, not LLM responses
  • πŸƒβ€β™‚οΈ Fast Execution: Full test suite runs in <1 second
  • πŸ”„ Continuous Integration: Tests run automatically on all changes

πŸ›  Development

Setting Up Development Environment

# Clone and setup
git clone https://github.com/liampond/LLM-MusicTheory.git
cd LLM-MusicTheory
poetry install

# Install development dependencies
poetry install --with dev

# Activate shell
poetry shell

# Run pre-commit checks
poetry run pytest
poetry run black --check src/
poetry run flake8 src/

Project Workflow

# 1. Make changes to source code
# 2. Run tests to ensure nothing breaks
poetry run pytest

# 3. Format code (if using black)
poetry run black src/

# 4. Test specific changes
poetry run pytest tests/test_your_change.py -v

# 5. Commit changes
git add -A
git commit -m "feat: describe your changes"

Adding New LLM Providers

  1. Create a new provider in src/llm_music_theory/models/your_provider.py:

from .base import LLMInterface, PromptInput

class YourProvider(LLMInterface):
    def query(self, input: PromptInput) -> str:
        # Implement your API integration here
        pass

  2. Register it in src/llm_music_theory/core/dispatcher.py:

def get_llm(model_name: str) -> LLMInterface:
    if model_name == "your_provider":
        from ..models.your_provider import YourProvider
        return YourProvider()

  3. Add tests in tests/test_models.py

  4. Update documentation
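
A matching test might look like the sketch below, consistent with the no-real-API-calls testing philosophy. FakeProvider is illustrative, not part of the codebase:

# tests/test_models.py (illustrative sketch; no real API calls are made)
from llm_music_theory.models.base import LLMInterface

class FakeProvider(LLMInterface):
    """Stand-in provider that returns a canned response instead of calling an API."""
    def query(self, input) -> str:
        return "mocked response"

def test_provider_conforms_to_interface():
    provider = FakeProvider()
    assert isinstance(provider, LLMInterface)
    assert provider.query(None) == "mocked response"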

Code Style

  • Formatting: Python Black (auto-formatting)
  • Imports: isort for import organization
  • Type Hints: Required for public APIs
  • Docstrings: Google style for functions and classes
  • Testing: Pytest with comprehensive mocking

πŸ“š API Documentation

Core Classes

PromptRunner

Main class for executing prompts across LLM providers.

class PromptRunner:
    def __init__(self, model, question_number, datatype, context, **kwargs):
        """Initialize prompt runner with configuration."""
        
    def run(self) -> str:
        """Execute prompt and return LLM response."""

LLMInterface

Abstract base class for all LLM providers.

class LLMInterface(ABC):
    @abstractmethod
    def query(self, input: PromptInput) -> str:
        """Send prompt to LLM and return response."""

PromptBuilder

Modular prompt composition system.

class PromptBuilder:
    def build_prompt_input(self, question_number, datatype, context, **kwargs) -> PromptInput:
        """Build complete prompt from modular components."""

CLI Commands

  • run-single: Execute single prompt
  • run-batch: Execute multiple prompts in batch

For complete API documentation, see docs/ directory.

πŸ”§ Troubleshooting

Common Issues

1. Import Error: No module named 'llm_music_theory'

# Solution: Ensure Poetry virtual environment is active
poetry shell
poetry install

2. API Key Not Found

# Solution: Check your .env file
cat .env
# Ensure no extra spaces around = sign
OPENAI_API_KEY=your-key-here

3. FileNotFoundError for data files

# Solution: Check data directory structure
ls -la data/RCM6/
# Should contain: encoded/, prompts/, guides/, questions/

4. Tests failing with "system prompt not found"

# This is expected - comprehensive tests are skipped when legacy data is incomplete
# Core functionality tests should pass:
poetry run pytest tests/test_models.py tests/test_runner.py -v



Test Results

Overall: 47/56 tests passing (84% success rate)

What Tests Validate

  • βœ… No API Costs: All tests use mock responses
  • βœ… Prompt Correctness: Validates proper prompt compilation
  • βœ… Data Loading: Tests file discovery and loading
  • βœ… Error Handling: Verifies graceful failure handling
  • βœ… CLI Interface: Tests command-line tools without API calls
  • βœ… Parameter Passing: Ensures settings are correctly transmitted
  • βœ… Multi-Format Support: Tests all music encoding formats

Running Tests in CI

Tests automatically run on GitHub Actions for:

  • βœ… Push to main branch
  • βœ… Pull request creation
  • βœ… Multiple Python versions (3.11, 3.12, 3.13)


🀝 Contributing

We welcome contributions! For detailed guidelines, see our Development Guide.

Quick Contribution Checklist

  • Fork the repository
  • Create a feature branch
  • Write/update tests
  • Ensure tests pass (poetry run pytest)
  • Update documentation if needed
  • Submit a pull request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Royal Conservatory of Music (RCM) for exam question data
  • OpenAI, Anthropic, Google for LLM APIs
  • Music encoding communities for MEI, MusicXML, ABC, and Humdrum formats



Happy music theory prompting! πŸŽ΅πŸ€–
